Econometrics Department of Economics Fall 201

Econ 301.02
Econometrics
Bilkent University
Department of Economics
Taskin
Fall 201
Study questions:
1) Consider the following models:
$Y_i = \alpha_1 + \alpha_2 X_i + u_i$
$Y_i^* = \beta_1 + \beta_2 X_i^* + u_i^*$
where $Y_i^* = w_1 Y_i$ and $X_i^* = w_2 X_i$, and $w_1$ and $w_2$ are given constants.
a) Establish the relation between the two sets of coefficients, i.e. the $\hat{\alpha}$'s and $\hat{\beta}$'s.
b) How do the residuals of these two models compare?
c) How will $Var(\hat{\beta}_2)$ and $Var(\hat{\alpha}_2)$ differ?
d) Will the t-stat be different between the models?
e) How will your answers to the above questions change if $w_1 = w_2$?
f) If instead only the explanatory variable is scaled, such that $X_i^* = w_2 X_i$, and there is no change in the dependent variable $Y_i$, and you estimate the regression $Y_i = \lambda_1 + \lambda_2 X_i^* + v_i$, how will the $\lambda$'s and the $\alpha$'s be related?
g) What can you say about the t-stats and $R^2$'s of the two models?
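One way to check answers to question 1 numerically is to fit both regressions on the same data. The sketch below is only an illustration: the data are synthetic, the scale factors `w1` and `w2` are arbitrary positive values, and the helper `ols` is a hypothetical name introduced here, not something from the course material.

```python
import random

def ols(x, y):
    """Simple-regression OLS: returns intercept, slope, residuals, slope t-stat."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - 2)   # estimated error variance
    se_slope = (s2 / sxx) ** 0.5
    return intercept, slope, resid, slope / se_slope

# Synthetic data (hypothetical values, for illustration only).
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(40)]
y = [1.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]
w1, w2 = 100.0, 4.0  # arbitrary positive scale factors (assumption)

# Unscaled regression, then the regression on the scaled variables.
a1, a2, u, t_a = ols(x, y)
b1, b2, u_star, t_b = ols([w2 * xi for xi in x], [w1 * yi for yi in y])

print(b1 / a1, b2 / a2)   # compare with w1 and w1 / w2
print(u_star[0] / u[0])   # compare with w1
print(t_a, t_b)           # compare the two slope t-statistics
```

Setting `w1 = 1.0` in the sketch reproduces the case where only the explanatory variable is scaled.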
2) Suppose that you want to estimate a regression equation to explain wage levels, $wage_i$; the list of possible explanatory variables is as follows:
$educ_i$: education level,
$exper_i$: experience at work,
$sex_i$: a dummy variable which takes the value of 1 for male workers.
You have a sample of size 50.
a) Write two different models such that, in comparing the success of the estimation,
i. $R^2$ can be used.
ii. $R^2$ cannot be used.
b) Write two different models such that, in comparing the success of the estimation,
iii. Adjusted $R^2$ can be used.
iv. Adjusted $R^2$ cannot be used.
3) For a sample of 570 respondents from the U.S. National Longitudinal Survey of Youth, a
researcher has data on Y, hourly earnings in 1994, measured in dollars, and S, years of schooling,
measured as highest grade completed. Furthermore, other education indicators, defined as dummy
variables, are constructed for the highest educational qualification obtained: (i) EDUCDO as no
qualification (high school drop-out); (ii) EDUCAA as high school diploma in associate of arts
degree (awarded by two-year colleges); (iii) EDUCBA as bachelor of arts degree (awarded by
four-year colleges). He regresses the logarithm of Y on:
Model A: S alone
Model B: EDUCDO, EDUCAA, and EDUCBA
Model C: S, EDUCDO, EDUCAA, and EDUCBA
The results are presented in the table below.
Variable         Model A     Model B     Model C
S                0.079       -           0.040
                 (0.008)                 (0.019)
EDUCDO           -           -0.173      -0.055
                             (0.075)     (0.094)
EDUCAA           -           0.129       0.065
                             (0.074)     (0.080)
EDUCBA           -           0.420       0.246
                             (0.047)     (0.095)
Constant         1.359       2.321       1.824
                 (0.113)     (0.027)     (0.236)
R2               0.141       0.145       0.152
SSR              132.12      131.48      130.44
Standard errors in brackets; SSR (also denoted RSS) = Residual Sum of Squares.
a) Discuss whether it is possible to give an interpretation of the constant in model A.
b) Provide an interpretation of the coefficients of the dummy variables in model B.
c) Discuss whether it is possible to give an interpretation of the constant in model B.
d) Perform a test of the joint explanatory power of the dummy variables in model C,
explaining how the result of the test should be interpreted.
e) At a seminar someone says that the researcher ought to have used drop-outs as the omitted
category because they were the lowest educational category. How should the researcher reply
to this?
f) At the seminar the researcher says that the coefficients of EDUCAA and EDUCBA were
lower for males than for females when he fitted model B for males and females separately. He
has not tested whether they are significantly different however. Explain how you would
conduct such an exercise, writing your model as well.
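For part (d), a minimal sketch of the arithmetic behind a joint F-test follows. It assumes that Model A (S alone, SSR = 132.12) is treated as the restricted version of Model C (SSR = 130.44), so that three restrictions are tested with 570 - 5 residual degrees of freedom. The function name `restriction_f_stat` is hypothetical, and the critical value still has to be read from an F table.

```python
def restriction_f_stat(ssr_restricted, ssr_unrestricted, n_restrictions, df_unrestricted):
    """F statistic for linear restrictions, from restricted and unrestricted SSR."""
    numerator = (ssr_restricted - ssr_unrestricted) / n_restrictions
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator

n_obs = 570
ssr_model_a = 132.12   # restricted: dummies excluded
ssr_model_c = 130.44   # unrestricted: S plus three dummies and a constant
f_dummies = restriction_f_stat(ssr_model_a, ssr_model_c,
                               n_restrictions=3,
                               df_unrestricted=n_obs - 5)
print(f_dummies)  # compare with the F(3, 565) critical value from a table
```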
4) An economics department examines the starting salaries of its graduates. The following
regression equation is estimated, where the dependent variable is starting salary ($SALARY_t$).
The explanatory variables are Grade Point Average ($GPA_t$), a dummy variable to indicate
sex ($SEX_t = 1$ if the student is female) and a dummy variable to indicate whether the
student took econometrics ($METRICS_t = 1$ if the student took an econometrics course). The
department uses a sample of 50 students to estimate the following equation:
EQ1: $SALARY_t = \beta_1 + \beta_2 GPA_t + \beta_3 SEX_t + \beta_4 METRICS_t + \beta_5 GPA_t \cdot METRICS_t + e_t$
============================================================
LS // Dependent Variable is SALARY
Date: 12/29/02   Time: 20:18
Sample: 1 50
Included observations: 50
============================================================
Variable        Coefficient   Std. Error   t-Statistic   Prob.
============================================================
C               23371.83      1213.678     19.25703      0.0000
GPA             1964.180      403.4869     4.868015      0.0000
SEX             -312.2537     419.2776     -0.744742     0.4603
METRICS         8722.809      2440.863     3.573657      0.0009
GPA*METRICS     -1301.692     843.9118     -1.54245      0.1300
============================================================
R-squared            0.751379    Mean dependent var      30430.92
Adjusted R-squared   0.729280    S.D. dependent var      2739.002
S.E. of regression   1425.124    Akaike info criterion   14.61867
Sum squared resid    91394065    Schwarz criterion       14.80987
Log likelihood       -431.4136   F-statistic             33.99965
Durbin-Watson stat   1.726050    Prob(F-statistic)       0.000000
============================================================
a) Interpret the estimated coefficients, their magnitudes and their signs, for all variables other than
the constant.
b) Indicate which coefficients are statistically significant, i.e. for which you can reject $H_0: \beta_i = 0$.
c) Compute the 95% interval estimate for $\beta_2$.
d) Test the hypothesis that women have the same starting salaries as men. Formulate the null and
the alternative hypotheses and test.
e) Test the hypothesis that the effect of $GPA_t$ on $SALARY_t$ is larger for students who took
$METRICS_t$.
f) Test the overall significance of the model. State the null and the alternative hypotheses and
conduct a formal test.
g) The following equation, which has the same dependent variable $SALARY_t$, is also estimated.
Its statistical equation is:
$SALARY_t = \beta_1 + \beta_2 GPA_t + \beta_3 SEX_t + e_t$
State the restrictions imposed on the coefficients of the first equation (EQ1) to obtain this
equation. What is the economic interpretation of these restrictions? Write the null and the
alternative hypotheses. Formally test these restrictions.
============================================================
LS // Dependent Variable is SALARY
Date: 12/29/02   Time: 20:28
Sample: 1 50
Included observations: 50
============================================================
Variable        Coefficient   Std. Error   t-Statistic   Prob.
============================================================
C               27270.75      1976.365     13.79844      0.0000
GPA             1131.429      661.7033     1.709874      0.0939
SEX             -399.7410     785.5255     -0.508884     0.6132
============================================================
R-squared            0.060953    Mean dependent var      30430.92
Adjusted R-squared   0.020993    S.D. dependent var      2739.002
S.E. of regression   2710.099    Akaike info criterion   15.86761
Sum squared resid    3.45E+08    Schwarz criterion       15.98233
Log likelihood       -464.6371   F-statistic             1.525364
Durbin-Watson stat   0.607523    Prob(F-statistic)       0.228116
============================================================
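For part (g), the F statistic can be computed either from the two sums of squared residuals or from the two R-squared values reported above. The sketch below does both, assuming the two restrictions are the ones that drop METRICS and GPA*METRICS from EQ1. The restricted sum of squared residuals is reported only as 3.45E+08, so the two versions agree only up to that rounding. All variable names are illustrative.

```python
# Values read from the two regression outputs.
ssr_unrestricted = 91394065.0   # EQ1, five coefficients
ssr_restricted = 3.45e8         # restricted equation (rounded in the output)
r2_unrestricted = 0.751379
r2_restricted = 0.060953

n_obs = 50
n_restrictions = 2                      # assumed: beta_4 = 0 and beta_5 = 0
df_unrestricted = n_obs - 5

# F statistic from the sums of squared residuals.
f_from_ssr = ((ssr_restricted - ssr_unrestricted) / n_restrictions) / (
    ssr_unrestricted / df_unrestricted
)

# Same statistic from R-squared; valid here because both regressions share
# the same dependent variable and the same sample.
f_from_r2 = ((r2_unrestricted - r2_restricted) / n_restrictions) / (
    (1 - r2_unrestricted) / df_unrestricted
)

print(f_from_ssr, f_from_r2)  # compare with the F(2, 45) critical value from a table
```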
5. The following cost function for electricity generation is estimated:
$Y = A X^{\beta} P_1^{\alpha_1} P_2^{\alpha_2} P_3^{\alpha_3} e^{u}$
where Y is the total cost of production,
X is the output in kilowatt hours,
P1 is the price of labor input,
P2 is the price of capital input,
P3 is the price of fuel,
u is the disturbance term.
Using a sample of 29 firms, the following estimation results are obtained:
$\widehat{\ln Y_i} = -4.93 + 0.94 \ln X_i + 0.31 \ln P_{1i} - 0.26 \ln P_{2i} + 0.44 \ln P_{3i}$
se = (1.96) (0.11) (0.23) (0.29) (0.07)
RSS = 0.336
a) What are the expected signs of the coefficients? Give a one-sentence reason for each.
b) What is the economic meaning of $\alpha_1$, $\alpha_2$ and $\alpha_3$?
c) Test whether the estimated coefficients are statistically significant.
d) Is the elasticity of total cost with respect to output equal to one? Conduct a formal test of the hypothesis.
e) If you want to test whether the restriction $(\alpha_1 + \alpha_2 + \alpha_3) = 1$ is valid, which equation will
you estimate? What is the economic meaning of this restriction?
f) If you were given the following estimation result, what would be your conclusion about the
above restriction?
$\widehat{\ln(Y_i / P_{3i})} = -6.55 + 0.91 \ln X_i + 0.51 \ln(P_{1i} / P_{3i}) + 0.09 \ln(P_{2i} / P_{3i})$
se = (0.16) (0.11) (0.19) (0.16)
SSR = 0.364
6) Consider the model which explains output, $y_t$, as a function of labor, $l_t$, and capital, $k_t$, inputs:
$y_t = \beta_1 + \beta_2 l_t + \beta_3 k_t + e_t$.
Suppose that least squares estimation on 25 observations on these variables yields the following
results:
$\hat{y}_t = 0.415 + 1.194 l_t + 0.217 k_t$
$s^2 = 0.0076818$, $R^2 = 0.9451$
$\widehat{Var}(\hat{\beta}) = \begin{bmatrix} 0.0232 & -0.0354 & 0.0124 \\ -0.0354 & 0.0692 & -0.0294 \\ 0.0124 & -0.0294 & 0.0140 \end{bmatrix}$
a. Find the 95% interval estimate for $\beta_2$.
b. Use a t-test to test the hypothesis $H_0: \beta_2 \geq 1$ against the alternative hypothesis $H_1: \beta_2 < 1$.
c. Test the hypothesis that $H_0: \beta_2 + \beta_3 = 1$ against the alternative of not equal to one.
d. Find the total variation, unexplained variation and explained variation for this model.
e. Test the hypothesis that $H_0: \beta_2 = \beta_3 = 0$ against the alternative of not equal to zero.
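For parts (a) and (d), a minimal sketch of the arithmetic is given below. It uses the labor coefficient 1.194 with its estimated variance 0.0692 (the middle diagonal element of the covariance matrix), together with $s^2 = 0.0076818$ and $R^2 = 0.9451$. The critical value 2.074 is an assumption taken from a standard t table (22 degrees of freedom, 2.5% in each tail), and the variable names are illustrative.

```python
import math

n_obs, n_coef = 25, 3
df = n_obs - n_coef              # 22 residual degrees of freedom

# (a) 95% interval estimate for the labor coefficient.
b2 = 1.194
var_b2 = 0.0692
t_crit = 2.074                   # t table value for 22 df (assumed)
se_b2 = math.sqrt(var_b2)
interval = (b2 - t_crit * se_b2, b2 + t_crit * se_b2)

# (d) Variation decomposition from s^2 and R^2.
s2 = 0.0076818
r2 = 0.9451
unexplained = s2 * df            # sum of squared residuals
total = unexplained / (1 - r2)   # since R^2 = 1 - unexplained / total
explained = total - unexplained

print(interval)
print(total, unexplained, explained)
```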
7) Suppose that, for explaining the response variable $Y_i$, we have a set of four explanatory
variables, namely $X_2$, $X_3$, $X_4$ and $X_5$. The following two models are considered and estimated
with 20 observations:
Model 1: $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + \beta_5 X_{5i} + u_i$
Model 2: $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u'_i$
When the models are estimated, the following results are obtained:
Model 2: $\hat{Y}_i = 14 - 0.642 X_{2i} + 0.396 X_{3i}$; $R^2 = 0.837$, $s = 3.072$
Model 1: $\hat{Y}_i = 14.6 - 0.611 X_{2i} + 0.439 X_{3i} - 0.08 X_{4i} - 0.064 X_{5i}$; $R^2 = 0.845$, $s = 3.190$
a) Discuss briefly why we cannot use $R^2$ to compare the two models.
b) Compute the SSR (sum of squared residuals) for both models and the SST (total sum of
squares).
c) Compute $\bar{R}^2$ (adjusted $R^2$) for both models. Use this to decide which model is better.
d) Use a formal test, i.e. an F-test, to check whether the variables $X_4$ and $X_5$ contribute
sufficiently in Model 1.
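The calculations in parts (b) to (d) can be sketched as follows. This assumes that the reported s is the standard error of the regression, so that SSR = s^2 (n - k), and it uses the reported figures 0.837 and 3.072 for the three-coefficient model and 0.845 and 3.190 for the five-coefficient model. The names `small` and `large` are illustrative labels for those two models.

```python
n_obs = 20

# Reported results: number of coefficients, R-squared, standard error s.
small = {"k": 3, "r2": 0.837, "s": 3.072}   # X2 and X3 only
large = {"k": 5, "r2": 0.845, "s": 3.190}   # X2, X3, X4 and X5

def ssr(model):
    """Sum of squared residuals, assuming s^2 = SSR / (n - k)."""
    return model["s"] ** 2 * (n_obs - model["k"])

def adjusted_r2(model):
    """Adjusted R-squared: 1 - (1 - R^2)(n - 1) / (n - k)."""
    return 1 - (1 - model["r2"]) * (n_obs - 1) / (n_obs - model["k"])

ssr_small, ssr_large = ssr(small), ssr(large)
sst_small = ssr_small / (1 - small["r2"])   # SST implied by each model;
sst_large = ssr_large / (1 - large["r2"])   # the two should roughly agree
adj_small, adj_large = adjusted_r2(small), adjusted_r2(large)

# F statistic for the two restrictions that remove X4 and X5.
f_stat = ((ssr_small - ssr_large) / 2) / (ssr_large / (n_obs - large["k"]))

print(ssr_small, ssr_large, sst_small, sst_large)
print(adj_small, adj_large)
print(f_stat)  # compare with the F(2, 15) critical value from a table
```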
8) In the regression model
$y_t = \beta_1 + \beta_2 x_t + u_t$   (Equation 1)
a. Write and explain the Gauss-Markov Theorem.
b. What are the necessary assumptions, regarding the error term, for this theorem to be true?
Write and explain in your own words each of these assumptions.
c. If you learn the following information about the error term $u_t$:
$u_t = \alpha_1 + \alpha_2 z_t + \varepsilon_t$,
where $z_t$ is a fixed (nonrandom) economic variable, the $\alpha$'s are the coefficients of this
equation, and $\varepsilon_t$ is a disturbance term with mean zero and constant variance, i.e. $E(\varepsilon_t) = 0$
and $Var(\varepsilon_t) = \sigma_\varepsilon^2$, what can you say about the properties of the error term $u_t$?
d. Given the information in (c), does the Gauss-Markov Theorem hold for the parameters
estimated in Equation 1?
9) In the statistical model:
$y_t = \beta_1 + \beta_2 x_t + u_t$
$u_t$ is the random error with $u_t \sim N(0, \sigma^2)$, $y_t$ is the dependent variable and $x_t$ is the
explanatory variable, which is fixed (nonstochastic).
The ordinary least squares estimator of $\beta_2$ is given by the following formula:
$b_2 = \dfrac{\sum_t (x_t - \bar{x})(y_t - \bar{y})}{\sum_t (x_t - \bar{x})^2}$
where there are T observations with $t = 1, \ldots, T$.
a. What is the condition for unbiasedness?
b. Show that $b_2$ is an unbiased estimator of $\beta_2$.
c. Prove that $b_2$ is the best linear unbiased estimator, i.e. it has the minimum variance among all
the linear unbiased estimators.
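A small simulation can illustrate (though not prove) the unbiasedness asked about in part (b). The sketch below holds the regressor fixed across replications, draws normal errors, and averages the slope estimates. The true parameter values, the sample size T = 30, the number of replications and the function name `slope_estimate` are all assumptions chosen for illustration.

```python
import random

def slope_estimate(x, y):
    """OLS slope: sum of cross-deviations over sum of squared x-deviations."""
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    sxy = sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y))
    sxx = sum((xt - xbar) ** 2 for xt in x)
    return sxy / sxx

rng = random.Random(42)
beta1, beta2, sigma = 1.0, 2.0, 1.0      # hypothetical true values
x = [float(t) for t in range(1, 31)]     # fixed (nonstochastic) regressor

# Repeatedly draw new errors for the same x and re-estimate the slope.
draws = []
for _ in range(2000):
    y = [beta1 + beta2 * xt + rng.gauss(0, sigma) for xt in x]
    draws.append(slope_estimate(x, y))

mean_b2 = sum(draws) / len(draws)
print(mean_b2)  # should be close to the assumed beta2
```

The average is close to the assumed slope only because the simulated errors have mean zero for every observation; the formal argument still has to be written out algebraically.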
10. Short-answer questions:
a. For a sample with each observation described by the equation
$Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$,
a population regression equation $E(Y_t) = \beta_1 + \beta_2 X_t$, and the sample regression
equation $\hat{Y}_t = \hat{\beta}_1 + \hat{\beta}_2 X_t$, describe and show graphically the concepts of Total Sum of
Squares (SST), Explained Sum of Squares (SSE), Residual Sum of Squares (SSR) and $R^2$.
b. In the regression $\log(Y_i) = \alpha + \beta \log(X_i) + u_i$, how will you define the elasticity of Y with
respect to X?
c. Given the following regression equations, which can be compared using $R^2$ statistics to choose
the best regression, and why? Explain.
i. $Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$, estimated with 50 observations.
ii. $Y_t = \beta_1 + \beta_2 X_t + \gamma Z_t + \varepsilon_t$, estimated with 25 observations.
iii. $\log(Y_t) = \beta_1 + \beta_2 \log(X_t) + \varepsilon_t$, estimated with 50 observations.
iv. $Y_t = \beta_1 + \beta_2 X_t + \gamma (1/Z_t) + \varepsilon_t$, estimated with 25 observations.