Problem Set #1

Econometrics 2, Fall 2004
University of Copenhagen, Institute of Economics
Hans Christian Kongsted and Heino Bohn Nielsen
The first exercise in the problem set replicates a number of basic results associated with
the linear regression model using matrix notation. If you find the questions difficult,
you may want to consult Appendices D and E in Wooldridge (2003) or Chapter 2 in
Verbeek (2004).
The second exercise introduces the software package GiveWin/PcGive, which will be
used throughout the course. The introduction includes import and transformation of data,
construction of graphs, as well as formulation and estimation of linear regression models.
For the introduction, we use a cross-sectional data set for Belgian individual wages also
considered in Verbeek (2004, Chapter 3).
#1.1 The Linear Regression Model
Consider the linear regression model given by
$$y_i = x_i'\beta + \epsilon_i, \qquad i = 1, \dots, N, \tag{1.1}$$
where $x_i = (1, x_{i2}, x_{i3}, \dots, x_{iK})'$ and $\beta = (\beta_1, \dots, \beta_K)'$ are $K \times 1$ vectors, and
$\epsilon_i$ is a scalar error term. Assume that
$$E[\epsilon_i \mid x_1, \dots, x_N] = 0 \tag{1.2}$$
$$V[\epsilon \mid x_1, \dots, x_N] = \sigma^2 I_N. \tag{1.3}$$
(1) Write the model (1.1) in matrix notation by defining the $N \times 1$ vectors $y = (y_1, \dots, y_N)'$
and $\epsilon = (\epsilon_1, \dots, \epsilon_N)'$ as well as the $N \times K$ matrix $X = (x_1, \dots, x_N)'$.
(2) Consider the $K$ moment conditions $E[x_i(y_i - x_i'\beta)] = 0$ underlying OLS and the
corresponding conditions for the sample moments
$$\frac{1}{N}\sum_{i=1}^{N} x_i (y_i - x_i'\hat{\beta}) = 0. \tag{1.4}$$
Write the conditions (1.4) in matrix notation and find the OLS estimator $\hat{\beta}$ of $\beta$.
State the implicit assumptions necessary for solving the equations for $\hat{\beta}$.
(3) Show that the OLS estimator $\hat{\beta}$ is unbiased, i.e. that $E[\hat{\beta} \mid X] = \beta$. Which
assumption is important for this property?
(4) Show that the estimated residuals, $e = y - X\hat{\beta}$, can be written as $e = My$, where
$M = I_N - X(X'X)^{-1}X'$.
Show that $M$ is symmetric, i.e. $M' = M$.
Show that $M$ is idempotent, i.e. $MM = M$; see Wooldridge (2003, p. 782).
(5) Calculate $\operatorname{trace}(M)$.
[Hint: Recall that the trace of an $N \times N$ matrix is the sum of the diagonal elements.
Also remember that $\operatorname{trace}(ABC) = \operatorname{trace}(CAB) = \operatorname{trace}(BCA)$, i.e. that you are
allowed to permute the factors cyclically; see Wooldridge (2003, p. 781). To find
$\operatorname{trace}(M)$ you can use that $M = I_N - P$ and find the trace of $P = X(X'X)^{-1}X'$.]
(6) Show that the following relation holds between the estimated residuals and the error
terms: $e = M\epsilon$.
(7) Show that
$$E[e'e] = \sigma^2 \operatorname{trace}(M). \tag{1.5}$$
[Hint: Use the fact that $\operatorname{trace}(a) = a$ if $a$ is a scalar, and exploit the rules for
calculation with the trace operator.]
(8) Use the result in (1.5) to show that
$$s^2 = \frac{e'e}{N - K}$$
is an unbiased estimator of $\sigma^2$.
(9) Derive the variance of the OLS estimator, $V[\hat{\beta} \mid X]$. How can this quantity be
estimated?
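The algebraic results in questions (2) to (9) can be sanity-checked numerically before or after deriving them. The following is a minimal sketch in Python with NumPy, not part of the original problem set and not a substitute for the proofs. The sample size, parameter values, and simulated regressors are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and parameter values, chosen only for illustration.
N, K = 200, 3
beta = np.array([1.0, 0.5, -2.0])
sigma = 1.5

# X = (x_1, ..., x_N)' with a constant in the first column.
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
eps = sigma * rng.normal(size=N)
y = X @ beta + eps

# OLS from the normal equations X'X b = X'y (needs X to have full column rank).
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)
e = y - X @ beta_hat

# P = X (X'X)^{-1} X' and M = I_N - P.
P = X @ np.linalg.solve(XtX, X.T)
M = np.eye(N) - P

checks = {
    "sample moment conditions X'e = 0": np.allclose(X.T @ e, 0.0, atol=1e-8),
    "e = M y": np.allclose(e, M @ y),
    "M symmetric": np.allclose(M, M.T),
    "M idempotent": np.allclose(M @ M, M),
    "trace(M) = N - K": np.isclose(np.trace(M), N - K),
    "e = M eps": np.allclose(e, M @ eps),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")

# s^2 and the usual estimate of V[beta_hat | X].
s2 = e @ e / (N - K)
var_beta_hat = s2 * np.linalg.inv(XtX)

# Monte Carlo with X held fixed: the average of s^2 should be close to sigma^2.
R = 2000
draws = sigma * rng.normal(size=(R, N))
resid = draws @ M  # row r equals (M eps_r)' because M is symmetric
s2_draws = (resid ** 2).sum(axis=1) / (N - K)
print("mean of s^2 over replications:", s2_draws.mean(), "vs sigma^2:", sigma ** 2)
```

A passing check only shows that a property holds for one simulated data set; the exercises ask for a general argument.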
#1.2 Introduction to GiveWin/PcGive
To introduce the software package GiveWin and the module PcGive we consider the Belgian
wage data also analyzed in Verbeek (2004). The data set includes observations for 1472
individuals for the year 1994 and consists of the following four variables:
WAGE     Hourly wage rate in euro.
MALE     Dummy variable for gender.
EDUC     Education level, from 1 to 5.
EXPER    Years of experience.
See Verbeek (2004, p. 68) for further explanation.
(1) Download the data from the home page (go to www.econ.ku.dk/metrics/ and follow
the link to Econometrics 2). The data in GiveWin format consists of two files:
bwages.In7 and bwages.Bn7. The .In7 file identifies the variables in the binary
.Bn7 file.
Start GiveWin and load the data set. Have a look at the data to see how the database
is organized.
(2) Choose [Graphics] in the [Tools] menu and construct cross-plots between the
wage and each of the individual characteristics. Do the graphs seem to indicate
strong correlations?
(3) Start the module PcGive in GiveWin. Choose [Descriptive Statistics] in the
[Packages] menu. Then choose [Formulate] in the [Model] menu and select all
variables. Calculate the means, standard deviations and correlations between the
variables. Briefly describe the data.
(4) Consider the following simple regression model
$$\mathrm{WAGE}_i = \beta_1 + \beta_2 \cdot \mathrm{MALE}_i + \beta_3 \cdot \mathrm{EDUC}_i + \beta_4 \cdot \mathrm{EXPER}_i + \epsilon_i, \tag{1.6}$$
for $i = 1, \dots, 1472$.
Do you think it is reasonable to assume that $\epsilon_i$, $i = 1, \dots, 1472$, are independently
and identically distributed (iid)?
What are your a priori expectations for the signs of the coefficients?
(5) Choose [Econometric Modelling] in the [Packages] menu and [Cross Section
Regression] in the [Model] menu in PcGive. Then choose [Formulate] in the
[Model] menu, set up model (1.6) and estimate the parameters using OLS.
Interpret the results.
(6) Choose [Graphic Analysis] in the [Test] menu and try the different possibilities.
Also try the [Test Summary] in the [Test] menu. Focus on the test for heteroscedasticity.
Do you think the model looks satisfactory?
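(7) Now consider the alternative log-formulation
$$\log(\mathrm{WAGE}_i) = \tilde{\beta}_1 + \tilde{\beta}_2 \cdot \mathrm{MALE}_i + \tilde{\beta}_3 \cdot \mathrm{EDUC}_i + \tilde{\beta}_4 \cdot \mathrm{EXPER}_i + \tilde{\epsilon}_i, \tag{1.7}$$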
where log denotes the natural logarithm. To make the transformation choose the
[Algebra Editor] in the [Tools] menu and load the algebra file bwages.alg located
on the home page. Run the algebra file. Take a look at the database. What
variables have been constructed? Note that the file name is now followed by an *.
What does this mean?
[Hint: Note that the algebra editor is case sensitive. One way to work with data
files in GiveWin is to use a data file for the original data and save an algebra file with
transformations. Each time you open the data file you should run the associated
algebra file to apply the transformations to the database. Alternatively, you can
save your data including the transformed variables. If you work with the data for
longer periods, however, the number of variables will normally grow and it is often
difficult to remember how all the transformed variables were defined].
(8) Estimate the model (1.7) and look at the outcome and test statistics. Has the model
improved?
(9) Try to augment the model (1.7) with interaction terms. For example, you could see
if years of experience have different impacts for men and women by including the
term EXPER · MALE. To do this you have to create the new terms in the algebra
editor. For more inspiration, see Section 3.5 in Verbeek (2004).
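For readers who want to cross-check the PcGive output outside the package, the steps in questions (5) to (9) can be sketched in Python with NumPy. This is an illustrative sketch under stated assumptions, not the course's method. The data below are synthetic placeholders and NOT the Belgian wage data, the coefficients used to generate them are arbitrary, and the helper `ols` and the variable names `lwage` and `exper_male` are hypothetical (the contents of bwages.alg are not reproduced here).

```python
import numpy as np


def ols(y, X):
    """Return OLS coefficients, conventional standard errors and s^2."""
    n, k = X.shape
    XtX = X.T @ X
    b = np.linalg.solve(XtX, X.T @ y)
    e = y - X @ b
    s2 = e @ e / (n - k)
    se = np.sqrt(s2 * np.diag(np.linalg.inv(XtX)))
    return b, se, s2


# Synthetic placeholder data (NOT the bwages data); all values are arbitrary.
rng = np.random.default_rng(1)
n = 500
male = rng.integers(0, 2, size=n).astype(float)    # 0/1 dummy
educ = rng.integers(1, 6, size=n).astype(float)    # levels 1 to 5
exper = rng.integers(0, 41, size=n).astype(float)  # years
log_wage_mean = 1.0 + 0.1 * male + 0.15 * educ + 0.02 * exper
wage = np.exp(log_wage_mean + 0.3 * rng.normal(size=n))

# Transformations of the kind created in the algebra editor (names hypothetical).
lwage = np.log(wage)
exper_male = exper * male

const = np.ones(n)
X_base = np.column_stack([const, male, educ, exper])               # regressors in (1.6) and (1.7)
X_inter = np.column_stack([const, male, educ, exper, exper_male])  # (1.7) plus EXPER * MALE

b_lin, se_lin, _ = ols(wage, X_base)    # model (1.6)
b_log, se_log, _ = ols(lwage, X_base)   # model (1.7)
b_int, se_int, _ = ols(lwage, X_inter)  # (1.7) with the interaction term

labels = ["Constant", "MALE", "EDUC", "EXPER", "EXPER*MALE"]
for name, b, se in zip(labels, b_int, se_int):
    print(f"{name:>10s}  {b: .4f}  ({se:.4f})")
```

With the real data, the arrays `wage`, `male`, `educ` and `exper` would be replaced by the four columns of the data set, and the printed coefficients and standard errors could then be compared with the PcGive output.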