Lecture 1

Linear panel data models
Rob Alessie
email: [email protected]
web page: http://members.chello.nl/~r.j.m.alessie
Literature (panel data):
1. Verbeek (2004), A guide to modern econometrics, Wiley. Chapters (2, 4), 5, (6), 10.
2. Wooldridge, J.M. (2002), Econometric analysis of cross-section and panel data, MIT Press, ISBN 0-262-23219-7.
3. Chapters 1 to 3 of Hsiao (1986) (cf. the reader).
4. Some articles.
Slides downloadable from my web page (see above).
Lecture 1: Static linear panel data models
Hsiao, chapters 1 until 3; Verbeek (2004), chapter 10 (10.1; 10.2)
1) Prerequisites for the course: OLS and GLS
2) Panels, pseudo panels (time series of cross-sections)
3) Issues involved in utilizing panel data
Heterogeneity bias
strict exogeneity assumption
Selectivity bias
Attrition bias
4) Fixed-effects model: formulation and estimation
5) Random effects model: formulation and estimation
6) Fixed effects or random effects
7) Mundlak’s formulation
Prerequisites: OLS
Consider the following population model:
$y_i = x_i'\beta + \varepsilon_i$   (1)
where
$y_i$ := dependent variable (regressand)
$x_i$ := (Kx1)-vector of explanatory variables (incl. constant term)
$\varepsilon_i$ := disturbance term (error term)
$y_i$, $x_i$ and $\varepsilon_i$ are random variables (or vectors of random variables).
In many (old-fashioned) econometric textbooks it is assumed that $x_i$ is 'fixed' (and not a vector of random variables). The assumption of 'fixed' regressors is reasonable only in controlled experiments, and controlled experiments are barely possible in the social sciences.
Stacking the N observations gives:
$y = X\beta + \varepsilon$   (1')
where
$y$, $\varepsilon$: (Nx1)-vectors
$X$: (NxK)-matrix (assumed to be of full rank)
Gauss-Markov assumptions (for the numbering of the assumptions, see Verbeek):
A1) $E(\varepsilon_i) = 0$
A2) $\{\varepsilon_1, \ldots, \varepsilon_N\}$ and $\{x_1, \ldots, x_N\}$ are independent: all explanatory variables are strictly exogenous
A3) $V(\varepsilon_i) = \sigma^2$: the variance of the error term is constant over the observations and finite (homoskedasticity)
A4) $cov(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$: the error terms of different observations are uncorrelated (no autocorrelation)
Comments on assumption A2 (strict exogeneity):
1) $\{\varepsilon_1, \ldots, \varepsilon_N\}$ and $\{x_1, \ldots, x_N\}$ are uncorrelated, meaning for instance that $E(x_j \varepsilon_i) = 0$ for all $i$ and $j$, not only for $j = i$!
A7) The assumption of weak exogeneity (contemporaneous exogeneity), $E(\varepsilon_i | x_i) = 0$, is clearly weaker than strict exogeneity.
• In cross-section data derived from a random sample, the difference between the two assumptions (A2 and A7) is not relevant.
• However, this is not true for time series and panel datasets (see below).
• The assumption of weak exogeneity (A7) (and obviously strict exogeneity) implies that $E(y_i | x_i) = x_i'\beta$. In that case, causal inference is possible.
Estimator: rule for using the data to estimate $\beta$
OLS estimator: $b = (X'X)^{-1}X'y$
Properties of the OLS estimator $b$ given the Gauss-Markov assumptions:
1) Estimator $b$ is unbiased: $E(b) = \beta$
2) Variance-covariance matrix of $b$: $V(b) = \sigma^2 (X'X)^{-1}$
3) OLS estimator $b$ is the best linear unbiased estimator (BLUE):
Let $\tilde b = Ay$ be a linear unbiased estimator, where A is a (KxN)-matrix.
BLUE: the difference between the covariance matrices of $\tilde b$ and $b$, $V(\tilde b) - V(b)$, is positive semi-definite.
Unbiased estimator for $\sigma^2$ (see Johnston et al.):
$s^2 = e'e/(N-K)$, where $e = y - Xb$ = (Nx1)-vector of residuals.
Given $s^2$, $V(b)$ can be estimated by $\hat V(b) = s^2 (X'X)^{-1}$.
Goodness-of-fit measure: R2 (see Johnston and DiNardo and Hey et al.)
• For exact statistical inference, the following distributional assumption is normally made:
A5) $\varepsilon \sim N(0, \sigma^2 I_N)$
Given assumption A5 (instead of A1, A3 and A4), one can derive t-tests and F-tests on the (joint) significance of coefficients (see Johnston and DiNardo).
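A minimal sketch of the OLS quantities on this slide (estimator, unbiased variance estimate with N-K degrees of freedom, and the implied standard error) for the special case of a constant plus one regressor. The function name `ols_simple`, the simulated data and the parameter values are illustrative assumptions, not part of the course material.

```python
import random


def ols_simple(x, y):
    """OLS of y on a constant and one regressor x (so K = 2).

    Returns (intercept, slope, s2, se_slope) where s2 = e'e / (N - K)
    and se_slope is the square root of the slope entry of s2 * (X'X)^{-1}.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)
    se_slope = (s2 / sxx) ** 0.5
    return intercept, slope, s2, se_slope


# Simulated data under the Gauss-Markov assumptions (hypothetical values):
# true intercept 1.0, true slope 0.5, error variance 1.
random.seed(0)
N = 2000
x = [random.gauss(0, 1) for _ in range(N)]
y = [1.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]
b0, b1, s2, se1 = ols_simple(x, y)
print(b0, b1, s2, se1)
```

With a sample of this size the estimates should lie close to the values used in the simulation, and s2 close to the error variance.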
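Asymptotic properties of the OLS estimator
A. Consistency
We know that, given the Gauss-Markov assumptions A1),...,A4):
1) $E(b) = \beta$
2) $V(b) = \sigma^2 (X'X)^{-1}$
However, the distribution of $b$ is not known (unless we assume A5).
Chebyshev inequality: for a random variable $z$ and any $\delta > 0$,
$P(|z - E(z)| > \delta) < V(z)/\delta^2$
For the OLS estimator $b$ this implies that each k-th element satisfies:
$P(|b_k - \beta_k| > \delta) < V(b_k)/\delta^2 = \sigma^2 c_{kk}/\delta^2$
where $c_{kk}$ = (k,k)-element in $(X'X)^{-1} = (\sum_{i=1}^N x_i x_i')^{-1}$.
If N becomes large, $\sum_{i=1}^N x_i x_i'$ becomes large and, consequently, $c_{kk}$ small.
If we assume that
A6) $\frac{1}{N}\sum_{i=1}^N x_i x_i'$ converges to a finite nonsingular matrix $\Sigma_{xx}$, or $plim\, \frac{1}{N}X'X = \Sigma_{xx}$,
then the OLS estimator $b$ is consistent: $plim\, b = \beta$.
Slutsky's theorem on probability limits:
If $plim\, b = \beta$ and g(.) is a continuous function, it also holds that $plim\, g(b) = g(\beta)$.
• OLS is consistent under weaker conditions than A1,...,A4 and A6. Write
$b = \beta + (\frac{1}{N}\sum_{i=1}^N x_i x_i')^{-1} \frac{1}{N}\sum_{i=1}^N x_i \varepsilon_i$
Given assumption A6, Slutsky's theorem and the law of large numbers:
$plim\, (b - \beta) = \Sigma_{xx}^{-1} E(x_i \varepsilon_i)$
This implies that the OLS estimator $b$ is consistent if $E(x_i \varepsilon_i) = 0$.
• Notice that assumption A7 ($E(\varepsilon_i | x_i) = 0$) implies $E(x_i \varepsilon_i) = 0$.
In other words, one only needs the weaker assumptions A6 and A7 in order to prove the consistency of the OLS estimator, instead of the Gauss-Markov assumptions A1,...,A4.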
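B. Asymptotic normality
Under the Gauss-Markov assumptions A1,...,A4 and assumption A6, one can show that
$\sqrt{N}(b - \beta) \to N(0, \sigma^2 \Sigma_{xx}^{-1})$
In practice, where we have finite samples, we approximate the distribution of $b$ as follows:
$b \overset{a}{\sim} N(\beta, \sigma^2 \Sigma_{xx}^{-1}/N)$   (*)
Since $\Sigma_{xx}$ can be consistently estimated by $\frac{1}{N}\sum_{i=1}^N x_i x_i'$ (and $\sigma^2$ by $s^2$), this approximate distribution can be estimated as:
$b \overset{a}{\sim} N(\beta, s^2 (\sum_{i=1}^N x_i x_i')^{-1})$
Assumption A2 can be relaxed as follows without affecting the validity of (*):
A8) $x_t$ and $\varepsilon_t$ are independent
This relaxation is relevant in dynamic regression models.
Contrary to assumption A2, assumption A8 does not rule out dependence between $\varepsilon_t$ and, e.g., $x_{t+1}$.
Notice that assumption A8 implies assumption A7 (needed to establish consistency).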
When is the weak exogeneity assumption violated?
Omitted variable bias
Example: wage regression
$\ln w_i = \beta_1 + \beta_2 S_i + \varepsilon_i$   (2)
where $w_i$ is the wage rate and $S_i$ the years of education.
• The error term $\varepsilon_i$ contains, e.g., the unobserved variable 'ability'.
• The assumption of weak exogeneity is: $E(\varepsilon_i | S_i) = 0$
• However, it is likely that 'ability' is positively correlated with both the wage rate ($w_i$) and years of education ($S_i$), implying that $E(\varepsilon_i | S_i) \neq 0$ (exogeneity assumption violated).
• Consequently, the OLS estimates of $\beta_1$ and $\beta_2$ are biased.
One of the main motivations for using panel data is to address the omitted variable bias problem! (see below)
Measurement error in the RHS variables
Attenuation bias (bias towards 0), see Verbeek, chapter 5.
Simultaneity
See Verbeek, chapter 5 (Keynesian model).
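The omitted variable argument can be illustrated with a small simulation. This is only a sketch under an assumed data-generating process: the coefficients (return to education 0.08, ability effect 0.2), the variable names and the helper `slope` are all hypothetical choices made for the illustration.

```python
import random


def slope(x, y):
    """OLS slope of y on x in a regression that includes a constant."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


random.seed(1)
N = 5000
ability = [random.gauss(0, 1) for _ in range(N)]
# Assumed process: years of education rise with (unobserved) ability.
educ = [12 + 2 * a + random.gauss(0, 2) for a in ability]
# Assumed true return to education: 0.08; ability has its own effect of 0.2.
logwage = [1 + 0.08 * s + 0.2 * a + random.gauss(0, 0.3)
           for s, a in zip(educ, ability)]

# Regression of log wage on education only: ability ends up in the error term.
b_short = slope(educ, logwage)
print(b_short)
```

Under these assumptions the probability limit of the short-regression slope is 0.08 + 0.2 * cov(educ, ability) / var(educ) = 0.08 + 0.2 * 2/8 = 0.13, so the estimate overstates the return to education.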
Violation of the strict exogeneity assumption while the assumption of weak exogeneity is satisfied
Example 1: dynamic model
$y_t = \beta_1 + \beta_2 y_{t-1} + \varepsilon_t$   (3)
• We assume that $\varepsilon_t$ is a white noise error term (no autocorrelation) and contemporaneous exogeneity: $E(\varepsilon_t | y_{t-1}) = 0$ (assumption A7 satisfied, OLS yields consistent estimates).
• $y_{t-1}$ is, however, not a strictly exogenous variable. Why not?
1. The assumption of strict exogeneity implies that $\varepsilon_t$ is uncorrelated not only with $y_{t-1}$ but also with, e.g., $y_t$.
2. According to model (3), $y_t$ and $\varepsilon_t$ are related to each other. In other words, $E(y_t \varepsilon_t) \neq 0$, implying that $y_{t-1}$ is not a strictly exogenous variable.
Example 2: models with a feedback mechanism
$y_t = \beta_1 + \beta_2 r_t + \varepsilon_t$   (4)
$y_t$ := GDP growth rate
$r_t$ := interest rate; assumed to be contemporaneously exogenous.
Behavior of Alan Greenspan (feedback mechanism):
$r_{t+1} = \gamma_1 + \gamma_2 y_t + v_{t+1}$   (5)
Equation (5) implies that $r_{t+1}$ depends on $y_t$ and, consequently, on $\varepsilon_t$. In other words, $r_t$ is NOT a strictly exogenous variable in model (4)!
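GLS (Generalized Least Squares):
$y = X\beta + \varepsilon$   (1')
Gauss-Markov assumptions A1 and A2 are still assumed to hold.
However, assumptions A3 and A4 are replaced by:
$V(\varepsilon) = \sigma^2 \Psi$: $\Psi$ is a positive definite matrix, not necessarily diagonal (autocorrelation, heteroskedasticity)
The OLS estimator $b$ is still unbiased but its covariance matrix becomes:
$V(b) = \sigma^2 (X'X)^{-1} X'\Psi X (X'X)^{-1}$
Consequently, the OLS estimator is not BLUE.
Moreover, the routinely computed expression $s^2 (X'X)^{-1}$ is wrong. Inference (t-tests, F-tests) based on this expression is also misleading.
Strategy if $\Psi$ is known: rewrite model (1') in such a way that the Gauss-Markov conditions hold:
$Py = PX\beta + P\varepsilon$   (1'')
where the square matrix P satisfies: $P'P = \Psi^{-1}$
Notice that $V(P\varepsilon) = \sigma^2 P\Psi P' = \sigma^2 I_N$.
$\Psi$ known implies P known, so applying OLS to the transformed model (1'') produces the BLUE estimator for $\beta$. This is the GLS estimator:
$\hat\beta_{GLS} = (X'\Psi^{-1}X)^{-1} X'\Psi^{-1} y$
If $\Psi$ is not known, first obtain a consistent estimate $\hat\Psi$ for $\Psi$. Then the feasible GLS (FGLS) estimator for $\beta$ becomes:
$\hat\beta_{FGLS} = (X'\hat\Psi^{-1}X)^{-1} X'\hat\Psi^{-1} y$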
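The equivalence between GLS and OLS on a transformed model can be checked numerically. The sketch below assumes the simplest possible case: one regressor without a constant and a known diagonal weighting matrix (pure heteroskedasticity), so that the transformation matrix is diagonal with elements psi_i^(-1/2). The function names, the variance pattern psi_i = x_i^2 and the true coefficient 2.0 are assumptions for the illustration.

```python
import random


def ols_no_const(x, y):
    """OLS slope in a regression through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def gls_diag(x, y, psi):
    """GLS for y_i = beta * x_i + eps_i with V(eps) = sigma^2 * diag(psi)."""
    num = sum(a * b / p for a, b, p in zip(x, y, psi))
    den = sum(a * a / p for a, p in zip(x, psi))
    return num / den


random.seed(2)
N = 500
x = [random.uniform(1, 5) for _ in range(N)]
psi = [xi ** 2 for xi in x]  # assumed known: error variance proportional to x^2
y = [2.0 * xi + random.gauss(0, 1) * p ** 0.5 for xi, p in zip(x, psi)]

# Transformed model: divide every observation by sqrt(psi_i), then apply OLS.
x_star = [xi / p ** 0.5 for xi, p in zip(x, psi)]
y_star = [yi / p ** 0.5 for yi, p in zip(y, psi)]
b_transformed = ols_no_const(x_star, y_star)

# Direct GLS formula (X' Psi^{-1} X)^{-1} X' Psi^{-1} y.
b_gls = gls_diag(x, y, psi)
print(b_transformed, b_gls)
```

Both routes give the same number up to rounding error, which is the point of the transformation.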
Advantages of (pseudo-)panel data in comparison with a cross-section
True panel data:
same individuals (households, companies) are
followed over time.
Examples: PSID 1968-....; HRS 1992-...; SEP 1984-....; IPO 1989-...
Pseudo-panel data:
1) ‘year-of-birth’ cohorts are followed over time
2) constructed from a time series of cross-sections by computing
summary statistics for individuals from the same year-of-birth
cohort.
3) British Family Expenditure Survey (FES) 1966-..
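The construction of a pseudo-panel described in item 2) can be sketched as follows: individuals from repeated cross-sections are grouped into year-of-birth cohorts, and a summary statistic (here a mean) is computed per cohort and survey year. The record layout, the five-year cohort width and the toy data are hypothetical.

```python
from collections import defaultdict


def cohort_means(records, cohort_width=5):
    """Average a variable by (year-of-birth cohort, survey year).

    records: iterable of (survey_year, birth_year, value) tuples, where each
    survey year is a separate cross-section with different individuals.
    Cohorts are labelled by their first birth year.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for survey_year, birth_year, value in records:
        cohort = birth_year - birth_year % cohort_width
        cell = sums[(cohort, survey_year)]
        cell[0] += value
        cell[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


# Toy data: home ownership indicator (1 = owner) in two cross-sections.
records = [
    (1990, 1941, 1), (1990, 1943, 0), (1990, 1962, 0),
    (1995, 1940, 1), (1995, 1944, 1), (1995, 1961, 1),
]
means = cohort_means(records)
for key in sorted(means):
    print(key, means[key])
```

Each cohort can then be followed over the survey years, even though no individual is observed twice.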
[Figure 1: Home ownership rate by age in 1996 (Netherlands). Vertical axis: home ownership rate (ticks 10 to 70); horizontal axis: age (20 to 90).]
Example: Do households sell their house when they become old?
• Figure 1 gives no answer to this question: from one cross-section one cannot disentangle cohort effects from age effects.
[Figure 5a: Home ownership rate by age and cohort. Horizontal axis: age (20 to 90); one line per year-of-birth cohort, labelled 13 to 68.]
• One needs pseudo-panel data or true panel data to construct the figure above.
• The figure above suggests strong cohort effects! Cross-section evidence is very misleading!
Advantages of a true panel in comparison with (a time series of) cross-sections
• Estimation of dynamic models (or transition models) is virtually impossible with a time series of cross-sections.
Example: a cross-section study suggests that the female labor force participation rate is equal to 50%.
Two extreme possibilities:
a) the same 50% of women always work (job turnover rate 0%)
b) in a homogeneous population, a 50% turnover rate
Using panel data one can distinguish between spurious state dependence (unobserved heterogeneity) and true state dependence.
• It (partly) solves the 'omitted unobserved variable bias' problem (e.g. the effect of 'ability' on earnings (dependent variable) and education (rhs variable)).
Issues involved in utilizing panel data
Heterogeneity bias
• Suppose the following 'true model':
$y_{it} = x_{it}'\beta + c_i + u_{it}$   (3)
where $c_i$ := individual specific effect (random variable).
• $c_i$ captures all variables which are not observed by the econometrician, e.g. motivation (unobserved heterogeneity).
• It is perfectly possible that $E(c_i | x_{it}) \neq 0$ (e.g. in a wage regression 'motivation' (subsumed in $c_i$) might be correlated with the rhs variable 'experience').
• Suppose that instead one estimates the following model by OLS:
$y_{it} = x_{it}'\beta + v_{it}$   (4)
where $v_{it} = c_i + u_{it}$.
• In model (4) one basically assumes that
$E(c_i | x_{it}) = 0$   (5)
• Violation of this assumption leads to a biased estimate of $\beta$, cf. fig. 1.1 to 1.3 (omitted variable bias).
• Primary motivation for using panel data: solution of the omitted variables problem.
The assumption of strict exogeneity
Consider again the following 'true model':
$y_{it} = x_{it}'\beta + c_i + u_{it}$   (6)
where $c_i$ := individual specific effect (random variable).
• In most estimation procedures, one assumes that $x_{it}$ is strictly exogenous (conditional on the unobserved individual effect):
$E(u_{it} | x_{i1}, \ldots, x_{iT}, c_i) = 0$   (7)
• Again, there are examples of the violation of the strict exogeneity assumption (cf. the Alan Greenspan example).
Example 1: program evaluation (e.g. the effect of job training)
$y_{it} = \theta_t + \beta\, prog_{it} + c_i + u_{it}$   (8)
Question: is $prog_{it}$ an exogenous variable?
• Evaluation panel datasets are collected at two points in time.
• $\theta_t$: time-varying intercept (capturing macro shocks).
• At $t = 1$: $prog_{i1} = 0$ for all $i$.
• At $t = 2$: control group: $prog_{i2} = 0$; treatment group: $prog_{i2} = 1$.
• Standard fixed effects estimation procedures allow for the possibility that $E(c_i | prog_{i2}) \neq 0$ (program participation might depend on ability).
• However, a feedback mechanism may be going on: participation in the second period may depend on the first-period shock,
$prog_{i2} = f(u_{i1}, \ldots)$   (9)
• Equation (9) implies that $prog_{i2}$ and $u_{i1}$ are related to each other. In other words, $prog_{it}$ is not a strictly exogenous variable!
• Standard panel data estimators (fixed effects estimators, dif-in-dif) are not applicable!
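Example 2: dynamic regression model
$y_{it} = \gamma y_{i,t-1} + x_{it}'\beta + c_i + u_{it}$   (9)
where $y_{it}$ is the (log) wage.
• Model (9) addresses the question: how persistent are wages after controlling for unobserved heterogeneity?
• We assume that $x_{it}$ is a contemporaneously exogenous variable.
• Obviously, the lagged wage $y_{i,t-1}$ is logically correlated with $c_i$.
• Moreover, $y_{it}$ is correlated with $u_{it}$, meaning that the lagged wage is NOT a strictly exogenous variable:
$E(u_{it} | y_{i0}, \ldots, y_{i,T-1}, c_i) \neq 0$   (10)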
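Why the failure of strict exogeneity matters for fixed-effects type estimators can be seen in a small simulation of a dynamic panel model without further regressors. This is a sketch under assumed values (persistence 0.5, five usable periods, unit variances); the function name `within_slope` is hypothetical. The estimator applied is the within (time-demeaning) estimator, with the lagged dependent variable as the only regressor.

```python
import random


def within_slope(panel):
    """Within (fixed-effects) slope for one regressor.

    panel: list of (x_list, y_list) pairs, one pair per individual.
    Both series are demeaned per individual before the slope is computed.
    """
    num = den = 0.0
    for xs, ys in panel:
        xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
        num += sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
        den += sum((a - xbar) ** 2 for a in xs)
    return num / den


random.seed(3)
N, T, gamma = 2000, 5, 0.5  # assumed values for the illustration
panel = []
for _ in range(N):
    c = random.gauss(0, 1)
    # Start each individual near his or her long-run mean c / (1 - gamma).
    y = c / (1 - gamma) + random.gauss(0, 1)
    path = [y]
    for _ in range(T):
        y = gamma * y + c + random.gauss(0, 1)
        path.append(y)
    # Regressor: lagged y; dependent variable: current y.
    panel.append((path[:-1], path[1:]))

gamma_within = within_slope(panel)
print(gamma_within)
```

Although N is large, the estimate stays well below the value 0.5 used to generate the data, because the demeaned lagged dependent variable is correlated with the demeaned error term when T is small.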
Sample selectivity bias (attrition bias)
Example: New Jersey negative income tax experiment
Households with earnings > 1.5 times the poverty level were dropped from the sample.
Assume that in the population the following relation between earnings and the exogenous variables holds:
$y_i = x_i'\beta + \varepsilon_i$
Only observations with $y_i \le 1.5 \times$ poverty level are included in the sample, so the OLS estimate of $\beta$ is inconsistent (see Hsiao, figure 1.6).
Attrition bias: individuals drop out of the panel in a non-random way.
How to address attrition bias:
1. Selection bias models (cf. Heckman (1979), Hausman and Wise (1979), chapter 17 of Wooldridge (2002)).
2. Propensity score method (cf. e.g. Hirano, Imbens, Ridder and Rubin (2001)). Refreshment samples required!
Static linear panel data models
Hsiao, chapters 1-3; Verbeek (2004), chapter 10 (10.1; 10.2)
1. Static linear models: a classification
2. Fixed-effects model: formulation and estimation
3. Random effects model: formulation and estimation
4. Fixed effects or random effects?
5. Mundlak’s formulation; Hausman test
6. IV and GMM methods (chapter 5 Verbeek!!)
7. Alternative IV estimator of static panel data models: Hausman Taylor
approach.
Static linear models: a classification
Consider the following equation
$y_{it} = x_{it}'\beta + c_i + u_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T$   (1)
where
- $c_i$ := unobserved individual effect
- $x_{it}$ := (Kx1)-vector of exogenous time-varying regressors assumed to be contemporaneously exogenous: $E(u_{it} | x_{it}, c_i) = 0$.
In order to estimate model (1), one has to make assumptions about the following:
• Is there a correlation between $c_i$ and the rhs vars $x_{it}$?
$E(c_i | x_i) = 0$ or $E(c_i | x_i) \neq 0$   (2)
where $x_i = (x_{i1}', \ldots, x_{iT}')'$.
• Is $x_{it}$ strictly exogenous (conditional on the unobserved individual effect)?
$E(u_{it} | x_{i1}, x_{i2}, \ldots, x_{iT}, c_i) = 0$   (3)
(e.g. no lagged dependent variables, no feedbacks!)
Table 1: estimation methods under different assumptions on strict exogeneity and on correlation between individual effect and rhs vars.

                        E(ci | xi) ≠ 0              E(ci | xi) = 0              some (not all) rhs vars
                                                                                corr with ci
all xit strictly        1) within estimation        RE estimation (GLS)         Hausman-Taylor
exogenous                  procedure;                                           procedure
                        2) first differencing;
                        3) Mundlak procedure

some xit not strictly   IV (GMM)                    1) Pooled OLS (no lagged    IV (GMM)
exogenous                                              dependent var);
                                                    2) IV (GMM) (lagged dep
                                                       var included)
Static fixed-effects models with strictly exogenous regressors
Balanced panel
$y_{it} = x_{it}'\beta + c_i + u_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T$   (4)
where
- $c_i$ := unobserved individual effect. Model (4) allows for correlation between $c_i$ and $x_i = (x_{i1}', \ldots, x_{iT}')'$: $E(c_i | x_i) \neq 0$
- $x_{it}$ := (Kx1)-vector of exogenous time-varying regressors assumed to be strictly exogenous (conditional on the unobserved individual effect):
$E(u_{it} | x_{i1}, x_{i2}, \ldots, x_{iT}, c_i) = 0$   (5)
(e.g. no lagged dependent variables, no feedbacks!)
- $u_{it}$ := error term
Remarks
• The model (partly) solves the omitted unobserved variable bias problem (e.g. the effect of ability on earnings (dependent variable) and education (rhs variable)).
• The parameter vector $\beta$ (cf. equation (4)) can be estimated consistently in two ways:
1. the within estimation procedure
2. take the first difference of equation (4):
$\Delta y_{it} = \Delta x_{it}'\beta + \Delta u_{it}, \quad t = 2, \ldots, T$   (6)
The within estimation procedure
Computation of $\hat\beta_{within}$ requires the following steps:
• First average equation (4) over t = 1,...,T to get the following cross-section equation:
$\bar y_i = \bar x_i'\beta + c_i + \bar u_i, \quad i = 1, \ldots, N$   (7)
where $\bar y_i = T^{-1}\sum_{t=1}^T y_{it}$; $\bar x_i = T^{-1}\sum_{t=1}^T x_{it}$; $\bar u_i = T^{-1}\sum_{t=1}^T u_{it}$
• Subtraction of equation (7) from (4) gives:
$\ddot y_{it} = \ddot x_{it}'\beta + \ddot u_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T$   (8)
where $\ddot y_{it} = y_{it} - \bar y_i$; $\ddot x_{it} = x_{it} - \bar x_i$; $\ddot u_{it} = u_{it} - \bar u_i$
• Like first differencing, 'time demeaning' also removes the individual effect $c_i$ (cf. eq. (8)).
• The within estimator $\hat\beta_{within}$ can be obtained by applying OLS to eq. (8).
• If one assumes that $u_{it}$ is a white noise error term, i.e.:
1. $E(u_{it}) = 0$; $var(u_{it}) = \sigma_u^2$ (homoskedasticity);
2. $u_{it}$ independent over time and across individuals
then one can show that
$\widehat{Avar}(\hat\beta_{within}) = \hat\sigma_u^2 \left(\sum_{i=1}^N \sum_{t=1}^T \ddot x_{it}\ddot x_{it}'\right)^{-1}$   (9)
$\hat\sigma_u^2 = \frac{\sum_{i=1}^N \sum_{t=1}^T \hat u_{it}^2}{N(T-1) - K}$
where $\hat u_{it} = \ddot y_{it} - \ddot x_{it}'\hat\beta_{within}$.
• In other words, running OLS on equation (8) with a standard statistical package gives almost the correct standard errors. However, pay careful attention to the denominator of $\hat\sigma_u^2$: N(T-1)-K instead of NT-K (as used by OLS packages).
• The Stata procedures xtreg (and areg) compute the within estimator and calculate the standard errors correctly.
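The within procedure can be sketched in a few lines for the case of a single regressor (K = 1). The simulated panel, in which the regressor is correlated with the individual effect, and all function names are assumptions made for the illustration; the degrees-of-freedom correction N(T-1)-K is the one discussed above.

```python
import random


def demean(v):
    """Subtract the mean of a list from each of its elements."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def within_estimator(xs_by_i, ys_by_i):
    """Within estimator for one regressor; returns (beta, sigma2_u, se)."""
    xdd = [demean(x) for x in xs_by_i]   # time-demeaned regressor
    ydd = [demean(y) for y in ys_by_i]   # time-demeaned dependent variable
    sxx = sum(a * a for x in xdd for a in x)
    sxy = sum(a * b for x, y in zip(xdd, ydd) for a, b in zip(x, y))
    beta = sxy / sxx
    ssr = sum((b - beta * a) ** 2
              for x, y in zip(xdd, ydd) for a, b in zip(x, y))
    n, t, k = len(xs_by_i), len(xs_by_i[0]), 1
    sigma2_u = ssr / (n * (t - 1) - k)   # N(T-1)-K, not NT-K
    return beta, sigma2_u, (sigma2_u / sxx) ** 0.5


def pooled_slope(xs_by_i, ys_by_i):
    """Pooled OLS slope (with constant), ignoring the individual effect."""
    x = demean([a for xs in xs_by_i for a in xs])
    y = demean([b for ys in ys_by_i for b in ys])
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


random.seed(4)
N, T, beta_true = 1000, 4, 1.0
X, Y = [], []
for _ in range(N):
    c = random.gauss(0, 1)
    x = [c + random.gauss(0, 1) for _ in range(T)]   # x correlated with c
    y = [beta_true * a + c + random.gauss(0, 1) for a in x]
    X.append(x)
    Y.append(y)

b_within, s2_u, se_within = within_estimator(X, Y)
b_pooled = pooled_slope(X, Y)
print(b_within, s2_u, se_within, b_pooled)
```

In this design pooled OLS converges to 1.5 rather than 1.0, while the within estimator recovers the true coefficient because time demeaning removes the individual effect.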
• Suppose that there is autocorrelation and heteroskedasticity in $u_{it}$. In that case, formula (9) is incorrect. One has to compute robust Newey-West standard errors:
$\widehat{Avar}(\hat\beta_{within}) = \hat A^{-1}\hat B\hat A^{-1}$   (10)
where $\hat A = \sum_{i=1}^N \sum_{t=1}^T \ddot x_{it}\ddot x_{it}'$; $\hat B = \sum_{i=1}^N \sum_{t=1}^T \sum_{s=1}^T \hat u_{is}\hat u_{it}\ddot x_{is}\ddot x_{it}'$
• The Stata procedure xtreg computes robust standard errors if one provides the cluster option.
• Estimator for $c_i$: $\hat c_i = \bar y_i - \bar x_i'\hat\beta_{within}$
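For a single regressor the sandwich expression with its double sum over s and t reduces to scalars and can be written out directly. The sketch below is only meant to make the formula concrete; the function name and the tiny hand-made example are hypothetical, and no small-sample correction of the kind a statistical package may apply is included.

```python
def cluster_robust_se(x_dd, u_hat):
    """Sandwich standard error sqrt(A^{-1} B A^{-1}) for one regressor.

    x_dd[i][t]:  time-demeaned regressor of individual i in period t
    u_hat[i][t]: residual of the within regression
    """
    A = sum(x * x for xi in x_dd for x in xi)
    B = 0.0
    for xi, ui in zip(x_dd, u_hat):
        periods = range(len(xi))
        # Double sum over s and t of u_is * u_it * x_is * x_it,
        # which allows arbitrary correlation within an individual.
        B += sum(ui[s] * ui[t] * xi[s] * xi[t]
                 for s in periods for t in periods)
    return (B / (A * A)) ** 0.5


# Two individuals, two periods each (hand-made numbers):
x_dd = [[1.0, -1.0], [2.0, -2.0]]
u_hat = [[0.5, -0.5], [-1.0, 1.0]]
se = cluster_robust_se(x_dd, u_hat)
print(se)
```

Here A = 10 and B = 1^2 + (-4)^2 = 17, since the double sum for each individual equals the square of that individual's sum of x times residual.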
Remarks
1. The parameter vector $\beta$ is identified due to time variation ('within variation') in $x_{it}$.
2. Variables which are constant over time (e.g. gender, year of birth) cannot be included in a fixed effects regression. Their parameters are not identified (subsumed in the fixed effect).
3. The estimators $\hat c_i$ and $\hat\beta_{within}$ are consistent if T (and N) is large.
4. If T is small and N large, $\hat\beta_{within}$ is still consistent; $\hat c_i$, however, is not, because it is based on a small number of observations (T).
5. Why is $\hat\beta_{within}$ a consistent estimator? Consistency of $\hat\beta_{within}$ requires the following assumption (compare assumption A7, needed to establish consistency of the OLS estimator):
$E(\ddot x_{it}'\ddot u_{it}) = E((x_{it} - \bar x_i)'(u_{it} - \bar u_i)) = 0$   (11)
Due to the assumption of strict exogeneity, condition (11) is satisfied.
6. Notice that the assumption of contemporaneous exogeneity is too weak to prove consistency of the fixed effects estimator because it does not exclude correlation between $u_{it}$ and $\bar x_i$.
Estimation by taking first differences
One can consistently estimate the $\beta$-parameters of model (4) by taking first differences:
$\Delta y_{it} = \Delta x_{it}'\beta + \Delta u_{it}, \quad t = 2, \ldots, T$   (12)
and estimating $\beta$ by means of OLS. Denote the resulting estimator as $\hat\beta_{fdif}$.
Remarks
• By taking first differences, the individual effect $c_i$ is swept out of the model.
• Why is $\hat\beta_{fdif}$ a consistent estimator? Consistency of $\hat\beta_{fdif}$ requires the following assumption (compare assumption A7, needed to establish consistency of the OLS estimator):
$E(\Delta x_{it}'\Delta u_{it}) = E((x_{it} - x_{i,t-1})'(u_{it} - u_{i,t-1})) = 0$   (13)
Due to the assumption of strict exogeneity, condition (13) is satisfied.
• Notice that the assumption of contemporaneous exogeneity is too weak to prove consistency of the estimator $\hat\beta_{fdif}$ because it does not exclude correlation between $u_{i,t-1}$ and $x_{it}$.
• If one assumes that $u_{it}$ follows a random walk, i.e.:
1. $u_{it} = u_{i,t-1} + \varepsilon_{it}$
2. $E(\varepsilon_{it}) = 0$; $var(\varepsilon_{it}) = \sigma_\varepsilon^2$ (homoskedasticity);
3. $\varepsilon_{it}$ independent over time and across individuals
then one can show that
$\widehat{Avar}(\hat\beta_{fdif}) = \hat\sigma_\varepsilon^2 \left(\sum_{i=1}^N \sum_{t=2}^T \Delta x_{it}\Delta x_{it}'\right)^{-1}$   (14)
$\hat\sigma_\varepsilon^2 = \frac{\sum_{i=1}^N \sum_{t=2}^T \hat\varepsilon_{it}^2}{N(T-1) - K}$
where $\hat\varepsilon_{it} = \Delta y_{it} - \Delta x_{it}'\hat\beta_{fdif}$.
• In other words, OLS on equation (12) with a standard statistical package gives the correct standard errors if $u_{it}$ follows a random walk.
• Again, one can compute standard errors of $\hat\beta_{fdif}$ which are robust against the presence of autocorrelation and heteroskedasticity:
$\widehat{Avar}(\hat\beta_{fdif}) = \hat A^{-1}\hat B\hat A^{-1}$   (15)
where $\hat A = \sum_{i=1}^N \sum_{t=2}^T \Delta x_{it}\Delta x_{it}'$; $\hat B = \sum_{i=1}^N \sum_{t=2}^T \sum_{s=2}^T \hat\varepsilon_{is}\hat\varepsilon_{it}\Delta x_{is}\Delta x_{it}'$
• The Stata procedure regress computes robust standard errors if one provides the cluster option.
• If model (4) is correctly specified, the within estimation procedure and the 'first-difference estimation procedure' should yield similar estimates for the parameter vector $\beta$.
• Question: Which of the two estimation procedures should one prefer? Answer: This depends on the time series behavior of $u_{it}$: if it is a white noise error term, use the within estimation procedure. If it follows a random walk, use the 'first-difference estimation procedure'.
• If the two procedures yield dramatically different estimates for $\beta$, one can conclude that either
1. for some rhs variables the assumption of strict exogeneity does not hold, or
2. model (4) is incorrectly specified: some important time-varying regressors are missing in model (4).
• In other words, it is useful to compare the results of the two estimation procedures: it provides a check of whether or not model (4) is correctly specified.
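The comparison between the two procedures can be sketched on simulated data for a single regressor. The data-generating process (white noise errors, regressor correlated with the individual effect, true coefficient 1.0) and the function names are assumptions for the illustration. A known special case serves as a check: with T = 2 the within and first-difference estimators are numerically identical.

```python
import random


def first_difference_estimator(xs_by_i, ys_by_i):
    """OLS without constant on first-differenced data, one regressor."""
    num = den = 0.0
    for x, y in zip(xs_by_i, ys_by_i):
        for t in range(1, len(x)):
            dx, dy = x[t] - x[t - 1], y[t] - y[t - 1]
            num += dx * dy
            den += dx * dx
    return num / den


def within_estimator(xs_by_i, ys_by_i):
    """OLS on time-demeaned data, one regressor."""
    num = den = 0.0
    for x, y in zip(xs_by_i, ys_by_i):
        xbar, ybar = sum(x) / len(x), sum(y) / len(y)
        num += sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
        den += sum((a - xbar) ** 2 for a in x)
    return num / den


def simulate(n, t, beta, seed):
    """Balanced panel with an individual effect correlated with x."""
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n):
        c = rng.gauss(0, 1)
        x = [c + rng.gauss(0, 1) for _ in range(t)]
        y = [beta * a + c + rng.gauss(0, 1) for a in x]
        X.append(x)
        Y.append(y)
    return X, Y


X2, Y2 = simulate(n=500, t=2, beta=1.0, seed=5)
X6, Y6 = simulate(n=500, t=6, beta=1.0, seed=5)
b_fd_T2 = first_difference_estimator(X2, Y2)
b_within_T2 = within_estimator(X2, Y2)
b_fd_T6 = first_difference_estimator(X6, Y6)
b_within_T6 = within_estimator(X6, Y6)
print(b_fd_T2, b_within_T2, b_fd_T6, b_within_T6)
```

With more than two periods the two estimates differ in finite samples but, under strict exogeneity and a correct specification, both should be close to the true coefficient.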
An empirical illustration
• Youth sample of the National Longitudinal Survey held in the US.
• 545 full-time working males who have completed their schooling by 1980
and then followed over the period 1980-1987. (It concerns a balanced panel
dataset).
• The males in the sample are young, with an age in 1980 ranging from 17
to 23.
• Consequently, they entered the labor market recently, with an average 3
years of experience in the beginning of the sample period.
• Log wage is the dependent variable.
Sample Stata program
capture log close
clear
#delimit ;
set more off;
cd \ti2003;
log using males2.log, replace text;
/* males2.log */
use males2;
/* 'tsset' the observations: NR = individual identifier, YEAR = time variable */
tsset NR YEAR;
/* fixed effect estimation */
xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, fe i(NR);
est store fixed;
/* store the residuals of the within regression */
predict ee, e;
/* Breusch-Godfrey test on autocorrelation */
xtreg ee l.ee SCHOOL AGE AGE2 UNION MAR BLACK PUB, fe i(NR);
drop ee;
/* fixed effect estimation with robust standard errors */
xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, fe i(NR) cluster(NR);
est store fixed2;
/* first differences (no constant, see the discussion of the results) */
reg d.WAGE d.SCHOOL d.AGE d.AGE2 d.UNION d.MAR d.BLACK d.PUB, noconstant;
predict ee, res;
/* Breusch-Godfrey test on autocorrelation */
reg ee l.ee d.SCHOOL d.AGE d.AGE2 d.UNION d.MAR d.BLACK d.PUB, noconstant;
/* first differences with robust standard errors */
reg d.WAGE d.SCHOOL d.AGE d.AGE2 d.UNION d.MAR d.BLACK d.PUB, noconstant cluster(NR);
/* random effects estimation */
xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, re i(NR);
est store random;
/* hausman test */
hausman fixed random;
/* random effects estimation with robust standard errors */
xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, re i(NR) cluster(NR);
est store random2;
hausman fixed2 random2;
log close;
Results
> xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, fe i(NR);
Fixed-effects regression                 Number of obs      =    4360
Group variable (i): NR                   Number of groups   =     545
R-sq:  within  = 0.1721                  Obs per group: min =       8
       between = 0.1217                                 avg =     8.0
       overall = 0.1430                                 max =       8
                                         F(5,3810)          =  158.45
corr(u_i, Xb)  = 0.0468                  Prob > F           =  0.0000
---------------------------------------------
        WAGE |      Coef.   Std. Err.       t
-------------+-------------------------------
      SCHOOL |  (dropped)
         AGE |   .2029331   .0307714     6.59
        AGE2 |  -.0029448     .00063    -4.67
       UNION |   .0815508   .0193876     4.21
         MAR |   .0554758   .0182796     3.03
       BLACK |  (dropped)
         PUB |   .0397994   .0387418     1.03
       _cons |  -1.565376   .3726142    -4.20
-------------+-------------------------------
     sigma_u |  .36739891
     sigma_e |  .35255949
         rho |  .52060273   (fraction of vari
---------------------------------------------
F test that all u_i=0: F(544,3810) = 7.89; Prob > F = 0.0000
Test for first order autocorrelation
. predict ee,e;
. /* Breusch-Godfrey test on autocorrelation */
> xtreg ee l.ee SCHOOL AGE AGE2 UNION MAR BLACK PUB, fe i(NR);
Fixed-effects (within) regression        Number of obs      =    3815
Group variable (i): NR                   Number of groups   =     545
R-sq:  within  = 0.0046                  Obs per group: min =       7
       between = 0.0031                                 avg =     7.0
       overall = 0.0036                                 max =       7
                                         F(6,3264)          =    2.53
corr(u_i, Xb)  = -0.0457                 Prob > F           =  0.0190
-----------------------------------------------------
          ee |      Coef.   Std. Err.       t    P>|t|
-------------+---------------------------------------
          ee |
         L1. |   .0574774   .0159189     3.61    0.000
         AGE |  -.0478537    .036904    -1.30    0.195
        AGE2 |   .0009327   .0007411     1.26    0.208
       UNION |  -.0132928   .0203034    -0.65    0.513
         MAR |   .0037855   .0189592     0.20    0.842
         PUB |  -.0056418   .0390771    -0.14    0.885
       _cons |   .6104772   .4564699     1.34    0.181
-------------+---------------------------------------
     sigma_u |  .06463637
     sigma_e |  .32616935
         rho |   .0377867   (fraction of variance due
-----------------------------------------------------
• The error term of equation (4) might follow a first order autoregressive process:
$u_{it} = \rho u_{i,t-1} + \varepsilon_{it}$   (16)
where $\varepsilon_{it}$ is a white noise error term.
• If ρ = 0, uit is white noise.
• If ρ = 1, uit is a random walk.
• H0 : ρ = 0 can be tested by performing the regression presented above. We obtain ρ̂ = 0.0575 (a low autocorrelation coefficient). However, H0 : ρ = 0 is rejected!
• Therefore, it is wise to compute standard errors which are robust to the presence of autocorrelation (and heteroskedasticity) (see next page).
Results FE regression with Newey-West standard errors
. xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, fe i(NR) cluster (NR);
Fixed-effects regression                 Number of obs      =    4360
Group variable (i): NR                   Number of groups   =     545
R-sq:  within  = 0.1721                  Obs per group: min =       8
       between = 0.1217                                 avg =     8.0
       overall = 0.1430                                 max =       8
                                         F(5,544)           =   84.33
corr(u_i, Xb)  = 0.0468                  Prob > F           =  0.0000
(Std. Err. adjusted for 545 clusters in NR)
---------------------------------------------
             |               Robust
        WAGE |      Coef.   Std. Err.       t
-------------+-------------------------------
      SCHOOL |  (dropped)
         AGE |   .2029331   .0377546     5.38
        AGE2 |  -.0029448    .000765    -3.85
       UNION |   .0815508   .0229148     3.56
         MAR |   .0554758   .0212522     2.61
       BLACK |  (dropped)
         PUB |   .0397994   .0382693     1.04
       _cons |  -1.565376   .4625294    -3.38
-------------+-------------------------------
     sigma_u |  .36739891
     sigma_e |  .35255949
         rho |  .52060273
---------------------------------------------
• Obviously, the parameter estimates β̂within are not affected by the computation of robust standard errors.
• The robust standard errors are somewhat bigger than the non-robust ones, except for PUB (because there is a slight positive autocorrelation in the error term).
Result first differencing
. tsset NR YEAR;
       panel variable:  NR, 13 to 12548
        time variable:  YEAR, 1980 to 1987
. reg d.WAGE d.SCHOOL d.AGE d.AGE2 d.UNION d.MAR d.BLACK d.PUB,nocon;
      Source |         SS     df          MS      Number of obs =    3815
-------------+------------------------------      F(5, 3810)    =   20.49
       Model |  20.131068      5  4.02621359      Prob > F      =  0.0000
    Residual | 748.481933   3810  .196451951      R-squared     =  0.0262
-------------+------------------------------      Adj R-squared =  0.0249
       Total | 768.613001   3815  .201471298      Root MSE      =  .44323
-----------------------------------------------------
      D.WAGE |      Coef.   Std. Err.       t    P>|t|
-------------+---------------------------------------
      SCHOOL |
         D1. |  (dropped)
         AGE |
         D1. |   .2105551   .0693057     3.04    0.002
        AGE2 |
         D1. |  -.0030015   .0014173    -2.12    0.034
       UNION |
         D1. |   .0425044   .0196675     2.16    0.031
         MAR |
         D1. |   .0396707   .0229176     1.73    0.084
       BLACK |
         D1. |  (dropped)
         PUB |
         D1. |   .0424731    .041014     1.04    0.300
-----------------------------------------------------
• Notice that the variable DAGE = 1 for all individuals. Therefore DAGE is perfectly collinear with the constant term.
• Therefore, drop the constant term using the 'nocon' option.
• The test for autocorrelation indicates the presence of strong negative autocorrelation in the error term ∆uit of equation (6) (ρ̂ = −0.396):
. predict ee,res;
(545 missing values generated)
. /* Breusch-Godfrey test on autocorrelation */
> reg ee l.ee d.AGE d.AGE2 d.UNION d.MAR d.PUB,nocon;
      Source |         SS     df          MS      Number of obs =    3270
-------------+------------------------------      F(6, 3264)    =  121.47
       Model |   104.4708      6  17.4117999      Prob > F      =  0.0000
    Residual | 467.850877   3264  .143336666      R-squared     =  0.1825
-------------+------------------------------      Adj R-squared =  0.1810
       Total | 572.321677   3270   .17502192      Root MSE      =   .3786
-----------------------------------------------------
          ee |      Coef.   Std. Err.       t    P>|t|
-------------+---------------------------------------
          ee |
         L1. |  -.3956867   .0146911   -26.93    0.000
         AGE |
         D1. |  -.0970248    .071408    -1.36    0.174
        AGE2 |
         D1. |   .0018146   .0014331     1.27    0.206
       UNION |
         D1. |   .0164246   .0185744     0.88    0.377
         MAR |
         D1. |   .0092617   .0215252     0.43    0.667
         PUB |
         D1. |   .0075729   .0364785     0.21    0.836
-----------------------------------------------------
Result first differencing with robust standard errors
. reg DWAGE DSCHOOL DAGE DAGE2 DUNION DMAR DBLACK DPUB,nocon cluster(NR);
Regression with robust standard errors   Number of obs      =    3815
                                         F(5, 544)          =   71.35
                                         Prob > F           =  0.0000
                                         R-squared          =  0.0262
                                         Root MSE           =  .44323
Number of clusters (NR) = 545
---------------------------------------------
             |               Robust
       DWAGE |      Coef.   Std. Err.       t
-------------+-------------------------------
        DAGE |   .2105551   .0473998     4.44
       DAGE2 |  -.0030015   .0009448    -3.18
      DUNION |   .0425044   .0220135     1.93
        DMAR |   .0396707   .0242298     1.64
        DPUB |   .0424731   .0355365     1.20
---------------------------------------------
• The robust standard errors are lower for DAGE, DAGE2 and DPUB (negative autocorrelation in ∆uit), but slightly higher for DUNION and DMAR.
• The estimation results of the within regression (xtreg) and the first differencing procedure are rather similar, with one exception: the effect of union status on log(wage) is somewhat different, 0.082 (0.023) versus 0.043 (0.022).
• This difference may be due to
1. omitted variable bias
2. UNION not being a strictly exogenous variable
Static random effects model
$y_{it} = x_{it}'\beta + c_i + u_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T$   (17)
where
- $c_i$ := unobserved individual effect. Contrary to the fixed effects model (4), model (17) assumes that $c_i$ and $x_i = (x_{i1}', \ldots, x_{iT}')'$ are uncorrelated:
$E(c_i | x_i) = 0$   (18)
- $x_{it}$ := (Kx1)-vector of exogenous time-varying regressors assumed to be strictly exogenous (conditional on the unobserved individual effect):
$E(u_{it} | x_{i1}, x_{i2}, \ldots, x_{iT}, c_i) = 0$   (19)
(e.g. no lagged dependent variables, no feedbacks!)
- $u_{it}$ := white noise error term
1. $E(u_{it}) = 0$; $var(u_{it}) = \sigma_u^2$;
2. $u_{it}$ independent over time and across individuals
- $c_i$ := white noise error term
1. $E(c_i) = 0$; $var(c_i) = \sigma_c^2$;
2. $c_i$ independent across individuals
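• Model (17) can be rewritten as follows (stacking the observations of one individual):
$y_i = x_i\beta + e_T c_i + u_i, \quad i = 1, \ldots, N$   (20)
where $y_i = (y_{i1}, \ldots, y_{iT})'$; $u_i = (u_{i1}, \ldots, u_{iT})'$; $x_i = (x_{i1}, \ldots, x_{iT})'$: (TxK)-matrix; $e_T$: (Tx1)-vector of ones.
• Notice that
$V(c_i e_T + u_i) = \sigma_u^2 I_T + \sigma_c^2 e_T e_T' = \sigma_u^2 \Omega$   (21)
where
$\Omega = I_T + \frac{\sigma_c^2}{\sigma_u^2} e_T e_T'$   (22)
• If $\sigma_u^2$ and $\sigma_c^2$ are known, the GLS estimator of $\beta$ is BLUE:
$\hat\beta_{GLS} = \left(\sum_{i=1}^N x_i'\Omega^{-1}x_i\right)^{-1} \sum_{i=1}^N x_i'\Omega^{-1}y_i$   (23)
• However, since in most cases $\sigma_u^2$ and $\sigma_c^2$ are not known, we have to apply feasible GLS (FGLS). For this purpose, we need estimates for $\sigma_u^2$ and $\sigma_c^2$.
• The FGLS estimate for $\beta$ is obtained by taking the following steps:
1. Obtain the estimate $\hat\sigma_u^2$ by running the 'within' regression:
$\ddot y_{it} = \ddot x_{it}'\beta + \ddot u_{it}$   (24)
Resulting estimator:
$\hat\sigma_u^2 = \frac{\sum_{i=1}^N \sum_{t=1}^T \hat u_{it}^2}{N(T-1) - K}$
where $\hat u_{it} = \ddot y_{it} - \ddot x_{it}'\hat\beta_{within}$.
2. Given $\hat\sigma_u^2$ (see step 1), one can obtain the estimate $\hat\sigma_c^2$ by performing the following 'between' regression:
$\bar y_i = \bar x_i'\beta + \zeta_i$   (25)
where $\zeta_i = c_i + \bar u_i$; $\sigma_\zeta^2 = \sigma_c^2 + \frac{\sigma_u^2}{T}$. Obviously, $\hat\sigma_c^2 = \hat\sigma_\zeta^2 - \frac{\hat\sigma_u^2}{T}$.
3. Substitute $\hat\sigma_u^2$ (cf. step 1) and $\hat\sigma_c^2$ into eq. (22) to obtain $\hat\Omega$.
4. Perform FGLS to obtain an estimate for $\beta$:
$\hat\beta_{FGLS} = \left(\sum_{i=1}^N x_i'\hat\Omega^{-1}x_i\right)^{-1} \sum_{i=1}^N x_i'\hat\Omega^{-1}y_i$   (26)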
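A short worked derivation may help to see what the GLS transformation does in the random effects model. It is a standard result rather than part of the slides: the symbol $\theta$ and the name 'quasi-demeaning' are introduced here for the illustration, and $\Omega = I_T + (\sigma_c^2/\sigma_u^2) e_T e_T'$ with $e_T$ a (Tx1)-vector of ones.

```latex
% Inverse of Omega (use e_T' e_T = T):
\Omega^{-1} = I_T - \frac{\sigma_c^2}{\sigma_u^2 + T\sigma_c^2}\, e_T e_T'

% A matrix P with P'P = \Omega^{-1}:
P = I_T - \frac{\theta}{T}\, e_T e_T' ,
\qquad
\theta = 1 - \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T\sigma_c^2}}

% Check: P is symmetric and
P^2 = I_T - \frac{2\theta - \theta^2}{T}\, e_T e_T' ,
\qquad
2\theta - \theta^2 = 1 - (1-\theta)^2 = \frac{T\sigma_c^2}{\sigma_u^2 + T\sigma_c^2}

% Hence GLS equals OLS on the quasi-demeaned data:
y_{it} - \theta \bar y_i = (x_{it} - \theta \bar x_i)'\beta + (v_{it} - \theta \bar v_i),
\qquad v_{it} = c_i + u_{it}
```

For $\sigma_c^2 = 0$ one gets $\theta = 0$ (pooled OLS), and for $T\sigma_c^2 \to \infty$ one gets $\theta \to 1$ (the within transformation), so the random effects estimator lies between the two.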
Remarks
• The parameter vector $\beta$ in model (17) can be estimated consistently by applying the within estimation method (i.e. applying OLS to equation (24)). This fixed-effects estimator of $\beta$ does not exploit the following information embodied in model (17): $E(c_i | x_i) = 0$. Consequently, the fixed-effects estimator of $\beta$ is inefficient.
• Contrary to the fixed effects model, variables which are constant over time (e.g. gender, year of birth) can be included in a random effects regression. Their parameters are identified.
• If T or N is large, $\beta$ is estimated consistently by means of FGLS.
• In program packages like Stata, random-effects estimation and fixed-effects estimation are standard options.
• If one makes the additional assumptions that
1. $c_i \sim NID(0, \sigma_c^2)$,
2. $u_{it} \sim NID(0, \sigma_u^2)$,
model (17) can be estimated by means of maximum likelihood (see Hsiao (1986), pages 38-40).
• The parameter vector $\beta$ in model (17) can also be estimated consistently by using OLS because assumptions (18) and (19) imply that
$E(x_{it}(c_i + u_{it})) = 0$   (27)
This is a sufficient condition to prove consistency of the OLS estimator.
• Condition (27) is satisfied if
1. $x_{it}$ is contemporaneously exogenous: $E(u_{it}x_{it}) = 0$. This is a much weaker condition than strict exogeneity!
2. $x_{it}$ is uncorrelated with $c_i$.
Notice that lagged endogenous variables such as $y_{i,t-1}$ do not satisfy condition 2.
• The composite error term $c_i + u_{it}$ is autocorrelated: $cov(c_i + u_{it}, c_i + u_{i\tau}) = \sigma_c^2 \neq 0$ for $t \neq \tau$. Consequently, standard errors produced by standard OLS routines are incorrect.
• Instead, one should calculate the robust Newey-West var-cov matrix:
$\widehat{Avar}(\hat\beta_{OLS}) = \hat A^{-1}\hat B\hat A^{-1}$   (28)
where $\hat A = \sum_{i=1}^N \sum_{t=1}^T x_{it}x_{it}'$; $\hat B = \sum_{i=1}^N \sum_{t=1}^T \sum_{s=1}^T \hat u_{is}\hat u_{it}x_{is}x_{it}'$
• The Stata procedure regress computes robust standard errors if one provides the cluster option.
Fixed or random effects
empirical illustration: random effect regression
. xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, re i(NR);

Random-effects GLS regression              Number of obs      =      4360
Group variable (i): NR                     Number of groups   =       545

R-sq:  within  = 0.1715                    Obs per group: min =         8
       between = 0.1869                                   avg =       8.0
       overall = 0.1798                                   max =         8

             |         Random effects          |     Fixed effects
        WAGE |      Coef.   Std. Err.      z   |      Coef.   Std. Err.
-------------+---------------------------------+------------------------
      SCHOOL |   .0487195   .0086393     5.64  |  (dropped)
         AGE |   .1984999   .0306503     6.48  |   .2029331   .0307714
        AGE2 |  -.0028933   .0006279    -4.61  |  -.0029448     .00063
       UNION |   .1078025   .0179103     6.02  |   .0815508   .0193876
         MAR |   .0707689    .016747     4.23  |   .0554758   .0182796
       BLACK |  -.1456282   .0470085    -3.10  |  (dropped)
         PUB |   .0352122   .0365596     0.96  |   .0397994   .0387418
       _cons |  -2.057903   .3809207    -5.40  |  -1.565376   .3726142
-------------+---------------------------------+------------------------
     sigma_u |  .32514288                      |  .36739891
     sigma_e |  .35255949                      |  .35255949
------------------------------------------------------------------------
• Disadvantage of the fixed-effects model: the coefficients of SCHOOL and
BLACK are not identified because these variables do not show time variation.
• In the random-effects model the coefficients of SCHOOL and BLACK are identified because
that model makes the additional assumption: E (ci | xi ) = 0.
• Disadvantage of the RE model: in the regression above, it has been assumed that all
regressors (incl. SCHOOL) are orthogonal to ci . This is an implausible
assumption.
• Comparing the FE and RE results by means of eye-balling suggests that the
results do not differ a lot. Eye-balling may, however, lead to misleading conclusions.
• Hausman (1978) has proposed a formal alternative to eye-balling for testing H0 : E (ci | xi ) = 0
against H1 : E (ci | xi ) ≠ 0, the so-called Hausman test.
• General idea: compare two estimators for β:
1. The first estimator is consistent and efficient under H0, but inconsistent under H1. In this case:
the first estimator is β̂RE .
2. The second estimator is consistent under both H0 and H1 . In this case: the second
estimator is β̂FE .
• If two estimates differ significantly, one should reject H0 . Let us consider:
β̂FE − β̂RE . It can be shown that under H0 : V (β̂FE − β̂RE ) = V (β̂FE ) − V (β̂RE )
• Hausman test statistic:
$$\xi_h = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \left[ V(\hat{\beta}_{FE}) - V(\hat{\beta}_{RE}) \right]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE}) \qquad (29)$$
• In the empirical illustration the Hausman test statistic indicates rejection of H0 :
xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, fe i(NR) cluster(NR);
est store fixed2;
xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, re i(NR) cluster(NR);
est store random2;
hausman fixed2 random2;

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     fixed2      random2       Difference          S.E.
-------------+----------------------------------------------------------------
         AGE |   .2029331     .1984999        .0044332        .0064048
        AGE2 |  -.0029448    -.0028933       -.0000515        .0001236
       UNION |   .0815508     .1078025       -.0262518        .0092935
         MAR |   .0554758     .0707689       -.0152931        .0093238
         PUB |   .0397994     .0352122        .0045872         .017353
------------------------------------------------------------------------------
              b = consistent under Ho and Ha; obtained from xtreg
              B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                 chi2(5) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                         =    27.62
               Prob>chi2 =   0.0000
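To see which coefficients drive the rejection, the differences and standard errors reported in the hausman output can be turned into one-coefficient contrasts. The sketch below is a rough illustration in Python, not part of the Stata session: it applies the scalar version of the Hausman statistic to each row separately. The joint statistic of 27.62 cannot be reproduced this way, because it needs the full matrix V_b − V_B and only its diagonal is printed.

```python
import math

# (b - B) and sqrt(diag(V_b - V_B)) copied from the hausman output
table = {
    "AGE":   (0.0044332, 0.0064048),
    "AGE2":  (-0.0000515, 0.0001236),
    "UNION": (-0.0262518, 0.0092935),
    "MAR":   (-0.0152931, 0.0093238),
    "PUB":   (0.0045872, 0.017353),
}

def one_coef_hausman(diff, se):
    """Return (chi2(1) statistic, p-value) for a single coefficient contrast."""
    stat = (diff / se) ** 2
    # For one degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

results = {name: one_coef_hausman(d, s) for name, (d, s) in table.items()}
for name, (stat, p) in results.items():
    print(f"{name:6s} chi2(1) = {stat:6.2f}   p = {p:.4f}")
```

Under these assumptions the UNION contrast is by far the largest (a statistic of roughly 8), while the AGE, AGE2 and PUB contrasts are small.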
• I conclude that fixed effects estimation should be preferred.
• However, in many cases one likes to know more about the nature of the
individual fixed effect.
• Alternative to RE estimation: OLS with robust standard errors:
. reg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, cluster(NR);

Regression with robust standard errors                 Number of obs =    4360
                                                       F(  7,   544) =   57.39
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1855
Number of clusters (NR) = 545                          Root MSE      =  .48106

------------------------------------------------------------------------------
             |               Robust
        WAGE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      SCHOOL |   .0522904   .0081987     6.38   0.000     .0361855    .0683953
         AGE |   .1817776   .0404779     4.49   0.000     .1022656    .2612897
        AGE2 |  -.0027101   .0008226    -3.29   0.001    -.0043259   -.0010943
       UNION |   .1829544   .0275705     6.64   0.000     .1287967    .2371121
         MAR |   .1097745    .025984     4.22   0.000     .0587332    .1608158
       BLACK |  -.1462183   .0490136    -2.98   0.003    -.2424973   -.0499392
         PUB |   .0077676   .0499577     0.16   0.876    -.0903661    .1059013
       _cons |  -1.837542   .4901171    -3.75   0.000    -2.800295   -.8747878
------------------------------------------------------------------------------
• Notice that the OLS UNION coefficient (0.18) differs dramatically from
the corresponding RE (GLS) regression coefficient (0.11).
• If the RE model is correctly specified, then the OLS results and the RE results should
be similar.
• Our findings again suggest that the RE model is incorrectly specified. The FE
model is to be preferred.
Mundlak’s formulation
In order to allow for possible correlations between the explanatory variables
and the individual effects, Mundlak (1978) proposes to estimate the following
model:
$$y_{it} = x_{it}' \beta + \bar{x}_i' \gamma + \omega_i + u_{it} \qquad (30)$$
where
- x̄i := (1/T) Σt xit , the individual-specific time average of the regressors
- ωi := unobserved individual effect assumed to be uncorrelated with xi =
(x'i1 , . . . , x'iT )':
$$E(\omega_i \mid x_i) = 0 \qquad (31)$$
- xit := (Kx1)-vector of time-varying regressors assumed to be strictly
exogenous (conditional on the unobserved individual effect):
$$E(u_{it} \mid x_{i1}, x_{i2}, \ldots, x_{iT}, c_i) = 0 \qquad (32)$$
Remarks
• In eq. (30), the individual effect ci has been specified as follows:
$$c_i = \bar{x}_i' \gamma + \omega_i \qquad (33)$$
Clearly, γ = 0 implies that the explanatory variables xit are uncorrelated with the unobserved individual effect ci .
• In most cases, one should NOT attach any economic interpretation to the
coefficient vector γ.
• Eq (30) can be estimated by e.g. using the random effects routine (xtreg)
in STATA.
• Mundlak (1978) has proven that the fixed-effects (within) estimate for β is
the same as the one obtained from equation (30)!
• This equivalence holds only for linear models; in nonlinear models
(e.g. probit) Mundlak's approach mostly does not lead to the same estimates.
• Chamberlain (1982) has proposed an alternative way to model the relation
between ci and xi :
$$c_i = \sum_{t=1}^{T} x_{it}' \gamma_t + \omega_i \qquad (34)$$
• In the linear model the approaches of Chamberlain and Mundlak lead to the
same results. This is not true for nonlinear models. In that case, equation
(34) allows for more flexibility than equation (33).
• The test H0 : E (ci | xi ) = 0 boils down to testing the hypothesis: H0 : γ =
0. This is a Wald test (F-test) after having estimated eq. (30)
• This Wald test is asymptotically equivalent to the Hausman test described
above.
• One can add to equation (30) time invariant regressors like gender. The
corresponding coefficient is identified under the assumption that these time
invariant regressors are orthogonal to ci .
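The equivalence between the within estimator and the Mundlak regression can be checked numerically. The following Python sketch uses a small made-up balanced panel with one regressor. As a simplifying assumption it estimates eq. (30) by pooled OLS rather than by the random-effects routine; on a balanced panel the coefficient on xit coincides with the within estimate in either case. The helper names (within_beta, mundlak_ols, solve) are hypothetical.

```python
# Hypothetical balanced panel: 3 individuals, 4 periods, one regressor.
x = [[1.0, 2.0, 4.0, 5.0],
     [2.0, 3.0, 3.0, 6.0],
     [0.0, 1.0, 5.0, 2.0]]
y = [[2.1, 2.9, 5.2, 5.8],
     [4.0, 5.1, 4.9, 8.3],
     [0.5, 1.2, 5.1, 2.4]]

def mean(values):
    return sum(values) / len(values)

def within_beta(x, y):
    """Within (fixed-effects) estimator: OLS on individual-demeaned data."""
    num = den = 0.0
    for xi, yi in zip(x, y):
        xbar, ybar = mean(xi), mean(yi)
        for xit, yit in zip(xi, yi):
            num += (xit - xbar) * (yit - ybar)
            den += (xit - xbar) ** 2
    return num / den

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in reversed(range(n)):
        s = m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = s / m[r][r]
    return sol

def mundlak_ols(x, y):
    """Pooled OLS of y_it on a constant, x_it and the individual mean of x."""
    rows, ys = [], []
    for xi, yi in zip(x, y):
        xbar = mean(xi)
        for xit, yit in zip(xi, yi):
            rows.append([1.0, xit, xbar])
            ys.append(yit)
    k = 3
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yv for r, yv in zip(rows, ys)) for a in range(k)]
    return solve(xtx, xty)

b_fe = within_beta(x, y)
const, b_mundlak, gamma = mundlak_ols(x, y)
print(b_fe, b_mundlak, gamma)
```

The first two printed numbers agree up to rounding error. The third is the estimate of γ, which is the quantity tested in H0 : γ = 0.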
. xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB MAGE MAGE2 MUNION MMAR MPUB, re i(NR);

Random-effects GLS regression                   Number of obs      =      4360
Group variable (i): NR                          Number of groups   =       545

R-sq:  within  = 0.1721                         Obs per group: min =         8
       between = 0.2161                                        avg =       8.0
       overall = 0.1957                                        max =         8

Random effects u_i ~ Gaussian                   Wald chi2(12)      =    940.26
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        WAGE |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      SCHOOL |    .062819   .0100176     6.27   0.000     .0431849    .0824532
         AGE |   .2029331   .0307714     6.59   0.000     .1426222     .263244
        AGE2 |  -.0029448     .00063    -4.67   0.000    -.0041795   -.0017101
       UNION |   .0815508   .0193876     4.21   0.000     .0435517    .1195498
         MAR |   .0554758   .0182796     3.03   0.002     .0196484    .0913032
       BLACK |   -.144793   .0482537    -3.00   0.003    -.2393686   -.0502175
         PUB |   .0397994   .0387418     1.03   0.304    -.0361331    .1157319
        MAGE |  -.0580658   .3031591    -0.19   0.848    -.6522468    .5361152
       MAGE2 |   .0005257   .0062603     0.08   0.933    -.0117443    .0127958
      MUNION |   .1811375   .0504184     3.59   0.000     .0823192    .2799558
        MMAR |   .0857142   .0452574     1.89   0.058    -.0029886    .1744171
        MPUB |  -.0971983   .1158267    -0.84   0.401    -.3242144    .1298179
       _cons |   -1.26983    3.59163    -0.35   0.724    -8.309297    5.769636
------------------------------------------------------------------------------

. test MAGE MAGE2 MUNION MMAR MPUB;

           chi2(  5) =   22.84
         Prob > chi2 =    0.0004
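The reported p-values can be checked by hand. The Python sketch below uses the closed-form upper-tail probability of a chi-square distribution with 5 degrees of freedom; the function name chi2_sf_df5 is hypothetical, and the formula is specific to 5 degrees of freedom.

```python
import math

def chi2_sf_df5(x):
    """Upper-tail probability P(chi2_5 > x); closed form valid for 5 d.o.f. only."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * (1.0 + x / 3.0))

p_wald = chi2_sf_df5(22.84)     # Wald test of gamma = 0 in the Mundlak regression
p_hausman = chi2_sf_df5(27.62)  # Hausman statistic from the FE/RE comparison
print(f"{p_wald:.4f} {p_hausman:.4f}")
```

Both statistics have 5 degrees of freedom, and both p-values are far below conventional significance levels, in line with the asymptotic equivalence of the two tests.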
• Notice that for the time-varying regressors (AGE etc.) the Mundlak estimation method and the FE method (see the earlier slides) lead to
exactly the same results.
• In the Mundlak model we allow for correlation between the time varying
regressors and the individual effect.
• This is not true for the time invariant regressors BLACK and SCHOOL.
We still make the implausible assumption that these two variables and ci
(which includes ’ability’) are uncorrelated.