Linear panel data models
Rob Alessie
email: [email protected]
web page: http://members.chello.nl/~r.j.m.alessie

Literature (panel data):
1. Verbeek (2004), A Guide to Modern Econometrics, Wiley. Chapters (2, 4), 5, (6), 10.
2. Wooldridge, J.M. (2002), Econometric Analysis of Cross Section and Panel Data, MIT Press, ISBN 0-262-23219-7.
3. Hsiao (1986), chapters 1 to 3 (cf. the reader).
4. Some articles.
Slides can be downloaded from my web page (see above).

Lecture 1: Static linear panel data models
Hsiao, chapters 1 to 3; Verbeek (2004), chapter 10 (10.1; 10.2)
1) Prerequisites for the course: OLS and GLS
2) Panels, pseudo panels (time series of cross-sections)
3) Issues involved in utilizing panel data
   • Heterogeneity bias
   • Strict exogeneity assumption
   • Selectivity bias
   • Attrition bias
4) Fixed-effects model: formulation and estimation
5) Random effects model: formulation and estimation
6) Fixed effects or random effects
7) Mundlak's formulation

Prerequisites: OLS
Consider the following population model:
$y_i = x_i'\beta + \varepsilon_i$   (1)
where
$y_i$ := dependent variable (regressand)
$x_i$ := (Kx1)-vector of explanatory variables (incl. constant term)
$\varepsilon_i$ := disturbance term (error term)
$y_i$, $x_i$ and $\varepsilon_i$ are random variables (or vectors of random variables).
In many (old-fashioned) econometric textbooks it is assumed that $x_i$ is 'fixed' (and not a vector of random variables). The assumption of 'fixed' regressors is reasonable only in controlled experiments.
Controlled experiments are barely possible in the social sciences.

Stacking the N observations gives:
$y = X\beta + \varepsilon$   (1')
where $y$ and $\varepsilon$ are (Nx1)-vectors and $X$ is an (NxK)-matrix (assumed to be of full rank).

Gauss-Markov assumptions (for the numbering of the assumptions, see Verbeek):
A1) $E(\varepsilon_i) = 0$
A2) $\{\varepsilon_1, \dots, \varepsilon_N\}$ and $\{x_1, \dots, x_N\}$ are independent: all explanatory variables are strictly exogenous
A3) $V(\varepsilon_i) = \sigma^2$: the variance of the error term is constant over the observations and finite (homoskedasticity)
A4) $cov(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$: the error terms of different observations are not related to each other (no autocorrelation)

Comments on assumption A2 (strict exogeneity):
1) $\{\varepsilon_1, \dots, \varepsilon_N\}$ and $\{x_1, \dots, x_N\}$ are uncorrelated, meaning for instance that $E(\varepsilon_i x_j) = 0$ for all $i$ and $j$, including $j \neq i$!
2) A weaker alternative is
   A7) $E(\varepsilon_i \mid x_i) = 0$
   The assumption of weak exogeneity (contemporaneous exogeneity) is clearly weaker than strict exogeneity.
• In cross-section data derived from a random sample, the difference between the two assumptions (A2 and A7) is not relevant.
• However, this is not true for time series and panel datasets (see below).
• The assumption of weak exogeneity (A7) (and obviously strict exogeneity) implies that $E(y_i \mid x_i) = x_i'\beta$. In that case, causal inference is possible.

Estimator: a rule for using the data to estimate $\beta$
OLS estimator: $b = (X'X)^{-1}X'y$

Properties of the OLS estimator given the Gauss-Markov assumptions
1) The estimator is unbiased: $E(b) = \beta$
2) Variance-covariance matrix of $b$: $V(b) = \sigma^2 (X'X)^{-1}$
3) The OLS estimator $b$ is the best linear unbiased estimator (BLUE): let $\tilde{b} = Ay$ be a linear unbiased estimator, where $A$ is a (KxN)-matrix. BLUE means that the difference between the covariance matrices, $V(\tilde{b}) - V(b)$, is positive semi-definite.

Unbiased estimator for $\sigma^2$ (see Johnston et al.): $s^2 = e'e/(N-K)$, where $e = y - Xb$ is the (Nx1)-vector of residuals.
Given $s^2$, $V(b)$ can be estimated by $\hat{V}(b) = s^2 (X'X)^{-1}$.
Goodness-of-fit measure: $R^2$ (see Johnston and Dinardo and Hey et al.).

• For exact statistical inference, the following distributional assumption is normally made:
A5) $\varepsilon \sim N(0, \sigma^2 I_N)$
Given assumption A5 (instead of A1, A3 and A4), one can derive t-tests and F-tests on the (joint) significance of coefficients (see Johnston and Dinardo).

Asymptotic properties of the OLS estimator

A. Consistency
We know that, given the Gauss-Markov assumptions A1),...,A4):
1) $E(b) = \beta$
2) $V(b) = \sigma^2 (X'X)^{-1}$
However, the distribution of $b$ is not known (unless we assume A5).
Chebycheff inequality: for the OLS estimator $b$ this implies that each k-th element satisfies, for all $\delta > 0$:
$P(|b_k - \beta_k| > \delta) < V(b_k)/\delta^2 = \sigma^2 c_{kk}/\delta^2$
where $c_{kk}$ is the (k,k)-element of $(X'X)^{-1}$. If N becomes large, $X'X = \sum_{i=1}^N x_i x_i'$ becomes large and, consequently, $c_{kk}$ small. If we assume that
A6) $\frac{1}{N}X'X$, or equivalently $\frac{1}{N}\sum_{i=1}^N x_i x_i'$, converges to a finite nonsingular matrix $\Sigma_{xx}$,
then the OLS estimator $b$ is consistent: $plim\ b = \beta$.

Slutsky's theorem on probability limits: if $plim\ b = \beta$ and g(.) is a continuous function, it also holds that $plim\ g(b) = g(\beta)$.
• OLS is consistent under weaker conditions than A1,...,A4 and A6. Write $b = \beta + (\frac{1}{N}\sum_i x_i x_i')^{-1} \frac{1}{N}\sum_i x_i \varepsilon_i$. Given assumption A6, Slutsky's theorem and the law of large numbers:
$plim\ b = \beta + \Sigma_{xx}^{-1} E(x_i \varepsilon_i)$
This implies that the OLS estimator $b$ is consistent if $E(x_i \varepsilon_i) = 0$.
• Notice that assumption A7 ($E(\varepsilon_i \mid x_i) = 0$) implies $E(x_i \varepsilon_i) = 0$. In other words, one only needs the weaker assumptions A6 and A7 to prove the consistency of the OLS estimator, instead of the Gauss-Markov assumptions A1,...,A4.

B. Asymptotic normality
Under the Gauss-Markov assumptions A1,...,A4 and assumption A6, one can show that
$\sqrt{N}(b - \beta) \rightarrow N(0, \sigma^2 \Sigma_{xx}^{-1})$
In practice, where we have finite samples, we approximate the distribution of $b$ as follows:
$b \overset{a}{\sim} N(\beta, \sigma^2 \Sigma_{xx}^{-1}/N)$   (*)
Since $\sigma^2$ can be consistently estimated by $s^2$ and $\Sigma_{xx}$ by $\frac{1}{N}X'X$, this approximate distribution can be estimated as:
$b \overset{a}{\sim} N(\beta, s^2 (X'X)^{-1})$
Assumption A2 can be relaxed as follows without affecting the validity of (*):
A8) $x_i$ and $\varepsilon_i$ are independent
This relaxation is relevant in dynamic regression models. Contrary to assumption A2, assumption A8 does not rule out dependence between $\varepsilon_i$ and $x_j$, $j \neq i$. Notice that assumption A8 implies assumption A7 (needed to establish consistency).

When is the weak exogeneity assumption violated?

Omitted variable bias
Example: wage regression (2), in which the wage rate is explained by a constant and years of education.
• The error term contains the unobserved variable 'ability'.
• The assumption of weak exogeneity is that the error term has conditional mean zero given years of education.
• However, it is likely that 'ability' is positively correlated with both the wage rate and years of education, implying that the error term is correlated with years of education (exogeneity assumption violated).
• Consequently, the OLS estimates of the coefficients of model (2) are biased.
One of the main motivations for using panel data is to address the omitted variable bias problem!!!
(see below)

Measurement error in the rhs variables
Attenuation bias (bias towards 0), see Verbeek, chapter 5.

Simultaneity
See Verbeek, chapter 5 (Keynesian model).

Violation of the strict exogeneity assumption while the weak exogeneity assumption is satisfied

Example 1: dynamic model (3), with the lagged dependent variable $y_{t-1}$ among the regressors
• We assume that $\varepsilon_t$ is a white noise error term (no autocorrelation) and that contemporaneous exogeneity holds (assumption A7 is satisfied, so OLS yields consistent estimates).
• $y_{t-1}$ is, however, not a strictly exogenous variable. Why not?
  1. Strict exogeneity implies that $\varepsilon_t$ is uncorrelated not only with $y_{t-1}$ but also with, e.g., $y_t$.
  2. According to model (3), $y_t$ and $\varepsilon_t$ are related to each other. In other words, $E(y_t \varepsilon_t) \neq 0$, implying that $y_{t-1}$ is not a strictly exogenous variable.

Example 2: models with a feedback mechanism
Model (4) explains the GDP growth rate by the interest rate, which is assumed to be contemporaneously exogenous.
Behavior of Alan Greenspan (feedback mechanism, equation (5)): the interest rate is set in response to earlier GDP growth.
Equation (5) implies that the future interest rate depends on current GDP growth and, consequently, on the current error term. In other words, the interest rate is NOT a strictly exogenous variable in model (4)!

GLS (Generalized Least Squares)
$y = X\beta + \varepsilon$   (1')
The Gauss-Markov assumptions A1 and A2 are still assumed to hold. However, assumptions A3 and A4 are replaced by:
$V(\varepsilon \mid X) = \sigma^2 \Psi$: $\Psi$ is a positive (semi-)definite matrix, not necessarily diagonal (autocorrelation, heteroskedasticity)
The OLS estimator $b$ is still unbiased, but the covariance matrix becomes:
$V(b) = \sigma^2 (X'X)^{-1} X'\Psi X (X'X)^{-1}$
Consequently, the OLS estimator is not BLUE. Moreover, the routinely computed expression $s^2 (X'X)^{-1}$ is wrong. Inference (t-tests, F-tests) based on this expression is also misleading.
Strategy if $\Psi$ is known: rewrite model (1') in such a way that the Gauss-Markov conditions hold:
$Py = PX\beta + P\varepsilon$   (1'')
where the square matrix $P$ satisfies $P'P = \Psi^{-1}$.
Notice that $V(P\varepsilon) = \sigma^2 P\Psi P' = \sigma^2 I_N$ with $P$ known, so applying OLS to the transformed model (1'') produces the BLUE estimator for $\beta$. This is the GLS estimator:
$\hat{\beta}_{GLS} = (X'\Psi^{-1}X)^{-1} X'\Psi^{-1}y$
If $\Psi$ is not known, first obtain a consistent estimate $\hat{\Psi}$ for $\Psi$. Then the Feasible GLS (FGLS) estimator for $\beta$ becomes:
$\hat{\beta}_{FGLS} = (X'\hat{\Psi}^{-1}X)^{-1} X'\hat{\Psi}^{-1}y$

Advantages of (pseudo-)panel data in comparison with a cross-section
True panel data: the same individuals (households, companies) are followed over time.
Examples: PSID 1968-....; HRS 1992-...; SEP 1984-....; IPO 1989-...
Pseudo-panel data:
1) 'year-of-birth' cohorts are followed over time
2) constructed from a time series of cross-sections by computing summary statistics for individuals from the same year-of-birth cohort
3) example: British Family Expenditure Survey (FES) 1966-..

[Figure 1: Home ownership rate by age in 1996 (Netherlands). Vertical axis: home ownership rate; horizontal axis: age.]

Example: Do households sell the house when they become old?
• Figure 1 gives no answer to this question: from one cross-section one cannot disentangle cohort effects from age effects.

[Figure 5a: Home ownership rate by age and cohort. Horizontal axis: age.]

• One needs pseudo-panel data or true panel data to construct the figure above.
• The figure above suggests strong cohort effects! Cross-section evidence is very misleading!

Advantages of a true panel in comparison with (a time series of) cross-sections
• Estimation of dynamic models (or transition models) is virtually impossible with a time series of cross-sections.
  Example: a cross-section study suggests that the female labor force participation rate equals 50%. Two extreme possibilities:
  a) 50% of the women always work (job turnover rate 0%)
  b) in a homogeneous population: a 50% turnover rate
  Using panel data one can distinguish between spurious state dependence (unobserved heterogeneity) and true state dependence.
• It (partly) solves the 'omitted unobserved variable bias' problem (the effect of 'ability' on earnings (dependent variable) and education (rhs variable)).

Issues involved in utilizing panel data

Heterogeneity bias
• Suppose the following 'true model':
  $y_{it} = x_{it}'\beta + c_i + u_{it}$   (3)
  where $c_i$ := individual specific effect (random variable).
• $c_i$ captures all variables which are not observed by the econometrician, e.g. motivation (unobserved heterogeneity).
• It is perfectly possible that $E(c_i \mid x_{it}) \neq 0$ (e.g. in a wage regression 'motivation' (subsumed in $c_i$) might be correlated with the rhs variable 'experience').
• Suppose that instead one estimates the following model by OLS:
  $y_{it} = x_{it}'\beta + v_{it}$   (4)
  where $v_{it} = c_i + u_{it}$.
• In model (4) one basically assumes that
  $E(c_i \mid x_{it}) = 0$   (5)
• Violation of this assumption leads to a biased estimate of $\beta$, cf. fig. 1.1 to 1.3 (omitted variable bias).
• Primary motivation for using panel data: a solution to the omitted variables problem.

The assumption of strict exogeneity
Consider again the following 'true model':
$y_{it} = x_{it}'\beta + c_i + u_{it}$   (6)
where $c_i$ := individual specific effect (random variable).
• In most estimation procedures, one assumes that $x_{it}$ is strictly exogenous (conditional on the unobserved individual effect):
  $E(u_{it} \mid x_{i1}, x_{i2}, \dots, x_{iT}, c_i) = 0$   (7)
• Again, there are examples of violations of the strict exogeneity assumption (cf. the Alan Greenspan example).

Example 1: program evaluation (e.g. the effect of job training), model (8)
Question: is program participation an exogenous variable?
• Evaluation panel datasets are collected at two points in time.
• Model (8) contains a time-varying intercept (capturing macro shocks).
• In the first period nobody participates in the program. In the second period the control group does not participate and the treatment group does.
• Standard fixed-effects estimation procedures allow for the possibility that program participation is correlated with the individual effect $c_i$ (program participation might depend on ability).
• However, a feedback mechanism may be going on (equation (9)).
• Equation (9) implies that program participation and the error term of an earlier period are related to each other. In other words, program participation is not a strictly exogenous variable!
• Standard panel data estimators (fixed-effects estimators, dif-in-dif) are then not applicable!
Example 2: dynamic regression model (9), in which the wage is explained by the lagged wage $y_{it-1}$ and the individual effect $c_i$
• Model (9) addresses the question: how persistent are wages after controlling for unobserved heterogeneity?
• We assume that $y_{it-1}$ is a contemporaneously exogenous variable.
• Obviously, the lagged wage $y_{it-1}$ is logically correlated with $c_i$.
• Moreover, $y_{it}$ is correlated with $u_{it}$, meaning that $y_{it-1}$ is NOT a strictly exogenous variable:
  $E(y_{it} u_{it}) \neq 0$   (10)

Sample selectivity bias (attrition bias)
Example: in the New Jersey negative income tax experiment, households with earnings above 1.5 times the poverty level were dropped from the sample.
Assume that in the population a linear relation holds between earnings and the exogenous variables. Only observations with earnings below the threshold are included in the sample, so the OLS estimate is inconsistent (see Hsiao, figure 1.6).
Attrition bias: individuals drop out of the panel in a non-random way.
How to address attrition bias:
1. Selection bias models (cf. Heckman (1979), Hausman and Wise (1979), chapter 17 of Wooldridge (2002))
2. Propensity score method (cf. e.g. Hirano, Imbens, Ridder and Rubin (2001)). Refreshment samples are required!

Static linear panel data models
Hsiao, chapters 1-3; Verbeek (2004), chapter 10 (10.1; 10.2)
1. Static linear models: a classification
2. Fixed-effects model: formulation and estimation
3. Random effects model: formulation and estimation
4. Fixed effects or random effects?
5. Mundlak's formulation; Hausman test
6. IV and GMM methods (chapter 5 Verbeek!!)
7. Alternative IV estimator of static panel data models: the Hausman-Taylor approach

Static linear models: a classification
Consider the following equation:
$y_{it} = x_{it}'\beta + c_i + u_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T$   (1)
where
- $c_i$ := unobserved individual effect
- $x_{it}$ := (Kx1)-vector of exogenous time-varying regressors, assumed to be contemporaneously exogenous: $E(u_{it} \mid x_{it}, c_i) = 0$.
In order to estimate model (1), one has to make assumptions about the following:
• Is there a correlation between $c_i$ and the rhs variables $x_{it}$?
  $E(c_i \mid x_i) = 0$ or $E(c_i \mid x_i) \neq 0$   (2)
  where $x_i = (x_{i1}', \dots, x_{iT}')'$.
• Is $x_{it}$ strictly exogenous (conditional on the unobserved individual effect)?
  $E(u_{it} \mid x_{i1}, x_{i2}, \dots, x_{iT}, c_i) = 0$   (3)
  (e.g. no lagged dependent variables, no feedbacks!)

Table 1: Estimation methods under different assumptions on strict exogeneity and on the correlation between the individual effect and the rhs variables.

                         | E(c_i | x_i) ≠ 0        | E(c_i | x_i) = 0              | some (not all) rhs vars correlated with c_i
-------------------------+-------------------------+-------------------------------+--------------------------------------------
all x_it                 | 1) within estimation    | RE estimation (GLS)           | Hausman-Taylor procedure
strictly exogenous       | 2) first differencing   |                               |
                         | 3) Mundlak procedure    |                               |
-------------------------+-------------------------+-------------------------------+--------------------------------------------
some x_it                | IV (GMM)                | 1) pooled OLS (no lagged      | IV (GMM)
not strictly exogenous   |                         |    dependent var)             |
                         |                         | 2) IV (GMM) (lagged dep var   |
                         |                         |    included)                  |

Static fixed-effects models with strictly exogenous regressors
Balanced panel:
$y_{it} = x_{it}'\beta + c_i + u_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T$   (4)
where
- $c_i$ := unobserved individual effect. Model (4) allows for correlation between $c_i$ and $x_i = (x_{i1}', \dots, x_{iT}')'$: $E(c_i \mid x_i) \neq 0$
- $x_{it}$ := (Kx1)-vector of exogenous time-varying regressors, assumed to be strictly exogenous (conditional on the unobserved individual effect):
  $E(u_{it} \mid x_{i1}, x_{i2}, \dots, x_{iT}, c_i) = 0$   (5)
  (e.g. no lagged dependent variables, no feedbacks!)
- $u_{it}$ := error term

Remarks
• The model (partly) solves the omitted unobserved variable bias problem (the effect of ability on earnings (dependent variable) and education (rhs variable)).
• The parameter vector $\beta$ (cf. equation (4)) can be estimated consistently in two ways:
  1. the within estimation procedure
  2. taking the first difference of equation (4):
     $\Delta y_{it} = \Delta x_{it}'\beta + \Delta u_{it}, \quad t = 2, \dots, T$   (6)

The within estimation procedure
Computation of $\hat{\beta}_{within}$ requires the following steps:
• First average equation (4) over t = 1,...,T to get the following cross-section equation:
  $\bar{y}_i = \bar{x}_i'\beta + c_i + \bar{u}_i, \quad i = 1, \dots, N$   (7)
  where $\bar{y}_i = T^{-1}\sum_{t=1}^T y_{it}$; $\bar{x}_i = T^{-1}\sum_{t=1}^T x_{it}$; $\bar{u}_i = T^{-1}\sum_{t=1}^T u_{it}$
• Subtracting equation (7) from (4) gives:
  $\ddot{y}_{it} = \ddot{x}_{it}'\beta + \ddot{u}_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T$   (8)
  where $\ddot{y}_{it} = y_{it} - \bar{y}_i$; $\ddot{x}_{it} = x_{it} - \bar{x}_i$; $\ddot{u}_{it} = u_{it} - \bar{u}_i$
• Like first differencing, 'time demeaning' also removes the individual effect $c_i$ (cf. eq. (8)).
• The within estimator $\hat{\beta}_{within}$ can be obtained by applying OLS to eq. (8).
• If one assumes that $u_{it}$ is a white noise error term, i.e.:
  1. $E u_{it} = 0$; $var(u_{it}) = \sigma_u^2$ (homoskedasticity);
  2. $u_{it}$ independent over time and across individuals
  then one can show that
  $\widehat{Avar}(\hat{\beta}_{within}) = \hat{\sigma}_u^2 \left( \sum_{i=1}^N \sum_{t=1}^T \ddot{x}_{it}\ddot{x}_{it}' \right)^{-1}$   (9)
  $\hat{\sigma}_u^2 = \frac{\sum_{i=1}^N \sum_{t=1}^T \hat{u}_{it}^2}{N(T-1) - K}$
  where $\hat{u}_{it} = \ddot{y}_{it} - \ddot{x}_{it}'\hat{\beta}_{within}$.
• In other words, running OLS on equation (8) with a standard statistical package gives almost the correct standard errors. However, pay careful attention to the denominator of $\hat{\sigma}_u^2$ in (9): N(T-1)-K instead of NT-K (as used by OLS packages).
• The Stata procedures xtreg (and areg) carry out the within estimation procedure and calculate the standard errors correctly.
• Suppose that there is autocorrelation and heteroskedasticity in $u_{it}$. In that case, formula (9) is incorrect. One has to compute robust (Newey-West type) standard errors:
  $\widehat{Avar}(\hat{\beta}_{within}) = \hat{A}^{-1}\hat{B}\hat{A}^{-1}$   (10)
  where $\hat{A} = \sum_{i=1}^N \sum_{t=1}^T \ddot{x}_{it}\ddot{x}_{it}'$; $\hat{B} = \sum_{i=1}^N \sum_{t=1}^T \sum_{s=1}^T \hat{u}_{is}\hat{u}_{it}\ddot{x}_{is}\ddot{x}_{it}'$
• The Stata procedure xtreg computes robust standard errors if one provides the cluster option.
• Estimator for $c_i$: $\hat{c}_i = \bar{y}_i - \bar{x}_i'\hat{\beta}_{within}$

Remarks
1. The parameter vector $\beta$ is identified through time variation ('within variation') in $x_{it}$.
2. Variables which are constant over time (e.g. gender, year of birth) cannot be included in a fixed-effects regression. Their parameters are not identified (they are subsumed in the fixed effect).
3. The estimators $\hat{c}_i$ and $\hat{\beta}_{within}$ are consistent if T (and N) are large.
4. If T is small and N large, $\hat{\beta}_{within}$ is still consistent; $\hat{c}_i$, however, is not, because it is based on a small number of observations (T).
5. Why is $\hat{\beta}_{within}$ a consistent estimator? Consistency requires the following assumption (compare assumption A7, needed to establish consistency of the OLS estimator):
   $E(\ddot{x}_{it}\ddot{u}_{it}) = E[(x_{it} - \bar{x}_i)(u_{it} - \bar{u}_i)] = 0$   (11)
   Due to the assumption of strict exogeneity, condition (11) is satisfied.
6. Notice that the assumption of contemporaneous exogeneity is too weak to prove consistency of the fixed-effects estimator, because it does not exclude correlation between $u_{it}$ and $\bar{x}_i$.

Estimation by taking first differences
One can consistently estimate the $\beta$-parameters of model (4) by taking first differences:
$\Delta y_{it} = \Delta x_{it}'\beta + \Delta u_{it}, \quad t = 2, \dots, T$   (12)
and estimating $\beta$ by means of OLS. Denote the resulting estimator by $\hat{\beta}_{fdif}$.

Remarks
• By taking first differences, the individual effect $c_i$ is swept out of the model.
• Why is $\hat{\beta}_{fdif}$ a consistent estimator? Consistency requires the following assumption (compare assumption A7, needed to establish consistency of the OLS estimator):
  $E(\Delta x_{it}\Delta u_{it}) = E[(x_{it} - x_{it-1})(u_{it} - u_{it-1})] = 0$   (13)
  Due to the assumption of strict exogeneity, condition (13) is satisfied.
• Notice that the assumption of contemporaneous exogeneity is too weak to prove consistency of the estimator $\hat{\beta}_{fdif}$, because it does not exclude correlation between $u_{it-1}$ and $x_{it}$.
• If one assumes that $u_{it}$ follows a random walk, i.e.:
  1. $u_{it} = u_{it-1} + \epsilon_{it}$
  2. $E\epsilon_{it} = 0$; $var(\epsilon_{it}) = \sigma_\epsilon^2$ (homoskedasticity);
  3. $\epsilon_{it}$ independent over time and across individuals
  then one can show that
  $\widehat{Avar}(\hat{\beta}_{fdif}) = \hat{\sigma}_\epsilon^2 \left( \sum_{i=1}^N \sum_{t=2}^T \Delta x_{it}\Delta x_{it}' \right)^{-1}$   (14)
  $\hat{\sigma}_\epsilon^2 = \frac{\sum_{i=1}^N \sum_{t=2}^T \hat{\epsilon}_{it}^2}{N(T-1) - K}$
  where $\hat{\epsilon}_{it} = \Delta y_{it} - \Delta x_{it}'\hat{\beta}_{fdif}$.
• In other words, OLS on equation (12) with a standard statistical package gives the correct standard errors if $u_{it}$ follows a random walk.
• Again, one can compute standard errors of $\hat{\beta}_{fdif}$ which are robust against the presence of autocorrelation and heteroskedasticity:
  $\widehat{Avar}(\hat{\beta}_{fdif}) = \hat{A}^{-1}\hat{B}\hat{A}^{-1}$   (15)
  where $\hat{A} = \sum_{i=1}^N \sum_{t=2}^T \Delta x_{it}\Delta x_{it}'$; $\hat{B} = \sum_{i=1}^N \sum_{t=2}^T \sum_{s=2}^T \hat{\epsilon}_{is}\hat{\epsilon}_{it}\Delta x_{is}\Delta x_{it}'$
• The Stata procedure regress computes robust standard errors if one provides the cluster option.
• If model (4) is correctly specified, the within estimation procedure and the first-difference estimation procedure should yield similar estimates of the parameter vector $\beta$.
• Question: which of the two estimation procedures should one prefer?
  Answer: this depends on the time series behavior of $u_{it}$. If it is a white noise error term, use the within estimation procedure. If it follows a random walk, use the first-difference estimation procedure.
• If the two procedures yield dramatically different estimates of $\beta$, one can conclude that either
  1. for some rhs variables the assumption of strict exogeneity does not hold, or
  2. model (4) is incorrectly specified: some important time-varying regressors are missing from model (4).
• In other words, it is useful to compare the results of the two estimation procedures: the comparison provides a check on whether model (4) is correctly specified.

An empirical illustration
• Youth sample of the National Longitudinal Survey held in the US.
• 545 full-time working males who had completed their schooling by 1980 and were then followed over the period 1980-1987 (a balanced panel dataset).
• The males in the sample are young, with an age in 1980 ranging from 17 to 23.
• Consequently, they entered the labor market recently, with on average 3 years of experience at the beginning of the sample period.
• Log wage is the dependent variable.
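Before turning to the Stata program, the two estimation procedures described above can be illustrated on a toy example. The sketch below is a minimal illustration in Python, not part of the original course material: the data, the function names and the single-regressor setup are all hypothetical. It applies OLS to the time-demeaned equation (8) and to the first-differenced equation (12) for a noise-free panel in which the individual effect is correlated with the regressor, and compares the results with pooled OLS.

```python
# Hypothetical toy panel: N = 3 individuals, T = 4 periods, one regressor.
# True model: y_it = 2 * x_it + c_i (no idiosyncratic noise), where the
# individual effect c_i is positively correlated with the level of x_it.
BETA = 2.0
x = [[1, 2, 4, 3], [5, 7, 6, 9], [10, 12, 11, 14]]
c = [1, 5, 10]
y = [[BETA * x_it + c_i for x_it in x_i] for x_i, c_i in zip(x, c)]


def within_estimator(x, y):
    """OLS on time-demeaned data (equation (8)), single regressor."""
    num = den = 0.0
    for x_i, y_i in zip(x, y):
        x_bar = sum(x_i) / len(x_i)
        y_bar = sum(y_i) / len(y_i)
        for x_it, y_it in zip(x_i, y_i):
            num += (x_it - x_bar) * (y_it - y_bar)
            den += (x_it - x_bar) ** 2
    return num / den


def first_difference_estimator(x, y):
    """OLS without constant on first differences (equation (12))."""
    num = den = 0.0
    for x_i, y_i in zip(x, y):
        for t in range(1, len(x_i)):
            dx = x_i[t] - x_i[t - 1]
            dy = y_i[t] - y_i[t - 1]
            num += dx * dy
            den += dx * dx
    return num / den


def pooled_ols_slope(x, y):
    """Slope of pooled OLS with a constant, ignoring the panel structure."""
    xs = [v for x_i in x for v in x_i]
    ys = [v for y_i in y for v in y_i]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((a - x_mean) * (b - y_mean) for a, b in zip(xs, ys))
    den = sum((a - x_mean) ** 2 for a in xs)
    return num / den


print("within:          ", within_estimator(x, y))
print("first difference:", first_difference_estimator(x, y))
print("pooled OLS:      ", pooled_ols_slope(x, y))
```

In this constructed example both transformations remove $c_i$ and recover the true slope, while the pooled OLS slope lies well above it because $c_i$ is correlated with $x_{it}$ (heterogeneity bias). With noisy data the two panel estimators would generally differ, which is what makes their comparison informative.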
Sample Stata program

    cap log c
    clear
    #delimit ;
    set more 1;
    cd \ti2003;
    log using males2.log, replace t;   /* males2.log */
    use males2;
    /* 'tsset' the observations */
    tsset NR YEAR;
    /* fixed effect estimation */
    xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, fe i(NR);
    est store fixed;
    predict ee, e;
    /* Breusch-Godfrey test on autocorrelation */
    xtreg ee l.ee SCHOOL AGE AGE2 UNION MAR BLACK PUB, fe i(NR);
    drop ee;
    /* fixed effect estimation with robust standard errors */
    xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, fe i(NR) cluster(NR);
    est store fixed2;
    /* first differences */
    reg d.WAGE d.SCHOOL d.AGE d.AGE2 d.UNION d.MAR d.BLACK d.PUB, nocon;
    predict ee, res;
    /* Breusch-Godfrey test on autocorrelation */
    reg ee l.ee d.SCHOOL d.AGE d.AGE2 d.UNION d.MAR d.BLACK d.PUB, nocon;
    /* first differences with robust standard errors */
    reg d.WAGE d.SCHOOL d.AGE d.AGE2 d.UNION d.MAR d.BLACK d.PUB, nocon cluster(NR);
    /* random effects estimation */
    xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, re i(NR);
    est store random;
    /* hausman test */
    hausman fixed random;
    /* random effects estimation with robust standard errors */
    xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, re i(NR) cluster(NR);
    est store random2;
    hausman fixed2 random2;
    stop;

Results

    > xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, fe i(NR);

    Fixed-effects regression              Number of obs      =    4360
    Group variable (i): NR                Number of groups   =     545
    R-sq:  within  = 0.1721               Obs per group: min =       8
           between = 0.1217                              avg =     8.0
           overall = 0.1430                              max =       8
                                          F(5,3810)          =  158.45
    corr(u_i, Xb)  = 0.0468               Prob > F           =  0.0000
    ---------------------------------------------
            WAGE |      Coef.   Std. Err.      t
    -------------+-------------------------------
          SCHOOL |  (dropped)
             AGE |   .2029331   .0307714    6.59
            AGE2 |  -.0029448     .00063   -4.67
           UNION |   .0815508   .0193876    4.21
             MAR |   .0554758   .0182796    3.03
           BLACK |  (dropped)
             PUB |   .0397994   .0387418    1.03
           _cons |  -1.565376   .3726142   -4.20
    -------------+-------------------------------
         sigma_u |  .36739891
         sigma_e |  .35255949
             rho |  .52060273   (fraction of vari
    ---------------------------------------------
    F test that all u_i=0: F(544,3810)=7.89; Prob>F=0.0000

Test for first order autocorrelation

    . predict ee,e;
    . /* Breusch-Godfrey test on autocorrelation */
    > xtreg ee l.ee SCHOOL AGE AGE2 UNION MAR BLACK PUB, fe i(NR);

    Fixed-effects (within) regression     Number of obs      =    3815
    Group variable (i): NR                Number of groups   =     545
    R-sq:  within  = 0.0046               Obs per group: min =       7
           between = 0.0031                              avg =     7.0
           overall = 0.0036                              max =       7
                                          F(6,3264)          =    2.53
    corr(u_i, Xb)  = -0.0457              Prob > F           =  0.0190
    -----------------------------------------------------
              ee |      Coef.   Std. Err.      t    P>|t|
    -------------+---------------------------------------
              ee |
             L1. |   .0574774   .0159189    3.61   0.000
             AGE |  -.0478537    .036904   -1.30   0.195
            AGE2 |   .0009327   .0007411    1.26   0.208
           UNION |  -.0132928   .0203034   -0.65   0.513
             MAR |   .0037855   .0189592    0.20   0.842
             PUB |  -.0056418   .0390771   -0.14   0.885
           _cons |   .6104772   .4564699    1.34   0.181
    -------------+---------------------------------------
         sigma_u |  .06463637
         sigma_e |  .32616935
             rho |   .0377867   (fraction of variance due
    -----------------------------------------------------

• The error term of equation (4) might follow a first order autoregressive process:
  $u_{it} = \rho u_{it-1} + \epsilon_{it}$   (16)
  where $\epsilon_{it}$ is a white noise error term.
• If $\rho = 0$, $u_{it}$ is white noise.
• If $\rho = 1$, $u_{it}$ is a random walk.
• $H_0: \rho = 0$ can be tested by performing the regression presented above. We obtain $\hat{\rho} = 0.0575$ (a low autocorrelation coefficient). However, $H_0: \rho = 0$ is rejected!
• Therefore, it is wise to compute standard errors which are robust to the presence of autocorrelation (and heteroskedasticity) (see below).

Results FE regression with robust (Newey-West type) standard errors

    . xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, fe i(NR) cluster (NR);

    Fixed-effects regression              Number of obs      =    4360
    Group variable (i): NR                Number of groups   =     545
    R-sq:  within  = 0.1721               Obs per group: min =       8
           between = 0.1217                              avg =     8.0
           overall = 0.1430                              max =       8
                                          F(5,544)           =   84.33
    corr(u_i, Xb)  = 0.0468               Prob > F           =  0.0000
                       (Std. Err. adjusted for 545 clusters in NR)
    ---------------------------------------------
                 |               Robust
            WAGE |      Coef.   Std. Err.      t
    -------------+-------------------------------
          SCHOOL |  (dropped)
             AGE |   .2029331   .0377546    5.38
            AGE2 |  -.0029448    .000765   -3.85
           UNION |   .0815508   .0229148    3.56
             MAR |   .0554758   .0212522    2.61
           BLACK |  (dropped)
             PUB |   .0397994   .0382693    1.04
           _cons |  -1.565376   .4625294   -3.38
    -------------+-------------------------------
         sigma_u |  .36739891
         sigma_e |  .35255949
             rho |  .52060273
    ---------------------------------------------

• Obviously, the parameter estimates $\hat{\beta}_{within}$ are not affected by the computation of robust standard errors.
• The robust standard errors are somewhat bigger than the non-robust ones, except for PUB (because there is a slight positive autocorrelation in the error term).

Result first differencing

    . tsset NR YEAR;
           panel variable:  NR, 13 to 12548
            time variable:  YEAR, 1980 to 1987

    . reg d.WAGE d.SCHOOL d.AGE d.AGE2 d.UNION d.MAR d.BLACK d.PUB,nocon;

          Source |       SS       df       MS          Number of obs =    3815
    -------------+------------------------------       F(  5,  3810) =   20.49
           Model |  20.131068     5  4.02621359        Prob > F      =  0.0000
        Residual | 748.481933  3810  .196451951        R-squared     =  0.0262
    -------------+------------------------------       Adj R-squared =  0.0249
           Total | 768.613001  3815  .201471298        Root MSE      =  .44323
    -----------------------------------------------------
          D.WAGE |      Coef.   Std. Err.      t    P>|t|
    -------------+---------------------------------------
       SCHOOL D1.|  (dropped)
          AGE D1.|   .2105551   .0693057    3.04   0.002
         AGE2 D1.|  -.0030015   .0014173   -2.12   0.034
        UNION D1.|   .0425044   .0196675    2.16   0.031
          MAR D1.|   .0396707   .0229176    1.73   0.084
        BLACK D1.|  (dropped)
          PUB D1.|   .0424731    .041014    1.04   0.300
    -----------------------------------------------------

• Notice that the variable DAGE = 1 for all individuals. Therefore DAGE is perfectly collinear with the constant term.
• Therefore, drop the constant term using the 'nocon' option.
• The test for autocorrelation indicates the presence of strong negative autocorrelation in the error term $\Delta u_{it}$ of equation (6) ($\hat{\rho} = -0.396$):

    . predict ee,res;
    (545 missing values generated)
    . /* Breusch-Godfrey test on autocorrelation */
    > reg ee l.ee d.AGE d.AGE2 d.UNION d.MAR d.PUB,nocon;

          Source |       SS       df       MS          Number of obs =    3270
    -------------+------------------------------       F(  6,  3264) =  121.47
           Model |   104.4708     6  17.4117999        Prob > F      =  0.0000
        Residual | 467.850877  3264  .143336666        R-squared     =  0.1825
    -------------+------------------------------       Adj R-squared =  0.1810
           Total | 572.321677  3270   .17502192        Root MSE      =   .3786
    -----------------------------------------------------
              ee |      Coef.   Std. Err.      t    P>|t|
    -------------+---------------------------------------
           ee L1.|  -.3956867   .0146911  -26.93   0.000
          AGE D1.|  -.0970248    .071408   -1.36   0.174
         AGE2 D1.|   .0018146   .0014331    1.27   0.206
        UNION D1.|   .0164246   .0185744    0.88   0.377
          MAR D1.|   .0092617   .0215252    0.43   0.667
          PUB D1.|   .0075729   .0364785    0.21   0.836
    -----------------------------------------------------

Result first differencing with robust standard errors

    . reg DWAGE DSCHOOL DAGE DAGE2 DUNION DMAR DBLACK DPUB,nocon cluster(NR);

    Regression with robust standard errors        Number of obs =    3815
                                                  F(  5,   544) =   71.35
                                                  Prob > F      =  0.0000
                                                  R-squared     =  0.0262
    Number of clusters (NR) = 545                 Root MSE      =  .44323
    ---------------------------------------------
                 |               Robust
           DWAGE |      Coef.   Std. Err.      t
    -------------+-------------------------------
            DAGE |   .2105551   .0473998    4.44
           DAGE2 |  -.0030015   .0009448   -3.18
          DUNION |   .0425044   .0220135    1.93
            DMAR |   .0396707   .0242298    1.64
            DPUB |   .0424731   .0355365    1.20
    ---------------------------------------------

• The robust standard errors of DAGE, DAGE2 and DPUB are lower than the non-robust ones (negative autocorrelation in $\Delta u_{it}$); those of DUNION and DMAR are slightly higher.
• The estimation results of the within regression (xtreg) and the first-differencing procedure are rather similar, with one exception: the effect of union status on log(wage) is somewhat different, 0.082 (0.023) versus 0.042 (0.022).
• This difference may be due to
  1. omitted variable bias
  2. UNION not being a strictly exogenous variable

Static random effects model
$y_{it} = x_{it}'\beta + c_i + u_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T$   (17)
where
- $c_i$ := unobserved individual effect. Contrary to the fixed-effects model (4), model (17) assumes that $c_i$ and $x_i = (x_{i1}', \dots, x_{iT}')'$ are uncorrelated:
  $E(c_i \mid x_i) = 0$   (18)
- $x_{it}$ := (Kx1)-vector of exogenous time-varying regressors, assumed to be strictly exogenous (conditional on the unobserved individual effect):
  $E(u_{it} \mid x_{i1}, x_{i2}, \dots, x_{iT}, c_i) = 0$   (19)
  (e.g. no lagged dependent variables, no feedbacks!)
- $u_{it}$ := white noise error term
  1. $E u_{it} = 0$; $var(u_{it}) = \sigma_u^2$;
  2. $u_{it}$ independent over time and across individuals
- $c_i$ := white noise error term
  1. $E c_i = 0$; $var(c_i) = \sigma_c^2$;
  2. $c_i$ independent across individuals

• Model (17) can be rewritten as follows (stacking the observations of one individual):
  $y_i = x_i\beta + e_T c_i + u_i, \quad i = 1, \dots, N$   (20)
  where $y_i = (y_{i1}, \dots, y_{iT})'$; $u_i = (u_{i1}, \dots, u_{iT})'$; $x_i = (x_{i1}, \dots, x_{iT})'$ is a (TxK)-matrix; $e_T$ is a (Tx1)-vector of ones.
• Notice that
  $V(c_i e_T + u_i) = \sigma_u^2 I_T + \sigma_c^2 e_T e_T' = \sigma_u^2 \Omega$   (21)
  where
  $\Omega = I_T + \frac{\sigma_c^2}{\sigma_u^2} e_T e_T'$   (22)
• If $\sigma_u^2$ and $\sigma_c^2$ are known, the GLS estimator of $\beta$ is BLUE:
  $\hat{\beta}_{GLS} = \left( \sum_{i=1}^N x_i'\Omega^{-1}x_i \right)^{-1} \sum_{i=1}^N x_i'\Omega^{-1}y_i$   (23)
• However, since in most cases $\sigma_u^2$ and $\sigma_c^2$ are not known, we have to apply feasible GLS (FGLS). For this purpose, we need estimates of $\sigma_u^2$ and $\sigma_c^2$.
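The covariance structure in (21) and (22) can be checked numerically. The sketch below is a minimal illustration in Python with hypothetical variance values. It builds $V(c_i e_T + u_i)$ and $\Omega$ for T = 3 and verifies that they differ by the factor $\sigma_u^2$. It also uses the closed form $\Omega^{-1} = I_T - \frac{\sigma_c^2}{\sigma_u^2 + T\sigma_c^2} e_T e_T'$, which is a standard matrix-inversion result (Sherman-Morrison) and is not stated on the slides.

```python
def identity(T):
    """(T x T) identity matrix as a list of rows."""
    return [[1.0 if t == s else 0.0 for s in range(T)] for t in range(T)]


def composite_error_cov(T, sigma_u2, sigma_c2):
    """V(c_i e_T + u_i) = sigma_u2 * I_T + sigma_c2 * e_T e_T' (equation (21))."""
    eye = identity(T)
    return [[sigma_u2 * eye[t][s] + sigma_c2 for s in range(T)] for t in range(T)]


def omega(T, sigma_u2, sigma_c2):
    """Omega = I_T + (sigma_c2 / sigma_u2) * e_T e_T' (equation (22))."""
    eye = identity(T)
    ratio = sigma_c2 / sigma_u2
    return [[eye[t][s] + ratio for s in range(T)] for t in range(T)]


def omega_inverse(T, sigma_u2, sigma_c2):
    """Closed-form inverse: I_T - sigma_c2 / (sigma_u2 + T * sigma_c2) * e_T e_T'."""
    eye = identity(T)
    weight = sigma_c2 / (sigma_u2 + T * sigma_c2)
    return [[eye[t][s] - weight for s in range(T)] for t in range(T)]


def matmul(a, b):
    """Plain matrix product of two square lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical values, chosen only for illustration.
T, SIGMA_U2, SIGMA_C2 = 3, 0.5, 0.25
V = composite_error_cov(T, SIGMA_U2, SIGMA_C2)
Om = omega(T, SIGMA_U2, SIGMA_C2)
product = matmul(Om, omega_inverse(T, SIGMA_U2, SIGMA_C2))

for row in V:
    print(row)  # equal off-diagonal elements: errors of one individual are correlated
```

Every off-diagonal element of V equals $\sigma_c^2$, which is why the errors of one individual are correlated over time and why GLS weights the observations with $\Omega^{-1}$.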
• The FGLS estimate of $\beta$ is obtained by taking the following steps:
  1. Obtain the estimate $\hat{\sigma}_u^2$ by running the 'within' regression:
     $\ddot{y}_{it} = \ddot{x}_{it}'\beta + \ddot{u}_{it}$   (24)
     Resulting estimator:
     $\hat{\sigma}_u^2 = \frac{\sum_{i=1}^N \sum_{t=1}^T \hat{u}_{it}^2}{N(T-1) - K}$
     where $\hat{u}_{it} = \ddot{y}_{it} - \ddot{x}_{it}'\hat{\beta}_{within}$.
  2. Given $\hat{\sigma}_u^2$ (see step 1), one can obtain the estimate $\hat{\sigma}_c^2$ by performing the following 'between' regression:
     $\bar{y}_i = \bar{x}_i'\beta + \zeta_i$   (25)
     where $\zeta_i = c_i + \bar{u}_i$; $\sigma_\zeta^2 = \sigma_c^2 + \frac{\sigma_u^2}{T}$. Obviously, $\hat{\sigma}_c^2 = \hat{\sigma}_\zeta^2 - \frac{\hat{\sigma}_u^2}{T}$.
  3. Substitute $\hat{\sigma}_u^2$ (cf. step 1) and $\hat{\sigma}_c^2$ into eq. (22) to obtain $\hat{\Omega}$.
  4. Perform FGLS to obtain an estimate of $\beta$:
     $\hat{\beta}_{FGLS} = \left( \sum_{i=1}^N x_i'\hat{\Omega}^{-1}x_i \right)^{-1} \sum_{i=1}^N x_i'\hat{\Omega}^{-1}y_i$   (26)

Remarks
• The parameter vector $\beta$ in model (17) can be estimated consistently by applying the within estimation method (i.e. applying OLS to equation (24)). This fixed-effects estimator of $\beta$ does not exploit the following information embodied in model (17): $E(c_i \mid x_i) = 0$. Consequently, the fixed-effects estimator of $\beta$ is inefficient.
• Contrary to the fixed-effects model, variables which are constant over time (e.g. gender, year of birth) can be included in a random effects regression. Their parameters are identified.
• If T or N is large, $\beta$ is estimated consistently by means of FGLS.
• In program packages like STATA, random-effects estimation and fixed-effects estimation are standard options.
• If one makes the additional assumptions that
  1. $c_i \sim NID(0, \sigma_c^2)$,
  2. $u_{it} \sim NID(0, \sigma_u^2)$,
  model (17) can be estimated by means of maximum likelihood (see Hsiao (1986), pages 38-40).
• The parameter vector $\beta$ in model (17) can also be estimated consistently by OLS, because assumptions (18) and (19) imply that
  $E[x_{it}(c_i + u_{it})] = 0$   (27)
  This is a sufficient condition to prove consistency of the OLS estimator.
• Condition (27) is satisfied if
  1. $x_{it}$ is contemporaneously exogenous: $E(u_{it}x_{it}) = 0$. This is a much weaker condition than strict exogeneity!
  2. $x_{it}$ is uncorrelated with $c_i$.
  Notice that lagged endogenous variables such as $y_{it-1}$ do not satisfy condition 2.
• The composite error term is autocorrelated: $cov(c_i + u_{it}, c_i + u_{i\tau}) = \sigma_c^2 \neq 0$ for $t \neq \tau$. Consequently, the standard errors produced by standard OLS routines are incorrect.
• Instead, one should calculate the robust (Newey-West type) variance-covariance matrix:
  $\widehat{Avar}(\hat{\beta}_{OLS}) = \hat{A}^{-1}\hat{B}\hat{A}^{-1}$   (28)
  where $\hat{A} = \sum_{i=1}^N \sum_{t=1}^T x_{it}x_{it}'$; $\hat{B} = \sum_{i=1}^N \sum_{t=1}^T \sum_{s=1}^T \hat{u}_{is}\hat{u}_{it}x_{is}x_{it}'$
• The Stata procedure regress computes robust standard errors if one provides the cluster option.

Fixed or random effects
Empirical illustration: random effects regression

    . xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, re i(NR);

    Random-effects GLS regression         Number of obs      =    4360
    Group variable (i): NR                Number of groups   =     545
    R-sq:  within  = 0.1715               Obs per group: min =       8
           between = 0.1869                              avg =     8.0
           overall = 0.1798                              max =       8

                 |        Random effects           |    Fixed effects
    -------------+---------------------------------+----------------------
            WAGE |      Coef.   Std. Err.      z   |     Coef.   Std. Err.
    -------------+---------------------------------+----------------------
          SCHOOL |   .0487195   .0086393    5.64   |  (dropped)
             AGE |   .1984999   .0306503    6.48   |  .2029331   .0307714
            AGE2 |  -.0028933   .0006279   -4.61   | -.0029448     .00063
           UNION |   .1078025   .0179103    6.02   |  .0815508   .0193876
             MAR |   .0707689    .016747    4.23   |  .0554758   .0182796
           BLACK |  -.1456282   .0470085   -3.10   |  (dropped)
             PUB |   .0352122   .0365596    0.96   |  .0397994   .0387418
           _cons |  -2.057903   .3809207   -5.40   | -1.565376   .3726142
    -------------+---------------------------------+----------------------
         sigma_u |  .32514288                      | .36739891
         sigma_e |  .35255949                      | .35255949
    ----------------------------------------------------------------------

• Disadvantage of the fixed-effects model: the coefficients of SCHOOL and BLACK are not identified because these variables do not show time variation.
• In a random effects model the coefficients of SCHOOL and BLACK are identified because the model makes the additional assumption $E(c_i \mid x_i) = 0$.
• Disadvantage of the RE model: in the regression above, it has been assumed that all regressors (incl. SCHOOL) are orthogonal to $c_i$. This is an implausible assumption.

• Comparison of the FE and RE results by means of 'eye-balling' suggests that the results do not differ a lot. Eye-balling may lead to misleading conclusions.

• Hausman (1978) has proposed an alternative way of testing $H_0: E(c_i \mid x_i) = 0$ against $H_1: E(c_i \mid x_i) \neq 0$, the so-called Hausman test.

• General idea: compare two estimators for $\beta$:
1. The first estimator is consistent and efficient under $H_0$ only. In this case: $\hat\beta_{RE}$.
2. The second estimator is consistent under both $H_0$ and $H_1$. In this case: $\hat\beta_{FE}$.

• If the two estimates differ significantly, one should reject $H_0$. Let us consider $\hat\beta_{FE} - \hat\beta_{RE}$. It can be shown that under $H_0$:

$$V(\hat\beta_{FE} - \hat\beta_{RE}) = V(\hat\beta_{FE}) - V(\hat\beta_{RE})$$

• Hausman test statistic:

$$\xi_h = (\hat\beta_{FE} - \hat\beta_{RE})'\Big(V(\hat\beta_{FE}) - V(\hat\beta_{RE})\Big)^{-1}(\hat\beta_{FE} - \hat\beta_{RE}) \tag{29}$$

• In the empirical illustration the Hausman test statistic indicates rejection of $H_0$:

    xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, fe i(NR) cluster (NR);
    est store fixed2;
    xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, re i(NR) cluster(NR);
    est store random2;
    hausman fixed2 random2;

                 |           ---- Coefficients ----
                 |      (b)          (B)         (b-B)      sqrt(diag(V_b-V_B))
                 |     fixed2      random2     Difference          S.E.
    -------------+-------------------------------------------------------------
             AGE |   .2029331     .1984999      .0044332        .0064048
            AGE2 |  -.0029448    -.0028933     -.0000515        .0001236
           UNION |   .0815508     .1078025     -.0262518        .0092935
             MAR |   .0554758     .0707689     -.0152931        .0093238
             PUB |   .0397994     .0352122      .0045872         .017353
    ---------------------------------------------------------------------------
    b = consistent under Ho and Ha; obtained from xtreg
    B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test: Ho: difference in coefficients not systematic
          chi2(5) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 27.62
          Prob>chi2 = 0.0000

• I conclude that fixed effects estimation should be preferred.

• However, in many cases one likes to know more about the nature of the individual fixed effect.

• Alternative to RE estimation: OLS with robust standard errors:

    . reg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB, cluster(NR);

    Regression with robust standard errors        Number of obs =    4360
                                                  F(  7,   544) =   57.39
                                                  Prob > F      =  0.0000
                                                  R-squared     =  0.1855
    Number of clusters (NR) = 545                 Root MSE      =  .48106

    ------------------------------------------------------------------------------
                 |               Robust
            WAGE |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          SCHOOL |   .0522904   .0081987     6.38   0.000     .0361855    .0683953
             AGE |   .1817776   .0404779     4.49   0.000     .1022656    .2612897
            AGE2 |  -.0027101   .0008226    -3.29   0.001    -.0043259   -.0010943
           UNION |   .1829544   .0275705     6.64   0.000     .1287967    .2371121
             MAR |   .1097745    .025984     4.22   0.000     .0587332    .1608158
           BLACK |  -.1462183   .0490136    -2.98   0.003    -.2424973   -.0499392
             PUB |   .0077676   .0499577     0.16   0.876    -.0903661    .1059013
           _cons |  -1.837542   .4901171    -3.75   0.000    -2.800295   -.8747878
    ------------------------------------------------------------------------------

• Notice that the OLS UNION coefficient (0.18) differs dramatically from the corresponding RE (GLS) regression coefficient (0.11).

• If the RE model is correctly specified, then the OLS results and the RE results should be similar, because both estimators are then consistent for $\beta$.

• Our findings again suggest that the RE model is incorrectly specified. The FE model is to be preferred.

Mundlak's formulation

In order to allow for possible correlations between the explanatory variables and the individual effects, Mundlak (1978) proposes to estimate the following model:

$$y_{it} = x_{it}'\beta + \bar{x}_i'\gamma + \omega_i + u_{it} \tag{30}$$

where

- $\omega_i$ := unobserved individual effect, assumed to be uncorrelated with $x_i = (x_{i1}', \ldots, x_{iT}')'$:

$$E(\omega_i \mid x_i) = 0 \tag{31}$$

- $x_{it}$ := $(K \times 1)$-vector of time varying regressors, assumed to be strictly exogenous (conditional on the unobserved individual effect):

$$E(u_{it} \mid x_{i1}, x_{i2}, \ldots, x_{iT}, c_i) = 0 \tag{32}$$

Remarks

• In eq. (30), the individual effect $c_i$ has been specified as follows:

$$c_i = \bar{x}_i'\gamma + \omega_i \tag{33}$$

Clearly, if $\gamma = 0$ the explanatory variables $x_{it}$ are uncorrelated with the unobserved individual effect $c_i$.

• In most cases, one should NOT attach any economic interpretation to the coefficient vector $\gamma$.

• Eq. (30) can be estimated by e.g. using the random effects routine (xtreg) in STATA.

• Mundlak (1978) has proven that the fixed effects (within) estimate for $\beta$ is the same as the one obtained from equation (30)!
• Mundlak's approach reproduces the fixed effects estimates only in linear models; in nonlinear models (e.g. probit) this is mostly not the case.

• Chamberlain (1982) has proposed an alternative way to model the relation between $c_i$ and $x_i$:

$$c_i = \sum_{t=1}^{T} x_{it}'\gamma_t + \omega_i \tag{34}$$

• In the linear model the approaches of Chamberlain and Mundlak lead to the same results. This is not true for nonlinear models. In that case, equation (34) allows for more flexibility than equation (33).

• The test of $H_0: E(c_i \mid x_i) = 0$ boils down to testing the hypothesis $H_0: \gamma = 0$. This is a Wald test (F-test) performed after having estimated eq. (30).

• This Wald test is asymptotically equivalent to the Hausman test described above.

• One can add time invariant regressors like gender to equation (30). The corresponding coefficients are identified under the assumption that these time invariant regressors are orthogonal to $c_i$.

    . xtreg WAGE SCHOOL AGE AGE2 UNION MAR BLACK PUB MAGE MAGE2 MUNION MMAR MPUB, re i(NR);

    Random-effects GLS regression          Number of obs      =   4360
    Group variable (i): NR                 Number of groups   =    545

    R-sq:  within  = 0.1721                Obs per group: min =      8
           between = 0.2161                               avg =    8.0
           overall = 0.1957                               max =      8

    Random effects u_i ~ Gaussian          Wald chi2(12)      = 940.26
    corr(u_i, X)       = 0 (assumed)       Prob > chi2        = 0.0000

    ------------------------------------------------------------------------------
            WAGE |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          SCHOOL |    .062819   .0100176     6.27   0.000     .0431849    .0824532
             AGE |   .2029331   .0307714     6.59   0.000     .1426222     .263244
            AGE2 |  -.0029448     .00063    -4.67   0.000    -.0041795   -.0017101
           UNION |   .0815508   .0193876     4.21   0.000     .0435517    .1195498
             MAR |   .0554758   .0182796     3.03   0.002     .0196484    .0913032
           BLACK |   -.144793   .0482537    -3.00   0.003    -.2393686   -.0502175
             PUB |   .0397994   .0387418     1.03   0.304    -.0361331    .1157319
            MAGE |  -.0580658   .3031591    -0.19   0.848    -.6522468    .5361152
           MAGE2 |   .0005257   .0062603     0.08   0.933    -.0117443    .0127958
          MUNION |   .1811375   .0504184     3.59   0.000     .0823192    .2799558
            MMAR |   .0857142   .0452574     1.89   0.058    -.0029886    .1744171
            MPUB |  -.0971983   .1158267    -0.84   0.401    -.3242144    .1298179
           _cons |   -1.26983    3.59163    -0.35   0.724    -8.309297    5.769636
    -------------+----------------------------------------------------------------

    test MAGE MAGE2 MUNION MMAR MPUB;

          chi2( 5)    = 22.84
          Prob > chi2 = 0.0004

• Notice that for the time varying regressors (AGE etc.) the 'Mundlak' estimation method and the FE method (see some slides earlier) lead to exactly the same results.

• In the Mundlak model we allow for correlation between the time varying regressors and the individual effect.

• This is not true for the time invariant regressors BLACK and SCHOOL. We still make the implausible assumption that these two variables and $c_i$ (which includes 'ability') are uncorrelated.
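The equivalence between the Mundlak and the within estimates of $\beta$ can be checked numerically. The sketch below is an illustration on simulated data, not on the wage panel used in the slides: the dimensions, the coefficient values and all variable names are assumptions made for the example. For brevity it estimates eq. (30) by pooled OLS rather than by a random effects routine; in a balanced panel the coefficient on $x_{it}$ is the same either way, because the within deviations $x_{it} - \bar{x}_i$ are orthogonal to $\bar{x}_i$ and to the constant.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 200, 8, 2                      # hypothetical balanced panel
beta_true = np.array([1.0, -0.5])        # hypothetical coefficients

# Individual effect c_i that is correlated with the regressors,
# so that the RE assumption E(c_i | x_i) = 0 fails by construction.
c = rng.normal(size=N)
x = rng.normal(size=(N, T, K)) + c[:, None, None]
y = x @ beta_true + c[:, None] + rng.normal(size=(N, T))

# Within (fixed effects) estimator: OLS on deviations from individual means.
x_bar = x.mean(axis=1, keepdims=True)    # shape (N, 1, K)
y_bar = y.mean(axis=1, keepdims=True)    # shape (N, 1)
xd = (x - x_bar).reshape(N * T, K)
yd = (y - y_bar).reshape(N * T)
beta_within = np.linalg.lstsq(xd, yd, rcond=None)[0]

# Mundlak regression: y_it on a constant, x_it and the individual means xbar_i.
X_aug = np.column_stack([
    np.ones(N * T),
    x.reshape(N * T, K),
    np.broadcast_to(x_bar, x.shape).reshape(N * T, K),
])
coef = np.linalg.lstsq(X_aug, y.reshape(N * T), rcond=None)[0]
beta_mundlak = coef[1:1 + K]
gamma_hat = coef[1 + K:]

# Pooled OLS without the individual means, for comparison.
X_pool = np.column_stack([np.ones(N * T), x.reshape(N * T, K)])
beta_pooled = np.linalg.lstsq(X_pool, y.reshape(N * T), rcond=None)[0][1:]

print("within :", beta_within)
print("Mundlak:", beta_mundlak)
print("gamma  :", gamma_hat)
print("pooled :", beta_pooled)
```

In this design the within and Mundlak estimates agree up to floating-point error, while pooled OLS without the individual means is pushed away from them by the correlation between $x_{it}$ and $c_i$. A nonzero `gamma_hat` is the simulated counterpart of rejecting $H_0: \gamma = 0$.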