Cheat sheet

Fixed and Random Effects: Addressing Panel Data
The Problem
• Omitted Variable Bias: unobserved characteristics bias the estimates of your parameters, and the error term is no longer purely random: it is correlated with the regressors.
For example, in this basic regression equation:
$$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \beta_2 X_{2,it} + e_{it}$$
Where:
• $Y_{it}$ is the dependent variable (DV), where $i$ = entity and $t$ = time.
• $X_{1,it}$ and $X_{2,it}$ are the independent variables.
• $\beta_0$ is the intercept, and $\beta_1$ and $\beta_2$ are the coefficients for the independent variables.
• $e_{it} = d_i + \varepsilon_{it}$ is the error term, which includes $d_i$, the individual error term that is correlated with the regressors, and $\varepsilon_{it}$, the random error term.
The individual, unobserved effects ($d_i$) are not controlled for; instead, they are absorbed into the error term and bias the parameter estimates.
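A tiny deterministic example makes the bias concrete. The numbers are hypothetical: each entity's unobserved effect d_i rises with its X values and there is no random noise, so any gap between the estimate and the true coefficient is pure omitted variable bias. This is a sketch under those assumptions, not a general simulation.

```python
# Hypothetical toy panel: 3 entities x 3 periods.
# True model: y_it = 2 * x_it + d_i, where the unobserved entity effect d_i
# grows with x, i.e. it is correlated with the regressor. No random noise.
TRUE_BETA = 2.0

rows = []  # (entity, time, x, y)
for i in range(3):
    d_i = 10.0 * i  # unobserved, time-invariant entity effect
    for t in range(3):
        x = 3.0 * i + t + 1.0
        y = TRUE_BETA * x + d_i
        rows.append((i, t, x, y))


def ols_slope(xs, ys):
    """Slope from a simple regression of ys on xs (with an intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Pooled OLS ignores d_i, so d_i ends up in the error term.
pooled = ols_slope([r[2] for r in rows], [r[3] for r in rows])
print(f"true beta = {TRUE_BETA}, pooled OLS estimate = {pooled}")
```

On these data pooled OLS reports a slope of 5.0 instead of 2.0, because the effect of the omitted d_i is loaded onto X.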
How this relates to panel data
Datasets with multiple observations of each entity over time, or panel data (see Figure 1), provide an opportunity to pull the unobserved characteristics out of the parameter estimates and the error term.
Figure 1: Panel Data Output
Panel Data includes multiple observations per entity over a specific period of
time
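Panel data are commonly stored in long format, with one record per entity per time period. The sketch below uses hypothetical entities (A, B and C) and years, and checks whether every entity is observed in every period (a balanced panel).

```python
from collections import defaultdict

# Hypothetical long-format panel: one record per (entity, year) pair,
# stored as (entity, year, x, y). Entity C has no record for 2003.
panel = [
    ("A", 2001, 1.0, 2.0), ("A", 2002, 2.0, 4.0), ("A", 2003, 3.0, 6.0),
    ("B", 2001, 4.0, 18.0), ("B", 2002, 5.0, 20.0), ("B", 2003, 6.0, 22.0),
    ("C", 2001, 7.0, 34.0), ("C", 2002, 8.0, 36.0),
]

# Collect the years observed for each entity.
periods = defaultdict(set)
for entity, year, _x, _y in panel:
    periods[entity].add(year)

all_years = set().union(*periods.values())
# Balanced: every entity is observed in every period.
balanced = all(years == all_years for years in periods.values())
missing = {e: sorted(all_years - ys) for e, ys in periods.items() if all_years - ys}

print(f"{len(periods)} entities, {len(all_years)} periods, balanced = {balanced}")
print("missing (entity: years):", missing)
```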
Solution One: Fixed Effects
A fixed-effects model (FE) takes into account the repeated observations of each entity (individuals, states, counties, companies, etc.) and controls for individual-level (or entity-level) effects through the use of dummy variables. Use FE whenever you are only interested in analyzing the impact of variables that vary over time and you suspect that something within the individual may impact or bias the predictor or outcome variables.
The “fixed effects” refer to the time-invariant characteristics, both observed and unobserved, that
are unique to each individual.
Assumptions for this model:
1. Each entity has its own individual characteristics that may influence the predictor
variables
2. Entity-level effects are correlated with the predictor variables
3. Entity-level effects are not correlated with other entities’ individual characteristics
The Equation:
$$Y_{it} = \beta_0 + \beta_k X_{k,it} + \delta_t T_t + \gamma_n E_n + u_{it}$$
Where:
• $Y_{it}$ is the dependent variable (DV), where $i$ = entity and $t$ = time.
• $X_{k,it}$ represents independent variable $k$.
• $\beta_k$ is the coefficient for that variable.
• $u_{it}$ is the error term.
• $\delta_t$ is the coefficient for a binary time regressor.
• $T_t$ is time as a binary variable (dummy).
• $\gamma_n$ is the coefficient for the binary regressors (entities).
• $E_n$ is entity $n$, coded as a binary variable (dummy).
In other words, you add a dummy variable for each entity (n-1 dummies, leaving one entity as the reference or control group), which absorbs the effects of particular entities. By adding the dummies, you control for unobserved, time-invariant heterogeneity and can then estimate the effect of your predictor variables without that source of bias.
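The dummy-variable approach (often called least squares dummy variables, LSDV) gives the same slope as the "within" estimator, which subtracts each entity's mean from X and Y before regressing. The sketch below checks this on hypothetical toy data using a small hand-written linear solver rather than a statistics library; both slopes recover the true coefficient of 2 even though the entity effect is correlated with X.

```python
# Hypothetical toy panel: 3 entities x 3 periods, generated from
# y_it = 2 * x_it + 10 * i, so the entity effect is correlated with x.
rows = []  # (entity, x, y)
for i in range(3):
    for t in range(3):
        x = 3.0 * i + t + 1.0
        rows.append((i, x, 2.0 * x + 10.0 * i))
n_entities = 3


def solve(a, b):
    """Solve the linear system a @ coef = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[r][n] / m[r][r] for r in range(n)]


# LSDV: intercept, x, and n - 1 entity dummies (entity 0 is the reference group).
design = [[1.0, x] + [1.0 if i == j else 0.0 for j in range(1, n_entities)]
          for i, x, _ in rows]
ys = [y for _, _, y in rows]
k = len(design[0])
# Normal equations: (X'X) coef = X'y
xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
xty = [sum(row[p] * y for row, y in zip(design, ys)) for p in range(k)]
lsdv = solve(xtx, xty)  # [intercept, slope on x, dummy_1, dummy_2]

# Within estimator: subtract each entity's mean from x and y, then regress.
totals = {}
for i, x, y in rows:
    sx, sy, c = totals.get(i, (0.0, 0.0, 0))
    totals[i] = (sx + x, sy + y, c + 1)
num = den = 0.0
for i, x, y in rows:
    sx, sy, c = totals[i]
    xd, yd = x - sx / c, y - sy / c
    num += xd * yd
    den += xd * xd
within = num / den

print(f"LSDV slope = {lsdv[1]:.6f}, within slope = {within:.6f}")
```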
The coefficients on the regressors measure the change within individual entities over time. They are interpreted as: “for a given entity, as X varies over time by one unit, Y increases or decreases by β units.”
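Why the coefficient reflects only within-entity change can be seen by averaging the model over time for each entity and subtracting. This is a simplified sketch, assuming a single regressor, no time dummies, and writing entity i's intercept as α_i = β_0 + γ_i (with γ_i = 0 for the reference entity); a bar denotes the average over time for entity i:

$$Y_{it} = \alpha_i + \beta_k X_{k,it} + u_{it}$$
$$\bar{Y}_i = \alpha_i + \beta_k \bar{X}_{k,i} + \bar{u}_i$$
$$Y_{it} - \bar{Y}_i = \beta_k \left(X_{k,it} - \bar{X}_{k,i}\right) + \left(u_{it} - \bar{u}_i\right)$$

The entity effect α_i drops out of the last line, so β_k is estimated only from how X and Y move around each entity's own average over time.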
Solution Two: Random Effects
The Random Effects model is similar to the Fixed Effects model and addresses the same problem
(how to control for unobservable characteristics that do not change over time in an entity).
However, in a Random Effects model, we assume that the unobservable characteristics are not correlated with the explanatory variables. This allows observable, time-invariant variables (such as race or gender) to be included as explanatory variables.
The individual effect of a specific entity is considered “random”: it is treated as a random draw from a population of entities, so the model uses differences between entities as well as changes within entities.
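Treating the entity effect as random can make our model more efficient (smaller standard errors, i.e., a tighter sampling distribution of the estimates), but, unlike a Fixed Effects model, it does not protect against biased coefficients when the unobservable characteristics are correlated with the regressors.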
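One way to see how random effects sits between pooled OLS and fixed effects is the quasi-demeaning form of the random-effects GLS estimator: subtract a fraction θ of each entity's mean from X and Y, with $\theta = 1 - \sqrt{\sigma_e^2 / (\sigma_e^2 + T\sigma_u^2)}$ in a balanced panel, where T is the number of periods per entity (not the time dummy T_t). θ = 0 reproduces pooled OLS and θ = 1 reproduces fixed effects. The sketch assumes the variance components are known (Stata's xtreg, re estimates them from the data) and uses hypothetical toy data in which the entity effect is correlated with X, so the random-effects slope stays biased away from the true value of 2.

```python
import math

# Hypothetical balanced toy panel (T = 3): y_it = 2 * x_it + d_i,
# where d_i = 10 * i is correlated with x.
T = 3
rows = []  # (entity, x, y)
for i in range(3):
    for t in range(T):
        x = 3.0 * i + t + 1.0
        rows.append((i, x, 2.0 * x + 10.0 * i))


def entity_means(rows):
    """Return {entity: (mean x, mean y)}."""
    acc = {}
    for i, x, y in rows:
        sx, sy, c = acc.get(i, (0.0, 0.0, 0))
        acc[i] = (sx + x, sy + y, c + 1)
    return {i: (sx / c, sy / c) for i, (sx, sy, c) in acc.items()}


def ols_slope(xs, ys):
    """Slope from a simple regression of ys on xs (with an intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def quasi_demeaned_slope(rows, theta):
    """Slope after subtracting theta times each entity's means.

    theta = 0 gives pooled OLS, theta = 1 gives the within (FE) estimator,
    and random-effects GLS uses 0 < theta < 1.
    """
    means = entity_means(rows)
    xs = [x - theta * means[i][0] for i, x, _ in rows]
    ys = [y - theta * means[i][1] for i, _, y in rows]
    return ols_slope(xs, ys)


# Assumed (not estimated) variance components, for illustration only.
sigma2_u = 4.0  # variance of the entity effect
sigma2_e = 1.0  # variance of the idiosyncratic error
theta = 1.0 - math.sqrt(sigma2_e / (sigma2_e + T * sigma2_u))

for label, th in [("pooled", 0.0), ("random", theta), ("fixed", 1.0)]:
    print(f"{label:>6}: theta = {th:.3f}, slope = {quasi_demeaned_slope(rows, th):.3f}")
```

With these assumed variances, the random-effects slope lands between the pooled estimate (5) and the fixed-effects estimate (2).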
When to Use Random Effects:
Random Effects models are often used when the data for the entities in your sample are not complete enough to accurately measure the unobservable characteristics with a dummy variable, or to establish that the unobservable characteristics are correlated with the regressors.
Since Random Effects models treat differences between individual entities as a random draw from a probability distribution, they can often be more efficient than a Fixed Effects model. If there is a theory as to why the unobservable characteristics are not correlated with the regressors, then a Random Effects model may be better.
Fixed or Random: Hausman Test
To decide between fixed or random effects, you can run a Hausman test, where the null hypothesis is that the preferred model is random effects vs. the alternative, fixed effects (see Greene, 2008, chapter 9). It basically tests whether the unique errors ($u_i$) are correlated with the regressors; the null hypothesis is that they are not.
In other words, the Hausman test determines whether the null hypothesis (the regressors are not correlated with the unobservable characteristics) or the alternative (the regressors are correlated with the unobservable characteristics) is true.
Stata notation:
Declare the panel structure with xtset, run a fixed effects model and save the estimates, then run a random effects model and save the estimates, then perform the test. See below.
* id and year are placeholder names for the entity and time variables
xtset id year
xtreg y x1, fe
estimates store fixed
xtreg y x1, re
estimates store random
hausman fixed random
If the p-value (Prob>chi2) is less than .05 (significant), then use Fixed Effects.
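For a single coefficient, the statistic behind hausman fixed random reduces to H = (b - B)^2 / (V_b - V_B), compared against a chi-squared distribution with 1 degree of freedom, where b and V_b come from the fixed effects model and B and V_B from the random effects model. The sketch below computes it and applies the .05 rule. The estimates and variances are hypothetical (not taken from the Stata output in this cheat sheet), and with several coefficients Stata uses the matrix form (b-B)'[(V_b-V_B)^(-1)](b-B) instead.

```python
import math


def hausman_1d(b_fe, var_fe, b_re, var_re):
    """Hausman statistic and p-value for a single coefficient.

    General form: H = (b - B)' [V_b - V_B]^(-1) (b - B), chi-squared with as
    many degrees of freedom as compared coefficients. With one coefficient it
    reduces to (b - B)^2 / (V_b - V_B) with 1 degree of freedom.
    """
    var_diff = var_fe - var_re
    if var_diff <= 0:
        # Under Ho the FE variance should exceed the RE variance; a
        # non-positive difference makes this simple form undefined.
        raise ValueError("V_b - V_B must be positive")
    h = (b_fe - b_re) ** 2 / var_diff
    # For 1 df: P(chi2 > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


# Hypothetical estimates and variances, for illustration only.
h, p = hausman_1d(b_fe=2.0, var_fe=0.04, b_re=1.5, var_re=0.03)
choice = "fixed effects" if p < 0.05 else "random effects"
print(f"chi2(1) = {h:.2f}, Prob>chi2 = {p:.4g} -> use {choice}")
```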
Example output of hausman fixed random:
                 ---- Coefficients ----
             |      (b)          (B)          (b-B)
             |     fixed        random      Difference
-------------+--------------------------------------------
          x1 |    2.48e+09     1.25e+09      1.23e+09
----------------------------------------------------------
b = consistent under Ho and Ha
B = inconsistent under Ha, efficient under Ho
Test:  Ho:  difference in coefficients not systematic
Helpful Links:
An easy-to-digest discussion of fixed effects:
http://www.jblumenstock.com/courses/econ174/FEModels.pdf
Discussion of both models:
www.upa.pdx.edu/IOA/newsom/mlrclass/ho_randfixd.doc
A specific overview of panel data and how fixed and random effects are used:
http://dss.princeton.edu/training/Panel101.pdf