The Choice Between Fixed and
Random Effects Models: Some
Considerations For Educational
Research
Clarke, Crawford, Steele and Vignoles
and funding from ESRC ALSPAC Large Grant
Motivation
• Need evidence from different disciplines to
answer the research question: how can we
improve pupil achievement?
• Contribute to multi-disciplinary understanding
by comparing common alternative models
used by different disciplines
Introduction
• Pupils clustered within schools → hierarchical
models
• Two popular choices: fixed and random effects
• Choice of model:
– Often driven by discipline tradition – economists use fixed
effects for example
– May depend on whether primary interest is pupil or school
characteristics
Illustrations
• What is the impact of SEN status on pupil
achievement?
• What is the impact of FSM status on pupil
achievement?
Why adjust for school effects?
• Want to estimate causal effect of SEN on pupil
attainment no matter what school they attend
• Need to adjust for school differences in SEN labelling
– e.g. children with moderate difficulties more likely to be
labelled SEN in a high achieving school than in a low
achieving school (Keslair et al., 2008; Ofsted, 2004)
– May also be differences due to unobserved factors
• Hierarchical models can account for such differences
– Fixed or random school effects?
Basic model
$y_{is} = \beta_0 + \beta_1 X_{is} + u_s + e_{is}$
• FE: $u_s$ is the coefficient on a school dummy variable
• RE: $u_s$ is a school-level residual
– Additional assumption required: $E[u_s \mid X_{is}] = 0$
• That is, no correlation between unobserved school
characteristics and observed pupil characteristics
• Both models assume: $E[e_{is} \mid X_{is}] = 0$
– That is, no correlation between unobserved pupil
characteristics and observed pupil characteristics
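A minimal simulation sketch of what the two assumptions mean, using only the Python standard library. The parameter `gamma`, the variances and the sample sizes are hypothetical choices made for illustration and are not taken from the slides. In real data $u_s$ and $e_{is}$ are unobserved, so a direct check like this is only possible in a simulation.

```python
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

def simulate(gamma, n_schools=2000, n_pupils=10, seed=1):
    """Draw X_is, u_s and e_is; gamma (hypothetical) ties X_is to u_s."""
    rng = random.Random(seed)
    X, U, E = [], [], []
    for _ in range(n_schools):
        u = rng.gauss(0, 0.5)                    # unobserved school effect u_s
        for _ in range(n_pupils):
            X.append(rng.gauss(0, 1) + gamma * u)
            U.append(u)
            E.append(rng.gauss(0, 1))            # pupil residual, independent of X
    return X, U, E

X0, U0, E0 = simulate(gamma=0.0)   # RE assumption E[u_s | X_is] = 0 holds
X1, U1, E1 = simulate(gamma=1.0)   # RE assumption fails

print("corr(X, u) when gamma = 0:", round(corr(X0, U0), 3))
print("corr(X, u) when gamma = 1:", round(corr(X1, U1), 3))
print("corr(X, e) when gamma = 1:", round(corr(X1, E1), 3))
```

In the second scenario only the RE assumption is violated: the regression assumption on $e_{is}$ still holds, which is the case where FE remains valid and RE does not.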
Relationship between FE, RE and OLS
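$y_{is} = \beta_0 + \beta_1 X_{is} + u_s + e_{is}$
FE:
$y_{is} - \bar{y}_s = \beta_1 (X_{is} - \bar{X}_s) + (e_{is} - \bar{e}_s)$
RE:
$y_{is} - \theta \bar{y}_s = (1 - \theta)\beta_0 + \beta_1 (X_{is} - \theta \bar{X}_s) + (1 - \theta) u_s + (e_{is} - \theta \bar{e}_s)$
Where:
$\theta = 1 - \sqrt{\dfrac{1}{1 + S \sigma_u^2 / \sigma_e^2}}$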
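The link between the three estimators can be sketched in a few lines of standard-library Python: subtracting theta times the school mean from y and X, then running OLS, gives pooled OLS when theta = 0 and the FE (within) estimator when theta = 1, with RE in between. The sketch assumes that S is the number of pupils per school, that schools are of equal size, and that the variance components are known. All numeric values are hypothetical, and in practice the variances would be estimated.

```python
import random
from statistics import mean

def theta(S, var_u, var_e):
    """Quasi-demeaning weight: 1 - sqrt(1 / (1 + S * var_u / var_e))."""
    return 1.0 - (1.0 / (1.0 + S * var_u / var_e)) ** 0.5

def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)

def quasi_demeaned_slope(schools, th):
    """Subtract th * (school mean) from y and X, then run OLS."""
    xs, ys = [], []
    for x, y in schools:
        mx, my = mean(x), mean(y)
        xs += [a - th * mx for a in x]
        ys += [b - th * my for b in y]
    return slope(xs, ys)

BETA0, BETA1 = 1.0, -0.3          # hypothetical true coefficients
S, VAR_U, VAR_E = 20, 0.25, 1.0   # hypothetical school size and variances

rng = random.Random(0)
schools = []
for _ in range(200):
    u = rng.gauss(0, VAR_U ** 0.5)
    x = [rng.gauss(0, 1) for _ in range(S)]   # X independent of u_s
    y = [BETA0 + BETA1 * xi + u + rng.gauss(0, VAR_E ** 0.5) for xi in x]
    schools.append((x, y))

th = theta(S, VAR_U, VAR_E)
ols = quasi_demeaned_slope(schools, 0.0)   # theta = 0: pooled OLS
re = quasi_demeaned_slope(schools, th)     # 0 < theta < 1: RE
fe = quasi_demeaned_slope(schools, 1.0)    # theta = 1: FE (within)
print(round(th, 3), round(ols, 3), round(re, 3), round(fe, 3))
```

Because X is drawn independently of the school effect here, all three slopes should land close to the hypothetical true value of -0.3. Theta moves towards 1 as S or the school-level variance grows, so RE approaches FE in those cases.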
How to choose between FE and RE
• Very important to consider sources of bias:
– Is RE assumption (i.e. E [us|Xis] = 0) likely to hold?
• Other issues:
– Number of clusters
– Sample size within clusters
– Rich vs. sparse covariates
– Whether variation is within or between clusters
• What is the real world consequence of choosing
the wrong model?
SEN: Sources of selection
• Probability of being SEN may depend on:
– Observed school characteristics
• e.g. ability distribution, FSM distribution
– Unobserved school characteristics
• e.g. values/motivation of SEN coordinator
– Observed pupil characteristics
• e.g. prior ability, FSM status
– Unobserved pupil characteristics
• e.g. education values and/or motivation of parents
Intuition 1
• If probability of being labelled SEN depends
ONLY on observed school characteristics:
– e.g. schools with high FSM/low achieving intake
are more or less likely to label a child SEN
• Random effects appropriate as RE assumption
holds (i.e. unobserved school effects are not
correlated with probability of being SEN)
Intuition 2
• If probability of being labelled SEN also
depends on unobserved school characteristics:
– e.g. SEN coordinator tries to label as many kids SEN as
possible, because they attract additional resources
• Random effects inappropriate as RE assumption
fails (i.e. unobserved school effects are correlated
with probability of being SEN)
• FE accounts for these unobserved school
characteristics, so is more appropriate
– Identifies impact of SEN on attainment within schools
rather than between schools
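A hedged simulation sketch of this case, in standard-library Python. The labelling rule (schools with a positive unobserved effect label 30% of pupils SEN, the others 10%), the true effect of -0.3 and the variance components are all hypothetical, and theta is computed from the true variances rather than estimated.

```python
import random
from statistics import mean

def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)

def quasi_demeaned_slope(schools, th):
    """th = 0 gives pooled OLS, th = 1 gives FE, in between gives RE."""
    xs, ys = [], []
    for x, y in schools:
        mx, my = mean(x), mean(y)
        xs += [a - th * mx for a in x]
        ys += [b - th * my for b in y]
    return slope(xs, ys)

BETA0, BETA1 = 1.0, -0.3        # hypothetical true effect of SEN
SD_U, SD_E, S = 0.5, 1.0, 20    # hypothetical variance components, pupils per school

rng = random.Random(42)
schools = []
for _ in range(1000):
    u = rng.gauss(0, SD_U)
    p_sen = 0.3 if u > 0 else 0.1   # labelling rate depends on unobserved u_s
    sen = [1.0 if rng.random() < p_sen else 0.0 for _ in range(S)]
    y = [BETA0 + BETA1 * d + u + rng.gauss(0, SD_E) for d in sen]
    schools.append((sen, y))

theta = 1 - (1 / (1 + S * SD_U ** 2 / SD_E ** 2)) ** 0.5
ols = quasi_demeaned_slope(schools, 0.0)
re = quasi_demeaned_slope(schools, theta)
fe = quasi_demeaned_slope(schools, 1.0)
print("OLS:", round(ols, 3), "RE:", round(re, 3), "FE:", round(fe, 3))
```

Under these assumptions FE should recover roughly -0.3, while RE and especially pooled OLS are pulled towards zero, because SEN pupils are concentrated in schools with a high unobserved effect.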
Intuition 3
• If probability of being labelled SEN depends
on unobserved pupil/parent characteristics:
– e.g. some parents may push harder for the label
and accompanying additional resources;
– alternatively, some parents may not countenance
the idea of their kid being labelled SEN
• Neither FE nor RE will address the
endogeneity problem:
– Need to resort to other methods, e.g. IV
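The same kind of sketch for this case, again with hypothetical numbers: an unobserved pupil-level factor `m` (standing in for parental motivation) raises both the chance of being labelled SEN and attainment. School effects are unrelated to SEN here, so the only problem is the failure of the regression assumption.

```python
import random
from statistics import mean

def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)

def within_slope(schools):
    """FE estimator: demean y and X by school, then run OLS."""
    xs, ys = [], []
    for x, y in schools:
        mx, my = mean(x), mean(y)
        xs += [a - mx for a in x]
        ys += [b - my for b in y]
    return slope(xs, ys)

BETA0, BETA1 = 1.0, -0.3   # hypothetical true effect of SEN

rng = random.Random(7)
schools = []
for _ in range(1000):
    u = rng.gauss(0, 0.5)               # school effect, unrelated to SEN here
    sen, y = [], []
    for _ in range(20):
        m = rng.gauss(0, 1)             # unobserved parental motivation (hypothetical)
        d = 1.0 if m + rng.gauss(0, 1) > 1.0 else 0.0   # pushier parents get the label
        e = 0.5 * m + rng.gauss(0, 1)   # m also raises attainment
        sen.append(d)
        y.append(BETA0 + BETA1 * d + u + e)
    schools.append((sen, y))

ols = slope([d for s, _ in schools for d in s],
            [v for _, ys in schools for v in ys])
fe = within_slope(schools)
print("OLS:", round(ols, 3), "FE:", round(fe, 3))
```

Both estimates should sit well above the hypothetical true value of -0.3, since demeaning by school removes $u_s$ but leaves the correlation between SEN and $e_{is}$ untouched. The sketch does not implement IV.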
Data
• Avon Longitudinal Study of Parents and
Children (ALSPAC)
– Children born in Avon between April 1991 and
December 1992
– Rich data
• Family background (including education, income, etc.)
• Medical and genetic information
• Clinic testing of cognitive and non-cognitive skills
• Linked to National Pupil Database
SEN
• One in four pupils in England have SEN at age 10
• Just under 4% have a statement
• In 2003-04, the period relevant to our data,
approximately £1.3 billion spent on primary
school SEN (excluding special schools)
– £1,600 per pupil with SEN
SEN
• Substantial variation in % SEN across schools
• Quarter of schools have fewer than 15% SEN
• Quarter with more than 24% SEN
• Key question is whether the factors driving
differences in % SEN between schools are
correlated with unmeasured school-level
influences on academic progress
Estimated effects of SEN status on
progress between KS1 and KS2
Model                       Fixed effects      Random effects (i)
                            β̂_FE (se)         β̂_RE (se)         ρ̂_RE     % diff
KS1 test score              -0.335 (0.025)     -0.330 (0.025)     0.175     1.5
Plus administrative data    -0.347 (0.025)     -0.342 (0.025)     0.161     1.4
Plus typical survey data    -0.355 (0.025)     -0.349 (0.024)     0.086     1.7
Plus rich cohort data       -0.321 (0.024)     -0.314 (0.024)     0.076     2.2
Plus school-level data      -0.321 (0.024)     -0.319 (0.024)     0.064     0.6
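As a quick arithmetic check, the "% diff" column is consistent with the gap between the FE and RE estimates expressed as a percentage of the FE estimate. That definition is an assumption inferred from the reported numbers, not something stated on the slide.

```python
# (model, FE estimate, RE estimate) as reported in the table
rows = [
    ("KS1 test score",           -0.335, -0.330),
    ("Plus administrative data", -0.347, -0.342),
    ("Plus typical survey data", -0.355, -0.349),
    ("Plus rich cohort data",    -0.321, -0.314),
    ("Plus school-level data",   -0.321, -0.319),
]

def pct_diff(fe, re):
    """Assumed definition: 100 * (FE - RE) / FE."""
    return 100 * (fe - re) / fe

pct = [round(pct_diff(fe, re), 1) for _, fe, re in rows]
print(pct)  # [1.5, 1.4, 1.7, 2.2, 0.6]
```

The FE and RE estimates differ by at most about 2% in every specification.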
Results from this analysis
• SEN negatively correlated with progress
between KS1 and KS2
• Choice of model does not seem to matter here
– OLS, FE and RE give qualitatively similar results
– Correlation between being SEN and unobserved
school characteristics not important
• Regression and random effects assumptions
are likely to hold in this example, so we prefer the
random effects model
Conclusions
• The fixed effects approach is often used because
the RE assumption is a strong one
• Efficiency advantages to the RE approach
• Failure of the regression assumption is a major
issue
• Approach each problem with an agnostic view on
the model; the choice may not make a difference
– Should be determined by theory and data, not
tradition