Selecting Growth Measures for School and Teacher Evaluations:
Should Proportionality Matter?
Mark Ehlert
Cory Koedel
Eric Parsons
Michael Podgursky
Department of Economics, University of Missouri-Columbia
Motivation
• Growth models are increasingly being
incorporated into district, school and teacher
evaluations across the United States.
• The question of how to model student test-score growth has resulted in lively policy
debates.
• What are the objectives of the evaluation system?
Summary of Findings
• We argue that the three key objectives of an
evaluation system in education are:
• Elicit optimal effort from agents
• Provide useful performance signals to educational
actors
• Avoid exacerbating pre-existing inequities in the labor
markets faced by advantaged and disadvantaged
schools
• Given these objectives, the proper growth model
for use in evaluation systems is neither the sparse
model nor a traditional VAM. Instead, it is
what we call the “proportional” VAM.
A Model Menu
• The growth-model choice set essentially comes down to these
three choices:
1) The sparse model (e.g., SGPs)
2) A single-equation VAM (e.g., a standard value-added model from the
research literature):
$$Y_{isjt} = \beta_0 + Y_{isjt-1}\beta_1 + Y_{iskt-1}\beta_2 + X_{it}\beta_3 + S_{it}\beta_4 + \theta_s + \varepsilon_{isjt}$$
3) The proportional model (e.g., a two-step fixed-effects model or random-effects model, less common in research):
$$Y_{isjt} = \gamma_0 + Y_{isjt-1}\gamma_1 + Y_{iskt-1}\gamma_2 + X_{it}\gamma_3 + S_{it}\gamma_4 + \eta_{isjt}$$
$$\hat{\eta}_{isjt} = \theta_s + u_{isjt}$$
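One way to read the second step of the proportional model is sketched below. It assumes the second step contains only school indicators. The residual symbol \hat{\eta}_{isjt} and the count n_s (the number of student observations in school s) are notation introduced here for illustration, not taken from the slides. Because least squares on a full set of group indicators returns group means, each school's estimate is simply its average first-step residual.

```latex
\hat{\theta}_s = \frac{1}{n_s} \sum_{i \in s} \hat{\eta}_{isjt},
\qquad
\hat{u}_{isjt} = \hat{\eta}_{isjt} - \hat{\theta}_s
```

Here \hat{\eta}_{isjt} is the residual from the first-step regression. A random-effects second step would typically shrink these school means toward zero rather than report them as they are.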
http://www.leg.state.nv.us/session/76th2011/exhibits/assembly/ed/aed1013c.pdf
Comparing the One-Step and Two-Step VAMs
• The key difference is that the two-step VAM partials out variation in
test scores attributable to student and school characteristics before
estimating the school effects.
• Specific example: Suppose that high-poverty schools really are of
lower quality (causally).
– In the one-step VAM, the model identifies poverty effects (F/R lunch)
using within-school variation in student poverty status, so it can separately
identify differences in the quality of the schools serving high- and
low-poverty students.
– In the two-step VAM, the first step attributes any and all systematic
performance differences between high- and low-poverty students to the
first-step variables (i.e., it purges them from the residuals), including
systematic differences in school quality.
• The implication is that, in the two-step model output, each school is
compared only with similarly circumstanced schools (high-poverty with
high-poverty, low-poverty with low-poverty), not with dissimilar
schools.
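The contrast between the two estimators can be illustrated with a small simulation. This is a minimal sketch, not the authors' specification: the data-generating process, the parameter values, and all function names are hypothetical, there is one lagged score and one poverty flag, every school has the same number of students, and the one-step model omits the school-level covariate because it is collinear with the school indicators in a single cross-section. True school quality is built to fall with the poverty share, mirroring the "specific example" above.

```python
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def simulate(n_schools=100, n_students=40, seed=0):
    """Hypothetical data: tuples of (school, lagged score, poverty flag, score)."""
    rng = random.Random(seed)
    data = []
    for s in range(n_schools):
        share = rng.uniform(0.05, 0.95)                     # school poverty share
        quality = -0.5 * (share - 0.5) + rng.gauss(0, 0.1)  # falls with poverty
        for _ in range(n_students):
            poor = 1.0 if rng.random() < share else 0.0
            lag = -0.5 * poor + rng.gauss(0, 1)
            score = 0.7 * lag - 0.2 * poor + quality + rng.gauss(0, 0.5)
            data.append((s, lag, poor, score))
    return data

def school_means(data, values):
    """Average of `values` within each school."""
    sums, counts = {}, {}
    for (s, *_), v in zip(data, values):
        sums[s] = sums.get(s, 0.0) + v
        counts[s] = counts.get(s, 0) + 1
    return {s: sums[s] / counts[s] for s in sums}

def one_step(data):
    """School fixed effects estimated jointly with the student covariates."""
    lag_m = school_means(data, [d[1] for d in data])
    poor_m = school_means(data, [d[2] for d in data])
    y_m = school_means(data, [d[3] for d in data])
    # The within-school (demeaned) regression gives the same slopes as school dummies.
    rows = [[d[1] - lag_m[d[0]], d[2] - poor_m[d[0]]] for d in data]
    y = [d[3] - y_m[d[0]] for d in data]
    b_lag, b_poor = ols(rows, y)
    return {s: y_m[s] - b_lag * lag_m[s] - b_poor * poor_m[s] for s in y_m}

def two_step(data):
    """Step 1: regress out student and school covariates. Step 2: school mean residuals."""
    poor_m = school_means(data, [d[2] for d in data])
    rows = [[1.0, d[1], d[2], poor_m[d[0]]] for d in data]
    y = [d[3] for d in data]
    b = ols(rows, y)
    resid = [v - sum(c * x for c, x in zip(b, r)) for r, v in zip(rows, y)]
    return school_means(data, resid)

def pearson(u, v):
    """Pearson correlation between two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5

data = simulate()
share = school_means(data, [d[2] for d in data])  # observed poverty share by school
est_one, est_two = one_step(data), two_step(data)
ids = sorted(share)
r_one = pearson([est_one[s] for s in ids], [share[s] for s in ids])
r_two = pearson([est_two[s] for s in ids], [share[s] for s in ids])
print(f"one-step r = {r_one:.2f}, two-step r = {r_two:.2f}")
```

In this sketch the one-step estimates remain negatively correlated with the school poverty share, while the two-step estimates are uncorrelated with it by construction: first-step residuals are orthogonal to every first-step regressor, including the school poverty share. With unequal school sizes the same property would hold for the enrollment-weighted correlation rather than the unweighted one.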
[Figure: Missouri Schools, Median SGPs; r = -0.37]
[Figure: Missouri Schools, one-step fixed effects VAM; r = -0.25]
[Figure: Missouri Schools, two-step fixed effects VAM; r = -0.03]
Implications
Table 1. Correlations in School-Level Estimates Across Models.
                          SGP     One-step fixed effects   Two-step fixed effects
SGP                       1.00    --                       --
One-step fixed effects    0.82    1.00                     --
Two-step fixed effects    0.85    0.84                     1.00
Table 3. Average Share of Students Eligible for Free/Reduced-Price Lunch in Non-Overlapping Top-Quartile Schools Across Models.
Top-Quartile: SGP
Top-Quartile: One-step FE
Top-Quartile: Two-step FE
Outside of Top Quartile: SGP
Outside of Top Quartile: One-step FE
Outside of Top Quartile: Two-step FE
-52.4
47.7
-60.5
32.8
69.7
29.2
--
Note: See text for a description of “non-overlapping top-quartile schools.”
Sample Average Free/Reduced-Price Lunch Share: 48.2
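The comparison behind Table 3 can be sketched in a few lines. The definition used here is an assumption, since the slides defer to the paper's text: a school is "non-overlapping" for a pair of models if it ranks in the top quartile under one model but not under the other. The function names, school labels, and all numbers below are made up for illustration.

```python
def top_quartile(estimates):
    """Set of schools ranked in the top 25% by a model's estimates."""
    ranked = sorted(estimates, key=estimates.get, reverse=True)
    return set(ranked[: len(ranked) // 4])

def non_overlap_mean(est_in, est_out, frl):
    """Mean F/R lunch share of schools in the top quartile under est_in
    but outside the top quartile under est_out (None if no such school)."""
    only = top_quartile(est_in) - top_quartile(est_out)
    return sum(frl[s] for s in only) / len(only) if only else None

# Hypothetical estimates for eight schools under two models.
sgp = {"s1": 70, "s2": 65, "s3": 60, "s4": 55, "s5": 50, "s6": 45, "s7": 40, "s8": 35}
two_step = {"s1": 0.30, "s2": 0.05, "s3": 0.10, "s4": 0.00,
            "s5": -0.10, "s6": 0.25, "s7": -0.20, "s8": -0.05}
# Hypothetical F/R lunch shares (percent).
frl = {"s1": 40, "s2": 20, "s3": 30, "s4": 50, "s5": 60, "s6": 80, "s7": 70, "s8": 90}

sgp_not_two_step = non_overlap_mean(sgp, two_step, frl)
two_step_not_sgp = non_overlap_mean(two_step, sgp, frl)
print(sgp_not_two_step, two_step_not_sgp)
```

In this toy data, the school that only the SGP ranking places in the top quartile is low-poverty, and the school that only the two-step ranking places there is high-poverty.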
Objective #1: Elicit optimal educator effort
• Barlevy and Neal (2012) cover this issue extensively.
• There is also a large literature in economics, outside of the education-evaluation
context, that is very clear on how to design evaluation systems
when some competitors are at an inherent disadvantage (e.g., see Schotter
and Weigelt (1992), who study this issue in the context of affirmative action
policy).
• A central lesson from these studies is that the right signal must be sent to
agents in different circumstances to elicit optimal effort. This signal need not
be a direct measure of absolute productivity; instead, it should be an indicator
of performance relative to equally circumstanced peers.
• This is precisely what the proportional model does (based on observable
circumstances).
Objective #1: Elicit optimal educator effort
• Limitation: There is some evidence that the effort
response margin in education in the United States is
weak (Springer et al., 2010; but, on the other hand, see
Fryer et al., 2012).
Objective #2: Provide useful performance signals
• Conventional wisdom holds that growth-model output
does not help educational actors improve. Is this really true?
– Growth model output can:
• Encourage effective schools (districts/teachers) to continue to refine
and augment existing instructional strategies
• Serve as a point of departure for interventions/overhauls in ineffective
schools (districts/teachers)
• Facilitate productive educator-to-educator learning by pairing low- and high-performing schools (districts/teachers).
– The signaling value of an evaluation system is particularly
important when it is difficult for individual schools
(districts/teachers) to assess their performance, and the
performance of others, accurately.
Objective #2: Provide useful performance signals
• We argue that the most useful performance signals
come from the two-step “proportional” model.
• This is true even under the maintained assumption that
the one-step VAM produces causal estimates.
• A key reason is that the causal estimates from the one-step VAM
do not account for the counterfactual.
– Example: Disadvantaged schools face weaker educator labor markets
(Boyd et al., 2005; Jacob, 2007; Koedel et al., 2011; Reininger, 2012)
• Sparse models provide the least-useful performance
signals (not controversial: acknowledged in SGP
literature)
Example
What do we tell Rough Diamond and Gold Leaf? What do we tell other
schools about Rough Diamond and Gold Leaf?
Objective #3: Labor-market inequities
• The labor-market difficulties faced by disadvantaged schools
have been well-documented (Boyd et al., 2005; Jacob, 2007;
Koedel et al., 2011; Reininger, 2012).
• As stakes become attached to school rankings based on
growth models, systems that disproportionately identify poor
schools as “losers” will make positions at these schools even
less desirable to prospective educators.
Summary thus far…
• We identify three key objectives of an evaluation
system in education:
1. Elicit optimal effort from agents
2. Provide useful performance signals to educational
actors
3. Avoid exacerbating pre-existing inequities in the
labor markets faced by advantaged and
disadvantaged schools
• When one considers these key objectives, the
“proportionality” feature of the two-step model is
preferred on all three.
But what about…
• The fact remains that schools serving disadvantaged students
really do have lower test scores, and lower unconditional
growth, than schools serving advantaged students.
• There seems to be general concern that this information will
be hidden if we construct proportional growth models.
• Our view is that this concern is largely misguided.
Test-Score Levels
Concluding Remarks
• Growth models are quickly (very quickly) moving from the
research space to the policy space.
– The policy uses for growth models are not the same as the research
uses for growth models.
• Starting with the right question is important: “What are the
objectives of the evaluation system?”
• Beginning with this question, in our view, leads us to conclude
that a “proportional” growth model is best-suited for use in
educational evaluation programs for districts, schools and
teachers.