Summary of State Proposed Growth Models Undergoing Peer Review

Summary of State Proposed Growth Models Undergoing
Peer Review in 2005-2006
In November 2005, The United States Department of Education requested state proposals for
accountability models that incorporate measures of student growth. States were encouraged to
submit proposals to the Department for using growth models to demonstrate accountability under
the federal No Child Left Behind (NCLB) Act. States submitting proposals were required to
show how their growth-based accountability models satisfy NCLB alignment elements and
foundational elements explained in the November letter. Those elements included the following:
NCLB Alignment Elements
1. The accountability model must ensure that all students are proficient by 2013-14 and set
annual goals to ensure that the achievement gap is closing for all groups of students.
2. The accountability model must not set expectations for annual achievement based upon
student background and school characteristics.
3. The accountability model must hold schools accountable for student achievement in
reading/language arts and mathematics.
Foundational Elements
4. The accountability model must ensure that all students in the tested grades are included in
the assessment and accountability system. Schools and districts must be held accountable
for the performance of student subgroups. The accountability model includes all schools
and districts.
5. The State’s assessment system, the basis for the accountability model, must receive
approval through the NCLB peer review process for the 2005-06 school year. In
addition, the full NCLB assessment system in each of grades 3-8 and in high school in
reading/language arts and math must have been in place for two testing cycles.
6. The accountability model and related State data system must track student progress.
7. The accountability model must include student participation rates in the state assessment
system and student achievement on an additional academic indicator.
The Department received 20 proposals in February 2006. Seven (Hawaii, Maryland, Nevada,
New Hampshire, Ohio, Pennsylvania, and South Dakota) of the twenty proposals applied to
evaluate growth in the 2006-2007 school year, so they were not evaluated in the first round of
reviews. Of the 20 submissions, eight (Alaska, Arizona, Arkansas, Delaware, Florida, North
Carolina, Oregon, and Tennessee) were sent to be evaluated by a peer review group. The five
states that submitted but were not sent by USDE for peer review included Colorado, Indiana,
Iowa, South Carolina, and Utah. During the review of the eight proposals, the peer review group
identified additional elements on which they evaluated proposals. Based on their Crosswalks
paper, the peer review group determined that states should
1. incorporate available years of existing achievement data, instead of relying on only two
years of data,
2. align growth timeframe with school grade configuration and district enrollment,
3. make growth projections for all students, not just those below proficient, and
4. hold schools accountable for same subgroups as they did under status model.
They determined that states should not
11
1. use wide confidence intervals,
2. reset growth targets each year, and
3. average scores between proficient and non-proficient students.
The goal of this paper is to identify distinguishing practical and psychometric features of the
submitted proposals. Since states function within unique political contexts and have unique data
structures for their assessments, no one growth model best meets all states’ needs. By comparing
and contrasting growth model proposals on specific features, states looking for a growth model
can use this information to find a model that best fits their particular circumstances. States with
existing growth models might use this information to find similar models for which they can
compare their model.
The tables in Appendix A summarize the eight proposals that were reviewed by the committee
on practical and psychometric features.
Growth Model Features
Pilot Approved in 2006 – State growth model proposals approved by the USDE in 2006. These
states will be able to include growth measures in accountability decisions for the 2005-2006
school year and are expected to present data that show how the model works compared with the
current AYP model.
Resubmitting in 2007 – This field indicates if the state plans to resubmit their growth model
proposal in 2007.
Name of Growth Measure – For states that have named their measure of growth, the name is
listed.
All Students at Proficiency or On Track by 2013-2014 – This field indicates if states’ growth
model proposals will have all students at proficiency or on track to be proficient by 2013-2014.
Scores on Vertical Scale – This field indicates whether the state has a vertical or developmental
scale underlying its assessment.
Vertically Aligned Standards – This field indicates whether the state vertically aligned its
performance standards.
First Year All Grades Tested – The years listed in this field indicate the first time that the state
assessed students in the NCLB grades in reading and mathematics.
Includes Grade 3 Students – States that calculate growth for grade 3 students are noted in this
field.
Includes Students Without Prior Year Score – States that measure growth for students who are
missing the prior year test score are noted in this field. As examples, states may calculate growth
for students without the prior year score by using scores from more than one year prior, the mean
of the grade-level scores the prior year, or a pretest score to compute growth.
21
Includes SWD Taking Alternate Assessment – For states that have an alternate assessment for
students with disabilities, this field indicates whether the proposed growth model can be applied
to scores on that alternate assessment.
Includes ELL Taking Alternate Assessment – For states that have an alternate assessment for
English Language Learners, this field indicates whether the proposed growth model can be
applied to scores on that alternate assessment.
Grades for which Growth is Calculated – Some states calculate growth for grades 3-12, whereas
other states only calculate growth for a subset of grades. This field indicates the grades in which
states are calculating student growth.
Number of Years for Students to Reach Proficiency – States use different numbers of years for
students to reach proficiency. This field indicates the number of years that states use in their
growth-based accountability models.
Growth Tracked Only for Below Proficient Students – This field indicates if state growth
models are only applied to students who are below proficient.
Use Confidence Interval – This field indicates if the state will use a confidence interval in any
way in the growth-based accountability model. Some states propose using a confidence interval
around the percent of students reaching proficiency or meeting growth targets and others propose
using a confidence interval around students’ growth estimates.
Averaging of Calculations – This field identifies states that average calculations over more than
one year or over students within subgroups.
Incorporates Available Years of Achievement Data – This field indicates whether states use all
years of previous achievement data or if the state uses only two years of available data.
Growth Target Timeline Starts Over Each Year – This field indicates if states identify a starting
year for student growth calculations and define time to reach proficiency by this starting year or
whether states recalculate growth each year.
Growth Target Timeline Aligns with Grade Configuration – This field indicates if the number
of years in which students are expected to reach proficiency matches with the grade
configurations in the state. For example, if students below proficiency are expected to grow at a
speed so that they will reach proficiency in three years, does the state set the three years in a way
that maps the three years onto the grades that each school unit serves, such as the three middle
school grades.
Accounts for Students Falling Off Track – This field identifies states that use growth to identify
students who may be above proficiency but whose growth will likely result in their falling below
proficiency at a later date.
Minimum N Same as for AYP Status Model – This field identifies states that apply a minimum
sample size rule to the growth model. For example, many states designate a minimum N (or
sample size), such that the state does not include a subgroup in AYP calculations or growth
model calculations if that subgroup has fewer students than the minimum sample size.
31
Growth Applied after Status and Safe Harbor Provisions – This field indicates states that use
growth as an additional method for meeting AYP after applying the status model and the safe
harbor provisions.
Growth Reported at Individual Level – This field notes states that report growth on individual
student reports or at the individual student level.
STATE
Variable
Alaska
AK
Arizona
AZ
Arkansas
AR
Delaware
DE
Pilot approved in 2006
Resubmitting in 2007
Name of Growth Measure
No
No
No
No
N/A
N/A
N/A
Yes
Measure of Academic
Progress (MAP)
Yes
Yes
Yes, on target
only
No
Yes
2001-2002
No
No
Yes
Yes
2004-2005
No
No
Yes, Grades 3-8
Yes
2005-2006
No
No
No
Yes by changing
performance levels
No information found
For all but 1.7%
SWD
For all but 0.9%
LEP students
4-8
3-10
All students at proficiency or
on track by 2013-2014
Scores on vertical scale
Vertically aligned standards
First year all grades tested
Includes grade 3 students
Includes students without
prior year score
Includes SWD taking
alternate assessment
Includes ELL taking alternate
assessment
Grades for which growth
calculated
Number of years for students
to reach proficiency
Growth tracked only for below
proficient students
Uses confidence interval
No information
found
4-9; 10
3-8
Yes
2001-2002
Yes
No
Yes
Yes
4 grades 4-9; 3
grade 10
Yes
Min (3 years or Grade
8)
Yes
4
N/A
Yes
No
99%
99% around subgroup
percents
Yes for overall
AYP only,
98%, one-tailed
41
unknown level
Yes, up to 3 years
starting 2007-2008
No
Averaging of calculations
Yes, uniform
No
Incorporates available years
of achievement data
Growth target timeline starts
over each year
No
No
Yes
Yes
Yes
No
No, unless
discontinuously
enrolled
Yes
No
No
No
No
No
No
Yes, n=20 for
subgroups, n=40
for SWD or LEP
Yes, n=40 for
subgroups
Yes, n=40 for
subgroups
Yes
No, report growth for
n=10; need max
(90% of students in
status model or 40
students) for
subgroups
Yes
Yes
Yes
Yes
Yes
Yes
No
Growth target timeline aligns
with grade configuration
Accounts for students falling
off track
Minimum n same as for AYP
status model
Growth applied after status
and safe harbor provisions
Growth reported at individual
student level
Yes
No
STATE
Variable
Florida
FL
North Carolina
NC
Oregon
OR
Tennessee
TN
Pilot approved in 2006
Resubmitting in 2007
Name of Growth Measure
All students at proficiency or
on track by 2013-2014
Scores on vertical scale
Vertically aligned standards
First year all grades tested
No
Yes
No
Yes
N/A
Yes
ABCs Growth Model
No
N/A
Yes
N/A
Yes
Yes
Yes
2000-2001,
grades 3-10
No
Yes
Yes
Mid 1980s, new
edition in 2005-2006
Yes, using a pretest
score
Yes
Yes
2004-2005
Yes
Yes
1992 (3-8), 2001
(high school)
Yes, in terms of
status only
Includes students without
prior year score
No
Yes, if pretest score
on change scale
Includes SWD taking
alternate assessment
No
Yes, in terms of status
only
Includes ELL taking alternate
assessment
No
Grades for which growth
calculated
Number of years defining on
3-10
Yes, if meet Full
Academic Year and
have previous scores
on change scale
3-8
3 years
4 years
Includes grade 3 students
51
Yes, in terms of
current year
scores
Yes, in terms of
current year
scores
Yes, in terms of
status changes
only
Yes, in terms of
status changes
only
Yes, in terms of
status only
Yes, in terms of
status only
Unclear
3-8, 10
3-8
4 years
3 years
track
Growth tracked only for below
proficient students
Uses confidence interval
No
No
No
No
No
No
Yes, 95% CI
Averaging of calculations
No
Yes, across students
Incorporates available years
of achievement data
Growth target timeline starts
over each year
Growth target timeline aligns
with grade configuration
Accounts for students falling
off track
Minimum n same as for AYP
status model
Yes
No
No, includes previous
or 2 previous scores
No, reset only when
move to new LEA
No
Yes, around
school average
growth
Yes, across
students
No, planned for
future
Yes
No
Yes
No
Yes
Possibly
Yes
Yes, n=30 or
15% of tested
population
Yes
Yes, n=40, yet growth
calculated when n <
40
Yes
No, n=21 scores
over 2 years for
growth
Yes
Yes, greatest of
(n=45 or 1%)
Yes
Not to parents
Yes
Yes
Growth applied after status
and safe harbor provisions
Growth reported at individual
student level
Yes
No
Yes
No
Yes
Although there are a large number of possible ways to measure growth and design accountability
systems, there are a limited number of methods that underlie those possibilities. The next section
describes six model types and eight characteristics that differentiate the model types. A table is
provided to summarize the information.
Model Descriptions
Improvement: The change between different groups of students is measured from one year to
the next. For example, the percent of fourth graders meeting standard in 2005 maybe compared
to the percent of fourth graders meeting standard in 2006. This is the only growth model
described here that does not track individual student's growth. The current NCLB “safe harbor”
provision is an example of Improvement.
Difference Gain Scores: This is a straightforward method of calculating growth. A student's
score at a starting point is subtracted from the same student's score at an ending point. The
difference or gain is the measure of an individual's growth. The difference scores can be
aggregated to the school or district level to obtain a group growth measure. Growth relative to
performance standards can be measured by determining the difference between a student's
current score and the score that would meet standard in a set number of years (usually one to
three). Dividing the difference by the number of years gives the annual gain needed. A student's
actual gain can be compared to the target growth to see if the student is on track to meet
standard.
Residual Gain Scores: In this model, students' current scores are adjusted by their prior scores
using simple linear regression. Each student has a predicted score based on their prior score(s).
The difference between predicted and actual scores is the residual gain score and it is an
indication of the student's growth compared with others in the group. Residual gains near zero
indicate average growth, positive scores indicate greater than average growth and negative sores
indicate less than average growth. Residual gain scores can be averaged to obtain a group growth
61
measure. Residual gain scores can be more reliable than difference gain scores, but they are not
as easily integrated with performance standards in accountability systems such as NCLB because
they focus on relative gain.
Linear Equating: Equating methods set the first two or four moments of the distributions of
consecutive years equal. A student’s growth is defined as the student’s score in Year 2 minus the
student’s predicted score for Year 2. A student’s predicted score for Year 2 is the score in the
distribution at Year 2 that corresponds to the student’s Year 1 score. The linear equating method
results in a function that can be applied year to year. If the student’s score is above the expected
score, the student is considered to have grown. If the student’s score is below the expected
(predicted) score, the student is considered to have regressed. Expected growth is defined as
maintaining location in the distribution year to year.
Transition Matrix: This model tracks students’ growth at the performance standard level. A
transition matrix is set up with the performance levels (e.g., Does not meet, Meets, Exceeds) for
a given year as rows and the performance levels for a later year as columns. Each cell indicates
the number or percent of students that moved from year 1 levels to year 2 levels. The diagonal
cells indicate students that stayed at the same level, cells below the diagonal show the students
that went down one or more levels and the cells above the diagonal show the students that moved
to higher performance levels. Transition matrices an be combined to show the progress of
students across all tested grades. Transition matrices are a clear presentation of a school's success
(or lack thereof) in getting all students to meet standard.
Multi-level: This model simultaneously estimates student-level and group-level (e.g. school or
district) growth. There is evidence that multi-level models can be more accurate than difference
or residual gain score models. However, even though the statistics have been around for many
years, only recently has the computing power, software and expertise been widely available.
Therefore the results of this model appear to be more complex because the methods are still
unfamiliar to many people.
Characteristics of Growth Models
Database of matched student records over time (Student ID)- Most methods of measuring
growth require analysis of individual student's results from two or more years. This means that
student records from two different test administrations have to be combined or matched. Until
recently, most systems lacked a student ID system that assigned each student a unique
identification number that is recorded with any test that student takes as long as he or she is in
the system. Without such an ID number, record matching must be based on some combination of
name, birthdate or other demographic information. Because of changes in that information over
time, combining students' test records is usually time consuming and prone to non-matches and
mis-matches.
The preferred solution is to develop a student ID system in which the ID number is part of the
students' records system wide. This usually means integrating the ID into each school's student
information system and maintaining a central database to assign and report the ID numbers.
These changes require a significant investment or resources to develop and implement the new
procedures. However, in the long run there should be a reduction in the work needed to match
student records and an improvement in the quality of the information available.
Requires common scale- Some growth methods require student scores to be reported on a
common scale. Ideally this would mean that all the tests were written with measuring growth in
mind and based on content standards that are aligned across grades. However it is possible to
create a common scale for existing tests that were designed separately across grades. There are
71
technical issues and controversies about how to do this equating. Psychometric advice from
experts should be sought before determining that a set of tests can be combined for measuring
growth.
Confidence Interval- A confidence interval (CI) is used to take into account the uncertainty in
measuring growth. Sources for uncertainty include the normal measurement error of the test and
sampling error. There are well established statistical techniques for estimating uncertainty and
growth models use different techniques due to the differences in the way growth is calculated.
Implementing a confidence interval is not simply a matter of applying a statistical technique. A
decision must be made about the width of the confidence interval. A typical narrow CI is 68%
(or 1 standard error) while a wider CI would be 95% or 99%. If the confidence interval is
implemented around the target for growth, choosing a wider instead of a narrow CI will decrease
the chances of incorrectly identifying a student or school as failing to meet the growth target.
However, choosing a wide CI also increases the chances of incorrectly stating that adequate
growth has been made when in fact it hasn't. Choosing the width of the CI always involves a
compromise between those two types of errors. The policy-maker must weigh the consequences
of each type of error and choose a CI that best serves the intended purpose of implementing a
growth model.
Includes students with missing scores- Student mobility is a potential problem in any model of
growth that measures student achievement over time. If large numbers of students (i.e., more
than 15%) do not stay in the same school long enough to take the test each time it is
administered, then the sample of students whose scores are included in the model may not
represent the whole school's enrollment. A problem would arise if the students with missing
scores showed significantly higher or lower performance on the test.
In the improvement model, all students' scores are included. However since individual students
are not tracked over time, it is possible that the differences in performance of students who are
moving in and out of the school contribute to the observed improvement. This could lead to
over- or under- estimation of the school's effectiveness. Multi-level models use all the students'
scores to estimate growth for both individuals and groups. However, students with only one
score are estimated to make growth that was the average their group.
A secondary problem with missing scores occurs when some groups have more missing scores
than other groups. In that case the lack of data may mean that growth estimates for those groups
are less reliable and may have to be excluded from reports. For all models, the effects of missing
scores on growth estimates can be determined and should be examined.
Includes Results From Alternate Tests- Since some models require measurements on a common
scale, if alternative tests (e.g., for students with disabilities, English language learners, or high
school end-of-course tests) do not produce scores on that scale, it may not be possible to include
those students in the growth calculations. The Transition Matrix model is based on student
progress as indicated by changes in the performance levels attained by students. If common
performance levels have been set across different tests, the results can be combined. However,
meaningful results depend on the assumption that the performance standard were set such that it
is reasonable to assume that the performance levels on both tests indicate that students have the
same knowledge and skills.
Growth Question Answered- Growth models may be distinguished by the questions they answer.
Determining the question you want to answer by using a growth model will make it easier to
choose a growth model and to interpret the results of that model.
Student Performance Standards Explicitly Included in Definition of Growth- For two growth
models (Linear Equating and Transition Matrix ), the performance standard is built into the
81
model. Therefore there is no need to so through a separate process to set standards for adequate
growth after the estimate of student growth are obtained. For the other models, users often
conduct a standard setting process similar to the ones used to determine the individual
performance standards for students at each grade level.
Handles Non-linear Growth- Some growth models assume that each student's growth in achievement
follows a straight line. This is generally a reasonable assumption. However, there is evidence that
growth over many years is curved with elementary grade achievement growing at a greater rate than
high school achievement. If growth is measured more frequently than once a year, there may
differences in the rate of growth at different times. If you believe that students' growth is nonlinear, it
maybe necessary to choose a growth model that can statistically model that type of growth.Table of
Growth Model Characteristics
Improvement
Difference Residual
Linear
Gain Scores Gain Scores Equating
Transition
Matrix
Multi-level
Data Requirements
Database of matched
student records over
time (Student ID)
N
Y
Y
Y
Y
Y
Requires common scale
N
Y
N
N
N
Y
Independent
Groups
t-Test
Model Error
Variance
Model Error
Variance
NA
Model Error
Variance
Includes students with
missing scores
Y
N
N
N
N
Y
Includes Results From
Alternate Tests
(Different scales)
N
N
N
N
Y
N
Psychometric Issues
Confidence Interval
Growth Question
Answered
Did this year's Is the gain for
students do
a group higher
better than last or lower than
year's students?
average?
How much
growth was
produced by a
group?
Did students
stay at the
same
percentile?
Are students in
a group
How much of a
making
group's growth
adequate
is the result of
progress across group-level
performance
effects?
levels?
Student Performance
Standards Explicitly
Included in Definition
of Growth
Y
N
N
N
Y
N
Handles non-linear
growth
N
N
Y
N
Y
Y
91