
Setting Benchmarks for Early Grade Reading
October 1, 2015
Agenda
8:30 a.m. – 9:00 a.m.
Registration and Coffee
9:00 a.m. – 9:10 a.m.
Welcome
Barbara N. Turner, URC, President
Penelope Bender, USAID, Goal 1 Lead/ Senior Education Advisor
Facilitators
Simon King, RTI, Research Statistician
Joe DeStefano, RTI, Director of the Policy, Systems, and Governance Program
9:10 a.m. – 9:30 a.m.
Introduction: Importance of Reading in the Early Grades
Defining and Discussing Benchmarks
9:30 a.m. – 10:15 a.m.
Evidence and Process: The Science and Common Sense
of Setting Benchmarks
10:15 a.m. – 11:00 a.m.
Guidelines for and Examples of Benchmarking Work
11:00 a.m. – 11:30 a.m.
Break
11:30 a.m. – 1:15 p.m.
Applying Benchmarking and Target Setting
Working Lunch: Case Study Exercises
1:15 p.m. – 2:00 p.m.
Conclusion, Questions and Answers
Meet the Facilitators
Simon King is a research statistician with interests in survey methodology and
analysis. Mr. King has assisted with survey design and analysis on many of RTI’s
international education projects. Mr. King has developed data visualization,
analysis, and monitoring capabilities for international education projects using
geographical information systems (GIS). Mr. King oversees the data management
and analysis for all of RTI’s EdData II task orders and other projects involving
EGRA/EGMA data collection. Prior to working for RTI, Mr. King was involved in
K–12 education, notably as a principal of a charter school and as a volunteer
teacher working for the British Department for International Development (DFID)
in rural Zambia.
Joe DeStefano is the Director of the Policy, Systems, and Governance Program
in the International Education Division of RTI International. His 30 years of
experience include the full range of K–12 education issues, from teaching and
learning, to teacher professional development, to school-community relations,
finance, and policy and system reform. He has provided technical assistance
and support to urban school districts in the United States and to ministries of
education throughout the developing world. Mr. DeStefano has also conducted
extensive research on the topics of early grade reading and math; school
effectiveness; community-based approaches to education; education reform;
teacher supply and demand; and education finance, governance and
management. He grew up in the Bronx, New York, and received an EdM from
the Harvard Graduate School of Education.
EdData II
Education Data for Decision Making
Supporting the Development of
Reading Performance
Benchmarks
October 1, 2015
Prepared by Joseph DeStefano and Simon King
RTI International, Research Triangle Park, North Carolina, USA
About the Presentation
• This presentation was prepared for the Early Grade Reading
Professional Development Series in Chevy Chase, Maryland, on
October 1, 2015. The webinar and workshop were organized by
RTI International and URC, LLC, for participants in the Global
Reading Network.
• The USAID EdData II project is led by RTI International.
“Measurement and Research Support to Education Strategy
Goal 1” is EdData II RTI Task Order Number 20, Activity 7, AID-OAA-12-BC-00003.
• The process for developing benchmarks is based on the
experience of the EdData II project, Task Order 20, supporting the
establishment of benchmarks in Egypt, Ghana, Jordan, Liberia,
Malawi, Pakistan, the Philippines, Tanzania, and Zambia.
Overview of Workshop Objectives
By the end of today’s workshop, participants will have:
• A sound understanding of what data and processes are needed to work with a country to set benchmarks
• Guidelines for conducting benchmarking work
• Recommendations for conducting benchmarking based on lessons learned and current best practice
Why is reading important in early grades?
This … is … how … most … grade 3 kids … in … Africa … read.
This is how most grade 3 kids in rich countries (OECD) read.
Without learning to read well, students cannot easily read to learn.
UNDERSTANDING BENCHMARKS
What are benchmarks?
A benchmark is a standard or point of reference
against which things may be compared or assessed.
• Benchmark errors in a factory:
  – not more than 1 defect per 1,000 items
• Benchmark performance of athletes:
  – 11 seconds to run 100 meters
• Benchmark skills (e.g., reading) to evaluate student progress:
  – 80% comprehension of text
Why create benchmarks for reading?
• Establish expectations or norms for reading performance (especially in mother tongue languages). EGRA does not provide norms, but it generates data you can use to define norms.
• Use benchmarks to give specificity to the curriculum and create clearer expectations.
• Establish objectives against which to gauge progress: translate ultimate goals into manageable measures of performance at specific points.
• Assist teachers, principals, and school supervisors, enabling them to target help where needed.
• Create means to communicate publicly about improvement, e.g.:
  – School report cards
  – National-level monitoring and reporting
Important Distinctions and Definitions
• A goal is a long-term aspiration, possibly without a numerical value.
  Example: All children will be independent readers by grade 3.
• A metric is a valid, reliable unit of measurement.
  Example: Correct words per minute (cwpm) reading connected text.
• A benchmark is a milestone on the way to meeting the goal, expressed in the metric.
  Example: 45 cwpm reading a passage of grade-level text.
• A target states what share of students should meet the benchmark by when.
  Example: 50% of students to meet the benchmark in X years.
PROCESS FOR SETTING BENCHMARKS
Process for Setting Benchmarks: Science or Art?
• Both
• Common sense / science + statistics + wisdom, drawing on:
  – Curriculum objectives
  – Data on early grade reading performance
  – On-the-ground knowledge of what’s happening
  – Insights from science
  – Experience in other places
Setting Benchmarks: Reading Science
• What is fluency?
Fluency is the ability to read text accurately, quickly, and with
smoothness and expression (NICHD, 2000).
• Dimensions of fluency:
• Accuracy in word decoding
• Automaticity (automatic processing)
• Prosody (reading with smoothness,
phrasing, and expression)
The Importance of Reading Fluency
Fluency is the bridge that connects word decoding to
comprehension (Rasinski, 2004).
• Fluency begins before students can read continuous text.
• Automaticity of letters, letters and sounds, segmentation of phonemes, and decoding are initial steps.
• For pupils to attain fluency, their word recognition needs to be accurate and needs to occur at a reasonable rate.
Fluent vs. Non-Fluent Readers
Readers who are fluent:
• Recognize words automatically
• Group words quickly to help them gain meaning from what they read
• Read aloud effortlessly and with expression
• Sound natural as they read, as if speaking
Readers who have not yet developed fluency:
• Read slowly
• Have choppy phrasing
• May read word by word
• Focus their attention on figuring out (decoding) words
• Focus little attention on comprehension
Insights from Cognitive Science
The message must pass quickly through a very narrow opening: short-term memory can hold only about 7 items for roughly 12 seconds on the way to long-term memory.
Reading Skills Development and EGRA Subtasks
What being a good reader requires → Some of what EGRA measures:
• Reading text well enough to understand it → Reading comprehension; oral reading fluency
• Being able to read familiar words → Familiar-word fluency
• Being able to decode unfamiliar words → Nonword reading/decoding
• Knowing letters and letter sounds → Phonological/phonemic awareness; letter sounds, syllable reading, dictation; letter names
• Knowing enough language to be able to understand things → Listening comprehension
Insights from Cognitive Science
• A sentence of about 7 words read in
about 12 seconds gives roughly:
• One word per 1–1.5 seconds
• 40–60 words per minute
• And it must be done automatically, without effort
Experience from Other Countries: Oral Reading Fluency
Measured actual levels at end of grade 1:
Germany 58 cwpm; Spain 43; Holland 38
Country, grade, language | ORF benchmark (cwpm) | % meeting ORF benchmark | Comprehension benchmark (% correct) | % meeting comprehension benchmark
Tanzania, Gr. 2, Kiswahili | 50 | 12% | 80% | 8%
Philippines, Gr. 2, Ilokano | 40 | 29% | 80% | 35%
Malawi, Gr. 3, Chichewa | 50 | 6% | 80% | 6%
Egypt, Gr. 3, Arabic | 60 | 4% | 80% | 9%
Examples from USA
End-of-Year Benchmarks from a Widely Used US Approach [1]
(benchmarks for medium-risk students; low-risk benchmarks are much higher)
Subtask | Gr. 1 | Gr. 2 | Gr. 3 | Notes
Letter or syllable sound fluency (clspm) | 40–45 | – | – | Often not measured later than grade 1; assumed to be mastered
Nonword fluency (cnwpm) | 40 | 50 | – | Assumed mastered at approx. 50 in grade 2; not checked later
Oral reading fluency (ORF) | 30 | 80 | 95 | This keeps increasing for all grades
[1] Summarized/adapted from various sources, such as: University of Oregon Center on Teaching and Learning, DIBELS 6th edition benchmark goals; AIMSweb® Growth Tables; and EasyCBM progress monitoring test results.
Setting Benchmarks:
Common Sense, Instinct, and Wisdom
• Draw on the experience of practitioners, coaches,
teachers, experts (their instinct as to what is possible)
• These practitioners, coaches, etc., should:
– Be familiar with the field and related data
– Have experience in coaching teachers using the metrics
(for example, using the metric “correct words per minute”)
Use instinct or wisdom
combined with analysis of data―
i.e., not just anyone with any opinion!
Summary of the Benchmarking Process
[Figure: four linked 0–100 scales showing how benchmarks cascade across skill areas: comprehension (% correct), oral reading fluency (cwpm), decoding (cnwpm), and letter or syllable sounds (clspm).]
QUICK REFRESHER ON SCATTER PLOTS
Quick Refresher: How Do Scatter Plots Work?
• Each dot is a student (or school or any
“unit”)
• The dot tells you: For a given level of
fluency, what was the comprehension?
• The line tells you the overall trend,
determined by looking at the dots all
together and seeing how “most students”
trend
• If dots are close to the line, trend is
“stronger”; if they are further from the line
(more spread out), the trend is weaker
• Scatter plots are used to denote a “relationship” or “association.”
• Indicator: the correlation coefficient, or “r,” which can range from -1 through 0 to 1.
[Scatter plot: comprehension questions answered (0–6) vs. oral reading fluency (0–80 correct words per minute), with a trend line.]
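As a quick illustration, the following minimal Python sketch shows how such a scatter plot, trend line, and correlation coefficient can be produced. The pupil scores below are hypothetical values, not actual EGRA data.

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical pupil-level scores
orf = np.array([0, 5, 12, 20, 28, 35, 44, 52, 60, 75], dtype=float)   # cwpm
comp = np.array([0, 0, 1, 1, 2, 3, 4, 4, 5, 6], dtype=float)          # questions correct (0-6)

slope, intercept = np.polyfit(orf, comp, 1)  # least-squares trend line
r = np.corrcoef(orf, comp)[0, 1]             # correlation coefficient "r"

plt.scatter(orf, comp)                  # each dot is a pupil
plt.plot(orf, slope * orf + intercept)  # the overall trend
plt.xlabel("Oral Reading Fluency (correct words per minute)")
plt.ylabel("Comprehension questions answered")
plt.title(f"r = {r:.2f}")
plt.show()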
Interpreting a Scatter Plot
• Dots below the line: comprehension is below expectation given the fluency.
• Dots above the line: comprehension is above expectation given the fluency.
What can we say about the level of comprehension compared with what we would expect for dots above the line? For dots below the line?
[Scatter plot: comprehension questions answered (0–6) vs. oral reading fluency (0–80 correct words per minute), with dots above and below the trend line.]
Scatter Plots and Weighted Data
• The majority of EGRA survey designs use “weighted” data.
• In other words, each pupil in the sample has an associated weight, i.e., the number of pupils in the population that one pupil in the sample represents.
Question: Which one of the graphs below contains “weighted” data? Obviously!? There is actually no easy way to show weighted data in a scatter plot!
Tip: Use a bubble plot, where the size of the bubble indicates the pupil-level weight.
Graphs with histograms along the borders are statistically more accurate. But which scatter plot best demonstrates the associations and weights for benchmarking workshop participants?
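A minimal sketch of the bubble-plot tip, assuming hypothetical pupil-level scores and weights (matplotlib’s s= argument scales the marker area):

import numpy as np
import matplotlib.pyplot as plt

orf = np.array([5, 15, 30, 45, 60, 75], dtype=float)          # cwpm (hypothetical)
comp = np.array([0, 1, 2, 4, 5, 6], dtype=float)              # questions correct (hypothetical)
weights = np.array([420, 310, 150, 80, 40, 15], dtype=float)  # pupils represented by each sampled pupil

plt.scatter(orf, comp, s=weights)  # bubble size reflects the pupil-level weight
plt.xlabel("Oral Reading Fluency (cwpm)")
plt.ylabel("Comprehension questions answered")
plt.show()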
ESTABLISHING A BENCHMARK AND SETTING A TARGET
Step-by-Step Instructions
Step 1. Start with reading comprehension.
Step 2. Determine what you think is the appropriate level of comprehension
students should be achieving (100%, 80%, 60%, … of correct responses).
Step 3. Consider the present levels of average performance―for example:
national data, intervention data, means with and without zeroes.
Step 4. When you have agreed on a benchmark value for reading
comprehension (% correct), use the distribution table to see how many students
from your data set were meeting that benchmark level of performance.
Step 5. Discuss what you think is an appropriate near-term target for the
percentage of students who should be meeting the comprehension benchmark
in five years.
Step-by-Step Instructions – continued
Step 6. After completing the exercise for reading comprehension, move on
to oral reading fluency (ORF).
Step 7. Start by examining the relationship between oral reading fluency
and comprehension and identify the range of ORF scores that correspond to
the benchmark for comprehension that you chose.
Step 8. Decide where in that range the ORF benchmark should fall (in other
words, choose a specific value that is within the range).
Step 9. Use the cumulative distribution graph to see how many students
met the ORF benchmark.
Benchmarking Example: Ghana 2013, Grade 2 (English)
OBJECTIVES:
• Set a benchmark for ORF and reading comprehension.
• Find the percentage of pupils achieving this
benchmark.
• Project the target percentage of pupils achieving this
benchmark in 5 years.
USAID Partnership for Education: Ghana Testing, task order under Education Data for
Decision Making (EdData II), 2012–2016, baseline assessment
Box Plot for ORF by Reading Comprehension,
Ghana 2013, Grade 2 (English)
Table Corresponding to Box Plot for ORF by
Reading Comprehension, Ghana 2013, Grade 2
Oral reading fluency: number summary and mean
Percent correct, reading comprehension | Mean | Minimum | 25th percentile | 50th percentile | 75th percentile | Maximum | Sample count
0% | 5 | 0 | 0 | 3 | 6 | 92 | 7,060
20% | 29 | 2 | 14 | 25 | 41 | 97 | 339
40% | 45 | 4 | 33 | 42 | 53 | 124 | 232
60% | 55 | 3 | 41 | 53 | 67 | 120 | 148
80% | 73 | 29 | 57 | 69 | 88 | 138 | 89
100% | 80 | 33 | 63 | 74 | 101 | 138 | 47
Distribution of Reading Comprehension
Ghana 2013, Grade 2
Reading comprehension (% correct) | Percent | Count (no. of pupils)
Zero | 89% | 7,068
20% | 4% | 339
40% | 3% | 232
60% | 2% | 148
80% | 1% | 89
100% | 1% | 47
Levels of ORF Corresponding to 80% Reading Comprehension
Ghana 2013, Grade 2
ORF (cwpm) | % (wt) | Sample n
Zero | 51% | 4,148
1–<10 | 25% | 1,918
10–<20 | 10% | 777
20–<30 | 4% | 317
30–<40 | 2% | 202
40–<50 | 2% | 173
50–<60 | 2% | 147
60–<70 | 1% | 76
70–<80 | 1% | 50
80–<90 | 1% | 45
90–<100 | 0% | 26
100–<110 | 0% | 20
110–<120 | 0% | 9
120–<130 | 0% | 4
130–<140 | 0% | 3
[Graph markers: 25th, 50th, and 75th percentiles.]
98% of pupils scored less than the benchmark of 70 cwpm. In 2013, 100% − 98% = 2% of pupils met the target of 70 cwpm.
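That 2% figure can be read straight off the binned table. A small sketch, using the bin lower bounds and the rounded weighted percentages from the table above:

bins = [0, 1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130]  # ORF bin lower bounds
pct = [51, 25, 10, 4, 2, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0]                 # % (wt), rounded

benchmark = 70
met = sum(p for lo, p in zip(bins, pct) if lo >= benchmark)
print(f"About {met}% of pupils met the {benchmark} cwpm benchmark")  # ~2%, matching the note above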
Documenting Benchmarks
Reading fluency benchmark and percentages
of pupils meeting benchmark
Ghana 2013, Grade 2
Subtask | Grade 2 benchmark | % of pupils presently meeting the suggested benchmark (2013) | Target % of pupils to meet the benchmark in 2014 | Target % of pupils to meet the benchmark in 5 years
Reading comprehension (% correct) | 80% correct | 1% | 3% | 20%
Oral reading fluency (cwpm) | 70 | 2% | 5% | 20%
When more data are available:
• Setting the target for the percentage of students meeting the benchmark can be challenging when we lack data to work with.
• We can use intervention data, or data from multiple grades, to set better targets.
Practice:
Indonesia (PRIORITAS) 2013, Grade 3 (Bahasa Indonesia)
OBJECTIVES:
• Set a benchmark for ORF.
• Find the percentage of pupils achieving
this benchmark.
• Project the target percentage of pupils
achieving this benchmark over the
next 5 years.
Prioritizing Reform, Innovation, and Opportunities for Reaching Indonesia's Teachers,
Administrators, and Students (PRIORITAS), 2012–2017, baseline assessment
Box Plot for ORF by Reading Comprehension,
Indonesia (PRIORITAS) 2013, Grade 3 (Bahasa Indonesia)
Table Corresponding to Box Plot for ORF by
Reading Comprehension, Indonesia 2013, Grade 3
Oral reading fluency: number summary and mean
% correct, reading comprehension | Mean | Minimum | 25th percentile | 50th percentile | 75th percentile | Maximum | Sample count
0% | 13.9 | 0 | 0 | 5 | 19 | 94 | 315
20% | 41.4 | 1 | 21 | 35 | 58 | 127 | 288
40% | 61.4 | 9 | 37 | 56 | 83 | 183 | 382
60% | 76.3 | 12 | 57 | 74 | 94 | 166 | 771
80% | 81.9 | 29 | 63 | 81 | 99 | 166 | 992
100% | 89.3 | 33 | 71 | 92 | 105 | 174 | 823
Cumulative Percent ORF
Indonesia (PRIORITAS), 2013, Grade 3
Documenting Benchmarks
Reading fluency benchmark and percentages of
pupils meeting benchmark, Indonesia 2013, grade 3
Subtask | Suggested benchmark | % of pupils meeting the benchmark in 2013
Oral reading fluency (cwpm) | ___ | ___
Adding Data from an Intervention Research Study
• Find the percentage of pupils who achieved the benchmark for the control and the intervention schools.
• Find the difference between these two percentages; this becomes your potential increase in the percentage of pupils achieving the proposed benchmark.
• Use this difference to project the percentage of pupils meeting the benchmark for the next 5 years, as in the sketch below.
Note: Be aware of how many years the intervention has been in place. Measurement usually takes place after 1 or 2 years; the potential growth is over that period of time (Indonesia PRIORITAS was evaluated after 1 year).
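One simple way to apply that logic in code, using hypothetical numbers (a 34% control-school baseline and the 6-point intervention-minus-control difference discussed in the example that follows), assuming the gain repeats each year:

baseline = 34.0        # % of pupils meeting the benchmark at baseline (hypothetical)
delta_per_cycle = 6.0  # intervention-minus-control gain, measured after 1 year (hypothetical)

# Project 5 years out, capping at 100%.
targets = {2013 + yr: min(100.0, baseline + delta_per_cycle * yr) for yr in range(1, 6)}
print(targets)  # {2014: 40.0, 2015: 46.0, 2016: 52.0, 2017: 58.0, 2018: 64.0}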
Cumulative Percent, ORF
Indonesia (PRIORITAS), 2013, Grade 3
For example, using 85 cwpm as a benchmark: In 2013, 100% − 60% = 40% of intervention-school pupils met the benchmark, and 100% − 66% = 34% of control-school pupils met the benchmark of 85 cwpm. We could therefore expect a 40% − 34% = 6 percentage point increase in the percentage of pupils meeting the benchmark.
Blue = intervention schools; red = control schools.
Documenting Benchmarks
Benchmark and targets for ORF
Indonesia (PRIORITAS), Grade 3
Subtask | Suggested benchmark | % of pupils meeting the benchmark in 2013 | Targets: projected % of pupils meeting the benchmark in 2014 | 2015 | 2016 | 2017 | 2018
Oral reading fluency (cwpm) | ___ | ___ | ___ | ___ | ___ | ___ | ___
Another Potentially Useful Benchmark:
Improvement at the Low End of the Distribution
In this case:
• The goal could be “reducing the percentage of
students who are struggling the most to develop
reading skills.”
• The objective would be the % of students scoring zero
that you would try to “move down to” from the present
level…
• …Or, what you consider an acceptable level of “zero
scores” for the grade (standard) and skill area under
consideration.
Example:
Ghana 2013, Grade 2 (English)
OBJECTIVES:
• Find the percentage of pupils scoring
zero (or similar) in 2013.
• Find the percentage of pupils achieving
this level.
• Project the target percentage of pupils
achieving this level over the next 5
years.
Cumulative Distribution for ORF
Ghana 2013, Grade 2
ORF (cwpm) | % (wt) | Sample n
Zero | 51% | 4,148
1–<10 | 25% | 1,918
10–<20 | 10% | 777
20–<30 | 4% | 317
30–<40 | 2% | 202
40–<50 | 2% | 173
50–<60 | 2% | 147
60–<70 | 1% | 76
70–<80 | 1% | 50
80–<90 | 1% | 45
90–<100 | 0% | 26
100–<110 | 0% | 20
110–<120 | 0% | 9
120–<130 | 0% | 4
130–<140 | 0% | 3
[Graph markers: 25th, 50th, and 75th percentiles.]
Documenting Benchmarks
Reading fluency zero scores (or similar)
and percentage of pupils meeting level,
Ghana 2013, Grade 2
Subtask | Grade 2 % scoring zero | Target % of pupils scoring zero in 2014 | Target % of pupils scoring zero in 5 years
Oral reading fluency – zero scores | 51% | ___ | ___
Practice:
Liberia 2013, Grade 2 (English)
OBJECTIVES:
• Find the percentage of pupils scoring zero (or similar) in 2013.
• Find the percentage of pupils achieving this level.
• Project the target percentage of pupils achieving this level over the next 5 years.
Liberia Teacher Training Program (LTTP), 2010–2015, midterm assessment
Cumulative Percent, ORF, Liberia 2013, Grade 2
Documenting Benchmarks
Reading fluency zero scores (or similar) and
percentage of pupils meeting level
Liberia 2013, Grade 2
Subtask | Grade 2 % scoring zero | Targets: % of pupils scoring zero in 1 year | 2 years | 3 years | 4 years | 5 years
Oral reading fluency – zero scores | ___ | ___ | ___ | ___ | ___ | ___
Intermediate Benchmarks
Adding an intermediate benchmark is useful if you want to separate pupils who are not yet proficient from pupils who are nonreaders, placing them in two distinct categories or classifications.
Tools for Setting Intermediate Benchmarks
1. Graph for setting intermediate levels of
reading performance showing ranges of
oral reading scores organized by level of
reading comprehension score
2. Data—Cumulative distribution of
“percentages of students scoring at
different levels” of performance
3. Table to record your results
4. Worksheet to record the justifications for
your benchmarks
Defining Four Levels of Reading Proficiency
• Fluently with full comprehension
• With increasing fluency and comprehension
• Slowly with limited comprehension
• Nonreader
In step 1, you just defined the benchmark for reading fluently with full comprehension. In step 2, you will define the other levels of reading ability. Nonreaders are children scoring zero on the oral reading subtask.
Think of the Levels of Reading Proficiency Like a Scale
Example: [Scale from 0 to 55 cwpm: Nonreader (0); Slowly with limited comprehension; With increasing fluency and comprehension; Fluently with full comprehension.]
By setting the intermediate benchmark, we are defining ranges for each level of performance.
Example:
Ghana 2013, Grade 2 (English)
OBJECTIVES:
• Set a benchmark for ORF.
• Set a nonreader benchmark for ORF (zero or similar).
• Set an intermediate benchmark for ORF.
• Find the percentage of pupils achieving these individual categories.
Box Plot for ORF by Reading Comprehension, Ghana 2013, Grade 2 (English)
We set the benchmark at 70 cwpm.
Table Corresponding to Box Plots for ORF by
Reading Comprehension, Ghana 2013, Grade 2
Oral reading fluency: number summary and mean
Percent correct, reading comprehension | Mean | Minimum | 25th percentile | 50th percentile | 75th percentile | Maximum | Sample count
0% | 5 | 0 | 0 | 3 | 6 | 92 | 7,060
20% | 29 | 2 | 14 | 25 | 41 | 97 | 339
40% | 45 | 4 | 33 | 42 | 53 | 124 | 232
60% | 55 | 3 | 41 | 53 | 67 | 120 | 148
80% | 73 | 29 | 57 | 69 | 88 | 138 | 89
100% | 80 | 33 | 63 | 74 | 101 | 138 | 47
Cumulative Percent, ORF, Ghana 2013, Grade 2
Nonreader: 50%.
Slowly with limited comprehension: 92% − 50% = 42%.
With increasing fluency and comprehension: 98% − 92% = 6%.
Fluently with full comprehension: 100% − 98% = 2%.
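The arithmetic behind those category shares is simple subtraction along the cumulative distribution, e.g.:

# Cumulative % of pupils at or below each cut point (from the Ghana graph above)
cum_at_0, cum_at_39, cum_at_69, cum_max = 50, 92, 98, 100

nonreader = cum_at_0                    # 50%
slow_limited = cum_at_39 - cum_at_0     # 42%
increasing = cum_at_69 - cum_at_39      # 6%
fluent_full = cum_max - cum_at_69       # 2%
print(nonreader, slow_limited, increasing, fluent_full)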
Documenting Benchmarks
Benchmarks for Ghana 2013
Grade 2 (English)
Subtask: Oral reading fluency
Level | Benchmark (cwpm) | % meeting benchmark
Reading fluently with full comprehension | >=70 | 2%
Reading with increasing fluency and comprehension | 40–69 | 6%
Reading slowly with limited comprehension | 0–39 | 42%
Nonreader | 0 | 50%
Practice:
Indonesia (PRIORITAS) 2013, Grade 3 (Bahasa Indonesia)
OBJECTIVES:
• Set a benchmark for ORF.
• Set a nonreader benchmark for ORF (zero or similar).
• Set an intermediate benchmark for ORF.
• Find the percentage of pupils achieving these individual categories.
• Set targets for 2014.
Box Plot for ORF by Reading Comprehension
Indonesia (PRIORITAS) 2013, Grade 3 (Bahasa Indonesia)
Table Corresponding to Box Plot for ORF by Reading
Comprehension, Indonesia 2013, Grade 3
Oral reading fluency: number summary and mean
% correct, reading comprehension | Mean | Minimum | 25th percentile | 50th percentile | 75th percentile | Maximum | Sample count
0% | 13.9 | 0 | 0 | 5 | 19 | 94 | 315
20% | 41.4 | 1 | 21 | 35 | 58 | 127 | 288
40% | 61.4 | 9 | 37 | 56 | 83 | 183 | 382
60% | 76.3 | 12 | 57 | 74 | 94 | 166 | 771
80% | 81.9 | 29 | 63 | 81 | 99 | 166 | 992
100% | 89.3 | 33 | 71 | 92 | 105 | 174 | 823
Cumulative Percent, ORF
Indonesia 2013, Grade
Red = Control
Schools
Blue = Treatment
Schools
Documenting Benchmarks
Subtask: Oral reading fluency (cwpm)
Level | Benchmark range | % meeting benchmark | Target % 2014
Reading fluently with full comprehension | ___ | ___ | ___
Reading with increasing fluency and comprehension | ___ | ___ | ___
Reading slowly with limited comprehension | ___ | ___ | ___
Nonreader | ___ | ___ | ___
Benchmarking Case Studies
Instructions:
1) In your packet of materials, find the case study your
group has been assigned.
2) Read the objectives for your case study.
3) Complete the “Desired Outcome” table using the tables
and graphs you have been provided.
References
National Institute of Child Health and Human Development (NICHD) [US]. (2000). Report of the
National Reading Panel. Teaching children to read: An evidence-based assessment of the
scientific research literature on reading and its implications for reading instruction: Reports of
the subgroups (NIH Publication No. 00-4754). Washington, DC: NICHD.
http://www.nichd.nih.gov/publications/pubs/nrp/documents/report.pdf
Rasinski, T. V. (2004). Assessing reading fluency. Prepared for the U.S. Department of Education
under the Regional Educational Laboratory program, Award No. ED01CO0014. Honolulu, HI:
Pacific Resources for Education and Learning. http://files.eric.ed.gov/fulltext/ED483166.pdf
USAID Partnership for Education: Ghana Testing, task order under Education Data for Decision
Making (EdData II), 2012–2016, baseline assessment
Prioritizing Reform, Innovation, and Opportunities for Reaching Indonesia's Teachers,
Administrators, and Students (PRIORITAS), 2012–2017, baseline assessment
Benchmarking Definitions and Distinctions (as used in the webinar)
Goal: Long-term objective. Example: the goal of all children being able to read grade-level material with comprehension by the end of grade 3.
Benchmark: A milestone used to evaluate progress toward attaining the long-term goal; a desired level of performance for students in a specific skill area. Example: decoding nonwords at a rate of 23 correct words per minute (cwpm).
Performance Levels: Another way to establish benchmarks, with different performance levels corresponding to different specific levels of achievement in a skill area. For example, a “proficient” reader may be a student who reaches an oral reading fluency (ORF) level of 45 cwpm. An “emergent” reader may be a student who has an ORF rate of at least 20 cwpm, and up to 45 cwpm. A “struggling” reader may be a student who scores below 20 cwpm, and a “nonreader” may be a student who scores zero.
Metric: A valid, reliable unit of measure. Benchmarks are expressed in terms of a specific metric, such as the number of correct words per minute a child can read orally.
Target: The percentage of students who would be able to meet the benchmark (or performance level) by a given period of time. For example, at baseline, perhaps 10% of students meet the benchmark for ORF. In two years’ time, the target may be double that percentage, or 20% of students meeting the benchmark.
Case Study 1
Malawi, Grade 4, Chichewa (2012)
Objective:
Set benchmarks in the following skill areas:
• Reading comprehension
• ORF
• Non-Words
• Familiar Words
• Syllable Sounds
• Letter Names
• Listening Comprehension
Desired Outcomes – Grade 4 Benchmarks by Subtask:
Subtask | Benchmark
Reading Comprehension (% correct) | ___
ORF (cwpm) | ___
Non-Words (cnonwpm) | ___
Familiar Words (cwpm) | ___
Syllable Sounds (csspm) | ___
Letter Names (clpm) | ___
Listening Comprehension (% correct) | ___
Box Plots- ORF by Reading Comprehension:
Table- ORF by Reading Comprehension:
Cumulative Percentiles by ORF:
Correlations between Subtasks:
 | Reading Comp. (% correct) | ORF | Non-Words | Familiar Words | Syllable Sounds | Letter Names | Listening Comp. (% correct)
Reading Comp. (% correct) | 1
ORF | 0.867 | 1
Non-Words | 0.805 | 0.908 | 1
Familiar Words | 0.833 | 0.948 | 0.941 | 1
Syllable Sounds | 0.815 | 0.906 | 0.915 | 0.926 | 1
Letter Names | 0.706 | 0.769 | 0.765 | 0.794 | 0.816 | 1
Listening Comp. (% correct) | 0.321 | 0.313 | 0.311 | 0.317 | 0.321 | 0.360 | 1
Scatter Plot- Matrix of Subtasks:
Scatter Plot- ORF vs. Non-Words:
Scatter Plot- ORF vs. Familiar Words:
Scatter Plot- ORF vs. Letter Sounds:
Scatter Plot- ORF vs. Letter Names:
Scatter Plot- ORF vs. Listening Comprehension:
Average fluency scores on subtasks that correspond to given ranges of ORF:
ORF range | Non-Words | Familiar Words | Syllable Sounds | Letter Names | Listening Comp. | # of pupils in sample
Zero | 0.7 | 1.3 | 4.3 | 9.8 | 2.0 | 534
1–<5 | 3.4 | 5.1 | 10.8 | 19.5 | 2.2 | 81
5–<10 | 8.1 | 10.6 | 19.5 | 23.8 | 2.9 | 113
10–<15 | 10.3 | 15.3 | 23.4 | 29.3 | 2.6 | 81
15–<20 | 12.8 | 20.6 | 33.9 | 32.7 | 2.7 | 123
20–<25 | 15.8 | 25.2 | 38.6 | 34.2 | 2.7 | 182
25–<30 | 19.4 | 29.6 | 44.8 | 41.6 | 3.0 | 198
30–<35 | 26.8 | 35.9 | 51.3 | 40.9 | 2.9 | 110
35–<40 | 25.9 | 38.5 | 58.2 | 51.8 | 3.0 | 154
40–<45 | 26.7 | 40.8 | 57.5 | 52.3 | 3.1 | 113
45–<50 | 31.9 | 48.9 | 65.9 | 57.6 | 3.3 | 32
50–<55 | 44.5 | 50.8 | 75.6 | 50.0 | 3.6 | 21
55–<60 | 33.5 | 47.3 | 68.0 | 55.8 | 3.2 | 35
60–<65 | 37.2 | 53.1 | 86.2 | 65.9 | 4.6 | 16
65–<70 | 32.0 | 47.7 | 61.0 | 59.6 | 1.4 | 2
70–<75 | 33.9 | 62.9 | 69.0 | 39.6 | 3.3 | 2
75–<80 | 41.4 | 72.2 | 73.1 | 40.0 | 1.4 | 2
80–<85 | 45.5 | 70.0 | 85.0 | 63.5 | 4.5 | 2
85–<90 | 68.0 | 110.8 | 46.0 | 29.0 | 1.0 | 1
Case Study 2
Philippines, Grade 2, Ilokano (2014)
Objective:
• Set an ORF benchmark for grade 2 using the information provided.
• Set a second benchmark to create two intermediate levels.
• Use these benchmarks to set categories of performance levels and find the percentage of pupils in those learning categories for 2014 and 2015.
• Finally, create a target percentage of pupils in these categories for 2016.
Desired Outcomes – Percentage of Pupils by Learning Categories:
Category | Benchmarks for each category | Baseline: % of pupils meeting this benchmark in 2014 | % of pupils meeting this benchmark in 2015 | Target: % of pupils meeting this benchmark in 2016
Non-reader | ORF of zero | | |
Emergent reader | ORF greater than zero, and less than ___ | | |
Basic reader | ORF greater than ___, and less than ___ | | |
Proficient reader | ORF greater than ___ | | |
Box Plots- ORF by Reading Comprehension:
Table- ORF by Reading Comprehension:
# correct (comprehension questions) | Mean ORF | 25th percentile | 50th percentile | 75th percentile | Count (achieving # correct)
Zero | 6.1 | 0 | 3 | 10 | 109
1 | 25.7 | 13 | 26 | 34 | 35
2 | 35.0 | 25 | 35 | 42 | 62
3 | 38.4 | 30 | 36 | 44 | 67
4 | 41.6 | 35 | 41 | 46 | 66
5 | 54.3 | 46 | 51 | 63 | 60
Cumulative Percentile by ORF:
Red = 2014
Blue = 2015
Case Study 3
Jordan, Grade 2, Arabic (2014)
Objective:
Set targets for the percentage of pupils who will meet the
reading comprehension, non-words, and ORF benchmarks in
1, 2, 3, 4, and 5 years using intervention data.
Desired Outcomes – Benchmarks and Targets by Subtask:
Subtask | Benchmark | % of pupils meeting benchmark in 2014 | Targets: projected % of pupils meeting benchmark in 2015 | 2016 | 2017 | 2018 | 2019
Reading Comp. (% correct) | ___ | ___ | ___ | ___ | ___ | ___ | ___
ORF (cwpm) | ___ | ___ | ___ | ___ | ___ | ___ | ___
Non-Words (cnonwpm) | ___ | ___ | ___ | ___ | ___ | ___ | ___
Percentage of Pupils Achieving Reading Comprehension (% correct):
Reading Comp. (% correct) | Treatment 2012: Percent | Count (# pupils) | Treatment 2014: Percent | Count (# pupils)
Zero | 27% | 237 | 19% | 185
20% | 23% | 161 | 17% | 138
40% | 24% | 175 | 18% | 141
60% | 12% | 85 | 22% | 150
80% | 7% | 54 | 13% | 91
100% | 6% | 49 | 11% | 101
Box Plots- ORF by Reading Comprehension:
Table- ORF by Reading Comprehension:
% correct, reading comprehension | Mean | 25th percentile | 50th percentile | 75th percentile | Sample count
0% | 5.9 | 0 | 2 | 9 | 1,163
20% | 18.2 | 11 | 17 | 23 | 614
40% | 23.7 | 17 | 22 | 27 | 552
60% | 37.1 | 27 | 33 | 43 | 246
80% | 46.3 | 37 | 45 | 56 | 114
100% | 50.3 | 45 | 48 | 54 | 88
Cumulative Percentile by ORF for Intervention:
Red = 2012
Blue = 2014
Scatter Plot- ORF vs. Non-Words: [Oral reading fluency (0–120 cwpm) vs. correct nonwords per minute (0–60).]
Cumulative Distribution of ORF vs. Non-words for Intervention Schools:
Red = 2012 Intervention Schools
Blue = 2014 Intervention Schools
Case Study 4
Malawi, Grade 3, Chichewa (2012)
Objective:
• Use the results from grade 2 and grade 4 reading comprehension and ORF.
• Set reading comprehension and ORF benchmarks and targets in grade 3 for the next 1, 2, 3, 4, and 5 years.
Desired Outcomes – Benchmarks and Targets by Subtask:
Subtask | Benchmark | % of pupils meeting benchmark in 2012 | Targets: projected % of pupils meeting benchmark in 2013 | 2014 | 2015 | 2016 | 2017
Reading Comprehension (% correct) | ___ | ___ | ___ | ___ | ___ | ___ | ___
ORF | ___ | ___ | ___ | ___ | ___ | ___ | ___
Percentage of Pupils Achieving Reading Comprehension (% correct):
Reading Comp. (% correct) | Grade 2: Percent | Count (# pupils) | Grade 4: Percent | Count (# pupils)
Zero | 94.4% | 2,926 | 50.5% | 712
20% | 3.4% | 187 | 13.7% | 267
40% | 1.6% | 143 | 17.3% | 355
60% | 0.5% | 79 | 12.4% | 309
80% | 0.1% | 23 | 5.4% | 170
100% | 0.0% | 2 | 0.7% | 26
Box Plots- Reading Comprehension and Grade:
Tables- ORF by Reading Comprehension and Grade:
Cumulative Percentile of ORF for Grade 2:
Red = 2012 Intervention Schools
Blue = 2014 Intervention Schools
Cumulative Percentile of ORF for Grade 4:
Red = 2010
Green = 2011
Blue = 2012
Some Basics for Leading Benchmarking Work Using Data
from the Early Grade Reading Assessment
1. The Process
Benchmarking should rely on actual data on student performance in specific reading skill areas. The
underlying relationships between the reading skill areas—in terms of both the research on how students
learn to read in alphabetic languages, and the statistical relationships that have consistently been
demonstrated across scores of EGRA applications—are what make it possible to use EGRA data to set
benchmarks.
Step 1: Begin by discussing the level of reading comprehension that is acceptable as demonstrating full understanding of a given text. Most countries have settled on 80% or higher (4 or more correct responses out of 5 questions) as the desirable level of comprehension.
Step 2: Given a reading comprehension benchmark,
EGRA data are used to show the range of oral reading
fluency (ORF) scores—measured in correct words per
minute (cwpm)—obtained by students able to achieve
the desired level of comprehension. Discussion then is
needed to determine the value within that range that
should be put forward as the benchmark. Alternatively,
a range can indicate the levels of skill development that
are acceptable as “proficient” or meeting a grade-level
standard (for example, 40 to 50 cwpm).
[Diagram: the benchmarking cascade. Comprehension (20%–100% correct) → oral reading fluency range 35 to 50 cwpm, benchmark 45 cwpm → decoding, 30 cnonwpm → syllable reading, 50 csspm.]
Step 3: With an ORF benchmark defined, the
relationship between ORF and decoding (nonword reading) makes it possible to identify the average
rate of nonword reading that corresponds to the given level of ORF.
Step 4: The process then proceeds in the same manner for each subsequent skill area.
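Step 3 can be illustrated with a short sketch: fit a best-fit line to ORF vs. nonword decoding and read off the decoding rate that corresponds to the ORF benchmark. The paired scores here are hypothetical, chosen so the line lands near the documented 45 cwpm → ~30 cnonwpm example:

import numpy as np

orf = np.array([10, 20, 30, 40, 50, 60, 70], dtype=float)      # cwpm (hypothetical)
nonwords = np.array([8, 14, 21, 27, 33, 40, 46], dtype=float)  # cnonwpm (hypothetical)

slope, intercept = np.polyfit(orf, nonwords, 1)  # best-fit line
orf_benchmark = 45
nonword_rate = slope * orf_benchmark + intercept
print(f"ORF of {orf_benchmark} cwpm corresponds to roughly {nonword_rate:.0f} cnonwpm")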
Some tips regarding this process:
• A minimum, yet still adequate, approach to benchmarking would include two skill areas: reading comprehension and oral reading fluency.
• Going beyond those two to develop benchmarks for other skill areas can be useful, especially in countries where all of the EGRA-measured skills are poorly developed (so that progress can be detected in students’ development of more basic skills).
• Syllable reading (especially when syllables are important components of words, such as in Bantu languages) is a good skill area to include.
• If syllable reading was not tested, letter sound recognition, not letter naming, should be used. An exception would be in a language like Bahasa Indonesia, which is totally transparent, and in which letter names and sounds are therefore essentially the same.
2. The Data
A good benchmarking exercise is quite data intensive. In fact, one of the added benefits of doing this
exercise in a country is that the participants get to engage the EGRA results in a much deeper way than
they normally would, leading to a richer understanding of what is happening in the country in terms of
skill development.
The data needed to do benchmarking include:
• A table (like the one to the right) showing the range of reading fluency scores obtained by students achieving each level of reading comprehension. This makes it possible for participants to complete steps 1 and 2 in the benchmarking process. A graphic way to depict this same information is a set of “box and whisker” plots showing the distribution of ORF scores for each level of reading comprehension.
• A table that shows the average scores on each other subtask that correspond to different levels of oral reading fluency (as shown here) is what enables participants to connect the ORF benchmark to desirable levels of performance in other skill areas. A graphic way to show this same information is to use a scatter plot (below), for example, of ORF x nonword decoding, with the best-fit line drawn in so that workshop participants can match a given level of ORF to the average corresponding level of nonword decoding.
• For determining the percentage of students meeting the benchmark (in the year for which the EGRA data are available), a cumulative distribution graph or table makes it possible for participants to “look up” the percentage of students, for example, achieving 45 cwpm or higher.
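With pupil-level data in hand, the first and third of these data tools can be produced in a few lines. A sketch assuming a hypothetical pandas data frame of EGRA scores:

import pandas as pd

df = pd.DataFrame({
    "comp_pct": [0, 20, 40, 60, 80, 80, 100, 100],  # hypothetical comprehension scores
    "orf":      [2, 18, 35, 48, 52, 61, 65, 80],    # hypothetical ORF scores (cwpm)
})

# Tool 1: ORF summary (mean, quartiles, min/max) by comprehension level.
summary = df.groupby("comp_pct")["orf"].describe()

# Tool 3: "look up" the share of pupils achieving 45 cwpm or higher.
share_meeting = (df["orf"] >= 45).mean()
print(summary)
print(f"{share_meeting:.0%} of pupils read 45 cwpm or better")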
3. Performance Levels
Some countries are interested in establishing performance points that capture stages of skill development that are below the desired level of achievement defined by the benchmark. For example, the
benchmark for reading fluency may be defined as 50 cwpm, representing students who are reading
fluently and with full (or almost full) comprehension. Students who score zero are those who are not
reading. In between zero and 50 cwpm exist different levels of reading ability that in fact may
correspond to stages of literacy acquisition. Setting multiple performance levels makes it possible to
determine what percentages of children are at each of those stages of development of their reading skill.
[Scale: Nonreader (0 cwpm); Slowly with limited comprehension (above 0 to 20); With increasing fluency and comprehension (above 20 to 50); Fluently with full comprehension (50 and above).]
As illustrated above, it is possible to create two other performance levels below the benchmark for
reading fluently with full comprehension set at 50 cwpm. Data describing how reading fluency and
comprehension scores are distributed (e.g., using a two-way distribution table) inform where to place
another level of reading achievement in between zero and 50 cwpm. Thus, two other performance
levels are created: students who score above zero and up to 20 cwpm are said to be reading slowly with
limited comprehension, and those scoring above 20 and up to 50 cwpm can be said to be reading with
increasing fluency and comprehension. The performance levels in this example are from the
benchmarking work done in Ethiopia in early 2015. Such intermediate performance levels in other
contexts could, of course, be given other labels.
An alternative approach to setting performance levels (as was the case in Pakistan) would be to establish
a range of ORF scores that are defined as “meeting expectations”: 60 to 90 cwpm. Students scoring
above 90 cwpm would be considered to be “exceeding expectations.” Those scoring below 60 would be
“not meeting expectations.”
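Classifying pupils into such performance levels is straightforward once the cut points are chosen. A sketch using the Ethiopia-style cuts at 0, 20, and 50 cwpm; the pupil scores are hypothetical:

import pandas as pd

orf = pd.Series([0, 0, 8, 15, 25, 37, 52, 68])  # hypothetical pupil scores (cwpm)
levels = pd.cut(
    orf,
    bins=[-1, 0, 20, 50, float("inf")],  # (-1,0]=nonreader; (0,20]; (20,50]; (50,inf)
    labels=["Nonreader", "Slowly, limited comprehension",
            "Increasing fluency and comprehension", "Fluently, full comprehension"],
)
print(levels.value_counts(normalize=True))  # share of pupils at each level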
4. Moving Beyond Benchmarks to Targets
One of the main purposes of setting benchmarks is to establish the means to evaluate and measure
progress in improving reading outcomes. In fact, one of the more interesting challenges in working with
ministry colleagues to set benchmarks arises during discussions of the prospects for future improvement
in student performance relative to those benchmarks. To set targets for future improvement,
benchmarks can be used in the following way.
Once a benchmark has been set, say for oral reading fluency, it is useful to employ the existing data to determine the percentage of students presently meeting that benchmark. The challenge arises when assumptions have to be made about how things will improve, that is, to estimate the percentages of students who will meet the benchmark in future years (as illustrated here).
[Figure: ORF benchmark = 45 cwpm, with 10% of students meeting it at baseline. Which line describes the path of improvement? What % of students will meet the benchmark in Yr 1, Yr 2, Yr 3, …?]
If data are available from a reading intervention in the country, then the amount of improvement achieved by that program provides a useful starting point for estimating future targets.
If data from an intervention are not available, but EGRA results from more than one year are, then the prevailing pattern of change over time can be used to begin discussing how that pattern may evolve in future years.
If only one year of reading results is available, then the task is less data-driven and more a dialogue about
how much improvement can be expected. Data from other countries’ programs that have had
demonstrated impact could inform that dialogue. Additionally, if EGRA data from a given year are
available for two successive grades (say, grades 1 and 2), then the “intergrade” difference is a good
means for estimating how much improvement to expect.
The intergrade difference represents the amount of progress students make given an additional year in
school (under preexisting conditions). For example, a successful intervention could aim to improve
performance in grade 1 by as much as the intergrade difference between grades 1 and 2; or put
differently, to increase student performance by as much as an additional year of schooling.
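A sketch of that intergrade-difference logic, with hypothetical shares of pupils meeting the benchmark:

grade1_pct = 4.0   # % of grade 1 pupils meeting the benchmark (hypothetical)
grade2_pct = 11.0  # % of grade 2 pupils meeting the benchmark (hypothetical)

intergrade_diff = grade2_pct - grade1_pct  # gain from one additional year of schooling
target_grade1 = grade1_pct + intergrade_diff
print(f"Grade 1 target: {target_grade1:.0f}% (an extra 'year of schooling' of {intergrade_diff:.0f} points)")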
The value of setting targets is that if performance is initially low—say, very few students meeting the
benchmark—there often emerges a tendency to want to lower the benchmark (so performance does
not look as bad). It is better to have a benchmark that is genuinely meaningful in terms of the skill level
achieved (e.g., oral reading that is fluent enough to enable students to comprehend what they are
reading). Therefore, instead of lowering the benchmark, a compromise is to have modest targets for the
percentage of children expected to meet the benchmark moving forward. Examples of benchmarks and
targets from Jordan are shown in the table below.
Subtask | Benchmark | % of students meeting the benchmark, 2014 actual | % of students meeting the benchmark, 5-year target
Oral reading fluency | 46 cwpm | 7.5% | 35%
Nonword decoding | 23 cnonwpm | 5.3% | 31%
Participants estimated that the percentage of students meeting the benchmarks for these two skill areas
would increase from 7.5% to 35% and from 5.3% to 31% over the course of the next five years.
5. Some Things to Remember
Having facilitated benchmarking exercises in nine countries, the Education Data for Decision Making (EdData II) project team has learned some useful lessons, which are summarized here.
• Supply the data. The process requires a fair amount of data. Preparing the right data tools (graphs, tables, forms to be filled out) and carefully labeling those tools to correspond to the different steps in the process helps greatly with facilitating the running of a benchmarking workshop.
• Match the data to the task. A balance needs to be struck between too much and too little data. When a lot of data are available (from more than one year of EGRA, for multiple grades, for an intervention as well as from national surveys), be sure to have participants working only with the sets of data that correspond to the task at hand. Do not dump everything on them at once.
• Work across grade levels. When working with more than one grade, e.g., grades 1 through 3, it is best to work in each skill area across grades. For example, when setting a benchmark for ORF, set it for the highest grade for which data are available and then work to set the benchmarks for the other two grades based on that. Then move on to do the same in another skill area.
• Have multiple small groups work simultaneously. It is useful to have more than one group working in parallel with the data to set a benchmark. When groups arrive at different suggested benchmarks, the facilitated dialogue that ensues is usually quite fruitful. And that dialogue illustrates that even when everyone is using data, there is room for interpretation and negotiation about what constitutes a reasonable benchmark for a given country and language.
• Encourage discussion. Similarly, the discussion, and often debate, about what targets should be set for future improvement brings to the surface everyone’s assumptions about how the system is going to improve over time. For example, when looking at the results of pilot interventions in Malawi and Liberia as the bases for determining future targets, participants had a lively discussion about whether one could assume that the conditions created in a pilot (which led to the results) could be expected to be implemented on a national scale (and what it would take for the ministry and its partners to achieve that).
• Limit the number of benchmarks. There is often a tendency to want to set benchmarks for every skill area. Limiting the number of skill areas to no more than four is highly recommended: reading comprehension, oral reading fluency, and two others (nonword decoding and syllable reading or letter sound identification).
• Consider how to institutionalize the decisions. It is necessary to engage participants in determining how the benchmarks they develop could become official. Even if the benchmarks are not made official, they should be used to summarize reading performance the next time early grade reading is assessed. This was the case most recently in the Philippines where, even though the benchmarks were not officially adopted, comparison of the percentages of students meeting benchmarks in 2014 and 2015 helped the Department of Education evaluate the extent to which progress was being made.