Setting Benchmarks for Early Grade Reading
October 1, 2015

Agenda
8:30 a.m. – 9:00 a.m.  Registration and Coffee
9:00 a.m. – 9:10 a.m.  Welcome
  Barbara N. Turner, URC, President
  Penelope Bender, USAID, Goal 1 Lead / Senior Education Advisor
Facilitators
  Simon King, RTI, Research Statistician
  Joe DeStefano, RTI, Director of the Policy, Systems, and Governance Program
9:10 a.m. – 9:30 a.m.  Introduction: Importance of Reading in the Early Grades; Defining and Discussing Benchmarks
9:30 a.m. – 10:15 a.m.  Evidence and Process: The Science and Common Sense of Setting Benchmarks
10:15 a.m. – 11:00 a.m.  Guidelines for and Examples of Benchmarking Work
11:00 a.m. – 11:30 a.m.  Break
11:30 a.m. – 1:15 p.m.  Applying Benchmarking and Target Setting (Working Lunch: Case Study Exercises)
1:15 p.m. – 2:00 p.m.  Conclusion, Questions and Answers

Meet the Facilitators
Simon King is a research statistician with interests in survey methodology and analysis. He has assisted with survey design and analysis on many of RTI's international education projects and has developed data visualization, analysis, and monitoring capabilities for international education projects using geographic information systems (GIS). He oversees data management and analysis for all of RTI's EdData II task orders and other projects involving EGRA/EGMA data collection. Before joining RTI, Mr. King worked in K–12 education, notably as the principal of a charter school and as a volunteer teacher with the British Department for International Development (DFID) in rural Zambia.

Joe DeStefano is the Director of the Policy, Systems, and Governance Program in the International Education Division of RTI International. His 30 years of experience cover the full range of K–12 education issues, from teaching and learning, teacher professional development, and school-community relations to finance, policy, and system reform. He has provided technical assistance and support to urban school districts in the United States and to ministries of education throughout the developing world. Mr. DeStefano has also conducted extensive research on early grade reading and math; school effectiveness; community-based approaches to education; education reform; teacher supply and demand; and education finance, governance, and management. He grew up in the Bronx, New York, and received an EdM from the Harvard Graduate School of Education.

EdData II: Education Data for Decision Making
Supporting the Development of Reading Performance Benchmarks
October 1, 2015
Prepared by Joseph DeStefano and Simon King
RTI International, Research Triangle Park, North Carolina, USA

About the Presentation
• This presentation was prepared for the Early Grade Reading Professional Development Series in Chevy Chase, Maryland, on October 1, 2015. The webinar and workshop were organized by RTI International and URC, LLC, for participants in the Global Reading Network.
• The USAID EdData II project is led by RTI International. "Measurement and Research Support to Education Strategy Goal 1" is EdData II RTI Task Order Number 20, Activity 7, AIDOAA-12-BC-00003.
• The process for developing benchmarks described here is based on the experience of the EdData II project, Task Order 20, supporting the establishment of benchmarks in Egypt, Ghana, Jordan, Liberia, Malawi, Pakistan, the Philippines, Tanzania, and Zambia.
Overview of Workshop Objectives
By the end of today's workshop, participants will have:
• A sound understanding of what data and processes are needed to work with a country to set benchmarks
• Guidelines for conducting benchmarking work
• Recommendations for conducting benchmarking based on lessons learned and current best practice

Why Is Reading Important in the Early Grades?
This is how most grade 3 children in Africa read. This is how most grade 3 children in rich (OECD) countries read. Without learning to read well, students cannot easily read to learn.

UNDERSTANDING BENCHMARKS

What Are Benchmarks?
A benchmark is a standard or point of reference against which things may be compared or assessed.
• Benchmark errors in a factory: no more than 1 defect per 1,000 items
• Benchmark performance of athletes: 11 seconds to run 100 meters
• Benchmark skills (e.g., reading) to evaluate student progress: 80% comprehension of text

Why Create Benchmarks for Reading?
• Establish expectations or norms for reading performance (especially in mother-tongue languages). EGRA does not provide norms, but it generates data you can use to define them.
• Use benchmarks to give specificity to the curriculum and create clearer expectations.
• Establish objectives against which to gauge progress: translate ultimate goals into manageable measures of performance at specific points.
• Assist teachers, principals, and school supervisors, enabling them to target help where needed.
• Create a means to communicate publicly about improvement, e.g., school report cards or national-level monitoring and reporting.

Important Distinctions and Definitions
• A goal is a long-term aspiration, possibly without a numerical value: all children will be independent readers by grade 3.
• A metric is a valid, reliable unit of measurement: correct words per minute (cwpm) reading connected text.
• A benchmark is a milestone on the way to meeting that goal, expressed in the metric: 45 cwpm reading a passage of grade-level text.
• A target is a variable defined using the benchmark: 50% of students to meet the benchmark in X years.

PROCESS FOR SETTING BENCHMARKS

Process for Setting Benchmarks: Science or Art?
Both. Common sense and science, plus statistics, plus wisdom:
• Curriculum objectives
• Data on early grade reading performance
• On-the-ground knowledge of what's happening
• Insights from science
• Experience in other places

Setting Benchmarks: Reading Science
What is fluency? Fluency is the ability to read text accurately, quickly, and with smoothness and expression (NICHD, 2000). Dimensions of fluency:
• Accuracy in word decoding
• Automaticity (automatic processing)
• Prosody (reading with smoothness, phrasing, and expression)

The Importance of Reading Fluency
Fluency is the bridge that connects word decoding to comprehension (Rasinski, 2004).
• Fluency begins before students can read continuous text.
• Automaticity with letters, letter-sound correspondences, phoneme segmentation, and decoding are the initial steps.
• For pupils to attain fluency, their word recognition must be accurate and must occur at a reasonable rate.
Fluent vs. Non-Fluent Readers
Readers who are fluent:
• Recognize words automatically
• Group words quickly to help them gain meaning from what they read
• Read aloud effortlessly and with expression
• Sound natural as they read, as if speaking
Readers who have not yet developed fluency:
• Read slowly
• Have choppy phrasing
• May read word by word
• Focus their attention on figuring out (decoding) words
• Focus little attention on comprehension

Insights from Cognitive Science
The message must pass quickly through a very narrow opening: short-term memory holds only about 7 items for about 12 seconds on the way to long-term memory.

Reading Skills Development and EGRA Subtasks
What being a good reader requires, and some of what EGRA measures:
• Reading text well enough to understand it: reading comprehension; oral reading fluency
• Being able to read familiar words: familiar-word fluency
• Being able to decode unfamiliar words: nonword reading/decoding
• Knowing letters and letter sounds: phonological/phonemic awareness (letter sounds, syllable reading, dictation); letter names
• Knowing enough language to be able to understand things: listening comprehension

Insights from Cognitive Science
A sentence of about 7 words read in about 12 seconds gives roughly one word per 1–1.5 seconds, or 40–60 words per minute. And it must be done automatically, without effort.

Experience from Other Countries: Oral Reading Fluency
Measured actual levels at the end of grade 1: Germany, 58 cwpm; Spain, 43; Holland, 38.

Country, grade, language | ORF benchmark (cwpm) | % meeting ORF benchmark | Comprehension benchmark (% correct) | % meeting comprehension benchmark
Tanzania, Gr. 2, Kiswahili | 50 | 12% | 80% | 8%
Philippines, Gr. 2, Ilokano | 40 | 29% | 80% | 35%
Malawi, Gr. 3, Chichewa | 50 | 6% | 80% | 6%
Egypt, Gr. 3, Arabic | 60 | 4% | 80% | 9%

Examples from USA
End-of-year benchmarks from a widely used US approach (benchmarks for medium-risk students; low-risk benchmarks are much higher):

Subtask | Gr. 1 | Gr. 2 | Gr. 3 | Notes
Letter or syllable sound fluency (clspm) | 40–45 | – | – | Often not measured later than grade 1; assumed to be mastered
Nonword fluency (cnwpm) | 40 | 50 | – | Assumed mastered at approx. 50 in grade 2; not checked later
Oral reading fluency (ORF, cwpm) | 30 | 80 | 95 | Keeps increasing for all grades

Summarized/adapted from various sources, such as: University of Oregon Center on Teaching and Learning, DIBELS 6th edition benchmark goals; AIMSweb® Growth Tables; and EasyCBM progress monitoring test results.

Setting Benchmarks: Common Sense, Instinct, and Wisdom
• Draw on the experience of practitioners, coaches, teachers, and experts (their instinct as to what is possible).
• These practitioners, coaches, etc., should be familiar with the field and related data, and have experience coaching teachers using the metrics (for example, the metric "correct words per minute").
• Use instinct or wisdom combined with analysis of data; i.e., not just anyone with any opinion!

Summary of the Benchmarking Process
[Figure: linked scales showing how a comprehension benchmark (% correct) maps to ranges for oral reading fluency (cwpm), decoding (cnwpm), and letter or syllable sounds (clspm).]

QUICK REFRESHER ON SCATTER PLOTS

Quick Refresher: How Do Scatter Plots Work?
• Each dot is a student (or school, or any "unit").
• The dot tells you: for a given level of fluency, what was the comprehension?
• The line tells you the overall trend, determined by looking at all the dots together and seeing how "most students" trend.
• If dots are close to the line, the trend is stronger; if they are further from the line (more spread out), the trend is weaker.
• Scatter plots are used to denote a "relationship" or "association."
• Indicator: the correlation coefficient, or "r," which can range from -1 through 0 to 1. (A short code sketch follows.)
[Figure: scatter plot of comprehension questions answered (0–6) against oral reading fluency (correct words per minute), with trend line.]
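To make the refresher concrete, here is a minimal sketch in Python, assuming a small pupil-level dataset with hypothetical column names (orf, comprehension); neither the column names nor the use of Python appears in the workshop materials.

```python
import numpy as np
import pandas as pd

# Hypothetical pupil-level data: ORF in cwpm, comprehension questions answered (of 6).
df = pd.DataFrame({
    "orf":           [4, 12, 25, 33, 41, 48, 55, 63, 70, 78],
    "comprehension": [0, 1, 1, 2, 3, 3, 4, 4, 5, 5],
})

# Correlation coefficient r ranges from -1 through 0 to 1; values near 1 mean
# the dots sit close to an upward-sloping trend line.
r = df["orf"].corr(df["comprehension"])

# The trend line itself: a least-squares fit of comprehension on ORF.
slope, intercept = np.polyfit(df["orf"], df["comprehension"], deg=1)

print(f"r = {r:.2f}")
print(f"trend: comprehension is roughly {intercept:.2f} + {slope:.3f} per cwpm")
```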
Interpreting a Scatter Plot
• Dots below the line: comprehension is below expectation, given the fluency.
• Dots above the line: comprehension is above expectation, given the fluency.
Discussion: What can we say about the level of comprehension, compared with what we would expect, for dots above the line? For dots below the line?
[Figure: scatter plot of comprehension questions answered (0–6) against oral reading fluency (correct words per minute), with dots above and below the trend line.]

Scatter Plots and Weighted Data
• The majority of EGRA survey designs use "weighted" data. In other words, each pupil in the sample has an associated weight, i.e., the number of pupils in the population that one pupil in the sample represents.
• Question: Which one of the graphs below contains "weighted" data? Obviously!? There is actually no easy way to show weighted data in a scatter plot.
• Tip: Use a bubble plot, where the size of the bubble indicates the pupil-level weight. Scatter plots with histogram borders are statistically more accurate. But which scatter plot best demonstrates the associations and weights for benchmarking workshop participants? (One way to draw such a bubble plot is sketched below.)
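Following the tip above, this is one way a weighted bubble plot could be drawn; a sketch only, with illustrative column names (orf, comprehension, weight) and made-up values that are not from the workshop data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical weighted EGRA sample: each row is a pupil, and 'weight' is the
# number of pupils in the population that this sampled pupil represents.
df = pd.DataFrame({
    "orf":           [5, 15, 22, 35, 44, 52, 61, 73],
    "comprehension": [0, 1, 2, 2, 3, 4, 4, 5],
    "weight":        [900, 450, 300, 800, 250, 150, 120, 80],
})

fig, ax = plt.subplots()
# Marker area scales with the pupil-level weight, so heavily weighted pupils
# visibly "count for more" in the plot.
ax.scatter(df["orf"], df["comprehension"], s=df["weight"] * 0.2, alpha=0.5)
ax.set_xlabel("Oral Reading Fluency (correct words per minute)")
ax.set_ylabel("Comprehension questions answered (of 6)")
ax.set_title("Bubble plot: marker size = sampling weight")
plt.show()
```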
ESTABLISHING A BENCHMARK AND SETTING A TARGET

Step-by-Step Instructions
Step 1. Start with reading comprehension.
Step 2. Determine what you think is the appropriate level of comprehension students should be achieving (100%, 80%, 60%, ... of correct responses).
Step 3. Consider the present levels of average performance, for example: national data, intervention data, means with and without zeroes.
Step 4. When you have agreed on a benchmark value for reading comprehension (% correct), use the distribution table to see how many students from your data set were meeting that benchmark level of performance.
Step 5. Discuss what you think is an appropriate near-term target for the percentage of students who should be meeting the comprehension benchmark in five years.
Step 6. After completing the exercise for reading comprehension, move on to oral reading fluency (ORF).
Step 7. Start by examining the relationship between oral reading fluency and comprehension, and identify the range of ORF scores that corresponds to the benchmark you chose for comprehension.
Step 8. Decide where in that range the ORF benchmark should fall (in other words, choose a specific value within the range).
Step 9. Use the cumulative distribution graph to see how many students met the ORF benchmark.
(The procedure is sketched in code below.)
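As a rough illustration of Steps 1 through 9, the lookups can be scripted as follows. This is a sketch only: the data, column names, and the quantile shortcut are assumptions (the workshop uses box plot tables, and a fully weighted percentile calculation would be more precise).

```python
import pandas as pd

# Hypothetical pupil-level EGRA records with sampling weights.
df = pd.DataFrame({
    "comprehension_pct": [0, 0, 20, 40, 60, 80, 80, 100, 40, 0],  # % questions correct
    "orf":               [0, 3, 18, 36, 50, 68, 75, 90, 41, 5],   # cwpm
    "weight":            [10, 12, 8, 9, 7, 5, 4, 3, 9, 11],
})

# Steps 1-2: choose a comprehension benchmark (80% correct is the common choice).
comp_benchmark = 80

# Step 4: weighted share of pupils at or above the comprehension benchmark.
meets_comp = df["comprehension_pct"] >= comp_benchmark
pct_meeting_comp = df.loc[meets_comp, "weight"].sum() / df["weight"].sum()

# Step 7: the range of ORF scores among pupils reaching that comprehension level
# (the 25th-75th percentile band of the box plot; unweighted here for brevity).
orf_band = df.loc[meets_comp, "orf"].quantile([0.25, 0.75])

# Step 8: pick a specific benchmark within that band (a judgment call, not a formula).
orf_benchmark = 70

# Step 9: weighted share of pupils at or above the ORF benchmark.
meets_orf = df["orf"] >= orf_benchmark
pct_meeting_orf = df.loc[meets_orf, "weight"].sum() / df["weight"].sum()

print(f"{pct_meeting_comp:.0%} meet the comprehension benchmark")
print(f"ORF 25th-75th percentile band for those pupils: {orf_band.tolist()}")
print(f"{pct_meeting_orf:.0%} meet the ORF benchmark of {orf_benchmark} cwpm")
```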
Benchmarking Example: Ghana 2013, Grade 2 (English)
OBJECTIVES:
• Set a benchmark for ORF and reading comprehension.
• Find the percentage of pupils achieving this benchmark.
• Project the target percentage of pupils achieving this benchmark in 5 years.
Source: USAID Partnership for Education: Ghana Testing, task order under Education Data for Decision Making (EdData II), 2012–2016, baseline assessment.

Box Plot for ORF by Reading Comprehension, Ghana 2013, Grade 2 (English)

Table Corresponding to Box Plot for ORF by Reading Comprehension, Ghana 2013, Grade 2
Oral reading fluency: number summary and mean
Percent correct, reading comprehension | Mean | Minimum | 25th percentile | 50th percentile | 75th percentile | Maximum | Sample count
0% | 5 | 0 | 0 | 3 | 6 | 92 | 7,060
20% | 29 | 2 | 14 | 25 | 41 | 97 | 339
40% | 45 | 4 | 33 | 42 | 53 | 124 | 232
60% | 55 | 3 | 41 | 53 | 67 | 120 | 148
80% | 73 | 29 | 57 | 69 | 88 | 138 | 89
100% | 80 | 33 | 63 | 74 | 101 | 138 | 47

Distribution of Reading Comprehension, Ghana 2013, Grade 2
Reading comprehension (% correct) | Percent | Count (no. of pupils)
Zero | 89% | 7,068
20% | 4% | 339
40% | 3% | 232
60% | 2% | 148
80% | 1% | 89
100% | 1% | 47

Levels of ORF Corresponding to 80% Reading Comprehension, Ghana 2013, Grade 2
Cumulative distribution of ORF (the 25th, 50th, and 75th percentiles are marked on the original graph):
ORF (cwpm) | % (weighted) | Sample n
Zero | 51% | 4,148
1–<10 | 25% | 1,918
10–<20 | 10% | 777
20–<30 | 4% | 317
30–<40 | 2% | 202
40–<50 | 2% | 173
50–<60 | 2% | 147
60–<70 | 1% | 76
70–<80 | 1% | 50
80–<90 | 1% | 45
90–<100 | 0% | 26
100–<110 | 0% | 20
110–<120 | 0% | 9
120–<130 | 0% | 4
130–<140 | 0% | 3
98% of pupils scored less than the benchmark of 70 cwpm. In 2013, therefore, 100% − 98% = 2% of pupils met the benchmark of 70 cwpm.

Documenting Benchmarks
Reading fluency benchmark and percentages of pupils meeting the benchmark, Ghana 2013, Grade 2
Subtask | Grade 2 benchmark | % of pupils meeting the suggested benchmark (2013) | Target % meeting in 2014 | Target % meeting in 5 years
Reading comprehension (% correct) | 80% correct | 1% | 3% | 20%
Oral reading fluency (cwpm) | 70 | 2% | 5% | 20%

When more data are available:
• Setting the target for the percentage of students meeting the benchmark can be challenging when we lack data to work with.
• We can use intervention data, or data from multiple grades, to set better targets.

Practice: Indonesia (PRIORITAS) 2013, Grade 3 (Bahasa Indonesia)
OBJECTIVES:
• Set a benchmark for ORF.
• Find the percentage of pupils achieving this benchmark.
• Project the target percentage of pupils achieving this benchmark over the next 5 years.
Source: Prioritizing Reform, Innovation, and Opportunities for Reaching Indonesia's Teachers, Administrators, and Students (PRIORITAS), 2012–2017, baseline assessment.

Box Plot for ORF by Reading Comprehension, Indonesia (PRIORITAS) 2013, Grade 3 (Bahasa Indonesia)

Table Corresponding to Box Plot for ORF by Reading Comprehension, Indonesia 2013, Grade 3
Oral reading fluency: number summary and mean
% correct, reading comprehension | Mean | Minimum | 25th percentile | 50th percentile | 75th percentile | Maximum | Sample count
0% | 13.9 | 0 | 0 | 5 | 19 | 94 | 315
20% | 41.4 | 1 | 21 | 35 | 58 | 127 | 288
40% | 61.4 | 9 | 37 | 56 | 83 | 183 | 382
60% | 76.3 | 12 | 57 | 74 | 94 | 166 | 771
80% | 81.9 | 29 | 63 | 81 | 99 | 166 | 992
100% | 89.3 | 33 | 71 | 92 | 105 | 174 | 823

Cumulative Percent ORF, Indonesia (PRIORITAS), 2013, Grade 3

Documenting Benchmarks
Reading fluency benchmark and percentage of pupils meeting the benchmark, Indonesia 2013, Grade 3 (worksheet)
Subtask: Oral reading fluency (cwpm) | Suggested benchmark: ___ | % of pupils meeting the benchmark in 2013: ___

Adding Data from an Intervention Research Study
• Find the percentage of pupils who achieved the benchmark in the control schools and in the intervention schools.
• Find the difference between these two percentages. This becomes your potential increase in the percentage of pupils achieving the proposed benchmark.
• Use this difference to project the percentage of pupils meeting the benchmark over the next 5 years.
Note: Be aware of how many years the intervention has been in place. Measurement usually takes place after 1 or 2 years, and the potential growth is over that period of time. (Indonesia PRIORITAS was evaluated after 1 year.)

Cumulative Percent, ORF, Indonesia (PRIORITAS), 2013, Grade 3 (blue = intervention schools; red = control schools)
For example, using 85 cwpm as a benchmark: in 2013, 100% − 60% = 40% of pupils in intervention schools met the benchmark, and 100% − 66% = 34% of pupils in control schools met the benchmark of 85 cwpm. We could therefore expect a 40% − 34% = 6 percentage point increase in the percentage of pupils meeting the benchmark. (A sketch of this projection appears below.)

Documenting Benchmarks
Benchmark and targets for ORF, Indonesia (PRIORITAS), Grade 3 (worksheet)
Subtask: Oral reading fluency (cwpm) | Suggested benchmark: ___ | % of pupils meeting the benchmark in 2013: ___ | Targets (projected % of pupils meeting the benchmark): 2014 ___ | 2015 ___ | 2016 ___ | 2017 ___ | 2018 ___
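The intervention-based projection above can be expressed in a few lines. This sketch uses the 6-percentage-point gap from the Indonesia example and assumes the gain could be repeated each year in a straight line, which is a simplifying assumption for discussion, not a forecast from the workshop materials.

```python
# Percentage of pupils meeting the 85 cwpm benchmark (Indonesia PRIORITAS example).
control_pct = 34.0         # control schools, 2013
intervention_pct = 40.0    # intervention schools, 2013
years_of_intervention = 1  # PRIORITAS was evaluated after 1 year

# Potential gain per year if the intervention effect could be sustained at scale.
gain_per_year = (intervention_pct - control_pct) / years_of_intervention

# Straight-line projection over the next five years (an assumption; real
# systems rarely improve linearly).
for year, label in enumerate(range(2014, 2019), start=1):
    projected = control_pct + gain_per_year * year
    print(f"{label}: {projected:.0f}% of pupils projected to meet the benchmark")
```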
Another Potentially Useful Benchmark: Improvement at the Low End of the Distribution
In this case:
• The goal could be "reducing the percentage of students who are struggling the most to develop reading skills."
• The objective would be the percentage of students scoring zero that you would try to move down to from the present level...
• ...or what you consider an acceptable level of "zero scores" for the grade (standard) and skill area under consideration.

Example: Ghana 2013, Grade 2 (English)
OBJECTIVES:
• Find the percentage of pupils scoring zero (or similar) in 2013.
• Project the target percentage of pupils at this level over the next 5 years.

Cumulative Distribution for ORF, Ghana 2013, Grade 2
(See the cumulative distribution table shown earlier: 51% of pupils scored zero.)

Documenting Benchmarks
Reading fluency zero scores (or similar) and percentage of pupils at that level, Ghana 2013, Grade 2 (worksheet)
Subtask: Oral reading fluency, zero scores | Grade 2 % scoring zero: 51% | Target % scoring zero in 2014: ___ | Target % scoring zero in 5 years: ___

Practice: Liberia 2013, Grade 2 (English)
OBJECTIVES:
• Find the percentage of pupils scoring zero (or similar) in 2013.
• Project the target percentage of pupils at this level over the next 5 years.
Source: Liberia Teacher Training Program (LTTP), 2010–2015, midterm assessment.

Cumulative Percent, ORF, Liberia 2013, Grade 2

Documenting Benchmarks
Reading fluency zero scores (or similar) and percentage of pupils at that level, Liberia 2013, Grade 2 (worksheet)
Subtask: Oral reading fluency, zero scores | Grade 2 % scoring zero: ___ | Targets (% of pupils scoring zero): 1 year ___ | 2 years ___ | 3 years ___ | 4 years ___ | 5 years ___

Intermediate Benchmarks
Adding an intermediate benchmark is useful if you want to separate the pupils who are not proficient and the pupils who are nonreaders into two separate categories or classifications.

Tools for Setting Intermediate Benchmarks
1. A graph for setting intermediate levels of reading performance, showing ranges of oral reading scores organized by level of reading comprehension score
2. Data: a cumulative distribution of the percentages of students scoring at different levels of performance
3. A table to record your results
4. A worksheet to record the justifications for your benchmarks

Defining Four Levels of Reading Proficiency
• Fluently with full comprehension
• With increasing fluency and comprehension
• Slowly with limited comprehension
• Nonreader
In step 1, you have just defined the benchmark for reading fluently with full comprehension. In step 2, you will define the other levels of reading ability. Nonreaders are children scoring zero on the oral reading subtask.

Think of the Levels of Reading Proficiency Like a Scale
Example (an ORF scale running from 0 to 55 cwpm): nonreader at 0; slowly with limited comprehension at the low end; with increasing fluency and comprehension in the middle; fluently with full comprehension at the top. By setting the intermediate benchmark, we are defining ranges for each level of performance.

Example: Ghana 2013, Grade 2 (English)
OBJECTIVES:
• Set a benchmark for ORF.
• Set a nonreader benchmark for ORF (zero or similar).
• Set an intermediate benchmark for ORF.
• Find the percentage of pupils in each category.

Box Plot for ORF by Reading Comprehension, Ghana 2013, Grade 2 (English)
We set the benchmark at 70 cwpm. (The corresponding number summary is the Ghana grade 2 table shown earlier.)

Cumulative Percent, ORF, Ghana 2013, Grade 2
• Nonreader: 50%
• Slowly with limited comprehension: 92% − 50% = 42%
• With increasing fluency and comprehension: 98% − 92% = 6%
• Fluently with full comprehension: 100% − 98% = 2%

Documenting Benchmarks
Benchmarks for Ghana 2013, Grade 2 (English), oral reading fluency:
Level | Benchmark (cwpm) | % meeting benchmark
Reading fluently with full comprehension | ≥70 | 2%
Reading with increasing fluency and comprehension | 40–69 | 6%
Reading slowly with limited comprehension | 1–39 | 42%
Nonreader | 0 | 50%
(A sketch of classifying pupils into these four levels follows.)
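Using the cutoffs from the Ghana grade 2 example (0, 1–39, 40–69, ≥70 cwpm), pupils can be binned into the four proficiency levels as sketched below; the data values and column names are illustrative, not the Ghana survey data.

```python
import pandas as pd

# Hypothetical pupil-level ORF scores (cwpm) with sampling weights.
df = pd.DataFrame({
    "orf":    [0, 0, 8, 15, 33, 42, 55, 68, 72, 90],
    "weight": [20, 18, 10, 9, 7, 5, 4, 3, 2, 2],
})

# Bands from the Ghana grade 2 example: nonreader (0), slowly with limited
# comprehension (1-39), increasing fluency (40-69), fluently (>=70).
labels = ["Nonreader", "Slowly, limited comprehension",
          "Increasing fluency and comprehension", "Fluently, full comprehension"]
df["level"] = pd.cut(df["orf"], bins=[-1, 0, 39, 69, float("inf")], labels=labels)

# Weighted percentage of pupils at each level.
shares = df.groupby("level", observed=False)["weight"].sum() / df["weight"].sum()
print(shares.map("{:.0%}".format))
```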
Practice: Indonesia (PRIORITAS) 2013, Grade 3 (Bahasa Indonesia)
OBJECTIVES:
• Set a benchmark for ORF.
• Set a nonreader benchmark for ORF (zero or similar).
• Set an intermediate benchmark for ORF.
• Find the percentage of pupils in each category.
• Set targets for 2014.

Box Plot for ORF by Reading Comprehension, Indonesia (PRIORITAS) 2013, Grade 3 (Bahasa Indonesia)
(The corresponding number summary is the Indonesia grade 3 table shown earlier.)

Cumulative Percent, ORF, Indonesia 2013, Grade 3 (red = control schools; blue = treatment schools)

Documenting Benchmarks (worksheet)
For each level (reading fluently with full comprehension; reading with increasing fluency and comprehension; reading slowly with limited comprehension; nonreader), record the oral reading fluency benchmark range, the % of pupils meeting it, and the target % for 2014.

Benchmarking Case Studies
Instructions:
1) In your packet of materials, find the case study your group has been assigned.
2) Read the objectives for your case study.
3) Complete the "Desired Outcome" table using the tables and graphs you have been provided.

References
National Institute of Child Health and Human Development (NICHD) [US]. (2000). Report of the National Reading Panel. Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction: Reports of the subgroups (NIH Publication No. 00-4754). Washington, DC: NICHD. http://www.nichd.nih.gov/publications/pubs/nrp/documents/report.pdf
Rasinski, T. V. (2004). Assessing reading fluency. Prepared for the U.S. Department of Education under the Regional Educational Laboratory program, Award No. ED01CO0014. Honolulu, HI: Pacific Resources for Education and Learning. http://files.eric.ed.gov/fulltext/ED483166.pdf
USAID Partnership for Education: Ghana Testing, task order under Education Data for Decision Making (EdData II), 2012–2016, baseline assessment.
Prioritizing Reform, Innovation, and Opportunities for Reaching Indonesia's Teachers, Administrators, and Students (PRIORITAS), 2012–2017, baseline assessment.

Benchmarking Definitions and Distinctions (as used in this webinar)
Goal: A long-term objective. Example: the goal of all children being able to read grade-level material with comprehension by the end of grade 3.
Benchmark: A milestone used to evaluate progress toward attaining the long-term goal; a desired level of performance for students in a specific skill area. Example: decoding nonwords at a rate of 23 correct words per minute (cwpm).
Performance levels: Another way to establish benchmarks, with different performance levels corresponding to different specific levels of achievement in a skill area. For example, a "proficient" reader may be a student who reaches an oral reading fluency (ORF) level of 45 cwpm; an "emergent" reader may be a student with an ORF rate of at least 20 cwpm and up to 45 cwpm; a "struggling" reader may be a student who scores below 20 cwpm; and a "nonreader" may be a student who scores zero.
Metric: A valid, reliable unit of measure. Benchmarks are expressed in terms of a specific metric, such as the number of correct words per minute a child can read orally.
Target: The percentage of students who would be able to meet the benchmark (or performance level) by a given period of time. For example, at baseline, perhaps 10% of students meet the benchmark for ORF. In two years' time, the target may be double that percentage, or 20% of students meeting the benchmark.
Case Study 1: Malawi, Grade 4, Chichewa (2012)
Objective: Set benchmarks in the following skill areas:
• Reading comprehension
• ORF
• Nonwords
• Familiar words
• Syllable sounds
• Letter names
• Listening comprehension

Desired Outcomes: Grade 4 Benchmarks by Subtask
Subtask | Benchmark
Reading comprehension (% correct) | ___
ORF (cwpm) | ___
Nonwords (cnonwpm) | ___
Familiar words (cwpm) | ___
Syllable sounds (csspm) | ___
Letter names (clpm) | ___
Listening comprehension (% correct) | ___

Box Plots: ORF by Reading Comprehension
Table: ORF by Reading Comprehension

Cumulative Percentiles by ORF

Correlations Between Subtasks (a code sketch for producing such a matrix appears at the end of this case study)
 | Reading comp. (% correct) | ORF | Nonwords | Familiar words | Syllable sounds | Letter names | Listening comp. (% correct)
Reading comp. (% correct) | 1
ORF | 0.867 | 1
Nonwords | 0.805 | 0.908 | 1
Familiar words | 0.833 | 0.948 | 0.941 | 1
Syllable sounds | 0.815 | 0.906 | 0.915 | 0.926 | 1
Letter names | 0.706 | 0.769 | 0.765 | 0.794 | 0.816 | 1
Listening comp. (% correct) | 0.321 | 0.313 | 0.311 | 0.317 | 0.321 | 0.360 | 1

Scatter Plot: Matrix of Subtasks
Scatter Plot: ORF vs. Nonwords
Scatter Plot: ORF vs. Familiar Words
Scatter Plot: ORF vs. Letter Sounds
Scatter Plot: ORF vs. Letter Names
Scatter Plot: ORF vs. Listening Comprehension

Average Fluency Scores on Subtasks Corresponding to Given Ranges of ORF
ORF range | Nonwords | Familiar words | Syllable sounds | Letter names | Listening comp. | No. of pupils in sample
Zero | 0.7 | 1.3 | 4.3 | 9.8 | 2.0 | 534
1–<5 | 3.4 | 5.1 | 10.8 | 19.5 | 2.2 | 81
5–<10 | 8.1 | 10.6 | 19.5 | 23.8 | 2.9 | 113
10–<15 | 10.3 | 15.3 | 23.4 | 29.3 | 2.6 | 81
15–<20 | 12.8 | 20.6 | 33.9 | 32.7 | 2.7 | 123
20–<25 | 15.8 | 25.2 | 38.6 | 34.2 | 2.7 | 182
25–<30 | 19.4 | 29.6 | 44.8 | 41.6 | 3.0 | 198
30–<35 | 26.8 | 35.9 | 51.3 | 40.9 | 2.9 | 110
35–<40 | 25.9 | 38.5 | 58.2 | 51.8 | 3.0 | 154
40–<45 | 26.7 | 40.8 | 57.5 | 52.3 | 3.1 | 113
45–<50 | 31.9 | 48.9 | 65.9 | 57.6 | 3.3 | 32
50–<55 | 44.5 | 50.8 | 75.6 | 50.0 | 3.6 | 21
55–<60 | 33.5 | 47.3 | 68.0 | 55.8 | 3.2 | 35
60–<65 | 37.2 | 53.1 | 86.2 | 65.9 | 4.6 | 16
65–<70 | 32.0 | 47.7 | 61.0 | 59.6 | 1.4 | 2
70–<75 | 33.9 | 62.9 | 69.0 | 39.6 | 3.3 | 2
75–<80 | 41.4 | 72.2 | 73.1 | 40.0 | 1.4 | 2
80–<85 | 45.5 | 70.0 | 85.0 | 63.5 | 4.5 | 2
85–<90 | 68.0 | 110.8 | 46.0 | 29.0 | 1.0 | 1
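A correlation matrix like the one in this case study can be produced directly from pupil-level data. A sketch with hypothetical scores and shortened column names (the workshop packet provides the Malawi matrix precomputed):

```python
import pandas as pd

# Hypothetical pupil-level subtask scores (not the Malawi data).
df = pd.DataFrame({
    "reading_comp":   [0, 20, 20, 40, 60, 80, 80, 100],
    "orf":            [2, 15, 22, 35, 48, 62, 70, 85],
    "nonwords":       [1, 8, 12, 20, 26, 33, 38, 45],
    "listening_comp": [20, 40, 20, 60, 60, 80, 60, 80],
})

# Pairwise Pearson correlations between all subtasks, as in the case-study table.
print(df.corr().round(3))
```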
Case Study 2: Philippines, Grade 2, Ilokano (2014)
Objective: Set an ORF benchmark for grade 2 using the information provided. Set a second benchmark to create two intermediate levels. Use these benchmarks to set categories of performance levels and find the percentage of pupils in those learning categories for 2014 and 2015. Finally, set a target percentage of pupils in these categories for 2016.

Desired Outcomes: Percentage of Pupils by Learning Categories
Category | Benchmark for category | Baseline: % meeting in 2014 | % meeting in 2015 | Target: % meeting in 2016
Nonreader | ORF of zero | ___ | ___ | ___
Emergent reader | ORF greater than zero and less than ___ | ___ | ___ | ___
Basic reader | ORF greater than ___ and less than ___ | ___ | ___ | ___
Proficient reader | ORF greater than ___ | ___ | ___ | ___

Box Plots: ORF by Reading Comprehension
Table: ORF by Reading Comprehension
No. of comprehension questions correct | Mean ORF | 25th percentile | 50th percentile | 75th percentile | Count (achieving # correct)
Zero | 6.1 | 0 | 3 | 10 | 109
1 | 25.7 | 13 | 26 | 34 | 35
2 | 35.0 | 25 | 35 | 42 | 62
3 | 38.4 | 30 | 36 | 44 | 67
4 | 41.6 | 35 | 41 | 46 | 66
5 | 54.3 | 46 | 51 | 63 | 60

Cumulative Percentile by ORF (red = 2014; blue = 2015)

Case Study 3: Jordan, Grade 2, Arabic (2014)
Objective: Set targets for the percentage of pupils who will meet the reading comprehension, nonword, and ORF benchmarks in 1, 2, 3, 4, and 5 years, using intervention data.

Desired Outcomes: Benchmarks and Targets by Subtask
Subtask | Benchmark | % meeting benchmark in 2014 | Targets (projected % meeting benchmark): 2015 | 2016 | 2017 | 2018 | 2019
Reading comp. (% correct) | ___
ORF (cwpm) | ___
Nonwords (cnonwpm) | ___

Percentage of Pupils Achieving Reading Comprehension (% correct)
Reading comp. (% correct) | Treatment 2012: % | Count (# pupils) | Treatment 2014: % | Count (# pupils)
Zero | 27% | 237 | 19% | 185
20% | 23% | 161 | 17% | 138
40% | 24% | 175 | 18% | 141
60% | 12% | 85 | 22% | 150
80% | 7% | 54 | 13% | 91
100% | 6% | 49 | 11% | 101

Box Plots: ORF by Reading Comprehension
Table: ORF by Reading Comprehension
% correct, reading comprehension | Mean | 25th percentile | 50th percentile | 75th percentile | Sample count
0% | 5.9 | 0 | 2 | 9 | 1,163
20% | 18.2 | 11 | 17 | 23 | 614
40% | 23.7 | 17 | 22 | 27 | 552
60% | 37.1 | 27 | 33 | 43 | 246
80% | 46.3 | 37 | 45 | 56 | 114
100% | 50.3 | 45 | 48 | 54 | 88

Cumulative Percentile by ORF for Intervention (red = 2012; blue = 2014)

Scatter Plot: ORF vs. Nonwords
[Figure: scatter plot of oral reading fluency (cwpm) against correct nonwords per minute.]

Cumulative Distribution of ORF vs. Nonwords for Intervention Schools (red = 2012 intervention schools; blue = 2014 intervention schools)
Case Study 4: Malawi, Grade 3, Chichewa (2012)
Objective: Use the results from grade 2 and grade 4 reading comprehension and ORF. Set reading comprehension and ORF benchmarks and targets for grade 3 for the next 1, 2, 3, 4, and 5 years.

Desired Outcomes: Benchmarks and Targets by Subtask
Subtask | Benchmark | % meeting benchmark in 2012 | Targets (projected % meeting benchmark): 2013 | 2014 | 2015 | 2016 | 2017
Reading comprehension (% correct) | ___
ORF (cwpm) | ___

Percentage of Pupils Achieving Reading Comprehension (% correct)
Reading comp. (% correct) | Grade 2: % | Count (# pupils) | Grade 4: % | Count (# pupils)
Zero | 94.4% | 2,926 | 50.5% | 712
20% | 3.4% | 187 | 13.7% | 267
40% | 1.6% | 143 | 17.3% | 355
60% | 0.5% | 79 | 12.4% | 309
80% | 0.1% | 23 | 5.4% | 170
100% | 0.0% | 2 | 0.7% | 26

Box Plots: Reading Comprehension by Grade
Tables: ORF by Reading Comprehension and Grade

Cumulative Percentile of ORF for Grade 2 (red = 2012 intervention schools; blue = 2014 intervention schools)

Cumulative Percentile of ORF for Grade 4 (red = 2010; green = 2011; blue = 2012)

Some Basics for Leading Benchmarking Work Using Data from the Early Grade Reading Assessment

1. The Process
Benchmarking should rely on actual data on student performance in specific reading skill areas. The underlying relationships between the reading skill areas, in terms of both the research on how students learn to read in alphabetic languages and the statistical relationships that have consistently been demonstrated across scores of EGRA applications, are what make it possible to use EGRA data to set benchmarks.

Step 1: Begin by discussing the level of reading comprehension that is acceptable as demonstrating full understanding of a given text. Most countries have settled on 80% or higher (4 or more correct responses out of 5 questions) as the desirable level of comprehension.

Step 2: Given a reading comprehension benchmark, EGRA data are used to show the range of oral reading fluency (ORF) scores, measured in correct words per minute (cwpm), obtained by students able to achieve the desired level of comprehension. Discussion is then needed to determine the value within that range that should be put forward as the benchmark. Alternatively, a range can indicate the levels of skill development that are acceptable as "proficient" or meeting a grade-level standard (for example, 40 to 50 cwpm).

[Figure: linked scales illustrating the process: a comprehension benchmark of 80% maps to an ORF range of 35 to 50 cwpm (benchmark set at 45 cwpm), which in turn maps to decoding at 30 cnonwpm and syllable reading at 50 csspm.]

Step 3: With an ORF benchmark defined, the relationship between ORF and decoding (nonword reading) makes it possible to identify the average rate of nonword reading that corresponds to the given level of ORF. (A code sketch of this step appears at the end of this section.)

Step 4: The process then proceeds in the same manner for each subsequent skill area.

Some tips regarding this process: A minimum, yet still adequate, approach to benchmarking would include two skill areas: reading comprehension and oral reading fluency. Going beyond those two to develop benchmarks for other skill areas can be useful, especially in countries where all of the EGRA-measured skills are poorly developed (so that progress can be detected in students' development of more basic skills). Syllable reading (especially when syllables are important components of words, as in Bantu languages) is a good skill area to include. If syllable reading was not tested, letter sound recognition, not letter naming, should be used. An exception would be a language like Bahasa Indonesia, which is totally transparent and in which letter names and sounds are therefore essentially the same.
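Step 3 can be approximated by averaging the decoding scores of pupils whose ORF falls near the chosen benchmark. A sketch only: the data, the 45 cwpm benchmark, and the ±5 cwpm window are assumptions for illustration.

```python
import pandas as pd

# Hypothetical pupil-level scores: ORF (cwpm) and nonword decoding (cnonwpm).
df = pd.DataFrame({
    "orf":      [5, 12, 20, 28, 35, 41, 47, 52, 60, 68],
    "nonwords": [2, 6, 11, 15, 19, 24, 28, 31, 36, 41],
})

orf_benchmark = 45
window = 5  # look at pupils within +/-5 cwpm of the ORF benchmark (assumed width)

near_benchmark = df["orf"].between(orf_benchmark - window, orf_benchmark + window)
decoding_at_benchmark = df.loc[near_benchmark, "nonwords"].mean()

print(f"Average nonword decoding near {orf_benchmark} cwpm: "
      f"{decoding_at_benchmark:.0f} cnonwpm")
```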
2. The Data
A good benchmarking exercise is quite data intensive. In fact, one of the added benefits of doing this exercise in a country is that the participants engage the EGRA results much more deeply than they normally would, leading to a richer understanding of what is happening in the country in terms of skill development. The data needed for benchmarking include:
• A table showing the range of reading fluency scores obtained by students achieving each level of reading comprehension. This makes it possible for participants to complete steps 1 and 2 in the benchmarking process. A graphic way to depict the same information is a set of "box and whisker" plots showing the distribution of ORF scores for each level of reading comprehension.
• A table showing the average scores on each other subtask that correspond to different levels of oral reading fluency, which enables participants to connect the ORF benchmark to desirable levels of performance in other skill areas. A graphic way to show the same information is a scatter plot (for example, of ORF by nonword decoding) with the best-fit line drawn in, so that workshop participants can match a given level of ORF to the average corresponding level of nonword decoding.
• For determining the percentage of students meeting the benchmark (in the year for which the EGRA data are available), a cumulative distribution graph or table that lets participants look up the percentage of students achieving, for example, 45 cwpm or higher.

3. Performance Levels
Some countries are interested in establishing performance points that capture stages of skill development below the desired level of achievement defined by the benchmark. For example, the benchmark for reading fluency may be defined as 50 cwpm, representing students who are reading fluently and with full (or almost full) comprehension. Students who score zero are not reading at all. Between zero and 50 cwpm lie different levels of reading ability that may in fact correspond to stages of literacy acquisition. Setting multiple performance levels makes it possible to determine what percentage of children are at each of those stages in the development of their reading skill.

[Figure: a scale from 0 to 50 cwpm divided into four performance levels: nonreader (0), slowly with limited comprehension (up to 20), with increasing fluency and comprehension (20 to 50), and fluently with full comprehension (50 and above).]

As illustrated above, it is possible to create two other performance levels below the benchmark for reading fluently with full comprehension set at 50 cwpm. Data describing how reading fluency and comprehension scores are distributed (e.g., using a two-way distribution table; a sketch follows at the end of this section) inform where to place another level of reading achievement between zero and 50 cwpm. Thus, two other performance levels are created: students who score above zero and up to 20 cwpm are said to be reading slowly with limited comprehension, and those scoring above 20 and up to 50 cwpm can be said to be reading with increasing fluency and comprehension. The performance levels in this example are from the benchmarking work done in Ethiopia in early 2015. Such intermediate performance levels could, of course, be given other labels in other contexts.

An alternative approach to setting performance levels (as was the case in Pakistan) is to establish a range of ORF scores defined as "meeting expectations," such as 60 to 90 cwpm. Students scoring above 90 cwpm would be considered to be "exceeding expectations," and those scoring below 60 would be "not meeting expectations."
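The two-way distribution table mentioned above can be tabulated as follows; a sketch assuming hypothetical data and the Ethiopia-style cut points (0, 20, 50 cwpm).

```python
import pandas as pd

# Hypothetical pupil-level fluency (cwpm) and comprehension (% correct) scores.
df = pd.DataFrame({
    "orf":  [0, 4, 12, 18, 25, 33, 42, 48, 55, 63],
    "comp": [0, 0, 20, 20, 40, 60, 60, 80, 80, 100],
})

# Band fluency using the Ethiopia-style cut points (0, 20, 50 cwpm).
df["band"] = pd.cut(df["orf"], bins=[-1, 0, 20, 50, float("inf")],
                    labels=["0", "1-20", "21-50", ">50"])

# Two-way table: counts of pupils by fluency band and comprehension level.
print(pd.crosstab(df["band"], df["comp"]))
```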
4. Moving Beyond Benchmarks to Targets
One of the main purposes of setting benchmarks is to establish the means to evaluate and measure progress in improving reading outcomes. In fact, one of the more interesting challenges in working with ministry colleagues to set benchmarks arises during discussions of the prospects for future improvement in student performance relative to those benchmarks. To set targets for future improvement, benchmarks can be used in the following way.

Once a benchmark has been set, say for oral reading fluency, it is useful to employ the existing data to determine the percentage of students presently meeting that benchmark. The challenge arises when assumptions have to be made about how things will improve, that is, when estimating the percentages of students who will meet the benchmark in future years.

[Figure: with an ORF benchmark of 45 cwpm and 10% of students meeting it at baseline, which line describes the path of improvement? What percentage of students will meet the benchmark in year 1, year 2, year 3, and beyond?]

If data are available from a reading intervention in the country, then the amount of improvement achieved by that program provides a useful starting point for estimating future targets. If data from an intervention are not available, but EGRA results from more than one year are, then the prevailing pattern of change over time can be used to begin discussing how that pattern may evolve in future years. If only one year of reading results is available, then the task is less data-driven and more a dialogue about how much improvement can be expected. Data from other countries' programs that have had demonstrated impact could inform that dialogue.

Additionally, if EGRA data from a given year are available for two successive grades (say, grades 1 and 2), then the "intergrade" difference is a good means of estimating how much improvement to expect. The intergrade difference represents the amount of progress students make given an additional year in school (under preexisting conditions). For example, a successful intervention could aim to improve performance in grade 1 by as much as the intergrade difference between grades 1 and 2, or, put differently, to increase student performance by as much as an additional year of schooling. (See the sketch after this section.)

The value of setting targets is that when performance is initially low, say, with very few students meeting the benchmark, there often emerges a tendency to want to lower the benchmark (so performance does not look as bad). It is better to have a benchmark that is genuinely meaningful in terms of the skill level achieved (e.g., oral reading that is fluent enough to enable students to comprehend what they are reading). Therefore, instead of lowering the benchmark, a compromise is to set modest targets for the percentage of children expected to meet the benchmark moving forward.

Examples of benchmarks and targets from Jordan are shown in the table below.

Subtask | Benchmark | % of students meeting the benchmark, 2014 actual | 5-year target
Oral reading fluency | 46 cwpm | 7.5% | 35%
Nonword decoding | 23 cnonwpm | 5.3% | 31%

Participants estimated that the percentage of students meeting the benchmarks for these two skill areas would increase from 7.5% to 35% and from 5.3% to 31% over the course of the next five years.
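The intergrade-difference heuristic can be expressed in a few lines; a sketch with assumed percentages and the simplifying assumption of one intergrade gain per year, intended only to seed the target-setting dialogue.

```python
# Weighted percentages of pupils meeting the ORF benchmark in the same year,
# for two successive grades (hypothetical values).
pct_grade1 = 8.0
pct_grade2 = 15.0

# The intergrade difference: roughly what one extra year of schooling adds
# under preexisting conditions.
intergrade_gain = pct_grade2 - pct_grade1

# A successful intervention might aim to move grade 1 performance up by about
# one intergrade difference per year (an assumption, not a rule).
for year in range(1, 6):
    target = pct_grade1 + intergrade_gain * year
    print(f"Year {year} target: {min(target, 100):.0f}% meeting the benchmark")
```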
5. Some Things to Remember
Having facilitated benchmarking exercises in nine countries, the Education Data for Decision Making (EdData II) project team has learned some useful lessons, summarized here.

Supply the data. The process requires a fair amount of data. Preparing the right data tools (graphs, tables, forms to be filled out) and carefully labeling those tools to correspond to the different steps in the process greatly ease the facilitation of a benchmarking workshop.

Match the data to the task. A balance needs to be struck between too much and too little data. When a lot of data are available (from more than one year of EGRA, for multiple grades, from an intervention as well as from national surveys), be sure to have participants work only with the sets of data that correspond to the task at hand. Do not dump everything on them at once.

Work across grade levels. When working with more than one grade, e.g., grades 1 through 3, it is best to work in each skill area across grades. For example, when setting a benchmark for ORF, set it for the highest grade for which data are available and then work to set the benchmarks for the other grades based on that. Then move on to do the same in another skill area.

Have multiple small groups work simultaneously. It is useful to have more than one group working in parallel with the data to set a benchmark. When groups arrive at different suggested benchmarks, the facilitated dialogue that ensues is usually quite fruitful. That dialogue also illustrates that even when everyone is using data, there is room for interpretation and negotiation about what constitutes a reasonable benchmark for a given country and language.

Encourage discussion. Similarly, the discussion, and often debate, about what targets should be set for future improvement brings to the surface everyone's assumptions about how the system will improve over time. For example, when looking at the results of pilot interventions in Malawi and Liberia as the bases for determining future targets, participants had a lively discussion about whether the conditions created in a pilot (which led to its results) could be expected to be implemented at national scale, and what it would take for the ministry and its partners to achieve that.

Limit the number of benchmarks. There is often a tendency to want to set benchmarks for every skill area. Limiting the number of skill areas to no more than four is highly recommended: reading comprehension, oral reading fluency, and two others (nonword decoding plus syllable reading or letter sound identification).

Consider how to institutionalize the decisions. It is necessary to engage participants in determining how the benchmarks they develop could become official. Even if the benchmarks are not made official, they should be used to summarize reading performance the next time early grade reading is assessed. This was the case most recently in the Philippines, where, even though the benchmarks were not officially adopted, comparing the percentages of students meeting benchmarks in 2014 and 2015 helped the Department of Education evaluate the extent to which progress was being made.