The contribution of large-scale test results to policy

Esther Care, Patrick Griffin, and Zhonghua Zhang
The University of Melbourne, Australia

Abstract

This summary, drawn from Care, Griffin, Zhang, and Hutchinson (2013), provides an example of how tests that are intended for teacher use to inform teaching can also provide information that signals the need for policy change.

Introduction

There are many views on the functions of assessment in education. This summary describes how assessment data gathered initially to inform differentiated teaching can also inform policy. The use of assessment data to inform instruction is grounded firmly in the work of Vygotsky (1986) and his identification of a zone in which an individual is able to achieve more with assistance than he or she could manage alone. The Zone of Proximal Development (ZPD) typically refers to an area or level of skills in which a student ranges between correct and incorrect responses as he or she engages with the level of difficulty. As discussed by Griffin (2007), this perspective links well with the work of Glaser (1963), who proposed the concept of criterion-referenced interpretation of assessments. When students are assessed in such a way that their current skills are identified, teachers can use the information to guide interventions, ensuring that material is presented at the level at which each student can engage with the learning goals.

Notwithstanding a growing interest in this approach to assessment, attested to by increased understanding and encouragement of formative assessment approaches (Black & Wiliam, 1998), international and national large-scale assessments have by and large ignored the capacity of their data to empower and inform teachers. Our argument is that a change in approach to assessment is required, one based on changing notions of teaching and learning, as we move toward a more differentiated, more responsive model that meets individual student needs. It is necessary (a) to determine what is needed in the classroom for formative assessment and what is helpful for teachers, (b) to determine what is needed in terms of large-scale accountability, and (c) to achieve consistency between these without imposing the limitations of one upon the other. We should not pursue large-scale assessment to inform policy without also taking a classroom perspective on the usefulness of the information. The teacher's need is for specific information about students' current understandings and skills in the context of the program of learning outlined by the curriculum and interpreted for teaching purposes within the school. Logistically, this means that large-scale assessment must be capable of providing both foreground information for use by teachers and background information to harvest for summative, system-level analysis.

In Victoria, Australia, the Assessment Research Centre at The University of Melbourne is implementing just such a model through the State and Catholic education jurisdictions in literacy, numeracy and problem-solving tests: tests that provide information about students' location along developmental learning progressions. The work has informed teachers' decision-making about interventions, school leadership teams' decision-making about staff organization, and regional decision-making about professional development needs.
Integral to this "Assessment and Learning Partnerships" (ALP) program is the Assessment Research Centre Online Testing System (ARCOTS), an online platform that supports a comprehensive assessment system. Through this system, student competency is mapped to underlying, empirically based developmental progressions, and reports are generated for teacher use. ARCOTS tests are available for reading comprehension, numeracy and problem solving, and are targeted at students across Grades 3–10. Schools participating in ALP test their students twice a year on the developmental progression of interest. As part of the program, teachers learn how to interpret ARCOTS results using a developmental assessment approach and to identify each assessed student's point of readiness to learn, the ZPD. The first test establishes a starting point. Testing at second, third and subsequent points in time provides teachers with evidence of progress and an opportunity to review the student's ZPD. Teachers analyze these results together with other examples of student achievement drawn from the classroom in order to plan teaching interventions. The data presented here indicate how test results that are used in the first instance by teachers to identify the ZPD for teaching purposes can equally be used for large-scale analysis, interpretation, and policy development.

Method

Given the primary goal of ALP, that each student should be taught at the point at which he or she is ready to learn, all students should be able to progress in their learning. Analysis of the distributions of student progress can provide evidence concerning whether this is achieved for all students, or only for particular groups of students. As a test of equal progress in learning, distributions of assessment data were analyzed to identify whether students at the top end of each grade level were progressing at a rate similar to that of students at the lower end of the grade.

Participants

In March 2011, 21,000 students enrolled in Department of Education and Early Childhood Development primary and secondary schools participated in ARCOTS testing. For this example, a subsample of students in Grades 3–6, matched across time and test difficulty level, have their numeracy results included in the analyses.

Tests

The numeracy tests cover number, geometry, measurement, chance and data. Each test has questions drawing on a range of content, with varying levels of complexity, and there is overlap in both content and complexity between one test and the next. Each test comprises 40 items and is designed to be completed in approximately 50 minutes. The items are presented in multiple-choice format on an online platform. Items are mapped onto a uniform latent variable scale for each domain that fits a single-parameter (Rasch) model. The test taken by each student is targeted so that the student is likely to achieve approximately 50 per cent correct.
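To make the scaling and targeting concrete, the following is a minimal sketch of the one-parameter (Rasch) item response function in Python; the ability and difficulty values are illustrative, not actual ARCOTS parameters.

    import math

    def rasch_p_correct(theta: float, delta: float) -> float:
        """Rasch (one-parameter) model: probability that a student of
        ability theta answers an item of difficulty delta correctly."""
        return 1.0 / (1.0 + math.exp(-(theta - delta)))

    # With items targeted near the student's ability (theta ~ delta),
    # each item is answered correctly with probability ~ 0.5, i.e. the
    # "approximately 50 per cent correct" targeting described above.
    print(rasch_p_correct(theta=0.0, delta=0.0))  # 0.5
    print(rasch_p_correct(theta=1.0, delta=0.0))  # ~0.73: item easy relative to ability

Targeting tests in this way is a deliberate design choice: in the Rasch model, the information an item provides about a student peaks where ability equals difficulty, so a test on which a student scores near 50 per cent correct locates that student on the progression most precisely.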
Results

A sub-sample of primary school students who took ARCOTS numeracy tests in both March and October 2011 is described in Table 1 (outliers removed).

Table 1. Sample distribution of students taking ARCOTS numeracy tests in March and October 2011

                               Grade 3   Grade 4   Grade 5   Grade 6
    Total                        1,551       988       565       589
    Lower-achievement group        414       259       161       166
    Higher-achievement group       397       265       147       168

The distributions of students' achievement scores on the numeracy tests for Grades 3–6 are displayed in Figure 1. The growth from March to October can be seen in the distributions. The results of Shapiro-Wilk tests indicated that the students' achievement scores were not normally distributed. Hence, the non-parametric Wilcoxon signed-rank test was used to assess whether students' achievement scores on the March 2011 and October 2011 numeracy tests differed. The results indicated statistically significant differences from the March to the October numeracy test for all grades. Consistent across the grades, the mean differences and the effect sizes of the differences show medium positive growth across the 2011 school year.

Figure 1. Distributions of students' achievement scores on ARCOTS numeracy tests across grades

Based on students' achievement scores, two groups were formed for analysis. Students whose achievement scores were at or below the 25th percentile of scores on the ARCOTS March 2011 numeracy tests were classified as lower-achievement students, and those with scores at or above the 75th percentile as higher-achievement students. Again, given the distributional characteristics, the Wilcoxon signed-rank test was used to assess whether students' achievement scores differed significantly from March to October across achievement and grade groups.

For the lower-achievement students, there are statistically significant differences in achievement from March to October for all grades; the mean differences and the corresponding effect sizes show substantive growth. For the higher-achievement students, however, similar results were not obtained. No significant differences were found for Grades 3 and 5, and for Grades 4 and 6 no statistically significant differences at p < .001 were found between March and October. The effect sizes also imply little growth for these high-achieving groups. These findings suggest that different growth trajectories exist for lower-achievement and higher-achievement students: the scores of lower-achievement students grow faster than those of higher-achievement students, as summarised in Figure 2.

Figure 2. Distributions of gain scores for lower-achievement and higher-achievement groups of students on ARCOTS numeracy tests from March 2011 to October 2011, across different grades
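The group comparisons above can be reproduced in outline. The following is a minimal sketch in Python, assuming matched arrays of March and October scores for one grade; it uses scipy's shapiro and wilcoxon, and the effect-size computation from the normal approximation (r = |z| / sqrt(n)) is our illustrative addition, not necessarily the statistic reported in Care et al. (2013).

    import numpy as np
    from scipy import stats

    def growth_test(march: np.ndarray, october: np.ndarray) -> None:
        """Wilcoxon signed-rank test of March-to-October growth for one
        group, with an approximate effect size r = |z| / sqrt(n)."""
        # Normality checks of the kind that motivated a non-parametric test.
        _, p_m = stats.shapiro(march)
        _, p_o = stats.shapiro(october)
        print(f"Shapiro-Wilk p-values: March {p_m:.3f}, October {p_o:.3f}")

        w, p = stats.wilcoxon(march, october)
        # Large-sample normal approximation of the signed-rank statistic
        # (zero differences dropped; tie correction omitted in this sketch).
        n = np.count_nonzero(october - march)
        mu = n * (n + 1) / 4
        sigma = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
        r = abs(w - mu) / sigma / np.sqrt(n)
        print(f"W = {w:.0f}, p = {p:.4g}, effect size r = {r:.2f}")

    def quartile_comparison(march: np.ndarray, october: np.ndarray) -> None:
        """Split one grade on March scores into lower- (<= 25th percentile)
        and higher- (>= 75th percentile) achievement groups, then test
        growth separately in each, as in the analysis above."""
        q25, q75 = np.percentile(march, [25, 75])
        for label, mask in (("lower", march <= q25), ("higher", march >= q75)):
            print(f"{label}-achievement group (n = {mask.sum()}):")
            growth_test(march[mask], october[mask])

Run per grade, a procedure of this kind yields the pattern reported above: significant growth with substantive effect sizes for the lower group in every grade, and little or no measurable growth for the higher group.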
Discussion

What we see in the results is both reassuring and alarming. For students whose results indicate that they are operating within the lower 25 per cent of the grade distribution, there is a consistent pattern of growth regardless of grade. What is of concern is that students within the top 25 per cent of the distributions, across all grades tested, achieved little growth at the cohort level. The skill levels of students in each grade appear to be converging. Thus, it seems that teachers are indeed 'closing the gap', a term coined to characterize the Australian policy of reducing inequities between Indigenous and non-Indigenous Australians (MCEECDYA, 2009) but extended to education more generally (Gonski, 2011). Given the national and state emphasis on raising the skill levels of those at the lower end of the distribution, it is not surprising that these students are prioritized. Through the use of large-scale testing data such as that presented in this summary from Care et al. (2013), we can see the direct effects of policy.

Where policy may have brought about unanticipated outcomes, we are provided with an evidence base upon which to promote more appropriate planning. As can be seen in this instance, the positive policy of promoting equity has in fact brought about its opposite, for a group other than that originally targeted for positive outcomes. Counter-intuitively, it is the students at the top of the distribution whose growth is most at risk.

Implications for action

The progress of students in the lower 25 per cent of the distributions may be seen as a direct outcome of good teaching practice. In this first year of participation in the ALP program, teachers learn about developmental approaches to teaching and learning, and how to use assessment data to inform their decisions about interventions with students: what resources and strategies to bring to bear, and what level of content and complexity of subject matter to include.

The implications of this analysis extend to teacher, school, jurisdiction and system levels. For the teacher, the data highlight the need to differentiate teaching for all students, rather than only for those who appear to have the greatest need. This differentiation may require a change in attitude, as well as identification of different strategies and resources to cover the full range of need in the classroom. At the school level, the required changes herald the need to implement professional learning activity that addresses both the attitudinal and skill needs of teachers. At the jurisdiction or regional level, this implies a need for appropriate resourcing of schools for professional learning, and for regional promotional support for change in practice. At the system level, the implications for policy are clear: it must promote equal opportunity for all, ahead of the prioritization of sub-groups.

Large-scale national testing in the Philippines generates huge information resources that have the potential for use both in the classroom and at system level to inform policy. How the data are captured, recorded, reported and interpreted determines their use. The example provided by these data from the Australian system can serve as a model for re-thinking the use of large-scale data in the Philippine education system. Information relevant to the individual student can be used at class level by the teacher as an aid to differentiated instruction; at school level by groups of teachers to inform their professional development, responsive to student profiles; and at jurisdiction and policy levels to identify patterns in student learning that are attributable to the educational policies of current governments. In so doing, the potential for student, class and school-level data to be used at policy level is clear.

For the use of assessment data to be effective, it is essential that teachers have access to quick turnaround of results, so that they are responding to students' current levels of functioning and performance. It is also essential that teachers acquire skills in understanding aggregate as well as individual data, so that they can bring their professional judgment to bear in interpretation. Both these criteria, for data capture and for professional understanding, require system resources.
As the Philippines rolls out its K-12 reforms and re-thinks its national assessment system, this approach to both formative and system-level use of test results is imperative.

References

Black, P., & Wiliam, D. (1998). Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan, 80, 139–148.

Care, E., Griffin, P., Zhang, Z., & Hutchinson, D. (2013). Large scale testing and its contribution to learning. In C. Wyatt-Smith, V. Klenowski, & P. Colbert (Eds.), The enabling power of assessment. The Netherlands: Springer International.

Glaser, R. (1963). Instructional technology and the measurement of learning outcomes. American Psychologist, 18, 519–521.

Gonski, D. (2011). Review of funding for schooling: Final report. Canberra: Department of Education, Employment and Workplace Relations.

Griffin, P. (2007). The comfort of competence and the uncertainty of assessment. Studies in Educational Evaluation, 33, 87–99.

MCEECDYA (2009). Aboriginal and Torres Strait Islander Education Action Plan 2010–2014. Canberra: Ministerial Council for Education, Early Childhood Development and Youth Affairs.

Vygotsky, L. S. (1986). Thought and language. Cambridge, MA: MIT Press.

Wiliam, D., & Thompson, M. (2007). Integrating assessment with learning: What will it take to make it work? In C. A. Dwyer (Ed.), The future of assessment: Shaping teaching and learning. Mahwah, NJ: Erlbaum.

Note

This summary, presented at the 12th National Convention on Statistics, Manila, 1–2 October 2013, is adapted from the article by Care, Griffin, Zhang, and Hutchinson (2013) published in C. Wyatt-Smith, V. Klenowski, & P. Colbert (Eds.), The enabling power of assessment. The Netherlands: Springer International.