education policy analysis archives
A peer-reviewed, independent, open access, multilingual journal

EPAA Response Table

Manuscript Number: Insert manuscript number here
Title: The impact of teacher attitudes and beliefs about large-scale assessment on the use of provincial data for instructional change
Manuscript re-submitted: Insert re-submission date here

Reviewer #1's Comments:

1. I believe there is a need for more situational context for the broad scope of the article, and there are five topics that could be explored more thoroughly to address this. To begin with, a large portion of the data on which the study is based is generated through the administration of the nation-wide teacher survey. I see some items listed in Figure 2. However, it is not clear to me whether or not this is the entire survey. Another item regarding the situational context also centers on the survey items. As I stated above, I am unclear as to whether Figure 2 illustrates the entirety of the survey. If the items shown in Figure 2 represent the entirety of the questions included in the survey, I am not sure which questions are aimed at getting at teacher attitudes. Table 1 does list "…values for questions related to teacher attitudes…", but the actual survey items are not present. Therefore, I am having a difficult time understanding what respondents indicated that relates to attitudes. All of the items in Figure 2 are about actual behaviors exhibited by teachers, not about teacher attitudes toward LSA. Again, I believe the inclusion of the entire survey instrument would provide a more solid context for the article.

Response: More details about the extent of the survey were provided on p. 9 and in the notes following Figure 2. I have included the survey outline in the appendix; it includes all the survey questions and the binary values assigned to responses. A reference to the full survey being found there is located on page 13.

2. An addition to the article that could strengthen the situational context relates to the national policy mentioned. The article states that policy goals set for Canadian LSAs include that there be tests written across various subjects and grades to gauge progress. A description of the grade levels and subject areas in which these LSAs are administered would add greatly to the article. This would provide a point of reference for the reader regarding the study results.

Response: A chart has been added to the appendix for reference. I agree that this information is helpful to the reader (especially one not aware of the Canadian system), but it was originally left out in the interest of space.

3. One other point regarding the context for the article concerns the timing of the administration of LSAs. In other words, are the LSAs all exit exams, which I infer to mean summative? If some LSAs are formative and some are summative, differences in survey results could possibly be explained according to the requirements of specific teachers and/or in specific provinces.

Response: The chart on provincial testing added to the appendix (see above) should shed some light here. Some discussion related to different provincial testing models has been added on page ___.

4. The context could also be strengthened with a fuller explanation regarding the interviews. The article states, "…the interview guide was employed to follow up upon the same themes as the survey instrument." It is clear that interviews were conducted with teachers, administrators, and district-level staff. But I am not sure what types of questions were asked in the interview. I believe an outline of the interview protocol would be very helpful.

Response: I have included the survey outline in the appendix. It includes all the survey questions and the binary values assigned to responses. A reference to the full survey being found there is located on page 13.

5. The first comment I have regarding the Theoretical Framework may be a philosophical difference of opinion, but I did want to mention it.
Figure 2 displays survey items and categorizes them according to whether they suggest "Teaching (to) the curriculum" or "Teaching to the test". The item "I have requested additional resources related to testing" is listed under the "Teaching (to) the curriculum" category. I am not sure that I agree with this categorization. Asking for testing resources is, in my opinion, very similar to using the format of the test to design practice questions (an item that appears in the "Teaching to the test" category).

Response: I have added some commentary on page 10 that answers this concern somewhat. There are certainly cases where my categorization may not fit the circumstances of a given class or a given student. In this one case, requesting additional resources is posed in contrast to using old exams or test-format practice questions, which are not necessarily the more thorough or variable assessment methods.

6. Furthermore, the theoretical framework for the research centers on reactivity and how teachers react when policy includes sanctions or rewards for teachers based on LSA results. I believe the reader needs to know what stakes are in place for teachers and students related to the LSAs. Because the article does not state what the existing sanctions and rewards are, the framework is a bit difficult to reconcile.

Response: A good point. I have added some explanation on page 19.

7. A related topic lies with the Conclusion. In this section it is stated, "Another striking finding that comes from this analysis is that provincial variations in testing policy and test design are quite telling in terms of using teaching to the test strategies." Given my current and admittedly traditional thinking regarding the meaning of high-stakes tests and the strategies employed by teachers to promote student success on them, I do not find it striking that the provinces without high-stakes exams do not use score-improvement strategies and that the province with the unique assessment design does not "teach to the test".

Response: Clarification of what is meant by stakes has been added to this paragraph. The Canadian context does make teacher reactivity somewhat more striking than in cases where teachers would be personally sanctioned for poor results (as in the US… sorry).

Reviewer #2's Comments:

1. The author(s) should contrast the results with other relevant studies, state the limitations of the study, and explore the implications of the findings (for teachers, students, policymakers, etc.).

Response: a) Some additional relevant related studies have been referenced; b) the limitations were reviewed and a section was added on page 16; c) the implications are found mainly in the conclusions section.

Reviewer #3's Comments:

1. The authors indicate that the data described in this paper constitute a subset of the available data for examining teachers' responses to assessment, so it might be beneficial for them to explore ways to incorporate the findings from the analyses described in this paper into a more comprehensive investigation that could address some of the limitations.

Response: A limitations section was included on page 16. Addressing these limitations will require further research and publication. I am working on that.

2. Perhaps the most significant limitation of the paper is that the authors claim several times to be examining "impact", but they use a research design and a set of statistical methods that do not support the causal statements made throughout the paper. The analysis relies on simple OLS models that examine relationships but cannot determine causality, even though the third research question is framed in causal terms ("Which attitudes variables have the most effect on the use of data?"). Words like "effect" and "impact" appear throughout the discussion of findings and are used in ways that are inconsistent with the analyses that were actually carried out. Correlations between practices and attitudes could stem to a large degree from factors such as the quality of school curriculum or leadership, and correlations between practices and province could also be influenced by factors related to both rather than reflecting a true causal relationship. The language needs to be much more cautious and should make it clear that this is an exploratory, descriptive analysis.

Response: More careful wording has been attempted in this draft. I agree that these are correlations, not causal links, so this has been addressed in the text with a careful edit. The third research question has also been subjected to this scrutiny.
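The confounding concern raised in this comment can be made concrete with a small simulation. The sketch below is purely illustrative: the variable names (leadership, attitude, practice) and effect sizes are hypothetical stand-ins, not the study's data or models. It shows how a simple OLS slope can be clearly nonzero even when there is no causal path between attitude and practice.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5000

    # Hypothetical confounder, e.g. school leadership quality (illustrative only).
    leadership = rng.normal(size=n)
    # Attitude and practice are BOTH driven by the confounder; there is no
    # direct causal path from attitude to practice in this simulation.
    attitude = 0.8 * leadership + rng.normal(size=n)
    practice = 0.8 * leadership + rng.normal(size=n)

    # Unadjusted OLS slope of practice on attitude: clearly nonzero.
    slope = np.cov(attitude, practice)[0, 1] / np.var(attitude, ddof=1)
    print(f"unadjusted slope: {slope:.2f}")   # about 0.39, all from confounding

    # Adding the confounder as a covariate drives the slope toward zero.
    X = np.column_stack([np.ones(n), attitude, leadership])
    beta, *_ = np.linalg.lstsq(X, practice, rcond=None)
    print(f"slope controlling for leadership: {beta[1]:.2f}")  # about 0.00

The unadjusted slope comes out near 0.39 purely because attitude and practice share a cause, while controlling for that cause drives it to roughly zero; this is the sense in which OLS coefficients describe associations rather than effects.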
3. A second limitation is that the authors state that their study examines how teachers use assessment data to improve their instructional practices, but the measures on which they report in this paper provide a very narrow window into the kinds of data-use strategies and practices that teachers might implement. The "teaching to the curriculum" and "teaching to the test" survey items do not seem clearly distinguishable; the curriculum scale includes an item about seeking additional resources related to testing, and an item about group study sessions (which could be focused on test prep; the topics of these study sessions are not specified). Similarly, the teaching to the test scale includes an item about focusing more on tested subjects, which could involve devoting more attention to the curriculum in those tested subjects.

Response: This is not a new criticism, but it is one that I have tried to address here and also for Reviewer 1. Developing the unique reactivity table meant drawing a line between instructional practices which teach most directly to a specific assessment instrument and those that are effective in more general terms. I cannot improve upon Daniel Koretz's phrasing quoted on page 4: "Preparation that leads to improved mastery of the domain is appropriate… Preparation that generates gains largely or entirely specific to the given test is inappropriate because it generates inflation rather than meaningful gains." I think these divergent streams are distinguishable, and they should be. The STF code of conduct, added in this draft, was a useful instrument for this purpose.
4. Along similar lines, there is no effort to explore the various types of assessments that are available for teachers to use to inform instruction (LSAs, benchmark assessments, teacher-developed tests, data from intelligent tutoring software, etc.). A teacher's propensity to rely on LSA data for decision making is likely to be influenced in large part by the extent to which he or she has access to higher-quality (i.e., more detailed or timely, or more closely aligned with the curriculum) information from other forms of assessment.

Response: While this is certainly true, it was not the topic of this paper or the larger study. To focus on other kinds of assessments would have muddied the waters.

5. The paper would benefit from more detail about the survey scales that are used as dependent variables in the regression models, including information on the items included in each.

Response: As mentioned by Reviewer 1, the survey outline and scales have been added to the appendix.

6. Finally, the theoretical boundaries between teaching to the test and teaching to the curriculum are not always clear. According to the theory of action that underlies concepts of standards-based reform, in an ideal world the curriculum would be fully aligned with the assessment, such that teaching to the test would constitute teaching to the curriculum. Of course this perfect alignment never occurs in practice, but the extent to which teaching to the test and teaching to the curriculum can be separated depends in part on the degree of alignment between the two. Moreover, the extent to which teaching to the test is problematic is influenced by the quality of the test, including whether the items are predictable from one year to the next. These complex aspects of standards-based reform and assessment are not explored in this paper.

Response: Teachers and administrators appear to be aware from their responses that test preparation which is focused on a specific instrument, of whatever quality, is focused in such a way that instruction narrows. There are many ways to display curriculum mastery, and most tests (with the likely exception of the International Baccalaureate exam) do not allow for unbounded creativity in responses. I don't believe this is a matter of having 'better tests' or even of their alignment with curriculum, which is simply a starting point for any valid assessment. This is a matter of teaching practices working in a reality where the entire curriculum cannot be practically tested, nor can enough high-quality unpredictable items be practically and suitably marked in a standardized way. There are good issues to explore outside of this paper, agreed. But to focus on test design (item types), or to miss the difference between test-focused and more generalized instruction methods, misses the thrust here somewhat. N.B. I do have a paper on teaching to the test specifically, approved pending revisions for publication in Assessment in Education: Principles, Policy & Practice, which explores this one issue in some depth.

Specific guidelines from the editorial board:

See comments related to both of the instruments you apparently used, and the need for much more detail, and evidence, there.

Response: Survey added to appendix.

See related comments as per your methods, in that much more detail is required there as well, particularly in terms of data analysis (e.g., what evidence is there that results matched how you theoretically positioned items into constructs? See also comments provided by Reviewers 1 and 3 about items and evidence that the items/constructs used actually map onto the constructs purportedly being assessed). See also the need for additional analyses for evidence (e.g., factor analysis, alpha, perhaps alpha with items deleted).

Response: Details about the alpha and Spearman's analyses were clarified. The survey responses were also correlated better to a guiding code of professional conduct (as in the original study).
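For readers unfamiliar with the reliability analyses requested above, the following minimal sketch shows what Cronbach's alpha and alpha-with-items-deleted compute over a binary-scored item matrix like the one outlined in the appendix. The function names and toy data are illustrative assumptions, not the study's actual analysis.

    import numpy as np

    def cronbach_alpha(items: np.ndarray) -> float:
        """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)      # per-item variances
        total_var = items.sum(axis=1).var(ddof=1)  # variance of summed scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    def alpha_if_item_deleted(items: np.ndarray) -> list:
        """Alpha recomputed with each item removed in turn; a notably higher
        value after deleting item j flags j as weakening scale consistency."""
        return [cronbach_alpha(np.delete(items, j, axis=1))
                for j in range(items.shape[1])]

    # Toy data: 5 respondents x 4 binary items (illustrative only).
    scores = np.array([[1, 1, 1, 0],
                       [1, 0, 1, 1],
                       [0, 0, 0, 0],
                       [1, 1, 1, 1],
                       [0, 1, 0, 0]])
    print(cronbach_alpha(scores))
    print(alpha_if_item_deleted(scores))

Alpha near or above 0.7 is conventionally read as acceptable internal consistency for a scale, and the items-deleted variant helps identify survey items that sit poorly within their construct, which speaks to the item-to-construct mapping concern raised above.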
See comments on "context" in general, and specifically those provided via Reviewers 1 and 3 (e.g., on the stakes attached to LSAs as per your theoretical framework, other options, and whether this, or the lack thereof, conflated your results).

Response: Appendices were added to clarify context, and text was added to try to illuminate the unique Canadian context of the study.

What are the specific recommendations you are making regarding policy, teaching, and learning (see comments provided by Reviewers 1 and 2)?

Response: Each of the three main conclusions has an added sentence related to a specific recommendation.

Resituate findings within the literature (see comment provided by Reviewer 2). Also strengthen, with literature, the sections in which you discuss teaching to the curriculum and teaching to the test (see also comments provided by Reviewer 3).

Response: Findings were situated more fully within existing literature.

Need a section on the limitations of this study (see comments provided by Reviewers 2 and 3).

Response: Limitations section added.

Need a section, with citations, on generalizability for both the survey section, given the low response rate, and the interview section. See also comments provided by Reviewer 3 about concerns regarding your evidentiary warrants.

Response: Citations and text added related to the generalizability of the survey data and the different, purposive selection of the interview subjects.

Please adjust the causal language used/implied throughout, also in accordance with the comments provided by Reviewer 3.

Response: Causal language checked and corrected.

Accept with Revisions

Original Email: Copy/paste "accept with revisions" email with date here