Education Policy Analysis Archives
A peer-reviewed, independent, open access, multilingual journal

EPAA Response Table
Manuscript Number: Insert manuscript number here
Title: The impact of teacher attitudes and beliefs about large-scale assessment on the use of provincial data for instructional change
Manuscript re-submitted: Insert re-submission date here
Reviewer #1’s Comments:
1. I believe there is a need for more situational context given the broad scope of the article, and there are five topics that could be explored more thoroughly to address this. To begin with, a large portion of the data on which the study is based is generated through the administration of the nation-wide teacher survey. I see some items listed in Figure 2. However, it is not clear to me whether or not this is the entire survey. Another item regarding the situational context also centers on the survey items. As I stated above, I am unclear as to whether Figure 2 illustrates the entirety of the survey. If the items shown in Figure 2 represent the entirety of the questions included in the survey, I am not sure which questions are aimed at getting at teacher attitudes. Table 1 does list "…values for questions related to teacher attitudes…", but the actual survey items are not present. Therefore, I am having a difficult time understanding what respondents indicated that relates to attitudes. All of the items in Figure 2 are about actual behaviors exhibited by teachers, and not about teacher attitudes toward LSA. Again, I believe the inclusion of the entire survey instrument would provide a more solid context for the article.
2. An addition to the article that could strengthen the situational context relates to the national policy mentioned. The article states that policy goals set for Canadian LSAs include that there be tests written across various subjects and grades to gauge progress. A description of the grade levels and subject areas in which these LSAs are administered would add greatly to the article. This would provide a point of reference for the reader regarding the study results.
3. One other point regarding the context for the article concerns the timing of the administration of LSAs. In other words, are the LSAs all exit exams, which I infer to mean summative? If some LSAs are formative and some are summative, differences in survey results could possibly be explained according to the requirements of specific teachers and/or specific provinces.
4. The context could also be strengthened with a fuller explanation regarding the
interviews. The article states, “…the interview guide was employed to follow up upon
the same themes as the survey instrument.” It is clear that interviews were conducted
with teachers, administrators, and district-level staff. But, I am not sure what types of
questions were asked in the interview. I believe an outline of the interview protocol
would be very helpful.
5. The first comment I have regarding the Theoretical Framework may be a
philosophical difference of opinion, but I did want to mention it. Figure 2 displays
survey items, and categorizes them according to whether they suggest “Teaching (to)
the curriculum” or “Teaching to the test”. The item, “I have requested additional
resources related to testing” is listed under the “Teaching (to) the curriculum”
category. I am not sure that I agree with this categorization. Asking for testing
resources is very similar to using the format of the test to design practice questions (an
item that appears in the “Teaching to the test” category), in my opinion.
6. Furthermore, the theoretical framework for the research centers on reactivity and
how teachers react when policy includes sanctions or rewards for teachers based on
LSA results. I believe the reader needs to know what stakes are in place for teachers and students in relation to these LSAs. Because the article does not state
what the existing sanctions and rewards are, the framework is a bit difficult to
reconcile.
7. A related topic lies with the Conclusion. In this section it is stated, "Another striking finding that comes from this analysis is that provincial variations in testing policy and test design are quite telling in terms of using teaching to the test strategies." Given my current, and admittedly traditional, thinking regarding the meaning of high-stakes tests and the strategies teachers employ to promote student success on them, I do not find it striking that the provinces without high-stakes exams do not use score-improvement strategies, or that the province with the unique assessment design does not "teach to the test."

Responses:
1. More details about the extent of the survey were provided on p. 9 and in the notes following Figure 2. I have also included the survey outline in the appendix; it includes all the survey questions and the binary values assigned to responses. A reference pointing to the full survey in the appendix is located on page 13.
2. A chart has been added to the appendix for reference. I agree that this information is helpful to the reader (especially one not aware of the Canadian system), but it was originally left out in the interest of space.
3. The chart on provincial testing added to the appendix (see above) should shed some light here. Some discussion related to different provincial testing models has been added on page
4. I have included the survey outline in the appendix. It includes all the survey questions and the binary values assigned to responses. A reference pointing to the full survey in the appendix is located on page 13.
5. I have added some commentary on page 10 that answers this concern somewhat. There are certainly cases where my categorization may not fit the circumstances of a given class or a given student. In this one case, requesting additional resources is posed in contrast to using old exams or test-format practice questions, which are not necessarily the more thorough or variable assessment methods.
6. A good point. I have added some explanation on page 19.
7. Clarification of what is meant by stakes has been added to this paragraph. The Canadian context does make teacher reactivity somewhat more striking than in cases where teachers would be personally sanctioned for poor results (as in the US… sorry).
Reviewer #2's Comments:
1. The author(s) should contrast the results with other relevant studies, state the limitations of the study, and explore the implications of the findings (for teachers, students, policymakers, etc.).

Responses:
1. (a) Some additional relevant studies have been referenced. (b) The limitations were reviewed and a section added on page 16. (c) The implications are found mainly in the conclusions section.

Reviewer #3's Comments:
1. The authors indicate that the data described in this paper constitute a subset of the available data for examining teachers' responses to assessment, so it might be beneficial for them to explore ways to incorporate the findings from the analyses described in this paper into a more comprehensive investigation that could address some of the limitations.
2. Perhaps the most significant limitation of the paper is that the authors claim several times to be examining "impact," but they use a research design and a set of statistical methods that do not support the causal statements they make throughout the paper. The analysis relies on simple OLS models that examine relationships but cannot determine causality, even though the third research question is framed in causal terms ("Which attitude variables have the most effect on the use of data?"). Words like "effect" and "impact" appear throughout the discussion of findings and are used in ways that are inconsistent with the analyses that were actually carried out. Correlations between practices and attitudes could stem to a large degree from factors such as the quality of school curriculum or leadership, and correlations between practices and province could also be influenced by factors related to both rather than reflecting a true causal relationship. The language needs to be much more cautious and should make it clear that this is an exploratory, descriptive analysis.
3. A second limitation is that the authors state that their study examines how teachers use assessment data to improve their instructional practices, but the measures on which they report in this paper provide a very narrow window into the kinds of data-use strategies and practices that teachers might implement. The "teaching to the curriculum" and "teaching to the test" survey items do not seem clearly distinguishable; the curriculum scale includes an item about seeking additional resources related to testing, and an item about group study sessions (which could be focused on test prep; the topics of these study sessions are not specified). Similarly, the teaching to the test scale includes an item about focusing more on tested subjects, which could involve devoting more attention to the curriculum in those tested subjects.
4. Along similar lines, there is no effort to explore the various types of assessments that are available for teachers to use to inform instruction (LSAs, benchmark assessments, teacher-developed tests, data from intelligent tutoring software, etc.). A teacher's propensity to rely on LSA data for decision making is likely to be influenced in large part by the extent to which he or she has access to higher-quality (i.e., more detailed, more timely, or more closely aligned with the curriculum) information from other forms of assessment.
5. The paper would benefit from more detail about the survey scales that are used as dependent variables in the regression models, including information on the items included in each.
6. Finally, the theoretical boundaries between teaching to the test and teaching to the curriculum are not always clear. According to the theory of action that underlies concepts of standards-based reform, in an ideal world the curriculum would be fully aligned with the assessment, such that teaching to the test would constitute teaching to the curriculum. Of course this perfect alignment never occurs in practice, but the extent to which teaching to the test and teaching to the curriculum can be separated depends in part on the degree of alignment between the two. Moreover, the extent to which teaching to the test is problematic is influenced by the quality of the test, including whether the items are predictable from one year to the next. These complex aspects of standards-based reform and assessment are not explored in this paper.

Responses:
1. A limitations section was included on page 16. Addressing these limitations will require further research and publication; I am working on that.
2. More careful wording has been attempted in this draft. I agree that these are correlations, not causal links, so this has been addressed in the text with a careful edit. The third research question has also been subjected to this scrutiny. (A brief illustration of the point follows these responses.)
3. This is not a new criticism, but it is one that I have tried to address here and also for Reviewer 1. Developing the unique reactivity table meant drawing a line between instructional practices which teach most directly to a specific assessment instrument and those that are effective in more general terms. I cannot improve upon Daniel Koretz's phrasing quoted on page 4: "Preparation that leads to improved mastery of the domain is appropriate… Preparation that generates gains largely or entirely specific to the given test is inappropriate because it generates inflation rather than meaningful gains." I think these divergent streams are distinguishable, and they should be. The STF code of conduct, added in this draft, was a useful instrument for this purpose.
4. While this is certainly true, it was not the topic of this paper or the larger study. To focus on other kinds of assessments would have muddied the waters.
5. As mentioned by Reviewer 1, the survey outline and scales have been added to the appendix.
6. Teachers and administrators appear to be aware from their responses that test preparation focused on a specific instrument, of whatever quality, is focused in such a way that instruction narrows. There are many ways to display curriculum mastery, and most tests (with the likely exception of the International Baccalaureate exam) do not allow for unbounded creativity in responses. I don't believe this is a matter of having 'better tests' or even of their alignment with curriculum, which is simply a starting point for any valid assessment. This is a matter of teaching practices working in a reality where the entire curriculum cannot be practically tested, nor can enough high-quality unpredictable items be practically and suitably marked in a standardized way. There are good issues to explore outside of this paper, agreed. But to focus on test design (item types), or to miss the difference between test-focused and more generalized instruction methods, misses the thrust here somewhat. N.B. I do have a paper on teaching to the test specifically, which has been approved pending revisions for publication in Assessment in Education: Principles, Policy & Practice, and which explores this one issue in some depth.
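To make the causal-language point in Reviewer #3's second comment and response 2 concrete, here is a minimal sketch using simulated data and hypothetical variable names ("attitude", "leadership", "data_use" are invented, not taken from the study). It shows why a simple OLS coefficient cannot be read as an "effect": once a plausible confounder such as leadership quality enters the model, the coefficient on the attitude scale changes, revealing that the naive estimate mixed association with confounding.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated illustration only: these are hypothetical stand-ins,
# not variables from the study.
rng = np.random.default_rng(1)
n = 300
leadership = rng.normal(size=n)                   # confounder, omitted from the naive model
attitude = 0.5 * leadership + rng.normal(size=n)  # attitudes correlate with leadership
data_use = 0.6 * leadership + 0.2 * attitude + rng.normal(size=n)

# Naive model: data use regressed on attitude alone.
naive = sm.OLS(data_use,
               sm.add_constant(pd.DataFrame({"attitude": attitude}))).fit()

# Adjusted model: the confounder is included.
adjusted = sm.OLS(data_use,
                  sm.add_constant(pd.DataFrame({"attitude": attitude,
                                                "leadership": leadership}))).fit()

# The attitude coefficient shrinks toward its true value (0.2) once the
# confounder is controlled for; the naive estimate is inflated.
print(f"naive:    {naive.params['attitude']:.2f}")
print(f"adjusted: {adjusted.params['attitude']:.2f}")
```

The same logic applies to the province indicators: an OLS coefficient summarizes an association conditional on whatever else happens to be in the model, which is why the revised draft avoids "effect" and "impact" when describing these results.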
Specific guidelines from the editorial board:
- See comments related to both of the instruments you apparently used, and the need for much more detail, and evidence, there.
- See related comments regarding your methods: much more detail is required there as well, particularly in terms of data analysis (e.g., what evidence is there that results matched how you theoretically positioned items into constructs; see also comments provided by Reviewers 1 and 3 about items and evidence that the items/constructs used actually map onto the constructs purportedly being assessed). See also the need for additional analyses as evidence (e.g., factor analysis, alpha, perhaps alpha with items deleted).
- See comments on "context" in general, and specifically those provided via Reviewers 1 and 3 (e.g., on the stakes attached to LSAs as per your theoretical framework, other options, and whether this, or the lack thereof, conflated your results).
- What are the specific recommendations you are making regarding policy, teaching, and learning (see comments provided by Reviewers 1 and 2)?
- Resituate findings within the literature (see comment provided by Reviewer 2).
- Also strengthen, with literature, the sections in which you discuss teaching to the curriculum and teaching to the test (see also comments provided by Reviewer 3).
- Need a section on the limitations of this study (see comments provided by Reviewers 2 and 3).
- Need a section, with citations, on generalizability for both the survey section, given the low response rate, and the interview section. See also comments provided by Reviewer 3 about concerns about your evidentiary warrants.
- Please adjust the causal language used/implied throughout, also in accordance with the comments provided by Reviewer 3.

Responses:
- Survey added to appendix.
- Details about the alpha and Spearman's analyses were clarified. The survey responses were also correlated better to a guiding code of professional conduct (as in the original study). (See the sketch after this list.)
- Appendices added to clarify context, as well as text added to try to illuminate the unique Canadian context of the study.
- Each of the three main conclusions has an added sentence related to a specific recommendation.
- Findings were situated more fully within the existing literature.
- Limitations section added.
- Citations and text added related to the generalizability of the survey data and the different, purposive selection of the interview subjects.
- Causal language checked and corrected.
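Since the board asks for reliability evidence such as alpha and alpha-with-items-deleted, a short sketch of those computations may be useful. This is a generic illustration with fabricated, correlated 0/1 scored responses; the function names and data are hypothetical, not the study's actual items or results.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scored responses."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def alpha_if_deleted(items: np.ndarray) -> list[float]:
    """Alpha recomputed with each item dropped in turn, to flag weak items."""
    return [cronbach_alpha(np.delete(items, j, axis=1))
            for j in range(items.shape[1])]

# Fabricated binary survey responses: 200 respondents, 5 items that all
# load on one latent trait, so the items are positively correlated.
rng = np.random.default_rng(0)
trait = rng.normal(size=(200, 1))
scores = ((trait + rng.normal(size=(200, 5))) > 0).astype(float)

print(f"alpha = {cronbach_alpha(scores):.3f}")
print("alpha if item deleted:", [f"{a:.3f}" for a in alpha_if_deleted(scores)])
```

An item whose deletion raises alpha is a candidate for removal from the scale; reporting alpha alongside alpha-with-items-deleted is the kind of evidence about item-to-construct fit that this guideline points toward.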
Accept with Revisions Original Email: Copy/paste “accept with revisions” email with date here