"The private history of a campaign that failed": Possible lessons from a Three-Year School Effectiveness Randomized Control Trial

Eugene Schaffer, University of Maryland—Baltimore County, [email protected]
Sam Stringfield, University of Louisville, [email protected]

A presentation at the International Congress for School Effectiveness and Improvement, Limassol, Cyprus, January 6, 2011

Objective

Our objective is to present the design, implementation, outcomes, and possible policy implications of a three-year randomized control trial (RCT) of "effective schools correlates" in elementary schools serving relatively high concentrations of high poverty students.¹

Background

For over 40 years, school effectiveness researchers have sought to identify "correlates" of school effectiveness. Their methods have ranged from case studies to HLM. These efforts have produced laudable progress (Teddlie & Reynolds, 2000; Townsend, 2007; Weber, 1971). For over 70 years in the U.S.—and perhaps longer elsewhere—well-regarded educational scholars and researchers have attempted to demonstrate that they could produce measurable gains in student achievement (Nunnery, 1998; Stringfield & Teddlie, in press). This enterprise has faced a consistently daunting set of challenges, ranging from process and product definitions and instrumentation to The Implementation Gap (Supovitz & Weinbaum, 2008). Fortunately, the urge toward documented educational reform is strong, and a broad range of persons has periodically documented success. Perhaps the best documented of these is Success for All (SFA; Slavin & Madden, 2000; Borman et al., 2007). However, Borman, Hewes, Overman, and Brown (2003) conducted an extensive meta-analysis of over two dozen whole-school reform designs and hundreds of studies in the United States, and found that most lacked empirically strong evidence of their effectiveness.
The juxtaposition of stable correlates with often-frustrated change efforts—nowhere more clearly discussed than at ICSEI—raises the potentially troubling specter taught to first-year graduate students: "correlation is not causation." Perhaps the "correlates" are real but not useful as levers, or perhaps the correlates can be among the levers but are necessary but not sufficient. What has been needed is a series of methodologically rigorous studies testing the ability of school effectiveness correlates to affect schools' levels of effectiveness.

The Effective Schools for the 21st Century (ES-21) project

The purpose of ES-21 was to test the ability of seven school effectiveness correlates (Taylor & Bullard, 1995) to improve the academic achievement of students attending relatively high poverty schools in the United States. The remainder of this paper will describe the research methods, the intervention, the study's results, and possible conclusions.

¹ Mark Twain is considered by some to have been both America's finest humorist and finest novelist. He was a young man when America's Civil War broke out, and for less than a week he was a member of a Southern militia. Twenty years after the war he was asked by a magazine to write an almost certainly humorous "private history" of his military "service." In 1885 he authored "The private history of a campaign that failed." It remains a classic short story in the American store of literature, and its title serves as a "jumping off point" for a reflection on the study presented at ICSEI.

Design

The overarching design for ES-21 was a "gold standard" randomized control trial (RCT) effectiveness study. As described by Shadish, Cook, & Campbell (2002), proactive random assignment "reduced the plausibility of alternative explanations for observed effects" (p. 247).
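As a minimal illustration of the kind of proactive random assignment the design rests on, the following sketch (school names and pairings are invented for illustration; this is our reconstruction, not the project's actual procedure) flips a "coin" within each demographically matched pair of schools:

```python
import random

# Hypothetical matched pairs of demographically similar schools. One member of
# each pair will receive the intervention; the other continues business as usual.
matched_pairs = [
    ("Oak Elementary", "Maple Elementary"),
    ("River Elementary", "Hillside Elementary"),
]

def assign_pairs(pairs, rng=random):
    """Randomly assign one school per matched pair to the experimental condition."""
    assignments = {}
    for school_a, school_b in pairs:
        if rng.random() < 0.5:  # the coin flip
            assignments[school_a], assignments[school_b] = "experimental", "control"
        else:
            assignments[school_a], assignments[school_b] = "control", "experimental"
    return assignments

print(assign_pairs(matched_pairs))
```

Because assignment happens within matched pairs, each pair always contributes exactly one experimental and one control school, which is what lets later contrasts rule out demographic differences as an alternative explanation.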
For several years in the early evolution of the federal "What Works Clearinghouse" (WWC, http://ies.ed.gov/ncee/wwc/), a substantial priority was placed on randomization of units of analysis.² Having seen the challenges faced by Robert Slavin and others in attempting to create a truly RCT design for a federally funded study of Success for All (Borman, Slavin, Cheung, Chamberlain, Madden, & Chambers, 2007), and Cook's RCT and near-RCT studies of the Comer School Development Program (Cook, Habib, Phillips, Settersten, Shagle, & Degirmencioglu, 1999; Cook, Murphy, & Hunt, 2000), the team was curious and guardedly optimistic about conducting a school effectiveness RCT.

The unit of analysis was the school. Power analyses indicated that the project would need at least 30 schools (15 experimental and 15 control) to have a reasonable probability of finding significant differences where significant differences existed, so the design called for random assignment of four sets of four experimental and four control schools in a total of four states (with apologies to T.S. Eliot, "The Four Quartets").

Though the federal Institute of Education Sciences (IES) had not made the distinction at the time of the proposal to conduct this study (i.e., early 2004), today IES would describe such an ambitious project as a "Goal Four" or "effectiveness" trial. IES effectiveness trials assume a fully developed and previously implemented intervention, in which the majority of funds go to the research effort and not to development. IES funds effectiveness trials at up to $6 million over five years (IES, 2009), more than quadruple the level of funding of ES-21.

² While the WWC retains an understandable preference for RCTs, it has evolved an increased tolerance for carefully designed and conducted quasi-experiments. In a keynote speech at the Society for Research on Educational Effectiveness, Tony Bryk declared that he never again wanted to hear the phrases "Randomized Control Trial" and "Gold Standard" uttered in the same paragraph. It would appear that the primacy of the RCT is waning.

Sampling Frame and Final Sample

The primary conditions for the sampling frame were as follows:

a. A key issue in creating a "gold standard" true experiment is the random assignment of units (in ES-21, schools) to experimental and control conditions (see Shadish, Cook, & Campbell, 2002, chapters 8 and 9). In the proposal to the Olin Foundation, we proposed conducting such a true experiment. Hence, the first sampling requirement was that the LEAs and schools potentially involved had to agree to forward twice as many schools as would eventually be provided services, and to let the research team either flip a coin or consult a table of random numbers to choose experimental and control schools.

b. Given the power analysis' identification of a minimum school sample size of 30, the team proposed obtaining participation from 32 schools, set as four groups of four experimental and four control schools in each of our districts (the experimental sample would thereby be "Four Quartets").

c. The schools were to come from four states. This was to minimize the likelihood that changes in policy in any one state would dramatically affect results in the overall ES-21 study. (For an example of such disruption, see Datnow et al., 2003.)

d. To the extent practical, all schools in any one state were to come from the same LEA. The goal was to ensure that all schools, experimental and control, within each state operated in the same local policy environment. When the team was unable to fill the frame with eight relatively high poverty schools per district, an exception was made for two small LEAs in South Carolina. The two LEAs offered four relatively high poverty schools each that met the criteria. In that case, the matched controls for each experimental school did come from their same LEA.

e.
The effective schools movement and research base have both had a strong equity focus, and the sample was drawn with a conscious effort to represent that perspective. To the extent possible, all schools were to serve student populations that were above their LEA's average for poverty (measured by percentages of students qualifying for the federal free and reduced-price meals program, FARM).

f. At the student level, the ES-21 sample consisted of all of the students in the second and third grades of all experimental and control schools at the beginning of the project. These samples were followed for three years. Students who left the experimental or control schools were dropped from the sample, and the sample was not refreshed with entering students. (Data on initial and final sample characteristics are presented later in this report.)

The team was aware of the challenges that had been faced both by the Success for All (SFA) RCT (Borman et al., 2007) and by others. For the SFA project, identification of a large enough sample had required repeated adjustments to the offer made to schools and LEAs (including, at one point, offering all schools that "lost" the coin toss and could not implement SFA for three years a cash consolation prize of over $10,000 each). However, we believed that because the Effective Schools procedures would be less prescriptive than those in SFA, and because many districts had already implemented some components of the original School Effectiveness principles, we would face fewer challenges in identifying a full sample of 32 schools. We were wrong.

Over the first 18 months of the ES-21 project, the team invited 17 LEAs in six states to nominate schools for participation. All expressed initial interest, but 12 declined to participate. In each case, the stated cause of their declining to participate was the random assignment requirement.
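The 30-school minimum identified by the power analysis is consistent with a standard two-group, normal-approximation sample size formula. The sketch below is our reconstruction under illustrative assumptions (alpha = .05 two-sided, power = .80, and an assumed school-level effect size of d = 1.0), not the project's actual calculation:

```python
import math
from statistics import NormalDist

def schools_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-sided, two-sample comparison:
    n per group ~= 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the standardized difference between group means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = .05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = .80
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# With the illustrative school-level effect size of d = 1.0, about 16 schools
# per condition are required -- in line with the reported minimum of 30.
print(schools_per_group(1.0))  # -> 16
```

The formula treats schools (the study's unit of analysis) as the observations being compared; a full cluster-randomized power analysis would also account for within-school variance.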
The 12 LEAs were unwilling to have either coin flips between demographically matched schools or use of a table of random numbers determine which schools would or would not participate. In the end, all five of the LEAs that agreed to participate did so in part as a result of longstanding relationships with one or more of the research team members. In Kentucky and South Carolina, the prior connection was with Dr. Stringfield and his associates. Dr. Schaffer's prior relationships were critical in the North Carolina LEA, and Dr. Chrispeels' prior work with a specific district in California provided entrée into that district. In every case, the stumbling block that had to be overcome through careful negotiation was the issue of random assignment of schools to treatment or control conditions. (In one case, even after the LEA agreed to participate, a central office person wanted to "pick" which schools were "randomly assigned" to which condition. The ES-21 team insisted on controlling the random assignment process.)

Once districts and subsets of their schools had agreed to the randomization process, schools were demographically matched, and involvement in the experimental vs. "business-as-usual" comparison sites was determined through coin flips. Two further sampling facts merit noting. First, the California district was sufficiently enthused about the project that it requested that the study admit 10 schools to the assignment process and five to the experimental condition (as contrasted with the "8 and 4" specified in the design). This request was granted, so that the total number of schools in the study rose to 17 experimental and 34 total. Second, one school's principal apparently agreed to participate in the project on the presumption that her school would be a control. In fact, the school was randomly assigned to the experimental condition, and that school withdrew from participating in the training at the end of year one.
That school has been dropped from third-year analyses.

Students. Two cohorts of students were followed for three years in each school: second- and third-graders (in the U.S., typical ages 7 and 8).

Outcome Measure. None of the districts agreed to allow the research team to administer a study-wide standardized achievement measure, so the four different states' state-wide tests were accepted as outcome measures. States' and Local Education Authorities' (LEAs') norm-referenced achievement tests in Reading and Mathematics were gathered and used in all outcome analyses. This essentially changed the outcome-focused portion of the study from one study with 34 schools to four studies with 8-10 schools each. Four cycles of achievement data (pre-, and end of years 1, 2, and 3) were gathered on all students who remained in their cohorts over the three years.

Process measures. In addition to the quantitative data, extensive observations of professional development activities, interviews with principals, and focus groups of teachers were gathered and analyzed, following Miles and Huberman (1994). These were gathered by teams of observers and interviewers over the three years of the project. Additional data were gathered from and by the developers during this period. Process data were analyzed by a team of qualitative specialists whose major findings are reported independently of this report and in a dissertation (Pickup, 2010) on the implementation of the program in the Kentucky schools.

The ES-21 Intervention: Content

The content of the intervention was to be a state-of-the-art compilation of school effectiveness information (e.g., Datnow, Lasky, Stringfield, & Teddlie, 2006; Purkey & Smith, 1983; Teddlie & Reynolds, 2000; Taylor & Bullard, 1995).
The core content of the three years of professional development was derived from Taylor's review of correlates of effective schools, "OHCFISH" (pronounced "oh see fish"):

Opportunity to Learn/Time on Task
High Expectations
Clear School Mission
Frequent Monitoring
Instructional Leadership
Safe and Orderly Environment, and
Home-school Relations.

The initial development team, primarily Drs. Stringfield, Charles Teddlie, Debbie McDonald, and Janet Chrispeels, concluded that, given the changing contexts of the 21st century (especially the addition of the federal No Child Left Behind act, NCLB) and the ongoing evolution of the school effects/school improvement research bases (Teddlie & Reynolds, 2000; and others), three additional dimensions needed to be added to the most fundamental aspects of the OHCFISH definition of school effectiveness.

The first was an increased focus on standardized test scores. Under NCLB all schools are required to gather and report scores on state-chosen tests for over 95% of all students in all grades, both in total and disaggregated several ways. The first result was an increase in testing in most states, and the second was a greatly increased focus on raising the scores on each state's tests for individual students and various aggregations of students in each school. This has led to a greater national focus on the use of test scores as part of the "Frequent Monitoring" component of effective schools correlates. It has also led to a remarkable level of "teaching test-taking skills" and "teaching to the test." For the ES-21 team, the important point was to increase our focus on creating maximally thoughtful, student-relevant use of data as a major component of virtually every phase of the project. To have done otherwise would have been to be seen as irrelevant in many contexts.

Second was explicit involvement of the school district.
If the district administration doesn't signal the importance of a reform and "buy in" to the steps necessary to initiate and sustain it, then the reform's chances of deep implementation and long-term survival are minimal (Datnow & Stringfield, 2000; Datnow, Lasky, Stringfield, & Teddlie, 2005). Similarly, Chrispeels, Castillo, and Brown (2000) had pointed out the value of working through "leadership teams" consisting of principals and groups of key teachers when working to improve elementary schools. Hence, we explicitly added leadership teams as the prime movers of change within individual schools.

Finally, Nunnery (1997) had reviewed over a half century of research on school change efforts and had concluded that in virtually every study, variance in levels of implementation had been much larger than developers had anticipated, and had been an excellent predictor of the levels of success achieved by individual schools (see also Stringfield, Millsap, & Herman, 1997; Supovitz & Weinbaum, 2008). Hence, the team worked to build higher levels of reliability into the implementation process.³

Initially, the team believed that adequate or nearly adequate training materials could be used "off the shelf." For example, the Olin Foundation had paid Phi Delta Kappa International to develop a several-inch-thick set of materials for trainers to use in running school effectiveness workshop series. Similarly, a group had been funded at the University of Wisconsin to operate a School Effectiveness (SE) center. Dr. Larry Lezotte and others had offered SE training for years and had a great many materials that Dr. McDonald believed we could use. This proved to be an overly optimistic assessment of each aspect of the situation. An early, expensive (in time and resources) lesson was learned. The PDK materials were inadequate, and existing SE teams would not share their materials (although Dr.
Lezotte did sell dozens of volumes of one of his books to the ES-21 project). ES-21 would have to develop its own materials.

³ For an alternative example of efforts to build High Reliability Organization processes into a school improvement design, see Stringfield, Schaffer, & Reynolds (2008).

The ES-21 project hired a nationally known trainer to pull together materials for the three years of workshops. After examining the Phi Delta Kappa materials, the intended-to-be-full-time trainer and Dr. McDonald determined that those materials were not sufficiently developed and refined for practical use. (This judgment was later shared by Dr. Chrispeels and subsequent developer/presenters.) We then examined a range of other materials. After six months of minimal progress and a range of other issues, the services of the first developer/trainer were discontinued, and the team hired Dr. Eugene Schaffer of the University of Maryland—Baltimore County to work with Drs. McDonald and Chrispeels on further development. Dr. Schaffer offered decades of experience working on a range of teacher- and school-improvement projects, including a nine-nation school effects study (Reynolds, Creemers, Stringfield, Teddlie, & Schaffer, 2002) and the High Reliability Schools reform effort in England and Wales (Stringfield, Reynolds, & Schaffer, 2008).

Drs. McDonald and Schaffer, aided by Dr. Stringfield and, over time, Drs. Janet Chrispeels, Peggy Burke, and others, began turning research on School Effectiveness into a three-year series of workshops. While a rough outline of the three years was completed during the first 12 months of the grant, substantial refinements of the process continued through the full three years of implementation. The project was hampered by the fact that several of the processes necessarily were still being developed and/or significantly modified over the course of the project.
The ES-21 Delivery Process

The ES-21 content was delivered through a multi-layered, multi-school, three-year training of leadership teams. The leadership teams included members of each Local Education Authority (LEA), each school's principal, and a teacher leadership team from each school. The teacher leadership team was to include one teacher from each grade. Training was offered at each of the five LEAs. The choice of engaging multiple layers of the local educational hierarchy was based on relatively recent research (e.g., Datnow, Borman, Stringfield, Rachuba, & Castellano, 2003; Datnow, Lasky, Stringfield, & Teddlie, 2006) indicating that each of the levels needed to be actively engaged in the reform, and that their efforts needed a coordinated focus.

These multi-layered teams received training from a team of two well-prepared, highly experienced trainers on multiple occasions throughout each school year. Typically, these within-district professional development experiences were offered in six full-day workshops spread over each of the three school years. An additional layer of training was provided for the sets of local leaders via annual cross-district trainings. This final level of design-team-led, cross-LEA training copied training provided by virtually every comprehensive school reform design group in the U.S. (e.g., Success for All (Slavin & Madden, 2000), the Comer School Development Program (Comer, 2004), and the New American Schools designs (Stringfield, Ross, & Smith, 1996)) and in the U.K. (Stringfield, Reynolds, & Schaffer, 2009). The implementation model called for the trained local leaders who had participated in the six days of training each year to then provide professional development and coordinated ES-21 leadership for the remaining teachers in the ES-21 experimental schools. The control schools were to receive no intervention, and continued in a "business as usual" fashion. This topic will be re-visited in the findings section.

Results

Qualitative findings.
Implementation was uneven across the sites, with changes in administration and teaching staff a powerful influence on schools' ability to make significant progress in implementing the elements of ES-21. Many changes were incorporated in classrooms, particularly the use of data as a decision-making tool in the schools. The translation of data into instructional change was less obvious, and appears to be a complex outcome that involves many efforts on the part of classrooms, grades, and schools. The use of horizontal and vertical "slices," or integration of curriculum based on student learning, proved to be an eye-opener for most teachers and schools; building effective instruction that crosses grades and classrooms takes significant effort. At the grade and classroom level, the impact of ES-21 was diffuse, or not understood by all teachers as a reform. This may not be a negative, insofar as it reflects the integration of the process and direction of the reform into the very fabric of the school.

Quantitative (Achievement Test Analyses)

Because each state used a different testing program, and because some states either changed testing series across grades or changed tests during the years of the ES-21 project, all achievement scores for all students (and aggregated to schools and experimental conditions) have been normalized and are presented as Z-scores. The effect is that in each case, the students from a given district will have scores with a mean of 0.0 and a standard deviation of 1.0. This follows the logic of "true experiments," in that the comparisons are exclusively between experimental and control students, not against a state or national norm.

The study design called for following two cohorts of elementary students over three school years. The cohorts were in second and third grades as the study began. Given that no district would allow the research team to conduct fall-of-second-grade testing, none of the second grade cohorts has a pretest.
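The within-district normalization described above can be sketched as follows (the scores here are invented for illustration; the study's actual data came from four different state testing programs):

```python
from statistics import mean, pstdev

# Hypothetical raw test scores keyed by district. Each district's tests are on
# a different, incompatible scale, as with the four state testing programs.
raw_scores = {
    "District A": [412.0, 455.0, 430.0, 498.0],
    "District B": [61.0, 74.0, 55.0, 80.0],
}

def normalize_within_district(scores_by_district):
    """Convert each district's scores to Z-scores (mean 0.0, SD 1.0 within each
    district), so experimental/control contrasts never cross testing programs."""
    z_scores = {}
    for district, scores in scores_by_district.items():
        m, sd = mean(scores), pstdev(scores)
        z_scores[district] = [(s - m) / sd for s in scores]
    return z_scores
```

Because the standardization is done separately per district, a Z-score of +0.5 means "half a standard deviation above that district's mean," which is exactly the experimental-vs-control comparison the design requires.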
Whereas analyses of students in the third grade cohorts include free and reduced-price meal status, race, and second-grade pretest scores as covariates, the second grade cohorts have no first-grade pretest. This necessarily puts greater emphasis on the third grade cohort, the only cohort to have a full set of achievement measures from pre- through post- (i.e., from second through fifth grade). The research team gathered data on both cohorts, and regards the presence of two cohorts' analyses as more valuable than one. Nonetheless, in all cases greater weight should be placed on results from the third grade cohort data.

A full set of detailed quantitative analyses is available from the authors. A summary table of achievement contrasts (ES-21 vs. controls) by district and state is provided in Table 1.

Table 1: Effective Schools for the 21st Century Three-Year Achievement Summary
Significant Differences Favoring Experimental or Control Groups by State, Grade Cohort, and Content Area (Reading and Mathematics)

                            Reading                  Mathematics
State                Grade 2-4   Grade 3-5     Grade 2-4   Grade 3-5
Kentucky               n.s.        n.s.          n.s.       Controls
North Carolina       Controls      n.s.          n.s.        n.s.
South Carolina
  District 1           n.s.        n.s.          n.s.        n.s.
  District 2         Controls      n.s.        Controls      n.s.
California           Controls    Controls        n.s.      Controls

Overall, thirteen (13) of twenty comparisons produced non-significant results, and seven (7) produced results favoring the control schools. In this "true experiment," none of the analyses of achievement test scores favored the ES-21 groups of schools.

Discussion

The three-year student achievement patterns summarized in Table 1 are counter-intuitive. Literally decades of research suggest that an effective schools intervention, delivered by strong, highly experienced, practically grounded trainers, should produce achievement results favoring the experimental schools. But that did not happen.
In this paper we consider possible explanations, in reverse order of what we believe to be their validity in explaining the findings summarized in Table 1. The explanations derive from a combination of formal and informal qualitative observations of the three years of ES-21 development and implementation. In the pages that follow, the explanations we regard as least likely are presented first.

1. "School-effectiveness"-based whole-school reform efforts are not valid. They do not "push the right levers." We reject this explanation. There is far too much prior research indicating the value of the seven "OHCFISH" correlates to justify this conclusion. Further, teachers and several principals reported valuing the workshops, and the California district has contracted with the ES-21 trainers to continue their work. The ES-21 program appears to have "face validity."

2. The schools and teachers didn't "buy in." If teachers and principals rejected the ideas in ES-21, the reform was doomed. Members of the research team have studied literally dozens of reform efforts, and a consistent finding from those efforts was that if teachers and principals harbor deep reservations about a reform, it is doomed. Beginning in year one, we conducted teacher focus groups and informal data-gathering activities. Our team probed consistently, and yet heard very few reservations about ES-21. The substantial majority of teachers and principals in the ES-21 study reported buying into both the correlates and the leadership-team implementation approach. Further, the fact that 16 of 17 experimental schools continued in the program through three years of intervention suggests that most principals and teachers saw value in ES-21.

3. The intervention was not sufficiently intense to produce desired results. From the beginning, Dr. Chrispeels expressed the belief that the study lacked sufficient intensity to produce positive effects on state tests.
She cautioned that six whole-day workshops per year with only telephone follow-up were simply not sufficient to produce the desired impact on student outcomes. Stringfield countered that Stringfield, Reynolds, and Schaffer (2008) had delivered a similarly low-intensity intervention in two British local authorities and had gotten very substantial gains in student outcomes from a somewhat similar intervention. It seemed plausible that this effort could have similarly positive effects after a total of 18 days of workshops over a three-year period. On the other hand, there were few competing efforts in the United Kingdom to dilute the effects of the reform.

4. Inadequate funding to provide a sufficiently intense intervention. Research teams that win Institute of Education Sciences "Goal 4" awards (large-scale, random assignment effectiveness trials) are funded at up to $6.0 million, just for the research components. Goal 4 assumes the presence of a well developed program and supporting materials. That is over four times the funding for ES-21. The "New American Schools" designs (see Stringfield, Ross, & Smith, 1996) each received millions of dollars in external support before being asked to serve more than a handful of schools. Certainly additional funding would have helped, and the ES-21 team applied for IES funding, but did not receive the additional, external support.

5. The efforts need more time to show results on state test scores. This is plausible, and the research team intends to follow the schools over several more years to see if they produce a "sleeper effect" not unlike that found in the English district in the High Reliability Schools project (Schaffer, Stringfield, & Reynolds, 2008).

6. Knowledge from ES-21 training migrated into control schools as well.
In the California schools, there was extensive documentation of the extent to which ES-21 concepts and processes were being picked up by all the schools in the district. Less formal data suggested that a similar "bleeding over" of information and skill development occurred in some of the control schools in the other districts. Obviously, if controls were doing more than "business as usual," their gains might be evidence of ES-21 effects. However, it is not intuitively obvious why control schools, receiving ES-21 information in less depth, would achieve greater gains in several areas than the experimental schools.

7. Frequent lack of systemic supports. Four of the five LEAs changed superintendents during the project, and all of the new superintendents had priorities that did not focus on ES-21. Schools often faced conflicting demands and worse. In one LEA, a year's professional development days were carefully worked out in advance by the ES-21 team and the principals. At every step as the schedule was being developed, the LEA's ES-21 liaison was included in the emails. Only after the principals, teachers, and ES-21 staff had agreed on what they believed was a fixed, final schedule did the LEA person inform the group that all workshops that had been planned on Mondays or Fridays (i.e., all of the workshops) would have to be rescheduled, because the LEA would not allow Monday or Friday professional development release time. While probably unintended, the message that principals and teachers inferred was that ES-21 wasn't important to the LEA. Contradicting evidence was provided, however, by the California district. Its superintendent came to virtually every workshop, repeatedly took extra steps to make the reform happen, and has now hired the California ES-21 team to train three more schools. Yet the California achievement data were, if anything, less encouraging than data from other, less-supported LEAs.
8. Attempting to conduct an effectiveness trial while having to develop the intervention materials. Over the past five years, the Institute of Education Sciences has developed a multi-layered set of options for seeking funding (http://ies.ed.gov/funding/ncer_progs.asp). A first level of funding involves examination of data sets to examine relationships among variables, and to potentially identify relevant, malleable variables. The second level, "Goal 2," involves "development" studies, in which programs that are theoretically promising but not well developed are given three years of funding to work out the specifics of a reform design in a few schools. "Goal 3" is concerned with efficacy trials, in which fully developed reforms are tried out in a moderate number of schools. Goal 3 studies are funded for four years and up to $3 million. "Goal 4" studies are "gold standard" RCTs. They are funded for five years and up to $6 million.

"The Four Quartets" was proposed under the impression that multiple sets of fully developed, high-quality school effectiveness training materials existed and would be made available to the ES-21 team. This proved to be inaccurate. Members of consecutive development/implementation teams found the Phi Delta Kappa materials inadequately developed to be of practical use. Other "effective schools" implementation teams that offer for-profit training declined to make their materials available for the project. As a result, the ES-21 team was left doing "Goal 2" development work while attempting to conduct a "Goal 4" RCT effectiveness trial. ES-21 was conducted on a "Goal 1" budget. The ES-21 team made substantial sacrifices throughout the project to avoid having to shut the project down before the completion of three years of intervention and data gathering. As one example, zero percent of the P.I.'s salary has been charged to this grant in over two years.
Were it not for the generous understanding of two consecutive College of Education deans, the project would have run out of funds. In retrospect, the Olin Foundation grant, for which we have all been very appreciative, was stretched far too thin. When additional development work, on-site intervention work, data-gathering, and analytic work were needed, the efforts came as non-funded time contributions. 9. The sheer instability in the ES-21 schools, and perhaps in much of American education today, made establishing and sustaining reform daunting at best. The ES-21 schools faced instability at every level. During three years of intervention, four of five LEAs changed superintendents. At least 12 of 17 schools changed principals, and one school had five principals in a single year. Nearly 40 principals served the 17 schools over the course of the project. In three years, several of the schools had over 100% turnover in their teacher leadership teams. A few had near-100% turnover annually. When one North Carolina ES-21 principal moved to the principalship of a newly opening school in the same district, she hand-picked over a half-dozen fine teachers from her original school to go with her. The excellent principal and half of the leadership team were gone in a single stroke. In the majority of cases, the persons replacing the original principals and teachers were competent, caring professionals. But in every case they either needed to be tutored in the specifics of ES-21, or, understandably, they were less than fully supportive of the program. When attempting to provide professional development to middle school science teachers in a high poverty LEA, Ruby (2002) found that teacher turnover meant that entry-level training became a near-constant part of the task. In ES-21, the task was more complex, as the needed professional re-development could be at the superintendent, principal, or teacher level, or all three at once.
10. The No Child Left Behind legislation has so altered the landscape that interventions must also be substantially altered. Throughout ES-21, developers, trainers, and evaluation team members were struck by the extent to which the schools' and districts' perceived sets of options had been altered by NCLB.
a. In general, schools and LEAs now have a much greater focus on test scores and methods for raising test scores. Teaching "test taking skills" appears to be a near-universal component of schools' curricula. This was true in every state. It is possible that the ES-21 schools spent substantially less time "teaching to the test" than most controls. While this more balanced focus may have increased student achievement, it would do so without necessarily raising test scores in the experimental schools.
b. Curricula in all districts were more focused on reading and mathematics.
c. School days were altered to focus on areas measured under NCLB.
d. "Data" of all sorts, but particularly data from each state's annual, NCLB-mandated testing program, are now ubiquitous in schools. Interestingly, districts' professional development for teachers and administrators on the uses of data appeared to be of indifferent quality, while praise for the ES-21 data-use workshops was nearly universal. Aspects of those workshops were clearly being shared across several of the districts. Still, in follow-up interviews after the three years of intervention, the "data use workshops" were among the most remembered and most praised-by-teachers-and-principals parts of the program.
e. Whereas in many previous studies conducted by the P.I., district officials had been content to let interventions "run their course" and then evaluate the results, in the post-NCLB world district staff were often eager to observe trainings and, to the extent they could, replicate aspects of the training in other schools, including control schools.
Implications for Future Research and Practice
Implications for Future Research
The first implication of the ES-21 project for research is an endorsement of the current IES Goal sequence. ES-21 was to be, in IES's current language, a "Goal 4 effectiveness trial." Such trials should only be undertaken when the reform, in all of its specifics, is well developed and previously tested. The second, cautionary implication is that the full costs of an effectiveness trial can easily be underestimated. A shortage of resources can hobble both the intervention and the research sides of a reform. Third, substantial efforts were made in ES-21 to protect both the internal and the external validity of the design. The random assignment of over 30 schools to experimental or control conditions was one of several steps taken to guard internal validity. The involvement of schools in four states was a deliberate effort to maximize external validity. Armed with 20-20 hindsight, the team would have been well advised to pick one or the other and settle, as Cronbach et al. (1981) advised, for making an imperfect but well-crafted contribution to the field, rather than trying to produce a single, utterly compelling trial. The fourth implication is a speculation. We wonder about the viability of school-level random assignment to experimental and control conditions. In the Borman et al. (2007) random assignment study of Success For All, the research team required two full years to identify a group of schools that would hold a teacher vote, obtain an 80% commitment to the program, and then agree to a coin-flip determining (in the first year of recruitment) participation and (in the second year) implementation in grades K-2 or 3-5, but not both. The resulting study produced a smaller effect than several previous SFA studies. It is possible that the smaller effect was the result of involving only schools that were willing to chance not being involved.
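The logic of school-level random assignment discussed above can be illustrated with a minimal sketch. This is not the ES-21 team's actual procedure; the district and school names are hypothetical, and the even split within each LEA is an assumption made purely for illustration of stratified, district-by-district assignment:

```python
import random

def assign_schools(schools_by_district, seed=None):
    """Randomly split each district's schools into experimental and
    control groups, stratifying by district so that both conditions
    are represented in every LEA (an illustrative assumption)."""
    rng = random.Random(seed)
    assignment = {}
    for district, schools in schools_by_district.items():
        shuffled = list(schools)
        rng.shuffle(shuffled)          # the "coin flip" for each district
        half = len(shuffled) // 2
        for school in shuffled[:half]:
            assignment[school] = "experimental"
        for school in shuffled[half:]:
            assignment[school] = "control"
    return assignment

# Hypothetical example: two districts with four schools each.
districts = {
    "LEA-A": ["A1", "A2", "A3", "A4"],
    "LEA-B": ["B1", "B2", "B3", "B4"],
}
groups = assign_schools(districts, seed=42)
print(groups)
```

Fixing the seed makes the lottery reproducible and auditable, which matters when districts must be shown that assignment was fair; it does not, of course, address the selection-bias concern raised above, which arises before the lottery is ever run.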
Logically, such schools could be presumed to be less committed to the reform effort. Similarly, in the British High Reliability Schools project, the two districts that involved all the secondary schools in their districts in a team effort achieved laudable results (Stringfield, Schaffer, & Reynolds, 2008; Schaffer, Stringfield, & Reynolds, 2008), but the district in which half of the secondary schools participated got no benefit in terms of student outcomes. It is possible that a significant part of the value in a reform is achieved by all principals talking together about, and learning from, their shared experiences. Thirteen districts that wanted to participate in ES-21, and expressed a willingness to be highly supportive, withdrew rather than have their schools go through a lottery to see whether they would be allowed to participate in the reform. While one cannot know "what would have been if…", one can know that the late Matt Miles' most fundamental advice when considering involvement in a school reform was, "Pick a reform and go at it hard!" If a group wants to go at something hard, why would they risk a lottery? While we fully endorse the logic of "true experiments" at the individual student and classroom levels, we are inclined to believe that in the SFA case, and even more in this case, forcing a lottery system on potentially willing samples of schools may bias the selection process generally, and specifically reduce the probability that schools that want to "go at it hard" will agree to participate.
Implications for Future Reform Efforts
First, picking up on the last research implication, if a school is not committed to a vigorous, long-term improvement process, it should not bother to start. Several of the ES-21 schools were curious and somewhat interested throughout, but never displayed a widely shared passion for academic improvement.
Perhaps the most driven, committed principal in the study used ES-21 to produce dramatic two-year gains at her school. She was then promoted to central office, and the hard-won gains collapsed. Second, groups seeking achievement gains should assume that they are making a commitment for several years. One, two, or even three years of work are more than a prelude but less than an institutionalized reform. Third, as Fullan and Miles (1992) observed, change is "resource hungry." It is not practical to assume that all of the responsibility for creating change will be borne by an outside group. Some of the ES-21 principals found ways to leverage the training ES-21 provided. Others did not. The external group cannot do it alone. Fourth, NCLB is changing the educational landscape and the imperatives facing schools. ES-21's P.I. edits a journal focused largely on Title I/NCLB, and yet he and the full team were struck by the extent to which a focus on raising scores on single state tests has come to dominate the decision processes in districts and schools. Obviously, to the extent that this forces a focus on the academic achievements of all students, this change is a very good thing. We saw some of that. However, to the extent that NCLB testing is producing a greater focus on test scores than on larger issues of deep learning for all students, the focus risks becoming a harmful monomania. While the balance is getting harder for educational leaders to find and maintain, surely it is worth the effort. We believe that, considered in total, the effective schools correlates can assist leaders in finding that balance.
Conclusions
1. Schools that achieved an initial focus on achievement and set clear directions early on seemed to do better on state tests, so long as the school benefitted from stable leadership at both the principal and teacher leadership team levels.
2.
Across the 17 ES-21 schools and five LEAs, leadership teams consistently reported that they had gained in their ability to work and solve problems as teams. In a variety of ways, teachers reported being more tightly networked for the purposes of solving practical, work-related problems. This is a likely correlate of institutionalization of continuous progress in any organization. This observation bodes well for the long-term impact of ES-21.
3. The substantial majority of participating teachers and principals reported thinking ES-21 was valuable for their schools and for them personally.
4. Teachers and principals found the training and experiences in teaming very helpful. They reported spending more time having much more extended and thoughtful conversations about content and about the acts of teaching. Many teachers reported having never before observed in other schools and other classrooms, and having found those observations valuable in their own teaching.
5. Similarly, both teachers and principals reported that the intensive training in data analysis, and in the folding of quantitative training into their own school and classroom practices, was very helpful.
6. In several schools, both teachers and principals reported that they thought that the ES-21 experiences put them several years ahead of their within-district colleagues in terms of upcoming change efforts. Many reported that their districts were in the process of moving to teacher leadership teams and to increased data use, and that they believed that they were years ahead in these reforms.
7. More than one teacher and principal in the eastern LEAs reported feeling that the first-year implementation processes, while valuable, occasionally felt disjointed. (In fact, some of the workshops were being developed in something very close to "real time.") Reassuringly, this was not a comment heard in the California schools, which were implementing a year later.
8. Goals may change with leadership.
9. Reforms were based on funding.
10.
LEA support was generally lacking, and was not always effective when present.
11. A new leadership model was used with the implementation of the program.
12. Data were either not in place or, when available, ineffectively used for instruction.
13. Personnel were either seen as interchangeable or recruited away from schools.
14. Formal leaders were often replaced, or essential positions were left unfilled.
15. Leaders were often neophytes hired into very challenging positions.
References
Aikin, W. M. (1942). The story of the Eight-Year Study. New York: Harper.
Borman, G., Hewes, G., Overman, L., & Brown, S. (2003). Comprehensive school reform and achievement: A meta-analysis. Review of Educational Research, 73(2), 125-230.
Borman, G., Slavin, R., Cheung, A., Chamberlain, A., Madden, N., & Chambers, B. (2007). Final reading outcomes of the national randomized field trial of Success for All. American Educational Research Journal, 44(3), 701-731.
Chrispeels, J., Castillo, S., & Brown, J. (2000). School leadership teams: A process model of team development. School Effectiveness and School Improvement, 11(1), 20-56.
Datnow, A., Borman, G., Stringfield, S., Rachuba, L., & Castellano, M. (2003). Comprehensive school reform in culturally and linguistically diverse contexts: Implementation and outcomes from a four-year study. Educational Evaluation and Policy Analysis, 25(2), 143-170.
Datnow, A., Lasky, S., Stringfield, S., & Teddlie, C. (2005). Systemic integration for educational reform in racially and linguistically diverse contexts: A summary of evidence. Journal of Education for Students Placed at Risk, 10(4), 441-453.
Datnow, A., & Stringfield, S. (2000). Working together for reliable school reform. Journal of Education for Students Placed at Risk, 5(1&2), 183-204.
Edmonds, R. (1979). Effective schools for the urban poor. Educational Leadership, 37, 15-27.
Fullan, M. G., & Miles, M. M. (1992). Getting reform right: What works and what doesn't. Phi Delta Kappan, 73(10), 744-752.
Hargreaves, A., & Fink, D. (2006). Sustainable leadership. San Francisco: Jossey-Bass/Wiley.
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis (2nd ed.). Thousand Oaks, CA: Sage.
Nunnery, J. A. (1998). Reform ideology and the locus of development problem in educational restructuring. Education and Urban Society, 30(3), 277-295.
Slavin, R., & Madden, N. (2000). One million children: Success for All. Thousand Oaks, CA: Sage.
Stringfield, S., Reynolds, D., & Schaffer, E. (2008). Improving secondary students' academic achievement through a focus on reform reliability: 4- and 9-year findings from the High Reliability Schools project. School Effectiveness and School Improvement, 19(4), 409-428.
Stringfield, S., Ross, S., & Smith, L. (Eds.). (1996). Bold plans for school restructuring: The New American Schools designs. Mahwah, NJ: Lawrence Erlbaum Associates.
Supovitz, J., & Weinbaum, E. (2008). The implementation gap. New York, NY: Teachers College Press.
Taylor, B., & Bullard (1995). The revolution revisited. Bloomington, IN: Phi Delta Kappa.
Teddlie, C., & Reynolds, D. (2000). Handbook of research on school effectiveness and improvement. London: Falmer.
What Works Clearinghouse. Downloaded July 1, 2010 from http://www.whatworks.ed.gov/.