Validation of the Engagement, Alignment, and Rigor (EAR) Classroom Visit Protocol
&
Every Classroom, Every Day Efficacy Trial

Final Report on R305R070025
(Originally titled: Scaling Up the First Things First Reform Approach)

Edward L. Deci, Principal Investigator

Submitted to the Institute of Education Sciences
U.S. Department of Education

Edward L. Deci
Diane M. Early
J. Lawrence Aber
Richard M. Ryan
Juliette K. Berg
Stacey Alicea
Yajuan Si

Table of Contents

I. Overview of Accomplishments .................................................. 5
    Overview of the Reformulated Grant Research .................................. 5
    Accomplishments Regarding Validity of the EAR Classroom Visit Protocol Project .. 5
    Accomplishments Regarding the ECED Efficacy Trial Project ..................... 7
II. Introduction to the Research ................................................ 10
    History of Every Classroom, Every Day ....................................... 11
    Meta-Analyses Regarding Middle and High School Instruction ................... 14
    Multi-Faceted Approaches to Improving High School Instruction ................ 17
    Professional Development to Change Teacher Practices and Increase Student Achievement .. 23
    Every Classroom, Every Day (ECED): The Model ................................ 26
    Hypotheses ................................................................... 42
III. Method ..................................................................... 44
    Study Design ................................................................. 44
    School Selection ............................................................. 45
    Study Teachers ............................................................... 49
    Study Students ............................................................... 54
    Teacher Questionnaires ....................................................... 61
    Engagement, Alignment, and Rigor (EAR) Classroom Visit Protocol .............. 68
    Student Questionnaires ....................................................... 76
    Student Demographics ......................................................... 82
    Course Enrollment ............................................................ 83
    Standardized Test Scores ..................................................... 83
    Student Performance (Attendance, Credits, GPA) ............................... 89
    Missing Data ................................................................. 89
    Suitability of the Data for Analyses ......................................... 95
    Analysis Plan ................................................................ 96
    Unconditional Models ........................................................ 101
IV. Implementation ............................................................. 105
    Measuring Variation in Implementation ...................................... 105
    Implementation Strengths and Weaknesses .................................... 109
V. Results for Teachers' Attitudes, Experience, and Observed Practice ......... 111
    Data Analytic Strategy ..................................................... 111
    Results for Math Teachers .................................................. 114
    Results for ELA Teachers ................................................... 126
VI. Impacts on Students ........................................................ 128
    Data Analytic Strategy ..................................................... 128
    Point-in-Time Analyses Predicting Students' Attitudes Toward School ........ 133
    Point-in-Time Analyses Predicting Individual Student Survey Scales ......... 135
    Point-in-Time Analyses Predicting Math and ELA Achievement ................. 136
    Point-in-Time Analyses Predicting Performance (GPA, Credits, and Attendance) .. 139
    Growth Curves Predicting Students' Attitudes Toward School ................. 142
    Associations Between ECED Implementation and Student Outcomes Across All Study Schools .. 147
    Associations Between ECED Implementation and Student Outcomes in Intervention Study Schools .. 150
VII. Implementation and Data Collection Challenges ............................. 154
    Challenges in Recruiting Schools to Participate ............................ 154
    Challenges in Implementing ECED ............................................ 156
    Data Collection Challenges ................................................. 161
    Implications of the Challenges for the Impact Evaluation ................... 163
VIII. Discussion ............................................................... 164
    Summary of Most Important Findings ......................................... 165
    Implications ................................................................ 166
    Strengths of the Design and Analyses ....................................... 171
    Limitations ................................................................. 172
    Future Analyses ............................................................. 173
    Conclusions and Recommendations ............................................ 175
IX. References ................................................................. 177
Appendix 1: Change in Project Focus ............................................ 184
Appendix 2: Findings From the First Component of Revised Project: Validation of the EAR Classroom Visit Protocol .. 186
Appendix 3: Sample Memorandum of Understanding ................................. 226
Appendix 4: Recruitment and Participation Diagram .............................. 236
Appendix 5: Teacher Survey Items ............................................... 237
Appendix 6: EAR Protocol Training for ECED Efficacy Trial Data Collectors ..... 248
Appendix 7: Student Questionnaire Administration Procedures .................... 249
Appendix 8: Student Questionnaire Items ........................................ 251
Appendix 9: Restructuring the Course Files for Use in Analysis ................. 254
Appendix 10: Test Scores Received for Grade Cohort 1 ........................... 256
Appendix 11: State-Specific Decisions Regarding Combining Test Scores ......... 258
Appendix 12: Indicators of Variation in Implementation ......................... 260
Appendix 13: Teacher-Level Correlations Among Outcome Variables ............... 264
Appendix 14: Student-Level Correlations Among Outcome Variables ............... 266
Appendix 15: School-Level Correlations Among Outcome Variables ................ 268
Appendix 16: Child-Level Interactions .......................................... 274
Endnotes ....................................................................... 275

I. Overview of Accomplishments

Originally, this grant was funded as a Randomized Field Trial to evaluate the effectiveness of the First Things First (FTF) approach to Comprehensive School Reform in high schools that serve large percentages of disadvantaged students. That reform was designed and implemented by the Institute for Research and Reform in Education (IRRE). In the first year of the grant we encountered major difficulties in relation to the FTF project, so we worked with the program division of NCER to reformulate the research. See Appendix 1 for a discussion of the difficulties and the process of reformulating the grant research.

Overview of the Reformulated Grant Research

The reformulation was based on the idea that high-quality classroom instruction is the core of effective schooling.
Indeed, the National Research Council's Committee on Increasing High School Students' Engagement and Motivation to Learn (2004) argued forcefully that, although school-level policies and efforts to restructure schools may benefit students in a myriad of ways, student learning is most directly and deeply affected by how and what teachers teach. The reformulation had two projects related to high-quality classroom instruction: (1) a validation study of the Engagement, Alignment, and Rigor (EAR) Classroom Visit Protocol to assess the quality of classroom instruction; and (2) a school-level randomized field trial (S-RFT) to examine the efficacy of the instructional improvement strategy contained within FTF, referred to as Every Classroom, Every Day (ECED) when it is implemented without the other two FTF strategies, namely, Small Learning Communities and a Student and Family Advocacy System.

Accomplishments Regarding Validity of the EAR Classroom Visit Protocol Project

The project to assess inter-rater reliability and predictive validity of the EAR Classroom Visit Protocol had two studies. In Study 1 of this first project, district personnel, professional development providers, and outside educators conducted 2,171 EAR Protocol visits to all types of courses in four racially and economically diverse high schools. Intra-class correlations indicated adequate inter-rater reliability. The observations conducted in classes for math (125 observations of 33 teachers) and English/Language Arts (ELA) (102 observations of 25 teachers) were analyzed in multi-level models (using Hierarchical Linear Modeling) to predict students' standardized achievement test scores, controlling for their previous year's scores. Findings indicated that engagement, alignment, and rigor were each significantly (p < .05) or marginally (p < .10) associated with math and ELA achievement scores, with standardized coefficients ranging from .06 to .17. Additionally, students' self-reports of their engagement in school were predictive of test scores in models that also included perceived academic competence as well as observed engagement, alignment, or rigor.

Study 2 of this first project was intended to replicate the predictive-validity findings of Study 1 using data collected in eight racially diverse, low-income high schools. Study 2 included 261 observations of 63 math and 64 ELA teachers. Observed engagement again emerged as a significant predictor of test scores in both subjects, but alignment and rigor did not. The results of the two studies of this first project assessing the validity of the EAR Classroom Visit Protocol are summarized in a manuscript that is under editorial review. The complete manuscript appears in Appendix 2 of this report.

The second project of the reformulated grant represents the core of the grant research and of this Final Report. It sought to evaluate the Every Classroom, Every Day (ECED) instructional improvement model for high school math and literacy instruction, with the goal of increasing standardized achievement test scores in these two areas. As noted, the components of ECED were first implemented as the Instructional Improvement strategy of the First Things First approach to school reform, so this efficacy trial of ECED represents the first time these components were implemented in the absence of First Things First's other two components.
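To make the analytic approach concrete, the following is a minimal sketch of the kind of two-level model used in the predictive-validity analyses described above: a student's test score regressed on an observed instruction score and the prior year's score, with a random intercept for teacher to reflect the nesting of students within teachers. This is our illustration, not the project's actual code; the data file and variable names are hypothetical.

```python
# Sketch of the predictive-validity models (hypothetical file and variable names).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ear_validity.csv")  # one row per student, with teacher-level EAR scores attached

# Standardize so the coefficient on `rigor` is comparable to the
# standardized coefficients (.06 to .17) reported above.
for col in ["math_score", "prior_math_score", "rigor"]:
    df[col] = (df[col] - df[col].mean()) / df[col].std()

# Two-level model: students (level 1) nested within teachers (level 2),
# with a random intercept for each teacher.
model = smf.mixedlm("math_score ~ rigor + prior_math_score",
                    data=df, groups=df["teacher_id"])
result = model.fit()
print(result.summary())  # the fixed effect for `rigor` is the estimate of interest
```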
Accomplishments Regarding the ECED Efficacy Trial Project

Central to the intervention were the concepts of engagement, alignment, and rigor as markers of high-quality instruction. Across math and literacy content areas, teachers received professional development and ongoing coaching and supports from external change agents and internal school personnel to teach in ways that are more Engaging for students, more Aligned with local, state, and federal standards, and more Rigorous for all students (EAR). Math teachers also received initial training and ongoing supports in IRRE's "I Can…" Math Benchmarking process and tools. Literacy teachers were provided with a two-year curriculum called Literacy Matters and wrap-around supports in its use. Trained independent raters, blind to the intervention status of the schools, observed classrooms as an important source of research data. Teachers and students completed questionnaires, and school districts provided information about students, including demographic characteristics, scores on standardized achievement tests in math and English/Language Arts (ELA), progress toward graduation, grade point average (GPA), and attendance.

Our primary hypothesis was that ECED's instructional improvement interventions would improve math and ELA achievement as measured by standardized test scores and analyzed experimentally. Secondary hypotheses were that the intervention would improve other student performance outcomes, such as attendance, GPA, and progress toward graduation, and would enhance teacher and student attitudes. Finally, we hypothesized that both the fidelity of implementation and the number of semesters students were in intervention schools would predict better school outcomes in non-experimental analyses.

Twenty high schools (5 districts, 4 schools per district) were randomly assigned to either the treatment (n = 10) or control (n = 10) condition, with two high schools from each district being assigned to each condition. Five primary types of data were collected: (1) student surveys, (2) teacher surveys, and (3) classroom observations using the EAR Classroom Visit Protocol, each collected in four waves, once near the start and once near the end of each of the two academic years that the school participated; as well as (4) variation-in-implementation interviews with math and literacy coaches and/or department chairs and the school principal or assistant principal, and (5) student records collected twice, once at the end of each academic year that the school participated.

Findings from this evaluation provide some evidence that ECED was efficacious in improving student achievement in math. Students in the treatment schools scored significantly (p = .04) higher on standardized tests of math than did students in the control-group schools, after controlling for pre-intervention math achievement and school district, although that result became only marginal (p = .06) when student demographic controls were added to the models (see p. 138). In addition, there was some evidence that fuller implementation of ECED was linked to better student outcomes, such that students in schools with higher implementation scores had marginally higher math achievement (see p. 149), significantly higher grade point averages (see p. 152), and significantly more credits toward graduation (see p. 152).
In addition, the number of terms students were enrolled in ECED schools moderated the association between implementation and attendance, such that students with more semesters in schools with higher ECED implementation had better attendance (see p. 149). In contrast to these impacts, the ECED intervention did not improve achievement in English Language Arts (see p. 138), nor was the degree of overall ECED implementation related to ELA achievement (see pp. 149 & 152). Preliminary analyses of the classroom observations for math indicated that, across the two years, schools that participated in ECED showed increased rigor of math classroom instruction relative to control-group schools (see pp. 123 & 125).

Although there was indication that the ECED intervention led to some positive outcomes, including math achievement and rigor in math classrooms, there was also evidence that these improvements might have come at some cost to students and teachers. Among students in ECED treatment schools, those who were enrolled more terms reported worse attitudes toward school, whereas among students in control-group schools, those who were enrolled more terms reported better attitudes toward school (see p. 135). This finding appears to be primarily driven by a single component of student attitudes regarding school, namely perceived academic competence (see p. 136), which could be a function of the significant increase in the rigor of math instruction observed in the ECED treatment schools. Further, math teachers in treatment schools reported less mutual support among colleagues than did those in control schools (see p. 117). And, across the two years of the study, math teachers in ECED schools who taught more semesters of the courses targeted by the intervention reported that their districts' leadership was less committed to change, while the opposite was true for teachers in control schools (see p. 117). Finally, based on preliminary analyses of the classroom observations, it appears that ECED had a negative impact on observed student engagement in math classes (see pp. 123 & 125). No effects of ECED were seen for ELA teacher attitudes and experiences (see p. 127).

In sum, ECED Math appears to be a valuable path to improved student standardized test scores in math. In schools from five districts in four states, which were fraught with problems and served high percentages of students from economically disadvantaged homes, the use of the ECED math approach resulted in improved math scores relative to control-group schools. We found this effect with a relatively small sample size (10 schools per condition) and stratified random assignment within districts, leaving only about 14 degrees of freedom, which provides very little power for detecting statistically significant effects. Further, of the 10 treatment schools, two had dropped out by the end of the first year, seriously weakening the intervention implementation. Still, there was indication that the approach did enhance achievement in math. Nonetheless, the intervention did have some effects on teacher and student experiences and self-perceptions that are cause for concern, effects that might dampen ECED's positive effects over time; it is also possible, however, that these negative effects would fade as students and teachers adjusted to the new instructional strategies and experienced improvement in outcomes.

II. Introduction to the Research

In this Introduction, we first describe the history of ECED.
Next, we review the existing literature on models for improving instruction, followed by a review of the literature on effective professional development with teachers. Following those literature reviews, this chapter presents a detailed description of the Every Classroom, Every Day (ECED) intervention.

Following the Introduction, this report includes a chapter that details the Method of the ECED Efficacy Trial (Chapter III), a chapter describing variation in implementation of the ECED strategies in the treatment schools (Chapter IV), one summarizing the findings regarding changes in teachers' attitudes and practices (Chapter V), another summarizing the findings regarding impacts on students (Chapter VI), and one outlining challenges IRRE faced in implementing ECED and challenges the research team faced in collecting the evaluation data (Chapter VII). The final substantive chapter is the Discussion, which summarizes the findings in the context of these challenges, the implications of the findings, the study's limitations, and future plans (Chapter VIII). The Reference list is presented in Chapter IX.

History of Every Classroom, Every Day

Every Classroom, Every Day is the instructional improvement component of the First Things First (FTF) approach to school reform. First Things First—developed by the Institute for Research and Reform in Education (IRRE)—is a comprehensive educational reform model with three key strategies: (1) Small Learning Communities (SLC), which involves breaking large schools into smaller, thematically focused "schools-within-schools"; (2) the Family and Student Advocate System, which involves small groups of 15 to 20 students within an SLC who meet regularly with one teacher or administrator who serves as an advocate for the group's students for their entire time in the school and acts as liaison to their families; and (3) Instructional Improvement, which involves supporting teachers to create engaging learning activities that are rigorous for all students and aligned with district, state, and national standards. Accordingly, engagement, alignment, and rigor (EAR) are the central elements used in assessing the quality of instruction and providing feedback to schools. To achieve improved instruction, administrators and teachers are provided with professional development on identifying and observing what engaging, aligned, and rigorous instruction looks like in the classroom and are trained to use a classroom observation protocol on an ongoing basis to provide feedback and to guide instructional activities.

The proximal goal of FTF is to improve the quality of supports the teachers and students need to be successful. The distal goal is to increase learning, with a particular emphasis on literacy and math as evidenced by higher scores on standardized achievement tests, as well as improved attendance, decreased dropout, increased graduation rates, and increased enrollment in and completion of college. FTF also targets enhancement of non-academic factors, such as relationships with teachers, school engagement, and motivation. Improvements on these various outcomes are expected, in turn, to lead to increased academic achievement and greater numbers of graduates enrolling in post-secondary education and going on to meaningful careers and citizenship.
FTF has been evaluated in two quasi-experimental efficacy studies (Gambone, Klem, Summers, Akey, & Sipe, 2004; Quint, Bloom, Black, Stephens, & Akey, 2005), each concluding that the FTF model was very promising, with significant gains in some districts. Gambone and her colleagues compared outcomes in Kansas City, Kansas (KCK)—the first district to implement FTF—to those from all schools in other districts within the state. The mean effect size on achievement in KCK was .33, which compared favorably to the results of a meta-analysis of studies of the 29 most widely adopted comprehensive school reform (CSR) models, which found an effect size of .15 on student achievement (Borman, Hewes, Overman, & Brown, 2003). It is important to note, however, that the Gambone et al. study and 97% of the studies included in the Borman et al. meta-analysis were non-experimental, so it is likely that they overestimate the effect sizes. Gambone et al. also found evidence that White, African American, and Latino students all benefited from FTF, but students of Color benefited more than White students.

Quint and colleagues used interrupted time-series analyses to investigate the impact of FTF in Kansas City, Kansas and four districts from other states at varied stages of FTF implementation. Comparison schools were selected for each intervention site, matched on pre-intervention test scores and other variables such as school size, racial/ethnic make-up, and eligibility for free or reduced-price lunches. Findings indicated that in KCK high schools and middle schools, academic outcomes improved substantially relative to comparison schools. The findings were inconclusive in the other four districts included in this study, where FTF had been implemented for a shorter period of time. Further, neither the Gambone et al. (2004) study nor the Quint et al. (2005) study attempted to examine the individual components of FTF, such as the instructional improvement strategy, which was examined in the current trial.

FTF's Instructional Improvement strategy has evolved since it was first introduced. During the past few years, an enhanced version, pilot-tested in a small number of schools, led to substantial achievement gains when implemented within the context of the full FTF intervention. For example, four high schools in Kansas City, Kansas implemented the enhanced strategies and saw rapid acceleration in their student achievement gains in both mathematics and reading on the state's high-stakes assessments. On the Kansas 10th-grade math achievement test, the percentage of students meeting or exceeding the state proficiency standard increased by more than 30 percentage points (from 18% to 54%) over a four-year period. On the Kansas 11th-grade reading test, the percentage of students meeting or exceeding the state proficiency standards during those same four years increased by 15 percentage points (from 46% to 61%) (Kansas State Department of Education, n.d.). However, these findings must be interpreted with caution because they are non-experimental and could be explained by a number of factors other than FTF's Instructional Improvement efforts. School test scores may have already been on upward trajectories, the population of students or teachers providing the instruction may have changed during implementation, or the schools may have implemented other changes that led to these gains.
The current study was designed to use a highly rigorous experimental design to evaluate the efficacy of FTF's enhanced instructional improvement strategy, called Every Classroom, Every Day (ECED), in schools that are not implementing the full FTF reform model. Due to pressures of the No Child Left Behind federal legislation, prior to the start of this study many districts and high schools had already undertaken some type of school restructuring effort (e.g., block schedules, reduced class sizes, 9th-grade academies) and viewed FTF as too comprehensive for their current needs. Instead, they were looking for ways to systematically improve instruction using a more targeted approach, especially in literacy and math. Based on the quasi-experimental and pilot work described above, Every Classroom, Every Day appeared to be a promising avenue to meet schools' current needs.

In sum, although there is some evidence that ECED strategies are linked to higher test scores within the context of the larger FTF reform, there has not been a rigorous experimental evaluation of their efficacy as an intervention. Further, prior to the current study, the strategies had never been used in schools that were not implementing the full FTF school reform model. The ECED Efficacy Trial described in this report is a school-level randomized field trial (S-RFT) involving 20 high schools. Half of the schools were randomly assigned to receive the ECED supports (n = 10); the other half were assigned to a "business as usual" control condition (n = 10) in which schools continued with whatever school improvement efforts they already had underway, without adding any elements of ECED. The primary outcomes of interest were students' standardized test scores in math and English Language Arts (ELA), students' attitudes toward school, observed teacher instruction, teachers' experiences of support for instructional innovation and improvement at school, and the other student outcomes of grade point average, attendance, and progress toward graduation.
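To illustrate the assignment scheme just described, here is a minimal sketch of stratified random assignment in which two of each district's four schools are drawn at random for treatment. The school identifiers and seed are hypothetical; this is not the study's actual randomization procedure.

```python
# Sketch of stratified random assignment: within each of 5 districts,
# 2 of 4 schools are assigned to treatment and 2 to control.
import random

districts = {f"district_{d}": [f"school_{d}{s}" for s in "ABCD"]
             for d in range(1, 6)}  # hypothetical identifiers

random.seed(2008)  # fixed seed so the illustration is reproducible
assignment = {}
for district, schools in districts.items():
    treated = set(random.sample(schools, 2))  # draw 2 of the 4 schools
    for school in schools:
        assignment[school] = "treatment" if school in treated else "control"

# Each condition ends up with 10 schools, 2 per district.
assert sum(v == "treatment" for v in assignment.values()) == 10
```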
Meta-Analyses Regarding Middle and High School Instruction 1

Several meta-analyses of studies conducted in middle and high schools suggest that broad-based reform models employing multiple methods and components may result in better literacy and math outcomes than those that are narrowly targeted on a single curriculum or instructional technique. For example, a comprehensive meta-analysis of effective reading programs for middle and high schools conducted by Slavin, Cheung, Groff, and Lake (2008) systematically reviewed four types of approaches: (1) reading curricula; (2) mixed-method models (i.e., methods that combine various instructional approaches such as large- and small-group instruction and computer activities); (3) computer-assisted instruction; and (4) instructional process programs (i.e., methods that focus on providing teachers with extensive professional development in implementing specific instructional methods such as cooperative learning and individualized instruction). Thirty-three studies met the inclusion criteria, which included the use of quasi-experimental or experimental designs (i.e., randomized or matched control groups), study duration of at least 12 weeks, and valid achievement measures independent of experimental treatments. Findings suggest that programs designed to change daily teaching practices had substantially greater impacts on student reading comprehension compared to those focused on curriculum or technology alone. Positive achievement effects were found for mixed-method programs and instructional process programs, especially those involving cooperative learning. Across nine studies involving approximately 10,000 students, the weighted mean effect size for mixed-method models was +0.23 in predicting reading comprehension test scores. Across seven studies of instructional processes, programs involving cooperative learning approaches to school reading had a weighted mean effect size of +0.28.

In a meta-analytic review of mathematics programs for middle and high schools, Slavin, Lake, and Groff (2009) found similar results. Of the 26 studies that met inclusion criteria (e.g., use of a randomized or matched control group, study at least 12 weeks long, and equivalence at pretest), effect sizes were greatest for instructional process programs, such as cooperative learning and classroom motivation and management programs and other approaches that focused on changing teacher and student behaviors during daily lessons. For example, the median effect size for cooperative learning programs in middle and high school studies in predicting math test scores was +0.32. Studies of these instructional process programs were also more likely to have used random assignment to treatments. These findings were further supported by another meta-review, focused solely on Algebra I outcomes with similar inclusion criteria (e.g., quasi-experimental or experimental designs with comparison groups, targeted learning of algebraic concepts, student academic achievement measures), by Rakes, Valentine, McGatha, and Ronau (2010). They found that while a variety of math interventions resulted in increased student achievement in algebra, instructional strategies (i.e., cooperative learning, mastery learning, multiple representations, and assessment strategies) produced some of the largest effect sizes in comparison to other interventions.

Importantly, all three meta-reviews highlighted a number of limitations. First, the majority of interventions that met inclusion criteria were relatively short-term interventions, in the range of 12 weeks to 1 year, with no data to ascertain their long-term effects. Second, many of the interventions and their evaluations were poorly designed—small sample sizes, implemented at only one or too few schools—resulting in insufficient power to detect effects and limited generalizability of impacts to other school contexts. Third, the vast majority of studies employed matching and randomization of samples primarily at the student level, limiting the ability of evaluators to take into account the nested structure of the data (i.e., students are nested in teachers, classrooms, and schools) and, in turn, prohibiting a rigorous assessment of teacher- and classroom-level impacts associated with interventions. Nonetheless, existing meta-reviews provide some evidence that instructional learning processes, specifically cooperative learning approaches that target teachers' instructional behaviors rather than math or literacy content alone, produce better literacy and math outcomes for middle and high school students. Still, findings suggest relatively small gains on traditional student-level academic outcomes.
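For reference, a weighted mean effect size of the kind cited above is computed by weighting each study's effect size by its sample size or precision; in general form (the exact weighting scheme varies by meta-analysis):

$$\bar{d} = \frac{\sum_{i} w_i\, d_i}{\sum_{i} w_i}$$

where $d_i$ is the effect size from study $i$ and $w_i$ is its weight (e.g., proportional to sample size or to the inverse of the sampling variance).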
Given findings across literacy and math reviews, Slavin and colleagues suggest educators and researchers should: (1) pay more attention to classroom processes that maximize student engagement and motivation rather than focusing solely on implementing and testing new textbooks and curricula, and (2) consider the potential additive effects of creating multi-component interventions.

Multi-Faceted Approaches to Improving High School Instruction

Surprisingly, despite literature suggesting that student achievement at the high school level would benefit from more comprehensive interventions (Heller & Greenleaf, 2007; Quint, 2006), research on multi-faceted approaches to improving high school instruction is limited. Scholars have identified four key, overlapping areas that should be addressed in order to build high-quality secondary schools that will increase student achievement and graduation rates, and prepare students for college work and citizenship: (1) clear definitions of teacher roles and responsibilities, (2) clear definitions of skills that must be taught, (3) provision of ongoing professional development in teaching skills, and (4) clear sets of state standards and accountability, and district rules and regulations (Heller & Greenleaf, 2007; Quint, 2006). These recommendations speak to the need for evidence-based models for high school improvement that have the capacity to effect setting-level change. Few projects have bundled these intervention elements while simultaneously working with school and district administrators to ensure successful implementation of schoolwide comprehensive models. And the work that has been done focuses almost entirely on student-level changes, without paying attention to changes at the teacher and classroom levels. While students influence their classrooms and school-level settings, those settings also affect students. Settings provide an important point of entry for making meaningful changes in the lives of students, especially students of color and students from disadvantaged backgrounds (Tseng & Seidman, 2007; French, Seidman, Allen, & Aber, 2006; Seidman, Allen, Aber, Mitchell, & Feinman, 1994).

In the past decade, a number of interventions targeting academic outcomes in high schools have emerged. Several of these models have addressed previous design and analytic limitations through rigorous efficacy trials; however, results from these quasi-experimental and experimental evaluations have been mixed. While some interventions have shown significant and promising preliminary academic gains for students, with effect sizes between .18 and .33 (Gambone et al., 2004; Quint, 2006; Quint et al., 2005; Lang et al., 2009), many have resulted in null findings (Cavalluzzo, Lowther, Mokher, & Fan, 2012; Corrin et al., 2012) or small effect sizes (Corrin, Somers, Kemple, Nelson, & Sepanik, 2008; Kemple et al., 2008).

Quint (2006) highlighted three promising whole-school high school interventions that have undergone quasi-experimental impact evaluations with promising results: First Things First (one component of which is Every Classroom, Every Day, as discussed above), Talent Development (TD), and Career Academies (CA). These three interventions have been implemented in more than 2,500 high schools across the United States. Additionally, they all contain multiple intervention components intentionally linked to underlying theories of change targeting whole-school reform.
Quint (2006) argued that these studies, although quasi-experimental in nature, provide compelling evidence that interventions targeting the whole school can improve student achievement via the structural promotion of positive school climate, focusing on students with poor academic skills, improving curricula and teacher instructional content and practice, and preparing students for post-secondary education and employment.

In addition to the whole-school models highlighted by Quint (2006), some recent school-level intervention models have produced some promising results. Lang and colleagues (2009) conducted a 3-year impact study of four intensive reading interventions for 1,265 9th-grade struggling readers in 89 classrooms in seven high schools in Florida. In addition to instituting new reading curricula, teachers underwent ongoing professional development, and the research team worked with school officials to address issues related to implementation and fidelity. Students were identified as struggling readers based on the prior year's state reading performance test and placed in one of two risk groups: Level 1 (i.e., high risk)—reading below a fourth-grade reading level, or Level 2 (i.e., moderate risk)—reading between a fourth- and sixth-grade level. Students were then randomly assigned, within school and within level, to classrooms where one of four intensive reading interventions was taught. A 2 × 4 (Risk Level × Intervention Group) linear mixed model with random coefficients was used to model students' gains in reading developmental scale scores. Although gains made by students in the high-risk group varied by intervention, these students demonstrated improvements that were more than twice the magnitude of the state benchmark for expected annual growth across all four interventions. For the moderate-risk group, results indicated that while gains were smaller and not significant across all four interventions, students in this group showed greater average gains in state test scores compared to other 9th-grade students statewide. It is important to note that despite attempts to estimate causal effects, this study suffered from several limitations, including cross-contamination concerns due to randomization at the classroom level, failure to account for the nested structure of the data in analyses, large amounts of missing data, and limited generalizability to other educational contexts. Nonetheless, this study supports the use of school-wide models in enhancing reading comprehension, particularly for low-achieving, at-risk students in high schools.

Conversely, a small number of more recent studies of instructional interventions using rigorous causal evaluation methods have resulted in mixed results or no impact on student academic achievement outcomes. The Enhanced Reading Opportunities (ERO) study evaluated two literacy interventions embedded in a larger school reform model targeting reading comprehension and school performance among 2,916 low-performing 9th-graders in 34 high schools in 10 school districts across the U.S. Students who scored between two and five years below grade level on reading comprehension standardized tests were considered at risk for poor reading outcomes and were included in the study. The literacy programs were designed as full-year courses, which replaced a 9th-grade elective and supplemented the regular English curriculum. Teachers were provided with ongoing professional development.
The ERO evaluation utilized a two-level random assignment research design. Within each district, high schools were randomly assigned to use one of the two ERO literacy programs. Low-performing students within each high school were then randomly assigned to enroll in ERO classes or to remain in a regularly scheduled elective class. In each of two cohorts, ERO programs produced a relatively small but significant effect size of +.08 on reading comprehension (Kemple et al., 2008; Corrin et al., 2008). Despite the significant change, 77% of ERO students in the second cohort were still reading two or more years below grade level after participation (Corrin et al., 2008).

The Content Literacy Continuum (CLC) combines whole-school and targeted approaches to supporting student literacy and content learning, placing emphasis on greater academic support for students with increased academic needs. CLC is a high school-level equivalent to the tiered response-to-intervention (RTI) reading instruction frameworks used in elementary schools with some success (see Faggella-Luby & Wardwell, 2011). CLC's comprehensive framework includes structural components (e.g., specialists working with school leaders to establish a literacy leadership team or creating supplemental reading classes), professional development of core content and reading teachers, and a targeted student tiered curriculum that addresses students' varying comprehension and reading skill levels. A rigorous experimental efficacy study of CLC in 33 high schools in nine districts across four Midwestern states evaluated program impacts on students' reading comprehension test scores and accumulation of course credits in core content areas using a cluster randomized trial design. Participating high schools within each district were randomly assigned either to implement CLC or to continue with "business as usual." Student outcomes were compared across CLC and non-CLC schools. In both Year 1 and Year 2 of the study, there were no statistically significant differences in reading comprehension scores or students' accumulation of core credits between CLC schools and non-CLC schools in either grade 9 or grade 10. Analyses also revealed no significant differences in effects for subgroups of students (e.g., defined by 8th-grade reading proficiency, being over-age for grade, or special education status) or districts (Corrin et al., 2012).

Another example of an instructional intervention evidencing null findings can be found in the Kentucky Virtual Schools Hybrid program, which aimed to improve student math achievement and course enrollment outcomes by enhancing hybrid (i.e., bundled online and face-to-face classroom) instructional practices via intensive teacher professional development. Forty-seven Kentucky high schools volunteered to participate in the impact evaluation. Schools were randomized at the school level to the hybrid program or "business as usual." Intent-to-treat analyses were conducted using two-level hierarchical linear models that nested students within schools and assessed differences in outcomes between treatment and control schools. Outcome measures included scores on a pre-algebra/algebra standardized college assessment test (i.e., PLAN) and 9th-grade students' enrollment in 10th-grade math courses. The findings indicated that the treatment had no statistically significant effect for either outcome. Additional exploratory analyses further revealed no differences in impacts among subgroups of students or schools (Cavalluzzo et al., 2012).
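As a sketch of the intent-to-treat specification that these cluster-randomized evaluations share (the notation here is ours, not the original authors'), a two-level model with students $i$ nested in schools $j$ and treatment assigned at the school level can be written:

$$\text{Level 1: } Y_{ij} = \beta_{0j} + \beta_{1}\,X_{ij} + \varepsilon_{ij} \qquad \text{Level 2: } \beta_{0j} = \gamma_{00} + \gamma_{01}\,T_j + u_{0j}$$

where $X_{ij}$ is a student-level covariate (e.g., a pretest score), $T_j$ indicates a school's assignment to treatment, $\gamma_{01}$ is the intent-to-treat effect, and $\varepsilon_{ij}$ and $u_{0j}$ are the student- and school-level residuals.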
Across the ERO, CLC, and Kentucky Virtual Schools Hybrid studies, evaluators cited poorly aligned implementation and fidelity as the primary possible reason for small effect sizes or null findings. ERO evaluators cited a number of implementation issues in year one that led to long delays in enrolling students in ERO classes. Similar to RTI evaluations in middle schools, CLC evaluators cited challenges to implementing the intervention's structural components and highlighted teacher training as needing significant improvement. For example, in year one of CLC, roughly 75% of experimental schools implemented five or fewer of nine structural components at an adequate level or better. In the case of the Kentucky hybrid intervention, evaluators cited poor fidelity of implementation across schools, low levels of engagement and participation among teachers and students assigned to the treatment, the voluntary nature of the study, and limited generalizability beyond Kentucky. Indeed, past literature has provided evidence of potential and important barriers to implementation of instructional improvement models, including lack of model specificity, inconsistent policies across the school or district, and student, teacher, and leadership mobility (Berends, Bodilly, & Kirby, 2002; Desimone, 2002).

Taken together, the studies reviewed suggest that instructional improvement models must address a number of existing limitations, most notably: (1) the need for clear theories of change that incorporate increased attention to structural components of the instructional improvement models that support reform efforts at all relevant setting levels (e.g., district, school, classroom, teacher, and student); (2) implementation and fidelity challenges that may threaten the ability of interventions to produce theorized meaningful and sustainable impacts for targeted populations; and (3) the need to adopt appropriate and rigorous evaluation methods that take into account the complex nature of whole-school reform efforts (e.g., nested data structure, estimating impacts at multiple ecological levels, estimating causal impacts, etc.).

Professional Development to Change Teacher Practices and Increase Student Achievement

At its core, the Every Classroom, Every Day model seeks to change instructional practices in ways that are likely to benefit student learning, while carefully addressing several of the critical shortcomings of past school reform efforts. Consistent with the literature reviewed above, ECED has multiple components, including use of instructional coaches, summer institutes, classroom observations with feedback, multiple half-day professional development sessions, a literacy curriculum, and restructuring of the pace, sequence, assessment, and grading of math courses. Professional development, traditionally thought of as workshops, college courses, and study groups for teachers, is now defined more broadly to encompass any activity aimed at improving instruction or teachers' skills and knowledge (Desimone, 2009). Using this broad definition, ECED is a professional development model.
Even with this ever-expanding definition, several authors have argued that the field of education has reached a near consensus regarding the key aspects or critical features of high-quality professional development that are likely to result in changed instructional strategies and increased student learning (Darling-Hammond, Wei, Andree, Richardson, & Orphanos, 2009; Desimone, 2009; Elmore, 2002; Garet, Porter, Desimone, Birman, & Yoon, 2001). These authors have argued that the form of the professional development (e.g., workshop versus coaching) is less important than the extent to which it embodies these critical features. The exact names and framings of the critical features vary a bit from author to author, but all include the same core ideas. For the current exploration, we will use the names assigned by Desimone (2009). She makes a cogent case that the consensus surrounding these critical features is based on both research and the experience of experts in the field. The five critical features, according to Desimone, are: (1) content focus, (2) active learning, (3) coherence, (4) duration, and (5) collective participation. Importantly, the Every Classroom, Every Day intervention evaluated in this report strives to embody each of these key components of effective professional development, which together are thought to lead to changes in teacher practice and, ultimately, to changes in student learning.

Content focus. Effective professional development is focused on extending and intensifying teacher knowledge of a subject area, such as math, science, or English Language Arts (ELA), and how children learn that specific content (Garet et al., 2001). Darling-Hammond and colleagues (2009) argued that effective professional development should emphasize "concrete, everyday challenges involved in teaching and learning specific academic subject matter" (p. 10). Content-focused professional development stands in contrast to more general efforts to improve instruction via discussion of pedagogy that is not tied to specific content, abstract educational principles, or non-content issues, such as classroom management, all of which tend to be less effective.

Active learning. This critical feature refers to the extent to which teachers engaged in the professional development have opportunities to work directly with the concepts, as opposed to passively listening as material is presented. In effective professional development, teachers are engaging with and analyzing the material through activities such as reviewing student work, discussing a videotaped lesson, and working in groups to apply the information to their own practice (Garet et al., 2001). Active learning stands in contrast to simply listening to a lecture or reading a book.

Coherence. Teachers experience a wide range of professional development activities. Coherence refers to the extent to which a professional development activity is aligned with other professional development activities in which the teacher is participating and with the school and district's standards and culture. Professional development will not engender changed instruction or improved student outcomes if teachers' various professional development activities contradict one another or if school and district administrators do not support the types of changes encouraged by the professional development (Darling-Hammond et al., 2009).
Duration. As with all learning that promotes change in behavior, high-quality professional development must be sustained over time (Darling-Hammond et al., 2009). Longer activities can provide more time to explore new ideas in depth, and multiple sessions devoted to related concepts allow teachers to practice and discuss their experiences and receive feedback (Desimone, 2009; Garet et al., 2001). Further, the cognitive psychology literature indicates that repeated exposure, as well as a requirement to actually reproduce or use the new material, optimizes memory and application (Roediger, Agarwal, McDaniel, & McDermott, 2011).

Collective participation. High-quality professional development promotes professional communication and collaboration among teachers by including all teachers (from a department, school, or district). Collective participation allows teachers to support one another in changing practice and sustaining that change by providing a shared set of goals and language. The call for collective participation is based on the idea that educators will learn better if they are working in concert with others who are facing the same issues and that learning is essentially a collaborative process (Elmore, 2002). Additionally, teachers from the same setting often share curricula and students, making it easier to apply the professional development to their specific setting (Garet et al., 2001). Unfortunately, districts often undermine this critical feature by creating an array of professional development opportunities from which teachers select the ones that appeal to them. Thus, sessions may include teachers from many different schools, grades, and school cultures. Such a system weakens the experiences of all teachers by preventing them from working toward a common set of goals.

Every Classroom, Every Day (ECED): The Model

Every Classroom, Every Day (ECED) was designed to provide 9th- and 10th-grade English/Language Arts (ELA) and math teachers, as well as instructional leaders, with two years of intensive professional development and curricular support, using tools and processes developed by the Institute for Research and Reform in Education (IRRE). In keeping with the findings from the meta-analyses reviewed above (Rakes et al., 2010; Slavin et al., 2008; Slavin et al., 2009) and the professional development literature (Desimone, 2009), ECED employs a broad range of strategies and components, including instructional coaches, professional development sessions that are content-focused and encourage active participation, and curricular and assessment support to improve instruction. All literacy and math teachers within intervention schools take part in the same activities, with a relatively long duration compared to most interventions reviewed here (viz., two years). Coherence is created by a continual focus on three instructional goals: (1) Engagement of all students in their learning; (2) Alignment of what students are being asked to learn with state and national standards and high-stakes assessments; and (3) Rigor in how all students are taught and in the level of content all students are being asked to learn. These instructional goals – referred to as EAR – form the core of all of the ECED training, coaching supports, curricula, and instructional tools and processes. ECED has three major components: EAR Classroom Visit Protocols, ECED Math, and Literacy Matters.
This section describes each component as IRRE intended for it to be implemented in the treatment schools in the ECED Efficacy Trial. Of course, there was variation in the implementation at the schools that participated in the evaluation. Later in this report, we describe that variation, both quantitatively and qualitatively.

EAR Classroom Visit Protocol. The EAR Classroom Visit Protocol is a cornerstone of the ECED process. It is designed for use by school and district personnel, as well as technical assistance providers, to equip districts, schools, departments, instructional coaches, and teachers with up-to-date information about the quality of teaching and learning. It provides schools and districts with a common language for discussing and promoting high-quality instruction. See Early, Rogge, and Deci (2013; which appears in Appendix 2) for a description of the protocol and its psychometric properties.

The EAR Classroom Visit Protocol is a 15-item observational tool completed by trained observers during and following a 20-minute observation. Typically, teachers receive multiple 20-minute classroom visits across the school year to gain a full picture of instruction and student learning in their classroom(s). All users of the EAR Protocol must go through a rigorous set of trainings in order to use the tool. Data are uploaded to a central server that provides reports at different levels (teacher, subject area, department, grade, school, district) for use in professional development and reflective conversations with individuals and teams of teachers.

Classroom visitors use two items on the protocol to assess engagement: one measures the percentage of students who are on-task, and the second measures the percentage of on-task students who are actively and intellectually engaged in the work. The first item is scored based entirely on observations of students at work. The second item is scored using a combination of observations of students at work and, when possible, brief conversations with some students. The conversations, which take place only if they will not disrupt the class, include questions such as "What does your teacher expect you to learn by doing this work?" and "Why do you think the work you are doing is important?" To assess alignment, classroom visitors make eight binary judgments about whether the learning materials, learning activities, expectations for student work, and students' class work reflect relevant federal and state standards and high-stakes assessments, as well as designated local curricula. Rigor is assessed with five indicators (three binary, two percentages) that relate both to the cognitive level of the material and expected student work and to the extent to which students are required to demonstrate mastery of the content. Items concern whether the learning materials and student-work products are appropriately challenging, whether students are expected to meet or surpass state standards, and whether all students have an opportunity to demonstrate proficiency and to be retaught material they have not yet mastered.
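To summarize the tool's structure as just described, the sketch below encodes a single 20-minute visit record. The field names and example values are our hypothetical illustration; only the 15-item structure (2 engagement, 8 alignment, 5 rigor) comes from the text.

```python
# Sketch of one EAR Classroom Visit record (hypothetical field names; the
# 15-item structure — 2 engagement, 8 alignment, 5 rigor — follows the text above).
from dataclasses import dataclass
from typing import List

@dataclass
class EARVisit:
    pct_on_task: float                # engagement item 1: % of students on task
    pct_actively_engaged: float       # engagement item 2: % of on-task students actively engaged
    alignment_judgments: List[bool]   # 8 binary judgments (materials, activities, expectations, work)
    rigor_binary: List[bool]          # 3 binary rigor indicators
    rigor_percentages: List[float]    # 2 percentage-based rigor indicators

    def n_items(self) -> int:
        return 2 + len(self.alignment_judgments) + len(self.rigor_binary) + len(self.rigor_percentages)

visit = EARVisit(
    pct_on_task=85.0,
    pct_actively_engaged=60.0,
    alignment_judgments=[True] * 7 + [False],
    rigor_binary=[True, True, False],
    rigor_percentages=[70.0, 40.0],
)
assert visit.n_items() == 15  # matches the 15-item tool described above
```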
Training. At the start of the first year of a school's participation in ECED, instructional leaders (including school-level instructional supervisors, math and literacy coaches, teacher leaders, and district-level instructional staff) engage in four days of training to develop common definitions and understandings of engagement, alignment, and rigor and to learn how to use the EAR Classroom Visit Protocol. These trainings ensure that district efforts to improve instruction in the participating schools, as well as the activities provided by IRRE, are viewed through the same lens. They also engage instructional staff in action planning around data emerging from EAR Classroom Visits, planning that helps shape later ECED activities. The initial training consists of: (1) two full days of group instruction, including several classroom visits followed by scoring discussions; (2) a two- to three-week window during which trainees make at least 10 practice visits as teams to calibrate their scoring; and (3) two additional full days of group instruction focusing on calibration and on use of the resulting data for instructional improvement. At the start of the second year of ECED implementation, IRRE provides a condensed two-day EAR Classroom Visit Protocol refresher training to participating schools. In addition to these trainings for instructional leaders, all instructional staff members in ECED treatment schools receive a 90-minute orientation to the EAR Classroom Visit Protocol to build their awareness and begin building school-wide common definitions of engagement, alignment, and rigor.

Use of EAR Protocol in ECED. Upon completion of the EAR Classroom Visit Protocol training, participants visit at least five classes per week using the EAR Protocol. Assuming that five individuals are trained in each school and 28 weeks are available for visits each year, 700 EAR visits are made in each ECED treatment school each year. The tool is meant to monitor and improve instruction throughout the school, not just in literacy and math classes, so schools are encouraged to use the protocol in all subject areas. Data from these visits are uploaded to the secure server and used to generate reports about the state of teaching and learning. Those reports are a key source of information for ongoing teleconferences between school leaders and the IRRE consultant, for IRRE site visits, and for school-level discussions about improving teaching and learning. Further, the tool provides the entire school with a common lens and language for identifying and discussing high quality instruction. (Note: As detailed in the Method chapter, in order to assess changes in instruction, individuals unaffiliated with the schools made EAR Protocol visits to ELA and math classes in both treatment and control schools. Those visits were for purposes of the evaluation only; the data were not uploaded to the servers accessed by the treatment schools and were not included in the data reports used for improving instruction.)

ECED Math. The math component of ECED is not a curriculum but a system for delivering math instruction and assessing student progress that is specifically targeted to state and national math standards. Additionally, ECED Math involves a school-based math coach and site visits from IRRE that include professional development for teachers and coaches. ECED Math was based on the work of James Henderson and Dennis Chaconas in Kansas City, Kansas in the late 1990s and in Kansas City, Missouri starting in 2009. Two high schools in Kansas City, Missouri using the math benchmarking system reported an increase of more than 23 percentage points in the percentage of students scoring proficient or higher on their Algebra 1 test after three years of implementation (Robertson, 2013).
In keeping with the recommendations from meta-analyses conducted by Slavin and colleagues (2009) and Rakes and colleagues (2010), ECED Math involves multiple strategies for reorganizing instruction and maintaining student engagement, with a focus on mastery learning and improved assessment strategies, as well as on changing teacher and student attitudes toward mathematics. ECED Math could be used in any or all math courses, but treatment schools participating in the ECED Efficacy Trial used it only in Algebra 1 and Geometry classes, the courses in which the highest proportions of 9th- and 10th-graders are typically enrolled.

Organization of instruction and assessment. ECED Math begins with IRRE consultants working with Algebra 1 and Geometry teachers and math coaches from treatment schools to identify key standards or outcomes that students must be able to demonstrate on high-stakes accountability measures, such as state-mandated testing programs, the ACT, and the SAT, and that students need in order to be successful at the next course level. Once these teams of teachers make critical decisions about what students must know and be able to do, IRRE consultants continue to assist with prioritizing and grouping those standards into meaningful sequences of skills and units of study referred to as benchmarks. Rather than letting textbooks determine day-to-day classroom activities, instruction in ECED Math courses is focused on a specific benchmark, phrased in student-friendly terms called "I Can…" statements. After each lesson or unit of instruction, each student should be able to say, for example, "I can find the slope of a line" or "I can solve quadratic equations." Once the initial curriculum work is completed, the teacher teams, with ongoing support from IRRE consultants and local math coaches, design and develop a pacing guide to ensure that all standards/benchmarks are addressed within a timeframe that allows maximum exposure to the standards and testing formats prior to the state or district testing window(s).

To ensure that all students have mastered the key standards (i.e., benchmarks) identified by the teacher teams, the Algebra 1 and Geometry teachers develop a series of assessments administered at the end of instruction around a particular "I Can…" statement. In addition to the built-in opportunities teachers have to check each student's understanding throughout each lesson, these short (five-question) benchmark assessments provide a culminating check for mastery that is aligned with the original set of standards/benchmarks. Students must achieve mastery of at least 80% on each benchmark assessment (i.e., answer four of the five items correctly). In addition, students are asked to show mastery again by taking "capstone" assessments, which integrate a small number of related benchmarks into a coherent application of logically related concepts and skills. Capstone assessments may also include performance items that move beyond the selected-response format common to most classroom chapter or unit tests, providing exposure and practice with the types of items required in high-stakes testing programs. Benchmark assessments are typically given every three to five days, and capstone assessments every two to three weeks. Students do not receive "credit" for a benchmark until they demonstrate "double mastery": first on the benchmark assessment and a second time on the capstone assessment.
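To make the assessment rules concrete, here is a minimal sketch of the mastery checks just described, assuming a five-item benchmark assessment and an 80% cutoff; the function names are ours, not IRRE's.

```python
# Sketch of the "double mastery" credit rule: at least 80% (4 of 5 items)
# on the benchmark assessment, confirmed again on the related capstone.

def mastered(correct: int, total: int = 5, cutoff: float = 0.80) -> bool:
    return correct / total >= cutoff

def benchmark_credit(benchmark_correct: int, capstone_mastered: bool) -> bool:
    # No credit until mastery is shown twice: benchmark, then capstone.
    return mastered(benchmark_correct) and capstone_mastered

print(benchmark_credit(4, True))    # True: 4/5 = 80%, confirmed on capstone
print(benchmark_credit(5, False))   # False: capstone mastery still pending
```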
When students do not master one or more benchmarks, the classroom teacher is expected to give those students corrective instruction, using small-group or individual instructional strategies. A second form of the benchmark assessment is then administered by the classroom teacher to students who did not master the original benchmark assessment. Evidence of student achievement is posted publicly in the classroom in the form of a chart indicating which benchmarks each student has mastered. Additionally, students carry cards indicating which benchmarks they have mastered, so that their other teachers and parents are aware of their progress and can encourage them to master the remaining benchmarks. This restructured instruction and assessment clearly defines teachers' roles and responsibilities and delineates the skills that must be taught, all in keeping with clear state and district accountability standards, as supported by the work of Heller and Greenleaf (2007) and Quint (2006).

Grading. One of the key features of the ECED Math program is the implementation of mastery grading. Students are graded solely according to the number of benchmarks on which they demonstrate double mastery during the grading period. For example, if the district determines that 90% is an A, then a student must demonstrate double mastery of at least 90% of the benchmarks to receive an A. Individual schools can decide what percentage of benchmarks a student must master for each grade, but ECED Math requires that grades be based solely on the percentage of benchmarks mastered. Until students master enough benchmarks to attain a C (typically 70%), they are assigned an I (incomplete). If a student still has an I when the grading period ends, that I appears on his or her report card. Students are given multiple opportunities, described below, to change that I to a C or higher. If all opportunities are exhausted before the student attains a C, the I is changed to an F.

This grading system represents a major shift in thinking for secondary math teachers, students, and parents for two reasons. First, scores on different assessments are not averaged together. Thus, a student who consistently answers 70% of the assessment items correctly will not master any benchmarks and will not pass the class, even though that same student might have received a C under more standard grading practices. Second, only benchmark mastery (not homework, classwork, effort, or behavior) is used to determine student grades. Some teachers find it challenging to motivate students when these other factors are not part of the grade; however, ECED Math is based on the premise that only mastering the content truly determines success in math. IRRE provides multiple supports for teachers (see below) as they make these significant changes.
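A minimal sketch of the mastery-grading rule follows, assuming the cut points mentioned above (90% of benchmarks double-mastered for an A, 70% for a C); districts set their own percentages, and the B threshold below is purely an illustrative assumption.

```python
# Sketch of mastery grading: the grade depends only on the percentage of
# benchmarks double-mastered. Below the C threshold a student carries an I
# (incomplete), which becomes an F only after all re-take opportunities
# (second benchmark forms, Benchmark Cafe, summer school) are exhausted.

def course_grade(pct_double_mastered: float, opportunities_exhausted: bool = False) -> str:
    if pct_double_mastered >= 90:
        return "A"
    if pct_double_mastered >= 80:   # assumed cut point, for illustration only
        return "B"
    if pct_double_mastered >= 70:
        return "C"
    return "F" if opportunities_exhausted else "I"

print(course_grade(93))          # 'A'
print(course_grade(65))          # 'I': may still be raised to a C or higher
print(course_grade(65, True))    # 'F': all opportunities exhausted
```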
Supports for struggling students. As noted above, a student who does not master a benchmark on the first attempt receives additional support in the classroom and is then given a second form of the benchmark assessment. If the student again does not reach the 80% mastery level, she or he can participate in the "Benchmark Café," where math teachers provide additional individual assistance and validate student mastery of the benchmark. The Benchmark Café is a room in the school where students can go for additional support in mastering their benchmarks. Typically, certified teachers take turns staffing the Benchmark Café, and each school establishes a schedule for when it is open to students (e.g., during lunches, before and after school, during vacations). At times the Benchmark Café is staffed by instructional aides, college math students, or more advanced high school math students. In addition to the Benchmark Café, ECED Math requires that schools provide a summer school where students receive additional instruction and opportunities to show mastery in order to raise their grade or change an I to a C. Again, the summer program is staffed by math teachers or other qualified individuals.

Supports for ECED Math teachers. ECED Math requires significant changes in the daily practice of math teachers. Such changes require support and feedback, which are provided by summer institutes led by IRRE; by math coaches at each school, who make regular classroom visits, provide feedback, and facilitate meetings on an ongoing basis; and by professional development sessions led by IRRE consultants throughout the school year.

Summer institutes. Supports for ECED Math begin during the summer prior to the first year of implementation. Teachers slated to teach ECED Math, along with their school's math coach, participate in three full days of curriculum mapping and common assessment training with experienced IRRE consultants. Once these teams of teachers make critical decisions about what students must know and be able to do, IRRE consultants continue to assist with prioritizing and grouping those standards into the benchmarks, or "I Can…" statements, described above. During the second summer, a two-day summer institute is held to refine the "I Can…" statements, as well as the benchmark and capstone assessments, as needed. Newly hired teachers are introduced to ECED Math at this time.

Coaching and use of EAR Protocol. IRRE's approach to instructional coaching is based largely on the work of Joyce and Showers (2002), who describe a model of professional development in which teachers learn the new skill or knowledge, experience that new learning, practice it, reflect on the practice and receive feedback, and finally participate in coaching around the new learning in the context of their own classrooms. In order to participate in ECED, each school must employ a math coach with at least 50% full-time equivalent devoted to coaching the ECED Math teachers. One of the main strategies by which coaches support teachers is the use of the Engagement, Alignment, and Rigor Classroom Visit Protocol described above. Coaches are trained to score this 15-item tool during and following a 20-minute classroom visit. The information is uploaded to a centralized database that allows coaches and others at the school to create reports at the teacher, department, or school level. Each coach is expected to make at least five EAR Protocol visits each week. Coaches use the information they gather during EAR Protocol visits to support individual teachers through reflective questioning and one-on-one meetings. Coaches and IRRE consultants make EAR visits together during site visits and use the information to plan professional development activities and target supports. In addition to EAR Protocol visits, coaches facilitate weekly meetings with ECED Math teachers.
That time is used to discuss emerging issues around use of ECED Math, including reflection on lessons taught, data discussions based on student mastery levels, preview and modeling of upcoming lessons, and discussion of modifications for struggling students. Coaches also provide targeted support to teachers through strategies such as co-teaching, demonstrating lessons and strategies, and arranging for teachers to visit one another's rooms.

Professional development led by IRRE. As part of the supports that IRRE provides to ECED schools, IRRE consultants make four site visits to each school each year. The site visits last three days, during which each ECED Math teacher participates in a half-day professional development session. To be considered for participation in the ECED Efficacy Trial, districts had to agree to pay the cost of substitute teachers while their teachers took part in these professional development activities. The content of the professional development is determined jointly by the IRRE consultants, math coaches, and school and district administrators, informed by the EAR Protocol data that have been uploaded to the server. Example topics include using relevance to support student engagement, increasing expectations by creating an effective learning environment, and embedding learning through reflection.

Supports for math coaches. Math coaches are trained by IRRE, starting in the summer before implementation begins, and supported throughout the project via conference calls every other week and four three-day site visits each year. During the site visits, the coaches make EAR Protocol visits with the IRRE consultants, debrief about what they saw and what supports teachers need, discuss reflective coaching strategies, and plan how the coaches can best support each teacher. Additionally, IRRE consultants model the coaching process by working directly with teachers to support improved instruction while the coaches observe, and IRRE consultants observe the coaches during coaching sessions with teachers and give the coaches feedback. Each site visit ends with a meeting that includes the principal and other members of the school leadership to discuss progress and plan for continued improvement. The work of the site visits is sustained and supported through regularly scheduled conference calls between the IRRE consultants and the math coaches.

Links to EAR. IRRE intentionally designed ECED Math to focus on its three core instructional goals: Engagement, Alignment, and Rigor (EAR). "I Can…" statements are intended to make math engaging and personally relevant by showing students what skills they will acquire during each lesson. Alignment is assured because each school creates its own pacing and sequence based on its state and local standards. The double-mastery system, coupled with grading based solely on mastery, ensures rigor by holding all students to the same high standards for understanding. The frequent benchmark assessments and the larger capstone assessments provide continual checks for understanding and ensure that the information is being incorporated into students' larger knowledge base of mathematics.

Literacy Matters: ECED's Literacy Component. The centerpiece of ECED's approach to literacy is a two-year curriculum delivered in a stand-alone class that supplements the regular 9th- and 10th-grade English curriculum.
Additionally, Literacy Matters involves a school-based literacy coach and site visits from IRRE that include professional development for teachers and coaches. This multi-pronged approach to supporting improved literacy instruction is in keeping with Slavin and colleagues' (2008) meta-analytic finding that instructional-process programs providing teachers with extensive professional development to implement specific instructional strategies are most effective. Further, as supported in the literature reviewed earlier, the curriculum employs mixed methods of small- and large-group instruction (Slavin et al., 2008), clearly defines teacher roles and responsibilities and the skills that must be taught (Heller & Greenleaf, 2007; Quint, 2006), and employs a school-wide model specifically aimed at enhancing reading comprehension (Lang et al., 2009).

Secondary students' literacy skills – their ability to read, write, speak, and listen – form a fundamental building block both for academic achievement in high school and for life-long success. Many of the schools also face the challenge of building these skills with students for whom English is not their primary language. To meet these pressing needs, ECED provides a two-year, research-based, structured literacy curriculum that supplements traditional English courses, using authentic, real-world, expository texts and engaging activities that give students additional time and opportunities to foster and strengthen their literacy skills and habits on a daily basis.

The first year of the curriculum, called Reading Matters, aims to strengthen students' abilities to comprehend and gather information, helping students identify ways to make learning easier. Students refine critical thinking skills and learn how to work well with others through activities such as debates, exploring career interests, and writing speeches to express the change they want to see in the world. The curriculum includes four units, each dedicated to a different text genre: (1) Who are we? (persuasive), (2) Our footprints on society (expository), (3) The change we want to be (research/analytical/persuasive), and (4) Learning with others (research/analytical/descriptive). Each unit supports students in answering the question: "What is our personal role and responsibility in seeing the need for and creating positive changes in society?" Teachers use a collection of interdisciplinary instructional strategies, called the Power 10, that equip students with transferable skills for comprehending, organizing, and remembering information in multiple disciplines. Students work collaboratively in small groups and teams on a daily basis to share their thinking, expand their ideas, and reach consensus, as well as to listen to and present information with others. The curriculum includes seven project-based, rubric-driven assessments, so that students receive feedback regarding mastery.

The second year of the curriculum, called Writing Matters, aims to strengthen students' abilities to share and communicate information with others, helping students identify ways to express and personalize their knowledge. Using six traits of writing, instructional strategies, and relevant topics, students explore what writing is and who writers are, the art of sharing narratives, how to analyze an audience, and how to make their voices heard through oral, written, and visual communication.
The curriculum includes four units, each dedicated to a different text genre: (1) Who are writers? What do they do? (descriptive), (2) If walls could talk, if hearts could speak (narrative), (3) The audience – who is really listening? (analytical/persuasive), and (4) A call to action – making our voices heard (research/persuasive). Each unit supports students in answering the question: "In a culture of visible communication, how can I increase my communication proficiency so that I can confidently, creatively, and curiously explore, interact with, express, and make sense of self and society to impact the world?" The curriculum embeds multiple activities and opportunities to reflect on and gauge growth, including the development of a writer's portfolio, so that students make connections between the skills they are developing and their growing proficiency as writers. As with the first year of the curriculum, teachers use the Power 10 strategies to equip students with skills for comprehending, organizing, remembering, and communicating information in multiple disciplines. The curriculum includes five project-based, rubric-driven assessments, allowing students to receive feedback toward mastery.

As a prerequisite of participation in the ECED Efficacy Trial, schools had to agree to enroll all of their 9th- and 10th-grade students in this supplemental literacy course. The only exceptions were students in self-contained special education and "newcomers" whose English skills were too limited for them to be enrolled in the regular high school curriculum. In schools on a traditional six- or seven-period schedule, the course was intended to meet for one period per day for the entire school year. In schools on an accelerated four-by-four block schedule, it was intended to meet for one block (approximately 90 minutes) per day for one semester. During the first year of implementation, both 9th- and 10th-grade classes used the 9th-grade Reading Matters curriculum, because none of the students had yet been exposed to the curriculum. During the second year of implementation, 9th-grade classes used the Reading Matters curriculum and 10th-grade classes used the Writing Matters curriculum. Students were also expected to be enrolled in the regular 9th- or 10th-grade English/language arts course, again either for one period per day all year or for one block per day for one semester. Thus, the ECED Literacy course effectively doubles the amount of ELA exposure each 9th- and 10th-grader receives.

Links to EAR. The Literacy Matters curriculum, like ECED Math, was designed with an intentional focus on IRRE's three core instructional goals: Engagement, Alignment, and Rigor (EAR). Making material personally relevant is a well-established path to encouraging engagement (National Research Council and the Institute of Medicine, 2004), and Literacy Matters uses texts and assignments that address personal responsibility, positive societal change, and one's personal impact on the world to ensure high personal relevance and engagement. IRRE works with school districts and states to map the Literacy Matters curriculum onto their state and local standards, ensuring that all standards are met and supported. Rigor is ensured through appropriately challenging texts, while the Power 10 strategies and assessment rubrics provide students and teachers with ongoing information about mastery.

Supports for Literacy Matters teachers.
When schools begin participating in ECED, the literacy curriculum is new for all teachers, and many of the Power 10 strategies are also new. Thus, teachers require significant support for successful implementation. As with ECED Math, supports for teachers using the Literacy Matters curricula come in the form of summer institutes led by IRRE; ELA coaches at each school, who make regular classroom visits, provide feedback, and facilitate meetings; and professional development sessions led by IRRE consultants throughout the school year.

Summer institutes. Supports for Literacy Matters begin during the summer prior to the first year of implementation. Teachers who will be teaching ECED Literacy, along with the school's literacy coach, participate in three full days of ECED Literacy introduction, modeling, and practice with experienced IRRE consultants. During the second summer, a two-day summer institute introduces the second year of the curriculum and introduces newly hired teachers to ECED Literacy.

Coaching and use of EAR Protocol. Literacy Matters' coaching and use of the EAR Protocol closely parallel ECED Math's, based largely on the work of Joyce and Showers (2002). As part of the agreement to take part in ECED, each school must employ a literacy coach with at least 50% full-time equivalent devoted to coaching ECED Literacy teachers. Literacy coaches are trained to use the Engagement, Alignment, and Rigor Protocol and use the information they gather to support individual teachers through reflective questioning and one-on-one meetings. Each coach is expected to make at least five EAR Protocol visits each week. Also, coaches and IRRE consultants make EAR visits together during site visits and use the information to plan professional development activities and to target supports. As with ECED Math, literacy coaches facilitate weekly meetings with ECED Literacy teachers. That time is used to discuss emerging issues around use of the curriculum, including reflection on lessons taught, preview and modeling of upcoming lessons, and discussion of modifications for struggling students. Coaches also provide targeted support to teachers through strategies such as co-teaching, demonstrating lessons and strategies, and arranging for teachers to visit one another's rooms.

Professional development led by IRRE. As part of the supports that IRRE provides to ECED schools, IRRE consultants make four site visits to each school each year. The format of these visits parallels the ECED Math site visits: they last three days, and during that time each ECED Literacy teacher participates in a half-day professional development session. The exact content of the professional development is determined jointly by the IRRE consultants, literacy coaches, and school and district administrators, informed by the EAR Protocol data that have been uploaded to the server. Example topics include using relevance to support student engagement, increasing expectations by creating an effective learning environment, and embedding learning through reflection.

Supports for literacy coaches. The supports for literacy coaches parallel those for math coaches. They are trained by IRRE, starting in the summer before implementation begins, and supported throughout the project via conference calls and four site visits each year.
During the site visits, the coaches make EAR Protocol visits with the IRRE consultants, debrief about what they saw and what supports teachers need, discuss reflective coaching strategies, and plan how the coaches can best support each teacher. IRRE consultants model the coaching process by working directly with teachers to support improved instruction while the coaches observe, and they also observe the coaches during coaching sessions with teachers and give the coaches feedback. Each site visit ends with a meeting that includes the principal and other members of the school leadership to discuss progress and next steps. The work of the site visits is sustained and supported through regularly scheduled conference calls between the IRRE consultants and the literacy coaches.

Hypotheses

To evaluate the ECED approach to instructional improvement described above, a school-clustered randomized trial was conducted in which 10 schools were randomly assigned to receive all ECED supports for two years and 10 were assigned to a business-as-usual control condition. The primary hypothesis was that student achievement in math and ELA, as measured by standardized test scores, would increase as a result of participation in ECED. However, we see achievement as related to many other aspects of education and hypothesized that other parts of the achievement equation would be affected by ECED as well. Thus, six main hypotheses drove the methods and analyses. They are presented here in the order in which they are addressed in this report:

1. Participation in ECED will improve teachers' attitudes toward their school and work, including the extent to which they feel supported by their school and district administrators, their self-reported engagement in teaching, and the extent to which they see their colleagues as supportive.

2. Participation in ECED will increase engagement, alignment, and rigor, as measured by the EAR Classroom Visit Protocol.

3. Students' attitudes toward school (i.e., self-reported engagement, experience of teacher support, and perceived competence) will improve as a result of participation in ECED.

4. ECED will lead to increases in student achievement, as measured by math and ELA standardized tests.

5. Student academic performance and commitment, as measured by grade-point average (i.e., performance), attendance, and progress toward graduation (i.e., commitment), will improve as a result of participation in ECED.

6. Within the ECED treatment schools, the extent to which the ECED components were implemented as intended by IRRE will be associated with greater changes in teacher attitudes, EAR Protocol scores, student achievement, student attitudes, and student performance and commitment.

III. Method

Study Design

The Every Classroom, Every Day Evaluation is a school-randomized field trial examining the efficacy of a high school instructional improvement intervention. The intervention was created and administered by the Institute for Research and Reform in Education (IRRE), and the evaluation was conducted by a team of researchers through a grant from the U.S. Department of Education, Institute of Education Sciences, to the University of Rochester. Twenty high schools (5 districts, 4 schools per district) were randomly assigned to either the treatment (n = 10) or control (n = 10) condition, with two high schools from each district assigned to each condition.
The five districts were spread across four states: two in California, one in Arizona, one in Tennessee, and one in New York. Each school participated for a two-year period. Schools in the first recruitment group (n = 8) participated in 2009-10 and 2010-11; schools in the second recruitment group (n = 12) participated in 2010-11 and 2011-12.

Five primary types of data were collected: (1) student surveys, (2) teacher surveys, (3) classroom observations using the Engagement, Alignment, and Rigor (EAR) Classroom Visit Protocol, (4) variation-in-implementation interviews with math and literacy coaches and/or department chairs and the school principal or assistant principal, and (5) student records. The first three were collected in four waves, once near the start and end of each academic year the school participated; the interviews and student records were collected twice, once at the end of each academic year the school participated. Each type of data is discussed in detail later in this report. Treatment schools received all ECED supports (e.g., site visits from IRRE consultants, curriculum materials) free of charge. Control schools were given a $10,000 honorarium ($5,000 per year) to thank them for their participation in the data collection activities.

This study sought to address many of the design concerns raised in the three meta-analyses regarding improving classroom instruction. The intervention and data collection lasted two years; although many school reformers might argue that this is still too short a period in which to see meaningful change, it is considerably longer than most past work in this area. Additionally, there were 10 schools in each group, spread across five districts and four states, a relatively large and diverse sample compared to other randomized field trials of school improvement models, allowing us to feel confident that the results can generalize to a wide variety of schools and settings. Most importantly, schools were randomly assigned to condition, allowing for the strongest form of causal inference.

School Selection

School recruitment. Initially, the goal was to include only schools that enrolled at least 220 9th-graders and had a minimum of 30% of students eligible for free/reduced-price lunch. To be considered for participation, a district needed to include at least four high schools that met the recruitment criteria and were interested in participating. As a first step in recruiting, the Common Core of Data (US Department of Education) was used to create a list of all districts in the country that met the criteria for inclusion; the list included over 150 districts throughout the US. Districts where IRRE or the research team had personal contacts were contacted directly, generally via telephone. The remaining districts received an email message describing the study, a letter containing a one-page description of the study, and/or a phone call. Additionally, a letter was sent to each state superintendent of education with at least one eligible district, outlining the study and encouraging them to contact districts within their state that might benefit from participation.

Follow-up contact was made with districts that expressed interest. This generally took the form of several phone calls among IRRE leadership, the research team, and the district's leadership. If the district remained interested, a site visit was conducted. Each site visit involved one member of IRRE's leadership and one member of the research team.
Site visits included a meeting with the principal of each school considering participation to explain ECED's instructional improvement model and the research requirements (including random assignment), visits to at least two schools to see classrooms and meet a larger group of school leaders, and a meeting between the research team member and the district's research director. After the site visit, interested districts signed a memorandum of understanding (MOU) outlining all implementation and research requirements, including the random assignment procedures. A sample MOU appears in Appendix 3. Following the Consolidated Standards of Reporting Trials recommendations (Schulz, Altman, & Moher, 2010), a flow diagram indicating the number of schools identified at each step of recruitment appears in Appendix 4.

Random assignment. So that districts and schools could be certain that a truly random process was used to assign schools to the treatment versus control condition, the drawing was broadcast via webcam. It involved placing four slips of paper in a bowl, each bearing a different school's name, and then having an individual unaffiliated with the project pull out the two that would participate in the treatment condition. District and school leadership teams were invited to watch. Random assignment took place as soon as the MOU was signed, generally in the spring prior to the start of the district's participation in the study.
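The assignment procedure amounts to drawing two of each district's four candidate schools for treatment, with the remaining two serving as controls. A minimal sketch follows, with placeholder school names (the actual drawings used slips of paper pulled from a bowl on webcam):

```python
# Sketch of within-district random assignment: two of four schools to treatment.
import random

def assign_within_district(schools):
    assert len(schools) == 4, "each participating district contributed four schools"
    treatment = random.sample(schools, 2)                 # the two slips drawn
    control = [s for s in schools if s not in treatment]  # the remainder
    return treatment, control

tx, ctrl = assign_within_district(["School A", "School B", "School C", "School D"])
print("Treatment:", tx)
print("Control:", ctrl)
```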
Characteristics of participating schools. Based on data from the Common Core of Data (CCD) for the year each school began participating (2009-10 for the first recruitment group and 2010-11 for the second), the schools were generally large, with an average enrollment of over 1,300 (SD = 690; median = 1,151), although the range of school sizes was also large (156 to 2,553). As described above, we sought to include only schools with over 220 9th-graders. The average number of 9th-graders was over 350, but five schools fell below the 220 threshold, largely because of open enrollment policies in several districts, through which students could choose to attend any school in the district, making it difficult for districts to predict the following year's enrollment.

The racial/ethnic composition of the schools varied widely. In ten schools over half the students were Latino, in five schools over half were African American, and in three schools over half were White; in the remaining two schools, no racial/ethnic group made up over half of the student body. On average, 70% of students in the participating schools were eligible for free or reduced-price lunch (FRPL), and schools ranged from 46% to 98% FRPL (median = 69%). There was a wide range of pupil/teacher ratios, from 8.8 to 26.1 (mean = 18.3, median = 19.7, SD = 5.3).

Most of the schools (13 of 20) served the traditional high school grades of 9 through 12. Five schools were combined middle and high schools, serving either 6th through 12th grades or 7th through 12th grades. Two schools (one treatment and one control, from the same district) opened their doors for the first time the year their participation in ECED began. Those schools enrolled only 9th-graders during their first year of operation; during their second year they enrolled 9th- and 10th-graders, and they were slated to add a grade each year until they served 9th through 12th grades.

Half of the schools were located in mid-size cities (n = 10; 5 treatment, 5 control). The remainder were in large suburbs (n = 5; 2 treatment, 3 control), large cities (n = 3; 2 treatment, 1 control), or fringe rural areas (n = 2; 1 treatment, 1 control) (US News and World Report, 2013).

Site attrition. Two treatment schools ended their participation partway through the project. The first stopped participating in the ECED supports after the first semester. Staff buy-in was low at that school from the beginning, especially among teachers in the math department, and the district superintendent who had originally supported ECED and encouraged the schools to participate was let go during that first semester. That school did not take part in data collection in the spring of the first year (Wave 2) or the fall of the second year (Wave 3), but did participate in the fall of the first year (Wave 1) and the spring of the second year (Wave 4). In Wave 4, the school administered surveys to 10th-grade students and allowed EAR Classroom Visits in 10th-grade rooms only, because 9th-grade students and teachers had had no exposure to ECED. Additionally, in Year 2, school personnel participated in variation-in-implementation interviews; they had not taken part in those interviews at the end of Year 1, but they did provide information about Year 1 during the Year 2 end-of-year interviews. The district provided student records for both years for that school.

The second school to leave the project participated for one year and then pulled out of the ECED supports as the second year got underway. The school had not implemented the reform well in Year 1, and the teachers had been unhappy about being asked to participate in the reform effort. That school had three different principals during its first year in the study and was assigned a fourth new principal as the second year started. The new principal was not interested in taking part in the supports. In the second year (Waves 3 and 4), he allowed the research team to collect student surveys, but he did not allow teacher surveys, EAR Protocol visits, or interviews. The district did provide student data for both years at this school.

Leadership turnover. The participating schools and districts experienced a very high level of leadership turnover during the course of the study. Between the time we began working with them, in the spring prior to their first year of participation, and the end of their second year of participation, the superintendents in four of the five participating districts left their positions. The fifth kept his position throughout the project but announced his resignation toward the end of the second year, after experiencing serious health problems during that year. At the principal level, there were one or more changes in principal in 11 of the 20 schools (six treatment, five control) between the time the school was recruited into the study and the end of its second year of participation. For more specifics on leadership changes and other disruptions, see Chapter VII.

Study Teachers

Study teachers are individuals who taught a target course (Algebra 1, Geometry, 9th-Grade English, 10th-Grade English, or ECED Literacy) at any point during the four terms that their school participated. Demographic information about teachers was collected from both district records and teacher surveys.
Between these two sources, we have demographic information for 74% to 95% of teachers, depending on the variable. Table 1 provides demographic information for all study teachers, separated by subject taught and condition.

Math study teachers. Math study teachers are individuals who taught Algebra 1 or Geometry at any point during the four terms that their school participated in the study. There were 238 math study teachers in all: 116 in treatment schools and 122 in control schools. As seen in Table 1, roughly half of math study teachers were male. Over half were White, with the non-White teachers roughly equally split among Latino, Black, and Asian teachers. In general, math teachers in the study had substantial teaching experience, with about two-thirds having more than six years of experience. There were no significant demographic differences between math teachers in treatment versus control schools.

Turnover was high among math teachers in the study. As seen in Table 1 (in the section called "Study semesters teaching at the school"), only 57% of math teachers at treatment schools and 62% of math teachers at control schools taught at the school during all four terms of the intervention. The remainder started after the intervention began, left before it ended, or both. In addition to many teachers leaving or joining the target schools during the two years of the study, a fairly high number of teachers remained at a target school throughout the four terms their school participated but taught target math courses only in some terms. Thus, when considering whether study math teachers were teaching the target math courses of Algebra 1 and Geometry in all terms, turnover was even higher. The average number of terms a math study teacher taught a target course was less than three, and just over one-third taught a target math course during all four terms that their school participated. This reflects a combination of teachers leaving the school and changes in teachers' assignments. For instance, the fact that 57% of math teachers in treatment schools were at the school all four terms, but only 36% taught a target course all four terms, means that 21% of teachers were present at the school but were not assigned to teach a target course during at least one term.

Both teacher turnover and teacher reassignment have potentially major impacts on the intervention's success. Teachers who were not teaching a target course during a particular term did not participate in the ECED supports that term, so we might anticipate that they benefited less from the supports than they would have with four terms of exposure. Similarly, teachers who were assigned to target courses after the first study term would not have received the full training and supports and would not have participated in the summer institute during which the "I Can…" statements were created and the benchmark assessments drafted.

English Language Arts (ELA) study teachers. ELA study teachers are individuals who taught 9th- and/or 10th-Grade English or ECED Literacy at any point during the four terms that their school participated in the study. There were 298 ELA study teachers in all: 166 in treatment schools and 132 in control schools. As seen in Table 1, about two-thirds of ELA teachers were female and about two-thirds were White, with Hispanic/Latino teachers making up the second largest racial/ethnic group.
In general, ELA teachers had substantial teaching experience, with about two-thirds having more than six years of experience. There were no significant demographic differences between ELA teachers in treatment versus control schools. However, the ELA study teachers at treatment versus control schools are not entirely comparable, because ECED Literacy was an additional course added in treatment schools as part of the intervention; control schools did not include such a course. As such, comparisons between them are non-experimental and should be interpreted with caution. Most treatment schools selected 9th-/10th-grade English teachers to teach their ECED Literacy courses, so 9th-/10th-grade English teachers in control schools are used for comparison. We cannot, however, know that these same individuals would have taught ECED Literacy had their school been assigned to the ECED treatment condition. Further, in treatment schools, only teachers who taught ECED Literacy received the ECED supports. Although there was high overlap between the 9th-/10th-Grade English teachers and the ECED Literacy teachers in treatment schools, some 9th-/10th-Grade English teachers at treatment schools never taught ECED Literacy and therefore never participated in ECED supports. Among the 166 ELA teachers at the treatment schools, 77 (46%) taught both 9th-/10th-grade English and ECED Literacy at some point during the study, 52 (31%) taught 9th-/10th-grade English but not ECED Literacy, and 37 (22%) taught ECED Literacy but not 9th-/10th-grade English. Thus, the 31% of ELA teachers at treatment schools who never taught ECED Literacy had no exposure to the intervention. They are nonetheless included in our intent-to-treat analyses because ECED is intended to be a school-level reform.

As with math, turnover was high among ELA teachers. As seen in Table 1 (in the section called "Study semesters teaching at the school"), only 49% of ELA teachers at treatment schools and 61% at control schools were teaching at the school during all four terms of the intervention. The remainder started after the intervention began, left before it ended, or both. And as with math teachers, when considering whether the ELA study teachers were teaching target ELA courses (i.e., 9th-/10th-grade English and ECED Literacy) in all terms, turnover was even higher. The average number of terms a study teacher taught a target ELA course was only about 2.6, and less than one-third of ELA teachers taught a target ELA course during all four terms. This is due both to teachers leaving the schools and to changes in teachers' assignments. For example, in treatment schools, 49% of ELA teachers were at the school all four terms, but only 33% taught a target course all four terms, meaning that 16% of teachers were present at the school but were assigned to teach only non-target courses during at least one term. As with math teachers, both teacher turnover and teacher reassignment have potentially major impacts on the intervention's success. And as with math, ECED supports were provided only to study teachers who were teaching ECED Literacy that term or year.

Table 1.
Characteristics of Study Teachers

                                                Math                              ELA
                                      Tx      Ctrl      t      p       Tx      Ctrl      t      p
N                                    116      122                     166      132
Gender (% male)                     43.8     52.1     1.27    .20    31.4     31.5     .02    .99
Race/ethnicity (%)
  White (non-Hispanic)              65.3     56.5    -1.24    .22    71.8     71.0     .22    .83
  Hispanic/Latino                   11.2     12.0      .13    .89    17.7     15.0    -.47    .64
  Black (non-Hispanic)              13.3     10.9     -.50    .62     4.8     10.3    1.65    .10
  Asian/Pacific Islander             8.2     17.4     1.59    .11     4.0      0.9   -1.38    .17
  Other                              2.0      3.3      .39    .69     1.6      2.8     .71    .48
Years teaching (%)
  < 1 year                           7.9      7.3                     7.8      4.6
  1-2 years                         10.9     10.4                    13.3     12.0
  3-5 years                         22.8     17.7                    20.3     19.4
  6-10 years                        22.8     26.0                    30.5     32.4
  11-20 years                       22.8     26.0                    19.5     23.1
  20 or more years                  12.9     12.5                     8.6      8.3
Study semesters teaching at the
  school: Mean (SD)                 3.06     3.16      .64    .53    2.91     3.17    1.93    .06
                                   (1.15)   (1.16)                  (1.16)   (1.11)
Study semesters teaching at the school (any course, target or not) (%)
  1 term                            12.1     13.9                    14.5     10.6
  2 terms                           26.7     18.0                    28.9     22.7
  3 terms                            4.3      6.6                     7.8      6.1
  4 terms                           56.9     61.5                    48.8     60.6
Study terms teaching target
  course: Mean (SD)                 2.66     2.58     -.54    .59    2.55     2.55    -.01    .99
                                   (1.10)   (1.21)                  (1.10)   (1.12)
Study terms teaching target course (%)
  1 term                            12.9     23.0                    15.1     17.4
  2 terms                           44.0     33.6                    47.0     42.4
  3 terms                            6.9      5.7                     5.4      7.6
  4 terms                           36.2     37.7                    32.5     32.6

Study Students

The target population of students for this study was all students who were in 9th or 10th grade in either of the years of implementation and who met the inclusion criteria (n = 22,131; 10,515 treatment and 11,616 control).

Inclusion/exclusion criteria. Students in 9th or 10th grade were excluded from both the intervention and the evaluation if they were (1) in a self-contained special education class, meaning that they had a disability too involved for them to participate in the regular curriculum, or (2) a "newcomer" to the country with such limited English skills that they were excluded from the regular curriculum, as defined by their school district. Across all districts, a total of 497 students (2.2% of otherwise eligible students) were excluded because they were in self-contained special education; of these, 261 were in treatment schools (2.4%) and 236 were in control schools (2.0%). Across all districts, 292 students (1.3%) were excluded because they were newcomers; of these, 160 were in treatment schools (1.5%) and 132 were in control schools (1.1%). Note that these are small subgroups of the special education and English language learner (ELL) populations; most special education and ELL students were included in this study.

Student rosters were obtained from the districts twice each year (four times across the study) to ascertain exactly which students were in 9th or 10th grade and to organize the survey administration. Rosters were generally received about one month after the start of the school year and two months prior to the end of the school year. Any student who appeared on any of the four rosters is considered part of the study population (aside from the two excluded groups described above), regardless of how long they attended the school. A few students may have been missed because they enrolled after the fall rosters were generated and left before the spring rosters were generated; although we think this group was very small, there is no way to know its precise size. Additionally, this strategy means that some students who actually attended a study school for very few days, or not at all, are counted as part of the study population, because their names were on a roster the day it was created for the study.
Some schools were slow to remove students from rosters, so some students who enrolled but never attended remained on the rosters and are therefore counted among the study students.

At the start of the study, parents were sent a brief description of the study and given a form to return if they did not want their child to participate in the questionnaires or to have their child's records released. Across all districts, parents of 366 students (1.7%) returned this form to exclude their child: parents of 114 students (1.1%) in treatment schools and parents of 252 students (2.2%) in control schools. Because these students are still part of the target population, all values for them were treated as missing and were imputed.

Movement between schools. The total target student population of 22,131 actually represents only 21,641 individual students. During the two years of the intervention, 476 students (2.2%) moved from one target school to another target school once, and 7 students (<1%) made such a move twice. Data were collected from and about these students in all schools whenever possible. For analysis purposes, they are treated as different students and appear in the data set two or three times.

Student demographic characteristics. Students fell into three grade cohorts: Grade Cohort 1 included students in 9th grade in Year 1 and/or 10th grade in Year 2; Grade Cohort 2 included students in 10th grade in Year 1; and Grade Cohort 3 included students in 9th grade in Year 2. Students who were in 9th grade both years (n = 382) are part of the first cohort, whereas students who were in 10th grade both years (n = 170) are part of the second cohort. The analyses in this report include only Grade Cohort 1, because those students had the potential of being exposed to the intervention for two years and are therefore the most likely to evidence intervention effects.

Table 2 presents demographic information about study students, separated by grade cohort and condition. Demographic information comes primarily from records provided by the districts; when it was missing from the district records, we used students' reports of race/ethnicity and gender from the student surveys, which happened in roughly 1.3% of cases. After combining data sources, race/ethnicity was missing for 2.8% of cases, gender was missing for 2.5% of cases, and other demographic variables were missing for roughly 4.1% of cases. No attempt was made to collect student survey data from students who left a participating school and did not enroll in a different participating school. School records were requested for all students in the target population; however, districts often could not provide records about course enrollments for students who left during the year, and those students did not typically take the standardized tests.

The last section of Table 2 indicates how many terms students were enrolled in the study schools. For Grade Cohort 1 students in treatment schools to have had the full benefit of ECED, they would need to have been enrolled for all four terms of the project. Among students in Grade Cohort 1 (9th grade in Year 1 and/or 10th grade in Year 2), just over half (54%) were enrolled all four terms, meaning 46% either arrived after the first wave of data collection, left prior to the last wave of data collection, or both.
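To make the grade-cohort definitions above concrete, the sketch below classifies a student from his or her grade in each study year (None meaning not enrolled that year); the function is illustrative, not the study's actual code.

```python
# Sketch of grade-cohort assignment. Grade Cohort 1: 9th grade in Year 1
# and/or 10th grade in Year 2 (including 9th-grade repeaters); Cohort 2:
# 10th grade in Year 1 (including 10th-grade repeaters); Cohort 3: 9th
# grade in Year 2 only.

def grade_cohort(grade_y1, grade_y2):
    if grade_y1 == 9 or (grade_y1 != 10 and grade_y2 == 10):
        return 1
    if grade_y1 == 10:
        return 2
    if grade_y2 == 9:
        return 3
    return None   # outside the target population

print(grade_cohort(9, 10))     # 1: on-time progression
print(grade_cohort(10, 10))    # 2: repeated 10th grade
print(grade_cohort(None, 9))   # 3: entered as a 9th-grader in Year 2
```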
As seen in Table 2, there are some differences between the study students in the treatment and control conditions, with control schools enrolling slightly more students of color, students eligible for free or reduced-price lunch, and English language learners. To account for these differences, these demographic variables are used as covariates in the analyses of the impacts of ECED on student outcomes (see Chapter VI).

Table 2. Demographic Characteristics of Study Students

                              Grade Cohort 1 (9th in Y1)      Grade Cohort 2 (10th in Y1)     Grade Cohort 3 (9th in Y2)
                             Tx      Ctrl      t      p      Tx      Ctrl      t      p      Tx      Ctrl      t      p
N                          3,999    4,434                  3,108    3,539                  3,408    3,643
Gender (% male)             53.1     52.3   -0.74    .46    50.1     51.3    0.89    .37    49.7     50.6    0.72    .47
Race/ethnicity (%)
  Hispanic                  49.7     52.0    2.10    .04    50.0     53.8    3.10    .00    50.6     58.7    6.73    .00
  Black, non-Hispanic       22.5     26.0    3.66    .00    20.9     22.9    1.91    .06    22.2     19.4   -2.87    .00
  White, non-Hispanic       17.1     11.8   -6.84    .00    17.9     11.2   -7.73    .00    17.0     11.3   -6.72    .00
  Asian/Pacific Islander     8.2      7.9   -0.40    .69     8.9      9.3    0.60    .55     7.2      7.9    1.05    .30
  Other                      2.6      2.3   -0.75    .43     2.3      2.8    1.19    .24     3.0      2.7   -0.75    .46
Free/Reduced Price Lunch
  (either year) (%)         72.7     78.6    6.17    .00    68.6     75.9    6.60    .00    69.7     74.7    4.57    .00
ELL (Y1) (%)                19.9     24.3    4.49    .00    19.8     26.4    6.29    .00      NA       NA      NA     NA
Special Education (Y1) (%)   5.4      5.6    0.43    .67     4.2      5.2    1.95    .05      NA       NA      NA     NA
Mean age at baseline
  in years (SD)            14.69    14.71           .17    15.63    15.65           .24    13.64    13.61   -2.01    .05
                           (.66)    (.70)                  (.59)    (.63)                  (.57)    (.54)
Mean terms enrolled (SD)    3.06     3.01   -1.81    .07    1.86     1.85   -1.51    .13    1.84     1.82   -1.77    .08
                          (1.18)   (1.17)                  (.44)    (.45)                  (.37)    (.38)
Terms enrolled (%)
  1 term                    16.5     16.7                   16.4     18.3                   16.0     17.6
  2 terms                   17.2     18.1                   83.6     81.7                   84.0     82.4
  3 terms                   10.1     12.5                     NA       NA                     NA       NA
  4 terms                   56.2     52.7                     NA       NA                     NA       NA

Enrollment in Literacy Matters. As described in Chapter II, as part of the ECED treatment, all target students in treatment schools were supposed to be enrolled in a Literacy Matters course. Target students in Grade Cohort 1 at treatment schools on a traditional schedule or an AB block schedule were supposed to be enrolled in one period or block of Literacy Matters during each of the four terms; those in treatment schools on a 4 x 4 block schedule were supposed to be enrolled for one block for one semester each year. As seen in Table 3, only about one-quarter of the 3,999 students in Grade Cohort 1 at a treatment school took the full amount of Literacy Matters prescribed by IRRE. Over 40% of students were enrolled for less than the full four terms and therefore could not have taken the full amount of Literacy Matters prescribed. Another 10% were enrolled in one of the two schools that left the study; those schools did not offer Literacy Matters in Year 2. As seen in the last column of Table 3, among those students who were enrolled for the full four terms of the project, almost half took the full amount of Literacy Matters prescribed by IRRE. As described in more detail later in this report (see p. 109 and 159), two schools that were on a traditional schedule offered Literacy Matters for only one term each year; many of the students who did not take the full amount of Literacy Matters were in one of those two schools. Note that all students are included in the intent-to-treat analysis regardless of whether or not they were enrolled in Literacy Matters, making the comparison very conservative.
Of course, we expect the strongest effects for students who were exposed to the full amount intended by IRRE.

Table 3. Enrollment in Literacy Matters Among Grade Cohort 1 Students in Treatment Schools

                                                          Number   % of All Grade       % of Students Enrolled
                                                                   Cohort 1 Students    All Four Terms in a
                                                                   at Treatment         Treatment School
                                                                   Schools (n = 3,999)  (n = 2,247)
Enrolled in treatment school all four terms and...
  took the amount of Literacy Matters prescribed           1,036    25.9                 46.1
  took some Literacy Matters but less than prescribed        596    14.9                 26.5
  took no Literacy Matters                                    68     1.7                  3.0
  at one of the two schools that left ECED                   408    10.2                 18.2
  unknown; did not receive all course schedules but
  did take at least some ECED Lit                             53     1.3                  2.4
  unknown; did not receive all course schedules, no
  ECED Lit on schedules received                              86     2.2                  3.8
Enrolled at treatment school less than four terms          1,752    43.8                  -

Enrollment in target math. There was no requirement that students in ECED treatment schools be enrolled in particular math courses, but only Algebra 1 and Geometry courses employed the ECED Math strategies, and only teachers of Algebra 1 and Geometry courses received the ECED supports. In most high schools it is standard practice for the majority of 9th- and 10th-graders to be enrolled in Algebra 1 and/or Geometry, so we anticipated that most target students would be enrolled in those courses. As seen in Table 4, about one-third of students in Grade Cohort 1 took the expected pattern of at least one term of Algebra 1 and one term of Geometry. Approximately another 20% took either Algebra 1 or Geometry, but not both. As noted above, a large percentage of students were not enrolled all four terms (44% in treatment and 47% in control), so they could not take the expected pattern of courses. Because this is a school-level intervention, all target students are included in the intent-to-treat analyses regardless of whether they were enrolled in targeted courses. That said, we would anticipate the effects to be strongest for those students who were enrolled in the target courses of Algebra 1 and Geometry.

Table 4. Enrollment in Algebra 1 and Geometry Among Grade Cohort 1 Students in Treatment and Control Schools

                                            Treatment                            Control
                                            Number  % of All     % of 4-Term     Number  % of All     % of 4-Term
                                                    (N = 3,999)  (N = 2,247)             (N = 4,434)  (N = 2,337)
Enrolled all four terms and...
  took one or more Algebra 1 class and
  one or more Geometry class                 1,224   30.6         54.5            1,377   31.1         58.9
  took one or more Algebra 1 class and
  no Geometry class                            466   11.7         20.7              378    8.5         16.2
  took no Algebra 1 class and one or
  more Geometry class                          364    9.1         16.2              372    8.4         15.9
  took neither Algebra 1 nor Geometry           49    1.2          2.2               35    0.8          1.5
  unknown; did not receive all course
  schedules but did take at least some
  Algebra 1 or Geometry                         99    2.5          4.4              139    3.1          5.9
  unknown; did not receive all course
  schedules, no Algebra 1 or Geometry
  on schedules received                         45    1.1          2.0               36    0.8          1.5
Enrolled less than four terms                1,752   43.8          -              2,097   47.3          -

Teacher Questionnaires

Administration schedule and procedures. Teacher questionnaires were administered four times, at the beginning and end of each year the school participated in the study.
At each wave, we attempted to include (1) all teachers who were teaching a target course that wave, and (2) all teachers who had taught a target course during a previous wave and were still employed at the school. Target courses were Algebra 1, Geometry, ECED Literacy, 9th-Grade ELA, and 10th-Grade ELA.

The fall (Wave 1 and Wave 3) teacher surveys were administered on paper. Teachers received their survey either from the individual conducting the EAR Protocol Classroom Visits (see below), or it was placed in their mailboxes at school. Teachers were given stamped, addressed envelopes to return surveys. The teacher's name appeared on a cover letter requesting their participation. Teachers were asked to remove the cover letter prior to returning the survey; the survey itself contained only the teacher's study identification number. Teachers were reminded to complete the surveys on several occasions via email from the research project director or from the ECED research contact within the school.

Spring surveys (Wave 2 and Wave 4) were administered on-line. Teachers received the web address for the survey (i.e., the URL) and their personal access code via both email and notes placed in their school mailboxes. The survey access codes were different from the teacher study identification numbers, and each one was used only one time. Teachers who did not complete the survey were reminded on several occasions via email and prompts from their ECED research contact within the school or from the research project manager.

Response rates. Response rates varied considerably by administration wave and school. Table 5 shows the response rates for each wave, separately by treatment and control schools, as well as the reasons for non-response when known. Considering only teachers who were at the school when the surveys were administered, the response rate for math teachers ranged from 52% at Wave 1 control schools to 74% at Wave 4 treatment and control schools. For ELA teachers, rates ranged from 43% at Wave 1 control schools to 72% at Wave 4 control schools. Response rates were lower when looking at all teachers in the study, rather than just those who were teaching at the school at the time of administration. This was due to turnover: many teachers were not employed at a study school in all waves and therefore could not have participated in the survey.

Table 5. Teacher Survey Response Rates (%)

Math teachers (Treatment n = 116, Control n = 122 at each wave)
                                        Wave 1        Wave 2        Wave 3        Wave 4
                                        Tx    Ctrl    Tx    Ctrl    Tx    Ctrl    Tx    Ctrl
Participated                            50.0  40.2    50.9  48.4    39.7  48.4    55.2  59.0
Reasons for non-participation:
  Not teaching target course            14.7  14.8     6.9  10.7     3.4   4.9     0.0   0.0
  School did not allow                   0.0   0.0    11.2   0.0    11.2   0.0     1.7   0.0
  No longer/not yet at school           24.1  22.1    20.7  18.9    26.7  23.0    25.9  20.5
  Other                                 11.2  23.0    10.3  22.1    19.0  23.8    17.2  20.5
Participated, among those teaching at
the school at time of administration    65.9  51.6    64.1  59.6    54.1  62.8    74.4  74.2

ELA teachers (Treatment n = 166, Control n = 132 at each wave)
                                        Wave 1        Wave 2        Wave 3        Wave 4
                                        Tx    Ctrl    Tx    Ctrl    Tx    Ctrl    Tx    Ctrl
Participated                            34.3  34.1    48.8  53.8    38.0  44.7    44.0  55.3
Reasons for non-participation:
  Not teaching target course            26.5  22.0    11.4  16.7    10.8   6.1     1.8   0.8
  School did not allow                   0.0   0.0     8.4   0.0     4.8   0.0     4.2   0.0
  No longer/not yet at school           30.7  21.2    22.9  15.9    27.7  23.5    29.5  22.7
  Other                                  8.4  22.7     8.4  13.6    18.7  25.8    20.5  21.2
Participated, among those teaching at
the school at time of administration    49.6  43.3    63.3  64.0    52.5  58.4    62.4  71.6

Teacher questionnaire items and scale development. The items on the teacher questionnaires came primarily from IRRE's past research.
Appendix 5 lists all items in the teacher questionnaire, along with the order in which each item appeared on the questionnaires, the response options, the construct each was originally intended to measure, and the waves in which the item was included. Most of these items had been administered to teachers participating in IRRE's First Things First supports for many years and had been revised repeatedly. The items specific to ECED implementation and professional development were written for the current project. Factor analyses using the ECED data resulted in a slightly different set of scales than the ones that had previously been used by IRRE.

Data reduction and scale development were conducted on items included in the teacher survey. As noted above, the Wave 1, Year 1 survey for teachers in schools in the first recruitment group contained a subset of the full set of items administered in subsequent waves. Thus, Wave 2 data were used for the scale development. Due to the small sample size (n = 129) relative to the number of items, exploratory factor analyses (EFAs) were conducted on the full sample of teachers in the first recruitment group using most items. The factor structure was then confirmed using Wave 2 data from teachers in the schools in the second recruitment group. All analyses were conducted in MPlus (Muthén & Muthén, 1998-2009; Version 6.12). Standard errors were adjusted to take into account school cluster sampling.

The initial EFA revealed that not all items loaded on their pre-hypothesized factors. The seven items making up the original construct of 'Perceived Value of Professional Development' loaded cleanly onto one factor (α = .93), and the three 'Perceived Competence' items loaded cleanly onto another (α = .88). These factors were left intact and removed from the exploratory analyses. Based on the initial EFA, we determined that the rest of the teacher items fell into two categories: (1) teachers' ratings of support (teacher collective commitment, support from school administration, support from district administration); and (2) beliefs about change and morale (commitment to change, confidence in change, and teacher morale).

A four-factor solution was chosen for the model that included items in the first category. These factors were: (1) teacher collective commitment, composed of 3 items (α = .82); (2) teacher mutual support, composed of 3 items (α = .85); (3) support from district administration, composed of 3 items (α = .77); and (4) support from school administration, composed of 3 items (α = .83). A three-factor solution was chosen for the model with items in the second category. These were the same as the original constructs: (1) district/school commitment to change, composed of 3 items (α = .87); (2) confidence in change, composed of 4 items (α = .86); and (3) individual teacher morale, composed of 3 items (α = .70). Three items were dropped due to low loadings or cross-loadings. Correlations between the factors ranged from .21, for individual teacher morale with teacher mutual support, to .76, for support from school administration with support from district administration.

Four factors loaded onto a second-order factor with good model fit: (1) support from district administration; (2) support from school administration; (3) commitment to change; and (4) confidence in change.
This factor, which we call Administrative Support for Instructional Innovation and Improvement, measures teachers' beliefs that (1) administrators are responsive to teachers' needs and (2) efforts are being made to improve teaching and learning. The remaining four factors did not fit together well into a second-order factor and will be analyzed separately. The primary impact analyses on the teacher survey were therefore conducted on one second-order factor and four first-order factors. See Table 6 for the final structure and factor loadings for the separate and second-order factors.
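For reference, the internal consistency values (α) reported above follow the standard Cronbach's alpha formula. The short R function below is a generic illustration of that computation, not code from the project; the example column names are hypothetical.

```r
# Cronbach's alpha: alpha = k/(k-1) * (1 - sum(item variances) / var(total)),
# where k is the number of items; rows are respondents, columns are items.
cronbach_alpha <- function(items) {
  items <- na.omit(as.matrix(items))   # listwise deletion, for illustration
  k <- ncol(items)
  item_vars <- apply(items, 2, var)
  total_var <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# e.g., for three hypothetical 'Perceived Competence' columns:
# cronbach_alpha(teacher_survey[, c("competence1", "competence2", "competence3")])
```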
Table 6. Standardized Results from Final CFA from Teacher Questionnaire Items

Values are standardized loadings, with standard errors in parentheses.

Teacher collective commitment
  Teachers at this school do what is necessary to get the job done right. .816 (.032)
  Teachers at this school don't give up when difficulties arise. .710 (.044)
  Teachers at this school go beyond the call of duty to do the best job they can. .824 (.024)
Teacher mutual support
  Teachers in this school encourage each other to do well. .867 (.038)
  Teachers in this school share resources with each other. .707 (.047)
  Teachers in this school go out of their way to help each other. .815 (.035)
Support from district administration
  District administrators are attentive to school personnel and provide the encouragement and support they need for working with students. .797 (.035)
  District administrators allow school staff to try educational innovations that the teachers believe would be helpful for the students. .694 (.050)
  District administrators are responsive to the needs of teachers and staff for professional development. .729 (.032)
Support from school administration
  School administrators understand and respond to the teachers' needs. .830 (.023)
  School administrators support teachers making their own decisions about their students. .753 (.041)
  School administrators help teachers and staff get what they need from the district office. .805 (.023)
Commitment to change
  How committed do you think the School Board is to making changes that will improve instruction and achievement in your district? .797 (.049)
  How committed are the staff in the District Office to improving teaching and learning in the district? .863 (.038)
  How committed is your superintendent to strengthening the quality of instruction within your district? .845 (.031)
Confidence in change
  How confident are you that your school is making changes that will improve the performance of your students? .956 (.012)
  How confident are you that instruction can be improved in your school to ensure that all students experience high quality teaching and learning every day? .628 (.031)
  How confident are you that your school is improving instruction in ways that can be sustained over time? .914 (.024)
  How committed is your principal to supporting changes in the school that will improve the quality of teaching in all classrooms? .672 (.036)
Individual teacher morale
  I look forward to going to work in the morning. .685 (.071)
  When I am teaching, I feel happy. .632 (.075)
  When I am teaching, I feel discouraged. (reversed) .729 (.067)
Perceived value of professional development
  Helped me to increase student engagement in my classes. .798 (.026)
  Helped me to better understand the subjects I teach. .716 (.042)
  Enhanced my classroom management skills. .742 (.038)
  Helped me to challenge and encourage all students to work at or above grade level. .819 (.026)
  Increased my use of effective instructional strategies for improving academic achievement. .873 (.011)
  Increased the extent to which my instruction is aligned with the course standards and curriculum. .695 (.048)
  Are likely to have a lasting impact on my instructional practices. .867 (.020)
Perceived competence
  I am very confident in my abilities as a teacher. .825 (.037)
  I think I am a very skilled teacher. .848 (.029)
  I feel very competent as a teacher. .871 (.027)
Administrative support for instructional innovation and improvement (second-order factor)
  Support from district administration .883 (.037)
  Support from school administration .837 (.046)
  Commitment to change .701 (.064)
  Confidence in change .761 (.056)

Note. Results shown are from Wave 2 (Spring of Year 1) of the first and second recruitment groups; N = 281.

Some of the original constructs were not included in the EFA. Items contributing to the Relative Autonomy Index (RAI; Grolnick & Ryan, 1989) were left out of the exploratory analyses because they form a well-validated scale intended to measure a continuum of motivation and will therefore be analyzed separately. Additionally, the items measuring amount and type of professional development and the items measuring implementation of ECED were not included because they were not intended to form a factor (i.e., they were not expected to be correlated with one another). Rather, they were intended to provide information on variation in experience.

Engagement, Alignment, and Rigor (EAR) Classroom Visit Protocol

The EAR Protocol is a 15-item observational tool completed by trained observers after a 20-minute observation, designed to measure classroom-level Engagement, Alignment, and Rigor. Typically the tool is used by instructional leaders within their own school/district and by IRRE consultants working with schools, as a means of providing feedback to teachers and guiding professional development. Use of this tool by leaders and IRRE consultants for purposes of instructional improvement in the treatment schools was one component of the ECED intervention. Additionally, the research staff used the EAR Protocol in order to obtain independent, objective, systematic information from both treatment and control schools.

The 15 items of the EAR Protocol appear in Table 7. For more details about the tool, its psychometric properties, and scoring, see Early et al. (2013; the article appears in Appendix 2). Table 7 also provides means and standard deviations for the 3,558 EAR observations conducted in the classrooms of target teachers while instructing target classes as part of the ECED Efficacy Trial. This information is provided simply to give a picture of the type of distribution we see across a large number of observations. As described below, for analysis purposes, EAR observations were scored, and the scores were averaged for each teacher at each wave. In addition to scoring the 15 items, observers recorded the number of students present, the learning materials used (e.g., calculators, journals, smart boards), and the learning activities used (e.g., cooperative learning strategies, individual projects, lecture).
Engagement. The EAR Protocol includes two items to assess engagement (labeled E1 and E2 in Table 7): the first item measures the percentage of students who are on-task, and the second measures the percentage of on-task students who are actively and intellectually engaged in the work. This second item is scored using a combination of observations and, when possible, brief conversations with students. The conversations, which take place only if they will not disrupt the class, include questions such as "What does your teacher expect you to learn by doing this work?" and "Why do you think the work you are doing is important?" Using the scoring method presented in Early et al. (2013), the final Engagement score is the mean of the proportion of students on task (E1) and the proportion of students actively engaged in the work (E1 * E2). Thus, the final formula is (E1 + (E1 * E2)) / 2.

Alignment. Observers make eight binary judgments about whether the learning materials, learning activities, expectations for students, and students' class work reflect relevant federal, state, and local standards, as well as designated curricula. Only four of the eight items (A1c, A2c, A3, and A4) are included in the final Alignment score, due to low variance on the others. Again using the scoring method presented in Early et al. (2013), the final Alignment score is the proportion of positive answers on those four indicators.

Rigor. This construct is assessed with five judgments (three binary, two percentages) that relate to both the difficulty of the material presented and the extent to which students are required to demonstrate mastery of the required material. Items concern whether learning materials and instruction are appropriately difficult, whether students are expected to meet or surpass state standards, and whether they have an opportunity to demonstrate proficiency. Four of the five Rigor items (R1, R2, R3, and R4) are used to calculate the final Rigor score (see Early et al., 2013). In order to combine the rigor indicators on a common scale, they are standardized using estimates of population means and standard deviations from 1,551 observations conducted by the IRRE intervention team in 19 high schools in six school districts across the country between 2004 and 2010. After standardization, the four items are averaged together.
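Putting the three scoring rules together, the calculation for a single observation can be sketched as follows. The R code implements the formulas described above; the population means and standard deviations used to standardize the rigor items are placeholders, not the actual values estimated from IRRE's 1,551 reference observations.

```r
# Score one EAR observation. E1 and E2 are proportions (0-1); the alignment
# items and R1-R3 are binary (0/1); R4 is a proportion.
score_ear <- function(obs) {
  # Engagement: mean of proportion on task and proportion actively engaged
  engagement <- (obs$E1 + obs$E1 * obs$E2) / 2
  # Alignment: proportion of positive answers on the four retained items
  alignment <- mean(c(obs$A1c, obs$A2c, obs$A3, obs$A4))
  # Rigor: standardize each retained item using external population
  # estimates (placeholder values below), then average
  pop_mean <- c(R1 = 0.90, R2 = 0.50, R3 = 0.30, R4 = 0.40)  # hypothetical
  pop_sd   <- c(R1 = 0.30, R2 = 0.50, R3 = 0.46, R4 = 0.39)  # hypothetical
  rigor <- mean((c(obs$R1, obs$R2, obs$R3, obs$R4) - pop_mean) / pop_sd)
  c(engagement = engagement, alignment = alignment, rigor = rigor)
}

# Example: 90% of students on task, 70% of those actively engaged
score_ear(list(E1 = .90, E2 = .70, A1c = 1, A2c = 1, A3 = 1, A4 = 0,
               R1 = 1, R2 = 1, R3 = 0, R4 = .50))
```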
Table 7. EAR Classroom Visit Protocol Items (n = 3,558); Mean (SD)

Engagement
  E1  % of students on task: 79.20 (20.37)
  E2  % of students actively engaged in the work requested: 72.09 (22.92)
      Product of E1 * E2: 60.59 (27.01)
Alignment
  A1a The learning materials did (1) / did not (0) reflect content standards guiding this class: .96 (.19)
  A1b The learning materials were (1) / were not (0) aligned with the designated curriculum to teach those standards: .96 (.19)
  A1c The learning materials were (1) / were not (0) aligned with the pacing guide of this course or grade level curriculum: .90 (.30)
  A2a The learning activities did (1) / did not (0) reflect content standards guiding this class: .95 (.22)
  A2b The learning activities were (1) / were not (0) aligned with the designated curriculum to teach those standards: .94 (.23)
  A2c The learning activities were (1) / were not (0) aligned with the scope and sequence of the course according to the course syllabus: .89 (.31)
  A3  The student work expected was (1) / was not (0) aligned with the types of work products expected in state grade level performance standards: .79 (.41)
  A4  Student work did (1) / did not (0) provide exposure to and practice on high stakes assessment methodologies: .42 (.49)
Rigor
  R1  The learning materials did (1) / did not (0) present content at an appropriate difficulty level: .93 (.26)
  R2  The student work expected did (1) / did not (0) allow students to demonstrate proficient or higher levels of learning according to state grade level performance standards: .53 (.50)
  R3  Evaluations/grading of student work did (1) / did not (0) reflect state grade level performance standards: .32 (.47)
  R4  % of students required to demonstrate whether or not they had mastered content being taught: 39.35 (39.20)
  R5  % of students who demonstrated threshold levels of mastery before new content was introduced: 15.64 (25.42)

Classroom Observer Training. Training on the EAR Protocol is typically led by IRRE and consists of (1) two full days of group instruction, including several classroom visits with discussion of scoring; (2) a two- to three-week window during which individuals participating in the training make practice visits as teams to calibrate their scoring; and (3) two additional full days of group instruction focusing on calibration and use of the data for school improvement. Generally, these training sessions are conducted in school districts with all or most of the participants being school district employees, such as assistant principals or academic coaches, who will be using the protocol as part of their district's instructional improvement efforts. See Appendix 6 for details about each data collector's training experiences.

The seven individuals who independently collected EAR Protocol data for ECED were all highly experienced educators. All had been classroom teachers, and most had been school and district administrators (e.g., superintendent, principal, or assistant principal). Several were currently consulting with districts across the nation. None had any type of relationship (past or present, employee or consultant) with the districts in which they collected data. All were blind to the treatment status of the schools in which they collected EAR data.

EAR Classroom Protocol Schedule. At each wave, classroom observers spent two (typically consecutive) weeks in each district conducting EAR Protocol visits. Typically, there were two classroom visitors in each district for both weeks, but in districts with a small number of teachers there was occasionally one classroom visitor one week and two the other week. The research project director and a research associate created schedules for each visitor for each day, indicating which teachers should be visited during each period. The goal was for each observer to conduct 12 observations on each observation day; however, due to teacher absences and other scheduling conflicts, observers often completed 10 or 11 visits in a day.

At each wave, the goal was to observe each teacher who was teaching a target class a minimum of two times. The target classes were Algebra 1, Geometry, ECED Literacy, and 9th- and 10th-grade English/Language Arts. If a teacher taught multiple sections of target classes, the observers attempted to visit each section of his or her target classes at least once. Note that the individuals targeted for EAR visits were slightly different from those targeted for teacher surveys. EAR Protocol visits were only made to target classes, so teachers who were not teaching a target class during a particular term were not visited.
This is because the EAR Protocol is specific to instruction in a particular course, so including non-target classes would not have been meaningful. However, not only teachers who were teaching a target class but also any teacher who had taught a target class during a previous wave was asked to complete the survey. The surveys cover general experiences and attitudes and are not tied to individual courses, so survey responses from teachers who were not currently teaching a target course are meaningful for conveying teachers' attitudes and feelings in the participating schools.

Table 8 shows the mean number of observations completed of each target teacher at each wave. The mean was always between 2 and 3. As seen on the last line of Table 8, when teachers were at the school and teaching a target course, they were almost always observed. However, as noted earlier, the high amount of turnover, both in terms of who was teaching in the schools and who was teaching target courses, led to large amounts of missing data (see the second-to-last line of Table 8).

Table 8. Number of EAR Classroom Visit Protocols Collected

Math teachers (Treatment n = 116, Control n = 122 at each wave)
                                          Wave 1        Wave 2        Wave 3        Wave 4
                                          Tx    Ctrl    Tx    Ctrl    Tx    Ctrl    Tx    Ctrl
Observed (%) among all study teachers     59.5  55.7    56.9  65.6    50.0  56.6    64.7  63.9
Observations per observed teacher
  Mean                                    2.73  2.49    2.67  2.64    2.59  2.58    2.76  2.50
  SD                                       .85   .87     .87   .85     .88  1.13     .94   .79
  Range                                   1-4   1-5     1-5   1-5     1-5   1-7     1-5   1-4
Reasons for no observation (%)
  Not teaching target course              15.5  19.7    11.2  14.8    11.2  19.7     6.9  12.3
  School did not allow observations        0.0   0.0    11.2   0.0    11.2   0.0     1.7   0.0
  No longer/not yet at school             24.1  22.1    20.7  18.9    26.7  23.0    25.9  20.5
  Other                                    0.9   2.5     0.0   0.8     0.9   0.8     0.9   3.3
Observed (%) among teachers teaching
a target course                           98.6  95.8    83.5  98.8    80.6  98.6    96.2  95.1

ELA teachers (Treatment n = 166, Control n = 132 at each wave)
                                          Wave 1        Wave 2        Wave 3        Wave 4
                                          Tx    Ctrl    Tx    Ctrl    Tx    Ctrl    Tx    Ctrl
Observed (%) among all study teachers     41.6  51.5    48.8  64.4    45.2  53.0    51.8  61.4
Observations per observed teacher
  Mean                                    2.48  2.34    2.58  2.36    2.77  2.56    2.58  2.37
  SD                                       .95   .82     .92   .90    1.07  1.13     .96   .93
  Range                                   1-4   1-4     1-5   1-5     1-5   1-6     1-6   1-6
Reasons for no observation (%)
  Not teaching target course              26.5  22.7    12.7  17.4    14.5  22.0    15.7  15.9
  School did not allow observations        0.0   0.0    11.4   0.0    10.8   0.0     1.8   0.0
  No longer/not yet at school             30.7  21.2    22.9  15.9    27.7  23.5    29.5  22.7
  Other                                    1.2   4.5     4.2   2.3     1.8   1.5     1.2   0.0
Observed (%) among teachers teaching
a target course                           97.2  91.9    75.7  96.6    78.1  97.2    94.5  100

Inter-rater agreement. To assess inter-rater agreement, data collectors visited some classrooms in pairs during each wave of data collection. They stayed in the same classroom for 20 minutes and then completed their scoring of that classroom, independently and without discussion. Once each person had finished scoring, they discussed any discrepancies in their scores and settled on a set of consensus scores. They did not change their own scores after the discussion began. Across the four data collection waves, 281 visits were made in which two data collectors were present. The intra-class correlations (one-way random, single measures) across those pairs of original scores (not consensus scores) were Engagement = .73, Alignment = .63, and Rigor = .70. See Table 9 for the intra-class correlations by wave. As would be expected, the intra-class correlations were higher when observers' scores were compared with the consensus scores agreed upon by the pair of observers after discussion. Across all visits, the intra-class correlations (one-way random, single measures) with the consensus scores were Engagement = .91, Alignment = .82, and Rigor = .86.
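The one-way random, single-measures ICC reported here can be computed directly from the one-way ANOVA mean squares: ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where k is the number of raters per visit (two in this design). The R function below is a generic illustration of that computation with hypothetical input names, not the project's analysis code.

```r
# One-way random, single-measures intra-class correlation.
# `visit` identifies the jointly observed classroom visit; `score` is one
# observer's score for that visit (two rows per visit in this design).
icc_oneway_single <- function(visit, score) {
  fit <- aov(score ~ factor(visit))
  ms  <- summary(fit)[[1]][["Mean Sq"]]
  msb <- ms[1]   # between-visit mean square
  msw <- ms[2]   # within-visit (residual) mean square
  k   <- length(score) / length(unique(visit))  # raters per visit (2 here)
  (msb - msw) / (msb + (k - 1) * msw)
}

# e.g., icc_oneway_single(pairs$visit_id, pairs$engagement)
```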
As a reference, Cicchetti (1994) referred to ICCs below .40 as "poor," those between .40 and .59 as "fair," those between .60 and .74 as "good," and those between .75 and 1.00 as "excellent."

Table 9. EAR Intra-Class Correlations (One-Way Random, Single Measures) by Wave

            n     Engagement   Alignment   Rigor
Wave 1      77    .697         .552        .540
Wave 2      61    .671         .707        .694
Wave 3      59    .691         .788        .752
Wave 4      84    .799         .563        .752
Overall    281    .726         .633        .695

Student Questionnaires

Administration schedule and procedures. Student questionnaires were administered four times, at the beginning and end of each year that the school participated in ECED. As noted earlier, students in Grade Cohort 1 (9th-grade in Year 1 and/or 10th-grade in Year 2) participated in all four waves of student questionnaires if they were enrolled during all four waves. The other two grade cohorts were each given surveys two times. Grade Cohort 2 was in 10th-grade in Year 1 and took part in the surveys in Year 1 only (Waves 1 and 2). Grade Cohort 3 was in 9th-grade in Year 2 and took part in the surveys in Year 2 only (Waves 3 and 4). Table 10 indicates which data we attempted to collect for each grade cohort.

Student surveys were typically administered on-line, during the school day. Often surveys were administered during English or Literacy Matters classes, because most students were enrolled in those, but that decision was entirely up to the schools. See Appendix 7 for a more detailed description of the administration procedures.

Table 10. Data Collection Plan

                                 Wave 1    Wave 2    Year 1     Wave 3    Wave 4    Year 2
                                 Surveys   Surveys   Records    Surveys   Surveys   Records
Grade Cohort 1 (9th-grade in
Year 1 and/or 10th in Year 2)    X         X         X          X         X         X
Grade Cohort 2 (10th-grade
in Year 1)                       X         X         X
Grade Cohort 3 (9th-grade
in Year 2)                                                      X         X         X

Response rates. Every effort was made to ensure that all students had a chance to participate in the surveys. Each participating school designated a research contact. The research project director or research associate met with each research contact at the start of each administration to review the methods, distribute the tickets, and strategize about administration. During the administration windows, the research project director or research associate communicated regularly with research contacts, providing lists of students who had not yet been given a chance to take the survey, reminding them of the importance of a high response rate, and troubleshooting as needed. Nonetheless, student survey response rates varied dramatically by administration wave and school. Table 11 shows the response rates for each grade cohort, as well as the reasons for non-response when known. Considering only students who were enrolled at the time the rosters were created (last line of Table 11), the response rate for Grade Cohort 1 ranged from a low of 64% at Wave 2 treatment schools to a high of 83% at Wave 1 treatment schools.
Table 11. Student Survey Response Rates (%)

Grade Cohort 1 (9th-grade in Year 1 and/or 10th-grade in Year 2); Treatment N = 3,999, Control N = 4,434
                                        Wave 1        Wave 2        Wave 3        Wave 4
                                        Tx    Ctrl    Tx    Ctrl    Tx    Ctrl    Tx    Ctrl
Participated                            66.9  60.4    50.2  51.2    54.3  57.1    54.1  54.2
Reasons for non-participation:
  Parent refusal                         1.4   2.5     0.7   1.2     0.7   2.6     0.7   1.4
  Student refusal                        0.5   0.6     2.3   0.1     0.8   0.7     2.4   0.7
  Excluded by school                     1.1   1.0     0.5   0.0     0.6   0.0     0.5   0.2
  Enrolled but not attending             1.8   0.6     0.9   0.2     0.2   0.3     0.6   1.3
  Disenrolled                            1.0   1.1     2.5   1.4     1.0   0.8     1.1   1.1
  School did not administer this wave    0.0   0.0    11.5   0.0     8.4   0.0     0.0   0.0
  Unknown                                7.5  13.4    11.4  21.4     6.4  13.1    13.6  10.3
  Not enrolled this wave                19.8  20.2    21.2  22.5    25.8  26.5    27.3  29.6
Participated, among only those
enrolled when rosters were created      83.4  75.7    63.7  66.0    73.2  77.8    74.4  76.9

Grade Cohort 2 (10th-grade in Year 1); Treatment N = 3,108, Control N = 3,539
                                        Wave 1        Wave 2
                                        Tx    Ctrl    Tx    Ctrl
Participated                            76.2  63.0    60.3  63.2
Reasons for non-participation:
  Parent refusal                         2.0   1.1     1.1   1.5
  Student refusal                        0.4   1.4     1.1   1.4
  Excluded by school                     3.8   0.1     0.1   0.0
  Enrolled but not attending             1.7   0.2     0.3   0.3
  Disenrolled                            1.1   0.8     2.8   2.0
  School did not administer this wave    0.0   0.0    11.1   0.0
  Unknown                                9.3  25.1    13.6  20.2
  Not enrolled this wave                 6.4   8.3    10.2  11.1
Participated, among only those
enrolled when rosters were created      81.4  68.7    67.1  71.1

Grade Cohort 3 (9th-grade in Year 2); Treatment N = 3,408, Control N = 3,643
                                        Wave 3        Wave 4
                                        Tx    Ctrl    Tx    Ctrl
Participated                            69.0  75.9    62.1  73.8
Reasons for non-participation:
  Parent refusal                         1.3   2.3     0.9   0.9
  Student refusal                        0.4   1.4     1.0   1.3
  Excluded by school                     0.4   0.1     0.5   0.2
  Enrolled but not attending             0.5   0.2     0.3   1.0
  Disenrolled                            2.3   0.5     1.3   1.1
  School did not administer this wave   12.9   0.0    13.3   0.0
  Unknown                                6.7  13.4    11.1  11.5
  Not enrolled this wave                 6.4   7.5     9.6  10.0
Participated, among only those
enrolled when rosters were created      73.7  82.1    68.7  82.0

Student questionnaire items and scale development. The items on the student questionnaire came primarily from IRRE's past research. They had been administered to students participating in IRRE's First Things First supports for many years and had been revised repeatedly. See Appendix 8 for a complete list of the items and the original construct each was intended to measure.

Although the items in the student survey have been used extensively by IRRE, a complete factor analysis using past data was not available. For that reason, we conducted our own data reduction and scale development using factor analysis on items included in the student survey. All analyses were conducted in MPlus (Muthén & Muthén, 1998-2009; Version 6.12). Exploratory factor analysis (EFA) was first conducted in a randomly selected half of the students in the schools in the first recruitment group at Wave 1 (Fall of 2009). The factor structure was then confirmed on the second random half of the students in schools in the first recruitment group. To further support the results, the same analyses were conducted using the students in the schools in the second recruitment group. Standard errors were adjusted to take into account school cluster sampling.

Through an initial EFA, we determined that the student survey items fell into two categories: (1) students' ratings of their teachers (teacher expectations, teacher rigor, teacher support); and (2) students' ratings of themselves (engagement in school, perceived competence). Exploratory analyses were conducted on each set of items separately. A two-factor solution was chosen for each model. The two factors that emerged from the items measuring students' ratings of their teachers were: (1) positive teacher support/expectations, composed of 9 items (α = .76); and (2) lack of teacher support, composed of 5 items (α = .76). The two factors that emerged from the items measuring students' ratings of themselves were: (1) engagement in school, composed of 6 items (α = .61); and (2) perceived competence, composed of 6 items (α = .80). Five items were dropped due to low loadings or cross-loadings.
Correlations between the factors ranged from .51, for perceived competence with lack of teacher support, to .77, for perceived competence with engagement in school. These four factors were then combined and tested as a second-order factor. Results showed an adequately fitting second-order factor, representing students' attitudes toward school. The goal of obtaining a second-order factor was to avoid a multiple comparisons problem when examining impacts. The primary impact analyses on student survey items were therefore conducted on this second-order factor. Follow-up sensitivity analyses were also conducted on the separate factors. See Table 12 for the final factor structure and loadings for the separate factors and second-order factor.

Table 12. Standardized Results from Final CFA from Student Questionnaire Items

Values are standardized loadings, with standard errors in parentheses.

Positive teacher support/expectations
  My teachers show us examples of the kinds of work that can earn us good grades. .650 (.012)
  My teachers make it clear what kind of work is expected from students to get a good grade. .732 (.009)
  My teachers expect all students to do their best work all the time. .661 (.011)
  My teachers expect all students to come to class prepared. .579 (.015)
  I am asked to fully explain my answers to my teachers' questions. .342 (.012)
  Our classroom assignments and homework make me think hard about what I'm learning. .372 (.012)
  My teachers make sure I understand before we move on to the next topic. .620 (.008)
  My teachers care about how I do in school. .735 (.012)
  My teachers like to be with me. .517 (.010)
Lack of teacher support
  My teachers like the other students in my classes better than me. (R) .672 (.009)
  My teachers interrupt me when I have something to say. (R) .692 (.008)
  My teachers don't make clear what they expect of me in school. (R) .636 (.007)
  My teachers are not fair with me. (R) .818 (.007)
  My teachers' expectations for me are not realistic. (R) .542 (.009)
Engagement in school
  It is important to me to do the best I can in school. .726 (.011)
  I work very hard on my schoolwork. .749 (.011)
  I don't try very hard in school. (R) .631 (.012)
  I pay attention in class. .724 (.009)
  I often come to class unprepared. (R) .411 (.011)
  A lot of the time I am bored in class. (R) .399 (.035)
Perceived competence
  When I'm doing a class assignment or homework, I understand why I'm doing it. .606 (.008)
  I feel confident in my ability to learn at school. .751 (.006)
  I am capable of learning the material we are being taught at school. .622 (.011)
  I feel able to do my schoolwork. .726 (.009)
  I feel good about how well I do at school. .708 (.011)
  I know what kind of work it takes to get an A in my classes. .563 (.010)
Second-order factor (students' attitudes toward school)
  Positive teacher support/expectations .819 (.007)
  Lack of teacher support .683 (.018)
  Engagement in school .818 (.011)
  Perceived competence .867 (.008)

Note. Results from Wave 1 from both recruitment groups and all grade cohorts; n = 9,983.

Student Demographics

As a condition of participation, each district agreed to provide students' school records to the research project. These records were provided attached to student IDs. Student IDs were transformed into study IDs by a consultant (i.e., not a member of the research team), so they could be linked to student questionnaire responses and other data. Records included demographic characteristics, 7th- through 10th-grade standardized test scores, course schedules, attendance, credits toward graduation, and grade point averages.
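The ID transformation can be pictured as a crosswalk, held only by the consultant, that maps each district student ID to an arbitrary study ID. The R sketch below is purely illustrative of the linkage logic; the column names are hypothetical.

```r
# Build a crosswalk from district student IDs to arbitrary study IDs.
make_crosswalk <- function(student_ids) {
  data.frame(student_id = student_ids,
             study_id   = sample(seq_along(student_ids)))  # random assignment
}

# Attach study IDs to a records file and drop the identifiable ID, so the
# research team can link files without ever seeing district student IDs.
# crosswalk <- make_crosswalk(unique(records$student_id))
# records   <- merge(records, crosswalk, by = "student_id")
# records$student_id <- NULL
```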
Additional details about each type of record are provided in the next sections. See Chapter VII for a discussion of the challenges encountered in obtaining the school records.

For each study student for each year of participation, we requested the following data: grade in school, gender, ethnicity/race (including Hispanic origin), date of birth, free or reduced price lunch status, English language learner status, and special education status. Because different districts maintain these types of records in different formats, some response categories had to be collapsed to make the variables comparable across districts. As noted above, we obtained demographic information from the districts for almost all study students. When the information was missing from the district files for a student who had completed a survey (~1.3% of cases), we used his or her gender and race/ethnicity as reported on the student survey. After filling in data from the student surveys, race/ethnicity was missing for 2.8% of cases, and gender was missing for 2.5% of cases. We did not use student reports of special education, English fluency, or free/reduced price lunch because the questions asked did not map directly onto the district data received. Thus, demographic variables other than race/ethnicity and gender were missing for roughly 4.1% of cases.

Course Enrollment

For each of the four terms that a district participated, we requested each study student's course schedule, including name of course, teacher ID, period/block, and final grade. Districts provided relatively complete records for students who were enrolled for the entire school year. On the other hand, course file information was often missing for students who had only been enrolled for part of the year. Additionally, because this information was often housed in multiple databases within the district (e.g., often the final grades and the period information were not in the same database), districts often sent multiple data files. Information within the files was often contradictory, requiring extensive cleaning in order to create a single, cohesive data set. See Appendix 9 for details on how the course files were restructured for analysis.

Standardized Test Scores

The five participating districts were in four different states. This meant that there were four different sets of standardized tests administered to the students in this study. We requested that the districts send all available 7th-, 8th-, 9th-, and 10th-grade math and ELA scale scores for all study students. Below is a description of each state's tests, followed by a description of how they were combined to make the scores comparable across tests and districts. Appendix 10 shows the types of tests for which we received scores in each district, as well as the number and percentage of scores received for students in Grade Cohort 1.

State testing systems. Districts 1 and 2 (California). Two types of standardized test scores were requested and received from Districts 1 and 2: the California Standards Test (CST) and the California High School Exit Exam (CAHSEE). The CST is administered in the spring of each year in grades 2 through 11, including math and ELA tests each year. In math in 7th- and 8th-grade, students either took a grade-level test (e.g., 7th-grade math) or a subject-specific test (e.g., Algebra 1), depending on their course enrollment.
In 9th- and 10th-grade, they took either 'General Math' or a subject-specific test such as Algebra 1, again depending on their course enrollment. In English, they took a grade-specific test each year.

District 3 (Tennessee). Students in District 3 took the Tennessee Comprehensive Assessment Program (TCAP) Achievement Tests in the spring of each year from 3rd- through 8th-grade. Starting in 9th-grade, students took TCAP End of Course exams in various subjects, depending on course enrollments. There were separate tests for 9th-, 10th-, and 11th-grade English, as well as for Algebra 1 and 2. The state of Tennessee did not administer a standardized Geometry test during the time District 3 was participating in ECED. However, District 3 did administer its own geometry test, created by the district rather than the state, to all students enrolled in Geometry. ECED requested and received those scores.

District 4 (Arizona). Students in District 4 took the Arizona Instrument to Measure Standards (AIMS) exam in grades 3 through 8. Typically it includes reading, writing, and math; however, the 8th-grade writing test was suspended after 2009. Starting in the spring of grade 10, students took the AIMS Exit Exam in reading, writing, and math. Students who did not pass the first administration of the AIMS Exit Exam continued to take it each semester until they passed; however, we requested and received only scores from the first time the student took the test. Arizona does not have a state-wide standardized test in 9th-grade, but District 4 9th-grade students took the Stanford 10, a nationally standardized test, in math, language, and reading. ECED requested and received 7th- and 8th-grade AIMS test scores, 9th-grade Stanford 10 scores, and scores from the first administration of the AIMS Exit Exam (10th-grade). However, District 4 was a 'high school only' district, meaning that it oversaw the local high schools only and students entered the high schools from five separate elementary feeder districts. District 4 did not routinely receive 7th- and 8th-grade scores from these feeder districts and had to request them especially for ECED. Some feeder districts were reluctant, resulting in significant missing data, although we did receive at least some 7th- or 8th-grade data from each of the five feeder districts.

District 5 (New York). Students in District 5 took part in the New York State Testing Program (NYSTP) in grades 3 through 8 in ELA and math. Starting in 9th-grade, students had to pass a certain number of Regents Exams in order to be eligible for graduation. The exact Regents Exams taken each year depended on the student's course schedule. Most students took a math exam in 9th- and/or 10th-grade; however, very few students took an ELA exam in those grades. Because ECED intended to use ELA achievement on standardized tests in the 10th-grade as a primary outcome, we arranged for 10th-grade students in District 5 to take the Gates-MacGinitie Reading Test (GMRT) at the end of Years 1 and 2. The GMRT is a group-administered, paper-and-pencil test designed to assess student achievement in reading at all levels (kindergarten through adult). It includes separate tests of comprehension and vocabulary, which can be combined into an overall reading score. ECED paid the costs of purchasing and scoring these exams.
Unfortunately, despite a prior written commitment to administer this test to all 10th-graders each year, the actual administration/response rate was low (i.e., 27% of Grade Cohort 1 students took the GMRT in Year 2).

Combining test scores. As is clear from the description of each state's testing system, there was wide variation in the timing and content of the tests across states, but our analyses required that they be combined into comparable scores indicating student performance, relative to peers, in each subject. To that end, for each student in Grade Cohort 1, we sought to calculate six test scores: math at baseline, math at Year 1, math at Year 2, ELA at baseline, ELA at Year 1, and ELA at Year 2. We did not intend to calculate Year 2 test scores for Grade Cohort 2 or Year 1 test scores for Grade Cohort 3, because those students were not in the study at those times. The general rules used to create common test scores across districts are outlined below. Following the general rules is a description of the rationale for these rules and descriptions of district-specific decisions that were made in order to apply the general rules.

The general rules applied for combining tests onto a common scale were: (1) standardize each test within test name and district, but across administration years and grade cohorts; (2) for math, when students had more than one score at baseline, Year 1, or Year 2, use the lowest-level test (e.g., if a student had both Algebra 1 and Geometry achievement scores, use the Algebra 1 score); (3) for Year 1 and Year 2 ELA, when a student had more than one test in a single year, use the one that corresponded to his or her grade level (e.g., use the 10th-grade test for a 10th-grader who took both the 9th- and 10th-grade tests); if both were at the same level (e.g., 9th-grade reading and 9th-grade language), use the average of the two; and (4) for baseline ELA, when students had more than one test, take the average of all tests (e.g., the mean of 7th- and 8th-grade). Additionally, for math, we created three indicator variables (baseline, Year 1, and Year 2) to indicate the subject and level of the test used for that student's math score (e.g., 7th-grade math, Algebra 1, Geometry, Algebra 2).

The decision about how to treat math scores for Year 1 and Year 2 stemmed from three competing concerns. First, we did not want to discard data if at all possible, because that would result in imputing scores in cases where the district had provided data; we believed that the actual score on a standardized test would be better information than an imputed score. Second, we were concerned that math course taking, and therefore math test taking, might be influenced by the intervention itself. Third, we were concerned that combining different-level math tests would introduce error, because the same student might have received a higher score if he or she had been given a lower-level test. The first concern was addressed by keeping at least one score per student, even if that student took a test taken by few other students. The second concern was addressed by reviewing test taking in each district and finding no clear pattern of test taking across treatment and control schools: in some districts students took higher-level tests in treatment schools, in some districts students took higher-level tests in control schools, and some districts showed no difference. Further, analyses of math achievement will control for the level of the test (e.g., Algebra 1, Geometry)
to account for the fact that different districts administered different tests and that test level often depended on students' course schedules. We were only able to partially address the third concern. We did this by selecting the lowest-level math score for students who took more than one test in a given year. However, across students we still combined multiple levels of tests (standardized within test) into the same variable; there was no way to fully address that concern.

The approach for combining math baseline scores, and its rationale, were slightly different from the approach and rationale for Years 1 and 2. At baseline, we often had a score for the same student in both 7th- and 8th-grade. In those cases, we used the 7th-grade score for baseline. This was to minimize the range of tests included. In all districts, 7th-graders take a 7th-grade math test, but in many districts the math test a student took in 8th-grade depended on his or her course taking, with relatively large groups of students taking the Algebra 1 test in 8th-grade. In general, we would expect that those are the district's more advanced 8th-graders, but they are taking a harder test than those taking the 8th-grade math test, possibly resulting in lower scores despite being advanced students. In order to minimize the number of different tests being combined into a single score, we opted to take the lowest math test at baseline, meaning that the baseline score often comes from 7th-grade (more than one year before the start of the study). When we did not have a 7th-grade test score, we used the 8th-grade score. Thus, multiple tests were sometimes still being combined, but this strategy minimized the concern.

Most students had only one ELA test in Year 1 and one ELA test in Year 2, so we used that one. A few students, however, had more than one. When the two tests were at different levels (e.g., 9th- and 10th-grade), we used the one that matched the student's grade, to avoid combining tests within grade cohort. When the two tests were at the same level (e.g., 9th-grade reading and 9th-grade language), we took the mean of the two scores, to represent the broadest conceptualization of ELA and minimize error. At baseline, when students had more than one ELA score (7th- and 8th-grade), we elected to average the two together, after standardization, to minimize error. Unlike math, in 8th-grade most students took the same ELA tests, and the level of the test was not linked to the student's course taking or past ELA achievement. Additionally, the nature of ELA tests is somewhat different from math: in ELA, each year builds systematically on the previous year, with no qualitative shift in content, making averaging across years appropriate. For that reason, the average of the two scores seemed likely to result in the least error. In math, on the other hand, averaging different tests together seemed more problematic, because quite abrupt changes in content could be found across courses and tests.

After establishing these general rules for combining scores, we were still faced with several decisions resulting from the unique testing system in each state. Appendix 11 details the state-specific decisions made to address each state's unique issues.
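Rules (1) and (2) lend themselves to a compact implementation. The R sketch below illustrates standardizing within test name and district but across years and cohorts, and then keeping the lowest-level math test for students with multiple scores; the data frame, column names, and the test-level ordering are hypothetical.

```r
# Rule 1: z-score each scale score within district-by-test-name group,
# pooling administration years and grade cohorts.
standardize_within <- function(scores) {
  grp <- interaction(scores$district, scores$test_name, drop = TRUE)
  scores$z <- ave(scores$scale_score, grp,
                  FUN = function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE))
  scores
}

# Rule 2: when a student has several math scores in one year, keep the
# lowest-level test (the ordering below is illustrative).
level_order <- c("7th-grade math", "8th-grade math", "General Math",
                 "Algebra 1", "Geometry", "Algebra 2")
pick_lowest <- function(test_name, z) {
  i <- which.min(match(test_name, level_order))
  data.frame(test_used = test_name[i], z = z[i])
}
```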
Student Performance (Attendance, Credits, GPA)

Attendance. We requested the number of days enrolled and the number of days present for each student for each year the district participated; however, District 4 was not able to provide attendance data at the student level, so attendance data for those students were imputed.

Credits earned. In addition to passing certain required courses and exam(s), students must earn a certain number of credits in order to graduate from high school. The number needed varies by district. Additionally, each district specifies a certain number of credits a student must earn each year to be considered on track for graduation. Thus, the proportion of credits earned out of the number needed to be on track for graduation can be used as an indicator of progress toward graduation. We requested that each district provide the number of credits each student had earned by the end of each year of participation. Four of the five districts provided the needed information for both years. District 2 was only able to provide that information for Year 2. Rather than imputing data for an entire district for Year 1, we analyzed Year 2 credits data only.

Grade point average (GPA). We requested that each district provide each student's total, unweighted grade point average at the end of each year of participation. This value includes all courses that the student took, rather than just the English and math courses included in the course file. Four of the five districts provided the needed information for both years. District 4 was only able to provide that information for Year 2. As with credits, rather than imputing data for an entire district for Year 1, we analyzed Year 2 GPA data only.

Missing Data

Amount of missing data. Table 13 through Table 15 show the percentage of students and teachers (math and ELA separately) for whom we obtained valid (non-missing) data on some of the key variables. The student table includes students who were in Grade Cohort 1. As seen in Table 13, we obtained key demographic variables, such as ethnicity and free/reduced price lunch, for almost all students. The percentages of valid data for surveys and test scores were much lower and varied considerably by district. Among teachers, there is a relatively high amount of missing data of all types. Note that values on all three tables are based on 'intent to treat.' That is, they reflect all students or teachers who were enrolled/employed in a study school at any point in the two years their school participated. At each wave, some of the missing data results from the fact that some students were not enrolled and some teachers were not employed (i.e., they had left the school or had not yet enrolled/begun employment) and therefore were not available to provide data. Those data are counted as missing at those time-points. Among students, 46% enrolled after the first wave of data collection, left prior to the last wave of data collection, or both. Among math teachers, 41% were not employed at the school during one or more terms of the study. Among ELA teachers, 46% were not employed during one or more terms.
Table 13. Percentage of Grade Cohort 1 Students for Whom We Have Valid Data on Key Variables

            Total     Ethni-  Free/Red.  Wave 1  Wave 4  ELA Score           Math Score          Year 2  Year 2   Year 2
            Students  city    Lunch      Survey  Survey  Base  Yr 1  Yr 2    Base  Yr 1  Yr 2    GPA     Credits  Attend.
District 1  2,628     99      98         63      59      82    82    75      82    78    71      71      71       81
District 2  1,590     93      95         68      55      77    76    76      77    76    73      85      80       72
District 3  1,264     89      86         60      43      72    67    58      72    57    61      90      91       63
District 4  2,222     98      98         68      60      60    69    73      60    69    71      67      47        0
District 5    729     99      94         49      38      70     0    32      71    54    49      85      85       86
Treatment   3,999     97      96         67      54      75    68    68      75    69    69      77      75       56
Control     4,434     96      94         60      54      71    68    68      70    70    67      77      66       55
Overall     8,433     96      95         63      54      73    68    68      73    70    68      77      70       56

Table 14. Percentage of Math Teachers for Whom We Have Valid Data on Key Variables

            Total     Ethnicity  Wave 1  Wave 4  Wave 1     Wave 4
            Teachers             Survey  Survey  EAR Visit  EAR Visit
District 1  61        72         36      69      49         74
District 2  51        71         39      57      67         65
District 3  38        100        53      71      58         63
District 4  51        98         55      47      63         67
District 5  37        60         46      38      51         46
Treatment   116       85         50      55      60         65
Control     122       75         40      59      56         64
Overall     238       80         45      57      58         64

Table 15. Percentage of ELA Teachers for Whom We Have Valid Data on Key Variables

            Total     Ethnicity  Wave 1  Wave 4  Wave 1     Wave 4
            Teachers             Survey  Survey  EAR Visit  EAR Visit
District 1  83        68         22      52      35         49
District 2  53        74         43      62      72         62
District 3  45        98         53      62      56         64
District 4  73        99         34      43      41         63
District 5  44        46         27      25      34         41
Treatment   166       81         34      44      42         52
Control     132       75         34      55      52         61
Overall     298       78         34      49      46         56

Multiple imputation. We used multiple imputation (Rubin, 1987) to handle missing values; this approach generates multiple completed datasets and propagates the uncertainty induced by the missing data. For student data, we used a combination of a latent class approach (Si & Reiter, 2013) for the categorical data and the R package "mi" for the continuous data to generate the multiply imputed datasets. For teacher data, we used the latent class approach exclusively, treating all variables as categorical. The following steps were taken to generate five multiply imputed datasets for both students and teachers.

For student data, we implemented a two-step imputation procedure because there was a mix of categorical and continuous variables. We first imputed the student demographic and survey data using the latent class approach proposed by Si and Reiter (2013), which can flexibly and efficiently deal with a large number of categorical variables with complex dependency structures. This approach assumes that the students are divided into several latent classes; within each class, the variables are treated as mutually independent, each with its own class-specific distribution. The number of classes and the class assignments are determined by the data. The missing values are then imputed within each class. As informative background variables, we included school district, school code, and treatment status. We also included the following student demographic variables: grade, gender, ethnicity (5 categories), age at baseline (in years), English language learner status, special education status, and free or reduced price lunch status. Keeping all survey items in their original nominal scales, we used the latent class approach to impute the missing values of all the categorical variables using Markov chain Monte Carlo computation for 5,000 iterations. Five completed datasets were generated.
Conditional on the imputed demographic information from one randomly chosen dataset of the five generated above, we then used the R package "mi" (developed by Andrew Gelman's research group at Columbia University) to impute the continuous variables, which included students' test scores, attendance, GPA, and credits earned. "Mi" uses a chained-equations approach: the user specifies the conditional distribution of each variable with missing values conditioned on the other variables in the data, and the imputation algorithm sequentially iterates through the variables, imputing the missing values using the specified models. The imputed demographic variables, treated as unordered categorical, served as covariates in these iterative regression models. We also included an interaction term between state and treatment status as a covariate.

We first imputed the baseline math and ELA test scores, as well as the indicator for type of math test taken. We ran "mi" for five chains with 50 iterations each and obtained five completed datasets that included the students' baseline demographic characteristics and baseline test scores. We then sequentially imputed Year 1 and Year 2 outcomes conditional on each completed baseline dataset. For the Year 1 outcome variables, we imputed standardized math test scores as well as the indicator for type of math test taken, standardized English test scores, standardized proportion of days present, standardized GPA, and credits earned divided by credits needed to be on track for graduation. We treated the standardized variables as continuous and the proportion-of-credits variable as non-negative continuous. Based on each baseline dataset, we ran "mi" with 50 iterations for one chain and obtained one completed dataset that included Year 1 outcomes and baseline data. Next, conditional on each completed Year 1 dataset, we imputed the Year 2 outcome variables, defined in the same way as in Year 1, again running "mi" for 50 iterations with one chain. Finally, we obtained five completed datasets that included students' demographic characteristics, baseline test scores, Year 1 and Year 2 test scores, Year 1 and Year 2 standardized attendance, Year 1 and Year 2 standardized GPA, Year 1 and Year 2 credits earned, and the math test type indicators for all three time points (baseline, Year 1, and Year 2).

For the imputation of teachers' demographic and survey data, we assumed the variables were nominal and imputed the data using the latent class approach. The imputation procedure followed that used for the student demographic and survey data.
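The chained-equations step described above can be sketched with scikit-learn's IterativeImputer, an implementation in the same spirit as "mi" (a stand-in, not the software the project used). With sample_posterior=True and different seeds, each run draws imputed values from a posterior predictive distribution, so repeated runs yield distinct completed datasets, mirroring the five completed datasets described above. The variables are synthetic.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(1)
n = 400

# Synthetic stand-ins for a baseline score, a Year 1 score, and Year 1 GPA.
baseline = rng.normal(size=n)
year1 = 0.7 * baseline + rng.normal(scale=0.6, size=n)
gpa1 = 0.4 * baseline + rng.normal(scale=0.8, size=n)
data = np.column_stack([baseline, year1, gpa1])
data[rng.random(data.shape) < 0.25] = np.nan   # inject missingness

# Five completed datasets: each imputer cycles regression models over the
# variables (50 iterations, echoing the report's "mi" runs) and samples
# imputed values from the posterior predictive distribution.
completed = [
    IterativeImputer(estimator=BayesianRidge(), sample_posterior=True,
                     max_iter=50, random_state=m).fit_transform(data)
    for m in range(5)
]
```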
Suitability of the Data for Analyses

The primary purpose of the ECED Efficacy Trial was to estimate the impacts of ECED on student attitudes toward school, scores on standardized tests of achievement, and performance indicators such as attendance and progress toward graduation, as well as on teacher attitudes and experiences and their observed classroom practices. The data collection described here was successful in providing data that allow for unbiased comparisons between treatment and control schools. In all cases the two conditions were treated comparably, and where baseline differences between conditions were identified, they were controlled in the analyses. Thus, we are confident that the data serve the intended purpose and are appropriate for treatment-versus-control, point-in-time analyses of all outcomes of interest.

Further, because most data were collected with identical measures across the two years of the study, most of the data are also appropriate for longitudinal growth curve analyses. The one exception is the student test scores. Students in the four different states took different tests, and there was between-state variation in the goals of the tests (e.g., end-of-course versus high school exit exams). Further, students took tests on different topics (e.g., Algebra 1 versus Geometry) across the baseline, 9th-, and 10th-grade years. In order to combine across the varied testing systems and time points, we had to standardize within test. Thus, we cannot be certain that the scores are directly comparable across time, so the test scores are not appropriate for growth curve analyses. In Chapter VI, where student impact results are reported, point-in-time analyses are presented for achievement scores, but growth curve analyses are not.

Additionally, it should be noted that, although there is no reason to believe these data are biased, they do contain substantial measurement error. Throughout this report we describe various difficulties encountered in the structural implementation of ECED (e.g., "dosage" issues due to teacher and student movement, and scheduling of targeted classes) and in data collection, including major issues such as two schools ceasing participation, high turnover at the administrative and teacher levels, between-state differences in testing systems, and district-maintained data systems that were difficult to navigate. In all cases, our solutions have been conservative and erred in the direction of minimizing bias, at times at the expense of adding measurement error. This approach lowers our chances of finding ECED impacts, but increases our confidence in any impacts we do see.

Analysis Plan

The primary question this study sought to answer was: Is the ECED intervention efficacious in changing students' math and ELA achievement, school performance and commitment, and attitudes toward school? Additionally, because teachers were key to the success of students, we sought to learn whether the ECED intervention was efficacious in improving instruction and teachers' experience of support, competence, and engagement.

As a first step in devising an analysis plan, we considered differences between the treatment and control groups on baseline characteristics, using the imputed data. As seen in Table 16, the randomization did not lead to entirely equivalent groups. For that reason, covariates were added to all models to account for pre-existing differences.
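Because the comparisons in Table 16 are computed on multiply imputed data, any estimate and its standard error must be pooled across the five completed datasets before a t statistic is formed. A minimal sketch of Rubin's (1987) combining rules, with illustrative numbers:

```python
import numpy as np

def pool(estimates, variances):
    """Combine one scalar estimate across m imputed datasets (Rubin, 1987)."""
    q = np.asarray(estimates)          # per-dataset point estimates
    u = np.asarray(variances)          # per-dataset squared standard errors
    m = len(q)
    q_bar = q.mean()                   # pooled point estimate
    u_bar = u.mean()                   # within-imputation variance
    b = q.var(ddof=1)                  # between-imputation variance
    t_var = u_bar + (1 + 1 / m) * b    # total variance
    df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2  # Rubin's df
    return q_bar, np.sqrt(t_var), df

# e.g., a treatment-control mean difference estimated in each of 5 datasets
est, se, df = pool([0.11, 0.09, 0.12, 0.10, 0.08],
                   [0.0025, 0.0024, 0.0026, 0.0025, 0.0024])
print(f"t = {est / se:.2f} on roughly {df:.0f} df")
```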
Table 16. Treatment versus Control Baseline Characteristics Using Imputed Data

                                                    Treatment     Control       Total
Baseline characteristic                             (n = 3,999)   (n = 4,434)   (N = 8,433)
Students
  Gender (%)
    Girls                                           46.8          47.9          47.4
    Boys                                            53.2          52.1          52.6
  Race/ethnicity (%)
    Hispanic a                                      49.6          52.0          50.9
    Non-Hispanic Black a                            22.5          25.8          24.2
    Non-Hispanic White b                            17.2          11.8          14.3
    Asian/Pacific Islander                           8.2           8.0           8.1
    American Indian/Multi-Racial                     2.6           2.4           2.5
  Free/reduced price lunch (%) a                    80.3          85.4          83.0
  Special education (%)                              5.5           5.6           5.6
  ELL services (%) a                                20.0          24.0          22.3
  Mean age in years (SD)                            14.70 (.75)   14.70 (.78)   14.70 (.76)
  Mean attitudes toward school (SD)                 3.18 (.37)    3.17 (.35)    3.17 (.36)
  Mean math achievement (SD) a                      -0.15 (.96)   -0.07 (1.0)   -0.10 (.98)
  Mean ELA achievement (SD) b                        0.04 (.97)   -0.09 (.95)   -0.03 (.96)
Teachers
  Mean years of teaching (SD)                       4.68 (1.46)   4.81 (1.38)   4.74 (1.42)
  Mean teacher mutual support (SD)                  3.31 (.60)    3.32 (.58)    3.31 (.59)
  Mean teacher collective commitment (SD)           3.37 (.58)    3.36 (.57)    3.37 (.58)
  Mean support from school administration (SD)      3.06 (.62)    3.03 (.61)    3.05 (.62)
  Mean support from district administration (SD)    2.78 (.68)    2.77 (.68)    2.77 (.67)

Note. a Baseline covariate significantly higher for control than treatment group: Hispanic, t(8341) = 2.22, p < .05; Black, t(8405) = 3.64, p < .001; free/reduced lunch, t(8217) = 5.67, p < .001; ELL, t(8415) = 4.06, p < .001; math achievement, t(8403) = 3.88, p < .001. b Baseline covariate significantly higher for treatment than control group: White, t(7900) = -7.10, p < .01; ELA achievement, t(8304) = -6.08, p < .001.

Teacher impact analyses and variation in implementation analyses. For the teacher analyses, we used an intent-to-treat approach including all target teachers, regardless of how many study terms they taught in a participating school, how many study terms they taught a target class, or the extent to which their school successfully implemented ECED. Math and ELA teachers were always analyzed separately, both because the intervention and supports are quite different for math and ELA teachers and because the ELA teachers are not truly comparable in the treatment versus control schools. That is, the analyses of data from ELA teachers are not experimental, because we cannot know which teachers in control schools would have taught the ECED Literacy course had their school been selected to participate in the intervention. Further, as described in the section defining the ELA study teachers, in order to make the groups as similar as possible, all 9th- and 10th-grade English teachers were included from both experimental and control schools, even though the 9th- and 10th-grade English teachers at treatment schools who were not also teaching ECED Literacy had no exposure to the ECED supports.

To test the effects of ECED on teacher outcomes at the end of Year 1, at the end of Year 2, and across the two years, we estimated a series of 2-level hierarchical linear models and longitudinal growth curve models accounting for the nesting of teachers within schools. Outcomes included teacher responses to questionnaires and engagement, alignment, and rigor (as measured by the EAR Protocol). For the teacher questionnaire analyses, we first considered the second order factor measuring perceived support for instructional innovation and improvement.
That analysis was followed by analyses testing each of the separate teacher questionnaire scales: (1) teacher collective commitment, (2) teacher mutual support, (3) support from district administration, (4) support from school administration, (5) commitment to change, (6) confidence in change, (7) individual teacher morale, (8) professional development, (9) perceived competence, and (10) the relative autonomy index. The primary predictor of interest was experimental condition (treatment versus control). Control variables included district, baseline score, gender, race/ethnicity, and years of teaching experience. The models testing engagement, alignment, and rigor also controlled for the number of times the teacher was observed, because more observations may increase the reliability of the scores.

We followed the intent-to-treat analyses with analyses in which the overall variation in implementation variable replaced the treatment condition variable. The intent-to-treat analyses are the most stringent way to answer the impact question. However, there was large variation in implementation, including two treatment schools that stopped participating before the intervention was complete, making it important to understand the association between implementation and student and teacher outcomes. Thus, following the intent-to-treat analyses, a parallel set of analyses was conducted in which the experimental condition was replaced by the overall indicator of implementation. That indicator is described in detail in Chapter IV. We selected the overall value, rather than one of the six values specific to subject and year, because the six were highly intercorrelated (Cronbach's α = .98). The results from all teacher analyses are reported in Chapter V.

Student impact analyses and variation in implementation analyses. As with the teacher analyses, for the student analyses we used an intent-to-treat approach in which all Grade Cohort 1 students in all 20 schools were included, regardless of how many study terms they were enrolled in a participating school, the extent to which their school successfully implemented ECED, or whether they were enrolled in the courses targeted by ECED. We focused on the Grade Cohort 1 students because they could have been exposed to the intervention for as many as four terms, whereas the other two grade cohorts had a maximum of two terms of exposure.

As with the teacher analyses, to test the effects of ECED on student outcomes at the end of Year 1 and the end of Year 2, we estimated a series of 2-level hierarchical linear models accounting for the nesting of students within schools. The six student outcomes were: math test scores, ELA test scores, students' attitudes toward school (the second order factor from the student questionnaire), grade point averages, attendance, and credits toward graduation. For the student questionnaire analyses, we first considered the second order factor measuring students' attitudes toward school. That analysis was followed by analyses testing each of the separate student questionnaire scales: (1) positive teacher support/expectations, (2) lack of teacher support, (3) engagement in school, (4) perceived competence, and (5) the relative autonomy index. The primary predictor of interest was experimental condition (treatment versus control). Control variables included district, baseline score, gender, race/ethnicity, free/reduced price lunch eligibility, special education, and receipt of English Language Learner services.
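In schematic form, the point-in-time models for both teachers and students are two-level models of the following kind, written here in conventional HLM notation as our shorthand rather than as a reproduction of the report's equations. Y_ij is the outcome for person i in school j, the covariates x_qij are grand-mean centered, and the treatment coefficient γ01 is the estimate of interest:

```latex
\begin{aligned}
\text{Level 1: } & Y_{ij} = \beta_{0j}
    + \sum_{q} \beta_{qj}\,\bigl(x_{qij} - \bar{x}_{q\cdot\cdot}\bigr) + r_{ij},
    & r_{ij} &\sim N(0, \sigma^{2})\\
\text{Level 2: } & \beta_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{TREAT}_{j}
    + \sum_{d=2}^{5} \gamma_{0d}\,\mathrm{DISTRICT}_{dj} + u_{0j},
    & u_{0j} &\sim N(0, \tau_{00})\\
 & \beta_{qj} = \gamma_{q0} \quad \text{(covariate slopes treated as fixed)} &&
\end{aligned}
```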
Those analyses were followed by a series of longitudinal growth curve models. Again, these models were three-level, accounting for the nesting of repeated measures within students and of students within schools, and used the same predictors and control variables. However, these models were tested only for students' attitudes toward school, followed by the separate student questionnaire scales. As noted above, the test scores are not necessarily comparable across time and are therefore not suitable for this type of analysis. Additionally, we did not gather baseline information on the performance variables (i.e., GPA, attendance, and credits toward graduation), so growth curve models could not be estimated for them.

Parallel to the teacher analyses, following the student intent-to-treat analyses, we conducted analyses to estimate the effects of the intervention accounting for level of implementation by replacing the treatment variable with the overall variation in implementation variable. Additionally, within the treatment schools only, we used the overall indicator of implementation to predict teacher and student outcomes. The results from all student analyses are reported in Chapter VI.

Unconditional Models

The variance in the outcomes was partitioned into its within-school and between-school components by fitting an unconditional model with no predictors (Bryk & Raudenbush, 1992). Intraclass correlation coefficients (ICC), a measure of the ratio of the variance that lies between schools to the total variance, were calculated for each of the outcome variables. The lower the proportion of a measure's variance that lies between schools, the more power we have to detect treatment effects. The ICC(2) takes the group sample size into account and is an estimate of the reliability of the group-mean ratings; it can be calculated by applying the Spearman-Brown formula to the ICC(1). An ICC(2) between .70 and .85 is generally considered to indicate acceptable reliability (Ludtke, Trautwein, Kunter, & Baumert, 2006). The lower the reliability of a measure, the less sensitive the measure can be to intervention impacts.

Student outcomes. The ICC for students' attitudes towards school (the second order factor from the survey) was .01 in both years of the study, indicating that only 1% of the variance lay between schools. The ICCs for the other survey scales ranged from 0.003 to 0.015 (see Table 17). These ICCs indicate that almost all of the variation in the survey outcomes lay between students within schools rather than between schools, suggesting that there was little between-school variation in these outcomes to be explained. Even so, a chi-square test of significance revealed that variability at the school level was significantly different from zero for each of these outcomes.

For math and ELA achievement, the percentage of variance that lay between schools ranged from 6% to 9% across the two years of the study, indicating greater but still modest between-school variance (see Table 17). The ICCs for grade point average and attendance were somewhat higher, ranging from .07 to .11 across the two years. The ICC for credits earned was noticeably higher than any other, with between 20% and 52% of the variance lying between schools across the two years. The estimated reliability (ICC(2)) with which schools could be distinguished on students' attitudes towards school was .77 in Year 1 and .83 in Year 2, indicating high reliability.
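The quantities reported here can be computed from a one-way random-effects ANOVA. A minimal sketch, with hypothetical data whose parameters are chosen so that the result lands in the neighborhood of the attitudes-toward-school values above (about 1% between-school variance across 20 schools):

```python
import numpy as np
import pandas as pd

def icc1_icc2(df, outcome, group="school"):
    """ICC(1) from a one-way ANOVA decomposition; ICC(2) via Spearman-Brown."""
    g = df.groupby(group)[outcome]
    means, sizes = g.mean(), g.size()
    grand = df[outcome].mean()
    ms_between = (sizes * (means - grand) ** 2).sum() / (len(means) - 1)
    ms_within = ((df[outcome] - df[group].map(means)) ** 2).sum() / (len(df) - len(means))
    k = sizes.mean()                                   # average cluster size
    icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc2 = (k * icc1) / (1 + (k - 1) * icc1)           # reliability of school means
    return icc1, icc2

# Hypothetical: 20 schools, 400 students each, ~1% between-school variance.
rng = np.random.default_rng(2)
school = np.repeat(np.arange(20), 400)
y = rng.normal(size=20)[school] * 0.1 + rng.normal(size=20 * 400)
icc1, icc2 = icc1_icc2(pd.DataFrame({"school": school, "y": y}), "y")
print(f"ICC(1) = {icc1:.3f}, ICC(2) = {icc2:.3f}")
```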
Reliability estimates for the survey outcomes that made up the second order factor were also high for positive teacher support and lack of teacher support, and for perceived competence in Year 2, but were below the acceptable range for engagement in both years and for perceived competence in Year 1. Reliabilities consistently improved in the second year of the study. The reliability estimates for math and ELA achievement were approximately .97 for both subjects in both years of the study, indicating a very high level of reliability. Grade point average, credits earned, and attendance also had very high reliabilities, ranging from .97 to 1 across the two years of the study.

Table 17. Intraclass Correlations and Reliability Estimates for Student Outcomes

                            ICC                  Reliability
Outcome                     Year 1    Year 2     Year 1    Year 2
Attitudes towards school    0.009     0.011      0.765     0.830
Positive teacher support    0.011     0.015      0.796     0.868
Lack of teacher support     0.009     0.012      0.773     0.836
Engagement                  0.003     0.004      0.564     0.624
Perceived competence        0.004     0.006      0.610     0.703
RAI                         0.006     0.008      0.697     0.764
Math achievement            0.085     0.063      0.972     0.966
ELA achievement             0.070     0.062      0.965     0.966
Grade point average         0.106     0.077      0.978     0.968
Credits earned              0.523     0.201      0.998     0.989
Attendance                  0.102     0.072      0.977     0.966

Teacher outcomes. As with the student outcomes, most of the variation in the teacher survey outcomes lay between teachers within schools rather than between schools. The ICC for perceptions of administrative support for instructional innovation and improvement (the second order factor) was .11 in Year 1 and .06 in Year 2, indicating that between 6 and 11 percent of the variance lay between schools. The ICCs for the other survey scales ranged from 0.01 to 0.21 (see Table 18). A chi-square test of significance revealed that variability at the school level was significantly different from zero for all outcomes except perceived competence, individual teacher morale, and commitment to change in Year 1, and support from district administration and professional development in Year 2, indicating that these scales had so little school-level variance that we are unlikely to find meaningful program impacts.

Reliability estimates for the teacher survey outcomes are shown in Table 18. The estimated reliability (ICC(2)) with which schools could be distinguished on perceived administrative support for instructional innovation and improvement was .70 in Year 1 and .63 in Year 2, indicating moderate to high reliability. With a few exceptions (i.e., teacher collective commitment, support from school administration, confidence in change), reliability estimates for the individual survey scales were low to moderate across the two years of the study.

Table 18. Intraclass Correlations and Reliability Estimates for Teacher Survey Outcomes

                                             ICC                  Reliability
Outcome                                      Wave 2    Wave 4     Wave 2    Wave 4
Perception of administrative support for
  instructional innovation and improvement   0.107     0.055      0.703     0.626
Teacher collective commitment                0.095     0.079      0.674     0.709
Teacher mutual support                       0.054     0.080      0.530     0.712
Support from district administration         0.212     0.008      0.842     0.180
Support from school administration           0.109     0.073      0.708     0.693
Commitment to change                         0.022     0.021      0.309     0.384
Confidence in change                         0.133     0.073      0.751     0.692
Individual teacher morale                    0.022     0.024      0.305     0.408
Professional development                     0.048     0.044      0.501     0.571
Perceived competence                         0.011     0.034      0.183     0.501
RAI                                          0.050     0.030      0.509     0.469
IV. Implementation

As with any school-level intervention, schools in the treatment condition varied with regard to how well the ECED components were implemented. The first part of this chapter describes how variation in implementation was measured. The second part focuses on which aspects of the intervention were implemented with more and less success, as well as on the differences between treatment and control schools.

Measuring Variation in Implementation

Variation in ECED implementation in the 20 schools taking part in the ECED Efficacy Trial was quantified using seven values. Broadly speaking, ECED activities can be broken into three categories: English/Language Arts (ELA), math, and the EAR Protocol. Each category is hypothetically independent of the others (i.e., it would be possible to fully implement ELA with relatively weak implementation of math), so we elected to create separate scores for each category. Further, each school participated for two years, and implementation may have varied by year. Thus, the seven scores calculated for each school were: (1) Year 1 ELA, (2) Year 2 ELA, (3) Year 1 math, (4) Year 2 math, (5) Year 1 EAR Protocol, (6) Year 2 EAR Protocol, and (7) Overall. Treatment and control schools were assigned scores using the same data and scoring systems, making the scores in the two conditions directly comparable. The seven values have a theoretical range of 0 to 100.

There were four major steps involved in arriving at these scores: (1) creating indicators and operational definitions to define full implementation; (2) gathering data from multiple sources, including key-informant interviews, and linking them to the operational definitions; (3) reliably coding the interviews that provided the bulk of the information about implementation; and (4) combining all information to create final scores. These steps are similar to the first four steps advocated by Hulleman, Rimm-Kaufman, and Abry (2013), although our data collection was somewhat less structured. We address their fifth and final step, linking the measure of implementation to outcomes, in the results chapters (Chapters V and VI).

Creating indicators and operational definitions. IRRE senior staff worked with the ECED research staff to create a list of the specific activities that would define 'full implementation' of ECED in math, in ELA, and in use of the EAR Protocol. As a group, they identified 30 indicators of full implementation: 12 for ELA, 14 for math, and 4 for the EAR Protocol. The research staff then identified ways to measure each of these 30 indicators. Most of the information came from key informant interviews conducted each spring at each participating school. The research staff created a scoring rubric to assign values to the interviews and the other sources of data (outlined below) for each of the full implementation indicators. IRRE's senior staff provided weights for each indicator. The ELA weights ranged from 4 to 11 and summed to 100. IRRE indicated that the math indicators were equally important, as were the EAR Protocol indicators, so the indicators within each of those categories were weighted equally, again summing to 100. See Appendix 12 for a list of indicators, data sources, and weights.

Gathering data. Most of the information used to judge the extent to which the indicators were met came from key-informant, semi-structured, open-ended interviews conducted in the spring of each year. The interview protocols were written by the research team, and the interviews were conducted by the research project director and a research associate.
Information for the final math scores came primarily from interviews with the math coaches; math department chairs were interviewed when there was no coach. Some information for the math scores also came from teacher questionnaires (e.g., their participation in ECED professional development). Information for the ELA scores came primarily from interviews with the Literacy/ELA coaches; ELA department chairs were interviewed when there was no coach. Additionally, some information for the ELA scores came from student records (e.g., the proportion of students enrolled in ECED Literacy) and from teacher questionnaires (e.g., how many ECED lessons each teacher covered). Information for the EAR Protocol scores came from interviews with math coaches (or math chairs), ELA coaches (or ELA chairs), and the school principal or assistant principal. Additional information came from IRRE's EAR Protocol database, which includes information about how many EAR visits were conducted and uploaded. The same sources of information and coding systems were used for both treatment and control schools, making the scores directly comparable. For some indicators of full implementation, however, the control schools were automatically set to zero because the indicator represented an IRRE support that was not offered at the control schools (e.g., the number of teachers who participated in IRRE's summer training for ECED).

Interview coding. Two individuals worked independently to reduce each interview to a series of very brief (i.e., yes/no) responses that directly addressed the full implementation activities. They compared their responses regularly, ensuring over 90% agreement. Next, one of those two individuals transformed the brief responses into numeric scores using the scoring rubric created by the research team. As a check on this scoring, the research project director also completed two rounds of scoring, each time scoring 10% of the responses. In the first round, the project director's codes agreed with the numeric scores 81% of the time. After some discussion of coding rules and inconsistencies, the research project director coded 10% more responses. The codes matched 92% of the time in the second round.

Combining all information into final scores. Once each indicator had been scored using all available information, the scores were combined using the weights devised by the IRRE senior staff. Table 19 shows descriptive statistics for the final scores on the six subject-by-year indicators, separately for treatment and control schools. As would be expected, the values in the treatment condition are much higher than in the control condition, but there is variation within both conditions. Also as would be expected, the eight treatment schools that remained in the study both years had much higher implementation (mean = 71.64, SD = 5.47) than the two treatment schools that stopped participating partway through the project (mean = 39.42, SD = 0.95). As seen in Table 20, the correlations among the six variables are quite high. Thus, an overall score, the mean of the six values, was calculated for each school. Cronbach's alpha for these six values together is .98.
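The final step reduces to a weighted sum. A minimal sketch with hypothetical indicator names, weights, and scores (the real rubric had 12 ELA, 14 math, and 4 EAR Protocol indicators, with the ELA weights between 4 and 11 summing to 100):

```python
# Each indicator score is the proportion of full implementation achieved (0-1),
# taken from the coded interviews, records, and questionnaires. The weights for
# a category sum to 100, so full implementation of every indicator yields 100.
ela_weights = {"course_offered": 11, "curriculum_covered": 9, "coach_half_fte": 8}
ela_scores = {"course_offered": 1.0, "curriculum_covered": 0.5, "coach_half_fte": 0.0}

# Hypothetical three-indicator excerpt: the category score is the weighted sum.
y1_ela = sum(ela_weights[k] * ela_scores[k] for k in ela_weights)
print(y1_ela)  # 15.5 of a possible 28 for these three indicators
```

The overall score is then simply the mean of the six subject-by-year category scores.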
Table 19. Descriptive Statistics for Variation in Implementation Scores

           Treatment (n = 10)                    Control (n = 10)
           Mean    SD      Range                 Mean    SD     Range              t
Y1 ELA     74.32   11.87   54.33 to 88.38        14.46   3.53    9.32 to 20.56     15.29***
Y1 Math    80.29    8.39   61.43 to 90.29        18.37   5.50    9.50 to 25.50     19.51***
Y1 EAR     51.59   11.78   37.50 to 71.67         0.00   0.00                      13.85***
Y2 ELA     62.79   24.24   20.16 to 90.98        16.17   4.13   10.56 to 23.28      6.00***
Y2 Math    66.15   26.84    4.14 to 84.07        16.54   6.78    6.50 to 28.36      5.67***
Y2 EAR     56.06   27.10    0 to 90.28            0.00   0.00                       6.54***
Overall    65.20   14.42   38.74 to 78.34        10.92   2.15    8.72 to 14.52     11.77***
*** p < .001

Table 20. Correlations Among Variation in Implementation Scores

           ELA Y1     Math Y1    EAR Y1     ELA Y2     Math Y2    EAR Y2    Overall
ELA Y1     1.00
Math Y1    .938***    1.00
EAR Y1     .928***    .950**     1.00
ELA Y2     .857***    .788***    .863***    1.00
Math Y2    .794***    .787***    .850***    .958***    1.00
EAR Y2     .823***    .812***    .913***    .956***    .959***   1.00
Overall    .938***    .927***    .967***    .953***    .941***   .961***   1.00
*** p < .001

Implementation Strengths and Weaknesses

In general, the eight treatment high schools that participated in ECED for two years implemented ECED's math components fairly successfully. All eight schools organized instruction around the "I Can…" statements and implemented the benchmarking and capstone assessment system in Algebra 1 and Geometry. Most used the mastery grading system, and all had some system in place to help struggling students. The deployment of the math coach was less successful, with only about one-third of schools reporting that they had a math coach who actually spent the recommended one-half FTE on coaching. There was wide variation in the amount of support coaches reported receiving from IRRE. Additionally, few schools held the weekly meetings of the math teachers focused on instruction that are required for full implementation.

Likewise, the eight treatment schools that participated in ECED for two years implemented the ECED Literacy components fairly successfully. All offered the ECED Literacy course both years; however, two schools only had time in student schedules to offer the course for half of the recommended time. On average, treatment schools enrolled 88% of 9th- and 10th-graders in ECED Literacy in the first year and 84% in the second year. Each year, teachers in about half the schools covered the full ECED Literacy curriculum. ELA coaching was stronger than math coaching, with about two-thirds of schools reporting that they had an ELA coach who devoted the recommended one-half FTE to coaching. As with math, there was wide variation in the amount of support coaches reported receiving from IRRE, and few schools held the required weekly meetings of the ECED Literacy teachers focused on instruction.

Use of the EAR Protocol by school leaders to improve instruction was the weakest aspect of the ECED implementation. The number of visits completed and uploaded to the server varied widely by school. Full implementation, as defined by IRRE, would require a total of 700 EAR Protocol visits per year conducted by school or district leaders or IRRE consultants. In the first year of implementation, the average number of visits was 141 (SD = 144, range = 7 to 381) across the ten treatment schools. In the second year, the average number of visits was 233 (SD = 262, range = 0 to 691) across the eight schools that continued to participate. The two schools that no longer participated did not use the EAR Protocol in the second year.

There was virtually no spill-over or contamination of ECED into the control schools.
The interviews revealed that the chairs, coaches, and school administrators at the control schools had almost no awareness of the supports being received at the treatment schools, and none had made any attempt to replicate ECED in their school. Of course, some components of ECED existed in the control schools anyway; for instance, some control schools had math or ELA instructional coaches or held regular meetings of math or ELA teachers focused on instruction. Thus, there is variation in implementation scores among the control schools.

V. Results for Teachers' Attitudes, Experience, and Observed Practice

This chapter presents the findings for teacher outcomes, including teachers' self-reports of attitudes and experiences from the teacher questionnaires and teacher practice as observed using the Engagement, Alignment, and Rigor Classroom Visit Protocol. We start by presenting the overall data analytic strategy. Next, we present findings for math teachers' self-reports of attitudes and experiences, followed by observed EAR in math classes. After the math teacher findings, we report the findings for ELA teachers. Math and ELA teacher findings are reported separately, both because the interventions for the two groups were quite different and because the results for math teachers are experimental, whereas those for the ELA teachers are not.

Data Analytic Strategy

Point-in-time. For each outcome, we estimated a series of 2-level hierarchical linear models (HLM 6.02; Raudenbush & Bryk, 2002) with fixed effects to consider the impact of the ECED treatment on teacher questionnaire responses and teacher practices at the end of Year 1 (Wave 2) and the end of Year 2 (Wave 4). (A matrix presenting the correlations among all teacher outcomes appears in Appendix 13.) It is important to note that only 5 of the 11 outcomes from the teacher questionnaire were reliable enough at the school level to offer a strong possibility of detecting effects (see the ICCs in the Method chapter). We tested impacts on all 11 outcomes with the knowledge that low school-level reliability would make finding impacts on those outcomes more difficult.

The models accounted for the nesting of teachers within schools. The Year 1 analyses included all teachers who taught a target course (i.e., Algebra 1 or Geometry for math; ECED Literacy or 9th-/10th-grade English for ELA) during the first year of the study (Wave 1 and/or 2). The Year 2 analyses included all teachers in the study, that is, all teachers who taught a target course in either math or English/ELA during either the first or second year (i.e., at any point during the four waves). Imputed data were used for the analyses testing impacts on teacher questionnaire responses. However, imputation has not yet been done on the teacher-observation data, so non-imputed data were used for the analyses testing impacts on teacher practices, and there are some missing data in those analyses. In this case, the Year 1 analyses included all teachers who were observed in Wave 2, and the Year 2 analyses included all teachers who were observed in Wave 4.

For each outcome, a series of six separate models was estimated. Table 21 lists the variables included in each model. The first two models included condition (treatment versus control) (Model 1) and condition plus four dummy codes accounting for the five school districts (Model 2) at Level 2 (school). The next set of models added covariates at Level 1 (teacher).
The third model added the teacher's baseline (Wave 1) response on the dependent variable, if that variable was collected at baseline. The fourth model added teacher baseline demographic covariates: gender, race/ethnic background, and years of teaching. The fifth model added a variable indicating the number of semesters the teacher taught a target class and, for the EAR Protocol analyses, a control variable for the number of times the teacher was observed. Finally, the last model tested for moderation effects of the number of semesters teaching a target class by including cross-level interactions between this covariate and treatment condition. The number of semesters teaching a target class and the number of times the teacher was observed are endogenous to treatment and could have been affected by it. For that reason, in this report we typically present Model 4, which contains only variables that are exogenous to treatment, and note the findings for the fifth and sixth models.

Table 21. Variables Included in Teacher Models

                                                                  Model
Variable                                          Level           1  2  3  4  5  6
Condition                                         2 (school)      X  X  X  X  X  X
Four dummy codes for district                     2 (school)         X  X  X  X  X
Baseline (W1) response, if available              1 (teacher)           X  X  X  X
Teacher demographics (gender, race/ethnicity,     1 (teacher)              X  X  X
  years of teaching)
Semesters teacher taught target class (plus       1 (teacher)                 X  X
  number of times teacher was observed for
  EAR analyses)
Treatment X semesters                             cross-level                    X

Maximum likelihood parameter estimates with robust standard errors were used to estimate the parameters. All covariates were grand-mean centered, following the guidelines of Enders and Tofighi (2007) for cluster randomized studies where a Level 2 treatment effect is of interest. In interpreting the results we consider an alpha level of p < .05 as statistically significant, but given the nature of the design (resulting in only 14 degrees of freedom and therefore relatively low power to estimate the intervention effect), we note effects up to the .10 level, particularly in the case of interactions (McClelland & Judd, 1993). Effect sizes were calculated by dividing the estimate of the intervention effect by the raw standard deviation of the dependent variable for the control group (a variant of Cohen's d, known as Glass's Δ; Cohen, 1992).

Growth curve models. Following the point-in-time analyses, growth curve analyses were conducted in order to better understand the pattern of change in intervention impacts over time. Estimates of intervention impacts on change in the teacher outcomes for which we had four waves of data were calculated using a series of three-level hierarchical linear growth models in HLM. In these models, Level 1 represents time (i.e., the repeated assessments of the outcomes of interest for each teacher), Level 2 represents the teacher, and Level 3 represents the school. The same teacher-level covariates as in the point-in-time models were included at Level 2. Level 3 included an intervention dummy and four district dummies representing the five districts in the study. A series of unconditional models was first estimated and compared to determine the most appropriate functional form for each of the outcomes; these were an intercept-only, an intercept-slope, and an intercept-slope-quadratic model. Models were compared using the likelihood ratio test (Raudenbush & Bryk, 2002), based on the change in the deviance estimate between models generated in HLM and the number of parameters in each model. The best-fitting model was used to test program impacts.
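This model comparison amounts to a chi-square test on the change in deviance. A minimal sketch with illustrative deviance values (HLM reports the deviance, i.e., -2 log-likelihood, for each fitted model; the specific numbers and parameter count below are hypothetical):

```python
from scipy.stats import chi2

def lr_test(deviance_reduced, deviance_full, added_params):
    """Likelihood ratio test for nested growth models via change in deviance."""
    delta = deviance_reduced - deviance_full
    return delta, chi2.sf(delta, df=added_params)

# Intercept-only vs. intercept-slope: the slope adds a fixed effect plus a
# random-effect variance and an intercept-slope covariance (3 parameters here).
delta, p = lr_test(deviance_reduced=4512.8, deviance_full=4498.1, added_params=3)
print(f"chi-square(3) = {delta:.1f}, p = {p:.4f}")
```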
Variation in implementation. In order to test the degree to which variation in implementation affected intervention impacts on the main outcomes of interest, follow-up analyses were conducted for math teachers in which the treatment/control dummy variable was replaced with the overall variation in implementation indicator variable (see Chapter IV for a description of this variable). Because variation in implementation was not randomly assigned, these analyses are non-experimental and are meant to complement the experimental results. In the interest of parsimony, we limited these analyses to math teachers (our experimental group) and to main effects without interactions (Models 1 through 5).

Results for Math Teachers

Point-in-time analyses predicting math teacher attitudes and experiences. The outcome for the first set of models was the math teachers' perceptions of administrative support for instructional innovation and improvement, which was the second order factor described in the Method chapter (Chapter III). That factor measures the extent to which teachers believe that school and district administrators are responsive to their needs and that efforts are being made to improve teaching and learning. Not all items from the second order factor appeared on the Wave 1 questionnaire, so the baseline response added here was only the mean of support from the school administrators. Of the factors measured at Wave 1, support from school administration was most highly correlated with both Wave 2 and Wave 4 administrative support for instructional innovation and improvement.

Findings from these models largely indicated that ECED did not affect teachers' perceptions of administrative support for instructional innovation and improvement. Table 22 presents the findings from the fourth model, the one that controls for district, baseline, and teacher demographics. The patterns of significance found in Model 4 were the same as those found in each of the other models. This model indicates that baseline response was a significant and fairly large predictor of response at the end of the first year and at the end of the intervention. However, there was no evidence that the treatment had an effect on teachers' perceptions of support from school administration and beliefs that efforts were being made to improve teaching and learning, either at the end of the first year or at the end of the intervention.

Table 22. Predicting math teachers' perceptions of administrative support for instructional innovation and improvement (second order factor)

                          Year 1 (n teachers = 178,     Year 2 (n teachers = 239,
                          j schools = 20)               j schools = 20)
                          Estimate   SE     p           Estimate   SE     p
Treatment (0 = control)    0.04      0.14   0.799       -0.17      0.10   0.118
District 2                -0.18      0.19   0.366       -0.06      0.15   0.680
District 3                -0.32      0.30   0.291       -0.09      0.17   0.622
District 4                -0.06      0.22   0.778       -0.06      0.19   0.758
District 5                -0.14      0.20   0.508       -0.07      0.13   0.598
Baseline                   0.50      0.09   0.000        0.58      0.09   0.000
Gender (0 = male)          0.17      0.14   0.238        0.16      0.10   0.129
Race/ethnicity             0.07      0.05   0.181        0.05      0.06   0.415
Years of teaching         -0.03      0.03   0.322        0.04      0.04   0.315

The fifth model added the number of terms that the teacher taught one or more target classes during the intervention (range = 1 to 4). This control was added to account for the fact that some teachers taught a target course for only a short time (e.g., one term); if they were in a treatment school, their exposure to ECED was therefore less than if they had taught a target course all four terms.
However, as noted earlier, this variable is not exogenous to treatment and could, theoretically, be affected by the treatment itself. Findings from these models paralleled those presented above, and the added control was non-significant.

The sixth model added the interaction between the number of semesters teaching a target course and treatment condition. This cross-level interaction was significant and negative for the second order factor of administrative support for instructional innovation and improvement in Year 1 (β = -.68, SE = .28, p = .016). Teachers in ECED schools who taught a target course in both semesters reported less administrative support for instructional innovation and improvement than teachers who taught a target course only one semester, while the opposite was true in control schools. This interaction was not significant at the end of Year 2.

Comparable models were then estimated for each of the individual scales on the teacher questionnaire: (1) teacher collective commitment, (2) teacher mutual support, (3) support from district administration, (4) support from school administration, (5) commitment to change, (6) confidence in change, (7) individual teacher morale, (8) professional development, (9) perceived competence, and (10) the relative autonomy index. The same set of six models was estimated for the first four scales. Because Wave 1 did not include the last six scales, we could not include baseline in those analyses, meaning only five models were estimated for each. Again, the findings generally indicated that ECED had no impact on teachers' attitudes and experiences. Baseline scores, when available, were always significant predictors of the outcome scores. The only outcome for which the treatment was significant was teacher mutual support in Year 2. In the first five models, teachers in the control condition reported a greater sense of support from colleagues than teachers in the treatment condition at the end of Year 2 (final model: β = -.20, SE = .09, p = .033; effect size = .36).

The final model for each of the individual subscales added the interaction between the number of semesters teaching a target course and treatment condition. A significant cross-level interaction was found for teacher collective commitment, support from school administration, confidence in change, individual teacher morale, and RAI in Year 1, and for commitment to change in Year 2. Each of these cross-level interactions followed a similar trend: the more semesters teachers taught target courses in intervention schools, the more negative their responses to each of the outcomes, while the reverse was true for teachers in control schools. It seems that trying to facilitate change among teachers in difficult situations leaves them with somewhat less positive attitudes than comparable control-group teachers in similar situations where change is not being facilitated.

Growth curve analyses predicting math teacher attitudes and experiences. Growth curve analyses were subsequently conducted on the four questionnaire outcomes for which we had four waves of data (i.e., teacher collective commitment, teacher mutual support, support from district administration, and support from school administration) in order to better understand the pattern of change over time. Estimating impacts on growth in the second order factor was not possible because that factor consisted of sub-factors for which we had only two waves of data.
Before testing the effects of the ECED intervention on growth in our four questionnaire outcomes, three unconditional models without any covariates were compared for each outcome to determine the appropriate functional form of the growth curve. For each outcome, the three models compared were an intercept-only model, an intercept-slope model, and an intercept-slope-quadratic model. Once the appropriate form of the growth curve was determined, that model was then estimated.

Findings indicated that the intercept-slope model was a significantly better fit than the intercept-only model for two of the four outcomes. For teacher collective commitment and support from district administration, the slope parameters were significant, indicating that these outcomes showed meaningful growth across the four time points. The intercept-slope-quadratic model was not a significantly better fit than the intercept-slope model for any of the outcomes, indicating linear rather than curvilinear change across time. For teacher mutual support and support from school administration, the intercept-slope model was not a significantly better fit than the intercept-only model, indicating that these outcomes did not show meaningful growth or change across the four time points, on average. However, the lack of average growth could potentially mask divergent growth or change across the treatment and control groups. Therefore, despite the lack of average growth in these two outcomes, intervention impacts on linear growth in all four outcomes were estimated.

The findings generally paralleled the point-in-time findings, indicating that ECED impacted only one of the four outcomes. No significant intervention-control differences were found in the slopes of teacher collective commitment, support from district administration, or support from school administration. Despite the non-significant average slope parameter for teacher mutual support, there was a significant negative intervention-control difference in this slope parameter (β = -.67, SE = .03, p = .049). As shown in Figure 1, the ECED group showed a decline in mutual support over the two years of the study, relative to the control group. This finding is in keeping with the negative impact found in the point-in-time models for this outcome.

Figure 1. Impact of intervention on Teacher Mutual Support (line graph of teacher mutual support from fall of Year 1 through spring of Year 2, showing the intervention group declining relative to the control group).

Associations of variation in implementation with math teacher attitudes and experiences. When the overall variation in implementation variable was used in place of the treatment/control dummy variable in the point-in-time analyses of teacher questionnaire responses, the results suggested that variation in implementation was not associated with teachers' attitudes and experiences. Table 23 presents results from the models predicting perceived administrative support for instructional innovation and improvement at the end of Year 1 and the end of Year 2. The degree to which ECED was implemented at the school level was not associated with teachers' perceptions of support from school administration and beliefs that efforts are being made to improve teaching and learning, either at the end of the first year or at the end of the intervention. In the fifth model, there was no association between the indicator for the number of terms the teacher taught a target class and perceived administrative support for instructional innovation and improvement.
Findings from comparable models estimating associations between variation in implementation and each of the individual scales on the teacher questionnaire (i.e., teacher collective commitment, teacher mutual support, support from district administration, support from school administration, commitment to change, confidence in change, individual teacher morale, professional development, perceived competence, and the relative autonomy index) likewise indicated no association between level of implementation and these individual scales.

Table 23. Associations between variation in implementation and math teachers' perceptions of administrative support for instructional innovation and improvement (second order factor)

                          Year 1 (n teachers = 178,     Year 2 (n teachers = 239,
                          j schools = 20)               j schools = 20)
                          Estimate   SE     p           Estimate   SE     p
Overall implementation     0.00      0.00   0.479        0.00      0.00   0.307
District 2                -0.19      0.21   0.387       -0.04      0.17   0.825
District 3                -0.34      0.23   0.155       -0.07      0.18   0.700
District 4                -0.09      0.21   0.689       -0.03      0.18   0.877
District 5                -0.13      0.25   0.615       -0.07      0.18   0.715
Baseline                   0.50      0.11   0.000        0.58      0.09   0.000
Gender (0 = male)          0.17      0.13   0.199        0.15      0.11   0.169
Race/ethnicity             0.07      0.05   0.164        0.05      0.06   0.392
Years of teaching         -0.03      0.04   0.438        0.04      0.04   0.280

Point-in-time analyses predicting observed Engagement, Alignment, and Rigor in math classes. In order to analyze the classroom observation data, three scores were created for each teacher at each wave (one each for E, A, and R). This was accomplished by first applying the scoring methods outlined in Early et al. (2013) to calculate continuous E, A, and R scores for each observation and then calculating each teacher's average E, A, and R scores across all observations during that wave. As noted earlier, this section reports preliminary findings using the unimputed data. Final analyses will be conducted using the imputed data when they become available.

Table 24 presents findings from the fourth model. At the end of Year 1, instruction in math classes in treatment schools was rated as more aligned and more rigorous than instruction in the control schools, controlling for district, baseline, and teacher demographic variables; there was no difference in observed engagement. However, at the end of Year 2, observed engagement was significantly lower in treatment schools than in control schools, and rigor was significantly higher; there was no significant difference in alignment. Baseline scores were significant predictors of end-of-Year-1 and end-of-Year-2 engagement and alignment, but not of rigor scores (although at the end of Year 2, baseline rigor approached significance). The other controls were largely non-significant. In the sixth model (not tabled), no significant cross-level interactions between treatment and number of semesters teaching a target class were found.

The first three models (not tabled) were not always consistent with the later model. Specifically, in Year 1 the positive treatment effect on alignment reached significance only when baseline was added to the model (Model 3). In Year 2 the negative treatment impact on observed engagement reached significance only when baseline and demographic covariates were added to the model. In addition, there was a significant positive treatment effect on alignment before the demographic covariates were added to the model (Models 1-3).
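The construction of the observation outcomes described above (one E, one A, and one R score per teacher per wave, averaged over that wave's visits) can be sketched as follows; the visit-level data frame and its values are hypothetical:

```python
import pandas as pd

# One row per classroom visit, with continuous E, A, and R scores computed
# from the EAR Protocol items via the Early et al. (2013) scoring method.
visits = pd.DataFrame({
    "teacher_id": [101, 101, 101, 102, 102],
    "wave":       [2,   2,   4,   2,   4],
    "E": [0.70, 0.66, 0.72, 0.58, 0.61],
    "A": [0.12, 0.05, 0.10, -0.02, 0.04],
    "R": [0.25, 0.31, 0.18, 0.40, 0.22],
})

# Each teacher's wave score is the mean across all visits in that wave.
teacher_wave = visits.groupby(["teacher_id", "wave"])[["E", "A", "R"]].mean()
print(teacher_wave)
```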
Table 24. Predicting Observed E, A, and R for Math Teachers

Engagement                Year 1 (n teachers = 96,     Year 2 (n teachers = 77,
                          j schools = 19)              j schools = 19)
                          Est.     SE     p            Est.     SE     p
Treatment (0 = control)    0.02    0.03   0.56         -0.09    0.03   0.03
District 2                -0.03    0.05   0.60          0.12    0.05   0.04
District 3                -0.09    0.06   0.16          0.15    0.06   0.03
District 4                 0.00    0.06   0.94          0.21    0.06   0.00
District 5                -0.02    0.07   0.81          0.17    0.07   0.04
Baseline                   0.28    0.08   0.00          0.52    0.10   0.00
Gender                     0.04    0.03   0.19          0.04    0.03   0.28
Race                      -0.02    0.01   0.14          0.01    0.02   0.74
Yrs. teaching              0.01    0.01   0.42          0.00    0.01   0.73

Alignment                 Year 1                       Year 2
                          Est.     SE     p            Est.     SE     p
Treatment (0 = control)    0.09    0.03   0.02          0.07    0.04   0.13
District 2                 0.03    0.06   0.59          0.04    0.07   0.57
District 3                -0.14    0.07   0.06          0.18    0.08   0.04
District 4                -0.07    0.06   0.28          0.13    0.08   0.11
District 5                -0.01    0.07   0.85          0.09    0.09   0.38
Baseline                   0.20    0.09   0.03          0.24    0.12   0.05
Gender                    -0.03    0.03   0.34          0.05    0.04   0.27
Race                      -0.03    0.02   0.04          0.01    0.02   0.80
Yrs. teaching              0.00    0.01   0.84          0.00    0.02   0.95

Rigor                     Year 1                       Year 2
                          Est.     SE     p            Est.     SE     p
Treatment (0 = control)    0.23    0.07   0.01          0.25    0.10   0.03
District 2                 0.30    0.14   0.06         -0.04    0.15   0.77
District 3                -0.50    0.14   0.00          0.16    0.18   0.40
District 4                -0.16    0.17   0.36          0.33    0.19   0.10
District 5                -0.23    0.15   0.14          0.22    0.21   0.33
Baseline                   0.18    0.11   0.11          0.21    0.12   0.09
Gender                     0.09    0.07   0.20          0.08    0.10   0.41
Race                      -0.06    0.05   0.25         -0.01    0.05   0.82
Yrs. teaching              0.02    0.03   0.51          0.01    0.04   0.88

Growth curve analyses predicting EAR for math teachers. Growth curves were modeled for each of the three EAR outcomes in order to better understand the pattern of change over time. Unconditional models suggested that an intercept-slope model was not a significantly better fit than the intercept-only model for any of the three outcomes, nor were the average slopes significant. As was done with the teacher questionnaire outcomes, intervention impacts on linear growth were estimated for each of the three outcomes despite the finding that these outcomes did not show meaningful change across the four time points, on average.

Consistent with the point-in-time analyses, there was a negative effect of ECED on the slope of observed engagement over the course of the study (β = -.03, SE = .01, p = .037). As shown in Figure 2, observed engagement declined among teachers in the treatment group and increased among teachers in the control group. There was also a trend-level positive effect of ECED on the slope of rigor (β = .06, SE = .03, p = .097), such that teachers in the control group showed a decline in observed rigor over the two years, relative to the treatment group (see Figure 3). No significant effect of the intervention was found on change in alignment.

Figure 2. Impact of intervention on change in Observed Engagement (line graph from fall of Year 1 through spring of Year 2, showing the treatment group declining and the control group increasing).

Figure 3. Impact of intervention on change in Rigor (line graph from fall of Year 1 through spring of Year 2, showing the control group declining relative to the treatment group).

Results for ELA Teachers

Point-in-time analyses predicting ELA teacher attitudes and experiences. The same point-in-time models were estimated for the ELA teachers as for the math teachers. As explained in the Method section (see p. 51), these comparisons are non-experimental because it was not possible to know which teachers (including additional ones hired for the purpose) would have taught ECED Literacy in the control schools had that course been offered; thus, there is not a set of teachers in the control group that is comparable to the one in the experimental group. Table 25 presents the point-in-time model testing impacts of ECED on perceived administrative support for instructional innovation and improvement (the fourth model, controlling for district, baseline, and teacher demographics).
Again, teachers who taught a target ELA class (i.e., ECED Literacy and/or 9th-/10th-grade English) during the first year are included in the Year 1 analyses. The Year 2 analyses include all ELA study teachers, that is, all teachers who taught a target ELA course at any point during the four waves. As with math, there were no differences on the administrative support variable between the teachers in the treatment and control conditions, either at the end of Year 1 or at the end of the intervention. Baseline score was the only significant predictor. Additionally, terms teaching a target course and the interaction between terms and condition (not tabled) were non-significant. When comparable models were estimated for each of the individual scales, there were no significant between-group differences.

Table 25. Predicting support for instructional innovation and improvement (second order factor) for ELA teachers

                          Year 1 (n teachers = 218,     Year 2 (n teachers = 298,
                          j schools = 20)               j schools = 20)
                          Estimate   SE     p           Estimate   SE     p
Treatment (0 = control)    0.02      0.10   0.845       -0.10      0.10   0.327
District 2                -0.16      0.13   0.232       -0.02      0.18   0.917
District 3                -0.20      0.16   0.214       -0.09      0.18   0.631
District 4                 0.20      0.18   0.278       -0.07      0.15   0.648
District 5                -0.26      0.13   0.071       -0.20      0.18   0.272
Baseline                   0.44      0.10   0.000        0.41      0.09   0.000
Gender (0 = male)          0.07      0.10   0.517       -0.01      0.09   0.947
Race/ethnicity             0.04      0.06   0.491        0.00      0.05   0.988
Years of teaching         -0.02      0.04   0.567       -0.01      0.03   0.812

Growth curve analyses predicting ELA teacher attitudes and experiences. The same set of longitudinal analyses was conducted for ELA teachers as was presented above for math teachers, using the four questionnaire outcomes for which we had four waves of data. No impacts of the intervention were found on growth in these outcomes over the two years.

VI. Impacts on Students

Data Analytic Strategy

Point-in-time impacts. As with the teacher models, for each outcome we estimated a series of 2-level hierarchical linear models (HLM 6.02; Raudenbush & Bryk, 2002) with fixed effects to consider the impact of the ECED treatment on the outcomes at the end of Year 1 (Wave 2) and the end of Year 2 (Wave 4), using the imputed data. (A matrix showing the correlations among all student outcome variables appears in Appendix 14; Appendix 15 contains a matrix of correlations among all outcomes, both student and teacher, aggregated to the school level.) Most of the student outcomes had high school-level reliability (see the ICCs in the Method chapter), indicating a strong possibility of detecting school-level effects. The only exceptions were from the student questionnaire: student engagement in both years and perceived competence in Year 1 had low school-level reliability. School-level effects on those outcomes would thus be more difficult to find because their reliabilities were below the cut-off considered acceptable; however, they were not so low that detecting effects would be impossible.

The models accounted for the nesting of students within schools. The Year 1 analyses included all students who were in 9th grade and were enrolled in a target school during the first year of the study (Wave 1 and/or 2). The Year 2 analyses include all students who were enrolled in target schools at any point in the study and were in 9th grade in the first year and/or 10th grade in the second year. A series of six separate models was estimated for most outcomes.
The first two models included condition (treatment versus control) (Model 1) and condition plus four dummy codes accounting for the five school districts (Model 2) at Level 2 (school). The next set of models added covariates at Level 1. The third model added the student's baseline (Wave 1) value of the dependent variable when available. Baseline was available for the survey outcomes and for math and ELA achievement, but not for GPA, credits earned, or attendance. The fourth model added student baseline demographic covariates: gender, race/ethnic background, special education, free/reduced price lunch, and receipt of English language learner (ELL) services. The fifth model added a variable indicating the number of semesters the student was enrolled in a study school, to account for variation in students' potential exposure to the intervention. In the case of math achievement, an additional covariate was added to the fifth model indicating the type of math test taken (e.g., Algebra 1, Geometry, etc.) to account for the fact that different districts administered different tests and test level often depended on students' course schedules. It is important to note that the variables added in Model 5 are not exogenous, because they were measured during the intervention and could have been affected by the intervention. Finally, the last model tested for moderation effects of gender, race/ethnicity, baseline when available, and number of semesters in a study school by including cross-level interactions between these covariates and treatment condition. See Table 26 for a list of variables included in each model.

Throughout this chapter, we will present tables of the findings from Model 4, because it includes the demographic variables but does not include the potentially endogenous variables or the interactions. In each table, the district variables compare each district to District 1. The race/ethnicity variables compare each group to White students. Free/reduced price lunch is coded so that 0 means that the student did not receive that service either year, and 1 means he or she received it one or both years. Special education and English language learner services reflect the students' status at Year 1 (baseline) because those variables could theoretically be affected by the intervention, so we did not want to include the Year 2 measure as a control.

Table 26. Variables Included in Student Models

                                                                     Model
Variable                                              Level          1  2  3  4  5  6
Condition                                             2 (school)     X  X  X  X  X  X
Four dummy codes for district                         2 (school)        X  X  X  X  X
Baseline, if available                                1 (student)          X  X  X  X
Student demographics (gender, race/ethnicity,
  special education, free/reduced price lunch,
  ELL services)                                       1 (student)             X  X  X
Semesters enrolled in target school (+ type of
  math test for math achievement only)                1 (student)                X  X
Moderation (gender, race/ethnicity, baseline,
  semesters X condition)                              cross-level                   X

As with the teacher models, maximum likelihood parameter estimates with robust standard errors were used to estimate the parameters. All covariates were grand-mean centered, following guidelines by Enders and Tofighi (2007) for cluster randomized studies where a Level 2 treatment effect is of interest.
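A minimal sketch of the point-in-time "Model 4" just described, re-expressed as a 2-level random-intercept model in Python/statsmodels, may help make the specification concrete. The original analyses were run in HLM 6.02 with robust standard errors on multiply imputed data; neither of those steps is reproduced here, and every column name (outcome, treat, district, school_id, and the covariates) is hypothetical.

# A sketch of Model 4: treatment and district dummies at the school level,
# grand-mean-centered student covariates at Level 1, random school intercept.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("students_wave4.csv")  # hypothetical student-level file

# Grand-mean center the Level 1 covariates (Enders & Tofighi, 2007).
for col in ["baseline", "female", "sped", "frl", "ell"]:
    df[col + "_c"] = df[col] - df[col].mean()

# C(district) expands five districts into four dummy codes (District 1 is the
# reference); race/ethnicity dummies would be added the same way. Students are
# nested in schools via the random intercept (groups="school_id").
m4 = smf.mixedlm(
    "outcome ~ treat + C(district) + baseline_c + female_c + sped_c"
    " + frl_c + ell_c",
    data=df,
    groups="school_id",
).fit()
print(m4.summary())  # the coefficient on `treat` estimates the ECED impact

# Model 6 would add a cross-level moderation term such as treat:semesters.

Because this sketch omits the robust standard errors and the combination of estimates across imputed data sets, its output would not reproduce the tables in this chapter; it only illustrates the model structure.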
In interpreting the results we consider an alpha level of p < .05 as statistically significant, but given the nature of the design (which left only 14 degrees of freedom, and therefore relatively low power, to estimate the intervention effect), we also note effects up to the .10 level, particularly in the case of interactions (McClelland & Judd, 1993). Effect sizes were calculated by dividing the estimate of the intervention effect by the raw standard deviation of the dependent variable for the control group (a variant of Cohen's d, attributed as Glass's ∆; Cohen, 1992).

Although cross-level interactions between treatment status and each of the baseline covariates (i.e., gender, race, baseline value) were estimated for each outcome in the sixth model, no consistent pattern emerged. Given the lack of power at the school level to estimate these cross-level interactions and the lack of a consistent pattern, this report does not present the few significant but idiosyncratic interactions that did emerge with baseline variables (see Appendix 16 for a table of all significant interactions). There was one cross-level interaction that did emerge repeatedly, namely the interaction between treatment and number of semesters enrolled in a study school. This is not a baseline characteristic and is therefore non-experimental. This report does, however, present this one cross-level interaction when it was significant, because it seemed to represent an interpretable pattern.

Growth curve models. Growth curve analyses were subsequently conducted on the survey outcomes in order to better understand the pattern of change in intervention impacts over time. As noted in the Method Chapter, the measure of student achievement changed across waves, so we are not able to estimate growth curve models for student achievement. And growth curve analyses were not conducted for the measures of performance because we did not have baseline data for those variables and therefore only had two time points. Estimates of intervention impacts on change in the student survey scales from baseline (Wave 1) to the end of the study (Wave 4) were calculated using a series of three-level hierarchical linear growth models in HLM. In these models, Level 1 represents time (i.e., the repeated assessments of the outcomes of interest for each student), Level 2 represents the student, and Level 3 represents schools. The same student-level covariates as included in the point-in-time models were included at Level 2. Level 3 included an intervention dummy and four district dummies representing the five districts in the study. In addition, we examined cross-level intervention by baseline covariate interactions for the appropriate growth parameters.

A series of unconditional models was first estimated and compared to determine the most appropriate functional form for each of the outcomes. These were an intercept-only, an intercept-slope, and an intercept-slope-quadratic model. Models were compared using the likelihood ratio test (Raudenbush & Bryk, 2002), based on the change in the deviance estimate between models generated in HLM and the number of parameters in each model. 65 The best-fitting model was used to test program impacts.
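For reference, the two recurring computations in this chapter, the Glass's ∆ effect size and the deviance-based likelihood ratio test, can be written out as follows; the notation is ours, summarizing the procedures described above rather than reproducing anything from the report's software output:

\[ \Delta = \frac{\hat{\beta}_{\text{treatment}}}{SD_{\text{control}}(Y)} \qquad\qquad \chi^{2} = D_{\text{restricted}} - D_{\text{full}}, \quad df = p_{\text{full}} - p_{\text{restricted}} \]

where \(\hat{\beta}_{\text{treatment}}\) is the estimated intervention effect, \(SD_{\text{control}}(Y)\) is the raw standard deviation of the outcome in the control group, \(D\) is a model's deviance, and \(p\) is its number of estimated parameters; the simpler model is retained unless the test rejects it.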
As with the point-in-time analyses, cross-level interactions between treatment status and each of the baseline covariates (i.e., gender, race, free or reduced price lunch, special education, ELL, baseline value) were estimated for each outcome in the sixth model; no consistent pattern emerged. For that reason, we are not presenting those interactions (see Appendix 16 for a table of all significant interactions). We do, however, present cross-level interactions between treatment and terms enrolled in the study school.

Variation in implementation analyses. In order to test the degree to which variation in two years of ECED implementation affected intervention impacts on the main outcomes of interest at the end of the study, follow-up analyses were conducted in which the intervention/control dummy variable was replaced with the overall variation in implementation indicator variable (see Chapter IV for a detailed description of this variable). Because variation in implementation was not randomly assigned, these analyses are non-experimental and are meant to complement the experimental results. The analysis strategy was identical to the point-in-time student impact analyses conducted using the intervention/control dummy variable. The same series of 2-level Hierarchical Linear Models with fixed effects was estimated. In these analyses, the models tested the degree to which greater implementation of the ECED treatment was associated with more positive student questionnaire responses, student achievement, and school performance at the end of Year 2.

A second set of non-experimental follow-up analyses was conducted to test the degree to which variation in implementation was related to student outcomes within ECED intervention schools. A similar series of 2-level fixed effects point-in-time models was estimated (Models 1 through 5) using the variation in implementation indicator at Level 2 in place of the treatment indicator, but the analyses were conducted in the ten treatment schools only.

Point-in-Time Analyses Predicting Students' Attitudes Toward School

The outcome for the first set of models was students' attitudes toward school, which was the second order factor described in the Method chapter (Chapter III). That factor combines the extent to which students see their teachers as being supportive, students' reports of being engaged in school, and their perceived competence. Table 27 presents the findings from the fourth model, which controls district, baseline, and student demographics. The patterns of significance were the same in each of the earlier models (Models 1 through 3). There was no evidence that the treatment had an effect on students' attitudes toward school. However, the baseline response was a significant and fairly large predictor of responses at the end of the first year and the end of the intervention. Girls reported more positive attitudes toward school at both time points. Relative to White students, Black and Asian/Pacific Islander students reported more positive attitudes toward school at the end of the second year. Free/reduced lunch, special education, and ELL status were not significant predictors. In Model 5 (not tabled), the indicator for number of semesters enrolled in a study school was not a significant predictor either.

Table 27. Predicting Students' Attitudes Toward School (Second Order Factor)

                          Year 1 (n students = 7,354,    Year 2 (n students = 8,433,
                          j schools = 20)                j schools = 20)
                          Estimate    SE      p          Estimate    SE      p
Treatment (0 = control)     0.011    0.010   0.296         -0.013    0.017   0.464
District 2                 -0.056    0.020   0.015         -0.055    0.035   0.145
District 3                 -0.023    0.015   0.149         -0.023    0.030   0.472
District 4                 -0.053    0.013   0.001         -0.045    0.027   0.118
District 5                 -0.055    0.015   0.003         -0.049    0.030   0.120
Baseline                    0.626    0.016   0.000          0.567    0.015   0.000
Gender (0 = male)           0.040    0.008   0.000          0.046    0.006   0.000
Hispanic                   -0.006    0.010   0.554          0.009    0.011   0.421
Black                       0.008    0.010   0.393          0.031    0.011   0.007
Asian/Pacific Isl.          0.046    0.016   0.005          0.075    0.019   0.000
Amer. Indian/Other         -0.049    0.031   0.113         -0.006    0.023   0.789
Free/reduced lunch         -0.011    0.009   0.200         -0.007    0.010   0.458
Special education           0.026    0.019   0.181          0.029    0.018   0.102
ELL                         0.010    0.013   0.439          0.021    0.015   0.176
In the sixth model that added interactions, the number of semesters enrolled in a study school moderated the effect of treatment on students' attitudes toward school (β = -.02, SE = .01, p = .066) such that there was a negative effect of treatment on attitudes toward school for students who were enrolled more terms in treatment schools. Students who were enrolled the most terms in control schools had the most positive attitudes toward school. See Figure 4. It is important to note that this is not a causal moderation because the number of semesters enrolled in a study school is not exogenous to treatment (i.e., student mobility might be affected by the intervention).

[Line graph of Year 2 students' attitudes toward school for the control and intervention groups by number of semesters in the study.] Figure 4. Interaction of ECED intervention and number of semesters enrolled in a study school on Year 2 students' attitudes toward school.

Point-in-Time Analyses Predicting Individual Student Survey Scales

Follow-up models were then estimated for each of the individual scales on the student questionnaire: (1) positive teacher support/expectations, (2) lack of teacher support, (3) engagement in school, (4) perceived competence, and (5) relative autonomy index. The same series of six models was fitted. In the fourth model without interactions, the effect of treatment was non-significant both at the end of Year 1 and at the end of Year 2 for each of the scales. As with the second order factor, the baseline score was always a significant and large predictor in the models predicting these individual scales. Girls tended to report more positive experiences than boys. Relative to White students, Black and Asian/Pacific Islander students tended to report more positive attitudes toward school at the end of the second year. Student free/reduced lunch, special education, and English language learner services were generally not significant predictors.

As with the second order factor, in the fifth model, number of semesters in a study school was not associated with any of the scales. However, in the sixth models where the interactions were added, the number of semesters in a study school moderated the effect of treatment on perceived competence (β = -.02, SE = .01, p = .038) such that a negative effect of treatment on perceived competence was found for students who were enrolled more terms in treatment schools.

Point-in-Time Analyses Predicting Math and ELA Achievement

In Model 3, which includes treatment, district covariates, and baseline test score, a significant effect of treatment was found for math achievement such that students in treatment schools had higher math test scores than their counterparts in control schools.
This finding was marginal in Year 1 (β = .18, SE = .09, p = .056, E.S. = .18) and significant in Year 2 (β = .15, SE = .07, p = .043, E.S. = .16). Treatment was not a significant predictor of ELA achievement. Table 28 presents the findings from the fourth model, which adds student-level demographic covariates. The effect of ECED treatment is marginal at both Year 1 and Year 2 in this model, indicating that adding the demographic covariates slightly diminished the association between treatment and test scores; the effect sizes were small (.16 at Year 1 and .14 at Year 2). No significant effects of treatment were found for ELA test scores in either year. Based on these analyses, we concluded that there is some evidence that the intervention was effective at increasing math scores, but that ELA scores were unaffected by the treatment. As would be expected, baseline test scores were a significant and fairly large predictor of test scores in each year. Additionally, as seen in Table 28, when controlling for all these other variables, several demographic covariates were significantly associated with achievement.

In Model 5, the number of semesters enrolled in a study school was not a significant predictor of math or ELA achievement, nor was test type a significant predictor of math achievement. The sixth model revealed no significant interactions between demographic characteristics and ECED treatment condition, indicating the treatment was equally effective at increasing math scores across all these groups and that ECED did not affect ELA scores in any demographic subgroup.

Table 28. Predicting Math and ELA Achievement

Math achievement
                          Year 1                         Year 2
                          Estimate    SE      p          Estimate    SE      p
Treatment (0=control)       0.162    0.081   0.066         0.130    0.063   0.058
Covariates
  District 2                0.127    0.124   0.323        -0.060    0.092   0.526
  District 3                0.184    0.132   0.184         0.188    0.093   0.062
  District 4                0.251    0.125   0.065         0.390    0.094   0.001
  District 5                0.207    0.130   0.134         0.334    0.100   0.005
  Baseline                  0.522    0.015   0.000         0.396    0.015   0.000
  Gender (0=male)          -0.006    0.020   0.783        -0.071    0.021   0.002
  Hispanic                 -0.144    0.040   0.001        -0.131    0.047   0.018
  Black                    -0.190    0.038   0.000        -0.241    0.052   0.001
  Asian/Pacific Isl.        0.017    0.050   0.736         0.051    0.066   0.458
  Amer. Indian/Other       -0.210    0.075   0.007        -0.107    0.081   0.201
  Free/reduced lunch       -0.012    0.028   0.661        -0.005    0.032   0.879
  Special education        -0.305    0.050   0.000        -0.221    0.055   0.001
  ELL                      -0.054    0.029   0.066        -0.025    0.027   0.355

ELA achievement
                          Year 1                         Year 2
                          Estimate    SE      p          Estimate    SE      p
Treatment (0=control)       0.032    0.125   0.803         0.062    0.053   0.255
Covariates
  District 2                0.048    0.122   0.701        -0.015    0.084   0.862
  District 3                0.023    0.129   0.864        -0.137    0.084   0.127
  District 4               -0.058    0.123   0.646         0.011    0.085   0.896
  District 5                0.251    0.318   0.466        -0.148    0.107   0.188
  Baseline                  0.676    0.012   0.000         0.608    0.020   0.000
  Gender (0=male)           0.022    0.019   0.257        -0.012    0.024   0.627
  Hispanic                 -0.092    0.043   0.051        -0.172    0.051   0.011
  Black                    -0.147    0.050   0.018        -0.151    0.049   0.012
  Asian/Pacific Isl.       -0.002    0.052   0.972        -0.032    0.050   0.526
  Amer. Indian/Other       -0.075    0.077   0.341        -0.178    0.081   0.042
  Free/reduced lunch       -0.061    0.026   0.034        -0.099    0.026   0.000
  Special education        -0.290    0.052   0.000        -0.171    0.070   0.041
  ELL                      -0.082    0.028   0.006        -0.047    0.034   0.197

Point-in-Time Analyses Predicting Performance (GPA, Credits, and Attendance)

Table 29 presents the findings from the fourth models, with student-level demographic covariates, predicting grade point average (GPA), proportion of credits earned toward graduation, and attendance. The findings indicate that the ECED intervention did not affect grade point average (GPA) at the end of the second year of the study or the proportion of credits students earned at the end of Year 1 or the end of Year 2. Similarly, there were no effects on student attendance in either year.
Each of the student baseline covariates was significantly related to GPA and credits earned. Additionally, the number of semesters enrolled in a study school, added as a control in Model 5, was a significant predictor of GPA, credits earned, and attendance in both years. In Model 6, only one significant cross-level interaction was found: students who were enrolled the most semesters in a treatment school had the highest attendance rate (see Figure 5). Again, because the number of semesters students were enrolled in a study school is not exogenous to treatment, this moderation effect is non-experimental.

Table 29. Predicting GPA, Credits Earned, and Attendance

GPA
                          Year 1                  Year 2
                          Est.    SE     p        Est.    SE     p
Treatment (0=control)      0.15   0.13   0.28      0.11   0.10   0.31
Covariates
  District 2               0.12   0.20   0.56      0.20   0.16   0.24
  District 3              -0.02   0.20   0.92      0.04   0.17   0.82
  District 4               0.17   0.23   0.47      0.24   0.16   0.16
  District 5               0.15   0.20   0.47      0.24   0.16   0.16
  Gender (0=male)          0.28   0.02   0.00      0.32   0.02   0.00
  Hispanic                -0.14   0.05   0.01     -0.20   0.04   0.00
  Black                   -0.24   0.04   0.00     -0.27   0.04   0.00
  Asian/Pacific Isl.       0.34   0.06   0.00      0.37   0.05   0.00
  Amer. Indian/Other      -0.40   0.09   0.00     -0.41   0.08   0.00
  Free/reduced lunch      -0.04   0.03   0.18     -0.16   0.03   0.00
  Special education       -0.23   0.07   0.01     -0.26   0.05   0.00
  ELL                     -0.02   0.03   0.51     -0.09   0.03   0.00

Credits Earned
                          Year 1                  Year 2
                          Est.    SE     p        Est.    SE     p
Treatment (0=control)      0.13   0.08   0.16      0.07   0.07   0.35
Covariates
  District 2              -0.86   0.13   0.00      0.16   0.11   0.15
  District 3               0.26   0.13   0.07      0.31   0.11   0.01
  District 4              -0.29   0.13   0.05     -0.28   0.11   0.03
  District 5               0.12   0.13   0.39      0.20   0.11   0.09
  Gender (0=male)          0.07   0.01   0.00      0.06   0.01   0.00
  Hispanic                -0.06   0.02   0.00     -0.01   0.02   0.63
  Black                   -0.08   0.02   0.00     -0.05   0.02   0.01
  Asian/Pacific Isl.       0.05   0.02   0.05      0.14   0.03   0.00
  Amer. Indian/Other      -0.09   0.04   0.01     -0.08   0.04   0.05
  Free/reduced lunch       0.04   0.01   0.00     -0.10   0.02   0.00
  Special education       -0.09   0.02   0.00     -0.04   0.02   0.08
  ELL                      0.00   0.01   0.80      0.03   0.02   0.03

Attendance
                          Year 1                  Year 2
                          Est.    SE     p        Est.    SE     p
Treatment (0=control)      0.23   0.14   0.12      0.14   0.15   0.38
Covariates
  District 2               0.01   0.22   0.97     -0.02   0.21   0.92
  District 3               0.01   0.23   0.97      0.06   0.21   0.77
  District 4               0.02   0.29   0.95      0.03   0.26   0.92
  District 5              -0.10   0.22   0.65     -0.02   0.21   0.93
  Gender (0=male)          0.04   0.02   0.14     -0.03   0.03   0.28
  Hispanic                 0.06   0.05   0.24      0.11   0.05   0.05
  Black                    0.09   0.05   0.05      0.06   0.05   0.19
  Asian/Pacific Isl.       0.40   0.07   0.00      0.36   0.07   0.00
  Amer. Indian/Other      -0.07   0.08   0.40     -0.35   0.09   0.00
  Free/reduced lunch       0.08   0.03   0.02     -0.07   0.06   0.23
  Special education       -0.05   0.06   0.37     -0.01   0.08   0.95
  ELL                      0.09   0.03   0.01      0.00   0.04   0.96

[Line graph of Year 2 attendance for the control and intervention groups by number of semesters in the study.] Figure 5. Interaction of ECED intervention and number of semesters enrolled in a study school on Year 2 attendance.

Growth Curves Predicting Students' Attitudes Toward School

Growth curve analyses were subsequently conducted on student survey outcomes in order to better understand the pattern of change over time in students' attitudes toward school.

Unconditional models. Before testing the effects of the ECED intervention on growth in student attitudes, three unconditional models without any covariates were compared for each outcome to determine the appropriate functional form of the growth curve. For each outcome, the three models compared were: an intercept-only model, an intercept-slope model, and an intercept-slope-quadratic model. Once the appropriate form of the growth curve was determined, that model was then used to test program impacts over time.
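Schematically, and in our own notation rather than anything printed in the report's tables, the three candidate Level 1 (within-student) specifications are:

\[ Y_{tis} = \pi_{0is} + e_{tis} \qquad \text{(intercept only)} \]
\[ Y_{tis} = \pi_{0is} + \pi_{1is}\,t + e_{tis} \qquad \text{(intercept-slope)} \]
\[ Y_{tis} = \pi_{0is} + \pi_{1is}\,t + \pi_{2is}\,t^{2} + e_{tis} \qquad \text{(intercept-slope-quadratic)} \]

where \(t = 0, 1, 2, 3\) indexes the four waves (centered at Wave 1) for student \(i\) in school \(s\), and the \(\pi\) coefficients are themselves modeled at Level 2 (students) and Level 3 (schools).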
With respect to the student survey outcomes, the intercept-slope model was a significantly better fit than the intercept-only model in all cases, indicating that these outcomes showed meaningful growth across the four time points. Further, the intercept-slope-quadratic model was a significantly better fit than the intercept-slope model, and the quadratic parameter itself was significant, indicating curvilinear rather than linear change across time. Given the results from the unconditional models, intervention effects on the student survey outcomes were estimated for the intercept, slope, and quadratic parameters. The intercept was centered at Wave 1, and the effect of the intervention on the intercept represents intervention-control differences at baseline. The estimate on the slope parameter represents intervention-control differences in linear change in the outcome across the four waves. The estimate on the quadratic parameter represents intervention-control differences in the rate of acceleration or deceleration of the trajectory over the waves of data. Intervention by covariate interactions were estimated for the quadratic parameter. Results are reported in Table 30 and summarized below. Figures depicting significant impacts on student growth show smooth curves that reflect the underlying mathematical function rather than the observed data.

Impacts on growth in students' attitudes toward school. A non-significant treatment effect on the intercept parameter indicates that there was no intervention-control difference at baseline for students' attitudes toward school, as expected due to random assignment (see the first section of Table 30, labeled "Intercept"). Looking at the second section of Table 30 (labeled "Slope"), there was a trend-level positive treatment effect on the slope parameter. Looking at the third section of Table 30 (labeled "Quadratic"), there was a significant negative treatment effect on the quadratic parameter. As shown in Figure 6, the ECED treatment group declined slightly less in the first year of the study and continued to decline in the second year, while the control group leveled out in the second year.

As shown in Table 30, there were effects of some baseline demographic covariates on the intercept such that girls reported more positive attitudes toward school at baseline, as did Asian/Pacific Islander and Black (relative to White) students. American Indian and Hispanic (relative to White) students reported less positive attitudes toward school. In addition, number of semesters in the study, added in Model 5, was significantly associated with the intercept and the slope. Students who were enrolled for more semesters in a study school reported more positive attitudes toward school at baseline compared to students who were enrolled fewer semesters in a study school. While a significant negative slope indicated that this difference got smaller over time, a positive trend-level association between number of semesters in study and the quadratic term indicated that the rate of decline in attitudes toward school was slower for students spending more time in study schools.

Model 6 indicated a significant cross-level interaction between number of semesters enrolled in a study school and treatment (β = -.002, SE = .001, p < .001), suggesting that while all students experienced an overall decline in attitudes toward school over two years, students who were enrolled the most semesters in control schools experienced a positive shift in attitudes toward school in the second year (see Figure 7).

Table 30. Predicting Growth in Student Attitudes Toward School

                            Est.    SE     p
Intercept
  Treatment (0=control)      0.01   0.01   0.35
  Child covariates
    Gender (0=male)          0.06   0.01   0.00
    Hispanic                -0.04   0.01   0.01
    Black                    0.03   0.01   0.02
    Asian/Pacific Isl.       0.02   0.02   0.20
    Amer. Indian/Other      -0.06   0.03   0.04
    Free/reduced lunch      -0.01   0.01   0.21
    Special education       -0.03   0.02   0.12
    ELL                      0.02   0.01   0.10
Slope
  Treatment (0=control)      0.01   0.01   0.08
  Child covariates
    Gender (0=male)          0.02   0.01   0.01
    Hispanic                -0.01   0.01   0.67
    Black                   -0.01   0.02   0.63
    Asian/Pacific Isl.       0.02   0.02   0.35
    Amer. Indian/Other      -0.06   0.03   0.08
    Free/reduced lunch      -0.01   0.01   0.57
    Special education        0.03   0.02   0.21
    ELL                      0.00   0.01   0.98
Quadratic
  Treatment (0=control)     -0.01   0.00   0.01
  Child covariates
    Gender (0=male)         -0.01   0.00   0.04
    Hispanic                 0.00   0.00   0.30
    Black                    0.00   0.01   0.37
    Asian/Pacific Isl.       0.00   0.01   0.74
    Amer. Indian/Other       0.02   0.01   0.06
    Free/reduced lunch       0.00   0.00   0.56
    Special education       -0.01   0.01   0.45
    ELL                      0.00   0.00   0.76
[Line graph of students' attitudes toward school for the control and intervention groups across Time (Fall of Year 1 to Spring of Year 2).] Figure 6. Impact of intervention on students' Attitudes Toward School.

Figure 7. Cross-level interaction between number of semesters in the study and treatment status on change in students' Attitudes Toward School.

Associations Between ECED Implementation and Student Outcomes Across All Study Schools

Student attitudes toward school. Findings from the student models suggested that variation in implementation did not consistently predict students' attitudes toward school as reported in the student questionnaires. Table 31 presents the findings predicting students' attitudes toward school—the second order factor—at the end of Year 2, controlling for district, baseline, and student demographics (Model 4). The variation in implementation estimate was non-significant. Follow-up models for the individual student scales (i.e., positive teacher support/expectations, lack of teacher support, self-report of engagement in school, perceived competence, relative autonomy index) indicated no associations between variation in implementation and the outcomes. In each of these models, the addition of the indicator variable for number of semesters students were enrolled in a study school did not alter the findings, nor was the estimate of this indicator significant in any of the models (Model 5). However, number of semesters enrolled did moderate the association between variation in implementation and students' attitudes toward school, teacher support, lack of teacher support, and perceived competence (Model 6). A negative effect of being in schools with higher ECED implementation was found for students who were enrolled more terms in treatment schools. Students who spent the most time in schools with the least implementation had the most positive reports of their schools. In addition, baseline levels of the outcome moderated the relationship between variation in implementation and students' attitudes toward school: the slope of the relationship between baseline and students' attitudes toward school was steeper for students in low implementation schools, indicating a stronger positive relationship.

Table 31. Association Between Variation in Implementation and Students' Attitudes Toward School (Second Order Factor) at the End of Year 2

                          Estimate    SE      p
Overall implementation      0.00     0.00    0.500
District 2                 -0.05     0.03    0.06
District 3                 -0.02     0.03    0.45
District 4                 -0.04     0.03    0.11
District 5                 -0.05     0.03    0.10
Baseline                    0.57     0.01    0.00
Gender (0 = male)           0.05     0.01    0.00
Hispanic                   -0.01     0.02    0.81
Black                       0.08     0.02    0.00
Amer. Indian/Other          0.01     0.01    0.46
Asian/Pacific Isl.          0.03     0.01    0.02
Free/reduced lunch         -0.01     0.01    0.51
Special education           0.03     0.02    0.07
ELL                         0.02     0.01    0.10
Student achievement. The inclusion of the variation in implementation variable produced similar results to the models testing treatment/control differences in predicting students' math and ELA test scores at the end of Year 2. Students in schools that implemented ECED to a greater extent had marginally higher math achievement at the end of Year 2, controlling for pre-intervention math scores and baseline demographic covariates (see Table 32). However, variation in implementation was not significantly related to students' ELA scores at the end of the study. The addition of the math test type did not change the math finding, nor was this a significant predictor of achievement. The addition of the indicator variable for number of semesters students spent in the study did not alter the findings, nor was the estimate of this indicator significant in either model.

Table 32. Associations Between Variation in Implementation and Year 2 Math and ELA Achievement

                          Math achievement               ELA achievement
                          Estimate    SE      p          Estimate    SE      p
Overall implementation      0.00     0.00    0.07          0.00     0.00    0.42
Covariates
  District 2               -0.08     0.09    0.42         -0.02     0.09    0.80
  District 3                0.17     0.09    0.09         -0.14     0.09    0.12
  District 4                0.36     0.10    0.00          0.00     0.09    0.98
  District 5                0.34     0.10    0.01         -0.15     0.11    0.20
  Baseline                  0.40     0.01    0.00          0.61     0.02    0.00
  Gender (0 = male)        -0.07     0.02    0.00         -0.01     0.02    0.63
  Hispanic                 -0.13     0.05    0.02         -0.17     0.05    0.01
  Black                    -0.24     0.05    0.00         -0.15     0.05    0.01
  Asian/Pacific Isl.        0.05     0.07    0.46         -0.03     0.05    0.52
  Amer. Indian/Other       -0.11     0.08    0.20         -0.18     0.08    0.04
  Free/reduced lunch        0.00     0.03    0.88         -0.10     0.03    0.00
  Special education        -0.22     0.06    0.00         -0.17     0.07    0.04
  ELL                      -0.03     0.03    0.35         -0.05     0.03    0.20

Note: The estimates and SEs of 0.00 are due to rounding.

Student performance and commitment. Variation in implementation was not significantly related to students' GPA, credits earned, or attendance at the end of the study (see Table 33 for the Model 4 results). The addition of the math test type did not change the findings, nor was this a significant predictor of performance or attendance. The addition of the indicator variable for number of semesters the student was enrolled in a study school did not alter the findings, but this indicator was significant and positive for all three outcomes. Further, the number of semesters enrolled moderated the association between variation in implementation and attendance (β = .001, SE = .00, p = .003), such that the slope of the relationship between semesters and attendance was steeper for students in high implementation schools. Students who were enrolled the most terms in schools with the highest implementation had the highest attendance.

Table 33. Associations Between Variation in Implementation and Year 2 Performance

                          GPA                     Credits Earned          Attendance
                          Est.    SE     p        Est.    SE     p        Est.    SE     p
Overall implementation     0.00   0.00   0.11      0.00   0.00   0.11      0.00   0.00   0.31
Covariates
  District 2               0.18   0.16   0.28      0.15   0.10   0.18     -0.04   0.21   0.83
  District 3               0.01   0.16   0.93      0.30   0.10   0.01      0.04   0.21   0.85
  District 4               0.20   0.15   0.21     -0.30   0.11   0.02     -0.01   0.25   0.98
  District 5               0.25   0.16   0.14      0.20   0.10   0.08     -0.01   0.21   0.96
  Gender (0=male)          0.32   0.02   0.00      0.06   0.01   0.00     -0.03   0.03   0.28
  Hispanic                -0.20   0.04   0.00     -0.01   0.02   0.63      0.11   0.05   0.05
  Black                   -0.27   0.04   0.00     -0.05   0.02   0.01      0.06   0.05   0.19
  Asian/Pacific Isl.       0.37   0.05   0.00      0.14   0.03   0.00      0.36   0.07   0.00
  Amer. Indian/Other      -0.41   0.08   0.00     -0.08   0.04   0.05     -0.35   0.09   0.00
  Free/reduced lunch      -0.16   0.03   0.00     -0.10   0.02   0.00     -0.07   0.06   0.23
  Special education       -0.26   0.05   0.00     -0.04   0.02   0.08      0.00   0.08   0.96
  ELL                     -0.09   0.03   0.00      0.03   0.02   0.03      0.00   0.04   0.96
Note: The estimates and SEs of 0.00 are due to rounding.

Associations Between ECED Implementation and Student Outcomes in Intervention Study Schools

Student survey outcomes. Variation in implementation did not consistently predict students' attitudes toward school within ECED intervention schools. Table 34 presents the findings predicting students' attitudes toward school—the second order factor—at the end of Year 2, controlling for district, baseline, and student demographics (Model 4). The variation in implementation estimate was non-significant. Follow-up models were estimated for associations between variation in implementation within intervention schools and each of the individual student scales. Only lack of teacher support showed a positive association with implementation (β = .005, SE = .002, p = .058), indicating that greater implementation was associated with more teacher support at the end of Year 2. The addition of the indicator for number of semesters in a study school did not change the findings, nor was the estimate significant.

Table 34. Associations Between Variation in Implementation in Intervention Schools and Students' Attitudes Toward School (Second Order Factor) at the End of Year 2

                          Est.     SE      p
Overall implementation     0.00    0.00    0.26
Covariates
  District 2              -0.11    0.05    0.07
  District 3              -0.07    0.05    0.20
  District 4              -0.10    0.05    0.10
  District 5              -0.03    0.04    0.52
  Baseline                 0.55    0.01    0.00
  Gender (0=male)          0.05    0.01    0.00
  Hispanic                -0.01    0.02    0.66
  Black                    0.03    0.02    0.10
  Asian/Pacific Isl.       0.01    0.04    0.83
  Amer. Indian/Other       0.09    0.02    0.00
  Free/reduced lunch      -0.01    0.01    0.35
  Special education        0.03    0.02    0.19
  ELL                      0.04    0.02    0.02

Student achievement. Within the intervention group, variation in implementation was not significantly related to math or ELA achievement at the end of the study (see Table 35). The addition of the math test type and the number of semesters in the study did not make a difference, nor were the estimates of these indicators significant.

Table 35. Associations Between Variation in Implementation Within Intervention Schools and Year 2 Math and ELA Achievement

                          Math achievement               ELA achievement
                          Est.     SE      p             Est.     SE      p
Overall implementation     0.00    0.00    0.63          -0.01    0.00    0.11
Covariates
  District 2              -0.27    0.15    0.15           0.07    0.06    0.30
  District 3               0.16    0.16    0.39           0.17    0.06    0.05
  District 4               0.38    0.17    0.09          -0.08    0.11    0.51
  District 5               0.33    0.14    0.08          -0.01    0.00    0.11
  Baseline                 0.41    0.02    0.00           0.62    0.04    0.00
  Gender (0=male)         -0.06    0.04    0.10          -0.03    0.03    0.34
  Hispanic                -0.15    0.06    0.01          -0.18    0.06    0.01
  Black                   -0.26    0.06    0.00          -0.15    0.05    0.01
  Asian/Pacific Isl.       0.05    0.09    0.57          -0.03    0.06    0.60
  Amer. Indian/Other      -0.17    0.09    0.06          -0.21    0.14    0.14
  Free/reduced lunch      -0.01    0.04    0.87          -0.12    0.03    0.00
  Special education       -0.13    0.06    0.04          -0.09    0.13    0.48
  ELL                      0.00    0.04    0.94          -0.04    0.04    0.34

Student performance.
Variation in implementation within intervention schools was significantly associated with Year 2 GPA and credits earned, such that students in schools with greater implementation had higher GPAs and earned more credits toward graduation than students in schools with lower implementation, controlling for district and student demographics (see Table 36). Variation in implementation was not significantly related to student attendance at the end of Year 2.

Table 36. Associations Between Variation in Implementation Among Intervention Schools and Year 2 Performance

                          GPA                     Credits Earned          Attendance
                          Est.    SE     p        Est.    SE     p        Est.    SE     p
Overall implementation     0.02   0.01   0.04      0.02   0.00   0.00      0.01   0.01   0.32
Covariates
  District 2              -0.01   0.19   0.96     -0.11   0.08   0.24     -0.04   0.24   0.88
  District 3              -0.13   0.20   0.56      0.15   0.08   0.14      0.23   0.25   0.41
  District 4              -0.19   0.20   0.41     -0.52   0.08   0.00     -0.25   0.37   0.54
  District 5               0.39   0.17   0.07      0.31   0.07   0.01      0.18   0.20   0.41
  Gender (0=male)          0.34   0.03   0.00      0.09   0.02   0.00     -0.07   0.04   0.13
  Hispanic                -0.27   0.05   0.00      0.02   0.03   0.56      0.06   0.06   0.35
  Black                   -0.31   0.05   0.00     -0.05   0.03   0.05     -0.02   0.06   0.71
  Asian/Pacific Isl.       0.36   0.07   0.00      0.18   0.04   0.00      0.35   0.09   0.00
  Amer. Indian/Other      -0.50   0.11   0.00     -0.04   0.05   0.40     -0.51   0.13   0.00
  Free/reduced lunch      -0.19   0.04   0.00     -0.13   0.02   0.00     -0.09   0.07   0.18
  Special education       -0.22   0.07   0.00     -0.01   0.03   0.71      0.04   0.10   0.71
  ELL                     -0.12   0.04   0.01     -0.01   0.02   0.81     -0.03   0.06   0.63

VII. Implementation and Data Collection Challenges

We encountered numerous problems and setbacks with this grant. Indeed, as described in Appendix 1, the grant was awarded to do an effectiveness trial of First Things First, but the funding to do the FTF intervention did not materialize, and the four districts that had agreed to participate all withdrew because of changes at their top administrative levels. Thus, we could not evaluate FTF with an effectiveness trial, as we had originally proposed. Instead, after discussions with NCER, we reformulated the project to focus on an efficacy trial of ECED, which is the instructional improvement component of FTF, done without FTF's structural changes. In carrying out ECED, we also encountered challenges, both in implementing the ECED supports and in collecting the needed evaluation data. This section outlines the types of difficulties experienced and concludes by discussing their implications for the impact evaluation of ECED.

Challenges in Recruiting Schools to Participate

Recruiting schools to take part in the ECED Efficacy Trial proved more difficult than anticipated. In the years just prior to recruiting for the ECED Efficacy Trial, IRRE experienced rapid growth and found that there were more schools and districts wanting its support than it could serve. At that time, No Child Left Behind (NCLB) was mandating large-scale or whole-school reforms for all of the nation's low-performing schools, making districts eager to participate in this type of reform effort. Thus, we anticipated that many schools would be interested in participating in this project. Unfortunately, by the time recruitment for ECED began in 2008-09, most low-performing schools had already started one or more reform efforts. Many had instituted 9th-grade academies, some type of classroom walkthroughs by administrators and instructional leaders, and/or special additional classes for struggling students such as Read 180. Many had moved to a block schedule and were creating communities of practice or instituting mentoring systems for teachers.
Those types of reforms were all compatible with ECED, but they were generally just taking hold in the districts, making them hesitant to embark on another reform that might distract from their recent efforts. Additionally, because NCLB was mandating reforms in many schools, district administrators were being bombarded by individuals wanting to sell them reform packages, professional development models, and new curricula for struggling students. The ECED supports were being offered free of charge, but it was often difficult to get the attention of the district leaders because they were wary of individuals trying to sell them supports.

Most districts that expressed initial interest remained interested as they learned the details. However, the addition of a full required course for all 9th- and 10th-graders (i.e., ECED Literacy) did prevent some districts and schools within districts from electing to participate. Student schedules—especially the schedules of the college-bound students—are often quite full, and in many cases there was not space for an additional requirement.

We accepted all schools/districts for participation that were interested following the site visit. If we had had more and better-prepared schools/districts to choose from, we would most likely have rejected some because of the magnitude of the challenges we knew the schools would face in implementing the reform and fulfilling the research requirements. For instance, as noted earlier, one district does not regularly administer ELA tests to 10th-grade students. Given that this was a main outcome of this study, we might have excluded that district if we had had more interested districts to choose from that did administer this assessment. That same district included two schools that did not exist at the time of recruitment, making it impossible to meet with their leadership and ensure buy-in, but only two other schools within that district agreed to participate, and we needed four.

Another district had agreed to participate in Recruitment Group 1, but then several members of the district leadership team, including the superintendent, left their positions before the intervention began. In order to keep them in the study, we made an agreement with the interim leadership for them to begin participating in the project one year later, as part of Recruitment Group 2. Additionally, it was clear that that district did not have a good data system, so it would be very difficult to get the school/district records data we needed. Still another district that we included was in turmoil when we were negotiating to begin the project there, and indeed the superintendent left the district early in the first term of ECED implementation. The new district administration had not been part of the recruitment phase and demonstrated little support for the project.

In the end, we included all schools/districts that were interested in participating if they signed agreements to fulfill their implementation and research commitments. This led to some major implementation challenges (outlined below), but it does provide us with a very rigorous test of the model because we did not limit the study to schools that we were highly confident could implement with fidelity.

Challenges in Implementing ECED

Once schools and districts agreed to participate, IRRE encountered problems in fully implementing the model. High turnover in district and school leadership was one of the largest problems that limited implementation.
As noted in Chapter III, four of the five superintendents were replaced between the time we recruited the district to participate and the end of the second year of implementation. Two of these four were forced out due to political and legal difficulties in the district. The fifth superintendent—the only one who stayed in place throughout the project—announced his resignation as the second year was coming to a close, after experiencing serious health problems during the second year. 66 Further, in all five districts, the individual who had been most instrumental in coordinating and engaging with ECED at the beginning of the project—often the director of high schools or assistant superintendent of curriculum—was either reassigned to another position within the district or left the district entirely during the course of the project.

At the principal level, 11 of the 20 participating schools (six treatment, five control) experienced one or more changes in principals during the course of the study. 67 To be more specific, six principals (two treatment and four control) were brand new as their school began participation, meaning they had not participated in the recruitment efforts and were unfamiliar with the project when it started. At one of the treatment schools there were three different principals during Year 1, and a fourth principal was assigned as the second year began. That school stopped participating in ECED after Year 1. Five schools (four treatment, one control) had a change in principal between the end of the first year and the start of the second year. One treatment school changed principals during the second year of the study. Additionally, four schools learned that there would be major changes to their schools as the second year came to a close, causing significant disruption in two of the schools. One of the control schools learned late in the second year that the principal and all other administrators were being replaced due to low test scores. At another control school that had fairly stable leadership during the study, the staff learned right at the end of the second year that their school would be closed entirely when the school year ended. Two treatment schools learned that their principals were leaving as the second year ended, but those changes did not appear to cause major disruption.

Further, in three districts (12 schools), district finances and union contracts led to all teachers being "laid off" or "furloughed" in the spring of each year. Most teachers were re-hired prior to the start of the next school year with no actual disruption in their employment, but the threat of not having a job or of being reassigned made the day-to-day working conditions of all teachers very stressful for a significant portion of each year.

Not surprisingly, this high level of leadership turnover and upheaval severely limited some districts' abilities to focus on instructional improvement or to be open to changing long-standing practices that required considerable effort. This instability caused some school-level personnel to question whether ECED remained a high priority within the administration. As explained elsewhere in this report, two treatment schools left the project prior to its completion. One left after the first semester of participation. In that district, the superintendent had been the individual most engaged with ECED and had encouraged the schools to take part.
He was dismissed by the school board after only 14 months of leadership, in the midst of a highly contentious battle with the teachers' union. Although the principal at the school that withdrew had expressed enthusiasm for ECED when the school agreed to participate, his support quickly faded when the superintendent was dismissed. Due to a hiring freeze, the school did not have a literacy coach until about one month into the school year, meaning that she had not participated in the summer training activities. Additionally, the math coaching duties were given to the math chair. He was given some reduction in his teaching load in order to take on this additional role, but he never fully supported ECED Math and was openly hostile toward the IRRE consultants working with the school. In sum, this school never embraced ECED and stopped participating after a single semester of weak implementation. Nonetheless, because it was randomly selected into the treatment condition, it is included in the intent-to-treat analyses, and we were able to gather some Wave 4 data there.

The second school that stopped participation had three different principals during the year it participated. Additionally, an assistant principal who had been one of ECED's biggest supporters died during the first year. A fourth principal was assigned to that school at the start of the second year. He was entirely unfamiliar with ECED, and the teachers were against on-going participation because they found it quite demanding. Thus, as with the first school that stopped participation, this school had implemented weakly in the first year and stopped participation altogether in the second year. It is, however, included in the intent-to-treat analyses.

As noted earlier in the discussion on recruitment, the addition of an entire course to all 9th- and 10th-graders' schedules proved challenging. This was especially true for the two districts in California. The course requirements for entrance into California's public universities dictate the schedules of college-bound students, leaving little-to-no room for an additional course. One district 68 resolved this by offering only one-half of the ECED Literacy curriculum and one-half of the regular 9th- and 10th-grade English curriculum. 69 This decision was made in conjunction with IRRE, after the MOU was signed. In the other district, one school 70 simply did not enroll students whose schedules were already full. It made this decision without consulting IRRE. The other treatment-group school in that district did enroll all 9th- and 10th-graders in the first year, but in the second year enrolled only those who had not passed the state's standardized test in ELA the previous year. Additionally, the 9th- and 10th-grade English and ECED Literacy classes became a single, year-long block course in which teachers were supposed to cover the English curriculum on certain days and the ECED Literacy course on other days. This should have been enough time to cover the full curricula for both courses; however, it was clear from informal conversations with the staff that the regular English curriculum was given priority over the ECED Literacy material. These decisions were made in consultation with IRRE and the research team, but the schools indicated that they would cease participating in the project if these accommodations were not made.
As noted earlier, the implementation of ECED Math was fairly successful at the eight schools that participated for two years, and almost all used the mastery grading system. However, the mastery grading system was challenging for most schools, both because it was difficult to explain to students and parents and because teachers, students, and parents believed that it resulted in a higher rate of failure. Data received from the schools do seem to bear this out. 71 Excluding the two schools that stopped participating, students in treatment schools received more Fs and Is 72 in Algebra 1 and Geometry at the end of every term than students in control schools. The largest difference was in Wave 4 Algebra 1, when 48% of students in treatment schools did not pass (i.e., received an F or I) while only 26% of students in control schools did not pass (t = -16.19, p < .001).

A final implementation challenge had to do with teacher turnover and the reassigning of teachers. Ideally, the same individuals would have taught the target classes throughout the length of the project, meaning that they would have participated in the initial summer institute where the ECED philosophy and strategies are explained in depth and would have received IRRE's support for a full two years. Of course, we knew in advance there would be teacher turnover as a result of teachers leaving the schools, but the rate of teachers leaving was much higher than we had anticipated (see Table 1, p. 53). Further, we had not anticipated that schools would change teachers' assignments as often as they did, including during a school year, resulting in a low number of teachers teaching a target class in all four waves. For instance, one school that was on a block schedule assigned half of its students to take English in the fall and ECED Literacy in the spring and the other half to take ECED Literacy in the fall and English in the spring. The teachers followed this same pattern. So, when the spring term started, none of the teachers had ever taught ECED before and none had attended the summer professional development sessions designed to introduce them to the ECED curriculum. As noted in the Method Chapter, only about one-third of the study teachers taught a target course during all four waves. Because the project was intended to work with the same group of teachers for two full years, teachers leaving the schools and teachers changing assignments led to weakened implementation and extensive missing data.

Data Collection Challenges

IRRE has a proprietary, on-line system for collecting questionnaire responses. As described in Appendix 7, schools were given 'tickets' for each 9th- and 10th-grade student to use in taking the on-line survey. The tickets included an access code that IRRE could use to link the students' responses to their study identification numbers. Schools decided which teachers and classes to use to administer the surveys and had to schedule time for each teacher who was administering surveys to spend time in a computer lab. The research project director worked closely with each school to facilitate this process, but each school was required to have an individual who was tasked with making sure that the surveys ran smoothly and that each student had an opportunity to participate. The success of this system for administering surveys depended largely on the commitment of the individual given this responsibility.
Some worked hard to ensure that each student was given an opportunity to participate; others simply placed the tickets in the teachers' boxes and provided no follow-up to ensure that the surveys were completed. Thus, as seen in the Method Chapter, survey response rates were typically between about two-thirds and three-quarters of students who were enrolled and eligible.

Much of the key data for this project—test scores, student course schedules, student demographics—was housed in district databases. As a condition of participation, each district promised to provide the needed data to the evaluation research team. In the end, each district did provide most of the needed data, but receiving it took much longer than anticipated. In the most extreme case, one district that participated in the second recruitment group (2010-11 and 2011-12) did not provide the needed records data for either year until October 2012. This was roughly 15 months after the project was scheduled to receive the 2010-11 data and three months after it was scheduled to receive the 2011-12 data.

Often the delays were caused by staffing shortages. One district did not have a research staff at all; the individual who maintained its student records had very limited knowledge of how to extract information from the database. Several districts experienced significant staffing shortages and turnover within the research department during the project. And several districts adopted new student records data systems during the project, meaning that they were unfamiliar with the new systems as they were trying to extract information.

In addition to delays, the records data received from the districts often contained inconsistencies and missing information. For example, one district operated only high schools, and all of its students came from "feeder" primary districts. That district did not routinely obtain 7th- or 8th-grade test scores and had to request them as part of this project. A large proportion of the target students were missing from those files. Additionally, districts often sent separate files containing course-period and course-grade information. Those two files would not always agree about which courses a student had taken and would occasionally indicate that a student had taken a course from a teacher who was not listed as teaching that course according to other information provided by the district. This type of discrepancy required extensive hand cleaning on the part of the research project director. In the end, all districts provided data, and only a few data points were missing entirely within a district. However, no district could provide complete data on all students in the study, and most had large gaps. School records data about students who left the district during the course of the year, which are needed for intent-to-treat analyses, proved especially difficult to obtain and often had to be imputed. This was true even for data regarding the portion of the year that they were present and for variables that would have been meaningful (e.g., course enrollments, attendance, free/reduced lunch).

Implications of the Challenges for the Impact Evaluation

Collectively, these challenges represent the real world of low-performing high schools serving ethnically diverse and low-income students. The schools that participated in the study were not optimally prepared either to implement ECED or to participate in the impact evaluation. There was extensive turnover in district and school leadership and in teachers.
For all these reasons, ECED was not fully implemented in most districts and schools. Further, a large amount of data needed for the impact evaluation was missing, necessitating use of a complex multiple-imputation strategy. As described in the data analysis section, through various procedures we have taken steps to ensure the internal validity of the impact evaluation. While impact estimates may be biased downward (i.e., in the direction of finding no impacts when they might exist), they do not appear to be biased by accidentally favoring the treatment or control group of schools.

VIII. Discussion

The current study examined the efficacy of Every Classroom, Every Day (ECED), an instructional improvement approach designed by the Institute for Research and Reform in Education (IRRE). Although ECED was designed to be part of a more comprehensive whole-school intervention called First Things First (FTF), in this study ECED was evaluated as a stand-alone intervention. This study is one of very few randomized field trials in the area of educational reform that have involved multiple school districts and randomized at the level of the high school. The experiment was longitudinal and involved four waves of data collection, at the start and end of two consecutive school years. The analyses used a multilevel design accounting for students and teachers nested within schools.

Central to the intervention were the concepts of engagement, alignment, and rigor as markers of high-quality instruction. Teachers received professional development and on-going supports to teach in ways that are more Engaging for students, more Aligned with local, state, and federal standards, and more Rigorous for all students (EAR). Trained independent raters, blind to the intervention status of the schools, observed classrooms as an important source of research data, and trained school leaders and consultants observed classrooms as a means of targeting professional development. Teachers and students completed questionnaires, and school districts provided information about students, including demographic characteristics, scores on standardized achievement tests in math and English/language arts (ELA), progress toward graduation, grade point average, and attendance.

Our primary hypothesis was that ECED's instructional improvement interventions would improve math and ELA achievement as measured by standardized test scores. Secondary hypotheses were that the intervention would improve other student performance outcomes, such as attendance, grade point average, and progress toward graduation, and would enhance teacher and student attitudes. Finally, we hypothesized that both the fidelity of implementation and the number of semesters students were in intervention schools would predict better school outcomes in non-experimental analyses.

Summary of Most Important Findings

Findings from this evaluation provide some evidence that ECED was efficacious in improving student achievement in math. Students in the treatment schools scored significantly (p = .04) higher on standardized tests of math than did students in the control-group schools, after controlling for pre-intervention math achievement and school district, although that result became only marginal (p = .06) when student demographic controls were added to the models (see p. 138).
In addition, there was some evidence that fuller implementation of ECED was linked to better student outcomes, such that students in schools with higher implementation scores had marginally higher math achievement (see p. 149), significantly higher grade point averages (see p. 152), and significantly more credits toward graduation (see p. 152). As well, the number of terms students were enrolled in ECED schools moderated the association between implementation and attendance, such that students with more semesters in schools with higher ECED implementation had higher attendance (see p. 149). In contrast to these impacts, the ECED intervention did not improve achievement in English language arts (see p. 138), nor was the degree of ECED implementation related to ELA achievement (see pp. 149 & 152). Preliminary analyses of the classroom observations for math indicated that, across the two years, participation in ECED increased the rigor of classroom instruction relative to control-group schools (see pp. 123 & 125).

Although there was indication that the ECED intervention led to some positive outcomes, including math achievement and rigor in math classrooms, there was also evidence that these improvements came at some cost to students and teachers. Among students in ECED treatment schools, those who were enrolled more terms reported worse attitudes toward school, whereas among students in control schools, those who were enrolled more terms reported better attitudes toward school (see p. 135). This finding appears to be driven primarily by a single component of student attitudes toward school, namely perceived academic competence: students who were enrolled for more terms in treatment schools reported lower perceived academic competence (see p. 136). This decline in perceived academic competence for students with longer exposure to the ECED intervention could be a function of the increased rigor observed in math classes, the overall raising of academic expectations, and the provision of more mastery-based feedback to students. Similarly, math teachers in treatment schools reported less mutual support among colleagues than did those in control schools (see p. 117). And across the two years of the study, math teachers in ECED schools who taught more terms of courses targeted by the intervention reported that their districts' leadership was less committed to change, while the opposite was true for teachers in control schools (see p. 117). Finally, based on preliminary analyses of the classroom observations, it appears that ECED had a negative impact on observed student engagement in math classes (see pp. 123 & 125). No effects of ECED were seen for ELA teacher attitudes and experiences (see p. 127).

Implications

Math achievement. STEM (Science, Technology, Engineering, and Math) instruction and achievement is an increasingly high priority for our nation's schools, as evidenced by President Obama's "Educate to Innovate" campaign (The White House, n.d.). ECED Math appears to be a promising strategy to address that priority. Through ECED Math, teachers work as teams to make math relevant and accessible to all students through 'I Can…' statements, assess students regularly to ensure they are mastering the content, and provide students multiple opportunities for relearning. Since the initiation of this ECED efficacy trial, IRRE has begun implementing similar benchmarking strategies across other courses within the high school curriculum in other sites where it is providing support.
The results from this evaluation support the idea that this type of instruction, assessment, and teacher-student interaction may be beneficial for student achievement.

English Language Arts achievement. The ability to read, write, and communicate is also an essential skill for today's high school students. Indeed, the Common Core State Standards emphasize the key role these skills play in college and career readiness and the interwoven nature of these skills (Common Core State Standards Initiative, n.d.). Unfortunately, this evaluation provided no evidence that ECED's ELA intervention had an impact on students' ELA achievement.

The cornerstone of ECED's ELA component is the Literacy Matters curriculum, which focuses on expository reading and writing, as well as skills that cut across the curriculum such as critical thinking and skills for comprehending, organizing, and remembering information. It is meant to complement the 9th- and 10th-grade English curriculum that has traditionally focused on literature. One possible explanation for the lack of ELA achievement improvements is that ECED Literacy was not implemented with sufficient fidelity. As noted elsewhere in this report, only 26% of the students were enrolled all four terms and took the prescribed amount of ECED Literacy (see p. 59). Further, turnover was high among ECED Literacy teachers, meaning that few teachers received all the supports intended to help them fully implement the curriculum (see p. 53). For example, an ELA teacher who was hired to teach ECED Literacy during the second year of the intervention would not have had the 3-day professional development workshop during the summer before the first year, nor the other first-year supports, and so would likely have implemented the literacy curriculum less efficaciously. However, these limitations also affected math: only 31% of students were enrolled all four terms and took both of the math classes in which ECED was working, and in the treatment schools math and ELA teachers were employed an equal number of terms, so there is no evidence that turnover was greater in ELA than in math. Nonetheless, ECED did impact math achievement, but not ELA achievement.

Another possible explanation is that the skills taught in Literacy Matters are not those tested by the ELA achievement tests. Those tests may be more closely aligned to the regular English curriculum in use in both treatment and control schools than to the supplemental Literacy Matters curriculum. According to IRRE, ECED Literacy was designed to align with national standards similar to those represented in the new Common Core State Standards, and its assessments were more performance-based, in contrast to the district- or state-created standards and assessments in place at the time of this study. Of course, based on this evaluation, it is also possible that ECED Literacy simply does not have the intended benefits for students' ELA achievement.

Student and teacher attitudes and self-reported experiences. As noted, ECED was designed to be the instructional improvement component of the First Things First (FTF) approach to school reform. Prior to this evaluation, no school had implemented these instructional improvement strategies without also implementing the other two major components of FTF, namely creating small learning communities and implementing the student and family advocacy system.
In math, the ECED instructional improvement strategy alone paid off in terms of improvement in test scores, and fuller implementation of this intervention predicted better student performance and persistence (e.g., GPA, credits earned), all of which are primary concerns of today's schools. But test scores and performance indicators are only one facet of high-quality education. On the negative side, these instructional improvements did not have the intended benefits for students' attitudes toward school or teachers' self-reported experiences, and they do appear to have been detrimental to some important aspects of the school experience, such as student engagement in math classes, students' perceptions of their own academic competence, and teachers' experience of mutual support. Rigor in math classes appears to have increased, and perhaps that is what led students to perceive themselves as less competent. Further, the expectable stress of implementing an intervention such as ECED can exacerbate already toxic conditions in struggling schools that lack deft leadership from principals and district staff, conditions that were apparent in the majority of the treatment schools in this study.

Viewed together, the results of the current evaluation suggest that to have larger and broader effects on school outcomes, along with more positive experiences and self-perceptions for students and teachers, additional supports for teachers and students are needed that focus on the interpersonal side of education. One approach would be to change the design and/or implementation of ECED itself. An alternative would be to implement ECED, the instructional improvement component of FTF, within the context of the other two FTF components (viz., small learning communities and the advocacy system) that were designed to improve student engagement and school climate. These components could create the necessary conditions for students and teachers to get to know each other better, for each student to have an advocate within the faculty, and for both groups to feel supported. By improving the sense of community and shared goals among teachers and students, there would seem to be a greater likelihood both that ELA outcomes would be enhanced and that the observed improvements in math test scores would be sustainable.

A third potential approach would be to focus more attention, earlier in the site selection process, on ensuring greater stability and quality in the leadership ranks of the districts and schools (both treatment and control), because effective implementation and evaluation of the intervention both rely on a threshold of leadership stability and support that was clearly absent in many of the schools in this trial. Without that threshold, there is no hope of obtaining a fair and rigorous test of the intervention's potential impacts. Indeed, other school reform models have been criticized for failing to couple their focus on quality curricula with effective district- and school-level expectations and supports for teachers and administrators. Contextual supports have the potential to create school-level "buy-in" and increase the likelihood of effective implementation that can lead to sustainable systemic change and desirable student outcomes over time (Berends et al., 2002; Desimone, 2002).
Although IRRE spent much time and effort on leadership and teacher development, as well as on providing targeted, research-based curricular materials and supports, the instability and turnover among teachers and administrators did not allow these resources to be directed at the same people consistently, even for the two years of the intervention. This third potential approach is not meant to imply that some districts or schools should be denied access to interventions; rather, it means that they may need special preliminary work to ready them for the challenges of an intervention such as ECED.

Performance and commitment. No Child Left Behind and Race to the Top have encouraged schools to place increasing attention on student achievement as measured by standardized test scores. However, there is some evidence that other aspects of school performance (including course grades, credits toward graduation, and attendance) are equally or possibly even more important outcomes for predicting success in life (Farrington et al., 2012). Each of these is a marker for important personal attributes such as effort, perseverance, and self-regulation, all qualities that are critical for success in post-secondary education and careers. The evaluation provided some evidence that the extent to which ECED was implemented as intended by IRRE was positively related to students getting better grades, receiving more course credit toward graduation, and having higher attendance. However, these findings must be interpreted with caution. We did not obtain baseline information about these variables, so it is possible that these differences existed prior to implementation of ECED. That would seem especially likely given the baseline differences in level of economic disadvantage (see p. 57). Further, there were some inconsistencies in the patterns of findings. For grade point average and credits toward graduation, the relations were significant when all 20 schools were included in the analyses, but not when only treatment schools were considered (see p. 149 vs. p. 152). Of course, there were only 10 schools in the analyses that included only treatment schools, making the power quite low. For attendance, higher scores on the ECED implementation variable significantly predicted better attendance across the 20 schools for students who were enrolled a longer time (see p. 149), but there was no main effect of implementation scores across the 20 schools (see p. 149) and no effect when the sample was limited to the 10 treatment schools (see p. 152). Thus, although there is some evidence that ECED improves school performance, which is tied to school success, we cannot draw firm conclusions.

Strengths of the Design and Analyses

As one can see from Chapters III, V, and VI, this project employed state-of-the-art design and analytic strategies. First, the school-randomized field trial design, with its particular emphasis on creating unbiased estimates of effects, allows us to draw causal conclusions about the impacts of ECED. Second, our intent-to-treat analyses faced a large amount of missing data; we addressed that problem using sophisticated multi-stage multiple imputation techniques (see p. 92), which reduce the bias introduced by missing data. Third, to properly account for the random assignment at the school level and our interest in teacher and student data, we used multilevel modeling (sketched schematically below).
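In schematic form, the core point-in-time impact model is a two-level specification in which student outcomes depend on a school-level treatment indicator. The notation below is an illustrative simplification only; the actual models, covariates, and estimation details are reported in Chapters III and VI.

\[
\text{Level 1 (student } i \text{ in school } j\text{):} \quad Y_{ij} = \beta_{0j} + \beta_{1}\,\text{Pretest}_{ij} + e_{ij}, \qquad e_{ij} \sim N(0, \sigma^{2})
\]
\[
\text{Level 2 (school):} \quad \beta_{0j} = \gamma_{00} + \gamma_{01}\,\text{Treatment}_{j} + \boldsymbol{\gamma}_{02}^{\top}\mathbf{D}_{j} + u_{0j}, \qquad u_{0j} \sim N(0, \tau_{00})
\]

Here \(\mathbf{D}_{j}\) is a vector of district indicators (reflecting the randomization of schools within districts), \(\gamma_{01}\) is the intent-to-treat effect of assignment to ECED, and the school-level random effect \(u_{0j}\) accounts for the clustering of students within randomized schools.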
Finally, longitudinal growth curve analyses were used for teacher and student questionnaire data and EAR classroom observations, where the data were directly comparable across time. The growth curves allowed us to compare students and teachers in control versus treatment schools on rates of change over time.

Limitations

As detailed in Chapter VII, the ECED Efficacy Trial experienced numerous challenges in recruiting, implementation, and data collection. One of the largest was missing data, which resulted from imperfect record keeping in the school districts, uneven administration of student surveys at participating schools, limited commitment to the project by some teachers, and mobility. Administrators, teachers, and students were all highly mobile, and that mobility led to extensive missing data as well as weak implementation in some schools.

The difficulty of combining test scores across state systems and across courses is another limitation of this study. Because schools were randomized to implement ECED or not within districts, intervention and control schools' test scores were treated identically, protecting the internal validity of the study. And the fact that this study took place in four different states, spread across the country, strengthens its external validity. However, each state had its own testing schedule and system, forcing us to combine scores across systems. We cannot be certain that each system was testing the same type of material or that instructional quality was equally linked to each set of tests. Further, within states, different students took different tests depending on their course enrollments. This is the nature of high school testing, but it poses a problem for researchers looking for a common metric. Students in more advanced courses are given more advanced tests; it is not possible to know how those students would have scored had they been given the less advanced tests. Chapter III (p. 83) provides extensive details about how this challenge was mitigated, but we acknowledge that some of the findings, especially some of the non-significant findings, may be related to less-than-perfect outcome measures.

Future Analyses

This report addresses only the main research questions posed by this study, but there are many more questions that could be explored in the future. For instance, there is extensive additional work that could be done with the classroom observations of Engagement, Alignment, and Rigor. As noted in the teacher results chapter (Chapter V), once the multiple imputation is complete for the EAR data, the impact of ECED on E, A, and R for both math and ELA will be explored using the same analytic strategy as used elsewhere in this report, including both point-in-time and growth curve analyses. Further, IRRE's theory of change posits that EAR should causally mediate the relation between ECED implementation and achievement; this hypothesis can be explored in the future. Analyses by Early and colleagues (2013) indicate that observed engagement at Wave 1 predicts math and ELA test scores at the end of Year 1. Those analyses could be expanded to account for the multiple teachers each student experienced across the two years of the study and to better understand links between observed E, A, and R and test scores, as well as attitudes, grades, credits, and attendance. Finally, it would be valuable to understand what types of conditions and teacher characteristics lead to greater E, A, and R.
Analyses could be conducted that link various professional development experiences, as reported on the teacher questionnaire, and student and teacher demographic characteristics to changes in E, A, and R.

A second line of future inquiry is whether certain students or teachers benefited more from ECED than others. The analyses presented in this report use primarily an intent-to-treat framework, with some analyses including variation in implementation. These are the appropriate analyses to answer impact questions; they tell us the net impact we would expect to see in a typical situation where schools, teachers, and students vary in commitment to the intervention and where a host of real-world circumstances limit the extent to which any intervention can be implemented with fidelity. However, this is a conservative approach that precludes us from knowing the circumstances under which the ECED supports are beneficial. The intent-to-treat approach, for instance, means that the two schools that stopped participating in the project are treated like all the others in the treatment condition, although of course they received much lower scores on variation in implementation (i.e., fidelity). Likewise, students at treatment schools who never enrolled in ECED Literacy, never took one of the math courses targeted by ECED, were enrolled very few days, or had very low attendance were treated the same as students who received the full dosage of ECED. Additionally, teachers who started at a treatment school late in the project and received little of the support, as well as teachers who implemented few of the ECED strategies, were treated the same as those who took part for four terms and implemented most components. And regular 9th- and 10th-grade English teachers at treatment schools who never taught ECED Literacy are included with the ECED Literacy teachers.

'Treatment on the treated' analyses could be conducted that focus on teachers and students who actually received the treatment. Although these would be non-experimental analyses and would not permit causal conclusions, they would allow us to test various path models with data from individuals who experienced the treatment. Likewise, the current experimental analyses could be expanded to include additional student-level interactions that more fully explore possible differential effects of ECED. For instance, students who had different attitudes toward school or who differed with regard to motivation (as measured by the relative autonomy index) at the start of the project might have different outcomes as a function of their participation.

Student and teacher mobility is a third line of inquiry that could be addressed in the future. As noted elsewhere in this report, both student and teacher mobility were high, and it is possible that the turnover affected the implementation, the outcomes, or both. Future analyses could consider the role mobility played in ECED's implementation and the relation of that role to the outcomes.

Conclusions and Recommendations

ECED Math appears to be a valuable path to improved student standardized test scores in math. In schools from five districts in four states, schools that were fraught with problems and served high percentages of students from economically disadvantaged homes, the use of the ECED Math approach resulted in improved math scores relative to those in control-group schools.
We found this effect with a relatively small sample size (10 schools per condition) and a stratified random assignment within districts, leaving only about 14 degrees of freedom (with 20 schools, estimating an intercept, a treatment indicator, and contrasts among the five districts uses six school-level parameters, leaving 14). Further, of the 10 treatment schools, two had dropped out by the end of the first year, weakening the intervention implementation. Still, there was indication that the approach did enhance achievement in math.

Nonetheless, the intervention did have some negative effects on teacher and student experiences and self-perceptions, which over time might have interfered with the stability of the positive effects. Thus, we recommend either that ECED be implemented within the framework of a larger school reform effort, such as First Things First, that focuses on teacher and student support and school climate, or that various teacher and student supports be added to the ECED intervention, so that teachers and students within ECED would feel better about themselves and their work and so that these feelings might help to buttress the positive achievement effects that are likely to result from implementation of ECED.

IX. References

Berends, M., Bodilly, S., & Kirby, S. N. (2002). Facing the challenges of whole-school reform: New American Schools after a decade. Santa Monica, CA: RAND Corporation. Retrieved from: http://www.rand.org/pubs/monograph_reports/MR1498.html

Borman, G. D., Hewes, G. M., Overman, L. T., & Brown, S. (2003). Comprehensive school reform and achievement: A meta-analysis. Review of Educational Research, 73(2), 125-230. doi: 10.3102/00346543073002125

Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models in social and behavioral research: Applications and data analysis methods (1st ed.). Newbury Park, CA: Sage Publications.

Cavalluzzo, L., Lowther, D. L., Mokher, C., & Fan, X. (2012). Effects of the Kentucky Virtual Schools' hybrid program for algebra I on grade 9 student math achievement: Final report (NCEE 2012-4020). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Retrieved from: http://ies.ed.gov/ncee/edlabs/regions/appalachia/pdf/20124020.pdf

Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6(4), 284-290. doi: 10.1037/1040-3590.6.4.284

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159. doi: 10.1037/0033-2909.112.1.155

Common Core State Standards Initiative (n.d.). Retrieved from: http://www.corestandards.org/

Corrin, W., Lindsay, J. J., Somers, M. A., Myers, N. E., Meyers, C. V., Condon, C. A., & Smith, J. K. (2012). Evaluation of the Content Literacy Continuum: Report on program impacts, program fidelity, and contrast (NCEE 2013-4001). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Retrieved from: http://files.eric.ed.gov/fulltext/ED538060.pdf

Corrin, W., Somers, M. A., Kemple, J. J., Nelson, E., & Sepanik, S. (2008). The Enhanced Reading Opportunities study: Findings from the second year of implementation (NCEE 2009-4036). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Retrieved from: http://www.mdrc.org/sites/default/files/full_554.pdf

Darling-Hammond, L., Wei, R. C., Andree, A., Richardson, N., & Orphanos, S. (2009).
Professional learning in the learning profession: A status report on teacher development in the United States and abroad. Dallas, TX: National Staff Development Council. Retrieved from: http://learningforward.org/docs/pdf/nsdcstudy2009.pdf

Desimone, L. (2002). How can comprehensive school reform models be successfully implemented? Review of Educational Research, 72(3), 433-479. doi: 10.3102/00346543072003433

Desimone, L. M. (2009). Improving impact studies of teachers' professional development: Toward better conceptualizations and measures. Educational Researcher, 38(3), 181-199. doi: 10.3102/0013189X08331140

Early, D. M., Rogge, R. D., & Deci, E. L. (2013). Engagement, alignment, and rigor as vital signs of high-quality instruction: A classroom visit protocol for instructional improvement and research. Manuscript submitted for publication.

Elmore, R. F. (2002). Bridging the gap between standards and achievement: The imperative for professional development in education. Washington, DC: Albert Shanker Institute. Retrieved from: http://www.gtlcenter.org/sites/default/files/docs/pa/3_PDPartnershipsandStandards/TheImperativeforPD.pdf

Enders, C. K., & Tofighi, D. (2007). Centering predictor variables in cross-sectional multilevel models: A new look at an old issue. Psychological Methods, 12(2), 121-138. doi: 10.1037/1082-989X.12.2.121

Faggella-Luby, M., & Wardwell, M. (2011). RTI in a middle school: Findings and practical implications of a tier 2 reading comprehension study. Learning Disability Quarterly, 34(1), 35-49.

Farrington, C. A., Roderick, M., Allensworth, E., Nagaoka, J., Keyes, T. S., Johnson, D. W., & Beechum, N. O. (2012). Teaching adolescents to become learners. The role of noncognitive factors in shaping school performance: A critical literature review. Chicago: University of Chicago Consortium on Chicago School Research.

French, S. E., Seidman, E., Allen, L., & Aber, J. L. (2006). The development of ethnic identity during adolescence. Developmental Psychology, 42(1), 1-10. doi: 10.1037/0012-1649.42.1.1

Gambone, M. A., Klem, A. M., Summers, J. A., Akey, T. A., & Sipe, C. L. (2004). Turning the tide: The achievements of the First Things First education reform in the Kansas City, Kansas Public School District. Philadelphia: Youth Development Strategies, Inc. Retrieved from: http://www.ydsi.org/ydsi/pdf/turningthetidefullreport.pdf

Garet, M. S., Porter, A. C., Desimone, L. M., Birman, B., & Yoon, K. S. (2001). What makes professional development effective? Analysis of a national sample of teachers. American Educational Research Journal, 38(3), 915-945. doi: 10.3102/00028312038004915

Grolnick, W. S., & Ryan, R. M. (1989). Parent styles associated with children's self-regulation and competence in school. Journal of Educational Psychology, 81, 143-154. doi: 10.1037/0022-0663.81.2.143

Heller, R., & Greenleaf, C. L. (2007). Literacy instruction in the content areas: Getting to the core of middle and high school improvement. Washington, DC: Alliance for Excellent Education. Retrieved from: http://carnegie.org/fileadmin/Media/Publications/PDF/Content_Areas_report.pdf

Hulleman, C. S., Rimm-Kaufman, S. E., & Abry, T. (2013). Innovative methodologies to explore implementation: Whole-part-whole – Construct validity, measurement, and analytical issues for intervention fidelity assessment in education research. In T. Halle, A. Metz, & I. Martinez-Beck (Eds.), Applying implementation science in early childhood programs and systems (pp. 65-93). Baltimore: Paul H. Brookes.

Joyce, B., & Showers, B. (2002).
Student achievement through staff development (3rd ed.). Alexandria, VA: Association for Supervision and Curriculum Development.

Kansas State Department of Education (n.d.). Assessment report. Retrieved from: http://www.ksde.org/Default.aspx?tabid=233

Kemple, J. J., Corrin, W., Nelson, E., Salinger, T., Herrmann, S., & Drummond, K. (2008). The Enhanced Reading Opportunities study (NCEE 2008-4015). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Retrieved from: http://www.air.org/files/ERO_Full_Report_Year2011.pdf

Lang, L., Torgesen, J., Vogel, W., Chanter, C., Lefsky, E., & Petscher, Y. (2009). Exploring the relative effectiveness of reading interventions for high school students. Journal of Research on Educational Effectiveness, 2(2), 149-175. doi: 10.1080/19345740802641535

Lüdtke, O., Trautwein, U., Kunter, M., & Baumert, J. (2006). Reliability and agreement of student ratings of the classroom environment: A reanalysis of TIMSS data. Learning Environments Research, 9, 215-230. doi: 10.1007/s10984-006-9014-8

McClelland, G. H., & Judd, C. M. (1993). Statistical difficulties of detecting interactions and moderator effects. Psychological Bulletin, 114(2), 376-390. doi: 10.1037/0033-2909.114.2.376

Muthén, L. K., & Muthén, B. O. (1998-2009). Mplus (Version 6.12) [Computer software]. Los Angeles, CA: Muthén & Muthén.

National Research Council and the Institute of Medicine. (2004). Engaging schools: Fostering high school students' motivation to learn. Committee on Increasing High School Students' Engagement and Motivation to Learn, Board on Children, Youth, and Families, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

Quint, J. (2006). Meeting five critical challenges of high school reform. New York City: MDRC. Retrieved from: http://www.mdrc.org/sites/default/files/full_440.pdf

Quint, J., Bloom, H. S., Black, A. R., Stephens, L., & Akey, T. M. (2005). The challenge of scaling up educational reform: Findings and lessons from First Things First (Final report). New York City: MDRC. Retrieved from: http://www.mdrc.org/sites/default/files/full_531.pdf

Rakes, C. R., Valentine, J. C., McGatha, M. B., & Ronau, R. N. (2010). Methods of instructional improvement in algebra: A systematic review and meta-analysis. Review of Educational Research, 80(3), 372-400. doi: 10.3102/0034654310374880

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage Publications.

Robertson, J. (2013, February 8). Benchmarking is a big boost for math. The Kansas City Star. Retrieved from: http://www.kansascity.com/2013/02/08/4056702/benchmarking-is-a-bigboost-for.html

Roediger, H. L., Agarwal, P. K., McDaniel, M. A., & McDermott, K. B. (2011). Test-enhanced learning in the classroom: Long-term improvements from quizzing. Journal of Experimental Psychology: Applied, 17(4), 382-395. doi: 10.1037/a0026252

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: John Wiley & Sons.

Schulz, K. F., Altman, D. G., & Moher, D. (2010). CONSORT 2010 statement: Updated guidelines for reporting parallel group randomised trials. Annals of Internal Medicine, 152, 1-7.

Seidman, E., Allen, L., Aber, J. L., Mitchell, C., & Feinman, J. (1994). The impact of school transitions in early adolescence on the self-system and perceived social context of poor urban youth. Child Development, 65(2), 507-522.
doi: 10.2307/1131399

Si, Y., & Reiter, J. P. (2013). Nonparametric Bayesian multiple imputation for incomplete categorical variables in large-scale assessment surveys. Journal of Educational and Behavioral Statistics, 38(5), 499-521. doi: 10.3102/1076998613480394

Slavin, R. E., Cheung, A., Groff, C., & Lake, C. (2008). Effective reading programs for middle and high schools: A best-evidence synthesis. Reading Research Quarterly, 43(3), 290-322. doi: 10.1598/RRQ.43.3.4

Slavin, R. E., Lake, C., & Groff, C. (2009). Effective programs in middle and high school mathematics: A best-evidence synthesis. Review of Educational Research, 79(2), 839-911. doi: 10.3102/0034654308330968

Tseng, V., & Seidman, E. (2007). A systems framework for understanding social settings. American Journal of Community Psychology, 39(3-4), 217-228. doi: 10.1007/s10464-007-9101-8

U.S. News and World Report (2013, April). Best high schools. U.S. News and World Report. Retrieved from: http://www.usnews.com/education/best-high-schools

The White House (n.d.). Retrieved from: http://www.whitehouse.gov/issues/education/k-12/educateinnovate

Appendix 1: Change in Project Focus

This project was originally entitled Scaling Up the First Things First Reform Approach and was funded through a grant from the Institute of Education Sciences (IES), U.S. Department of Education. Its original aim was to conduct a randomized field trial (RFT) of the effectiveness of scaling up the First Things First (FTF) approach to school reform in high schools that serve large percentages of disadvantaged students. FTF, designed and implemented by the Institute for Research and Reform in Education (IRRE), creates multiple theme-oriented small learning communities (SLCs), made up of 300-350 students and 15-18 teachers, within a larger school. Within the SLCs are family and student advocacy groups, in which a teacher (the advocate) has 15 to 20 students across the four grades with whom he or she meets weekly and with whose families he or she is the liaison. The third and final key FTF strategy is instructional improvement (II), which involves creating enriched learning opportunities that are rigorous and engaging for all students and are aligned with district, state, and federal standards.

During the first year of the grant, we encountered two major difficulties in relation to the project: (1) each of the four sites that had committed to be part of the trial at the time we submitted the application withdrew its commitment shortly before the grant was to begin; and (2) the funding that IRRE had expected to receive to support the FTF intervention that our project would have evaluated failed to materialize at the end of the first year of our grant. Accordingly, we had spent the first year of the grant recruiting sites for the initial project, but at the end of that year we had to tell those sites that we could not begin the project as planned. Further, this meant that it was impossible for us to conduct the proposed S-RFT, because there would not be enough years to complete the trial if we did not select and randomize the first cohort until the spring of 2009 (i.e., late in Year 2 of the grant). We therefore worked with the program division of the National Center for Education Research (NCER), within IES, to reformulate the project.
We designed two studies in the summer of 2008, early in the second year of the grant: (1) a validation study of the Engagement, Alignment, and Rigor (EAR) Classroom Visit Protocol, designed by IRRE and used to assess the quality of classroom instruction; and (2) an S-RFT to examine the efficacy of the instructional improvement component of FTF. We refer to that intervention as Every Classroom, Every Day (ECED). We submitted a request to NCER in the summer of 2008 to change the grant's scope of work to these two studies. We received informal approval from NCER in 2008 to conduct the validity study and began recruitment for the efficacy trial of ECED. In the spring of 2009 we submitted a "mini-application" to proceed with the efficacy trial and to have these two studies replace the initial effectiveness trial. We were granted informal approval shortly thereafter in 2009 to begin the efficacy trial in the summer of 2009, and in May 2010 we received formal approval from NCER for all the requested changes. Thus, the original title of the grant is misleading: the project is not an effectiveness trial of First Things First but is instead primarily an efficacy trial of the Every Classroom, Every Day component of FTF, evaluated as a free-standing intervention rather than as a part of FTF.

Appendix 2: Findings From the First Component of the Revised Project: Validation of the EAR Classroom Visit Protocol (Manuscript Under Review)

Engagement, Alignment, and Rigor as Vital Signs of High-Quality Instruction: A Classroom Visit Protocol for Instructional Improvement and Research

Diane M. Early, Ronald D. Rogge, and Edward L. Deci
University of Rochester

Diane M. Early, Department of Clinical and Social Sciences in Psychology, University of Rochester; Ronald D. Rogge, Department of Clinical and Social Sciences in Psychology, University of Rochester; Edward L. Deci, Department of Clinical and Social Sciences in Psychology, University of Rochester. The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305R070025 to the University of Rochester. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education. The authors wish to thank the Institute for Research and Reform in Education for their support of this work and the participating students, teachers, schools, and districts for their support of the data collection efforts. Correspondence concerning this article should be addressed to Diane M. Early, Department of Clinical and Social Sciences in Psychology, University of Rochester, P.O. Box 270266, Rochester, NY 14627. E-mail: [email protected]

Submitted for review to the High School Journal. Please do not share or cite.

Abstract

This paper investigates Engagement (E), Alignment (A), and Rigor (R) as vital signs of high-quality teacher instruction and examines the reliability and predictive validity of the EAR Classroom Visit Protocol, designed by the Institute for Research and Reform in Education (IRRE). In Study 1, we examined observations of 33 English/Language Arts (ELA) teachers and 25 mathematics teachers from four high schools. Study 2 included 63 math and 64 ELA teachers from eight high schools.
Engagement was a consistent predictor of math and ELA test scores when controlling for the previous year's score. Further, under some circumstances, alignment and rigor also served as indicators of high-quality instruction. Students' self-reports of their engagement in school were also generally predictive of test scores in models that also included perceived academic competence and observed engagement, alignment, or rigor. We discuss the importance of classroom engagement as a marker of instructional quality and a predictor of student achievement.

Keywords: instructional quality, engagement, alignment, rigor, high school, standardized test scores

In recent years there has been substantial discussion about the quality of our nation's educational system, with both policy makers and education experts maintaining that, on average, the quality of U.S. education is lower than optimal. To support this claim they often point to international test-score results. For example, results from the Program for International Student Assessment (PISA) for reading literacy for 15-year-olds indicated that the U.S. ranked only in the top 26 of 65 participating countries, with nine of the countries having scores that were significantly higher than those of the U.S. (National Center for Education Statistics, Institute of Education Sciences, 2009). In math, the Trends in International Mathematics and Science Study (TIMSS) results for eighth-grade students indicated that the U.S. scores were only among the top 24 of the 56 educational systems involved, with 11 systems having significantly better scores than the U.S. (National Center for Education Statistics, Institute of Education Sciences, 2011). Earlier similar test results were part of the justification for the No Child Left Behind legislation enacted in 2002, which both mandated standardized achievement tests in all states seeking federal funds and required schools and school districts to improve student test scores (Rothman, 2012). Subsequently, the Race to the Top program has added to the press for improved test scores by increasingly holding individual teachers accountable for improving the scores of students in their classes (Klein, 2012). The National Research Council and the Institute of Medicine (2004) argued that the quality of teachers' instruction is the most proximal and powerful predictor of students' learning. Accordingly, considerable interest has been directed toward methods for assessing and improving the quality of teacher instruction in our schools.

High-Quality Classroom Instruction

Senior staff members from the Institute for Research and Reform in Education (IRRE) were interested in developing a tool for measuring 'vital signs' of instructional quality. Just as a physician measures blood pressure or pulse to obtain a quick picture of a person's health, IRRE sought to identify vital signs of instructional quality that could be measured quickly and often as a way of tracking variation in the quality of instruction.
To this end, they began with the question, "What characteristics of classroom instruction would make it excellent?" and turned to the existing research literature, as well as to their own experiences working to improve schools, to formulate an answer. They noted that student engagement was consistently linked to high-quality instruction and learning. Numerous studies published in the past 30 years have confirmed that when students are high in intrinsic motivation, they become more engaged in learning, which leads to deeper and more conceptual understanding (e.g., Benware & Deci, 1984; Grolnick & Ryan, 1987), particularly when the learning tasks are heuristic rather than algorithmic (McGraw, 1978). As well, there is evidence that, when students have fully internalized the regulation of learning particular topics, even ones they do not find interesting, they tend to be more engaged in learning and to perform better than when learning is controlled by external or internal contingencies (e.g., Black & Deci, 2000; Grolnick & Ryan, 1989). Thus, intrinsic motivation and fully internalized motivation predict engagement and positive educational outcomes; together they have been referred to as autonomous motivation for learning (Ryan & Deci, 2000). Still other research has shown that when teachers are supportive of students, are interested in the material, and are enthusiastic about teaching, students tend to be more autonomously motivated and engaged (e.g., Deci, Schwartz, Sheinman, & Ryan, 1981; Patrick, 1995). Evidence such as this, combined with their experiences in schools, led the IRRE staff to conclude that engaging classroom instruction, encouraged by teachers' support of students, enthusiasm about teaching, and interest in the content they are teaching, matters for student learning and should be measured as a vital sign of high-quality instruction.

The IRRE team then considered whether, in light of the circumstances in our schools, engaging instruction alone would be sufficient to index teaching quality. What if students were engaged in learning that was not aligned with curricula and the types of assessments being used by the district? What if the level of instruction was too low to yield the grade-level mastery being assessed by state and national achievement tests? These discussions led the IRRE team to postulate two other potentially important vital signs of excellent instruction, namely alignment and rigor. Because the federal government had mandated that states administer standardized achievement tests as the primary indicator of student learning, the IRRE staff reasoned that classroom instruction would need to be aligned with state standards in order to be high quality and affect student performance. This was not intended to be interpreted as teaching to the tests, but rather as being sure that the material (that is, the content and curriculum widely agreed upon by educators as being important for students at particular grade levels) was covered in the relevant classes. Alignment, as conceptualized here, is a vital sign of instructional quality because it measures the extent to which the teacher is providing content that is on-time and on-target with what students need to learn.
Other researchers have developed systems for quantifying the extent to which state-mandated assessment systems are aligned with state educational standards (Herman, Webb, & Zuniga, 2007; Webb, Herman, & Webb, 2007). That type of alignment is also important, but it differs from the alignment measured by the EAR Protocol because it is a characteristic of the testing system, rather than a component of instructional quality. Finally, rigor was selected for two reasons: (1) because the literature showed a strong connection between challenge and students' intrinsic motivation (e.g., Csikszentmihalyi, 1975; Danner & Lonky, 1981; Deci, 1975; Harter, 1978); and (2) to ensure that all students would be expected and supported (by learning materials, the work being asked of them, and evaluations of their work) to master material at levels sufficient to yield grade-level or better learning of the subject matter embodied by the standards. Although rigor is often interpreted to mean making schoolwork extremely hard, it is here intended to convey that expectations for all students are consistently high and that the instructional strategies deployed by teachers ensure that the work presented optimally challenges all students to move from where they are toward high standards.

Having postulated that engagement, along with alignment and rigor, would represent vital signs of excellent instruction, the IRRE staff developed an observational protocol to assess these dimensions. The goal was to be able to gather reliable information on a regular basis (1) to provide ongoing feedback to teachers so they could reflect on their own teaching, (2) to aid administrators in selecting professional development activities, and (3) to assess whether the self-reflection and professional development are making a difference in what the teachers actually do in the classroom. Change in student test scores is certainly an important long-term outcome for assessing a professional development strategy, but schools need more immediate feedback to gauge whether their efforts to improve instruction are working. The resulting tool, named the Engagement, Alignment, and Rigor (EAR) Classroom Visit Protocol, was intended for use by school staff, as well as by outside consultants and researchers.

Classroom Observations

Currently, there are a few tools available that have been found reliable and valid for assessing instruction. A recent investigation by the Bill & Melinda Gates Foundation (2012) of five such observational tools concluded that each was a valid predictor of student achievement gains when used by trained observers with a background in education but without special knowledge of instructional assessment or ties to the specific tool. The EAR Protocol, if found to be reliable and valid, would expand the list of useful tools because it has several desirable characteristics and a somewhat different focus that may be preferable for specific schools or districts, depending on their interests and needs. For example, the EAR Protocol was designed for use in all subject areas, including core subjects such as math and English/language arts (ELA) and electives such as art and physical education.
Its focus is a set of specific instruction-related experiences for students that result from what teachers do, rather than a measurement of teacher behaviors or attributes. That is, E, A, and R are postulated to be vital signs of quality that lend themselves to improvement through multiple modalities and foci of professional development, but they do not focus on implementation of specific instructional strategies. Further, if used widely in a school or district, the protocol provides a common language and set of descriptors to promote conversations about high-quality instruction across grade levels and subject areas.

The EAR Protocol requires a 20-minute observation, providing enough time to obtain a clear picture of what is happening in the classroom while still being feasible for school administrators to use on a regular basis. Multiple observations of a single teacher, grade, or department are necessary for the results to be meaningful, but the short observation period makes the protocol usable for administrators who generally have very full schedules. The 20-minute observation stands in contrast to the three- to five-minute "walk-throughs" that are popular with school personnel, some of which are imported from external sources and others of which are "home grown" by the school district or individual schools. Although those very brief visits may help administrators gain a picture of the general state of instruction in their schools, they are highly subjective and are too brief to provide a meaningful understanding of what is really taking place in an individual teacher's classroom (Downey, Steffy, English, Frase, & Poston, 2004; Protheroe, 2009). In contrast, the 20-minute EAR Protocol allows for a richer sampling and a more quantitative representation of instructional quality. Additionally, data from the EAR Protocol are collected using an electronic data collection system (via smartphone or tablet computer) and are uploaded immediately. Reports and graphs can be generated through an on-line system that aggregates observations across an individual teacher, entire department, grade, small learning community, school, or district. This on-line system provides school and district administrators with immediate feedback and the ability to quickly identify trends and changes over time.

The Current Research

The EAR Classroom Visit Protocol was developed in 2004, and IRRE began field testing it immediately. To date, it has been used in more than 100 elementary, middle, and high schools across the country for more than 27,000 visits (Broom, 2012). Those data, and feedback from the schools that use the tool, provide preliminary indication of its utility. The current study was designed as a more rigorous test of the reliability and validity of the tool, conducted by an independent research team. Thus, the current paper (1) describes the EAR Protocol, (2) investigates the tool's inter-rater reliability, both when used by trained observers from outside the school and by trained school and district personnel, and (3) examines the tool's predictive validity, both by itself and in conjunction with students' questionnaire responses, using standardized test scores as the outcome of interest.

The EAR Classroom Visit Protocol
In the EAR Protocol, engagement is defined as students being actively involved (emotionally, behaviorally, and cognitively) in their academic work. When students are engaged, they are actively processing information (listening, watching, reading, thinking) or communicating information (speaking, performing, writing) in ways that indicate they are focused on the task and interested in it (Connell & Broom, 2004). Engagement is a prerequisite for school success. It leads to effort and persistence, both of which allow students to profit from challenging curricula (National Research Council and the Institute of Medicine, 2004).

The EAR Protocol defines alignment as students (1) being asked to do and actually doing schoolwork that reflects academic standards and (2) having opportunities to master the methods used on high-stakes assessments such as their state's standardized tests and college entrance exams. It can be assessed in relation to district, state, or national standards and assessments such as the Common Core. In aligned classrooms, what is being taught and what students are being asked to do are in line with the standards and curriculum; are "on time" and "on target" with the scope and sequence of the course of study; and provide students opportunities to experience high-stakes assessment methodologies, among other assessment approaches (Connell & Broom, 2004).

Rigor, as defined in the EAR Protocol, reflects the common-sense notion that students will only achieve at high levels if that level of work is expected and inspected for all students. In rigorous classrooms, the learning materials and instructional strategies being used challenge and encourage all students to produce work or respond at or above grade level. All students are required to demonstrate mastery at these levels and have the opportunity for re-teaching as needed (Connell & Broom, 2004).

Students' Self-Reported Motivation Variables

As noted above, one aim of the current studies was to evaluate the predictive validity of the EAR Classroom Visit Protocol. In addition to testing whether the tool would predict standardized achievement test scores when the previous year's scores were controlled, we examined whether student reports of their academic engagement and perceived academic competence across their school experiences would add to this prediction. In earlier studies, both perceived competence and engagement were shown to predict change in students' academic performance (Connell, Spencer, & Aber, 1994; Gambone, Klem, Summers, Akey, & Sipe, 2004). In the current study, the EAR Protocol was used to assess vital signs of high-quality instruction and to predict performance on standardized achievement tests. We further examined whether students' own perceptions of their academic engagement and perceived competence would contribute to change in these test scores. This is in line with the Bill & Melinda Gates Foundation (2012) report, which encouraged schools to use both classroom observations and student reports of their learning experiences to get the fullest picture of instruction in their school, arguing that neither source of information alone is sufficient.

Study 1

Method

Description of the EAR Classroom Visit Protocol

The EAR Classroom Visit Protocol is an observational tool completed by trained observers during and after a 20-minute observation.
The original tool includes 15 items, but only ten are used to calculate final scores. Those ten items appear in Table 1. Typically, teachers receive multiple 20-minute observations across the school year to gain a full picture of instruction in their classroom(s). The observers must be experienced educators, such as school administrators, teachers, technical-assistance providers, or researchers with past classroom experience. All observers must be trained in use of the protocol. Data are uploaded to a central server that provides reports at different levels (e.g., teacher, department, grade, school) for use in professional development and reflective conversations with teachers, as well as in performance management around instructional leadership and support at the district and school levels.

Engagement. Classroom visitors use two items to assess engagement: one measures the percentage of students who are on-task, and the second measures the percentage of on-task students who are actively and intellectually engaged in the work. For both items, trained observers walk around the classroom, inspecting student work, watching students' facial expressions, and listening to student conversations and student responses to teacher questions. Additionally, classroom visitors have brief conversations with students. The conversations, which take place only if they will not disrupt the class, include questions like "What does your teacher expect you to learn by doing this work?" and "Why do you think the work you are doing is important?" The questions are open-ended and require students to explain what they are learning, thus preventing students from simply providing socially desirable answers. Student responses are used along with the observations to estimate the percentage who are actively engaged.

Alignment. Observers make four binary judgments about whether the learning materials, learning activities, expectations for student work, and students' class work reflect relevant federal, state, and local standards, designated curricula, and high-stakes assessments. When available, observers are provided with the pacing guide for the course being observed to aid in determining the extent to which the course is covering the required material and whether instruction is "on-time" and "on-target" for their district.

Rigor. This construct is assessed with four judgments (three binary, one percentage) that relate to the cognitive level of the material, the student work expected, and the extent to which students are required and supported to demonstrate mastery of the content. Items concern whether learning materials and student work products are appropriately challenging, whether students are expected to meet or surpass relevant standards, and whether they have an opportunity to demonstrate proficiency and are supported to do so. Observers are instructed to consider the level of thinking and performing required by the learning activities, as defined in Bloom's taxonomy (Bloom, Englehart, Furst, Hill, & Krathwohl, 1956). Credit is given only when the activities are predominantly intermediate and include some advanced-level work.

EAR Data Collection

Study 1 took place in four high schools from a single district in a southwestern state during the 2008-09 school year.
The schools in this study were relatively large, with an average student enrollment over 1,500. Over 40% of the students enrolled in these schools were Latino/Hispanic, and a roughly equal percentage were non-Hispanic White. About one-third of the students were from low-income families, as evidenced by their eligibility for free/reduced lunch.

The EAR Protocol data were collected for multiple purposes: to support professional development in the district, to establish a scoring system with continuous variables that could be used for research, to investigate the tool's inter-rater reliability, and to assess the tool's validity for predicting standardized test scores in math and English/Language Arts (ELA).1 Data were collected by three groups of individuals: (1) IRRE consultants (n = 9) who had used the tool extensively over several years and were also providing instructional support in these schools, (2) former educators hired expressly for this project who had deep knowledge of high school classroom practices but no direct connection with these schools (n = 3), and (3) school leaders such as principals, assistant principals, and instructional coaches from the participating schools (n = 21).2 The former educators and school leaders were trained by IRRE, using IRRE's standard training procedures, which consist of (1) two full days of group instruction, including several classroom visits followed by scoring discussions, (2) a two-to-three-week window during which those participating in the training make practice visits as teams to calibrate their scoring, and (3) two additional full days of group instruction focusing on calibration and on use of the data for instructional improvement.

In all, 2,171 EAR Protocols were collected during the 2008-09 school year: 416 by IRRE consultants, 347 by the former educators, and 1,408 by school leaders. Table 1 presents descriptive statistics for the 10 individual indicators across the 2,171 observations. These observations, which were made in all types of courses, including math, ELA, science, history, art, and special education, were used for the confirmatory factor analysis (CFA) discussed below. Only data from 10th-grade math and ELA classes were used in the predictive validity analyses because the state administers high school standardized tests only in those two subject areas. Because the math and ELA exams were administered early in the spring term, we used only data from fall observations of math and ELA classes for these validity analyses. Thus, the validity analyses included 125 observations of 33 different math teachers and 102 observations of 25 different ELA teachers.

Student Questionnaires

In Study 1, in the fall of 2008, all 10th-grade students at the four high schools were asked to respond to an on-line questionnaire, administered during the school day. Items on the questionnaire had been used extensively by IRRE in its past work with schools. Two scales from that questionnaire are of particular interest in this study: self-reported engagement in school and perceived academic competence.

The measure of self-reported engagement in school asked students to respond to six items, using a four-point scale ranging from 'not at all true' to 'very true.' Sample items include:
'It is important to me to do the best I can in school' and 'I pay attention in class.' Cronbach's alpha for this scale in the current sample was .70 (n = 1,144, M = 3.04, SD = 0.48). The measure of perceived academic competence includes six items, using the same four-point response scale. Sample items include: 'I feel confident in my ability to learn at school' and 'I am capable of learning the material we are being taught at school.' Cronbach's alpha for this scale in this sample was .76 (n = 1,144, M = 3.23, SD = 0.51).

Analytic Plan

Inter-rater agreement. In the past, IRRE encouraged school districts to work as teams to establish a common understanding of the tool that could be used within their district to improve instruction. For research purposes, however, it was important to establish that different types of classroom visitors applied a common understanding of the tool and that this common understanding could be used to predict changes in student test scores. For this reason, the first goal of the Study 1 analyses was to establish the tool's inter-rater agreement across the different types of users, applying the continuous scoring system.

Predictive validity. The second goal of the Study 1 analyses was to investigate the relationship between classroom instruction, as measured using the EAR Classroom Visit Protocol, and standardized test scores, above and beyond previous test scores. For these tests, we used Hierarchical Linear Modeling (HLM; Raudenbush & Bryk, 2002) to appropriately model these data, in which students were nested within sections (i.e., a specific period of a specific teacher), and sections were nested within teachers.

Study 1 Results and Discussion

Scoring the EAR Classroom Visit Protocol

IRRE has used the EAR Protocol extensively in its instructional improvement efforts. In order to give straightforward feedback to educators, IRRE has used thresholds to indicate whether a classroom, department, grade, or school is at an acceptable level on each EAR vital sign. These thresholds were developed through extensive deliberation by the IRRE instructional experts, with input from districts and the existing literature on instruction. IRRE has found this threshold approach effective for communicating easily with schools and districts about the quality of their classroom instruction. In order to use the EAR Protocol for research purposes, however, it is preferable to have continuous variables that express the full range of variance on these constructs and maximize power when analyzing associations between the quality of instruction and student outcomes. To that end, we conducted a series of confirmatory factor analyses (CFAs) using the 2,171 observations in Study 1. Based on those CFAs, three scores were created. Engagement was the mean of the proportion of students on task and the proportion of students actively engaged in the work. Alignment was the proportion of positive answers on the four dichotomous alignment indicators. Rigor was the mean of the four rigor indicators, after standardizing them using population estimates derived from 1,551 observations conducted by the IRRE intervention team in 19 high schools in six school districts across the country between 2004 and 2010.
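To make that scoring concrete, the sketch below shows one way the three continuous scores could be computed from the ten Table 1 indicators. It illustrates the scoring rules as described, not IRRE's actual scoring code; the column names and the rigor_means/rigor_sds arguments (standing in for the population estimates from the 1,551 IRRE observations) are hypothetical.

    import numpy as np
    import pandas as pd

    def score_ear(obs: pd.DataFrame,
                  rigor_means: np.ndarray,
                  rigor_sds: np.ndarray) -> pd.DataFrame:
        """Compute continuous Engagement, Alignment, and Rigor scores for
        each EAR Protocol observation. Assumes E1, E2, and R4 are stored as
        proportions (0-1) and the binary items as 0/1; names hypothetical."""
        scored = pd.DataFrame(index=obs.index)

        # E2 records the share of *on-task* students who are actively engaged,
        # so E1 * E2 estimates the share of all students actively engaged.
        actively_engaged = obs["E1"] * obs["E2"]
        scored["engagement"] = (obs["E1"] + actively_engaged) / 2

        # Alignment: proportion of 'yes' answers on the four binary indicators.
        scored["alignment"] = obs[["A1", "A2", "A3", "A4"]].mean(axis=1)

        # Rigor: mean of the four indicators (three binary, one percentage),
        # each standardized against the population estimates.
        rigor_items = obs[["R1", "R2", "R3", "R4"]].to_numpy(dtype=float)
        scored["rigor"] = ((rigor_items - rigor_means) / rigor_sds).mean(axis=1)
        return scored

Under this reading of the scoring rules, engagement and alignment range from 0 to 1, while rigor is expressed in units of the population standard deviations.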
Across the Study 1 observations, the three variables were correlated, but not so highly as to suggest they were measuring the same construct (E and A, r = .32, p < .001; E and R, r = .44, p < .001; A and R, r = .63, p < .001). Further, we tested whether a model with a single underlying construct would fit the data better than the model with three latent constructs, but found that the fit of the single-construct model was unacceptable. Because a single variable would not satisfactorily represent instructional quality, we proceeded using the model with the three latent constructs.

Inter-Rater Agreement

In Study 1, across the 2008-09 school year, there were 388 cases coded simultaneously by a pair of observers. Inter-rater reliability was calculated as the intraclass correlation (one-way random, absolute agreement) between pairs of scores. After calculating continuous scores on Engagement, Alignment, and Rigor using the scoring method described above, the single-measures intraclass correlation was .76 for engagement, .71 for alignment, and .65 for rigor. Of the 388 pairs, 238 were made up of an IRRE consultant and a school leader. Looking just at this subset, the single-measures intraclass correlations remained unchanged: .76 for engagement, .71 for alignment, and .65 for rigor. There were 107 observations where the pair was made up of an IRRE consultant and one of the external observers from the research team (i.e., the former educators). Looking just at this subset, the correlation was .72 for engagement, .62 for alignment, and .67 for rigor. Thus, all ICCs fall within the "good" (.60 to .74) or "excellent" (.75 to 1.0) range (Cicchetti, 1994).
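For readers who wish to reproduce this reliability statistic, the sketch below computes the single-measures, one-way random effects, absolute-agreement intraclass correlation (Shrout and Fleiss's ICC(1,1)) from paired ratings; the function name, input layout, and example values are ours.

    import numpy as np

    def icc_one_way_single(pairs: np.ndarray) -> float:
        """ICC(1,1): one-way random effects, absolute agreement, single
        measures. `pairs` is an (n_cases, 2) array; each row holds the two
        observers' continuous scores for the same classroom visit."""
        n, k = pairs.shape
        grand_mean = pairs.mean()
        case_means = pairs.mean(axis=1)
        # Mean squares from a one-way ANOVA with visits as the grouping factor.
        ms_between = k * np.sum((case_means - grand_mean) ** 2) / (n - 1)
        ms_within = np.sum((pairs - case_means[:, None]) ** 2) / (n * (k - 1))
        return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

    # Example with fabricated engagement scores for two observers, five visits:
    ratings = np.array([[0.80, 0.75], [0.60, 0.65], [0.90, 0.85],
                        [0.40, 0.50], [0.70, 0.70]])
    print(icc_one_way_single(ratings))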
Predictive Validity

Cases available for validity analyses. The subset of observations that were collected for Study 1 in the fall of 2008 in 10th-grade math and ELA classes was used to test the predictive validity of the EAR Protocol. We focused on math and ELA because those are the subjects in which standardized test scores were available. As noted, we used fall observations because the tests were administered fairly early in the spring term. In math classes, 125 observations were conducted of 33 teachers, teaching 57 sections (i.e., a specific period of a specific teacher). On average, each math teacher was observed 3.68 times (range = 1 to 8, SD = 2.18). In ELA classes, 102 observations were conducted of 25 teachers teaching 50 sections. On average, each ELA teacher was observed 4.08 times (range = 1 to 7, SD = 1.89). After calculating continuous E, A, and R scores for each observation using the scoring described above, a mean E, A, and R score was calculated for each section and each teacher when there was more than one observation for a particular section or teacher.

Math teachers were relatively diverse (43% female; 75% White, 10% Latino, 10% Multi-Racial). They had an average of 5.09 years of teaching experience (SD = 1.23) and had been in their current positions 2.50 years (SD = 1.06) on average. The ELA teachers were less diverse: 81% female and 93% White. ELA teachers had an average of 4.94 years of teaching experience (SD = 1.18) and had been in their current positions 2.38 years (SD = 1.29) on average.

The standardized test serving as the 10th-grade outcome for this study was the state's high school exit exam. Students in this state began taking the high school exit exam in ELA and math in the spring of the 10th-grade year, repeating it each semester until they passed. For this study, only scores from the first administration of the exam were used. Additionally, this district administers a nationally normed standardized assessment, the Terra Nova, in math and ELA to all 9th-graders. The district provided 9th-grade Terra Nova and 10th-grade high school exit exam scores for students who were in 10th-grade in the 2008-09 school year. There were 634 students available for the math analyses, meaning that they had both 9th- and 10th-grade math scores and their math section had been observed. There were 993 students available for the ELA analyses, meaning that they had both 9th- and 10th-grade ELA scores and their ELA section had been observed. The sample sizes dropped slightly when student questionnaires were added to the models (n = 621 for math; n = 975 for ELA) due to some missing student questionnaires. Table 2 presents demographic information describing the student sample.

Multi-level model description. We used Hierarchical Linear Modeling (HLM; Raudenbush & Bryk, 2002) to predict 10th-grade exit exam scores in math or ELA, controlling for the previous year's score in the same subject, as a function of observed math or ELA classroom engagement (E), alignment (A), or rigor (R). Specifically, we built three-level models in which between-student differences within sections, as measured by students' previous year's test scores, were modeled at Level 1; within-teacher (between-section) variation was modeled at Level 2; and between-teacher variation was modeled at Level 3. Math and ELA outcomes were modeled separately, using observed E, A, or R in fall math classes in the math models and observed E, A, or R in fall ELA classes in the ELA models. The association between 10th- and 9th-grade test scores was allowed to vary across teachers (as a Level 3 random effect), and average 10th-grade test scores were allowed to vary across sections (as a Level 2 random effect) and across teachers (as a Level 3 random effect). Both the predictor and outcome variables were standardized prior to running these analyses, essentially converting the HLM coefficients into standardized coefficients. Secondary models including students' self-reported engagement in school and perceived academic competence were run to investigate the predictive role of these individual student characteristics beyond that of the classroom-level observations.
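In equation form, the three-level model just described can be sketched as follows. The notation is ours, and the group-mean centering of the section-level EAR score is an assumption we make to separate the within-teacher and between-teacher effects reported in Table 3; the published analyses may have been parameterized differently. With students i nested in sections j nested in teachers k:

    \text{Level 1 (students):}\quad Y_{ijk} = \pi_{0jk} + \pi_{1jk}\,\mathrm{Prior}_{ijk} + e_{ijk}
    \text{Level 2 (sections):}\quad \pi_{0jk} = \beta_{00k} + \beta_{01}\,(\mathrm{EAR}_{jk} - \overline{\mathrm{EAR}}_{k}) + r_{0jk}, \qquad \pi_{1jk} = \beta_{10k}
    \text{Level 3 (teachers):}\quad \beta_{00k} = \gamma_{000} + \gamma_{001}\,\overline{\mathrm{EAR}}_{k} + u_{00k}, \qquad \beta_{10k} = \gamma_{100} + u_{10k}

Here EAR stands for the observed engagement, alignment, or rigor score; e, r, and u are the student-, section-, and teacher-level random effects named in the text. Under this rendering, beta_01 corresponds to the "Within Teacher Variation in..." rows of Table 3 and gamma_001 to the "Differences across Teachers in..." rows.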
Predicting math scores from EAR observations. As seen in Table 3, 9th-grade test scores served as a strong predictor of 10th-grade test scores in all of the models. A 9th-grade test score one standard deviation above the mean predicted a 10th-grade test score .61-.62 standard deviations higher in math and .73-.74 standard deviations higher in ELA, suggesting a strong component of student ability and/or past instruction in 10th-grade scores. After controlling for these effects, when observations of Engagement, Alignment, or Rigor were separately allowed to predict residual change in standardized test scores, the results offered support for E, A, and R as predictors of student achievement. As seen in the first set of columns for predicting 10th-grade math scores (on the three rows labeled "Differences across Teachers in..."), for every standard deviation higher than average that a fall math teacher was rated on engagement, the model predicted his or her students scoring an average of .17 standard deviations higher on their 10th-grade math tests, even after controlling for their 9th-grade math scores and for variation in engagement among the teacher's different sections. Similarly, for observed alignment, a +1 standard deviation difference for a teacher predicted a statistically significant +.16 standard deviation difference in students' scores, and for rigor a +1 standard deviation difference for a teacher predicted a marginally significant +.14 standard deviation difference in students' scores. Thus, under the stringent conditions of predicting standardized math test scores in 10th-grade after controlling for standardized scores one year earlier, we found evidence that each of the teaching variables of observed engagement, alignment, and rigor explained some variance. Teachers whose instruction was more engaging, aligned, and rigorous had students who showed greater gains on standardized tests. These results further suggest that the dimensions assessed by the EAR tool capture aspects of classroom dynamics and effective instruction that are linked to measurable real-world gains in learning, underscoring the utility of this instrument.

Predicting ELA scores from EAR classroom observations. Looking at the models predicting 10th-grade ELA scores (the right portion of Table 3), we again see evidence that the three dimensions of instructional quality were linked to student outcomes, but the results were somewhat weaker than they were for math. For every standard deviation higher than average that a fall ELA teacher was rated on engagement or on alignment, the model predicted his or her students scoring an average of .06 standard deviations higher on their 10th-grade standardized ELA tests (both effects marginally significant), even after controlling for 9th-grade ELA scores and variation among the teacher's different sections. For observed rigor, a +1 standard deviation difference between teachers predicted a significant +.10 standard deviation difference in ELA scores. Thus, we see evidence that observations of teachers' instructional quality predict students' improvement in ELA, although the effects were not as large for ELA as for math.

In the six models without student questionnaires presented in Table 3, the between-section variation on E, A, and R within a teacher (Level 2) was non-significant (see the three rows of the table labeled "Within Teacher Variation in..."). This indicates that the engagement, alignment, and rigor of a particular section were not predictive of student outcomes above and beyond that section's teacher's overall level, suggesting that the common experiences teachers create across different sections of their courses predict student growth in learning more than the differences they create between those sections.

Predicting math scores from EAR observations and student motivation variables. The second and fourth sets of columns in Table 3 summarize the HLM results after student reports of engagement in school and perceived competence were included to assess their unique contribution to student learning beyond the quality of observed instruction.
The sample for the math analyses includes 621 10th-graders who responded to the questionnaire, were enrolled in a math section where EAR Protocols were collected, and had scores on both the 9th- and 10th-grade standardized achievement tests in math. In the three models predicting 10th-grade math scores (second set of columns for math), students' self-reported engagement in school was a significant predictor of their math test scores, while perceived competence was not. Thus, higher student self-reports of academic engagement were associated with slightly higher 10th-grade math scores. After controlling for student reports of engagement and competence, observed E, A, and R in math classes all remained significant predictors of 10th-grade math scores. Specifically, higher observed levels of a teacher's E, A, or R predicted significantly higher average 10th-grade math scores in his or her students, with effects ranging from +.15 to +.21 standard deviations. These results indicate that both the observed quality of students' instructional experience and their generalized sense of how engaging they find their work in school uniquely contribute to their performance on standardized assessments.

Predicting ELA scores from EAR observations and student motivation variables. The final set of columns in Table 3 includes the 975 students with questionnaire and ELA data. These data suggest that higher student reports of their own engagement in school were also associated with slightly higher 10th-grade ELA scores, even after controlling for 9th-grade test scores. Additionally, perceived academic competence was a marginal predictor of 10th-grade ELA scores. Of the three observed vital signs, only observed engagement in the ELA classrooms was marginally predictive of student ELA achievement in these models, suggesting that students' experience of engagement may have more pervasive effects across subject matter areas.

Study 2 Method

The second study sought to replicate the predictive validity findings of Study 1 in a larger sample of teachers and students, this one drawn from more economically disadvantaged schools. Establishing that the findings hold across different settings, with different student demographics and different standardized testing systems, is important for conclusions about the tool's validity.

EAR Data Collection

Study 2 took place in eight high schools in two districts in a single western state. The schools in Study 2 were also large, with an average student enrollment of 1,880. Over half of the students in these schools were Latino/Hispanic, with much smaller groups of White, African American, and Asian students (less than 20% per group). Over two-thirds of the students across these eight schools were eligible for free/reduced price lunch.

Data for these analyses were collected as part of an evaluation of an intervention designed to improve instruction in 9th- and 10th-grade ELA and math. Four of the participating high schools had been randomly assigned to receive the intervention and four were serving as controls; however, the two treatment conditions were combined for the current analyses because the EAR Protocol observations came from the baseline wave of data collection, just as the intervention was beginning.
The observers (n = 3) in this study were former educators or former IRRE consultants with no links to the participating teachers, schools, or districts. All three had also collected data as part of Study 1. These three individuals used the EAR Protocol for 261 observations in the fall of 2009 (229 alone, and 17 in pairs); all observations were made in 9th- and 10th-grade ELA, Algebra 1, and Geometry classes.

Student Questionnaires

In the fall of 2009, 9th- and 10th-graders in Study 2 were asked to respond to the same questionnaire as had been used in Study 1. The questionnaire was administered on-line, during the school day. Cronbach's alpha for the six-item self-reported student engagement scale in Study 2 was .68 (n = 2,601, M = 3.10, SD = .65). Cronbach's alpha for the six-item measure of perceived academic competence in Study 2 was .72 (n = 2,553, M = 3.31, SD = .22).

Study 2 Results and Discussion

Inter-Rater Agreement

The three individuals who collected data for Study 2 worked on the larger project for its full three years of data collection. Across that project, they participated in 249 observations for which two individuals were present. Inter-rater reliability was calculated as the intraclass correlation (one-way random, absolute agreement) between pairs of scores across the entire project, in order to have enough cases for a meaningful analysis of reliability. After calculating continuous scores, the single-measures intraclass correlation was .72 for engagement, .65 for alignment, and .67 for rigor.

Cases Available for Analyses

In math, observations were conducted of 63 teachers teaching 111 sections. On average, each math teacher was observed 1.95 times (range = 1 to 3, SD = 0.63). In ELA, observations were conducted of 64 teachers teaching 104 sections. On average, each ELA teacher was observed 1.86 times (range = 1 to 3, SD = 0.53). As in Study 1, after calculating continuous E, A, and R scores for each observation, section- and teacher-level means were created.

The state in which Study 2 took place administers standardized tests in math and ELA in 8th, 9th, and 10th grade. Students eligible for this study were those who were in either 9th or 10th grade in 2009-10. Test scores from one year previous (8th or 9th grade) were used as control variables. The ELA test that each student took depended on his or her grade level, with 8th-graders taking the 8th-grade ELA test, 9th-graders taking the 9th-grade ELA test, and so on. Scale scores were provided by the districts and were standardized within grade and district for the current analyses. The math test taken each year depended on the courses in which students were enrolled; for instance, students enrolled in Algebra 1 took the Algebra 1 test, regardless of their grade. However, the scores provided by the districts were scaled similarly for all math tests, as evidenced by the equivalence of the cut-points used across tests to determine different levels of proficiency. Therefore, we elected to standardize the math test scores within grade (8th, 9th, or 10th) and school district, but not within test.
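A minimal sketch of that standardization step, assuming the district score files are loaded into a pandas DataFrame with hypothetical grade, district, and scale_score columns:

    import pandas as pd

    def standardize_within(df: pd.DataFrame, score_col: str = "scale_score",
                           group_cols=("grade", "district")) -> pd.Series:
        """Z-score test scale scores within grade-by-district cells, mirroring
        the standardization described above (all column names hypothetical)."""
        grouped = df.groupby(list(group_cols))[score_col]
        return (df[score_col] - grouped.transform("mean")) / grouped.transform("std")

Standardizing within grade-by-district cells puts students on a common metric without assuming that the underlying tests share a scale across grades or districts.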
There were 1,644 students available for the math analyses, meaning that they had two years of math scores and their math section was observed. There were 2,262 students available for the ELA analyses, meaning that they had two years of ELA scores and their ELA section was observed. As seen in Table 2, the student sample for Study 2 was highly racially diverse, with a high level of eligibility for free or reduced price lunch. As in Study 1, due to missing questionnaire data, the sample sizes dropped slightly when student questionnaires were added to the models. The math analyses that included student questionnaires had sample sizes of 61 teachers, 109 sections, and 1,360 students. For the ELA models that included student questionnaire responses, the sample sizes were 62 teachers, 100 sections, and 1,956 students.

Multi-Level Model Results

HLM analyses parallel to those conducted for Study 1 were conducted using the Study 2 data, with 9th- or 10th-grade test scores as the outcome, controlling for test scores from one year earlier. As seen in Table 4, the previous year's test scores served as the strongest predictors of current math and ELA scores. After controlling for those effects, higher levels of observed engagement (modeled at Level 3 as differences between teachers) were predictive of higher test scores in both math and ELA. These results continued to offer support for the utility of the EAR tool. However, in contrast to the findings of Study 1, between-teacher differences in alignment and rigor did not serve as significant predictors of test scores in Study 2.

The second and fourth sets of columns in Table 4 present the models that also include self-reported engagement and perceived academic competence. Consistent with Study 1, student-reported engagement in school was a significant, positive predictor of math achievement, again suggesting that engaged students tend to show greater gains in math knowledge. In contrast to Study 1, perceived academic competence was marginally and negatively associated with math achievement in Study 2, predicting slightly lower gains in math knowledge for students reporting high levels of competence in this sample. After controlling for those effects, observed engagement continued to predict gains in student achievement in math and ELA, but alignment and rigor did not. Taken as a set, these results suggest that classroom engagement is the most critical of the three elements for student success.

General Discussion

Summary of Findings

Findings from these two studies indicate that school and district personnel, as well as educators from outside the district, can learn to use the Engagement, Alignment, and Rigor Classroom Visit Protocol reliably. Further, across the two studies, observed engagement was significantly associated with higher test scores in six out of eight analyses and marginally (p < .10) associated in the other two, after controlling for the previous year's test scores. Alignment and rigor were significantly or marginally associated with higher test scores, again after controlling for the previous year's test scores, in both math and ELA in Study 1, but were not predictive of test scores in Study 2. When student self-reports of their engagement in school and perceived academic competence were added to the models, both observed student engagement and students' reports of their generalized engagement in school predicted academic performance in math in both studies.
In ELA, observed engagement predicted achievement test scores in both studies, but students' reports of school engagement were a significant predictor of achievement only in Study 1. For the most part, students' perceived academic competence was unrelated to their test scores.

Central Role of Engagement

These studies provide evidence for the importance of student engagement in the classroom, as well as for the validity of the EAR Classroom Visit Protocol for measuring the extent to which teachers are engaging students during class. The importance of student engagement for academic success is well accepted in the education and psychological literatures (Appleton, Christenson, Kim, & Reschly, 2006; Finn & Rock, 1997; Klem & Connell, 2004); however, an agreed-upon way to measure the construct has been lacking. The EAR Protocol focuses on observable engagement in the classroom, across all students in the classroom. It assesses the extent to which students are paying attention, doing the work requested, and showing signs of cognitive involvement in the task at hand. Engagement is measured by watching student behavior and facial expressions and, when possible, through conversations with the students. Clearly, if students are not engaged with the content in this way, it would be difficult for them to profit fully from the curriculum.

It is important that the EAR Protocol's assessment of classroom-level engagement significantly predicts student achievement, above and beyond past achievement, because the protocol is a relatively quick and simple measure of a complex and fundamental construct; it thus adds an important tool for educators and researchers working to assess instructional quality. The fact that observed classroom-level engagement continued to predict students' test scores when controlling for individual students' self-reported engagement in school shows that these are two somewhat distinct experiences of engagement. Engagement as assessed in the EAR Protocol captures the student behavior, affect, and cognition in a particular classroom that signify that the teacher is teaching in an engaging way. It is thus an indicator of instructional quality in the classroom rather than a characteristic of a student. The student's self-report of general academic engagement, on the other hand, is a malleable characteristic of the individual student (Appleton et al., 2006) and has been shown—using measures similar to the one used for these studies—to predict outcomes such as attendance and school drop-out, as well as standardized test scores (Appleton et al., 2006; Finn & Rock, 1997; Klem & Connell, 2004). Of course, these two types of engagement are related, but perhaps in somewhat complex ways. For example, consistent experiences of engaging instruction should contribute to student reports of being generally engaged in school; but high school students who generally find school highly engaging might disengage in classes where instruction is not engaging. The current studies, as well as the work by the Bill & Melinda Gates Foundation (2012), indicate that measuring both types of engagement is important, as they account for different variance in student outcomes.
Alignment and Rigor

Clearly, it is important for instruction to map onto the standardized tests if we expect that instruction to make meaningful differences in students' scores on those tests. Likewise, rigor, defined as a combination of appropriate difficulty and continuous checking to ensure that students are mastering the content, is a common sense requirement for improving student outcomes. The evidence for the importance of these two predictors, however, is inconsistent.

The differences in findings regarding alignment and rigor across the two studies may be due to differences in the districts in which the studies took place. The district in which Study 1 took place had very well defined pacing guides for every course. District administrators provided copies of the pacing guides to all individuals collecting EAR Protocol data, so the observers could quickly and easily determine whether the course was on pace and teaching the district-supported content. Further, the district had been careful to base those pacing guides on the state grade-level performance standards and the standardized tests the students were expected to pass, ensuring that the correct material was covered prior to the exam, at the appropriate level of rigor. Neither of the two districts that participated in Study 2 had such clear and specific pacing guides. The data collection team sought to obtain pacing guides for each course that was to be observed, and those that existed were provided to the Study 2 data collectors. However, not all courses had guides, and sometimes the guides were fairly vague. Further, those districts had not focused as specifically on linking the pacing and curriculum guides to the state tests or state grade-level performance standards.

Additionally, a difference in the data collectors' familiarity with the states might partially explain the differences between the two studies. For Study 1, all of the school leaders and former educators hired for the project were from within that state. The only individuals from out of state were the IRRE consultants, who conducted less than 20% of the total observations. For Study 2, the three data collectors had never been educators in the state in which the data were collected. Thus, it is possible that less knowledge of what alignment and rigor would actually entail in that state led to lower accuracy in the Study 2 ratings.

These data provide evidence that when the expected curriculum and pacing are clearly articulated and map well onto the tests, and when the individuals making the ratings are highly familiar with the state's expectations, alignment and rigor can predict student test scores. This may be less true when the pacing guides either lack detail or are not specifically mapped onto the tests, or when the raters are less familiar with the state standards. Finally, it may simply be that having students engaged in classroom activities is most important for students' learning and achievement, as various motivational theorists have suggested or implied (e.g., Pintrich & Schunk, 1996; Ryan & Deci, 2009; Wigfield & Eccles, 2002).
There was evidence that engagement shared variance with alignment and rigor, indicating that teachers who are engaging in their instruction also tend to instruct in ways that are aligned and rigorous; those factors may therefore be present to support achievement even though Study 2 provided no direct evidence of their importance.

Conclusions

The EAR Protocol provides a straightforward, quick means for school administrators, instructional support personnel, and researchers to assess the quality of classroom instruction in a way that is reliable and linked to student test scores. A single EAR Protocol visit takes only 20 minutes, and teachers in Study 2 were observed, on average, fewer than two times during the term. Even with that small amount of information, engagement was a valid predictor of students' standardized test scores, controlling for the previous year's scores, in both math and ELA. This demonstrates both the importance of engagement as a construct and the ability of this tool to measure it in a meaningful way. In addition, there is evidence that if classes have defined pacing guides and curricula that are explicit and clearly linked to the state tests, then alignment and rigor may also be useful constructs for measuring instructional quality.

School districts and researchers need reliable and valid ways to assess the quality of instruction in the classroom in order to appropriately target professional development and monitor change. For such systems to be useful, they need to be feasible within the workdays of school personnel, provide immediate and actionable feedback, and give schools a common language with which to discuss high-quality instruction. The EAR Protocol meets these criteria and should be considered as a means of quickly and reliably gauging instructional vital signs.

References

Appleton, J. J., Christenson, S. L., Kim, D., & Reschly, A. L. (2006). Measuring cognitive and psychological engagement: Validation of the Student Engagement Instrument. Journal of School Psychology, 44, 427-445. doi:10.1016/j.jsp.2006.04.002

Benware, C., & Deci, E. L. (1984). Quality of learning with an active versus passive motivational set. American Educational Research Journal, 21, 755-765. doi:10.2307/1162999

Bill & Melinda Gates Foundation (2012). Gathering feedback for teaching: Combining high-quality observations with student surveys and achievement gains. Retrieved from: http://www.metproject.org/downloads/MET_Gathering_Feedback_Research_Paper.pdf

Black, A. E., & Deci, E. L. (2000). The effects of student self-regulation and instructor autonomy support on learning in a college-level natural science course: A self-determination theory perspective. Science Education, 84, 740-756. doi:10.1002/1098-237X

Bloom, B. S., Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational objectives: Handbook I: Cognitive domain. New York, NY: David McKay.

Broom, J. (2012, November). Building system capacity to evaluate and improve teaching quality: A technical assistance provider's perspective. In J. P. Connell (Chair), A developmental approach to improving teaching quality: Integrating teacher evaluation and instructional improvement.
Symposium conducted at the meeting of the Association for Public Policy Analysis & Management, Baltimore, MD.

Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6(4), 284-290. doi:10.1037/1040-3590.6.4.284

Connell, J. P., & Broom, J. (2004). The toughest nut to crack: First Things First's (FTF) approach to improving teaching and learning. Retrieved from IRRE website: http://www.irre.org/sites/default/files/publication_pdfs/The%20Toughest%20Nut%20to%20Crack.pdf

Connell, J. P., Spencer, M. B., & Aber, J. L. (1994). Educational risk and resilience in African-American youth: Context, self, action, and outcomes in school. Child Development, 65(2), 493-506. doi:10.2307/1131398

Csikszentmihalyi, M. (1975). Beyond boredom and anxiety. San Francisco: Jossey-Bass.

Danner, F. W., & Lonky, E. (1981). A cognitive-developmental approach to the effects of rewards on intrinsic motivation. Child Development, 52, 1043-1052.

Deci, E. L. (1975). Intrinsic motivation. New York: Plenum.

Deci, E. L., Schwartz, A. J., Sheinman, L., & Ryan, R. M. (1981). An instrument to assess adults' orientations toward control versus autonomy with children: Reflections on intrinsic motivation and perceived competence. Journal of Educational Psychology, 73, 642-650. doi:10.1037/0022-0663.73.5.642

Downey, C. J., Steffy, B. E., English, F. W., Frase, L. E., & Poston, W. K. (2004). The three-minute classroom walkthrough: Changing school supervisory practice one teacher at a time. Thousand Oaks, CA: Corwin Press.

Finn, J. D., & Rock, D. A. (1997). Academic success among students at risk for school failure. Journal of Applied Psychology, 82, 221-234. doi:10.2307/1170412

Gambone, M. A., Klem, A. M., Summers, J. A., Akey, T. A., & Sipe, C. L. (2004). Turning the tide: The achievements of the First Things First education reform in the Kansas City, Kansas Public School District. Philadelphia: Youth Development Strategies, Inc.

Grolnick, W. S., & Ryan, R. M. (1987). Autonomy in children's learning: An experimental and individual difference investigation. Journal of Personality and Social Psychology, 52, 890-898. doi:10.1037/0022-3514.52.5.890

Grolnick, W. S., & Ryan, R. M. (1989). Parent styles associated with children's self-regulation and competence in school. Journal of Educational Psychology, 81, 143-154. doi:10.1037/0022-0663.81.2.143

Harter, S. (1978). Pleasure derived from optimal challenge and the effects of extrinsic rewards on children's difficulty level choices. Child Development, 49, 788-799.

Herman, J. L., Webb, N. M., & Zuniga, S. A. (2007). Measurement issues in the alignment of standards and assessments: A case study. Applied Measurement in Education, 20, 101-126. doi:10.1207/s15324818ame2001_6

Klein, A. (2012). Obama uses funding, executive muscle to make often-divisive agenda a reality. Education Week, 31(35), 1-28.

Klem, A. M., & Connell, J. P. (2004). Relationships matter: Linking teacher support to student engagement and achievement. Journal of School Health, 74(7), 262-273. doi:10.1111/j.1746-1561.2004.tb08283.x

McGraw, K. O. (1978). The detrimental effects of reward on performance: A literature review and a prediction model. In M. R. Lepper & D.
Greene (Eds.), The hidden costs of reward (pp. 33-60). Hillsdale, NJ: Erlbaum.

National Center for Education Statistics, Institute of Education Sciences. (2009). Highlights from PISA 2009: Performance of U.S. 15-year-old students in reading, mathematics, and science literacy in an international context. Retrieved from: http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2011004

National Center for Education Statistics, Institute of Education Sciences. (2011). Highlights from TIMSS 2011: Mathematics and science achievement of U.S. fourth- and eighth-grade students in an international context. Retrieved from: http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2013009

National Research Council and the Institute of Medicine. (2004). Engaging schools: Fostering high school students' motivation to learn. Committee on Increasing High School Students' Engagement and Motivation to Learn, Board on Children, Youth, and Families, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

Patrick, B. C. (1995). College students' intrinsic motivation as a function of instructor enthusiasm. Unpublished doctoral dissertation, University of Rochester.

Pintrich, P. R., & Schunk, D. H. (1996). Motivation in education: Theory, research, and applications. Englewood Cliffs, NJ: Prentice-Hall.

Protheroe, N. (2009). Using classroom walkthroughs to improve instruction. Principal, March/April 2009.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.

Rothman, R. (2012). Laying a common foundation for success. Phi Delta Kappan, 94(3), 57-61.

Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist, 55, 68-78. doi:10.1037/0003-066X.55.1.68

Ryan, R. M., & Deci, E. L. (2009). Promoting self-determined school engagement: Motivation, learning, and well-being. In K. R. Wentzel & A. Wigfield (Eds.), Handbook of motivation at school (pp. 171-196). New York: Routledge.

Webb, N. M., Herman, J. L., & Webb, N. L. (2007). Alignment of mathematics state-level standards and assessments: The role of reviewer agreement. Educational Measurement: Issues and Practice, 26, 17-29. doi:10.1111/j.1745-3992.2007.00091.x

Wigfield, A., & Eccles, J. S. (2002). Students' motivation during the middle school years. In J. Aronson (Ed.), Improving academic achievement: Impact of psychological factors on education (pp. 159-184). New York: Academic Press.

Notes

1 Throughout this manuscript, we use the term English/Language Arts (ELA) to describe courses and exams focused on comprehension, reading, and writing in English. The courses included typical high-school English courses, as well as literacy courses focused on improving expository reading and writing. The exam names were 'Reading' in Study 1 and 'English-Language Arts' in Study 2.

2 School district employees were trained by IRRE to collect EAR Classroom Visit Protocol data for their own instructional improvement purposes, as well as for this study. The district decided which individuals would collect data for its purposes.
Eight individuals (in addition to these 21) who worked for the district conducted EAR visits during the year but either did not participate in any inter-rater reliability visits (n = 2) or did not appear to understand the tool, based on preliminary analyses using thresholds set by IRRE (n = 6). Thus, their data were used for the district's internal purposes only and have been excluded entirely from this research.

Table 1
Study 1: EAR Protocol Descriptive Statistics (n = 2,171)

Item                                                                        Mean    SD
Engagement
  E1  % of students on task                                                  77%    21
  E2  % of students actively engaged in the work requested                   63%    28
      Product of E1 * E2 (see table note 1)                                  53%    30
Alignment
  A1  The learning materials were (1) / were not (0) aligned with the
      pacing guide of this course or grade-level curriculum                  .89    .31
  A2  The learning activities were (1) / were not (0) aligned with the
      scope and sequence of the course according to the course syllabus      .88    .33
  A3  The student work expected was (1) / was not (0) aligned with the
      types of work products expected in state grade-level performance
      standards                                                              .72    .45
  A4  Student work did (1) / did not (0) provide exposure to and practice
      on high stakes assessment methodologies                                .56    .50
Rigor
  R1  The learning materials did (1) / did not (0) present content at an
      appropriate difficulty level                                           .89    .32
  R2  The student work expected did (1) / did not (0) allow students to
      demonstrate proficient or higher levels of learning according to
      state grade-level performance standards                                .59    .49
  R3  Evaluations/grading of student work did (1) / did not (0) reflect
      state grade-level performance standards                                .37    .48
  R4  % of students required to demonstrate whether or not they had
      mastered the content being taught                                      35%    38

1 E2 refers to the proportion of the students who were on task (in E1) who were actively engaged, so E1 and E2 must be multiplied together to be meaningful.

Table 2
Demographic Characteristics of Participating Students

                                              Study 1    Study 2
Total n                                         1,144      3,144
  In both math and ELA analyses                   483        778
  In math analyses only                           151      1,484
  In ELA analyses only                            510        882
Female                                          51.0%      49.4%
Race/Ethnicity
  African American                              11.9%      15.6%
  Asian/Pacific Islander                         4.4%      12.8%
  Latino/Hispanic                               40.9%      58.6%
  Native American                                1.3%       3.3%
  White                                         41.5%       9.7%
Eligible for free/reduced price lunch           32.6%      75.6%

Note: All data in Table 2 come from student records provided by the school districts, except free/reduced price lunch in Study 1. That district was unable to provide that information due to confidentiality policies; instead, this information comes from the Common Core of Data and reflects the four schools in their entirety, rather than just the study sample.
Table 3
Observed Engagement, Alignment, or Rigor Predicting Standardized Test Scores in Study 1

Entries are standardized coefficient (SE), df. "w/o" and "w/" denote models without and with the student motivation variables.

                                               Predicting 10th-Grade MATH Scores       Predicting 10th-Grade ELA Scores
Predictor variables                            w/o                  w/                  w/o                  w/
Observed Engagement
  Intercept                                    -0.09 (0.07), 31     -0.07 (0.06), 30    0.02 (0.03), 23      0.02 (0.03), 23
  Student self-reported engagement in school       --               0.06* (0.03), 615       --               0.06* (0.02), 969
  Student perceived academic competence            --               0.03 (0.03), 615        --               0.04+ (0.02), 969
  Previous year's test score                   0.62*** (0.04), 32   0.62*** (0.04), 31  0.74*** (0.03), 24   0.72*** (0.03), 24
  Within teacher variation in Engagement       -0.03 (0.08), 55     -0.03 (0.07), 54    0.04 (0.04), 48      0.04 (0.04), 48
  Differences across teachers in Engagement    0.17* (0.07), 31     0.21*** (0.05), 30  0.06+ (0.03), 23     0.06+ (0.03), 23
Observed Alignment
  Intercept                                    -0.11 (0.07), 31     -0.09 (0.07), 30    0.01 (0.03), 23      0.02 (0.03), 23
  Student self-reported engagement in school       --               0.06* (0.03), 615       --               0.05* (0.02), 969
  Student perceived academic competence            --               0.03 (0.03), 615        --               0.05+ (0.02), 969
  Previous year's test score                   0.62*** (0.04), 32   0.63*** (0.04), 31  0.73*** (0.03), 24   0.72*** (0.03), 24
  Within teacher variation in Alignment        -0.01 (0.07), 55     -0.01 (0.06), 54    0.02 (0.03), 48      0.02 (0.03), 48
  Differences across teachers in Alignment     0.16* (0.08), 31     0.18* (0.07), 30    0.06+ (0.03), 23     0.04 (0.03), 23
Observed Rigor
  Intercept                                    -0.12 (0.07), 31     -0.09 (0.07), 30    0.02 (0.03), 23      0.02 (0.03), 23
  Student self-reported engagement in school       --               0.06* (0.03), 615       --               0.06* (0.02), 969
  Student perceived academic competence            --               0.02 (0.03), 615        --               0.04+ (0.02), 969
  Previous year's test score                   0.61*** (0.04), 32   0.62*** (0.04), 31  0.74*** (0.03), 24   0.72*** (0.03), 24
  Within teacher variation in Rigor            -0.05 (0.07), 55     -0.05 (0.06), 54    0.04 (0.04), 48      0.03 (0.04), 48
  Differences across teachers in Rigor         0.14+ (0.07), 31     0.15* (0.07), 30    0.10* (0.04), 23     0.07 (0.05), 23

Note: This table reflects twelve separate analyses: E, A, and R, for math and ELA, with and without the student motivation variables from the questionnaires. Both the predictor and outcome variables are standardized, essentially converting the HLM coefficients into standardized coefficients.
+ p < .10, * p < .05, ** p < .01, *** p < .001
Table 4
Observed Engagement, Alignment, or Rigor Predicting Standardized Test Scores in Study 2

Entries are standardized coefficient (SE), df. "w/o" and "w/" denote models without and with the student questionnaires.

                                               Predicting 9th/10th-Grade MATH Scores   Predicting 9th/10th-Grade ELA Scores
Predictor variables                            w/o                  w/                  w/o                  w/
Observed Engagement
  Intercept                                    -0.12** (0.04), 61   -0.11** (0.04), 59  0.07** (0.02), 62    0.06** (0.02), 60
  Student self-reported engagement in school       --               0.07** (0.02), 1354     --               0.02 (0.01), 1950
  Student perceived academic competence            --               -0.04+ (0.02), 1354     --               -0.01 (0.01), 1950
  Previous year's test score                   0.44*** (0.03), 62   0.42*** (0.03), 60  0.79*** (0.01), 63   0.79*** (0.02), 61
  Within teacher variation in Engagement       0.05 (0.04), 109     0.03 (0.04), 107    0.02 (0.02), 102     0.01 (0.03), 98
  Differences across teachers in Engagement    0.10** (0.03), 61    0.10** (0.04), 59   0.07** (0.02), 62    0.07** (0.02), 60
Observed Alignment
  Intercept                                    -0.12** (0.04), 61   -0.11* (0.04), 59   0.07** (0.02), 62    0.06** (0.02), 60
  Student self-reported engagement in school       --               0.07** (0.02), 1354     --               0.02 (0.01), 1950
  Student perceived academic competence            --               -0.04+ (0.02), 1354     --               -0.01 (0.01), 1950
  Previous year's test score                   0.45*** (0.03), 62   0.43*** (0.03), 60  0.78*** (0.01), 63   0.79*** (0.02), 61
  Within teacher variation in Alignment        0.01 (0.04), 109     0.00 (0.04), 107    0.02 (0.02), 102     0.01 (0.02), 98
  Differences across teachers in Alignment     -0.02 (0.03), 61     0.01 (0.04), 59     -0.01 (0.02), 62     -0.01 (0.02), 60
Observed Rigor
  Intercept                                    -0.12** (0.04), 61   -0.11* (0.04), 59   0.07** (0.02), 62    0.06** (0.02), 60
  Student self-reported engagement in school       --               0.07** (0.02), 1354     --               0.02 (0.01), 1950
  Student perceived academic competence            --               -0.03+ (0.02), 1354     --               -0.01 (0.01), 1950
  Previous year's test score                   0.45*** (0.03), 62   0.43*** (0.03), 60  0.78*** (0.01), 63   0.79*** (0.02), 61
  Within teacher variation in Rigor            0.02 (0.03), 109     0.01 (0.04), 107    0.03+ (0.02), 102    0.02 (0.02), 98
  Differences across teachers in Rigor         0.02 (0.04), 61      0.05 (0.04), 59     0.02 (0.02), 62      0.03 (0.02), 60

Note: This table reflects twelve separate analyses: E, A, and R, for math and ELA, with and without the student motivation variables from the questionnaires. Both the predictor and outcome variables are standardized, essentially converting the HLM coefficients into standardized coefficients.
+ p < .10, * p < .05, ** p < .01, *** p < .001

Appendix 3: Sample Memorandum of Understanding

EVERY CLASSROOM, EVERY DAY

(DATE)

(District's name) commits to the participation of four comprehensive high schools in the Institute of Education Sciences supported evaluation of Every Classroom, Every Day from Summer 20XX through Spring 20XX. Every Classroom, Every Day is an instructional reform initiative based on instructional improvement supports provided by the Institute for Research and Reform in Education (IRRE).

Evaluation Activities and Commitments

(District name) understands that participation in Every Classroom, Every Day (ECED) will involve the following components for the evaluation:

Random assignment: Of the four comprehensive high schools participating in the project, two schools will be randomly selected to receive the instructional improvement supports from IRRE and participate in the research efforts. The other two schools will participate only in the research efforts. The two schools in the research-only group will receive a $10,000 stipend given directly to the school and will be eligible to receive supports two years later from district personnel trained during the first two years of the project and/or by contracting directly with IRRE. It is understood that the process of deciding which schools receive the instructional supports and which are in the research-only group will be entirely random, and the district will have no input in deciding which of the participating schools are in which group.
Full participation: Both groups of schools (those receiving the supports and those in the research-only group) will participate in data collection each year from Summer 2009 to Summer 2011 as part of this project. The District will actively encourage all teachers, personnel, and students to participate in the data collection.

Data collection: The data collection will include:
o Individual student records (e.g., demographic characteristics, course grades, scores on state-mandated tests, attendance, disciplinary actions), collected annually, including 8th-grade standardized test scores and attendance
o Teacher questionnaire, taking roughly 30 minutes, collected annually
o Student questionnaire, taking roughly 30 minutes, collected twice each year
o Master schedule and class rosters, allowing researchers to link students, teachers, and courses using code numbers
o Classroom observations, conducted 3 to 10 times annually in each 9th- and 10th-grade language arts and math class
o Research site-visit, conducted annually

(District name) commits to full participation in the ECED evaluation activities described above, as well as to the more detailed data requirements delineated in Attachment 1.

Project Implementation Activities and Commitments

(District name) understands that participation in Every Classroom, Every Day (ECED) requires commitment to implementing the following components of the project in the two schools receiving instructional supports each year (except where noted):

1. At minimum, a half-time math coach and a half-time English/literacy coach will be devoted to the 9th- and 10th-grade Language Arts and Math classes in each of the two schools receiving the instructional supports.1 These ECED coaches will participate in two days of training during the first year, in addition to participating in the activities described in paragraphs 2, 3, 4, and 5.

1 The district agrees that the English/literacy and math coaches who work with the schools receiving instructional supports will not be coaches in, or otherwise work with, the two research-only schools during the two-year duration of the ECED project.

2. A minimum of three full days of professional development time for 9th- and 10th-grade English/Language Arts and Math teachers, ECED coaches, and district/campus instructional leaders, with at least 75% of these target participants attending each professional development activity.

3. A minimum of four additional full days of leadership trainings during the first year for instructional leaders (e.g., principals, assistant principals, curriculum specialists, teacher leaders) and ECED coaches to build shared understanding of Engagement, Alignment, and Rigor.

4. Participation in four three-day instructional site-visits. District and school leadership and ECED coaches will participate in all three days of each visit. During one day of the site visit, teachers of 9th- and 10th-grade Math and Language Arts will be out of their classrooms for one half-day (e.g., Math teachers in the morning and Language Arts teachers in the afternoon). During the other two days of the site visit, these teachers will also participate in classroom observations, classroom coaching, and individual consultations.

5. Twice-monthly, two-hour conference calls with ECED coaches and at least one instructional leader from each school to support their real-time coaching, talk through emerging issues, and help maintain project momentum.
6. Use of the ECED literacy curriculum with all 9th and 10th graders – including ELL and special education students – who are expected to take the state-mandated achievement tests. (Footnote: The ECED literacy curriculum will not replace the current 9th and 10th grade English/Language Arts (ELA) curriculum. Instead, it will be used in conjunction with the existing ELA curriculum, which is why a double block is needed for 9th and 10th grade ELA.)

7. A double English/Language Arts period for all students in 9th and 10th grades (e.g., one 60- to 100-minute period every day all year or two 45- to 59-minute periods every day all year), with half of this instructional time guided by the ECED literacy curriculum described in paragraph 6.

8. Use of the ECED math benchmarking activities in 9th and 10th grade math classes with all 9th and 10th graders – including ELL and special education students – who are expected to take the state-mandated achievement tests.

9. Creation and staffing of a Benchmark Café by the math coach.

10. Use of the Measuring What Matters classroom observation protocol throughout the participating schools. Training in the use of the protocol requires that instructional leaders conduct a minimum of ten classroom visits of approximately twenty minutes each to build a shared understanding of Engagement, Alignment, and Rigor. Use of the protocol also requires PDAs, to be purchased by the district, for a minimum of one district leader and five campus leaders per school. (Footnote: PDAs cost between $350 and $450 each and generally must be purchased new because of the requirements of the Measuring What Matters software. Specifications are available upon request.) Once training is completed, each trained user will conduct at least five 20-minute classroom observations per week using the protocol for the duration of the project.

Furthermore, (district name) commits to making the Every Classroom, Every Day activities the focus of the professional development and instructional improvement activities for 9th and 10th grade English/Language Arts and Math teachers and instructional leaders in the schools receiving instructional supports. Specifically, these teachers will not be asked to participate in any additional instructional improvement activities beyond their work with ECED. (District name) also commits to offering the ECED programs and activities only in the schools that are randomly selected to receive the supports through Spring 2011. It is understood that over the two-year course of the project (district name) cannot attempt to replicate or import the ECED activities from the schools receiving the supports to the research-only schools. It is also understood that after the project ends, (district name) can choose to offer these supports to other (district name) schools. Additionally, (district name) agrees that none of the participating schools will be part of any other research studies.

Financial Considerations and Commitments

(DISTRICT NAME) understands that two participating schools will receive two years of training, on-site supports, and technical assistance from the IRRE instructional team, along with curricular materials and technology supports, at no cost to (district name). All research costs – including the $10,000 honorarium given to each of the two research-only schools – and the costs of IRRE staff providing technical assistance to participating schools and districts will be covered by a grant to the University of Rochester from the US Department of Education.
(District's name) commits to participating in ECED for the entire two years of the project (Summer 2009 – Summer 2011) and to covering the following costs associated with implementation of ECED for both years of the project implementation:

• Salary/stipends/substitutes, as needed, for the participating teachers for the three professional development days and the four half-day trainings during the site visits, as well as for the participation of any teacher leaders in leadership trainings and site visits.
• PDA devices for at least one district-level and five school-level instructional leaders at each of the two schools receiving the instructional supports.
• A district-level point person (e.g., assistant superintendent) and one point person at each school receiving the instructional supports to coordinate all Every Classroom, Every Day activities. It is anticipated that these responsibilities will require roughly 15% time for the district-level person, plus 15% time for each school-level person.
• The ECED coaches (at least one-half FTE for math and one-half FTE for English/literacy at each school receiving instructional supports) for each of the two years of the ECED project.
• Facilities and food for professional development days, instructional leader and coach trainings, and instructional site-visits.
• If needed, reallocation of staff to support the additional period of English required for the literacy curriculum in 9th and 10th grades for both years of the project.
• In-kind time from district and school staff for overseeing teacher and student questionnaire administration, fulfilling data requests associated with data collection activities, and supporting technology associated with Measuring What Matters. (Note: as outlined in the Data Requirements Addendum, some financial assistance will be provided to offset these costs.)

Data Confidentiality

(District name) understands that the District's participation in the ECED initiative will be public information. Results of the ECED research will be included in public research presentations and reports; however, results will be reported in a manner that masks the identity of individual schools, teachers, and students in order to maintain confidentiality. Indeed, the University of Rochester research team will have no way to link student and teacher names to the information they provide. For the schools receiving the ECED supports, school-level data, disaggregated student outcome reports (when available), and teacher-level data from the EAR Classroom Visit protocols will be provided by IRRE as part of the ECED program supports. These data will be made available to school and district administrators and IRRE's technical assistance staff to monitor student progress, plan professional development and other supports, and give ongoing feedback to teachers, administrators, and IRRE staff on their own practices. For schools in the research-only condition, no individual school-, teacher-, or student-level data will be provided by IRRE. No student- or teacher-level survey responses will be released to anyone, at any school.

(name), Superintendent, (district name)
(name), Assistant Superintendent for Curriculum & Instruction, (district name)
(name), Principal, (school name)
(name), Principal, (school name)
(name), Principal, (school name)
(name), Principal, (school name)
James P. Connell, Ph.D., IRRE President
Edward Deci, Ph.D., Professor of Psychology and Gowen Professor in the Social Sciences, University of Rochester
Attachment 1
Every Classroom, Every Day Memorandum of Understanding
Data Requirements Addendum

This document outlines the data collection activities that are a required component of Every Classroom, Every Day.

Design Features

• Random assignment: Within each participating district, all of the interested and qualified schools will enter a lottery in which half of the schools will be selected to begin the Every Classroom, Every Day instructional supports in Summer 2009 and continue through the 2010-2011 academic year. Schools not selected in the lottery will participate in the research activities, continue to receive the instructional supports currently provided by the district, and receive an honorarium of $10,000 for their participation in the research. No one from the school district, the implementation team, or the research team will have any control over which schools are in which group.

• Data collection in all participating schools: Research activities will begin in the Summer of 2009 in all schools that are participating in the project (both those that are and are not receiving the Every Classroom, Every Day supports). Research activities will continue for two years, through the 2010-2011 school year.

• Ability to link student, teacher, classroom, and school information: A primary purpose of Every Classroom, Every Day's research is to understand how different aspects of this instructional improvement model affect teachers and their students. To understand these effects, code numbers will be assigned to each student and teacher that will protect the confidentiality of participants from the external research team, while also allowing researchers to link information from students and teachers to their classes and schools.

Descriptions of Research Activities

All research activities described below will take place on the same schedule in all schools that are part of the project, regardless of whether or not they are receiving the Every Classroom, Every Day supports.

• Individual Student Records: Student records will be provided at the end of each school year (total of two times), in an electronic format, for each 9th and 10th grade student enrolled in participating schools. In addition, districts will provide 8th grade records for incoming 9th graders. The specific pieces of information needed for each student, each year, are:
o demographic characteristics, including birth date, race/ethnicity, gender, and free or reduced-price lunch eligibility;
o special education and English language learner status;
o course grade for each course enrolled;
o state-mandated standardized test scores and proficiency levels in all state-mandated subject areas;
o scores on ACT Plan or PSAT, if any;
o date enrolled in school;
o days absent from/present in school;
o drop-out status or other reason for withdrawal, and date of last attendance;
o promotion or retention status (including whether the student has graduated);
o progress towards graduation (credits earned and credits and courses that are still needed for graduation); and
o number of suspension incidents and days missed due to suspension (both in- and out-of-school).

• Teacher Questionnaires: The Teacher Questionnaire measures attitudes, beliefs, and perceptions and takes about 30 minutes to complete.
The research team will work with the school/district staff to administer the questionnaires once each year (total of two times) via a secure website to all teachers of 9th and 10th grade math and language arts classes. (Footnote: If a school or district does not have sufficient computer capability to conduct on-line questionnaires, the research team will work with the school/district to make arrangements for paper and pencil administration.)

• Student Questionnaires: The Student Questionnaire measures attitudes, beliefs, and perceptions and takes about 30 minutes to complete. The research team will work with the school/district to administer the questionnaires to all 9th and 10th grade students in the fall and spring of each year (total of four times) via a secure website.

• Master Schedule: Master schedules will be used to facilitate linkages in the data among teachers and classes. The Master Schedule will be provided electronically each semester (total of four times) and should include teacher names, teacher human resource ID numbers, departmental affiliation, course ID numbers, and course names. A code number will be created for each teacher, allowing his/her course, questionnaire, and student information to be linked confidentially.

• Class Rosters: Class rosters will be used to facilitate data linkages among students, teachers, and classes. They will be provided electronically each semester (total of four times) and can include either the student code numbers that have been created for the research project or the student names and school IDs. In the latter case, IRRE will convert the names or school IDs to code numbers prior to sharing the information with the research team (an illustrative sketch of this conversion follows this list). Alternatively, schools can provide each student's course schedule electronically, including course ID numbers that correspond to the course ID numbers on the Master Schedule.

• EAR Classroom Visit Protocol: Observers who work for the research team and have been trained by a senior IRRE instructional staff member will conduct instructional ratings, using the Engagement, Alignment, and Rigor (EAR) Protocol. Each year, between six and ten observations will be conducted in all 9th and 10th grade math and language arts classes at all four schools in the project (both those that are and are not receiving the Every Classroom, Every Day supports). The EAR protocol data will be collected with a PDA and will go directly to IRRE's secure server. IRRE will provide the data to the research team only after it has been made anonymous, using the research code numbers. Additionally, the EAR Protocol data collected by school and district leaders as part of the Every Classroom, Every Day supports will be used not only by the district and schools but also by the research team to investigate differences in ratings made by individuals with different roles.

• Research Site-Visit: At least once per year, the research team will conduct site visits to all four participating schools to determine the degree to which teachers and students are or are not engaged in Every Classroom, Every Day activities. During these visits, the researchers will interview the principal or assistant principal and teachers in each of the four schools, as well as key district personnel.
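The addendum describes the code-number conversion only at this level of detail. A minimal sketch of how district-assigned IDs might be mapped to random study codes is shown below; all function and variable names are illustrative, not taken from IRRE's actual system.

```python
import secrets

def build_code_map(ids):
    """Assign each district-assigned ID a unique random study code."""
    code_map = {}
    used = set()
    for raw_id in ids:
        code = secrets.token_hex(4)  # 8-character hexadecimal code
        while code in used:          # regenerate on the (rare) collision
            code = secrets.token_hex(4)
        used.add(code)
        code_map[raw_id] = code
    return code_map

# Roster rows: (student district ID, teacher HR ID, course ID).
roster = [("S1001", "T55", "ALG1-02"), ("S1002", "T55", "ALG1-02")]
student_codes = build_code_map({row[0] for row in roster})
teacher_codes = build_code_map({row[1] for row in roster})

# De-identified roster shared with the research team: codes replace IDs.
deidentified = [(student_codes[s], teacher_codes[t], c) for s, t, c in roster]
```

Because only the holder of the code map can reverse the mapping, students and teachers can still be linked to their classes and schools while remaining confidential to the external research team.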
Parental Notification about Research Activities

IRRE will work with the schools to create and distribute a summary of the research activities (including student surveys, classroom observations, and student records) for parents. The summary sheet will include a way for parents to notify the school and research team if a parent does not want the student to participate or does not want the student's records released. All 9th and 10th grade students will be included unless the parent notifies the school that s/he should be excluded.

Financial Assistance

One person from each of the four schools will be designated as the study coordinator and will work with the research team to coordinate the EAR Classroom Visit Protocol observations and the administration of the teacher and student surveys. Each school will receive a one-time honorarium of $500 to cover the costs of the coordinator's work. Further, each district will be provided a one-time honorarium of $1,500 to cover the costs of the staff member who provides the data from the district database to the researchers.

Participant Confidentiality

The names of the districts that participate in the ECED project will be considered public information; however, information regarding which schools are in which condition will be confidential, and all research results will be reported to the public in a way that masks the identity of the schools, teachers, and students. Student and teacher ID numbers will be created for purposes of this study, and all research data will be linked using study-specific IDs only. For schools receiving the ECED supports, the following data will be made available to the schools, district administrators, and IRRE's technical assistance providers: 1) EAR Classroom Observation protocol data at the teacher level, and 2) student- and teacher-survey responses, aggregated to the school level. For schools in the research-only condition, no data will be released to the schools or district during the course of the study.

Appendix 4: Recruitment and Participation Diagram

Enrollment
  Assessed for eligibility (n = ~21,100)
  Excluded (n = ~21,080)
    Not meeting inclusion criteria:
      9th grade enrollment < 220 (n = ~14,800)
      FRPL < 30% (n = ~3,280)
      Fewer than 4 eligible schools in district (n = ~1,800)
    Initial contact made, but received no response (n = ~900)
    Declined after some additional contact (phone calls, site visit) (n = ~300)
  Randomized (n = 20)

Allocation
  Allocated to intervention (n = 10)
    Received 2 years of intervention (n = 8)
    Received 1 year of intervention (n = 1)
    Received <1 year of intervention (n = 1)
  Allocated to control (n = 10)

Follow-Up
  Intervention: lost to follow-up (n = 0); discontinued intervention (n = 2)
  Control: lost to follow-up (n = 0)

Analysis
  Intervention: analysed (n = 10); excluded from analysis (n = 0)
  Control: analysed (n = 10); excluded from analysis (n = 0)

Note: This flow chart reflects the number of schools at each step, because randomization took place at the school level. However, recruitment took place primarily at the district level and the randomization was blocked at the district level. All values in the enrollment boxes are approximate. The recruitment process was iterative and took place over several years. During that time, changes in school enrollments, etc., changed schools' eligibility.
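The note above states that randomization was blocked at the district level. The report does not include the randomization code itself; the following is a minimal, hypothetical sketch of district-blocked assignment consistent with that description (all names are illustrative).

```python
import random

def assign_within_districts(schools_by_district, seed=None):
    """Randomly split each district's schools 50/50 into intervention and control."""
    rng = random.Random(seed)
    assignment = {}
    for district, schools in schools_by_district.items():
        shuffled = list(schools)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for school in shuffled[:half]:
            assignment[school] = "intervention"
        for school in shuffled[half:]:
            assignment[school] = "control"
    return assignment

# Example: one district contributing four eligible high schools.
print(assign_within_districts({"District A": ["HS1", "HS2", "HS3", "HS4"]}, seed=1))
```

Blocking within districts guarantees that each district contributes equally to both conditions, which is why the flow chart's twenty schools split exactly 10 and 10.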
Appendix 5: Teacher Survey Items

The table below shows all teacher questionnaire items, along with the construct each was intended to measure and the waves at which it was administered. The fall (Wave 1 and Wave 3) teacher questionnaires were not part of the original data collection plan and were not part of the data collection that was described to the schools prior to their agreeing to participate. For that reason, we felt it was important that the fall questionnaire be very short (one page); it included only the 12 most important items in the Fall of 2009 and Fall of 2010. For the Fall of 2011 we decided it could be two pages in length and added individual teacher morale and demographic information.

Three constructs were included at all four waves because we believed they were the most likely moderators of changed instruction, and it was critical to have baseline information on those constructs. They were: support from school administration (3 items), support from district administration (3 items), and support and collective engagement (6 items). Most of the other constructs were measured only during the spring waves (Waves 2 and 4). They were: district/school commitment to change (4 items), confidence in change (3 items), perceived competence as a teacher (3 items), relative autonomy (8 items), amount and type of professional development received (8 items), perceived value of professional development (7 items), and implementation of ECED (12 items). The final construct, individual teacher morale (6 items), was included only in Waves 2 and 4 for schools in the first recruitment group and in Waves 2, 3, and 4 for schools in the second recruitment group. Additionally, there were 15 demographic items, but not all were asked at each wave. We also requested demographic information about teachers from the five districts, but only three (Districts 2, 3, and 4) provided the information. The questionnaire and district information were combined, resulting in 26% missing data.

Items are listed below by construct, with each item's order number in the questionnaire. ("RG1" and "RG2" refer to the first and second recruitment groups.)

Support from School Administrators (1 = Not At All True, 2 = Not Very True, 3 = Sort of True, 4 = Very True; all waves)
4. School administrators understand and respond to the teachers' needs.
7. School administrators support teachers making their own decisions about their students.
13. School administrators help teachers and staff get what they need from the district office.

Support from District Administrators (same scale; all waves)
3. District administrators are attentive to school personnel and provide the encouragement and support they need for working with students.
10. District administrators allow school staff to try educational innovations that the teachers believe would be helpful for the students.
16. District administrators are responsive to the needs of teachers and staff for professional development.

Support and Collective Engagement (same scale; all waves)
2. Teachers at this school do what is necessary to get the job done right.
6. Teachers at this school don't give up when difficulties arise.
9. Teachers at this school go beyond the call of duty to do the best job they can.
12. Teachers in this school go out of their way to help each other.
15. Teachers in this school encourage each other to do well.
18. Teachers in this school share resources with each other.

District/School Commitment to Change (1 = Not at all Committed to 7 = Very Committed, with 4 = Somewhat Committed; Waves 2 and 4 only)
22. How committed is your superintendent to strengthening the quality of instruction within your district?
23. How committed are the staff in the District Office to improving teaching and learning in the district?
24. How committed do you think the School Board is to making changes that will improve instruction and achievement in your district?
25. How committed is your principal to supporting changes in the school that will improve the quality of teaching in all classrooms?

Confidence in Change (1 = Not at all Confident to 7 = Very Confident, with 4 = Somewhat Confident; Waves 2 and 4 only)
19. How confident are you that instruction can be improved in your school to ensure that all students experience high quality teaching and learning every day?
20. How confident are you that your school is making changes that will improve the performance of your students?
29. How confident are you that your school is improving instruction in ways that can be sustained over time?

Perceived Competence (1 = Not At All True to 4 = Very True; Waves 2 and 4 only)
33. I am very confident in my abilities as a teacher.
35. I think I am a very skilled teacher.
36. I feel very competent as a teacher.

Relative Autonomy Index (1 = Not At All True to 4 = Very True; Waves 2 and 4 only)
21. The reason I am a teacher is that it's interesting and enjoyable to work with students. (intrinsic)
26. I teach because it is my job and I need the salary to live. (external)
27. I teach because it is personally important to me to help students learn and develop. (identified)
28. I teach because I think I should and would feel guilty if I didn't. (introjected)
30. I teach because it is meaningful to me to understand students and encourage their growth. (identified)
31. I am a teacher because I would feel bad about myself if I did not stick with this career. (introjected)
32. The reason I teach is that it is exciting to watch students learn. (intrinsic)
34. I teach because I feel like I have to. (external)

Amount/Type of Professional Development (0 = Was not involved, 1 = 1-5 hours, 2 = 6-10 hours, 3 = 11-15 hours, 4 = More than 15 hours; Waves 2 and 4 only)
37. How many hours have you been involved in: Workshops?
38. During this school year, how many hours have you been involved in: College Courses (face to face)?
39. During this school year, how many hours have you been involved in: Online courses/modules?
40. During this school year, how many hours have you been involved in: Conferences?
41. During this school year, how many hours have you been involved in: Coaching or mentoring by another teacher?
42. During this school year, how many hours have you been involved in: Coaching or mentoring by a specialist, administrator, or expert (not a peer)?
43. During this school year, how many hours have you been involved in: Observation of other teachers' classes?
44. During this school year, how many hours have you been involved in: Involvement in teacher study groups?

Perceived Value of Professional Development (1 = Not At All True to 4 = Very True; Waves 2 and 4 only)
45. My professional development activities: Helped me to increase student engagement in my classes.
46. My professional development activities: Helped me to better understand the subjects I teach.
47. My professional development activities: Enhanced my classroom management skills.
48. My professional development activities: Helped me to challenge and encourage all students to work at or above grade level.
49. My professional development activities: Increased my use of effective instructional strategies for improving academic achievement.
50. My professional development activities: Increased the extent to which my instruction is aligned with the course standards and curriculum.
51. My professional development activities: Are likely to have a lasting impact on my instructional practices.

Individual Teacher Morale (referred to as Individual Engagement by IRRE) (1 = Not At All True to 4 = Very True; Waves 2 and 4 for RG1; Waves 2, 3, and 4 for RG2)
1. I look forward to going to work in the morning.
5. My job has become just a matter of putting in time. (reverse)
8. When I am teaching, I feel happy.
11. Time goes by very slowly when I'm at work. (reverse)
14. When I am teaching, I feel bored. (reverse)
17. When I am teaching, I feel discouraged. (reverse)

Implementation of ECED (Waves 2 and 4 only)
52. Last semester, how often did you meet with the instructional coach and other ECED teachers for discussions about improving instruction? ('ECED' was omitted from the version used in control schools.)
53. This semester, how often do you meet with the instructional coach and other ECED teachers for discussions about improving instruction? ('ECED' was omitted from the version used in control schools.)
    Response options for 52 and 53: 1 = Never (I was part of ECED but never met with the coach to discuss instruction); 2 = Once or twice during the semester; 3 = About once per month; 4 = About once per week; 5 = More than once per week; 6 = N/A (I was not part of ECED this/last semester)
54. If you taught Algebra I or Geometry last semester, was there a chart in your classroom that displayed and tracked student progress toward mastery of all benchmarks?
55. If you are teaching Algebra I or Geometry this semester, is there a chart in your classroom that displays and tracks student progress toward mastery of all benchmarks?
    Response options for 54 and 55: 1 = Yes; 2 = No; 3 = N/A I do not teach math or did not teach Algebra I or Geometry (this/last) semester
56. How often do school administrators or instructional coaches visit your classroom to watch student learning? (Note: do not count observations done as part of your formal performance evaluation.) (1 = Never; 2 = Once or twice during the semester; 3 = About once per month; 4 = About once per week; 5 = More than once per week)
57. If school administrators or instructional coaches visit your classroom, do you receive feedback or have conversations about your instruction after the visits? (1 = Yes, always; 2 = Yes, sometimes; 3 = No, never; 4 = N/A School administrators or instructional coaches do not visit)
58. If you taught 9th grade ECED Literacy last semester, how many of the 51 lessons did you cover last semester? (Omitted from the version for the control schools.)
59. If you are teaching 9th grade ECED Literacy this semester, how many of the 51 lessons have you covered this semester, as of today?
    Response options for 58 and 59: 0 = 0; 1 = 1-5; 2 = 6-10; 3 = 11-15; 4 = 16-20; 5 = 21-25; 6 = 26-30; 7 = 31-35; 8 = 36-40; 9 = 41-45; 10 = 46-51; 11 = N/A I did not teach ECED Literacy (this/last) semester
59a. If you taught 10th grade ECED Literacy last semester, how many of the 122 lessons did you cover last semester? (Omitted from the version for the control schools.)
59b. If you are teaching 10th grade ECED Literacy this semester, how many of the 122 lessons have you covered this semester, as of today? (Omitted from the version for the control schools.)
    Response options for 59a and 59b: 0 = 0; 1 = 1-10; 2 = 11-20; 3 = 21-30; 4 = 31-40; 5 = 41-50; 6 = 51-60; 7 = 61-70; 8 = 71-80; 9 = 81-90; 10 = 91-100; 11 = 101-110; 12 = 111-122; 13 = N/A I did not teach ECED Literacy last/this semester
60. Just prior to the start of school, ECED and your district offered three days of professional development about Every Classroom, Every Day (ECED). Did you attend? (Omitted from the version for the control schools.) (1 = Yes, I attended all 3 days; 2 = Yes, I attended 2 days; 3 = Yes, I attended one day; 4 = No, I was busy or not interested; 5 = No, I had not yet been hired or did not yet know I would be part of ECED)
61. During the school year, how many times did you participate in an ECED site visit that involved working with the ECED staff and your instructional coach, while a substitute covered your class? (Omitted from the version for the control schools.) (0 = 0 times; 1 = 1 time; 2 = 2 times; 3 = 3 times; 4 = 4 times)

Demographics
62. What is your current position? (1 = Math Teacher; 2 = English or Literacy Teacher; 3 = Other classroom teacher (not Math or English/Literacy); 4 = Counselor; 5 = Administrator; 6 = Librarian/Media Specialist; 7 = School Psychologist/Speech Pathologist; 8 = Long Term Substitute-Math; 9 = Long Term Substitute-English/Literacy) (Waves 2 and 4 only)
63. Counting this year, how many academic years have you worked in your current position? (0 = No Answer; 1 = Less than 1 year; 2 = 1-2 years; 3 = 3-5 years; 4 = 6-10 years; 5 = 11-20 years; 6 = More than 20 years) (Waves 2 and 4 only)
64. Counting this year, how many years in total have you taught at this or other schools/districts? (0 = No Answer; 1 = N/A I'm not a teacher; 2 = Less than 1 year; 3 = 1-2 years; 4 = 3-5 years; 5 = 6-10 years; 6 = 11-20 years; 7 = More than 20 years) (Waves 2 and 4 for RG1; Waves 2, 3, and 4 for RG2)
65. Counting this year, how many years in total have you worked in this school? (0 = No Answer; 1 = Less than 1 year; 2 = 1-2 years; 3 = 3-5 years; 4 = 6-10 years; 5 = 11-20 years; 6 = More than 20 years) (Waves 2 and 4 only)
66. Are you...(male/female)? (Recoded – see dataset) (Waves 2 and 4 for RG1; Waves 2, 3, and 4 for RG2)
67. What grade (or grades) are you currently teaching? (Check all that apply) (Waves 2 and 4 only)
68. Are you Hispanic/Latino? (Waves 2 and 4 for RG1; Waves 2, 3, and 4 for RG2)
69. What is your race? (Waves 2 and 4 for RG1; Waves 2, 3, and 4 for RG2)
70. What is the highest level of education that you have completed? (1 = Associates or other 2-year college degree; 2 = Bachelor's or other 4-year college degree; 3 = Teaching certification program requiring at least one year of study beyond Bachelor's; 4 = Master's degree; 5 = Education specialist or other professional diploma beyond a Master's; 6 = PhD, EdD, or other doctorate) (Wave 4 only for RG1; Waves 2, 3, and 4 for RG2)
71. Are you certified in THIS state to teach Secondary English/Language Arts? (Wave 4 only for RG1; Waves 2, 3, and 4 for RG2)
72. Are you certified in THIS state to teach Secondary Mathematics? (Wave 4 only for RG1; Waves 2, 3, and 4 for RG2)
73. Are you certified in THIS state to teach any other content areas? (Wave 4 only for RG1; Waves 2 and 4 for RG2)
    Response options for 71-73: 1 = Yes, regular certification; 2 = Yes, provisional, probationary, temporary, or emergency certificate; 3 = No
74. Are you certified by the National Board for Professional Teaching Standards in at least one content area? (1 = Yes, fully certified; 2 = Working toward National Board Certification; 3 = No, not certified) (Waves 2 and 4 for RG1; Waves 2, 3, and 4 for RG2)
75. Are/were you part of a Teach for America program? (1 = Yes, I am currently teaching in this school through a Teach for America program; 2 = Yes, I was part of Teach for America when I began teaching; 3 = No, I have not been part of Teach for America) (Wave 4 only for RG1; Waves 2 and 4 for RG2)
76. Are/were you part of a program that trains non-educators to teach? (1 = Yes, I am currently part of such a program; 2 = Yes, I was part of such a program when I started teaching; 3 = No, I have not been part of such a program) (Wave 4 only for RG1; Waves 2 and 4 for RG2)

Appendix 6: EAR Protocol Training for ECED Efficacy Trial Data Collectors

Of the seven individuals who collected EAR Protocol data for ECED, six participated in one of IRRE's typical training events. Of these six, two participated in an IRRE training to learn the EAR protocol when they were employees of a school district, and their district elected to take part. Those two used the tool in their own district for several years and then later became consultants for IRRE, using the EAR Protocol in school districts across the country. After working for IRRE, they began work for ECED. Two of the six went through the typical IRRE training as they began working as consultants for IRRE. Again, they used the tool for several years as part of their IRRE work prior to becoming data collectors for ECED. The remaining two took part as they were beginning to work on an earlier project funded by this same grant regarding the tool's psychometric properties. They participated in the four-day training, along with a school district that was not part of ECED, but they were not and had never been employees of that school district.

The seventh individual who collected EAR Protocol data for ECED learned the protocol specifically to collect data for this project. She worked directly with an IRRE consultant who had led many of the four-day trainings. Her training consisted of one and one-half days focused solely on understanding the terms and scoring of the protocol. Immediately following this training, this individual took part in the "refresher training" described below.

Following the first year of data collection, five of the seven data collectors participated in a one-and-one-half-day refresher training with an IRRE consultant. Of the two who did not participate, one was no longer working for the project when the refresher took place and the other had not yet been hired. Both were individuals who had learned the tool as part of prior employment and had used it extensively for their work.

Appendix 7: Student Questionnaire Administration Procedures

Several weeks prior to each planned administration, the district or school provided a list of all enrolled 9th- and 10th-graders to the research team, as well as information about what sections (e.g., teacher and period) would be used for survey administration.
The research team created a 'ticket' for each student, which was simply a half sheet of paper that indicated the student's name, his/her survey administration section (e.g., name of English teacher and period), the survey URL, and a unique seven-character survey access code. The survey access codes were used to link student identifiers to their survey responses. Each survey access code was used just one time, and the codes were not the same as the study IDs or district-assigned IDs. Survey tickets were organized by section and distributed to the teachers who would be administering the surveys. Schools were given a set of 'extra' tickets that contained survey access codes, but no student or teacher names, for use with students who enrolled after the lists were created or if tickets were lost. Schools were asked to tell the research team the name and district-assigned ID of any student who used an extra ticket. Data were discarded for any cases where an extra ticket was used without the research team learning who used it, or if the research team learned that the student who used it was not eligible (e.g., an 11th-grader).

Teachers were instructed to take their class to a computer lab during the assigned period, read a very brief set of instructions to the students, and ensure that students were able to log on to the survey. Teachers were asked to give absent students multiple opportunities to take the survey. Students who did not want to participate were asked simply to click through to the end without answering any questions. Students whose parents had returned the opt-out form were given a ticket without a survey access code and a note indicating that the parent had requested that the student not participate.

A few schools were not able to administer the surveys on-line due to a shortage of computer lab space and/or limited internet connections. In those cases, the surveys were administered on paper and double-keyed later. When paper surveys were used, schools were provided with one survey for each target student. Each survey had a cover sheet on it that indicated the student's name and administration teacher and period. Students were asked to remove the cover after completing the survey. Once the cover was removed, the survey itself had only the student's study ID on it. (Note: The schools and waves in which the student questionnaires were administered on paper were: School 1, Waves 3 and 4; School 11, Waves 2 through 4; School 17, Waves 3 and 4; School 19, Waves 3 and 4.)
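The appendix does not describe how the seven-character access codes were generated. A minimal illustrative sketch is below; all names, and the URL, are hypothetical.

```python
import random
import string

ALPHABET = string.ascii_uppercase + string.digits

def make_access_codes(n, length=7, seed=None):
    """Generate n unique, fixed-length survey access codes."""
    rng = random.Random(seed)
    codes = set()
    while len(codes) < n:
        codes.add("".join(rng.choice(ALPHABET) for _ in range(length)))
    return sorted(codes)

# One ticket per student: name, administration section, URL, and access code.
students = [("Student A", "English, Period 2"), ("Student B", "English, Period 2")]
tickets = [
    {"name": name, "section": section,
     "url": "https://survey.example.org", "code": code}
    for (name, section), code in zip(students, make_access_codes(len(students), seed=7))
]
```

Because each code is single-use and distinct from both the study ID and the district-assigned ID, a leaked ticket by itself cannot be linked to a student's survey responses or records.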
Appendix 8: Student Questionnaire Items

The table below lists all items in the student questionnaire and the construct each was intended to measure. As noted in the text, factor analyses conducted for this study resulted in a slightly different set of scales. In addition to gathering demographic information (9 items), the questions were intended to measure seven constructs: teacher support (8 items), student engagement (8 items), peer support (3 items), teacher expectations (4 items), rigor (4 items), relative autonomy (8 items), and perceived competence (4 items). (Note that the demographic questions were not included in the student questionnaires in District 5. That district's policies only permitted questions that were directly about school. The district did, however, provide demographic information about the study students from the student records.)

Students responded to all items except the demographic and rigor items using the following scale: 1 = Not at all true, 2 = Not very true, 3 = Sort of true, 4 = Very true. The response scale for the Rigor items was: 1 = Almost Never, 2 = Not Very Often, 3 = Most of the Time, 4 = Almost Always.

Student Questionnaire Items (listed by construct, with each item's question number)

Teacher Support
3. My teachers care about how I do in school.
8. My teachers like to be with me.
14. My teachers like the other kids in my class better than me. (reverse)
18. My teachers interrupt me when I have something to say. (reverse)
23. My teachers are fair with me.
26. My teachers don't make clear what they expect of me in school. (reverse)
31. My teachers are not fair with me. (reverse)
34. My teachers' expectations for me are not realistic. (reverse)

Student Engagement
5. It is important to me to do the best I can in school.
7. I work very hard on my schoolwork.
10. I often come to class unprepared. (reverse)
13. When I'm doing a class assignment or homework, it's not clear to me what I'm supposed to be learning. (reverse)
17. I don't try very hard in school. (reverse)
20. I pay attention in class.
24. A lot of the time I am bored in class. (reverse)
35. When I'm doing a class assignment or homework, I understand why I'm doing it.

Peer Support
21. Students in my school get to know each other well.
28. Students in my school show respect for each other.
32. In my school, the students push each other to do well.

Teacher Expectations
1. My teachers show us examples of the kinds of work that can earn us good grades.
6. My teachers make it clear what kind of work is expected from students to get a good grade.
12. My teachers expect all students to do their best work all the time.
16. My teachers expect all students to come to class prepared.

Rigor
36. I am asked to fully explain my answers to my teachers' questions.
37. Our classroom assignments and homework make me think hard about what I'm learning.
38. I know what kind of work it takes to get an A in my classes.
39. My teachers make sure I understand before we move on to the next topic.

Relative Autonomy
2. I do my homework because I would get in trouble if I didn't. (external)
4. I do my homework because I want the teachers to think I'm a good student. (introjected)
11. I do my homework because it's fun. (intrinsic)
15. I do my homework because it's important for me to do my homework. (identified)
19. I do my schoolwork because that's the rule. (external)
22. I do my schoolwork because I really want to understand the subjects we are studying. (identified)
27. I do my schoolwork because I would feel bad about myself if I didn't do it. (introjected)
29. I do my schoolwork because I enjoy doing it. (intrinsic)

Perceived Competence
9. I am capable of learning the material we are being taught at school.
25. I feel able to do my schoolwork.
30. I feel confident in my ability to learn at school.
33. I feel good about how well I do at school.

Demographics
40. How old are you?
41. Are you male or female?
42. Are you Hispanic or Latino?
43. What is your race?
44. What grade are you in this year?
45. What language are you most comfortable speaking?
46. Are you enrolled in special education classes this year?
47. Do you get free or reduced price lunches this year?
48. During this school year, did you have any suspensions, in-school or out of school? If so, how many total days have you been suspended this school year? (The days follow-up was only asked in Wave 4 for Recruitment Group 1 and Waves 2, 3, and 4 for Recruitment Group 2.)
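The report does not reproduce its scoring code. As an illustration of how the reverse-keyed items above would typically be handled on the 1-4 scale (flip as 5 minus the response, then average), a minimal sketch with illustrative names:

```python
def score_scale(responses, reverse_items, scale_max=4):
    """Average a student's item responses after flipping reverse-keyed items.

    responses: dict mapping item number -> response (1..scale_max)
    reverse_items: item numbers that are reverse-keyed
    """
    adjusted = [
        (scale_max + 1 - value) if item in reverse_items else value
        for item, value in responses.items()
    ]
    return sum(adjusted) / len(adjusted)

# Teacher Support scale: items 14, 18, 26, 31, and 34 are reverse-keyed.
example = {3: 4, 8: 3, 14: 2, 18: 1, 23: 4, 26: 2, 31: 1, 34: 2}
print(score_scale(example, reverse_items={14, 18, 26, 31, 34}))  # -> 3.5
```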
Appendix 9: Restructuring the Course Files for Use in Analysis

Because the main purpose of these files was to link students to math and ELA teachers and periods/blocks, the data sets were restructured so that for each student, at each wave, up to eight courses were described: (1) regular (9th- or 10th-grade) English, (2) ECED Literacy, (3) first other English, (4) second other English, (5) Algebra 1, (6) Geometry, (7) first other math, and (8) second other math. First and second 'other English' were any classes other than regular (9th-/10th-grade) English or ECED Literacy for which a student could receive English credit. Typically these were support classes for English language learners and remedial English courses, but they also included English electives such as creative writing. In math, first and second 'other math' included any math class other than the two courses targeted by ECED, Algebra 1 and Geometry. Therefore, other math included both lower-level remedial math courses and more advanced math courses like Algebra 2.

For each of the eight courses, six variables were created: (1) teacher ID, (2) period (e.g., 1st, 2nd, 3rd), (3) period ID (which uniquely identifies each combination of teacher and period), (4) course name (as provided by the district), (5) course ID (as provided by the district), and (6) final grade. When a student did not take a particular course, the variables describing that course were given a specific missing code to indicate that the student was not enrolled (995).

Additionally, for each wave and year, variables indicating the total number of math and English courses each student had were calculated. In order that these variables could be directly compared across schools, they were weighted by the length of the course. A course that met for a single period each day in a school on a traditional schedule was weighted as 1. These courses met between 45 and 55 minutes daily. Likewise, a course that met every other day in a school on an AB Block was also weighted 1. Blocks are typically 90 minutes in length, so this every-other-day block schedule is roughly equivalent to 45-55 minutes daily. A course that met every day for a block (e.g., 90 minutes) was weighted as 2 because that is roughly twice as much time as a course meeting daily on a traditional schedule. Courses that met less frequently (e.g., one 45-minute period every other day) were weighted accordingly (e.g., 0.5). (Note that occasionally a student had more than two 'other' English or math courses. Those 'others' were included in the count variables, but are not described in the data set.)
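The course-length weighting above is described in prose only. A minimal sketch of the weighting rules follows; the 80-minute threshold separating blocks from periods is our illustrative choice, not a cutoff given in the report.

```python
def course_weight(minutes_per_meeting, meets_daily):
    """Approximate the course-length weights described above.

    A daily 45-55 minute period or an every-other-day 90-minute block
    counts as 1; a daily block counts as 2; a short every-other-day
    period counts as 0.5.
    """
    if meets_daily and minutes_per_meeting >= 80:
        return 2.0   # e.g., a 90-minute block every day
    if meets_daily or minutes_per_meeting >= 80:
        return 1.0   # daily period, or A/B block every other day
    return 0.5       # e.g., one 45-minute period every other day

print(course_weight(50, meets_daily=True))    # traditional schedule -> 1.0
print(course_weight(90, meets_daily=False))   # A/B block -> 1.0
print(course_weight(90, meets_daily=True))    # daily block -> 2.0
print(course_weight(45, meets_daily=False))   # half weight -> 0.5
```

Summing these weights over a student's English (or math) courses yields the schedule-comparable course-count variables described above.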
Appendix 10: Test Scores Received for Grade Cohort 1

This table shows the types of tests for which we received scores in each district, as well as the number and percentage of scores received for students in Grade Cohort 1. The last entry in each cell presents the total number and percentage of Grade Cohort 1 students with a usable test score in each district and year, prior to imputation, after combining using the rules described in the text.

District 1 (CA), n = 2628
Baseline Math: GL 7th '08: 1928 (73%); Gen. Math '08: 2 (<1%); Gen. Math '09: 819 (31%); GL 8th '09: 8 (<1%); Alg 1 '08: 7 (<1%); Alg 1 '09: 1240 (47%); Geo '09: 10 (<1%); CBL_math: 2159 (82%)
Baseline ELA: 7th CST '08: 1937 (73%); 8th CST '09: 2101 (79%); CBL_ELA: 2165 (82%)
Year 1 Math: Gen Math '10: 16 (<1%); Alg 1 '10: 1580 (60%); Geo '10: 437 (16%); Alg 2 '10: 15 (<1%); Int Math 1 '10: 1 (<1%); CY1_math: 2039 (78%)
Year 1 ELA: 9th CST '10: 2148 (82%); CY1_ELA: 2148 (82%)
Year 2 Math: Alg 1 '11: 634 (24%); Geo '11: 877 (34%); Alg 2 '11: 339 (13%); HS Math '11: 23 (1%); CAHSEE '11: 2059 (78%); CY2_math: 1863 (71%)
Year 2 ELA: 10th CST '11: 1985 (75%); CAHSEE '11: 2079 (78%); CY2_ELA: 1974 (75%)

District 2 (CA), n = 1590
Baseline Math: GL 7th '08: 1032 (65%); Gen. Math '08: 2 (<1%); Gen. Math '09: 598 (38%); GL 8th '09: 2 (<1%); Alg 1 '08: 94 (6%); Alg 1 '09: 486 (31%); Geo '09: 75 (5%); Alg 2 '09: 1 (<1%); CBL_math: 1225 (77%)
Baseline ELA: 7th CST '08: 1147 (72%); 8th CST '09: 1150 (72%); CBL_ELA: 1230 (77%)
Year 1 Math: Gen Math '10: 8 (<1%); Alg 1 '10: 863 (54%); Geo '10: 278 (17%); Alg 2 '10: 64 (4%); HS Math '10: 1 (<1%); CY1_math: 1213 (76%)
Year 1 ELA: 9th CST '10: 1214 (76%); CY1_ELA: 1214 (76%)
Year 2 Math: Alg 1 '11: 229 (14%); Geo '11: 502 (32%); Alg 2 '11: 257 (16%); HS Math '11: 56 (3%); Int Math 1 '11: 118 (7%); Int Math 2 '11: 1 (<1%); CAHSEE '11: 1295 (81%); CY2_math: 1160 (73%)
Year 2 ELA: 10th CST '11: 1207 (76%); CAHSEE '11: 1300 (82%); CY2_ELA: 1207 (76%)

District 3 (TN), n = 1264
Baseline Math: 7th '09: 713 (56%); 7th '10: 10 (<1%); 8th '09: 96 (8%); 8th '10: 778 (62%); Alg 1 '10: 54 (4%); CBL_math: 905 (72%)
Baseline ELA: 7th Rdg '09: 714 (56%); 7th Rdg '10: 10 (<1%); 8th Rdg '09: 96 (8%); 8th Rdg '10: 774 (61%); 8th Wrt '09: 103 (8%); 8th Wrt '10: 753 (60%); 9th ELA '10: 72 (6%); 10th ELA '10: 7 (<1%); CBL_ELA: 913 (72%)
Year 1 Math: Alg 1 '11: 641 (51%); Geo '11: 78 (6%); Alg 2 '11: 12 (1%); CY1_math: 720 (57%)
Year 1 ELA: 9th ELA '11: 804 (64%); 10th ELA '11: 53 (4%); CY1_ELA: 848 (67%)
Year 2 Math: Alg 1 '12: 157 (12%); Geo '12: 504 (40%); Alg 2 '12: 154 (12%); CY2_math: 766 (61%)
Year 2 ELA: 9th ELA '12: 84 (7%); 10th ELA '12: 676 (53%); CY2_ELA: 736 (58%)

District 4 (AZ), n = 2222
Baseline Math: 7th '08: 1 (<1%); 7th '09: 605 (27%); 8th '09: 1 (<1%); 8th '10: 735 (33%); CBL_math: 1329 (60%)
Baseline ELA: 7th Rdg '08: 1 (<1%); 7th Rdg '09: 602 (27%); 7th Wrt '08: 1 (<1%); 7th Wrt '09: 604 (27%); 8th Rdg '10: 735 (33%); CBL_ELA: 1329 (60%)
Year 1 Math: Stan '11: 1530 (69%); CY1_math: 1530 (69%)
Year 1 ELA: Stan Lang '11: 1490 (67%); Stan Rdg '11: 1528 (69%); 8th Rdg '11: 1 (<1%); CY1_ELA: 1530 (69%)
Year 2 Math: Exit '12: 1582 (71%); CY2_math: 1582 (71%)
Year 2 ELA: Exit Rdg '12: 1610 (72%); Exit Wrt '12: 1607 (72%); CY2_ELA: 1613 (73%)

District 5 (NY), n = 729
Baseline Math: 7th '09: 361 (50%); 8th '09: 54 (7%); 8th '10: 421 (58%); Alg1 '10: 105 (14%); Geo '10: 2 (<1%); CBL_math: 519 (71%)
Baseline ELA: 7th '09: 364 (50%); 8th '09: 54 (7%); 8th '10: 425 (58%); CBL_ELA: 507 (70%)
Year 1 Math: Alg1 '11: 370 (51%); Geo '11: 29 (4%); Alg2 '11: 1 (<1%); GM '11: 3 (<1%); CY1_math: 392 (54%)
Year 1 ELA: Rgnt ELA '11: 5 (<1%); CY1_ELA: 8 (1%)
Year 2 Math: Alg1 '12: 226 (31%); Geo '12: 131 (18%); Alg2 '12: 17 (2%); GM: 199 (27%); CY2_math: 359 (49%)
Year 2 ELA: Rgnt ELA '12: 35 (5%); CY2_ELA: 232 (32%)

Appendix 11: State-Specific Decisions Regarding Combining Test Scores

In addition to the general rules we formulated for handling data at different grade levels across districts, we also had to make state-specific decisions in order to combine achievement scores across states.

Districts 1 and 2 (CA). Starting in 10th grade, students in Districts 1 and 2 took the California High School Exit Exam (CAHSEE) in English and math, in addition to the CST. If they did not pass the CAHSEE in 10th grade, they continued to take the test (up to eight additional times) until they passed. Passing the CAHSEE is a requirement of graduation in California. For ECED, we requested scores only for the first time each student took the tests. However, in the end we decided to omit CAHSEE scores altogether. Almost all students with a CAHSEE score also had a CST score, so omitting those did not appreciably increase the amount of missing data. CAHSEE is a very different testing system from the CST, with a different purpose and content. Including CAHSEE would have been the only instance of combining two entirely separate testing systems to create a student's score.

District 3 (TN). This state entirely revamped its testing system between 2009 and 2010, so 2009 test scores were standardized separately from 2010 and later, even when the tests had the same name. Additionally, for ELA at baseline, students in this state took a reading test in the 7th grade and separate reading and writing tests in the 8th grade. In order to avoid giving twice as much weight to the 8th-grade tests as to the 7th-grade tests, we first averaged the 8th-grade reading and writing tests together and then averaged those scores with whatever other tests the student had (typically 7th-grade reading, but occasionally 9th- or even 10th-grade ELA).
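To make this averaging rule concrete, a small illustrative computation (the function name and the values, which stand in for standardized scores, are hypothetical):

```python
def baseline_ela(reading_8th=None, writing_8th=None, other_scores=()):
    """Average 8th-grade reading and writing first, then average that
    result with any other ELA scores, so the two 8th-grade tests
    together count as one test."""
    components = list(other_scores)
    grade8 = [s for s in (reading_8th, writing_8th) if s is not None]
    if grade8:
        components.append(sum(grade8) / len(grade8))
    return sum(components) / len(components) if components else None

# Reading (0.40) and writing (0.20) average to 0.30, which is then
# averaged with 7th-grade reading (0.50), giving 0.40.
print(baseline_ela(reading_8th=0.40, writing_8th=0.20, other_scores=[0.50]))
```

A simple mean over all three tests would instead give (0.40 + 0.20 + 0.50) / 3 = 0.37, silently counting the 8th-grade tests twice as heavily as the 7th-grade test.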
Districts 1 and 2 (CA). Starting in 10th-grade, students in Districts 1 and 2 took the California High School Exit Exam (CAHSEE) in English and math, in addition to the CST. If they did not pass the CAHSEE in 10th-grade, they continued to take the test (up to eight additional times) until they passed. Passing the CAHSEE is a requirement of graduation in California. For ECED, we requested scores only for the first time each student took the tests. However, in the end we decided to omit California High School Exit Exam (CAHSEE) scores all together. Almost all students with a CAHSEE score also had a CST score, so omitting those did not appreciably increase the amount of missing data. CAHSEE is a very different testing system from the CST, with a different purpose and content. Including CAHSEE would be the only time that we were including two entirely separate testing systems to create a student’s score. Districts 3 (TN). This state entirely revamped their testing system between 2009 and 2010. So, 2009 test scores were standardized separately from 2010 and later, even when the tests had the same name. Additionally, for ELA at baseline, students in this state took a reading test in the 7th-grade and separate reading and writing tests in the 8th-grade. In order to avoid giving twice as much weight to the 8th-grade tests as to the 7th-grade tests, we first averaged the 8thgrade reading and writing tests together and then averaged those scores with whatever other tests the student had (typically 7th-grade reading, but occasionally 9th- or even 10th-grade ELA). Districts 4 (AZ). For ELA at baseline, students in this state took separate reading and writing test in the 7th-grade and only a reading test in the 8th-grade. In order to avoid giving twice 259 as much weight to the 7th-grade tests as to the 8th-grade tests, we first averaged the 7th-grade reading and writing tests together and then averaged those scores with the 8th-grade reading score. District 5 (NY). For Grade Cohort 1, only 8 out of 729 students had an ELA score. We imputed the missing data and included this district in the impact analyses. However, given the large amount of error produced when imputing such a large amount of missing data, we ran sensitivity analyses in which we excluded the district. 260 Appendix 12: Indicators of Variation in Implementation Indicator E1 E2 E3 E4 E5 English/Language Arts All 9th- and 10th-graders are enrolled in ECED Literacy, including English language learners, special education and honors (excluding only new immigrants ["newcomers"] and students with profound disabilities). Each student is enrolled in ECED Literacy in minimum 135 clock hours per year (45 minutes per day, 180 days per year). In the first year of implementation, all 41 lessons in the first 3 units from a single strand of the ECED Literacy Curriculum are covered for both 9th- and 10th-graders. (Note: The 10 lessons of unit 4 are optional). In the 2nd year of implementation, 9th-graders again receive all 41 lessons of Year 1 curriculum, and 10th-graders receive the 97 lessons of the first 3 units of the Year 2 curriculum. (Note: the 25 lessons of the Year 2 unit 4 are optional). In both years in both grades, three mid-unit, and three end-of-unit assessment are administered. All 9th- and 10th-graders are enrolled in regular ELA course, including ELL, special education, and honors. Each student enrolled in minimum of 135 clock hours per year (45 minutes per day, 180 days per year). 
Data Source Weight District Records 7 District Records 7 Teacher Questionnaires & ELA Coach/Chair Interviews 8 District Records 4 District Records 4 11 E6 All ECED Literacy teachers participate in 3 days of professional development prior to the start of school each year, focused on using the ECED Literacy Curriculum. Teacher Questionnaires & ELA Coach/Chair Interviews E7 All ECED Literacy teachers participate in four one-half days of PD across the school year with ECED staff, during which they practice instructional strategies and discuss challenges. Teacher Questionnaires & ELA Coach/Chair Interviews 11 ELA Coach/Chair Interviews 8 ELA Coach/Chair Interviews 10 E8 E9 There are regularly scheduled weekly meetings between the literacy coach and all ECED Literacy teachers, focused on instruction. Most teachers attend most meetings. The time is used to discuss emerging issues around use of the curriculum, including but not limited to reflection on lessons taught, preview and modeling of upcoming lessons, and discussion of modification for struggling students. There is a literacy coach for whom a minimum .50 FTE is dedicated to helping ECED Literacy teachers use the curriculum, conducting model lessons, and increasing Engagement, Alignment, and Rigor (EAR) in ECED Literacy courses. Position lasts for entire school year. 261 E10 E11 E12 M1 M2 M3 M4 M5 Indicator Literacy coach participates in the 3 days of professional development focused on using the ECED Literacy Curriculum, along with the ECED Literacy teachers prior to the start of school. Literacy coaches also participate in 6 hours of additional PD embedded in those 3 days (i.e., starting earlier, staying later) focused on learning the expectations of the role, the skills necessary to carry out the role, and how to engage others to build capacity and further develop collaborative instructional improvement. Literacy coach participates in four 3-day site visits from ECED team, focused on conducting EAR classroom visits, working directly with ECED Literacy Teachers, and one-on-one coaching with ECED teachers. Literacy coach participates in conference calls with ECED staff focused on emerging issues. Total of 14 calls per year (2 per month during the 5 months when there is no site visit; 1 per month during the 4 months when there is a site visit). Math (Algebra I and Geometry) Throughout the school year, lessons/units in Algebra I and Geometry classes are organized around benchmarks based on the state standards in the form of "I Can" statement. In Algebra I and Geometry classes, short, focused, benchmark assessments, created by the teachers as a content team and based on state standards, are given at the end of each lesson/benchmark completion. Students must successfully answer 80% of the questions on two benchmark assessments to be given credit for mastery of that benchmark. (As noted below, course grades are based on number of benchmarks mastered). Capstone assessments are given at the end of a series of benchmarks that test application of a group of related concepts. These assessments are created together by the content team and are based on state standards. The format of the questions on the capstone assessments is similar to that on high-stakes tests. Student grades are assigned based solely on the proportion of benchmarks mastered during the grading period (cut-points select by school teams/district). Students receive either a grade or an 'incomplete' for each grading period. 
Fs are not assigned until all opportunities for re-learning have been exhausted (i.e., end of summer school). Ds are never assigned. School teams/district determines what proportion of all benchmarks must be mastered in order to receive course credit. Each Algebra I and Geometry class has a chart that publically displays and tracks which benchmarks each student has mastered to date (80% correct on two occasions). Students have a record of which benchmarks they have mastered. Parents have been told about the benchmarks and how to read their child's benchmark record. Data Source Weight ELA Coach/Chair Interviews 10 ELA Coach/Chair Interviews 10 ELA Coach/Chair Interviews 10 Math Coach/Chair Interviews 7.14 Math Coach/Chair Interviews 7.14 Math Coach/Chair Interviews 7.14 Math Coach/Chair Interviews 7.14 Math Coach/Chair Interviews 7.14 262 M6 M7 Indicator Benchmark Café is open for students to receive extra help and re-take benchmark assessments daily before, during, and after school, throughout the school year. It is organized by the math coach and staffed by teachers, students, volunteers, and others deemed qualified by the coach. During the summer, there are opportunities for students to re-learn and re-test benchmarks that they did not master during the school year. Benchmarks mastered during the summer count towards the students' final grade. M8 All Algebra I and Geometry teachers participate in 3 days of professional development prior to the start of school each year, focused on creating "I Can" statements, pacing guides, and benchmark assessments. M9 All Algebra I and Geometry teachers participate in four one-half days of PD across the school year with ECED staff, during which they practice instructional strategies and discuss challenges. M10 M11 M12 M13 There are regularly scheduled weekly meetings between the math coach and all Algebra I/Geometry teachers, focused on instruction. Most teachers attend most meetings. The time is used to discuss emerging issues around use of the curriculum, including but not limited to reflection on lessons taught, preview and modeling of upcoming lessons, and discussion of modification for struggling students. There is a math coach for whom a minimum of .50 FTE is dedicated to helping Algebra 1 and Geometry teachers use ECED math strategies (e.g., benchmarking, I Can statements), increasing EAR in Algebra 1 and Geometry classes, conducting model lesson, and working in and organizing the Benchmark Café. Position lasts for entire school year. Math coach participates in the 3 days of professional development focused on creating "I Can" statements, pacing guides, and benchmark assessments, along with the Algebra 1 and Geometry teachers, prior to the start of school. Math coaches also participate in 6 hours of additional PD embedded in those 3 days (i.e., starting earlier, staying later) focused on learning the expectations of the role, the skills necessary to carry out the role, and how to engage others to build capacity and further develop collaborative instructional improvement. Math coach participates in four 3-day site visits from ECED staff, focused on conducting EAR classroom visits, working directly with Algebra 1 and Geometry teachers, and one-on-one coaching with teachers. 
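The implementation analyses described in Section IV combine these indicators using the weights above; the exact scoring procedure is not reproduced here. A minimal sketch of a weighted implementation index, under the assumption that each indicator is rated on a 0-1 scale, is shown below (names and ratings are illustrative):

```python
def implementation_index(ratings, weights):
    """Weighted implementation index in [0, 1].

    ratings: dict mapping indicator -> rating in [0, 1]
    weights: dict mapping indicator -> weight (sums to 100 per domain)
    """
    total_weight = sum(weights[k] for k in ratings)
    return sum(ratings[k] * weights[k] for k in ratings) / total_weight

ela_weights = {"E1": 7, "E2": 7, "E3": 8, "E4": 4, "E5": 4, "E6": 11,
               "E7": 11, "E8": 8, "E9": 10, "E10": 10, "E11": 10, "E12": 10}
ela_ratings = {k: 1.0 for k in ela_weights}   # hypothetical: fully implemented
ela_ratings["E3"] = 0.5                        # hypothetical partial coverage
print(implementation_index(ela_ratings, ela_weights))  # -> 0.96
```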
M14. Math coach participates in conference calls with ECED staff focused on emerging issues. Total of 14 calls per year (2 per month during the 5 months when there is no site visit; 1 per month during the 4 months when there is a site visit). (Data source: Math Coach/Chair Interviews. Weight: 7.14)

EAR Protocol

P1. Math and literacy coach, plus three other instructional leaders, participate in four days of training on use of the EAR Protocol. (Data source: Math & ELA Coach/Chair Interviews. Weight: 25)

P2. Math and literacy coach, plus three other instructional leaders, conduct 10 practice visits in groups, between EAR Protocol training days 1-2 and days 3-4, followed by debriefing, to build shared understanding. (Data source: EAR Protocol Database. Weight: 25)

P3. Each trained EAR observer conducts 5 EAR visits per week, once training is complete, and uploads data to the server (total of 140 per observer per year, assuming 28 weeks). (Data source: EAR Protocol Database. Weight: 25)

P4. EAR classroom visits are used as a non-evaluative tool to give data to coaches and instructional leaders for reflective coaching conversations with teachers; to allow coaches and instructional leaders to see trends; and to make professional development decisions specific to individuals and groups of teachers around EAR. (Data source: Math Coach/Chair, ELA Coach/Chair & Principal/AP Interviews. Weight: 25)
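The weights above sum to 100 within each strand (the twelve ELA indicators; the fourteen math indicators at 7.14 each; the four EAR Protocol indicators at 25 each), so a strand's implementation score can be read on a 0-100 scale. The report does not publish its scoring code; the following is a minimal sketch of a weighted-sum computation, assuming each indicator is rated as a 0-1 proportion implemented and that the column values carried over from the previous page belong to indicators E1-E5. All names in the sketch are hypothetical.

# Minimal sketch (not the project's actual scoring code) of combining the
# indicator weights above into a 0-100 strand score, assuming each
# indicator is rated as a 0-1 proportion implemented.

ELA_WEIGHTS = {
    "E1": 7, "E2": 7, "E3": 8, "E4": 4, "E5": 4,  # assumed from carried-over values
    "E6": 11, "E7": 11, "E8": 8, "E9": 10, "E10": 10, "E11": 10, "E12": 10,
}
MATH_WEIGHTS = {f"M{i}": 7.14 for i in range(1, 15)}  # 14 indicators, ~100 total
EAR_WEIGHTS = {"P1": 25, "P2": 25, "P3": 25, "P4": 25}

def strand_score(ratings, weights):
    """Weighted sum of 0-1 implementation ratings; missing ratings count as 0."""
    return sum(w * ratings.get(k, 0.0) for k, w in weights.items())

# Example: full implementation except half-implemented weekly meetings (E8).
ratings = {k: 1.0 for k in ELA_WEIGHTS}
ratings["E8"] = 0.5
print(strand_score(ratings, ELA_WEIGHTS))  # 96.0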
Appendix 13: Teacher-Level Correlations Among Outcome Variables

[Two-part matrix of pairwise correlations among the following 43 teacher-level outcomes.] Fall Year 1: (1) teacher collective commitment, (2) teacher mutual support, (3) support from district, (4) support from administration, (5) engagement, (6) alignment, (7) rigor. Spring Year 1: (8) perception of administrative support for instructional innovation and improvement, (9) teacher collective commitment, (10) teacher mutual support, (11) support from district, (12) support from administration, (13) commitment to change, (14) confidence in change, (15) individual teacher morale, (16) professional development, (17) perceived competence, (18) RAI, (19) engagement, (20) alignment, (21) rigor. Fall Year 2: (22) teacher collective commitment, (23) teacher mutual support, (24) support from district, (25) support from administration, (26) individual teacher morale, (27) engagement, (28) alignment, (29) rigor. Spring Year 2: (30) perception of administrative support for instructional innovation and improvement, (31) teacher collective commitment, (32) teacher mutual support, (33) support from district, (34) support from administration, (35) commitment to change, (36) confidence in change, (37) individual teacher morale, (38) professional development, (39) perceived competence, (40) RAI, (41) engagement, (42) alignment, (43) rigor.

Note. **p < .01, *p < .05. Correlations between EAR observation outcomes (i.e., engagement, alignment, and rigor) and all other variables were conducted on non-imputed data; all other correlations are pooled correlations using 5 multiply imputed datasets.

Appendix 14: Student-Level Correlations Among Outcome Variables

[Two-part matrix of pairwise correlations among the following 36 student-level outcomes.] Fall Year 1/pre-baseline: (1) attitudes towards school, (2) positive teacher support, (3) lack of teacher support, (4) engagement, (5) perceived competence, (6) RAI, (7) ELA achievement, (8) math achievement. Spring Year 1: (9) attitudes towards school, (10) positive teacher support, (11) lack of teacher support, (12) engagement, (13) perceived competence, (14) RAI, (15) ELA achievement, (16) math achievement, (17) grade point average, (18) credits earned, (19) attendance. Fall Year 2: (20) attitudes towards school, (21) positive teacher support, (22) lack of teacher support, (23) engagement, (24) perceived competence, (25) RAI. Spring Year 2: (26) attitudes towards school, (27) positive teacher support, (28) lack of teacher support, (29) engagement, (30) perceived competence, (31) RAI, (32) ELA achievement, (33) math achievement, (34) grade point average, (35) credits earned, (36) attendance.

Note. **p < .01, *p < .05.

Appendix 15: School-Level Correlations Among Outcome Variables

[Six-part matrix of pairwise correlations among treatment status and the following school-level outcomes.] (1) Treatment status. Fall Year 1/pre-baseline: (2) attitudes towards school, (3) positive teacher support, (4) lack of teacher support, (5) engagement, (6) perceived competence, (7) RAI, (8) ELA achievement, (9) math achievement, (10) teacher collective commitment, (11) teacher mutual support, (12) support from district, (13) support from administration, (14) engagement, (15) alignment, (16) rigor. Spring Year 1: (17) attitudes towards school, (18) positive teacher support, (19) lack of teacher support, (20) engagement, (21) perceived competence, (22) RAI, (23) ELA achievement, (24) math achievement, (25) grade point average, (26) credits earned, (27) attendance, (28) perception of administrative support for instructional innovation and improvement, (29) teacher collective commitment, (30) teacher mutual support, (31) support from district, (32) support from administration, (33) commitment to change, (34) confidence in change, (35) individual teacher morale, (36) professional development, (37) perceived competence, (38) RAI, (39) engagement, (40) alignment, (41) rigor. Fall Year 2: (42) attitudes towards school, (43) positive teacher support, (44) lack of teacher support, (45) engagement, (46) perceived competence, (47) RAI, (48) teacher collective commitment, (49) teacher mutual support, (50) support from district, (51) support from administration, (52) individual teacher morale, (53) engagement, (54) alignment, (55) rigor. Spring Year 2: (56) attitudes towards school, (57) positive teacher support, (58) lack of teacher support, (59) engagement, (60) perceived competence, (61) RAI, (62) ELA achievement, (63) math achievement, (64) grade point average, (65) credits earned, (66) attendance, (67) perception of administrative support for instructional innovation and improvement, (68) teacher collective commitment, (69) teacher mutual support, (70) support from district, (71) support from administration, (72) commitment to change, (73) confidence in change, (74) individual teacher morale, (75) professional development, (76) perceived competence, (77) RAI, (78) engagement, (79) alignment, (80) rigor.

Note. **p < .01, *p < .05; N = 20. Correlations are based on school-aggregated student and teacher data. Correlations between EAR observation outcomes (i.e., engagement, alignment, and rigor) and all other variables were conducted on non-imputed data; all other correlations are based on school aggregates of multiply imputed data.
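The notes to Appendices 13-15 state that most correlations were pooled across five multiply imputed datasets. The report does not specify its pooling method; the sketch below shows one standard approach (Fisher z-transformation before averaging across imputations, then back-transformation), with hypothetical imputation-specific estimates.

import numpy as np

def pool_correlation(rs):
    """Pool one correlation estimated in each of several imputed datasets.

    One standard approach (not necessarily the report's): apply the Fisher
    z-transformation, average the z values across imputations, and
    back-transform the pooled z to a correlation.
    """
    zs = np.arctanh(np.asarray(rs, dtype=float))  # Fisher z-transform
    return float(np.tanh(zs.mean()))              # back-transform pooled z

# Five hypothetical imputation-specific estimates of the same correlation:
print(round(pool_correlation([0.41, 0.44, 0.39, 0.45, 0.42]), 2))  # 0.42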
Appendix 16: Child-Level Interactions

There were 12 main student outcomes tested: (1) Y1 students' attitudes, (2) Y2 students' attitudes, (3) Y1 math achievement, (4) Y2 math achievement, (5) Y1 ELA achievement, (6) Y2 ELA achievement, (7) Y1 GPA, (8) Y2 GPA, (9) Y1 credits earned, (10) Y2 credits earned, (11) Y1 attendance, and (12) Y2 attendance. For each, the sixth model added eleven interactions: (1) treatment X baseline, (2) treatment X gender, (3) treatment X Hispanic, (4) treatment X Black, (5) treatment X Asian/Pacific Islander, (6) treatment X American Indian/Other, (7) treatment X free/reduced price lunch, (8) treatment X special education, (9) treatment X ELL, (10) treatment X math test type where applicable, and (11) treatment X semesters enrolled. For the first ten interactions, no interpretable pattern appeared. The table below shows all interactions that were marginal or significant; those not tabled had p values higher than .10. When the treatment X semesters enrolled interaction was significant, it is presented in the main body of the text.

Note: These are fixed-effect models; random effects for these interactions were not included.

[Table: marginal and significant treatment interactions, shown as coefficients with standard errors (t p < .10, *p < .05, **p < .01, ***p < .001), for Y1 and Y2 students' attitudes, Y1 and Y2 math achievement, Y1 and Y2 GPA, and Y1 credits earned. The tabled interactions were treatment X baseline, treatment X Hispanic, treatment X Asian/Pacific Islander, treatment X special education, and treatment X math test type from the point-in-time models; treatment X ELL from the growth models; and treatment X baseline and treatment X Hispanic from the implementation models. Some outcome-by-interaction combinations were not tested.]
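For readers who want a concrete picture of these specifications, the sketch below shows a treatment-by-covariate interaction model of the general kind described above. It is illustrative only: the data file and variable names are hypothetical, and the actual analyses were multilevel models run in HLM 6 on multiply imputed data with additional covariates.

import pandas as pd
import statsmodels.formula.api as smf

# Illustrative only; not the report's HLM 6 specification. Interactions
# enter as fixed effects, with a random intercept for school but no random
# effects for the interactions, consistent with the note above.
df = pd.read_csv("students.csv")  # hypothetical analysis file

model = smf.mixedlm(
    "y2_gpa ~ treatment * (baseline_gpa + hispanic + black + sped + ell)",
    data=df,
    groups="school_id",
)
print(model.fit().summary())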
Endnotes
1 There are growing literatures on instructional improvement in preschools and elementary schools that are not covered here.
2 The name "Literacy Matters" was adopted by IRRE after the ECED Efficacy Trial was underway. Schools participating in the ECED Efficacy Trial knew it as ECED Literacy.
3 As with "Literacy Matters," the name "Reading Matters" was adopted after the ECED Efficacy Trial was underway. The schools participating in this project called it 9th-Grade ECED Literacy.
4 As with the other names, "Writing Matters" was adopted after the ECED Efficacy Trial was underway. The schools participating in this project called it 10th-Grade ECED Literacy.
5 This blocking at the district level was necessary for recruitment and effective use of project resources. We did not believe that districts would agree to participate in the study unless they were guaranteed that some of their schools would receive the supports. Further, because IRRE consultants had to travel to provide the supports, it would have been inefficient to have fewer than two treatment schools in the same geographic region.
6 Although schools did not pay IRRE for the supports, there were some costs associated with participation. The treatment schools had to cover the costs of: a .50 FTE literacy coach and a .50 FTE math coach, three professional development days for each ECED teacher during two summers, substitute teachers so that each ECED teacher could participate in four half-day PD sessions per year, PDAs for at least 3 district leaders and 5 school leaders, and photocopying of the student materials for the FTF Literacy curriculum. Additionally, staff time had to be devoted to coordinating the ECED supports in treatment schools, coordinating the research activities in all schools, and providing the research team with student records.
7 In spring of 2009, District 4 agreed to participate, signed the MOU, and participated in the random assignment process with the intention that they would begin participation in the summer of 2009 as part of Recruitment Group 1. However, very shortly after signing the MOU and prior to starting any supports, several members of the district's leadership – including the superintendent – left the district. After negotiating with the new interim superintendent, that district's participation was delayed one year, and they participated in the second recruitment group, which began in the summer of 2010.
8 As noted below, two schools served only 9th-graders during the first year of ECED and therefore had very small enrollments.
9 Schools 4 and 5.
10 Schools 6, 7, and 19.
11 Schools 18 and 20.
12 School 2, in District 1, part of Recruitment Group 1.
13 School 17, in District 5, part of Recruitment Group 2.
14 Our definition of 'target teacher' changed several times during data collection.
This document summarizes only our final decision about which teachers and teacher data to include.
15 Three individuals taught both math and English/literacy and are counted in both groups.
16 Teachers who changed schools during the course of the study and taught a target math class at more than one school are treated as two separate individuals in these data. These 238 cases actually represent 232 different individuals.
17 Because different districts use different names for 9th- and 10th-grade English, we counted any class for which a student got a regular, required English credit as 9th- or 10th-grade English. So, for example, English as a Second Language (ESL) and remedial English classes counted as regular English if they replaced the required regular English course and the enrolled students did not have to 'make up' the English credit later. If, on the other hand, the student did have to take 9th- or 10th-grade English upon successful completion of the ESL or remedial course, those courses were not counted as 9th- or 10th-grade English.
18 Three of these individuals taught math and English/literacy and are counted in both groups.
19 Teachers who changed schools during the course of the study and taught a target ELA class at more than one school are treated as two separate individuals in these data. These 298 cases actually represent 295 different individuals.
20 Some treatment schools offered ECED Literacy in the fall only and 9th- and 10th-grade English in the spring only, taught by the same teachers. In those cases, the teachers who taught ECED Literacy in the fall continued to receive ECED supports in the spring.
21 This section refers to any course at the school, whether or not it was targeted by ECED. The next section refers only to the ECED target courses of ECED Literacy, 9th/10th-grade English, Algebra 1, or Geometry.
22 Two schools (schools 4 and 14) included accelerated or advanced programs for high-achieving students from throughout the district. Students in those programs were excluded from the study altogether and are excluded from all statistics in this report. The decision to exclude the students in these programs was made prior to the districts' agreeing to participate and prior to the random assignment. Students in those programs have very full course schedules, and it would not have been possible to add the ECED Literacy course to their schedules, so those schools could not have participated if the project had not agreed to exclude the students in those programs. School 4 ended up being in the control condition and school 14 in the treatment condition. The students at school 4 in the accelerated program are technically enrolled in school 4. The students in the accelerated program at school 14 are technically in a separate school with its own school administration and federal identification number; however, the program is physically housed on the same campus as school 14.
23 This variable refers to the student's age when s/he took the Wave 1 survey. For students who did not take the Wave 1 survey (including all students in Grade Cohort 3), it is the student's age on the average date when students in his/her district took the Wave 1 survey.
24 The name 'Literacy Matters' was adopted by IRRE after ECED was underway. Participating schools knew these courses simply as ECED Literacy.
25 These students were also supposed to have an equivalent amount of 9th-/10th-grade English, but those courses and their teachers were not specifically targeted by the intervention.
26 These two schools did not offer Literacy Matters in Year 2 because they had stopped participating. So, no student in these schools took the prescribed amount.
27 Reminders were sent each wave except for Wave 1 for teachers in schools in the first recruitment group. The decision to add a fall teacher survey was made at the last minute, and we had not obtained contact information for the teachers. For that reason, we were unable to follow up with teachers who did not return the survey.
28 The different procedures for the fall and spring administrations were due to budget constraints. The fall surveys were not part of the original study design, and the cost of their administration was not included in the subcontract with IRRE, which manages and maintains the on-line survey system.
29 Number of teachers who responded divided by n.
30 These teachers were not asked to participate because they were not teaching a course that was targeted that term and had not taught a target course in the past, so they were not yet part of the sample.
31 Schools 2 and 17 dropped out of the study. School 2 did not allow teacher surveys in Waves 2 or 3; school 17 did not allow teacher surveys in Waves 3 or 4.
32 Includes teachers who were asked to participate but did not, and those who should have been asked to participate but were not, due to an oversight or because they did not appear on the schedules provided by the schools. Most likely this last group was hired after the surveys were conducted.
33 Number of teachers who responded divided by n minus those not at the school at the time of administration.
34 The three dropped items were: (1) My job had become just a matter of putting in time; (2) Time goes by very slowly when I'm at work; and (3) When I'm teaching, I feel bored.
35 The RAI score was calculated using the scoring methods established by the authors: the mean of the two external items was weighted -2, the mean of the two introjected items was weighted -1, the mean of the two identified items was weighted +1, and the mean of the two intrinsic items was weighted +2. In other words, the controlled subscales are weighted negatively, and the autonomous subscales are weighted positively. The more controlled the regulatory style represented by a subscale, the larger its negative weight; the more autonomous the regulatory style represented by a subscale, the larger its positive weight (http://www.selfdeterminationtheory.org/questionnaires/10-questionnaires/48). (A sketch of this computation appears after these endnotes.)
36 E1 and E2 must be multiplied together to be meaningful because E2 refers to the proportion of those students who were on task (in E1) who were actively engaged. For example, if E1 = .80 and E2 = .75, then .80 x .75 = .60 of the observed students were actively engaged.
37 Observer 3 had been the interim superintendent in District 3 the year before she began working for ECED, but she did not collect any data in that district at any point in the project.
38 The only exception was District 2, Wave 2, when there was a one-week break between the two weeks of data collection.
39 Number of teachers observed divided by n.
40 Schools 2 and 17 dropped out of the study. School 2 did not allow observations in Waves 2 or 3; School 17 did not allow observations in Waves 3 or 4.
41 These individuals should have been observed but were not. Reasons include changes in teacher assignments that the research team learned about after the observations, oversight, miscommunication with data collectors, changes in definitions of target teachers, and teacher refusals.
42 Number of teachers observed divided by n minus the number not teaching a target course and no longer/not yet at school.
43 These scores were calculated by computing the ICC between each of the two observers' scores and the consensus score and then taking the average of those values. Of the 281 visits in which two observers were present, we had consensus scores for 277 cases. The others were missing due to technical difficulties or observer error.
44 Includes students who should have been asked to complete the survey but were not due to some type of error or misunderstanding with the school. It includes students who the school indicated were in self-contained special education or were 'newcomers,' but district records did not concur. Also, at Wave 1, one treatment school mistakenly excluded students who were not enrolled in Literacy Matters.
45 Includes students who were chronically absent, truant, suspended, in the process of being expelled, incarcerated, or home-bound due to illness/pregnancy.
46 These students disenrolled between the time the roster was created and the administration of the survey. It includes 'no shows' – students who enrolled but never attended – if they were not removed from the rosters by the time they were produced for the ECED Efficacy Trial.
47 School 2 stopped participating in ECED supports after Wave 1. They administered student surveys to all 9th- and 10th-graders in Wave 1 and to 10th-graders only in Wave 4. They did not administer student surveys at all in Waves 2 and 3.
48 As with the teacher questionnaires, the items comprising the Relative Autonomy Index (RAI) were left out of the exploratory analyses. RAI scores for students were calculated using the same formula described for teacher RAI scores.
49 The five dropped items were: (1) My teachers are fair with me; (2) When I'm doing a class assignment or homework, it's not clear to me what I'm supposed to be learning; (3) Students in my school get to know each other well; (4) Students in my school show respect for each other; and (5) In my school, the students push each other to do well.
50 As noted above, parents were given a way to exempt their student's records from being released.
51 Some districts provided three response categories for free/reduced price lunch (free, reduced, paid), whereas others provided only free/reduced lunch (yes or no). When a district provided three response categories, the categories of free and reduced were collapsed into a single category, resulting in a dichotomous variable that was comparable across districts. Similarly, some districts were able to provide more detail regarding special education and English-language learner status than others. To make them comparable across districts, a dichotomous variable for each student for each year was created. The special education variables indicate whether the student did or did not receive special education services that year. Students in self-contained special education are not in the study, so those in special education were all in inclusive settings. The English fluency variables refer to whether or not a student received services for English language learners (ELL) that year. Students who did not receive ELL services – either because they were native English speakers or because they had achieved a level of fluency in English that no longer required special supports according to the district – were labeled 'no EL' (0). Those who received services are labeled 'EL' (1). As with special education, students whose English skills were so limited as to be excluded from the regular curriculum ('newcomers') are not in the study sample.
52 10th-grade test scores were not requested for students who were in 9th grade in the second year of the study (Grade Cohort 3) because those were administered the year after data collection ended.
53 These general rules were constructed by the ECED research team, in conjunction with an expert in missing data analysis, Dr. Jennifer Hill, Associate Professor of Applied Statistics at New York University.
54 Prior to setting this rule, we checked with each district to ensure that there had been no major changes to their testing system that would make combining across years unadvisable. The only such change was in District 3 between 2009 and 2010. See the specific information about how District 3 tests were handled for more information.
55 Note that the tests are ordered from lowest to highest level in Appendix 10.
56 District 4 did ultimately provide the needed data on August 26, 2013, four days before the end of this grant. By that time, those data had been imputed and analyses conducted using the imputed data. We will be able to incorporate that district's attendance data into future analyses, but they are not included in the analyses in this report.
57 As noted earlier, one district sent the attendance information after these analyses were complete. We will be able to incorporate that district's attendance data into future analyses, but they are not included in the analyses in this report.
58 At a later time, we will test the impact of a single year of exposure to ECED on the students in Grade Cohorts 2 and 3.
59 ICC(2) = [k x ICC(1)] / [1 + (k - 1) x ICC(1)], with k being the average group sample size. For example, with ICC(1) = .10 and k = 20, ICC(2) = 2.0/2.9 = .69.
60 The EAR protocol was not available to control schools, so there was no possibility of control schools using it.
61 Two treatment schools discontinued their participation in ECED supports prior to the second year. Those two schools had Year 2 implementation scores that were similar to those seen in the control schools.
62 We have been working with a post-doctoral researcher at Columbia University with expertise in imputation to create the multiply imputed data sets. She has not yet had time to complete the imputation of the EAR Protocol data. For that reason, we are presenting non-imputed data here. We will conduct these same analyses with imputed data when they become available.
63 The deviance statistic is not provided in HLM 6 when multiply imputed data are used. Instead, we checked the deviance statistics using one of the five imputed datasets.
64 As noted above, we do not yet have the needed imputed data for the EAR Protocol analyses. For that reason, we are not presenting EAR Protocol data for the ELA teachers at this time.
65 The deviance statistic is not provided in HLM 6 when multiply imputed data are used. Instead, we checked the deviance statistics using one of the five imputed datasets.
66 The four that left during the study were in Districts 1, 3, 4, and 5. The one that left just as the project was ending was in District 2.
67 Schools 1, 4, 6, 7, 10, 12, 14, 15, 17, 18, and 20.
68 District 1.
69 In Year 2, one of the two treatment schools in that district did provide the entire ECED Literacy curriculum to lower-achieving 9th-graders.
70 District 2, School 2.
71 Syntax for this comparison is in a file called 'Fs in Algebra 1 and Geometry.'
72 An I was counted the same as an F if it was never raised to a passing grade. Theoretically, students could have changed Is to a passing grade after we received the data files from the district.
However, we typically received the files well after the term had ended – often as much as a year later – so such changes were likely rare.
73 'RG' refers to recruitment group.
74 Questions 59a and 59b were asked at Wave 4 only because no teacher was using the Year 2 curriculum during Year 1.
75 Observers 5 and 7.
76 Observers 1 and 6.
77 Observers 2 and 4.
78 Observer 3.
79 Observer 7.
80 Observer 6.
81 The schools in District 5 had very complicated schedules, where many courses met less often than daily. These were all weighted accordingly, using 1 period of a regular day as the standard.
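Endnotes 35 and 48 describe how Relative Autonomy Index (RAI) scores were computed for teachers and students. The following sketch restates that weighting in code; the function name and the example subscale means are hypothetical.

def rai(external, introjected, identified, intrinsic):
    """Relative Autonomy Index per endnotes 35 and 48.

    Each argument is the mean of that subscale's two items; controlled
    subscales (external, introjected) are weighted negatively and
    autonomous subscales (identified, intrinsic) positively.
    """
    return -2 * external - introjected + identified + 2 * intrinsic

# Hypothetical subscale means on a 1-5 response scale:
print(rai(external=2.0, introjected=3.0, identified=4.0, intrinsic=4.5))  # 6.0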