Protocol: Influence of Inquiry-Based Science Interventions on Middle School Students’ Cognitive, Behavioral, and Affective Outcomes

Louis Nadelson, Susan Williams, & Herbert Turner III

BACKGROUND

This proposed systematic review will gather and summarize empirical evidence of the effectiveness of inquiry-based science instruction on cognitive, affective, and behavioral outcomes of students in grades 5 through 8. This is of particular importance in the United States, given that the No Child Left Behind federal law requires states to assess student learning in science. According to the 2003 Trends in International Mathematics and Science Study (TIMSS), almost 35% of the 46 participating countries indicated that they place “a lot of emphasis” on inquiry skills (i.e., generating hypotheses, designing, and conducting investigations). This systematic review will evaluate the empirical relationship between student outcomes and inquiry-based science curricula, producing data that can benefit educators, decision makers, and curriculum developers in upper elementary and middle schools within the United States, with potential international implications.

Inquiry-Based Science Pedagogy versus Traditional Science Pedagogy

In a traditional science classroom, the teacher uses a textbook and worksheets to transmit knowledge and explain conceptual relationships to students. Students listen to their teachers and are later responsible for memorizing the information to restate on a test or some other written evaluation of knowledge. In this type of classroom, the teacher and printed sources (texts, worksheets) are the sources of authority, and students are passive receivers of information. All students learning in these traditional instructional environments are expected to complete the same teacher-prescribed activities (Anderson, 2002).

The instructional approach in an inquiry-based science classroom looks very different from that of a traditional science classroom. Teacher and student roles are shifted, as are the kinds of learning activities and the evidence for learning. Teachers in inquiry classrooms take on the role of a coach and model learning and inquiry processes for students. In inquiry instruction, students may be responsible for naming the scientific question under investigation, designing investigations to research their question, and interpreting findings from investigations. In an inquiry-based curriculum, students may be working on different research questions, or may be at different phases in the research cycle. Students engaged in an inquiry curriculum are required to be more self-directed, and, along with the teacher, there are multiple forms of knowledge authority in the classroom.

For example, most middle school Life Science curricula include a unit on plants and adaptation. In these units, students typically learn about the function of a leaf (i.e., photosynthesis and transpiration) and, in some classes, how different leaves survive in varying climatic conditions. Depending on the pedagogical approach of the teacher, students in different classes can learn the same content, but in very different ways. Sandoval and Millwood (2007) describe an inquiry-based unit on plants and adaptation in a middle school science classroom. Students began the unit by being asked to look at a picture of some local mountains and report what they noticed. Students noticed that the colors of the vegetation in the photo looked different at different elevations.
Students’ initial questions guided the development of their small-group investigations as they relied on a variety of environmental data from a sensor network. At the end of the unit, students wrote explanations about why the plants looked different, which addressed issues of adaptability, climate, and elevation, among others. Student explanations went through a blind peer review process by their classmates. The unit culminated in a class discussion during which students developed a shared understanding of the environmental factors that contributed to the difference in appearance of the vegetation in the photograph. In this curriculum, students became active researchers, gaining skills associated with scientific research and learning through a process of active engagement.

Why Inquiry-Based Science?

For generations, science teaching has relied on methods that train students to follow directions, with little connection to doing authentic scientific research. Although students have become accustomed to this method of learning, most do not form a deep conceptual understanding of science (National Center for Education Statistics, 2001). The National Science Education Standards (National Research Council, 1996) urge the use of inquiry instruction to help students develop deeper conceptual understanding in science. The prediction is that inquiry may give students control over their own formation of scientific knowledge. Students become responsible for identifying problems that need investigation, forming hypotheses, designing methods for testing their hypotheses, carrying out their methods, analyzing results, and finally forming conclusions. It is hypothesized that when students learn science through inquiry, they are essentially imitating practicing scientists, which creates the opportunity for deep conceptual change (Posner, Strike, Hewson, & Gertzog, 1982). Typical lecture-style science classrooms do not provide this opportunity (Bruer, 1993). In lecture-style classrooms with a laboratory component, students practice science using “cookbook” methods. They follow the prescriptive directions of a laboratory exercise, working for results that reveal the “right” answer. When students are asked to conduct their own inquiry, it challenges their beliefs about how science is done and how it is learned (Roth & Roychoudhury, 1994; Tobin, Tippins, & Hook, 1995).

The outcomes of some published research comparing students who are taught science using an inquiry approach with students taught using more traditional approaches have revealed differential levels of science achievement (Bredderman, 1983; Guzzetti, Snyder, Glass, & Gamas, 1993; Lee & Songer, 2003; Shymansky, Hedges, & Woodworth, 1990). According to the U.S. Department of Education (2006), “America’s schools are not producing the science excellence required for global economic leadership and homeland security in the 21st century.” Data from the federal government define the challenge in science education in the U.S. as follows:

Internationally, U.S. students ranked 8th out of 12 countries on the Program for International Student Assessment (PISA) science test (2003). According to the 1995 Third International Mathematics and Science Study (TIMSS), U.S. 4th graders ranked 2nd, but by 12th grade they had fallen to 16th, behind nearly every industrialized rival and ahead of only Cyprus and South Africa.
Eighty-two percent of our nation’s 12th graders performed below the proficient level on the 2000 National Assessment of Educational Progress (NAEP) science test. “More Americans will have to understand and work competently with science and math on a daily basis . . . the inadequacies of our systems of research and education pose a greater threat to U.S. national security over the next quarter century than any potential conventional war that we might imagine” (U.S. Commission on National Security in the Twenty-First Century, 2001).

The goals of schooling have changed over time, with each generation’s graduates being expected to identify and solve problems of increasing complexity at ever-increasing levels of proficiency. With the advent of the No Child Left Behind (NCLB) law, this level of performance is expected of all children, independent of their gender, race, socioeconomic level, or English language proficiency. Since NCLB was enacted, schools have focused curricula, instruction, and assessment efforts almost exclusively on mathematics and reading – the two areas for which states were initially held accountable to meet adequate yearly progress (AYP). As dictated by the NCLB law, AYP has been expanded to include science. NCLB now requires that states measure students’ progress in science at least once in each of three grade spans (3-5, 6-9, 10-12) each year.

Previous Reviews and Meta-Analyses

Despite the dearth of evidence regarding instruction with specific science curricula, much is known about science instruction in general. Early meta-analyses summarizing the research on innovative methods of science instruction developed during the 1960-1980 period indicated differential levels of student achievement by students taught using inquiry approaches when compared to students receiving a more traditional approach to science instruction (Lott, 1983; Shymansky et al., 1990; Shymansky, Kyle, & Alport, 1983; Weinstein, Boulanger, & Walberg, 1982; Wise & Okey, 1983). Shymansky and his colleagues (1983; 1990) summarized studies examining the effect of inquiry approaches on 18 different outcome measures, including science achievement, attitude toward science, and science process skills. Their 1990 re-analysis of the 1983 data resulted in mean effect sizes ranging from .18 to .36 (α=.05) when compared with non-inquiry curricula on outcomes of composite achievement, perception, and process, but no significant effect on analytic outcomes. It is also important to note that gains in content knowledge were not sacrificed to obtain gains in process skills and attitude. Lott (1983) found that while students of all ages benefited from inquiry-type approaches, the benefits for intermediate students (grades 4-6) were the greatest (d=.24, p<.05, CI not available). Similarly, Wise and Okey (1983) reported an overall curricular advantage for the inquiry-discovery method (d=.41, p<.05, CI not available) when compared to other teaching strategies. A body of research emphasizes the importance of the type of assessment in capturing student performance on process skills. For example, Weinstein et al. (1982) categorized outcome measures as Favorable Toward Inquiry Methods, Favorable Toward Traditional Methods, or Neutral Assessments, and reported an effect size of .25 (p<.05, CI not available) across all categories.
In general, findings reported to date on inquiry science indicate that it leads to moderate and significant effects on student science achievement when compared to traditional instruction. Recently, Project 2061 completed an in-depth curricular analysis of middle school programs that are in wide use by school districts or states, as identified by the Education Market Research Report (Kesidou & Roseman, 2002). The analysis highlighted nine complete science programs and the benefits of each program. Missing from this analysis were student outcome data; there were no indications of the effects on student achievement, interest, or other measures. The National Science Resources Center (1998) put forth a similar effort and conducted a content analysis of all inquiry-based middle school science supplementary and core curricula that met the National Science Education Standards. An abundance of inquiry-based curricular resources were identified; however, as with the Project 2061 study, there were no indications of the effects of the identified curricula on student outcomes.

Need for this Systematic Review

A serious challenge identified by the U.S. Department of Education is the lack of American competitiveness internationally in the sciences and mathematics. Meanwhile, a rapidly expanding knowledge base in the sciences makes it difficult for educational curricula to keep up with the changes. Adding to the urgency for resolution is the projected job growth in areas such as healthcare and computer science. The increased number of public policy issues related to biotechnology, technology, and the Internet makes it extremely important that graduates have a strong foundation in science. Further, the NCLB law now requires the inclusion of science knowledge assessments as a measure of AYP. In response, schools are seeking instructional materials for science education that have been proven effective in increasing student achievement using rigorous, scientifically based research methods.

To date, there are no systematic reviews that we know of that have rigorously evaluated the effects of specific middle school inquiry curricula on student outcomes. Previous reviews, the most recent published in 1990 (Shymansky et al.), indicate that student achievement is higher in inquiry-based classrooms when compared to traditional classrooms. The inquiry-based curricula analyzed in Shymansky et al. are not explicitly defined and, now, almost two decades later, are almost certainly out of print and out of date. At the present time, there are no resources that educators can use to identify the most effective inquiry-based curricula to use in classrooms. Although informative, Shymansky et al.’s review does not elucidate the nature of the measures of student achievement used in the included studies. The proposed systematic review will inform educators, policy makers, and curriculum developers of the specific inquiry-based curricula that have been reported to be effective and ineffective at the middle school level, while clearly communicating the nature of the assessments used to determine achievement, affective, and behavioral outcomes. Middle school aged students were selected as the target for this study because interest in, and attitudes toward, science typically decline as students go through middle and high school (Atwater, Wiggins, & Gardner, 1995).
Research has also shown that students’ attitudes toward science are strongly correlated with their achievement in science, as well as with the number and type of science courses they take (Simpson & Oliver, 1990). Although there may be international variations in what is labeled middle school, by general convention it is widely accepted that middle school encompasses the 6th year of formal schooling (grade 5) through the 9th year of formal schooling (grade 8).

OBJECTIVES

This review will gather, evaluate, and summarize studies that empirically test the effects of inquiry-based science instruction on cognitive, affective, and/or behavioral measures of middle school-age students. We will use the following questions to guide our inquiry:

1) What are the salient characteristics communicated in the literature with respect to the structure and implementation of inquiry science curriculum instruction and the nature of the measures of student outcomes?

2) What does the quantitative research published formally (in peer-reviewed journals, conference proceedings), informally (Internet, edited volumes, texts), and unpublished between 1990 and September 2008, describing randomized controlled experimental and quasi-experimental investigations, reveal about the influence of inquiry instruction on measures of student cognitive outcomes?

3) What does the quantitative research published formally (in peer-reviewed journals, conference proceedings), informally (Internet, edited volumes, texts), and unpublished between 1990 and 2008, describing randomized controlled experimental and quasi-experimental investigations, reveal about the influence of inquiry instruction on measures of student affective outcomes in the context of learning science?

4) What does the quantitative research published formally (in peer-reviewed journals, conference proceedings), informally (Internet, edited volumes, texts), and unpublished between 1990 and 2008, describing randomized controlled experimental and quasi-experimental investigations, reveal about the influence of inquiry instruction on measures of student behavioral outcomes in the context of learning science?

METHODOLOGY

Eligibility Criteria

Types of interventions (and comparisons) to be included in the review

One of the recurrent issues associated with inquiry curricula is that inquiry means different things to different people. For example, an author may use the term “discovery learning” to represent what may qualify as inquiry instruction. Because of the potential for conflation of terms, it is critical to examine the curriculum processes and not the labels authors use to represent an investigated intervention. For the purposes of this review, we adopt a classification scheme similar to what was originally proposed by Schwab (1962). In this scheme an inquiry curriculum contains three essential elements: 1) scientifically oriented question(s), 2) a methodology for gathering empirical evidence or data that can be used to address the question(s), and 3) an explanation or interpretation of the evidence in the context of the research question. To be deemed an inquiry curriculum and considered for this review, curricular/instructional science education interventions must contain all three of these essential elements. (See Table 1.) Schwab’s (1962) hierarchical scheme denotes levels of inquiry by the level of teacher and learner involvement. (See Fig. 1.)
This four-tiered classification scheme is based on the level of teacher and student responsibility for the three essential inquiry elements. The level of inquiry increases as the responsibility shifts from teacher to learner. At level 0, the teacher provides the questions, the methodology for gathering data, and the interpretation of the data. By level 3, the learner is responsible for all of these elements. Although the specific classification of inquiry by Schwab’s model may leave some questions regarding its difference from traditional curricula, a systematic review that conducts analysis by level of inquiry may help illuminate the subtle differences that the level of inquiry structure may have on learning.

Table 1. Schwab’s levels of inquiry.

Inquiry Level | Source of the question | Data collection methods | Interpretation of results
Level 0 | Given by teacher | Given by teacher | Given by teacher
Level 1 | Given by teacher | Given by teacher | Open to student
Level 2 | Given by teacher | Open to student | Open to student
Level 3 | Open to student | Open to student | Open to student

This Schwab classification system will be used in the coding guide to identify and classify curricula by level of inquiry.

Curriculum Source

We recognize the importance of portability and reproducibility of any inquiry-based curriculum for which we report research findings. The challenge is that “published” curricula may encompass a wide range of channels, from commercially available, public, and government provided, to school district developed and disseminated (i.e., publication type). It is beyond the scope of this study to determine the current availability and/or reproducibility of curricula in the studies. Therefore, while the “publication type” of each curriculum will be identified and reported, it will not be used as a criterion for inclusion or exclusion.

Setting and participants

The intervention must have been carried out with students in science classes in grades 5-8 in public, private, parochial, or alternative schools. Studies that identify special needs (learning disabled and gifted) children will be included in the review only if the students were in immersion programs that routinely placed them in the regular classroom during science lessons, or if they are being compared to an equivalent group of students, and the treatment curriculum was consistent and reproducible and not modified to meet individual needs as required by the conditions of IEPs. Studies that include students in grades in addition to grades 5-8 will be included if they report results separately for each grade so that the target grades can be separated out for analysis. The intervention must have taken place in the school setting during the traditional academic schedule (i.e., Monday through Friday, during the regularly scheduled school year). Interventions that took place in after-school programs or summer school programs, or that were restricted to special needs classrooms, will be excluded from the review.

Study design

Controlled random assignment and quasi-experimental design studies will be considered for review. The inclusion of both of these designs increases the probability that we will have a sufficient number of studies extracted from the literature search to conduct a reasonable meta-analysis.
Reviewers will evaluate research studies that report the influence of well-defined inquiry interventions by conducting comparisons between clearly distinguished intervention and control (or comparison) groups, using a random assignment, controlled trial design. Control groups can represent no-treatment or “treatment as usual” conditions. If a study used a quasi-experimental design, it must report pre- and post-intervention measures and report that, prior to the intervention, the students were engaged in traditional non-inquiry science instruction and that the intervention is clearly an inquiry approach to teaching science. For inclusion, quasi-experimental studies must report measures for participants matched on relevant characteristics prior to the delivery of the intervention, or statistically equated, in order to ensure that groups were as similar as possible at baseline.

Types of measures to be included

At least one cognitive student outcome measure must be reported (e.g., academic achievement, critical thinking/transfer, cognitive engagement). Depending on availability, the reviewers also expect to report on affective (e.g., student interest and/or motivational engagement) and behavioral (e.g., time on task, homework completion, discipline referral, class attendance) outcomes. All measures must be quantitative in nature and report reliability statistics. Outcome measures using standardized measures of science achievement (knowledge of science, knowledge of scientific processes, knowledge of the nature of science) with established validity and reliability will be included in the review (e.g., state-level science achievement exams, Cornell Critical Thinking Test). Unstandardized measures of achievement, and of other domains, with adequate face validity will also be included (e.g., student interest inventories, self-report measures of affective and cognitive engagement, and interest). Measures reported in the Buros Institute’s Mental Measurements Yearbook, published by the University of Nebraska Press, will be considered standardized measures; unless claims can be substantiated, all other measures will be considered unstandardized.

Information search and retrieval

An attempt will be made to exhaustively search a range of published and unpublished documents, as recommended by the Campbell Collaboration Information Retrieval Policy Brief, and to retrieve all eligible studies based on the requirements previously stated. Our search will include, but will not be limited to: peer-reviewed publications, National Science Foundation funded project reports, reports from commercial publishers, conference proceedings, edited texts, dissertations, studies distributed through non-traditional channels including the Internet, and studies recommended by recognized experts in the field. As recommended in the Information Retrieval Policy Brief, a systematic, thorough, and comprehensive approach that minimizes bias will be used to assure that information retrieval is exhaustive and inclusive. Details of our search are described in the following paragraphs.
Database searches

Our search will begin with the bibliographic databases that are most appropriate for our topic, including: Web of Science (WOS; including the Social Science Citation Index), SCOPUS (which includes over 25,000 international journal titles), ERIC (Educational Resources Information Center, which is considered to be an international database), Education Abstracts, PsycINFO, ProQuest Digital Dissertations, Google Scholar, and U.S. Government Printing Office publications. The initial search will be conducted using the controlled vocabulary appropriate for each of these databases in each of four categories: the population of interest, the intervention, the outcomes, and the content. Keyword searching will be employed in appropriate contexts and for particular terms. Variants of search terms will be located using wildcard characters, e.g., constructiv* to locate constructivism and constructivist. We anticipate that we will use the following search terms and databases; however, the form of all searches will be presented in the final report exactly as entered in the database search engine so that the searches can be replicated exactly.

1. Population of interest: middle school students (PsycINFO, ERIC), junior high school students (PsycINFO, ERIC), grade 5 (ERIC), grade 6 (ERIC), grade 7 (ERIC), grade 8 (ERIC), intermediate grades (ERIC), intermediate school students (PsycINFO).

2. Intervention: inquiry (ERIC, WOS), inquiry based learning (ERIC), active learning (ERIC), problem based learning (ERIC), problem solving (PsycINFO), discovery learning (ERIC), discovery teaching method (PsycINFO), experiential learning (ERIC, PsycINFO), constructivism (learning) (ERIC, PsycINFO), project based science (WOS), sciences (ERIC), science instruction (ERIC), student centered instruction (ERIC).

3. Outcomes: academic achievement (ERIC, PsycINFO), science achievement (ERIC, PsycINFO), student evaluation (ERIC), student improvement (ERIC), grades (scholastic) (ERIC), educational measurement (PsycINFO), affective behavior (ERIC), affective education (PsycINFO), affective measures (ERIC), behavioral objectives (ERIC), behavior (ERIC), behavior patterns (ERIC), student outcomes (WOS), student engagement (WOS).

4. Content: science (WOS), biology (WOS), natural science (WOS), physical science (WOS), life science (WOS), earth science (WOS), science education (WOS), chemistry (WOS), STEM education (ERIC), physics (WOS).

We will conduct our database searches using Boolean operators such as AND and OR to increase the efficiency and precision of our searches while reducing the probability of false positives. The use of Boolean operators will allow us to be more comprehensive in our search methods by attending to a wide variety of variables in a single search. Furthermore, we can combine terms from the Population, Intervention, Outcomes, and Content groupings and search for relevant articles simultaneously. This will also streamline the process of documenting our search criteria and associated results.

Other Literature and the Corresponding Searches

Other methods of locating pertinent studies will also be used. The bibliographies of retrieved studies will be searched to see if they contain eligible studies that would not be readily retrieved using databases of published literature (e.g., personal communications, unpublished studies).
In addition, the reviewers will search the Social Science Citation Index using the names of the authors of all retrieved studies to determine if they have conducted other investigations that might be eligible for inclusion as qualified research. In order to minimize publication bias, we will further attempt to locate eligible unpublished studies by contacting experts in middle school science research or researchers of inquiry science curricula. We will ask these experts to identify studies or to recommend other experts whom we should consult. We will use this snowball sampling technique until we have exhausted our list of contacts or the process is no longer fruitful. Searches of the Internet using reputable search engines (e.g., Google Scholar) will be conducted to locate pertinent studies that were not published in peer-reviewed publications or other works typically referenced in databases. Conference proceedings in related fields will be searched either electronically or by hand if electronic means are not available. In addition, book chapters and reports of relevant research organizations will also be searched. Studies identified through these searches will be obtained from the library, from inter-library loan, from the publisher, or directly from the authors themselves.

Selection of studies

The abstracts of initially identified reports of studies will be screened to determine if these studies meet the study eligibility criteria outlined in a previous section, e.g., student population, research design type, setting, or intervention. Studies that pass this screening, or are not clearly ineligible, will be subjected to a final screening by two trained reviewers. Final eligibility decisions for studies that cannot be classified through examination of the abstract will be made by screening the full text of the report against all aforementioned eligibility criteria.

Assessment of study quality will be documented in the coding guide during data extraction. The Quality Assessment section of the coding guide will use simple checklists rather than quality scales. The checklists have been designed to determine which aspects of a quality investigation were attended to in the reports of the investigations. Some of our quality assessments are simply checklists used to determine the presence of detectable levels of reporting, using simple “Yes” or “No” responses. Where warranted, we have included additional levels of coding to further clarify the quality details of the study. The assessment of study quality will consider the following characteristics of studies (when they are reported):

• Assessment of randomization procedures
• Handling of attrition
• Baseline equivalence of groups
• Method of measurement of the independent variable
• Reported reliability of measures

For example, the checklist for baseline equivalence of groups includes the following items:

1. Unspecified
2. Statistical Control (e.g., ANCOVA, regression)
3. Random Assignment
4. Statistical Control and Random Assignment
5. Gain Scores
6. Matching
7. Other

Methods for data extraction of included studies

We have developed a coding form and manual for this review (see Appendix) using Lavenberg (2007) and Pearson, Ferdig, Blomeyer, and Moran (2005) as models.
Coded elements include the following:

• Study-level characteristics (e.g., publication type, location of study, and so forth)
• Methodological characteristics (e.g., assignment to conditions, unit of analysis, fidelity of implementation, and so forth)
• Participant characteristics (e.g., gender, age, ethnicity, baseline level on student outcomes, and so forth)
• Curriculum characteristics (e.g., inquiry level, core or supplementary, duration/frequency, and so forth)
• Outcomes (e.g., student achievement, affective, and behavioral measures)

Coding for Inquiry

Once a curriculum is positively identified as inquiry, it will be classified according to Schwab’s hierarchical scheme, denoted by the level of teacher and learner involvement (see Table 1). Using this classification scheme for coding, we will be able to identify and classify curricula into similar levels of inquiry. Further, this scheme allows us to consistently use the term inquiry and attend to the different levels of inquiry. We assume the ability to compare the different levels of inquiry. However, the ability to actually compare the effects of various levels of inquiry to a control or comparison curriculum is contingent on whether the inquiry condition is reported according to these levels, or is reported with sufficient detail that we can map the description of instruction to these levels. For studies to be eligible for this review, they must describe the curriculum or inquiry science units in sufficient detail to identify them as an inquiry curriculum. If a curriculum cannot be identified as an inquiry curriculum from the published information and the curriculum itself cannot be located, we will attempt to contact the authors of the study or other researchers familiar with the investigation. If this is not possible, we will eliminate the study from further evaluation, because it is not possible to determine whether the inquiry communicated in the study is truly a form of curriculum or merely the use of “inquiry” as a convenient science education term.

Proposed quality assurance procedures (e.g., independent double data extraction)

During Phase I and Phase II of the coding process, we intend to use multiple coders to dual code 100% of the studies that meet our eligibility criteria in the information retrieval process. Coders will be trained in the use of the coding guide. Coding training will begin following an initial search of the literature. Initially, each coder will independently code 20 articles selected from our initial search using the Phase I questions in the coding guide. Upon completion, the coders and principal investigators will examine the outcome and determine the level of interrater reliability. This will continue with a new set of articles until 90% or greater interrater reliability is achieved for Phase I coding. Upon completion, we will compare results, and any significant differences will be reconciled through discussion facilitated by one of the principal investigators. Once interrater reliability is established for Phase I coding, the coders will work independently to code the remaining articles from our initial search of the literature. Upon completion, the results will be compared and the principal investigators will facilitate discussions that lead to consensus decisions on all coding. This process will be repeated for Phase II of the coding process.
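To make the agreement criterion concrete, the following is a minimal, illustrative sketch (not part of the protocol’s tooling) of how percent agreement between two coders might be computed for a set of coded items; the function name and example data are hypothetical.

```python
# Illustrative sketch: percent agreement between two coders on a set of coded items.
# The codes and data below are hypothetical placeholders.

def percent_agreement(coder_a, coder_b):
    """Return the proportion of items on which two coders assigned the same code."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must rate the same set of items.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

# Example: codes assigned by two coders to ten screening items for one study.
coder_a = ["yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]
coder_b = ["yes", "yes", "no", "yes", "yes", "yes", "yes", "no", "yes", "yes"]

agreement = percent_agreement(coder_a, coder_b)
print(f"Interrater agreement: {agreement:.0%}")  # 90% would meet the stated threshold
```

Percent agreement is the simplest check; a chance-corrected statistic such as Cohen’s kappa could be substituted without changing the workflow described above.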
Following the completion of Phase I and Phase II coding, the PIs and coders will meet for two days to discuss and establish interrater reliability for Phase III coding. During this meeting we will review the Phase III coding scheme, independently code studies, compare results, determine interrater reliability, and repeat until 90% or greater interrater reliability is assured. Following the meeting, both coders will work independently to code all pertinent studies at Phase III of the coding. At the 50% mark and upon completion, we will compare results, and any significant differences will be reconciled through discussion facilitated by one of the principal investigators. Overall, during the analysis process, 100% of our included studies will be double coded, with differences in coding reconciled. All Phase III codes will be entered into the TrialStat SRS database for storage and retrieval for further analysis.

Methods for Effect Size Computations

There are a variety of effect size formulas available to quantify the mean difference, on an outcome, between two groups in standard deviation units (Hunter & Schmidt, 2004; Lipsey & Wilson, 2001; Rosnow & Rosenthal, 2003). The choice of which formula to use depends on three study characteristics: (1) the scale on which the outcome variable(s) are measured, (2) the types of research designs used in studies included in the review, and (3) the format in which outcome data are reported in the included studies. Depending on these three factors, we will compute effect sizes using one of the following approaches, all of which assume the comparison on an outcome is between two groups:

(1) Measurement Scale of Outcome. For outcomes reported on a continuous (interval or ratio) scale, we will compute the effect size using the formula for the standardized mean difference (d index) with a small sample correction (Hedges, 1981).[1] For outcomes reported on a dichotomous scale, we will compute an odds ratio.

(2) Type of Research Design. For outcomes reported on a continuous scale in experiments or quasi-experiments where the post-test (or post-intervention) mean difference has been adjusted for pre-test (or pre-intervention) differences, we will compute the effect size using the formula for the standardized mean difference (d index) with a small sample correction; however, we will use the adjusted means in the numerator of the formula.

(3) Data Format for Outcomes. For studies that report outcome data in formats other than means, standard deviations, or sample sizes, algebraically equivalent formulas will be used to calculate the effect size. For example, if a study reports only an independent t statistic and the sample sizes for each group, the algebraically equivalent formula for computing an effect size will be applied.

We will calculate effect sizes using two computer software packages: Comprehensive Meta-Analysis 2.2.046 (CMA 2.2) and ES (Shadish, Robinson, & Lu, 1999). Because CMA 2.2 allows for effect size calculation using over 100 data entry formats, the software should be sufficient for calculating effect sizes from 99% of the data formats reported in the literature.[2] In the event we encounter an exotic data reporting format that CMA 2.2 cannot handle, the ES software provides additional computational flexibility for calculating effect sizes.

[1] A small sample is defined as a combined sample for both groups of less than 20 (n < 20).
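For reference, the standard expressions underlying approaches (1)-(3) are sketched below. These are the conventional formulas from the meta-analysis literature (e.g., Hedges, 1981; Lipsey & Wilson, 2001); the notation is ours and is not taken from the protocol.

```latex
% Standardized mean difference (d index) for treatment (T) and control (C) groups
\[
d = \frac{\bar{X}_T - \bar{X}_C}{S_p},
\qquad
S_p = \sqrt{\frac{(n_T - 1)S_T^2 + (n_C - 1)S_C^2}{n_T + n_C - 2}}
\]

% Hedges' small-sample correction applied to d
\[
g = \left(1 - \frac{3}{4(n_T + n_C) - 9}\right) d
\]

% Algebraically equivalent computation from an independent t statistic
\[
d = t \sqrt{\frac{n_T + n_C}{n_T\, n_C}}
\]

% Odds ratio for a dichotomous outcome, with 2x2 cell counts n_{11}, n_{12}, n_{21}, n_{22}
\[
OR = \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}}
\]
```

CMA 2.2 implements equivalent computations internally, so the expressions above serve only as a check on what the software reports.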
A common mistake in the education research literature, where randomized controlled trials are used to test the effectiveness of an intervention, is to analyze individual-level data when random assignment was at the cluster level. For example, schools may be randomly assigned to conditions, in which case the effect size formula should be applied at the school level; that is, for outcomes measured on a continuous scale, school means, standard deviations, and sample sizes would be used in the effect size formula for the standardized mean difference. In this review, we will use the unit of random assignment for the effect size formulas when possible. In studies where there is not enough information to apply an effect size formula at the level of random assignment, and the authors reported effect sizes using units not at the level of random assignment, we will apply a Hedges clustering correction to the significance level and enter the effect size and the corrected p-value into CMA 2.2 using the “computed effect size” option.

Related to the last point, effect sizes calculated using the ES software can be entered into CMA 2.2 as a “computed effect size” (rather than being computed using the formulas built into CMA 2.2) so that all effect sizes, whether computed using CMA 2.2 or ES, will be housed in the same analytic database for synthesizing. The methods for synthesizing effect sizes are discussed next.

Methods for Synthesizing Effect Sizes

General Approach

A critical assumption of meta-analysis is that the effect sizes used to obtain the combined effect (weighted average of effect sizes) are independent. This means that when combining effect sizes, there can be only one effect size per study.[3] So as to not violate the assumption of independence of effect sizes, only one effect size per study will be used in the meta-analysis. In situations when there is either a) more than one group comparison on the same outcome, or b) more than one outcome for the same group comparison, or both, we will either select one effect size or compute an unweighted average, depending on which is more appropriate. For example, suppose in an included study there is one comparison between the inquiry-based science curriculum group and the control group, but for two achievement outcomes: 1) facts and concepts and 2) reasoning. Because the primary research objective is the influence of the inquiry science curriculum on student cognitive outcomes, it would be conceptually appropriate to use the average of the effect sizes from the two measures to represent the effect size for the study in the meta-analysis[4] (a brief illustrative sketch follows below).

[2] CMA 2.2 has a formula screen that allows the user to verify the formula used to compute the effect size and how that formula was applied to the data in the database. This screen will be used to verify that all formulas were appropriate and applied correctly.

[3] This is true if and only if an HLM model is not used to combine effect sizes and to account for their non-independence. When HLM models are used, there can be more than one effect size per study (see Raudenbush & Bryk, 2002, p. 208). However, this is an underdeveloped area in the meta-analytic literature for the behavioral and social sciences and, therefore, is not used here.

[4] In CMA 2.2, this is done by selecting “mean of the selected outcome,” which takes the average of each of the two sets of data points (e.g., means, sample sizes, and standard deviations) and computes the effect size that represents the study effect size and is used in the meta-analysis.
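To make the “one effect size per study” rule concrete, here is a minimal, illustrative sketch under hypothetical numbers: a single study reports two achievement outcomes for the same inquiry vs. control comparison, and the two outcome-level effect sizes are averaged (unweighted) to yield one study-level effect size. The data and function name are ours, not drawn from any included study.

```python
# Illustrative sketch (hypothetical numbers): averaging two outcome-level effect
# sizes from the same study to keep one effect size per study.

import math

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference with the small-sample correction."""
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
    d = (mean_t - mean_c) / pooled_sd
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)
    return correction * d

# Hypothetical outcome data for one study (facts/concepts and reasoning).
facts_g = hedges_g(mean_t=76.0, sd_t=10.0, n_t=45, mean_c=71.0, sd_c=11.0, n_c=47)
reasoning_g = hedges_g(mean_t=3.4, sd_t=0.9, n_t=45, mean_c=3.1, sd_c=1.0, n_c=47)

study_effect = (facts_g + reasoning_g) / 2  # one effect size represents the study
print(round(facts_g, 3), round(reasoning_g, 3), round(study_effect, 3))
```

In practice this averaging would be done inside CMA 2.2 via the “mean of the selected outcome” option noted in footnote [4]; the sketch simply shows the arithmetic that option performs.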
Combining Effect Sizes

We assume that the effect size in each study estimates a different effect size in the population (Lipsey & Wilson, 2001). Under this assumption, a random effects model should be used (Borenstein, Hedges, & Rothstein, 2007). When combining effect sizes under this model, it is assumed the effect size estimates will vary from study to study because of differences among the study population parameters (between-studies variation) and the sampling of different subjects within the study populations (within-study variation). Results from the random effects model allow for inferences to the population of studies from which the set was sampled. In other words, the observed combined effect can be extrapolated beyond the studies included in the meta-analysis (Kline, 2004; Borenstein, Hedges, & Rothstein, 2007). We will report the combined effect (and its confidence interval) generated by both the fixed and random effects models, as there is no additional computational cost to produce both using CMA 2.2.

To start, effect sizes will be averaged across studies using a fixed effect model by applying an inverse variance weight to each individual effect size, so that studies are weighted according to their sample sizes, and then averaging the weighted effect sizes. This is what Hunter and Schmidt (2004) refer to as a “bare-bones” meta-analysis, which controls only for sample size. The “bare-bones” meta-analysis is the usual first step in a meta-analysis, whether a fixed or random effects model is used. This analysis gives the meta-analysts estimates of the effect size, controlling for no other study-level factors except sample size. We will conduct the bare-bones meta-analysis with a random effects model for the purpose of evaluating the effect of inquiry-based science curricula on the various outcomes. However, to obtain the I-squared statistic (and its confidence interval), which quantifies the amount of variability (or heterogeneity) in the effect sizes from which the combined (or average) effect size is derived, the bare-bones meta-analysis must be done using a fixed effects model. It is for this purpose, and this purpose only, that the fixed effects model will be used.

It is important to distinguish this approach, in which the fixed effects model is used simply to obtain I-squared (and its confidence interval) to quantify variability in study effect sizes, from making a decision to move from a fixed effects model to a random effects model based on the values of I-squared (or the Q statistic). Although the latter approach represented the conventional wisdom among meta-analysts previously, it is no longer considered sound practice (see Borenstein, Hedges, & Rothstein, 2007: Meta-analysis, fixed effect vs. random effects, page 29). Instead, we have made the decision in this protocol to use the random effects model, and to use the fixed effects model initially only to obtain I-squared and quantify the amount of variability in the effect sizes. In CMA 2.2, this is implemented by running the “bare-bones” meta-analysis and reading the I-squared statistic and its confidence interval from the fixed effects model line; the random effects information is presented simultaneously, and these will be the results used to shed light on our understanding of curriculum effects, if they exist or are detectable.
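For reference, the weighting and heterogeneity quantities referred to in this and the following subsection can be written in their standard forms (standard meta-analytic expressions, e.g., Lipsey & Wilson, 2001; Higgins et al., 2003; the notation is ours, not taken from the protocol).

```latex
% Fixed effect model: each study effect size d_i is weighted by the inverse of its variance v_i
\[
w_i = \frac{1}{v_i},
\qquad
\bar{d} = \frac{\sum_i w_i d_i}{\sum_i w_i}
\]

% Random effects model: the between-studies variance tau^2 is added to each study's variance
\[
w_i^{*} = \frac{1}{v_i + \tau^2},
\qquad
\bar{d}^{*} = \frac{\sum_i w_i^{*} d_i}{\sum_i w_i^{*}}
\]

% I-squared: percentage of total variation across studies attributable to heterogeneity,
% computed from Cochran's Q with k studies (set to 0 when Q < k - 1)
\[
I^2 = 100\% \times \frac{Q - (k - 1)}{Q}
\]
```

Because tau-squared is added to every study’s variance, the random effects weights are more evenly spread across studies than the fixed effect weights, which is the behavior described in the next paragraph.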
This inverse-variance weighting results in more precise studies (e.g., studies with larger sample sizes) being given greater weight than less precise studies in the averaging of effect sizes. Under a fixed effects model, the inverse variance weight accounts only for within-study sampling of participants and does not take into account the additional random variability that arises from between-study sampling from a larger population of studies (Borenstein, Hedges, & Rothstein, 2007). A random effects model takes this variability into account by adding the between-study variance component to the inverse variance weight. As a result, the weighting is more evenly distributed across effect sizes. The combined effect size (and its confidence interval) will be used to address the research questions and will be the basis for all subsequent analyses.

Homogeneity Analysis

A homogeneity analysis is used to evaluate how representative the average effect size is of the study effect sizes from which it was derived. Roughly, the more homogeneous (or less heterogeneous) the study effect sizes, the more representative is the average. Specifically, a homogeneity analysis examines whether the variation in a set of effect sizes may be attributed to sampling error alone or to other factors; that is, there is the expectation that effect sizes will have some variability (attributable to within-study sampling of participants) and as such will deviate somewhat from the average effect. The I-squared statistic will be used to describe the percentage of total variation across studies that is due to heterogeneity rather than to chance, or within-study sampling, as described in Higgins, Thompson, Deeks, and Altman (2003, p. 558). The I-squared statistic lies between 0% and 100%, with 0% indicating no observed heterogeneity and increasing values indicating increasing heterogeneity. Heuristically, values of 25%, 50%, and 75% can be interpreted as low, moderate, and high (see Higgins, Thompson, Deeks, & Altman, 2003, p. 558). The I-squared statistic, along with the Q statistic, is produced with a “bare bones” meta-analysis in CMA 2.2. When the number of effect sizes in the review is small, the I-squared statistic is more reliable than the Q statistic because the former is not sensitive to the number of studies in the way the latter is. Further, I-squared provides more information by quantifying the amount of variation in the effect sizes between included studies. Recent work by Borenstein, Hedges, and Rothstein (2007) has argued that the practice of starting with a fixed effects model and making a decision on whether to use a random effects model based on the results of the homogeneity analysis should be discouraged (see page 30 for the reasoning). For this reason, we state our assumption about the anticipated variability in effect sizes in the previous section and, based on that assumption, use the results from the random effects model for addressing the review questions and for subsequent moderator analysis, when appropriate.

Publication Bias

Publication bias refers to the situation in which the research on a topic that appears in the published literature is systematically unrepresentative of the population of all studies on that topic (Rothstein, Sutton, & Borenstein, 2005). To assess the set of studies included in our meta-analysis for publication bias, we will use the trim and fill procedure and visually inspect the resulting funnel plot produced in CMA 2.2.
Rosenthal’s Fail-Safe N, which reports the number of unpublished studies that would need to exist to bring the effect size point estimate to zero, will also be reported.[5]

[5] There is emerging work by Betsy Jane Becker and colleagues that calls the Fail-Safe N into question. We will consult with the C2 Methods Group to determine if there is a better metric or approach for augmenting the funnel plot to assess the degree of publication bias.

Incomplete Reporting of Study Data

When the information reported in an RCT is insufficient to compute an effect size, we will attempt to contact the author(s) to retrieve the missing data. However, in the event that the author is unresponsive or cannot be located, the data will be coded as missing. Traditionally, missing data for effect sizes have been addressed by a) setting the effect size to zero, which biases the variance towards zero, b) replacing the effect size with the mean effect size, which biases the average effect size estimate (see Allison, 2001, and Pigott, 2001), or c) omitting the study (or at least the comparison of the two groups in the study that has the missing effect size for the outcome). We plan to break new methodological ground by using multiple imputation. Briefly, multiple imputation is the process of replacing missing effect sizes with regression-based estimates from random draws of the posterior Bayesian distribution. Under certain assumptions and empirical conditions, replacing missing effect sizes using multiple imputation produces a better estimate of the true average effect size (see Allison, 2001, and Pigott, 2001).

Sensitivity Analyses

Sensitivity analyses will be used to evaluate how robust our meta-analytic results are to any one study and to the handling of missing data. For example, the one-study-removed analysis in CMA 2.2 will be used to assess how the average effect size changes with one study removed relative to the average effect size with all studies included. All meta-analyses will be conducted first using the listwise-deleted dataset, in which group comparisons on outcomes with missing effect sizes are omitted, and second using the multiple imputation dataset, in which group comparisons on outcomes with missing effect sizes are imputed. If the results are similar, the results based on the imputed dataset will be reported and the listwise-deleted results will be reported in the Appendix, because the former will have, depending on how severe the missing data are, more statistical power. If the results are dissimilar, the results will be reported vice versa.

Post Hoc Subgroup and Moderator Analyses

If I-squared is moderate or large for the “bare bones” meta-analysis, we will conduct a moderator analysis to examine sources of this variation using the following study-level characteristics:

o Publication Type: Published vs. Unpublished.
o Research Design: Experiment vs. Quasi-Experiment.
o Participant characteristics: Gender and ethnicity.
o Curriculum characteristic: Level of Inquiry.

The usefulness of a moderator analysis is predicated on 1) whether the study-level characteristics that will be used as moderators are reported in primary studies, and 2) how consistently they are reported across studies. The analysis will be conducted in CMA 2.2 using a One-Way ANOVA (with study as the unit of analysis), with effect size as the dependent variable, the selected study characteristic serving as the moderator used as the factor in the analysis, and the values of the moderator serving as the levels.
For example, a moderator analysis to determine whether heterogeneity in effect sizes can be explained by gender will be implemented in CMA 2.2 as a One-Way ANOVA with the effect size for the inquiry vs. traditional curricula comparison as the dependent variable, gender as a factor with two levels (male and female), and the between-group difference in the average effect sizes across studies tested by an F test. Of course, the statistical test must be interpreted cautiously because the n is based on studies rather than subjects, and unless there are a large number of studies, the statistical test will be underpowered.

Another caution regarding the use of a moderator analysis is the elevated probability of Type 1 error (wrongly concluding there is a difference when there is not) from conducting multiple One-Way ANOVAs without controlling for this multiplicity. One way to control for this is through meta-regression. Currently, CMA 2.2 allows for meta-regression with a moderator measured on a continuous scale (i.e., interval or ratio). However, if necessary, we will consider using SAS or SPSS to conduct the meta-regression, which is slightly more complicated. We will, of course, evaluate whether the benefits of simultaneous statistical controls outweigh the cost of the additional complexity of implementing meta-regression with these statistical programs.

SOURCES OF SUPPORT

The review is directly supported by funding from the Kauffman Foundation through the Campbell Collaboration.

AUTHOR(S) / REVIEW TEAM

Lead reviewer:
Name: Louis S. Nadelson
Title: Assistant Professor, Curriculum, Instruction, Foundational Studies
Affiliation: College of Education, Boise State University
Address: 910 University Dr.
City, State, Province or County: Boise, ID
Postal Code: 83725
Country: USA
Phone: (208) 426-2856
Mobile:
Email: [email protected]

Co-author(s):
Name: Susan Williams
Title:
Affiliation: Metiri Group
Address: 600 Corporate Pointe, Suite 1180
City, State, Province or County: Culver City, CA
Postal Code: 90230
Country: USA
Phone: (310) 945-5150
Mobile: (615) 364-7787
Email: [email protected]

Name: Herbert M. Turner, III
Title:
Affiliation: ANALYTICA, Inc.
Address: 35 Goldfinch Circle
City, State, Province or County: Phoenixville, PA
Postal Code: 19460
Phone: (610) 933-1005
Email: [email protected]

ROLES AND RESPONSIBILITIES

• Content: Louis S. Nadelson and Susan M. Williams are the co-principal investigators of this project. They will develop the protocol, conduct the information retrieval, code all studies, and collaborate with Herbert Turner and Cheryl Lemke to draft the final report.
• Systematic review methods: Herbert M. Turner will evaluate the validity of the systematic review method protocol using the Campbell Collaboration documents as a guideline.
• Statistical analysis: Herbert Turner is responsible for methodological reviews of the study and will guide the statistical examinations of our meta-analysis, should there be sufficient studies to warrant such an analysis.
• Information retrieval: Louis Nadelson, Susan Williams, Herbert Turner, and Cheryl Lemke will all be involved in the approval of the information retrieval procedure. Louis Nadelson and Susan Williams will be responsible for conducting the actual information retrieval.

REFERENCES
Anderson, R. D. (2002). Reforming science teaching: What research says about inquiry. Journal of Science Teacher Education, 13(1), 1-12.

Borenstein, M., Hedges, L., & Rothstein, H. (2007). Meta-analysis: Fixed effect vs. random effects. (Available from Biostat, Englewood, NJ)

Bredderman, T. (1983). Effects of activity-based elementary science on student outcomes: A quantitative synthesis. Review of Educational Research, 53(4), 499-518.

Bruer, J. T. (1993). Schools for thought. Cambridge, MA: MIT Press.

Guzzetti, B. J., Snyder, T. E., Glass, G. V., & Gamas, W. S. (1993). Promoting conceptual change in science: A comparative meta-analysis of instructional interventions from reading education and science education. Reading Research Quarterly, 28(2), 116-159.

Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational Statistics, 6, 107-128.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Thousand Oaks, CA: Sage.

Kesidou, S., & Roseman, J. E. (2002). How well do middle school science programs measure up? Findings from Project 2061's curriculum review. Journal of Research in Science Teaching, 39(6), 522-549.

Lavenberg, J. G. (2007). Effects of school-based cognitive-behavioral anger interventions: A meta-analysis. Ph.D. dissertation, University of Pennsylvania, United States – Pennsylvania. Retrieved September 19, 2008, from Dissertations & Theses: Full Text database. (Publication No. AAT 3271850)

Lee, H., & Songer, N. B. (2003). Making authentic science accessible to students. International Journal of Science Education, 25(8), 923-948.

Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.

Lott, G. W. (1983). The effect of inquiry teaching and advance organizers upon student outcomes in science education. Journal of Research in Science Teaching, 20(5), 437-451.

National Center for Education Statistics. (2001). Science: The nation's report card 2000. Washington, DC: National Center for Education Statistics.

National Research Council. (1996). National science education standards. Washington, DC: National Academy Press.

National Science Resources Center. (1998). Resources for teaching middle school science. Washington, DC: National Academy Press.

Pearson, P. D., Ferdig, R. E., Blomeyer, R. L., Jr., & Moran, J. (2005). The effects of technology on reading performance in the middle-school grades: A meta-analysis with recommendations for policy. Learning Point Associates.

Posner, G., Strike, K., Hewson, P., & Gertzog, W. (1982). Accommodation of a scientific conception: Toward a theory of conceptual change. Science Education, 66, 211-227.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.

Roth, W.-M., & Roychoudhury, A. (1994). Physics students' epistemologies and views about knowing and learning. Journal of Research in Science Teaching, 31(1), 5-30.

Sandoval, W. A., & Millwood, K. A. (2007). What can argumentation tell us about epistemology? In S. Erduran & M. P. Jiménez-Aleixandre (Eds.), Argumentation in science education: Perspectives from classroom-based research (pp. 68-85). Springer.

Schwab, J. J. (1962). The teaching of science as enquiry. In J. J. Schwab & P. Brandwein (Eds.), The teaching of science. Cambridge, MA: Harvard University Press.
A re-assessment of the effects of inquiry-based science curricula of the sixties on student achievement. Journal of Research in Science Teaching, 27(2), 127-144.
Shymansky, J. A., Kyle, W. C., & Alport, J. M. (1983). The effects of new science curricula on student performance. Journal of Research in Science Teaching, 20, 387-404.
Tobin, K., Tippins, D. J., & Hook, K. S. (1995). Students' beliefs about epistemology, science, and classroom learning: A question of fit. In S. M. Glynn & R. Duit (Eds.), Learning science in the schools: Research reforming practice (pp. 85-110). Mahwah, NJ: Erlbaum.
U.S. Commission on National Security in the Twenty-First Century. (2001). Road map for national security: Imperative for change.
United States Department of Education. (2006). The facts about science achievement. Retrieved October 6, 2006, from http://www.ed.gov/nclb/methods/science/science.html
Weinstein, T., Boulanger, F. D., & Walberg, H. J. (1982). Science curriculum effects in high school: A quantitative synthesis. Journal of Research in Science Teaching, 19(6), 511-522.
Wise, K. C., & Okey, J. R. (1983). A meta-analysis of the effects of various science teaching strategies on achievement. Journal of Research in Science Teaching, 20(5), 419-435.

APPENDIX

Influence of Inquiry-Based Science Interventions on Middle School Students' Cognitive, Behavioral, and Affective Outcomes: A Campbell Collaboration Systematic Review

Education Review Group

PROPOSED: INQUIRY SCIENCE INSTRUCTION CODING MANUAL
September 20, 2008

Reviewers

Louis S. Nadelson
Curriculum, Instruction, Foundational Studies
College of Education, Boise State University
910 University Drive
Boise, Idaho 83725
(208) 426-2856
[email protected]

Susan Williams
Metiri Group
600 Corporate Pointe, Suite 1180
Culver City, CA 90230
Office: (310) 945-5150
Mobile: (615) 364-7787
[email protected]

Methodological Consultant:
Herbert M. Turner, III
ANALYTICA, Inc.
35 Goldfinch Circle
Phoenixville, PA 19460
(610) 933-1005
[email protected]

ELIGIBILITY CRITERIA

To be eligible, interventions must meet the criteria that define inquiry science instruction. Additional instruction may be included, but, at a minimum, all instruction must include the following:
• A scientifically oriented question(s)
• A methodology for gathering empirical evidence or data that can be used to address the question(s), and
• An explanation or interpretation of the evidence in the context of the research question.

Setting and participants. The intervention must have been carried out with students in science classes in grades 5, 6, 7, or 8 in public, private, parochial, or alternative schools. Studies that included students in grades other than 5-8 will be included if they report results separately for each grade so that the target grades can be separated out. The intervention must have taken place in the school setting during the traditional academic schedule (i.e., Monday through Friday, during the regularly scheduled school year). Interventions that took place in an after-school program will not be considered.
Studies with special needs or gifted students will be included in the review if the students were in immersion programs that routinely placed them in regular classrooms during science lessons, or if they are compared to an equivalent group of students, and the treatment curriculum was consistent and reproducible rather than modified to meet individual needs as required under the conditions of IEPs.

Study design. Reviewers will evaluate research studies that report the impact of a well-defined intervention by distinguishing between treatment and comparison groups, using a randomized controlled trial design or a quasi-experimental design (specifically, the non-equivalent control group design). If a quasi-experimental design was employed, participants must have been matched on relevant characteristics prior to the delivery of the intervention, or statistically equated, in order to ensure that groups were as similar as possible at baseline. Comparison groups can represent no treatment, "treatment as usual" conditions, or a different level of inquiry than the identified treatment condition.

Types of outcomes included. At least one academic cognitive student outcome must be reported (e.g., academic achievement, critical thinking/transfer). Depending on availability, the reviewers also expect to report on cognitive engagement and/or affective (e.g., student interest and/or emotional engagement) and behavioral (e.g., time on task, homework completion, behavioral engagement) outcomes. All measures must be quantitative in nature and report reliability statistics. Standardized measures of achievement with established validity and reliability will be included in the review (e.g., National Assessment of Educational Progress, Partnership for the Assessment of Standards-based Science, Cornell Critical Thinking Test). Unstandardized measures of achievement, and of other domains, with adequate face validity will also be included (e.g., student interest inventories and self-report measures of affective and cognitive engagement and interest). Measures reported in the Buros Institute's Mental Measurements Yearbook, published by the University of Nebraska Press, will be considered standardized measures; all others will be considered unstandardized measures.

Dates of publication. The date of publication of the study must be 1990 or later, although the research itself may have been conducted prior to 1990. All studies up through September 2008 will be considered for inclusion.

STUDY IDENTIFIERS

Each study will be assigned a three-digit identifying code [STUDYID]. A study is defined as a research investigation in which two or more groups are compared with each other, and includes the treatment, materials, measures, analyses, and results. If there is more than one paper describing the same study, the papers will be combined and coded as a single study. Each paper within the study will also have an identifying code, with the first three digits representing the study and the last two digits representing the paper (e.g., 123.02 designates study #123 and paper #2 describing that study). Each study will be coded independently by two trained reviewers. Each reviewer will create a separate coding form. Each reviewer will be assigned a PIN (or may choose to use their initials), and this PIN will be entered for every investigation they code [ReviewerID]. The date that coding began will be entered for each study. The full citation for the study will be entered using APA style.
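To make the identifier scheme concrete, the following is a minimal sketch of how study- and paper-level codes could be composed and stored. The helper name and record fields are illustrative only and are not prescribed by the protocol.

    def paper_code(study_id: int, paper_num: int) -> str:
        """Three study digits plus two paper digits, e.g., study 123, paper 2 -> '123.02'."""
        return f"{study_id:03d}.{paper_num:02d}"

    # Hypothetical record for one double-coded study (field names are illustrative only)
    study_record = {
        "STUDYID": 123,
        "papers": [paper_code(123, 1), paper_code(123, 2)],  # ['123.01', '123.02']
        "ReviewerID": "LN",              # reviewer PIN or initials
        "coding_started": "2008-09-20",  # date coding began
    }
    print(study_record["papers"])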
SCREENING FORMS

Phase I Screening Form: Topic, Setting, Population, and General Design Relevance

(1) How was this study identified?
a. Electronic search of online database
b. Bibliography of relevant study
c. Cited in meta-analysis or systematic review
d. Hand-search of journal
e. Search of conference proceedings
f. Web search (e.g., Google)
g. Organizational website
h. Contacting expert in the field
i. Contacting a referred author/researcher
j. Other: _________________________________

(2) Does this study address inquiry science? To be eligible, at least one group in each study must meet the criteria that define inquiry science instruction. Additional instruction may be included, but, at a minimum, all instruction must include the following:
• A scientifically oriented question(s)
• A methodology for gathering empirical evidence or data that can be used to address the question(s), and
• An explanation or interpretation of the evidence in the context of the research question.
__ Author identified  __ Screener identified
YES  UNSURE  NO (STOP/exclude)

(3) Does this study take place in a school setting? The intervention must have been carried out with students in science classes in public, private, parochial, or alternative schools. The intervention must have taken place in the school setting during the traditional academic schedule (i.e., Monday through Friday). Interventions that took place in an after-school program will not be considered. Studies with special needs or gifted students will be included in the review if the students were in immersion programs that routinely placed them in the regular classrooms during science lessons, or if they are being compared to an equivalent group of special needs or gifted students. The curriculum for these students must not be individualized, as is frequently required by an IEP, but generalized and reproducible.
YES  UNSURE  NO (STOP/exclude)

(4) Are the participants grade 5, 6, 7, or 8 students? Studies that included students in grades other than 5-8 will be included if they report results separately for each grade so that the target grades can be separated out.
YES  UNSURE  NO (STOP/exclude)

(5) Are there at least two groups of participants? Reviewers will evaluate research studies that report the impact of a well-defined intervention by distinguishing between treatment and comparison groups, using a randomized controlled trial design or a quasi-experimental design (specifically, the non-equivalent control group design). If a quasi-experimental design was employed, participants must have been matched on relevant characteristics prior to the delivery of the intervention, or statistically equated, in order to ensure that groups were as similar as possible at baseline. Comparison groups can represent no treatment, "treatment as usual", or a different level of inquiry than the identified treatment condition.
YES  UNSURE  NO (STOP/exclude)

(6) Is the date of publication of the paper(s) describing this study 1990 or later?
YES  UNSURE  NO (STOP/exclude)

Comments: _____________________________________________________________

Decision:  __ Exclude  __ Include/progress to Phase II screening

Phase II Screening Form: Research Design, Intervention, and Outcome Relevance

As in Phase I, each study will be screened by two reviewers working independently. Each reviewer will create a form that will be identified by study number and the reviewer's code.
• Study ID number [STUDYID]* ______________   Reviewer's initials or PIN* [ReviewerID] ___________
• Date Phase II review was begun: ___________
• First author* ________________________
• Year* ________________________

(1) Research design:
__ Randomized controlled trial (individuals randomly assigned to condition)
__ Group randomized controlled trial
__ Quasi-experimental design
__ Single group pretest/posttest design (Stop/exclude)
__ Case study (Stop/exclude)
__ Correlational or ex post facto design (Stop/exclude)
__ Other: _____________________________________ (Stop/exclude)
__ Unsure/unable to determine
Location of identifying information:  __ Title page: ___________  __ Abstract page: ____________  __ Method section page: ____________

(2) Is this an inquiry science intervention?
YES  UNSURE  NO (Stop/exclude)   Page: _______________
__ Author identified  __ Screener identified
Comment: _______________________________________________________

(3) Do the comparison and treatment groups receive conceptually different interventions? (Comparison groups can represent no treatment, "treatment as usual", or a different level of inquiry than the treatment condition.)
YES  UNSURE  NO (Stop/exclude)   Page: ________________
Comment: _______________________________________________________

(4) Indicate below whether cognitive outcomes (academic achievement or critical thinking/transfer) for students are measured.
YES  UNSURE  NO (Stop/exclude)   Page: ________________
Comment: _______________________________________________________

(5) Is a standardized measure used to record the outcome?
(a) Academic achievement?   YES (Page # ___)  UNSURE  NO
(b) Critical thinking/transfer?   YES (Page # ___)  UNSURE  NO
Comment: _______________________________________________________

(6) Does this study report outcomes such that an effect size may be calculated?
YES UNSURE NO (Stop/Exclude) Page: _____________ Comment: _______________________________________________________ _______________________________________________________ _______________________________________________________ _________________________________ Additional Comments _____________________________________________________________ _____________________________________________________________ _____________________________________________________________ _________________________________ Decision: ____ Exclude ____ Include/Continue to Coding Phase Phase III Coding CODING ELEMENTS FOR INQUIRY SCIENCE STUDIES Coding: Reviewers initials or code___________ Date Phase III review was begun_____________ Study ID number [STUDYID] ______________ Study Information Author(s)* (Report last name, first; e.g., Doe, John). o _________________________________________ Year of Study (Report year of study; e.g., 2000). o _____________ Publication Features 1. Juried journal 2. non-juried journal 3. Doctoral dissertation 4. Conference proceedings 5. Book 6. Government Report 7. Organizational Report 8. Other ______________________________________ Methodological Characteristics Type of Design 1. Quasi-experimental/nonrandomized pre-post control group 2. Quasi-experimental time series 3. Randomized post-test only control group 4. Randomized pre-post control group 5. Other 26 Number of Comparisons Within Study (Report number; e.g., 1 or 2 or 3). o _______________________________ Unit of Analysis 1. Unspecified 2. Individual 3. Class 4. School 5. District 6. State 7. Mixed Does the unit of analysis match the unit of assignment? 1. Yes 2. No 9. Cannot determine The Campbell Collaboration | www.campbellcollaboration.org Method of Assignment: 1. Unspecified 2. Random, with or without matching 3. Non-random, matched ONLY on student characteristics or demographics 4. Non-random matched ONLY on pretest measures of outcome variables 5. Non-random matched on both of above 6. Other TREATMENT Demographic Information for treatment group 27 Student Sample Size (Report actual sample size; e.g., 3086). –––––––––––––––– School Sample Size (Report actual sample size; e.g., 73). ––––––––––––––––––– Student Sex (Check all that apply and provide numbers or percentages if available) 1. Males _____ 2. Females _____ 3. Mixed or not specified_____ Grade Level (Check all that apply and provide numbers or percentages if available) 1. 5th grade_____ 2. 6th grade; _____ 3. 7th grade _____ 4. 8th grade _____ Students' Ethnicity; Check all that apply and provide numbers or percentages if available) 1. Unspecified_____ 2. Black _____ 3. Hispanic_____ 4. Asian_____ 5. White_____ 6. Mixed_____ 7. Other_____ Students' Socioeconomic Status (Check all that apply and provide numbers or percentages if available) 1. Unspecified_____ 2. Lower _____ 3. Lower middle_____ 4. Middle _____ 5. Upper middle_____ 6. Upper _____ 7. Mixed_____ Country; 1. Unspecified 2. USA The Campbell Collaboration | www.campbellcollaboration.org 3. 4. 5. 6. 7. 8. 9. Canada Mexico/Latin America Europe Asia South America Cross-Cultural Other _________________________________ Geographical Region in USA; Check all that apply and provide numbers or percentages if available) 1. Northeast _____ 2. Southeast _____ 3. Midwest _____ 4. South Central _____ 5. Southwest _____ 6. Northwest _____ 7. Other_____ School Type; Check all that apply and provide numbers or percentages if available) 1. Unspecified_____ 2. Public _____ 3. Private _____ 4. Special school_____ 5. 
Other_____ Community Type; Check all that apply and provide numbers or percentages if available) 1. Unspecified_____ 2. Urban_____ 3. Rural_____ 4. Suburban_____ 5. Other_____ Instructional Characteristics for treatment group Number of Inquiry Science Sessions (Unspecified = 0; List number of sessions [e.g., 12]). o ________________ Duration of Inquiry Science Sessions (Unspecified = 0; List number of average minutes per sessions [e.g., 40]). o ________________ Duration of Study (Unspecified = 00; List the number of months that the implementation of the curriculum occurred). o _________________ 28 Percentage of inquiry during research/instructional period; 1. Unspecified 2. 1-25% 3. 26%-50% 4. 51%-75% 5. 76% - 99% The Campbell Collaboration | www.campbellcollaboration.org 6. 100% 7. Other Learning Responsibility: 1. Level 0 2. Level 1 3. Level 2 4. Level 3 5. Unidentifiable Type of Learning Task(Inquiry group); 1. Unspecified 2. Basic skills/factual learning 3. Problem solving 4. Investigation 5. Project-based 6. Mixed types 7. Other Level of Technology 1. Unspecified 2. Supplementary/Digital Content 3. Primary/Digital Content 4. Regular Digital Productivity requirements 5. Other Feedback and Assessment Practices (Teacher/Student) 1. Unspecified 2. No feedback 3. Minimal feedback 4. Elaborate feedback 5. Other Instructional/Teaching Characteristics for treatment group • Joint Productive Activity/Collaboration (e.g., Designs instructional activities requiring student collaboration to accomplish a joint product; monitors and supports students collaboration in positive ways.) 1. No evidence 2. Some evidence 3. Extensive evidence • Contextualization/Making Meaning (e.g., Begins activities with what students already know from home, community, and school; encourages students to use content vocabulary to express their understanding.) 1. No evidence 2. Some evidence 3. Extensive evidence 29 The Campbell Collaboration | www.campbellcollaboration.org • Challenging Activities (e.g., Designs instructional tasks that advance students' understanding to more complex levels. Assures that students—for each instructional topic— see the whole picture as a basis for understanding the parts.) 1. No evidence 2. Some evidence 3. Extensive evidence • Instructional Conversation: (e,g., Arranges the classroom to accommodate conversation between the teacher and a small group of students on a regular and frequent basis. Guides conversation to include students' views, judgments, and rationales using text evidence and other substantive support.) 1. No evidence 2. Some evidence 3. Extensive evidence • Mode of Instruction 1. Unspecified 2. Whole-group instruction 3. Paired 4. Small-group instruction [3-5 members] 5. Individualized 6. Mixed 7. Other • Basis of grouping 1. Teacher selected purposeful 2. Teacher selected random 3. Student selected 4. No grouping • Role of Teacher 1. Unspecified 2. Deliverer of knowledge 3. Facilitator of groups/student learning 4. Modeling processes [e.g., problem solving] 5. Mixed 6. Other Experience and training context 30 Reported Teachers' Experience with Inquiry Science: 1. Unspecified 2. None 3. Some 4. Experienced Reported Students' Experience with Inquiry Science 1. Unspecified 2. None 3. Some 4. Experienced The Campbell Collaboration | www.campbellcollaboration.org Teacher Training in Inquiry Science (Unspecified = 0; List hours of training (e.g., 15). o ________________ • 1. 2. 3. 4. 5. 
Teacher Qualifications Unspecified Alternatively certified or provisional certificate Certified in content area Not certified in content area Other Policy related to Inquiry Science Level of Enacted Policy 1. Unspecified 2. School 3. District 4. State 5. Federal 6. Other Focus 1. 2. 3. 4. 5. Unspecified Reducing achievement gaps Increased use of inquiry science Increased critical thinking Other COMPARISON #1 Treatment for 1st Comparison Condition; 1. Unspecified 2. Student receives nothing/treatment as usual 3. Teacher receives equivalent PD/Treatment as usual 4. Alternate Treatment (Specify) Demographic Information for comparison group #1 31 Student Sample Size (Report actual sample size; e.g., 3086). –––––––––––––––– School Sample Size (Report actual sample size; e.g., 73). ––––––––––––––––––– Student Sex (Check all that apply and provide numbers or percentages if available) 1. Males _____ 2. Females _____ 3. Mixed or not specified_____ Grade Level (Check all that apply and provide numbers or percentages if available) The Campbell Collaboration | www.campbellcollaboration.org 1. 2. 3. 4. 32 5th grade_____ 6th grade____ 7th grade; _____ 8th grade _____ Students' Ethnicity; Check all that apply and provide numbers or percentages if available) 1. Unspecified_____ 2. Black _____ 3. Hispanic_____ 4. Asian_____ 5. White_____ 6. Mixed_____ 7. Other_____ Students' Socioeconomic Status (Check all that apply and provide numbers or percentages if available) 1. Unspecified_____ 2. Lower _____ 3. Lower middle_____ 4. Middle _____ 5. Upper middle_____ 6. Upper _____ 7. Mixed_____ Country; 1. Unspecified_____ 2. USA_____ 3. Canada_____ 4. Mexico/Latin America_____ 5. Europe_____ 6. Asia_____ 7. South America_____ 8. Cross-Cultural_____ 9. Other_____ Geographical Region in USA; Check all that apply and provide numbers or percentages if available) 1. Northeast _____ 2. Southeast _____ 3. Midwest 4. South Central _____ 5. Southwest _____ 6. Northwest _____ 7. Other_____ School Type; Check all that apply and provide numbers or percentages if available) 1. Unspecified_____ 2. Public _____ 3. Private _____ 4. Special school_____ The Campbell Collaboration | www.campbellcollaboration.org 5. Other_____ Community Type; Check all that apply and provide numbers or percentages if available) 1. Unspecified_____ 2. Urban_____ 3. Rural_____ 4. Suburban_____ 5. Other_____ Instructional Characteristics for comparison group #1 Number of Comparison Instruction Science Sessions (Unspecified = 0; List number of sessions [e.g., 12]). o ________________ Duration of Comparison Instruction Science Sessions (Unspecified = 0; List number of average minutes per sessions [e.g., 40]). o ________________ Duration of Study (Unspecified = 00; List the number of months that the implementation of the comparison curriculum occurred). o _________________ 33 Percentage of comparison science instruction during research/instructional period; 1. Unspecified 2. 1-25% 3. 26%-50% 4. 51%-75% 5. 76% - 99% 6. 100% 7. Other Learning Responsibility: 1. Level 0 2. Level 1 3. Level 2 4. Level 3 5. Unidentifiable Type of Learning Task(Comparison Instruction group); 1. Unspecified 2. Basic skills/factual learning 3. Problem solving 4. Investigation 5. Project-based 6. Mixed types 7. Other Level of Technology 1. Unspecified The Campbell Collaboration | www.campbellcollaboration.org 2. 3. 4. 5. Supplementary/Digital Content Primary/Digital Content Regular Digital Productivity requirements Other Feedback and Assessment Practices (Teacher/Student) 1. 
Unspecified 2. No feedback 3. Minimal feedback 4. Elaborate feedback 5. Other Instructional/Teaching Characteristics • Joint Productive Activity/Collaboration (e.g., Designs instructional activities requiring student collaboration to accomplish a joint product; monitors and supports students collaboration in positive ways.) 1. No evidence 2. Some evidence 3. Extensive evidence • Contextualization/Making Meaning (e.g., Begins activities with what students already know from home, community, and school; encourages students to use content vocabulary to express their understanding.) 1. No evidence 2. Some evidence 3. Extensive evidence • Challenging Activities (e.g., Designs instructional tasks that advance students' understanding to more complex levels. Assures that students—for each instructional topic— see the whole picture as a basis for understanding the parts.) 1. No evidence 2. Some evidence 3. Extensive evidence • Instructional Conversation: (e,g, Arranges the classroom to accommodate conversation between the teacher and a small group of students on a regular and frequent basis. Guides conversation to include students' views, judgments, and rationales using text evidence and other substantive support.) 1. No evidence 2. Some evidence 3. Extensive evidence • Mode of Instruction 1. Unspecified 2. Whole-group instruction 3. Paired 4. Small-group instruction [3-5 members] 5. Individualized 6. Mixed 7. Other 34 The Campbell Collaboration | www.campbellcollaboration.org • Basis of grouping 1. Teacher selected purposeful 2. Teacher selected random 3. Student selected 4. No grouping • Role of Teacher 1. Unspecified 2. Deliverer of knowledge 3. Facilitator of groups/student learning 4. Modeling processes [e.g., problem solving] 5. Mixed 6. Other Experience and training context for comparison group #1 Reported Teachers' Experience with Comparison Science Instruction: 1. Unspecified 2. None 3. Some 4. Experienced Reported Students' Experience with Comparison Science Instruction 1. Unspecified 2. None 3. Some 4. Experienced Teacher Training in Comparison Science Instruction (Unspecified = 0; List hours of training (e.g., 15). o ________________ • Teacher Qualifications 1. Unspecified 2. Alternatively certified or provisional certificate 3. Certified in content area 4. Not certified in content area 5. Other Policy related to Comparison Science Instruction Level of Enacted Policy 1. Unspecified 2. School 3. District 4. State 5. Federal 6. Other 35 Focus 1. Unspecified The Campbell Collaboration | www.campbellcollaboration.org 2. 3. 4. 5. Reducing achievement gaps Increased use of inquiry science Increased critical thinking Other COMPARISON #2 Treatment 2st Comparison Condition; 1. Unspecified 2. Student receives nothing/treatment as usual 3. Teacher receives equivalent PD/Treatment as usual 4. Alternate Treatment (Specify) Demographic Information for comparison group #2 36 Student Sample Size (Report actual sample size; e.g., 3086). –––––––––––––––– School Sample Size (Report actual sample size; e.g., 73). ––––––––––––––––––– Student Sex (Check all that apply and provide numbers or percentages if available) 1. Males _____ 2. Females _____ 3. Mixed or not specified_____ Grade Level (Check all that apply and provide numbers or percentages if available) 1. 5th grade_____ 2. 6th grade_____ 3. 7th grade; _____ 4. 8th grade _____ Students' Ethnicity; Check all that apply and provide numbers or percentages if available) 1. Unspecified_____ 2. Black _____ 3. Hispanic_____ 4. Asian_____ 5. White_____ 6. 
Mixed_____ 7. Other_____ Students' Socioeconomic Status (Check all that apply and provide numbers or percentages if available) 1. Unspecified_____ 2. Lower _____ 3. Lower middle_____ 4. Middle _____ 5. Upper middle_____ 6. Upper _____ 7. Mixed_____ The Campbell Collaboration | www.campbellcollaboration.org Country; 1. Unspecified 2. USA 3. Canada 4. Mexico/Latin America 5. Europe 6. Asia 7. South America 8. Cross-Cultural 9. Other Geographical Region in USA; Check all that apply and provide numbers or percentages if available) 1. Northeast _____ 2. Southeast _____ 3. Midwest 4. South Central _____ 5. Southwest _____ 6. Northwest _____ 7. Other_____ School Type; Check all that apply and provide numbers or percentages if available) 1. Unspecified_____ 2. Public _____ 3. Private _____ 4. Special school_____ 5. Other_____ Community Type; Check all that apply and provide numbers or percentages if available) 1. Unspecified_____ 2. Urban_____ 3. Rural_____ 4. Suburban_____ 5. Other_____ Instructional Characteristics for comparison group #2 Number of Comparison Instruction Science Sessions (Unspecified = 0; List number of sessions [e.g., 12]). o ________________ Duration of Comparison Instruction Science Sessions (Unspecified = 0; List number of average minutes per sessions [e.g., 40]). o ________________ Duration of Study (Unspecified = 00; List the number of months that the implementation of the comparison curriculum occurred). o _________________ 37 Percentage of comparison science instruction during research/instructional period; The Campbell Collaboration | www.campbellcollaboration.org 1. 2. 3. 4. 5. 6. 7. Unspecified 1-25% 26%-50% 51%-75% 76% - 99% 100% Other Learning Responsibility: 1. Level 0 2. Level 1 3. Level 2 4. Level 3 5. Unidentifiable Type of Learning Task(Comparison Instruction group); 1. Unspecified 2. Basic skills/factual learning 3. Problem solving 4. Investigation 5. Project-based 6. Mixed types 7. Other Level of Technology 1. Unspecified 2. Supplementary/Digital Content 3. Primary/Digital Content 4. Regular Digital Productivity requirements 5. Other Feedback and Assessment Practices (Teacher/Student) 1. Unspecified 2. No feedback 3. Minimal feedback 4. Elaborate feedback 5. Other Instructional/Teaching Characteristics for comparison group #2 • Joint Productive Activity/Collaboration (e.g., Designs instructional activities requiring student collaboration to accomplish a joint product; monitors and supports students collaboration in positive ways.) 1. No evidence 2. Some evidence 3. Extensive evidence • Contextualization/Making Meaning (e.g., Begins activities with what students already know from home, community, and school; encourages students to use content vocabulary to express their understanding.) 38 The Campbell Collaboration | www.campbellcollaboration.org 1. No evidence 2. Some evidence 3. Extensive evidence • Challenging Activities (e.g., Designs instructional tasks that advance students' understanding to more complex levels. Assures that students—for each instructional topic— see the whole picture as a basis for understanding the parts.) 1. No evidence 2. Some evidence 3. Extensive evidence • Instructional Conversation: (e,g., Arranges the classroom to accommodate conversation between the teacher and a small group of students on a regular and frequent basis. Guides conversation to include students' views, judgments, and rationales using text evidence and other substantive support.) 1. No evidence 2. Some evidence 3. Extensive evidence • Mode of Instruction 1. 
Unspecified 2. Whole-group instruction 3. Paired 4. Small-group instruction [3-5 members] 5. Individualized 6. Mixed 7. Other • Basis of grouping 1. Teacher selected purposeful 2. Teacher selected random 3. Student selected 4. No grouping • Role of Teacher 1. Unspecified 2. Deliverer of knowledge 3. Facilitator of groups/student learning 4. Modeling processes [e.g., problem solving] 5. Mixed 6. Other Experience and training context for comparison group #2 39 Reported Teachers' Experience with Comparison Science Instruction: 1. Unspecified 2. None 3. Some 4. Experienced Reported Students' Experience with Comparison Science Instruction The Campbell Collaboration | www.campbellcollaboration.org 1. 2. 3. 4. Unspecified None Some Experienced Teacher Training in Comparison Science Instruction (Unspecified = 0; List hours of training (e.g., 15). o ________________ • Teacher Qualifications 1. Unspecified 2. Alternatively certified or provisional certificate 3. Certified in content area 4. Not certified in content area 5. Other Policy related to Comparison Science Instruction for comparison group #2 Level of Enacted Policy 1. Unspecified 2. School 3. District 4. State 5. Federal 6. Other Focus 1. 2. 3. 4. 5. Unspecified Reducing achievement gaps Increased use of inquiry science Increased critical thinking Other COMPARISON GROUP #3 Treatment 3rdt Comparison Condition; 1. Unspecified 2. Student receives nothing/treatment as usual 3. Teacher receives equivalent PD/Treatment as usual 4. Alternate Treatment (Specify) ___________________________________________________ ___________________________________________________ Demographic Information for comparison group #3 40 Student Sample Size (Report actual sample size; e.g., 3086). –––––––––––––––– School Sample Size (Report actual sample size; e.g., 73). ––––––––––––––––––– Student Sex (Check all that apply and provide numbers or percentages if available) 1. Males _____ 2. Females _____ The Campbell Collaboration | www.campbellcollaboration.org 3. Mixed or not specified_____ 41 Grade Level (Check all that apply and provide numbers or percentages if available) 1. 5th grade_____ 2. 6th grade_____ 3. 7th grade; _____ 4. 8th grade _____ Students' Ethnicity; Check all that apply and provide numbers or percentages if available) 1. Unspecified_____ 2. Black _____ 3. Hispanic_____ 4. Asian_____ 5. White_____ 6. Mixed_____ 7. Other_____ Students' Socioeconomic Status (Check all that apply and provide numbers or percentages if available) 1. Unspecified_____ 2. Lower _____ 3. Lower middle_____ 4. Middle _____ 5. Upper middle_____ 6. Upper _____ 7. Mixed_____ Country; 1. Unspecified 2. USA 3. Canada 4. Mexico/Latin America 5. Europe 6. Asia 7. South America 8. Cross-Cultural 9. Other Geographical Region in USA; Check all that apply and provide numbers or percentages if available) 1. Northeast _____ 2. Southeast _____ 3. Midwest 4. South Central _____ 5. Southwest _____ 6. Northwest _____ 7. Other_____ School Type; Check all that apply and provide numbers or percentages if available) 1. Unspecified_____ The Campbell Collaboration | www.campbellcollaboration.org 2. 3. 4. 5. Public _____ Private _____ Special school_____ Other_____ Community Type; Check all that apply and provide numbers or percentages if available) 1. Unspecified_____ 2. Urban_____ 3. Rural_____ 4. Suburban_____ 5. Other_____ Instructional Characteristics for comparison group #3 Number of Comparison Instruction Science Sessions (Unspecified = 0; List number of sessions [e.g., 12]). 
o ________________ Duration of Comparison Instruction Science Sessions (Unspecified = 0; List number of average minutes per sessions [e.g., 40]). o ________________ Duration of Study (Unspecified = 00; List the number of months that the implementation of the comparison curriculum occurred). o _________________ 42 Percentage of comparison science instruction during research/instructional period; 1. Unspecified 2. 1-25% 3. 26%-50% 4. 51%-75% 5. 76% - 99% 6. 100% 7. Other Learning Responsibility: 1. Level 0 2. Level 1 3. Level 2 4. Level 3 5. Unidentifiable Type of Learning Task(Comparison Instruction group); 1. Unspecified 2. Basic skills/factual learning 3. Problem solving 4. Investigation 5. Project-based 6. Mixed types 7. Other The Campbell Collaboration | www.campbellcollaboration.org Level of Technology 1. Unspecified 2. Supplementary/Digital Content 3. Primary/Digital Content 4. Regular Digital Productivity requirements 5. Other Feedback and Assessment Practices (Teacher/Student) 1. Unspecified 2. No feedback 3. Minimal feedback 4. Elaborate feedback 5. Other Instructional/Teaching Characteristics for comparison group #3 • Joint Productive Activity/Collaboration (e.g., Designs instructional activities requiring student collaboration to accomplish a joint product; monitors and supports students collaboration in positive ways.) 1. No evidence 2. Some evidence 3. Extensive evidence • Contextualization/Making Meaning (e.g., Begins activities with what students already know from home, community, and school; encourages students to use content vocabulary to express their understanding.) 1. No evidence 2. Some evidence 3. Extensive evidence • Challenging Activities (e.g., Designs instructional tasks that advance students' understanding to more complex levels. Assures that students—for each instructional topic— see the whole picture as a basis for understanding the parts.) 1. No evidence 2. Some evidence 3. Extensive evidence • Instructional Conversation: (e,g, Arranges the classroom to accommodate conversation between the teacher and a small group of students on a regular and frequent basis. Guides conversation to include students' views, judgments, and rationales using text evidence and other substantive support.) 1. No evidence 2. Some evidence 3. Extensive evidence • 1. 2. 3. 4. 43 Mode of Instruction Unspecified Whole-group instruction Paired Small-group instruction [3-5 members] The Campbell Collaboration | www.campbellcollaboration.org 5. Individualized 6. Mixed 7. Other • 1. 2. 3. 4. Basis of grouping Teacher selected purposeful Teacher selected random Student selected No grouping • 1. 2. 3. 4. 5. 6. Role of Teacher Unspecified Deliverer of knowledge Facilitator of groups/student learning Modeling processes [e.g., problem solving] Mixed Other Experience and training context for comparison group #3 Reported Teachers' Experience with Comparison Science Instruction: 1. Unspecified 2. None 3. Some 4. Experienced Reported Students' Experience with Comparison Science Instruction 1. Unspecified 2. None 3. Some 4. Experienced Teacher Training in Comparison Science Instruction (Unspecified = 0; List hours of training (e.g., 15). o ________________ • Teacher Qualifications 1. Unspecified 2. Alternatively certified or provisional certificate 3. Certified in content area 4. Not certified in content area 5. Other Policy related to Comparison Science Instruction #3 Level of Enacted Policy 1. Unspecified 2. School 3. District 4. State 5. Federal 6. 
Other

Focus
1. Unspecified  2. Reducing achievement gaps  3. Increased use of inquiry science  4. Increased critical thinking  5. Other

Outcome Measure(s)

Cognitive Outcomes: Science Achievement
1. Unspecified  2. Testing company standardized achievement test  3. Federal/national standardized test  4. State-level achievement test  5. District-level achievement test  6. School-level test  7. Grade-level test  8. Teacher-made test  9. Researcher-developed test  10. Authentic assessment  11. Other

Cognitive Outcomes: Critical Thinking in Science
1. Unspecified  2. Testing company standardized achievement test  3. Federal/national standardized test  4. State-level achievement test  5. District-level achievement test  6. School-level test  7. Grade-level test  8. Teacher-made test  9. Researcher-developed test  10. Authentic assessment  11. Other

Affective Outcomes: Science
1. Unspecified  2. Student attitudes toward science or instruction  3. Academic self-concept or motivation  4. Other

Behavioral Outcomes: Science
1. Unspecified  2. Student time-on-task  3. Student perseverance  4. Tasks attempted  5. Tasks completed  6. Success rate  7. Positive peer interaction  8. Interactivity with computers  9. Other

Quality of Study Indicators (Rigor of statistical design and analysis)

Method of Observation of Independent Variable (i.e., fidelity of implementation, data collection). Select all that apply.
1. Unspecified  2. Systematic observation  3. Informal observation  4. Student survey or interview  5. Teacher survey or interview  6. Administrator survey or interview  7. Computer logs  8. Other

Pretest Equivalency: Have the initial differences between the two groups been accounted for?
1. Unspecified  2. Statistical control (e.g., ANCOVA, regression)  3. Random assignment  4. Statistical control and random assignment  5. Gain scores  6. Matching  7. Other

Reported Reliability of Measures (Unspecified = 00; report the actual reliability statistic, e.g., 70 or 83).
o ___________________

Effect size information

Manner in Which Outcome Scores Are Reported (Unspecified = 0; Standard scores = 1; Raw scores = 2; Percentile ranks = 3; Gain scores = 4; Other = 5):

Outcome Measure     Outcome Reference     Manner reported
Cognitive           _______________       _______
Cognitive           _______________       _______
Cognitive           _______________       _______
Affective           _______________       _______
Affective           _______________       _______
Affective           _______________       _______
Behavioral          _______________       _______
Behavioral          _______________       _______
Behavioral          _______________       _______

Groups compared:  Group 1: __________________________________________  Group 2: ______________________________________________

Outcome A:
                      Group 1    Group 1     Group 1 Follow-up   Group 2    Group 2     Group 2 Follow-up
                      Pretest    Posttest    (Time: ____)        Pretest    Posttest    (Time: ____)
Mean                  ______     ______      ______              ______     ______      ______
SD                    ______     ______      ______              ______     ______      ______
n                     ______     ______      ______              ______     ______      ______
d (author reported)   ______     ______      ______              ______     ______      ______
Other statistics reported: ______________________________

Outcome B:
                      Group 1    Group 1     Group 1 Follow-up   Group 2    Group 2     Group 2 Follow-up
                      Pretest    Posttest    (Time: ____)        Pretest    Posttest    (Time: ____)
Mean                  ______     ______      ______              ______     ______      ______
SD                    ______     ______      ______              ______     ______      ______
n                     ______     ______      ______              ______     ______      ______
d (author reported)   ______     ______      ______              ______     ______      ______
Other statistics reported: ______________________________

Copy this page as needed to report on multiple comparison groups or multiple outcomes.

Potential Confounds/Sources of Invalidity

1. History: Have specific events occurred between the first and second measurement in addition to the treatment variable?
1. Adequately controlled by design  2. Definite weakness of design  3. Possible source of concern  4. Not a relevant factor

2. Maturation: Are there processes within the participants operating as a function of the passage of time [e.g., growing older, more tired] that might account for changes in the dependent measure?
1.
Adequately controlled by design 2. Definite weakness of design 47 The Campbell Collaboration | www.campbellcollaboration.org 3. Possible source of concern 4. Not a relevant factor 3. Testing: Is there an effect of taking a test upon the scores of a second testing? 1. Adequately controlled by design 2. Definite weakness of design 3. Possible source of concern 4. Not a relevant factor 4. Instrumentation: Do changes in calibration or observers' scores produce changes in the obtained measurement? 1. Adequately controlled by design 2. Definite weakness of design 3. Possible source of concern 4. Not a relevant factor 5. Statistical Regression: Have groups been selected on the basis of their extreme scores? 1. Adequately controlled by design 2. Definite weakness of design 3. Possible source of concern 4. Not a relevant factor 6. Selection Bias: Have biases resulted in the differential selection of comparison groups? 1. Adequately controlled by design 2. Definite weakness of design 3. Possible source of concern 4. Not a relevant factor 7. Mortality: Has there been a differential loss of participants from the treatment and comparison groups? 1. Adequately controlled by design 2. Definite weakness of design 3. Possible source of concern 4. Not a relevant factor 8. Selection-Maturation Interaction: Is there an interaction between extraneous factors such as history, maturation, or testing and the specific selection differences that distinguish the treatment and comparison groups? 1. Adequately controlled by design 2. Definite weakness of design 3. Possible source of concern 4. Not a relevant factor 9. Reactive or Interaction Effect of Testing: Does pretesting influence the participants' responsiveness to the treatment variable, making the results for a pretested population unrepresentative of the effects of the treatment variable for the unpretested universe from which the participants were selected? 1. Adequately controlled by design 2. Definite weakness of design 3. Possible source of concern 48 The Campbell Collaboration | www.campbellcollaboration.org 4. Not a relevant factor 10. Interaction of Selection Biases and Treatment: Are there selective factors upon which sampling was based which interact differentially with the treatment variable? 1. Adequately controlled by design 2. Definite weakness of design 3. Possible source of concern 4. Not a relevant factor 11. Reactive Effects of Experimental Arrangements: Are there effects of the experimental setting that would preclude generalizing about the effect of the experimental variable upon persons being exposed to it in non-experimental settings? 1. Adequately controlled by design 2. Definite weakness of design 3. Possible source of concern 4. Not a relevant factor 12. Multiple-Treatment Interference: Are there non-erasable effects of previous treatments applied to the same participants? 1. Adequately controlled by design 2. Definite weakness of design 3. Possible source of concern 4. Not a relevant factor 13. Statistical Power: Is the sample size large enough to reject the null hypothesis at a given level of probability, or are the estimate coefficients within reasonably small margins of error? [A sample > 60 for groups such as classes, schools, or districts; a sample >100 for individuals]. 1. Probable threat [< 60 for groups or < 100 for individuals as the unit of analysis] 2. Adequately minimized [> 60 for groups; > 100 for individuals] 49 The Campbell Collaboration | www.campbellcollaboration.org
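To show how the effect-size information recorded in the forms above would be converted into a common metric, the following is a minimal sketch that computes a standardized mean difference with the small-sample correction described by Hedges (1981), together with its approximate variance. The group means, standard deviations, and sample sizes are hypothetical, and the function name is illustrative.

    import math

    def hedges_g(m_t, sd_t, n_t, m_c, sd_c, n_c):
        """Hedges' g (bias-corrected standardized mean difference) and its approximate variance."""
        # pooled within-group standard deviation
        sp = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
        d = (m_t - m_c) / sp
        j = 1 - 3 / (4 * (n_t + n_c) - 9)   # small-sample correction factor
        g = j * d
        var_g = j**2 * ((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
        return g, var_g

    # Hypothetical posttest statistics for an inquiry (treatment) vs. traditional (comparison) group
    g, var_g = hedges_g(m_t=78.2, sd_t=11.5, n_t=54, m_c=72.6, sd_c=12.1, n_c=51)
    print(f"g = {g:.3f}, variance = {var_g:.4f}, SE = {math.sqrt(var_g):.4f}")

Effect sizes and variances computed in this way, one per outcome and comparison, would serve as the inputs to the pooled analysis and to the moderator test sketched earlier in the protocol.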