Protocol:
Influence of Inquiry-Based Science Interventions on
Middle School Students’ Cognitive, Behavioral, and
Affective Outcomes
Louis Nadelson, Susan Williams, & Herbert Turner III
BACKGROUND
This proposed systematic review will gather and summarize empirical evidence of the
effectiveness of inquiry-based science instruction on cognitive, affective and behavioral
outcomes of students in grades 5 through 8. This is of particular importance in the United
States, given that the No Child Left Behind federal law requires states to assess student learning
in science. According to the 2003 Trends in International Mathematics and Science Study (TIMSS), almost
35% of the 46 participating countries indicated that they place "a lot of emphasis" on inquiry
skills (i.e., generating hypotheses, designing, and conducting investigations). This systematic
review will evaluate the empirical relationship between student outcomes and inquiry-based
science curricula, resulting in data that can benefit educators, decision makers, and curriculum
developers in upper elementary and middle schools within the United States, with potential
international implications.
Inquiry-Based Science Pedagogy versus Traditional Science Pedagogy
In a traditional science classroom, the teacher uses a textbook and worksheets to transmit
knowledge and explain conceptual relationships to students. Students listen to their teachers
and are later responsible for memorizing the information to restate on a test or some other
written evaluation of knowledge. In this type of classroom, the teacher and printed sources
(texts, worksheets) are the source of authority and students are passive receivers of information.
All students learning in these traditional instructional environments are expected to
complete the same teacher-prescribed activities (Anderson, 2002).
The instructional approach in an inquiry-based science classroom looks very different from a
traditional science classroom. Teacher and student roles are shifted, as well as the kinds of
learning activities and evidence for learning. Teachers in inquiry classrooms take on the role of a
coach and model learning and inquiry processes for students. In inquiry instruction students
may be responsible for naming the scientific question under investigation, designing
investigations to research their question, and interpreting findings from investigations. In an
inquiry-based curriculum, students may be working on different research questions or may be
at different phases in the research cycle. Students engaged in an inquiry curriculum are required
to be more self-directed, and knowledge authority in the classroom is shared among the teacher
and multiple other sources.
For example, most middle school Life Science curricula include a unit on plants and adaptation.
In these units, students typically learn about the function of a leaf (i.e., photosynthesis and
transpiration) and in some classes how different leaves survive in varying climatic conditions.
Depending on the pedagogical approach of the teacher, students in different classes can learn
the same content, but in very different ways.
Sandoval and Millwood (2007) describe an inquiry-based unit on plants and adaptation in a
middle school science classroom. Students began the unit by being asked to look at a picture of
some local mountains and report what they noticed. Students noticed that the colors of the
vegetation in the photo looked different at different elevations. Students’ initial questions guided
the development of their small group investigations as they relied on a variety of environmental
data from a sensor network. At the end of the unit, students wrote explanations about why the
plants looked different, which included issues of adaptability, climate, and elevation, among
others. Student explanations went through a blind peer review process from their classmates.
The unit culminated in a class discussion during which students developed a shared
understanding of the environmental factors that contributed to the difference in appearance of
the vegetation in the photograph. In this curriculum students became active researchers, gaining
skills associated with scientific research and learning through a process of active engagement.
Why Inquiry-Based Science?
For generations, science teaching has relied on methods that train students to follow directions
with little connection to doing authentic scientific research. Although students have become
accustomed to this method of learning, most do not form a deep conceptual understanding of
science (National Center for Education Statistics, 2001). The National Science Education
Standards (National Research Council, 1996) urge the use of inquiry instruction to help students
develop deeper conceptual understanding in science. The prediction is that inquiry may give
students control over their own formation of scientific knowledge. Students become responsible
for identifying problems that need investigation, forming hypotheses, designing methods for
testing their hypotheses, carrying out their methods, analyzing results, and finally forming
conclusions.
It is hypothesized that when students learn science through inquiry, they are essentially
imitating practicing scientists, which creates the opportunity for deep conceptual change
(Posner, Strike, Hewson, & Gertzog, 1982). Typical lecture-style science classrooms do not
provide this opportunity (Bruer, 1993). In lecture-style classrooms, where there is a laboratory
component, students practice science using "cookbook" methods. They follow the prescriptive
directions of a laboratory exercise, working toward results that reveal the "right" answer. When
students are asked to conduct their own inquiry it challenges their beliefs about how science is
done and how it is learned (Roth & Roychoudhury, 1994; Tobin, Tippins, & Hook, 1995). The
outcomes of some published research comparing students who are taught science using an
inquiry approach with students taught using more traditional approaches have revealed
differential levels of science achievement (Bredderman, 1983; Guzzetti, Snyder, Glass, & Gamas,
1993; Lee & Songer, 2003; Shymansky, Hedges, & Woodworth, 1990).
According to the U.S. Department of Education (2006), "America's schools are not producing the
science excellence required for global economic leadership and homeland security in the 21st
century." Data from the federal government defines the challenge in science education in the
U.S. as follows:

 Internationally, U.S. students ranked 8th out of 12 countries on the Program for International Student Assessment (PISA) science test (2003).
 According to the 1995 Third International Mathematics and Science Study (TIMSS), U.S. 4th graders ranked 2nd, but by 12th grade, they had fallen to 16th, behind nearly every industrialized rival and ahead of only Cyprus and South Africa.
 Eighty-two percent of our nation's 12th graders performed below the proficient level on the 2000 National Assessment of Educational Progress (NAEP) science test.
"More Americans will have to understand and work competently with science and math on a
daily basis . . . the inadequacies of our systems of research and education pose a greater threat to
U.S. national security over the next quarter century than any potential conventional war that we
might imagine" (U.S. Commission on National Security in the Twenty-First Century, 2001).
The goals of schooling have changed over time with each generation’s graduates being expected
to identify and solve problems of increasing complexity at ever-increasing levels of proficiency.
With the advent of the No Child Left Behind (NCLB) Law, this level of performance is expected
for all children independent of their gender, race, socioeconomic level, or English language
proficiency. Since NCLB was enacted, schools have focused curricula, instruction, and
assessment efforts almost exclusively on mathematics and reading – the two areas for which
states were initially held accountable to meet adequate yearly progress (AYP). As dictated by the
NCLB law, AYP has been expanded to include science. NCLB now requires that states measure
students’ progress in science at least once in each of three grade spans (3-5, 6-9, 10-12) each
year.
Previous Reviews and Meta-Analyses
Despite the dearth of evidence regarding instruction with specific science curricula, much is
known about science instruction in general. Early meta-analyses summarizing the research on
innovative methods of science instruction developed during the 1960-1980 period indicated
differential levels of student achievement by students taught using inquiry approaches when
compared to students receiving a more traditional approach to science instruction (Lott, 1983;
Shymansky et al., 1990; Shymansky, Kyle, & Alport, 1983; Weinstein, Boulanger, & Walberg,
1982; Wise & Okey, 1983). Shymansky and his colleagues (1983; 1990) summarized studies
examining the effect of inquiry approaches on 18 different outcome measures, including science
achievement, attitude toward science, and science process skills. Their re-analysis of the 1983
data in 1990 resulted in their reporting mean effect sizes that ranged from .18 to .36 (α=.05)
when compared with non-inquiry curricula on outcomes of composite achievement, perception,
and process, but no significant effect on analytic outcomes. It is also important to note that gains
in content knowledge were not sacrificed to obtain gains in process skills and attitude. Lott
(1983) found that while students of all ages benefited from inquiry type approaches, the benefits
for intermediate students (grades 4-6) were the greatest (d=.24, p<.05, CI not available).
Similarly, Wise and Okey (1983) reported an overall curricular advantage using the inquiry-discovery method (d=.41, p<.05, CI not available) when compared to other teaching strategies.
A body of research emphasizes the importance of the type of assessment in capturing student
performance on process skills. For example, Weinstein et al. (1982) categorized outcome
measures into Favorable Toward Inquiry Methods, Favorable Toward Traditional Methods, or
Neutral Assessments, and reported an effect size of .25 (p<.05, CI not available) across all
categories. In general, findings reported to date on inquiry science indicate that it leads to
moderate and significant effects in student science achievement when compared to traditional
instruction.
Recently, Project 2061 completed an in-depth analysis of the curricula of middle school programs that
are in wide use by school districts or states as identified by the Education Market Research
Report (Kesidou & Roseman, 2002). The analysis highlighted nine complete science programs
and the benefits of each program. Missing from this analysis were student outcome data; there
were no indications of the effects on student achievement, interest, or other measures. The
National Science Resources Center (1998) made a similar effort, conducting a content analysis of
all inquiry-based middle school science supplementary and core curricula that met the National
Science Education Standards. An abundance of inquiry-based curricular resources was
identified; however, as with the Project 2061 study, there were no indications of the effects of the
identified curricula on student outcomes.
Need for this Systematic Review
A serious challenge identified by the U.S. Department of Education is the lack of American
competitiveness internationally in the sciences and mathematics. Meanwhile, a rapidly
expanding knowledge base in the sciences makes it difficult for educational curricula to keep up
with the changes. Adding to the urgency for resolution is the projected job growth in areas such
as healthcare and computer science. The increased number of public policy issues related to
biotech, technology, and the Internet make it extremely important that graduates have a strong
foundation in science. Further, the NCLB law now requires the inclusion of science knowledge
assessments as a measure of AYP. In response, schools are seeking instructional materials for
science education that have been proven effective in increasing student achievement using
rigorous, scientifically based research methods.
To date, there are no systematic reviews that we know of that have rigorously evaluated the
effects of specific middle school inquiry curricula on student outcomes. Previous
reviews, the most recent published in 1990 (Shymansky et al.), indicate that student
achievement is higher in inquiry-based classrooms when compared to traditional classrooms.
The inquiry-based curricula analyzed in Shymansky et al. are not explicitly defined, and now,
almost two decades later, are almost certainly out of print and out of date. At the present time, there
are no resources that educators can use to identify the most effective inquiry-based curricula to
use in classrooms. Although informative, Shymansky et al.'s review does not elucidate the
nature of the measures of student achievement used in the included studies. The proposed
systematic review will inform educators, policy makers and curriculum developers of the specific
inquiry-based curricula that have been reported to be effective and ineffective at the middle
school level while clearly communicating the nature of the assessments used to determine
achievement, affect and behavioral outcomes.
Middle school aged students were selected as the target for this study because interest in, and
attitudes toward, science typically decline as students go through middle and high school
(Atwater, Wiggins, & Gardner, 1995). Research has also shown that students’ attitudes toward
science are strongly correlated with their achievement in science, as well as the number and type
of science courses they take (Simpson & Oliver, 1990). Although there may be international
variations in what is labeled middle school, it is widely accepted by general convention that
middle school encompasses the 6th year of formal schooling (grade 5) through the 9th year of
formal schooling (grade 8).
OBJECTIVES
This review will gather, evaluate, and summarize studies that empirically test the effects of
inquiry based science instruction on the cognitive, affective and/or behavioral measures of
middle school-age students.
We will utilize the following questions to guide our inquiry:
1) What are the salient characteristics communicated in the literature with respect to
structure and implementation of inquiry science curriculum instruction and the nature
of the measures of student outcomes?
2) What does the quantitative research published formally (in peer-reviewed journals,
conference proceedings), informally (Internet, edited volumes, texts), and unpublished
between 1990 and September 2008, describing randomized controlled experimental and
quasi-experimental investigations, reveal about the influence of inquiry instruction on
measures of student cognitive outcomes?
3) What does the quantitative research published formally (in peer-reviewed journals,
conference proceedings), informally (Internet, edited volumes, texts), and unpublished
between 1990 and 2008, describing randomized controlled experimental and
quasi-experimental investigations, reveal about the influence of inquiry instruction on
measures of student affective outcomes in the context of learning science?
4) What does the quantitative research published formally (in peer-reviewed journals,
conference proceedings), informally (Internet, edited volumes, texts), and unpublished
between 1990 and 2008, describing randomized controlled experimental and
quasi-experimental investigations, reveal about the influence of inquiry instruction on
measures of student behavioral outcomes in the context of learning science?
METHODOLOGY
Eligibility Criteria
Types of interventions (and comparisons) to be included in the review
One of the recurrent issues associated with inquiry curricula is that inquiry means different
things to different people. For example, an author may use the term "discovery learning" to
represent what may qualify as inquiry instruction. Because of the potential for conflation of
terms, it is critical to examine the curriculum processes and not the labels used by authors to
represent an investigated intervention. For the purposes of this review, we adopt a
classification scheme similar to the one originally proposed by Schwab (1962). In this scheme,
an inquiry curriculum contains three essential elements: 1) scientifically oriented question(s), 2)
a methodology for gathering empirical evidence or data that can be used to address the
question(s), and 3) an explanation or interpretation of the evidence in the context of the
research question. To be deemed an inquiry curriculum and considered for this review,
curricular/instructional science education interventions must contain all three of these essential
elements. (See Table 1.)
Schwab's (1962) hierarchical scheme denotes levels of inquiry by the level of teacher and learner
involvement (see Table 1). This four-tiered classification scheme is based on the level of teacher
and student responsibility for the three essential inquiry elements. The level of inquiry increases
as the responsibility shifts from teacher to learner. At level 0, the teacher provides the questions,
the methodology for gathering data, and the interpretation of the data. At level 3, the learner is
responsible for all of these elements. Although the specific classification of inquiry by Schwab's
model may leave some questions regarding its difference from traditional curricula, a systematic
review that conducts analysis by level of inquiry may help illuminate the subtle differences that
the level of inquiry structure may have on learning.
Table 1. Schwab's levels of inquiry.

Inquiry Level | Source of the question | Data collection methods | Interpretation of results
Level 0       | Given by teacher       | Given by teacher        | Given by teacher
Level 1       | Given by teacher       | Given by teacher        | Open to student
Level 2       | Given by teacher       | Open to student         | Open to student
Level 3       | Open to student        | Open to student         | Open to student
This Schwab classification system will be used in the coding guide to identify and classify
curricula by level of inquiry.
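To make this coding concrete, the sketch below (in Python, with hypothetical field and function names that are not part of the coding guide) shows one way the pattern of teacher versus student responsibility for the three essential elements could be mapped onto Schwab's levels, assuming the hierarchical pattern shown in Table 1.

    from dataclasses import dataclass

    # Illustrative only: True means the element is open to the student,
    # False means it is given by the teacher. Field names are hypothetical.
    @dataclass
    class InquiryElements:
        question_open: bool         # scientifically oriented question(s)
        data_collection_open: bool  # methodology for gathering evidence
        interpretation_open: bool   # explanation/interpretation of evidence

    def schwab_level(e: InquiryElements) -> int:
        # Assumes the hierarchical pattern in Table 1: responsibility opens to
        # students first for interpretation, then data collection, then the question.
        if not e.interpretation_open:
            return 0   # all three elements given by the teacher
        if not e.data_collection_open:
            return 1   # only interpretation open to the student
        if not e.question_open:
            return 2   # data collection and interpretation open to the student
        return 3       # all three elements open to the student

    # Example: teacher supplies the question; students design the investigation
    # and interpret the results -> Level 2.
    print(schwab_level(InquiryElements(False, True, True)))  # prints 2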
Curriculum Source
We recognize the importance of portability and reproducibility of any inquiry-based curriculum
for which we report research findings. The challenge is that a "published" curriculum may
encompass a wide range of channels, from commercially available, public, and government
provided, to school district developed and disseminated (i.e., publication type). It is beyond the
scope of this study to determine the current availability and/or reproducibility of curricula in the
studies. Therefore, while the "publication type" of each curriculum will be identified and
reported, it will not be used as a criterion for inclusion or exclusion.
Setting and participants
The intervention must have been carried out with students in science classes in grades 5-8 in
public, private, parochial, or alternative schools. Studies that identify special needs (learning
disabled and gifted) children will be included in the review only if the students were in
immersion programs that routinely placed them in the regular classroom during science lessons,
or if they are compared to an equivalent group of students and the treatment curriculum
was consistent, reproducible, and not modified to meet individual needs as required by the
conditions of IEPs. Studies that include students in grades beyond grades 5-8 will be
included only if they report results separately for each grade, so that the target grades can be
separated out for analysis. The intervention must have taken place in the school setting during
the traditional academic schedule (i.e., Monday through Friday, during the regularly scheduled
school year). Interventions that took place in after-school programs or summer school programs,
or that were restricted to special needs classrooms, will be excluded from the review.
Study design
Randomized controlled and quasi-experimental design studies will be considered for
review. The inclusion of both of these designs increases the probability that we will have a
sufficient number of studies extracted from the literature search to conduct a reasonable meta-analysis. Reviewers will evaluate research studies that report the influence of well-defined
inquiry interventions by conducting comparisons between clearly distinguished intervention
and control (or comparison) groups, using a randomized controlled trial design. Control
groups can represent no-treatment or "treatment as usual" conditions. If a study uses a
quasi-experimental design, it must report pre- and post-intervention measures and report that,
prior to the intervention, the students were engaged in traditional non-inquiry science
instruction and that the intervention is clearly an inquiry approach to teaching science. For inclusion,
quasi-experimental studies must report measures for participants matched on
relevant characteristics prior to the delivery of the intervention, or statistically equated, in order
to ensure that groups were as similar as possible at baseline.
Types of measures to be included
At least one cognitive student outcome measure must be reported (e.g.,
academic achievement, critical thinking/transfer, cognitive engagement). Depending on
availability, the reviewers also expect to report on affective (e.g., student interest and/or
motivational engagement) and behavioral (e.g., time on task, homework completion, discipline
referral, class attendance) outcomes. All measures must be quantitative in nature, and report
reliability statistics.
Outcome measures using standardized measures of science achievement (knowledge of science,
knowledge of scientific processes, knowledge of the nature of science) with established validity and
reliability will be included in the review (e.g., state-level science achievement exams, the Cornell
Critical Thinking Test). Unstandardized measures of achievement, and of other domains, with
adequate face validity will also be included (e.g., student interest inventories, self-report
measures of affective and cognitive engagement, and interest). Measures reported in the Buros
Institute's Mental Measurements Yearbook or published by the University of Nebraska Press will be
considered standardized measures; unless claims of standardization can be substantiated, all other
measures will be considered unstandardized.
Information search and retrieval
An attempt will be made to exhaustively search a range of published and unpublished
documents as recommended by the Campbell Collaboration Information Retrieval Policy Brief
and to retrieve all eligible studies based on the requirements previously stated.
Our search will include, but will not be limited to: peer-reviewed publications, National Science
Foundation funded project reports, reports from commercial publishers, conference
proceedings, edited texts, dissertations, studies distributed through non-traditional channels
including the Internet, and studies recommended by recognized experts in the field.
As recommended in the Information Retrieval Policy Brief, a systematic, thorough, and
comprehensive approach that minimizes bias will be utilized to assure that information retrieval
is exhaustive and inclusive. Details of our search are described in the following paragraphs.
Database searches
Our search will begin with the bibliographic databases that are most appropriate for our topic
including: Web of Science (WOS; including the Social Science Citation Index), SCOPUS
(includes over 25,000 international journal titles), ERIC (Educational Resources Information
Center: which is considered to be an international database), Education Abstracts, PsychInfo,
Proquest Digital Dissertations, Google Scholar, and the U.S. Government Printing Office
publications.
The initial search will be conducted using the controlled vocabulary appropriate for each of
these databases in each of four categories: the population of interest, the intervention, the
outcomes, and the content. Keyword searching will be employed in appropriate contexts and for
particular terms. Variants of search terms will be located using wild card characters, e.g.,
constructiv* to locate constructivism and constructivist. We anticipate that we will use the
following search terms and databases; however, the form of all searches will be presented in the
final report exactly as entered in the database search engines so that the searches can be
replicated.
1. Population of interest: middle school students (PsychInfo, ERIC), junior high school
students (PsychInfo, ERIC), grade 5 (ERIC), grade 6 (ERIC), grade 7 (ERIC), grade 8
(ERIC), intermediate grades (ERIC), intermediate school students (PsychInfo).
2. Intervention: inquiry (ERIC, WOS), inquiry based learning (ERIC), active learning
(ERIC), problem based learning (ERIC), problem solving (PsychInfo), discovery learning
(ERIC), discovery teaching method (PsychInfo), experiential learning (ERIC, PsychInfo),
constructivism (learning) (ERIC, PsychInfo), project based science (WOS), sciences
(ERIC), science instruction (ERIC), student centered instruction (ERIC).
3. Outcomes: academic achievement (ERIC, PsychInfo), science achievement (ERIC,
PsychInfo), student evaluation (ERIC), student improvement (ERIC), grades (scholastic)
(ERIC), educational measurement (PsychInfo), affective behavior (ERIC), affective
education (PsychInfo) affective measures (ERIC), behavioral objectives (ERIC), behavior
(ERIC), behavior patterns (ERIC), student outcomes (WOS), student engagement
(WOS).
4. Content: Science (WOS) biology (WOS), natural science (WOS) physical science (WOS),
life science (WOS), earth science (WOS), science education (WOS), chemistry (WOS)
STEM education (ERIC), physics (WOS).
We will conduct our database searches using Boolean operators such as AND and OR to increase
the efficiency and precision of our searches while reducing the probability of false positives. The
use of Boolean operators in our search will allow us to be more comprehensive in our search
methods by attending to a wide variety of variables in a single search. Furthermore, we can
combine terms from the Population, Intervention, Outcomes and Content groupings and search
for relevant articles simultaneously. This will also streamline the process of documenting our
search criteria and associated results.
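As an illustration only (the exact strings will be reported verbatim in the final report), a combined search built from the four categories above might take a form such as:

    ("middle school students" OR "junior high school students" OR "grade 5" OR
     "grade 6" OR "grade 7" OR "grade 8")
    AND (inquiry OR "inquiry based learning" OR "discovery learning" OR
         "problem based learning" OR constructiv*)
    AND ("science instruction" OR "science achievement" OR "academic achievement"
         OR "student outcomes")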
Other Literature and the Corresponding Searches
Other methods of locating pertinent studies will also be used. The bibliographies of retrieved
studies will be searched to see if they contain eligible studies that would not be readily retrieved
using databases of published literature (e.g., personal communications, unpublished studies). In
addition, the reviewers will search the Social Science Citation Index using the names of the
authors of all retrieved studies to determine if they have conducted other investigations that
might be eligible for inclusion as qualified research.
In order to minimize publication bias, we will further attempt to locate eligible unpublished
studies by contacting experts in middle school science research or researchers of inquiry science
curricula. We will ask these experts to identify studies or to recommend other experts that we
should consult. We will use this snowball sampling technique until we have exhausted our list of
contacts or the process is no longer fruitful.
Searches of the Internet using reputable search engines (e.g., Google Scholar) will be
conducted to locate pertinent studies that were not published in peer-reviewed publications or
other works typically referenced in databases. Conference proceedings in related fields will be
searched either electronically or by hand if electronic means are not available. In addition, book
chapters and reports of relevant research organizations will also be searched.
Studies identified through these searches will be obtained from the library, from inter-library
loan, from the publisher, or directly from the authors themselves.
Selection of studies
The abstracts of initially identified reports of studies will be screened to determine if these
studies meet the study eligibility criteria outlined in a previous section, e.g., student population,
research design type, setting, or intervention. Studies that pass this screening or are not clearly
ineligible will be subjected to a final screening by two trained reviewers. Final eligibility
decisions for the inclusion of studies that cannot be classified through examination of the
abstract will be determined by screening the full text of the report against all aforementioned eligibility
criteria.
Assessment of study quality will be documented in the coding guide during data extraction. The
Quality Assessment section of the coding guide will use simple checklists rather than quality
scales. The checklists have been designed to determine which aspects of a quality investigation
were attended to in the reports of the investigations. Some of our quality assessments are
simple checklists used to determine the presence of detectable levels of reporting, using
"Yes" or "No" responses. Where warranted, we have included additional levels of coding to
further clarify the quality details of a study. The assessment of study quality will
consider the following characteristics of studies (when they are reported):
 Assessment of randomization procedures
 Handling of attrition
 Baseline equivalence of groups
 Method of measurement of the independent variable
 Reported reliability of measures
For example, the checklist for baseline equivalence of groups includes the following items:
1. Unspecified
2. Statistical Control (e.g., ANCOVA, regression)
3. Random Assignment
4. Statistical Control and Random Assignment
5. Gain Scores
6. Matching
7. Other
Methods for data extraction of included studies
We have developed a coding form and manual for this review (see Appendix) using Lavenberg
(2007) and Pearson, Ferdig, Blomeyer, and Moran (2005) as models. Coded elements include
the following (an illustrative sketch follows the list):
 Study level characteristics (e.g., publication type, location of study and so forth),
 Methodological characteristics (e.g., assignment to conditions, unit of analysis, fidelity
of implementation and so forth)
 Participant characteristics (e.g., gender, age, ethnicity, baseline level on student
outcomes and so forth)
 Curriculum characteristic (e.g., inquiry level, core or supplementary,
duration/frequency and so forth)
 Outcomes (e.g., student achievement, affective, and behavioral measures)
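As a minimal illustration of how these coded elements might be stored for analysis (field names are hypothetical and do not reproduce the coding manual in the Appendix), a study-level record could look like the following:

    from dataclasses import dataclass, field
    from typing import List, Optional

    # Hypothetical coding record; one record per included study.
    @dataclass
    class CodedStudy:
        publication_type: str                 # e.g., "peer-reviewed journal", "dissertation"
        study_location: str                   # study-level characteristic
        design: str                           # "RCT" or "quasi-experiment"
        unit_of_assignment: str               # "student", "classroom", or "school"
        fidelity_reported: bool               # fidelity of implementation reported?
        grades: List[int] = field(default_factory=list)   # e.g., [6, 7]
        inquiry_level: Optional[int] = None   # Schwab level 0-3
        core_or_supplementary: Optional[str] = None
        outcome_domain: Optional[str] = None  # "cognitive", "affective", or "behavioral"
        treatment_mean: Optional[float] = None
        control_mean: Optional[float] = None
        pooled_sd: Optional[float] = None
        n_treatment: Optional[int] = None
        n_control: Optional[int] = None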
Coding For Inquiry
Once a curriculum is positively identified as inquiry, it will be classified according to Schwab's
hierarchical scheme, denoted by the level of teacher and learner involvement (see Table 1). Using
this classification scheme for coding, we will be able to identify and classify curricula
into similar levels of inquiry. Further, this scheme allows us to use the term inquiry consistently
and to attend to the different levels of inquiry. We assume the ability to compare the
different levels of inquiry. However, the ability to actually compare the effects of various levels
of inquiry to a control or comparison curriculum is contingent on whether the inquiry condition is
reported according to these levels, or reported with sufficient detail that we can map the
description of instruction to these levels.
For studies to be eligible for this review, they must describe the curriculum or inquiry science
units in sufficient detail to identify them as an inquiry curriculum. If a curriculum cannot be
identified as an inquiry curriculum from the published information and the curriculum itself
cannot be located, we will attempt to contact the authors of the study or other researchers
familiar with the investigation. If this is not possible, we will eliminate the study from further
evaluation because it is not possible to determine whether the inquiry communicated in the
study is truly a form of curriculum or whether "inquiry" is merely being used as a convenient
science education term.
Proposed quality assurance procedures (e.g. independent double data extraction etc.)
During Phase I and Phase II of the coding process, we intend to use multiple coders to dual code
100% of the studies that meet our eligibility criteria in the information retrieval process. Coders
will be trained in the use of the coding guide. Coding training will begin following an initial
search of the literature. Initially, each coder will independently code 20 articles selected from
our initial search using the Phase I questions in the coding guide. Upon completion, the coders
and principal investigators will examine the outcomes and determine the level of interrater
reliability. This will continue with a new set of articles until 90% or greater interrater
reliability is achieved for Phase I coding. Upon completion, we will compare results, and any
significant differences will be reconciled through discussion facilitated by one of the Principal
Investigators. Once interrater reliability is established for Phase I coding, the coders will work
independently to code the remaining articles from our initial search of the literature. Upon
completion, the results will be compared and the principal investigators will facilitate
discussions that lead to consensus decisions on all coding. This process will be repeated for
Phase II of the coding process.
Following the completion of Phase I and Phase II coding, the PIs and coders will meet for two
days to discuss and establish interrater reliability for Phase III coding. During this meeting we
will review the Phase III coding scheme, independently code studies, compare results,
determine interrater reliability and repeat to assure 90% or greater interrater reliability.
Following the meeting both coders will work independently to code all pertinent studies at
Phase III of the coding. At the 50% mark and upon completion, we will compare results, and any
significant differences will be reconciled through discussion facilitated by one of the principal
investigators. Overall, during the analysis process, 100% of our included studies will be double
coded, with differences in coding reconciled.
All Phase III codes will be entered into the TrialStat SRS database for storage and retrieval for
further analysis.
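The 90% interrater reliability criterion is computed from the two coders' decisions on the same set of items; assuming the criterion refers to simple percent agreement (the protocol does not specify a chance-corrected index), a minimal sketch is:

    def percent_agreement(coder_a, coder_b):
        """Percentage of items on which two coders made the same coding decision."""
        if len(coder_a) != len(coder_b) or not coder_a:
            raise ValueError("Both coders must rate the same, non-empty set of items.")
        agreements = sum(1 for a, b in zip(coder_a, coder_b) if a == b)
        return 100.0 * agreements / len(coder_a)

    # Example: two coders classify the design of five studies.
    a = ["RCT", "quasi", "RCT", "RCT", "quasi"]
    b = ["RCT", "quasi", "quasi", "RCT", "quasi"]
    print(percent_agreement(a, b))  # 80.0, below the 90% training threshold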
Methods for Effect Size Computations
There are a variety of effect size formulas available to quantify the mean difference, on an
outcome, between two groups in standard deviation units (Hunter & Schmidt, 2004; Lipsey &
Wilson, 2001; Rosnow & Rosenthal, 2003). The choice of which formula to use depends on
three study characteristics: (1) the scale on which the outcome variable(s) are measured, (2) the types of
research designs used in the studies included in the review, and (3) the format in which outcome
data are reported in the included studies.
Depending on these three factors, we will compute effect sizes using one of the following
approaches, all of which assume the comparison on an outcome is between two groups (a computational sketch follows the list):
(1) Measurement Scale of Outcome. For outcomes reported on a continuous (interval or
interval ratio) scale, we will compute the effect size using the formula for the
standardized mean difference (d index) with a small sample correction (Hedges, 1981).1
For outcomes reported on a dichotomous scale, we will compute an odds ratio.
(2) Type of Research Design. For outcomes reported on a continuous scale in experiments or
quasi-experiments, but the post-test (or post intervention) mean difference has been
adjusted for pre-test (or pre intervention) differences, we will compute the effect size
using the formula for the standardized mean difference (d index) with a small sample
correction. However, we will use the adjusted means in the numerator of the formula.
(3) Data Format for Outcomes. For studies that report outcome data in formats other than
means, standard deviations or sample sizes, algebraically equivalent formulas will be
used to calculate the effect size. For example, if a study reports only an independent t
statistic and sample sizes for each group, the algebraic equivalent formula for computing
an effect size will be applied.
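For reference, a minimal computational sketch of the standardized mean difference, the Hedges (1981) small-sample correction, and the algebraically equivalent conversion from an independent t statistic (illustrative only; the review will use CMA 2.2 and ES for all calculations):

    import math

    def cohens_d(m_t, m_c, sd_t, sd_c, n_t, n_c):
        """Standardized mean difference: mean difference over the pooled standard deviation."""
        pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
        return (m_t - m_c) / pooled_sd

    def hedges_g(d, n_t, n_c):
        """Hedges' (1981) small-sample correction applied to d."""
        df = n_t + n_c - 2
        return (1 - 3 / (4 * df - 1)) * d

    def d_from_t(t, n_t, n_c):
        """Equivalent formula when only an independent t statistic and group sizes are reported."""
        return t * math.sqrt(1 / n_t + 1 / n_c)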
We will calculate effect sizes using two computer software packages: Comprehensive Meta-Analysis 2.2046 (CMA 2.2) and ES (Shadish, Robinson, and Lu, 1999). Because CMA 2.2 allows
for effect size calculation using over 100 data entry formats, the software should be sufficient for

1 A small sample is defined as a combined sample of less than 20 for both groups (n < 20).
calculating effect sizes from 99% of the data formats reported in the literature.2 In the event that we
encounter an exotic data reporting format that CMA 2.2 cannot handle, the ES software
provides additional computational flexibility.
A common mistake in the education research literature, where randomized controlled
trials are used to test the effectiveness of an intervention, is to analyze individual-level data
when random assignment was at the cluster level. For example, schools may be randomly
assigned to conditions, in which case the effect size formula should be applied at the school level;
that is, for outcomes measured on a continuous scale, school-level means, standard deviations, and
sample sizes are used in the effect size formula for the standardized mean difference. In this
review, we will use the unit of random assignment for effect size formulas when possible. In
studies where there is not enough information to apply an effect size formula and the authors
reported effect sizes using units not at the level of random assignment, we will apply a
Hedges clustering correction to the significance level and enter the effect size and corrected p-value
into CMA 2.2 using the "computed effect size" option.
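As a simplified illustration of why the clustering correction matters (the actual adjustment will follow the Hedges clustering correction noted above; the design-effect approximation below assumes a known or assumed intraclass correlation):

    def design_effect(avg_cluster_size, icc):
        """Approximate variance inflation from assigning intact clusters: 1 + (m - 1) * ICC."""
        return 1 + (avg_cluster_size - 1) * icc

    def cluster_adjusted_variance(var_d, avg_cluster_size, icc):
        """Approximate corrected variance of an effect size computed from individual-level data."""
        return var_d * design_effect(avg_cluster_size, icc)

    # Example: 25 students per classroom and an ICC of .15 inflate the variance by a factor of 4.6.
    print(design_effect(25, 0.15))  # 4.6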
Related to the last point, effect sizes calculated using the ES software can be entered into CMA
2.2 as a "computed effect size" (rather than being computed using the formulas built into CMA 2.2)
so that all effect sizes, whether computed using CMA 2.2 or ES, will be housed in the same
analytic database for synthesizing. The methods for synthesizing effect sizes are discussed next.
Methods for Synthesizing Effect Sizes
General Approach
A critical assumption of meta-analysis is that the effect sizes used to obtain the combined effect
(weighted average of effect sizes) are independent. This means that when combining effect sizes,
there can be only one effect size per study.3 So as not to violate the assumption of independence
of effect sizes, only one effect size per study will be used in the meta-analysis. In situations where
there is either a) more than one group comparison on the same outcome, or b) more than one
outcome for the same group comparison, or both, we will either select one effect size or compute
an unweighted average depending on which is more appropriate. For example, suppose in an
included study there is one comparison between the inquiry-based science curriculum group
and the control group, but for two achievement outcomes: 1) facts and concepts and 2)
reasoning. Because the primary research objective is the influence of the inquiry science
curriculum on student cognitive outcomes, it would be conceptually appropriate to use the
average of the effect sizes from the two measures to represent the effect size for the study in the
meta-analysis.4
2 CMA 2.2 has a formula screen that allows the user to verify the formula used to compute the effect size and how that
formula was applied to the data in the database. This screen will be used to verify that all formulas were appropriate
and applied correctly.
3 This is true if and only if an HLM model is not used to combine effect sizes and to account for their
non-independence. When HLM models are used, there can be more than one effect size per study (see Raudenbush and
Bryk, 2002, p. 208). However, this is an underdeveloped area in the meta-analytic literature for the behavioral and
social sciences and, therefore, is not used here.
4 In CMA 2.2, this is done by selecting "mean of the selected outcome", which takes the average of each of the two data
points (e.g., means, sample sizes, and standard deviations) and computes the effect size that represents the study
effect size and is used in the meta-analysis.
Combining Effect Sizes
We assume that the effect size in each study estimates a different effect size in the population
(Lipsey & Wilson, 2001). Under this assumption, a random effects model should be used
(Borenstein, Hedges, & Rothstein, 2007). When combining effect sizes under this model, it is
assumed the effect size estimates will vary from study to study because of differences among the
study population parameters (between-studies variation) and sampling of different subjects
within the study populations (within-study variation). Results from the random effects model
allow for inferences to the population of studies from which the set was sampled. In other
words, the observed combined effect can be extrapolated beyond the studies included in the
meta-analysis (Kline, 2004; Borenstein, Hedges, & Rothstein, 2007). We will report the combined
effect (and its confidence interval) generated by both the fixed and random effects models, as
there is no additional computational cost to produce both using CMA 2.2.
To start, effect sizes will be averaged across studies using a fixed effects model by applying an
inverse variance weighting to the individual effect sizes, weighting studies according to their
variation in sample sizes, and then averaging these effect sizes. This is what Hunter and Schmidt
(2004) refer to as a "bare-bones" meta-analysis, which controls only for sample size.
The "bare-bones" meta-analysis is the usual first step in a meta-analysis, whether a fixed or
random effects model is used. This analysis gives the meta-analysts an estimate of the effect size,
controlling for no other study-level factors except sample size.
We will conduct the bare-bones meta-analysis using a random-effects model for the purpose of
evaluating the effect of inquiry-based science curricula on the various outcomes.
However, to obtain the I-squared statistic (and its confidence interval), which quantifies the
amount of variability (or heterogeneity) in the effect sizes from which the combined (or average)
effect size is derived, the bare-bones meta-analysis must be done using a fixed effects model. It
is for this purpose, and this purpose only, that the fixed effects model will be used.
It is important to distinguish this approach, which simply uses the fixed effects model to obtain
I-squared (and its confidence interval) to quantify variability in study effect sizes, from making a
decision to move from a fixed effects model to a random effects model based on the values of
I-squared (or the Q statistic). Although the latter approach represented the conventional wisdom among
meta-analysts previously, it is no longer considered sound practice (see Borenstein, Hedges, &
Rothstein, 2007: Meta-Analysis, fixed effect vs. random effects, page 29). Instead, we have
made the decision to use the random effects model in the protocol and to use the fixed-effects
model initially to obtain I-squared to quantify the amount of variability in the effect sizes.
In CMA 2.2, this is implemented by running the "bare-bones" meta-analysis and reading the
I-squared statistic and its confidence interval from the fixed effects model line; the random
effects information is also presented simultaneously, and these will be the results used to shed light
on our understanding of curriculum effects, if they exist or are detectable.
This weighting results in more precise studies (e.g., studies with larger sample sizes) being given
greater weight than less precise studies when averaging effect sizes. Under a fixed effects
model, the inverse variance weight accounts only for within-study sampling of participants and does not
take into account the additional random variability that occurs from between-study sampling from a
larger population of studies (Borenstein, Hedges, & Rothstein, 2007). However, a random
effects model takes this variability into account by adding the additional between-study variance
component to the inverse variance weight. As a result, the weighting is more evenly distributed
across effect sizes. The combined effect size (and its confidence interval) will be used to address
the research questions and as the basis for all subsequent analyses.
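For concreteness, the inverse-variance weighting and the additional between-study variance component described above can be sketched as follows (a DerSimonian-Laird style computation, shown for illustration only; the review itself will use CMA 2.2):

    def combine_effects(effects, variances):
        """Fixed-effect and random-effects (DerSimonian-Laird) combination of study effect sizes."""
        w = [1 / v for v in variances]                                    # fixed-effect weights
        fixed_mean = sum(wi * d for wi, d in zip(w, effects)) / sum(w)
        q = sum(wi * (d - fixed_mean) ** 2 for wi, d in zip(w, effects))  # Q statistic
        df = len(effects) - 1
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0                   # between-study variance
        w_star = [1 / (v + tau2) for v in variances]                      # random-effects weights
        random_mean = sum(wi * d for wi, d in zip(w_star, effects)) / sum(w_star)
        se_random = (1 / sum(w_star)) ** 0.5
        return {"fixed": fixed_mean, "random": random_mean,
                "se_random": se_random, "Q": q, "tau2": tau2}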
Homogeneity Analysis
A homogeneity analysis is used to evaluate how representative the average effect size is of the
study effect sizes from which it was derived. Roughly, the more homogeneous (or less
heterogeneous) the study effect sizes, the more representative is the average. Specifically, a
homogeneity analysis examines whether the variation in a set of effect sizes may be attributed to
sampling error alone or to other factors; that is, there is the expectation that effect sizes will have
some variability (attributable to within-study sampling of participants) and as such will deviate
somewhat from the average effect. The I-squared statistic will be used to describe the percentage of
total variation across studies that is due to heterogeneity rather than chance, or within-study sampling,
as described in Higgins, Thompson, Deeks, and Altman (2003, p. 558). The I-squared statistic lies
between 0% and 100%, with 0% indicating no observed heterogeneity and increasing values
indicating increasing heterogeneity. Heuristically, I-squared values of 25%, 50%, and 75% can
be interpreted as low, moderate, and high (see Higgins, Thompson, Deeks, and Altman, 2003,
p. 558). The I-squared statistic, along with the Q statistic, is produced with a "bare bones" meta-analysis
in CMA 2.2. When the number of effect sizes in the analysis is small, the I-squared statistic is more
reliable than the Q statistic because the former is not sensitive to sample size in the way the latter is.
Further, the I-squared statistic provides more information by quantifying the amount of variation in
the effect sizes between included studies.
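For reference, the statistic described by Higgins, Thompson, Deeks, and Altman (2003) can be written in terms of the Q statistic and its degrees of freedom, where k is the number of effect sizes and the value is truncated at zero when Q < df:

    I^2 = \frac{Q - df}{Q} \times 100\%, \qquad df = k - 1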
Recent work by Borenstein, Hedges, and Rothstein (2007) argues that the practice of
starting with a fixed effects model and making a decision on whether to use a random effects
model based on the results of the homogeneity analysis should be discouraged (see page 30 for
the reasoning). For this reason, we state our assumption about the anticipated variability in effect
sizes in the previous section and, based on that assumption, use the results from the random
effects model for addressing the review questions and for subsequent moderator analyses, when
appropriate.
Publication Bias
Publication bias refers to the situation in which the research that appears on a topic in the published
literature is systematically unrepresentative of the population of all studies on that topic (Rothstein, Sutton,
and Borenstein, 2005). To assess the set of studies included in our meta-analysis for publication
bias, we will use the trim and fill procedure and visually inspect the resulting funnel plot
produced in CMA 2.2. Rosenthal's Fail Safe N, which estimates the number of unretrieved
null-result studies that would be needed to bring the combined effect to nonsignificance, will also be
reported.5
5 There is emerging work by Betsey Jane Becker and colleagues that calls the Fail Safe N into question. We will consult
with the C2 Methods Group to determine if there is a better metric or approach for augmenting the funnel plot to
assess the degree of publication bias.
Incomplete Reporting of Study Data
When the information reported in an RCT is insufficient to compute an effect size, we will attempt
to contact the author(s) to retrieve the missing data. However, in the event that the author is
unresponsive or cannot be located, the data will be coded as missing.
Traditionally, missing effect size data have been addressed by a) setting the effect size to zero,
which biases the variance toward zero; b) replacing the effect size with the mean effect size, which
biases the average effect size estimate (see Allison, 2001, and Pigott, 2001); or c) omitting the study (or
at least the comparison of the two groups in the study that has the missing effect size for the
outcome). We plan to break new methodological ground by using multiple imputation. Briefly,
multiple imputation is the process of replacing missing effect sizes with regression-based
estimates from random draws of the posterior Bayesian distribution. Under certain assumptions
and empirical conditions, replacing missing effect sizes using multiple imputation produces a
better estimate of the true average effect size (see Allison, 2001, and Pigott, 2001).
Sensitivity Analyses
The sensitivity analyses will be used to evaluate how robust our meta-analytic results are to the
influence of any one study and to the handling of missing data. For example, the one-study-removed
analysis in CMA 2.2 will be used to assess how the average effect size changes with one study removed
relative to the average effect size with all studies included. All meta-analyses will be conducted first
using the listwise-deletion dataset, in which group comparisons on outcomes with missing effect sizes
are omitted, and second using the multiple imputation dataset, in which group comparisons on
outcomes with missing effect sizes are imputed. If the results are similar, the results based on the
imputed dataset will be reported and the listwise-deleted results will be reported in the
Appendix, because the former will have more statistical power, depending on how severe the
missing data are. If the results are dissimilar, the results will be reported the other way around.
Post Hoc Subgroup and Moderator Analyses
If I-squared is moderate or large for the "bare bones" meta-analysis, we will conduct a moderator
analysis to examine sources of this variation using the following study-level characteristics:
o Publication Type: Published vs. Unpublished.
o Research Design: Experiment vs. Quasi-Experiment.
o Participant characteristics: Gender and ethnicity.
o Curriculum characteristic: Level of Inquiry.
The usefulness of moderator analysis is predicated on 1) whether the study-level characteristics
that will be used as the moderator are reported in primary studies, and 2) how consistently they
are reported across studies. The analysis will be conducted in CMA 2.2 using a One-Way ANOVA
(with study as the unit of analysis), with effect size as the dependent variable, the selected study
characteristic serving as the factor, and the values of the moderator serving as its levels.
For example, a moderator analysis to determine whether
heterogeneity in effect sizes can be explained by gender will be implemented in CMA 2.2 as a
One-Way ANOVA with the effect size for the inquiry vs. traditional curricula comparison as the
dependent variable, gender as a factor with two levels (male and female), and the between group
difference in the average effect sizes across studies tested by an F test. Of course, the statistical
test must be interpreted cautiously because the n size is based on studies, rather than subjects,
and unless there are a large number of studies, the statistical test will be underpowered.
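The meta-analytic analog of the One-Way ANOVA partitions the homogeneity statistic into between- and within-group components; a minimal sketch of the between-group portion under fixed-effect weighting (illustrative only; the review will use CMA 2.2):

    def q_between(groups):
        """Between-group heterogeneity for a categorical moderator.
        `groups` is a list of (effects, variances) pairs, one per moderator level."""
        level_means, level_weights = [], []
        for effects, variances in groups:
            w = [1 / v for v in variances]
            level_weights.append(sum(w))
            level_means.append(sum(wi * d for wi, d in zip(w, effects)) / sum(w))
        grand_mean = (sum(wj * mj for wj, mj in zip(level_weights, level_means))
                      / sum(level_weights))
        # Compare to a chi-square distribution with (number of levels - 1) degrees of freedom.
        return sum(wj * (mj - grand_mean) ** 2 for wj, mj in zip(level_weights, level_means))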
Another caution in the use of moderator analysis is the elevated probability of Type 1 error
(wrongly concluding there is a difference when there is not) from conducting multiple One-Way
ANOVAs without controlling for this multiplicity. One way to control for this is through
meta-regression. Currently, CMA 2.2 allows for meta-regression only with moderators measured on a
continuous scale (i.e., interval or interval-ratio). However, if necessary, we will consider using
SAS or SPSS to conduct the meta-regression, which is slightly more complicated. We will, of
course, evaluate whether the benefits of simultaneous statistical controls outweigh the cost of
additional complexity in implementing meta-regression with these statistical programs.
SOURCES OF SUPPORT
The review is directly supported by funding from the Kauffman Foundation through the
Campbell Collaboration.
AUTHOR(S) REVIEW TEAM
Lead reviewer:
Name: Louis S. Nadelson
Title: Assistant Professor, Curriculum, Instruction, Foundational Studies
Affiliation: College of Education, Boise State University
Address: 910 University Dr.
City, State, Province or County: Boise, ID
Postal Code: 83725
Country: USA
Phone: (208) 426-2856
Mobile:
Email: [email protected]
Co-author(s):
Name: Susan Williams
Title:
Affiliation: Metiri Group
Address: 600 Corporate Pointe, Suite 1180
City, State, Province or County: Culver City, CA
Postal Code: 90230
Country: USA
Phone: (310) 945-5150
Mobile: (615) 364-7787
Email: [email protected]
Name: Herbert M. Turner, III
Title:
Affiliation: ANALYTICA, Inc.
Address: 35 Goldfinch Circle
City, State, Province or County: Phoenixville, PA
Postal Code: 19460
Phone: (610) 933-1005
Email: [email protected]
ROLES AND RESPONSIBILITIES
• Content:
Louis S. Nadelson and Susan M. Williams are the co-principal investigators of this project. They
will develop the protocol, conduct the information retrieval, code all studies and collaborate
with Herbert Turner and Cheryl Lemke to draft the final report.
• Systematic review methods:
Herbert M. Turner will be evaluating the validity of the systematic review method protocol
using the Campbell Collaboration documents as a guideline.
• Statistical analysis:
Herbert Turner is responsible for methodological reviews of the study and will be guiding the
statistical examinations of our meta-analysis, should there be sufficient studies to warrant such
an analysis.
• Information retrieval:
Louis Nadelson, Susan Williams, Herbert Turner, and Cheryl Lemke will all be involved in the
approval of the information retrieval procedure. Louis Nadelson and Susan Williams will be
responsible for conducting the actual information retrieval.
REFERENCES
Anderson, R. D. (2002). Reforming science teaching: What research says about inquiry. Journal
of Science Teacher Education, 13(1), 1-12.
Borenstein, M., Hedges, L., & Rothstein, H. (2007). Meta-analysis, fixed effect vs. random
effects. (Available from Biostat, Englewood, NJ)
Bredderman, T. (1983). Effects of activity-based elementary science on student outcomes: A
quantitative synthesis. Review of Educational Research, 53(4), 499-518.
Bruer, J. T. (1993). Schools for thought. Cambridge, MA: MIT Press.
Guzzetti, B. J., Snyder, T. E., Glass, G. V., & Gamas, W. S. (1993). Promoting conceptual change
in science: A comparative meta-analysis of instructional interventions from reading
education and science education. Reading Research Quarterly, 28(2), 116-159.
Hedges, L.V. (1981). Distribution theory for Glass's estimator of effect size and related
estimators. Journal of Educational Statistics, 6, 107-128.
Hunter, J.E., & Schmidt, F.L. (2004). Methods of meta-analysis: Correcting error and bias in
research findings (2nd Edition). Thousand Oaks, CA: Sage.
Kesidou, S., & Roseman, J. E. (2002). How well do middle school science programs measure up?
Findings from project 2061's curriculum review. Journal of Research in Science
Teaching, 39(6), 522-549.
Lavenberg, J. G. (2007). Effects of school-based cognitive-behavioral anger interventions: A
meta-analysis. Ph.D. dissertation, University of Pennsylvania, United States -Pennsylvania. Retrieved September 19, 2008, from Dissertations & Theses: Full Text
database. (Publication No. AAT 3271850).
Lee, H., & Songer, N. B. (2003). Making authentic science accessible to students. International
Journal of Science Education, 25(8), 923-948.
Lipsey, M.W., and Wilson, D.B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Lott, G. W. (1983). The effect of inquiry teaching and advance organizers upon student
outcomes in science education. Journal of Research in Science Teaching, 20(5), 437-451.
National Center for Education Statistics. (2001). Science: The nation's report card 2000.
Washington, D.C.: National Center for Education Statistics.
National Research Council. (1996). National science education standards. Washington, DC:
National Academy Press.
National Science Resources Center. (1998). Resources for teaching middle school science.
Washington, D.C.: National Academy Press.
Pearson, P.D., Ferdig, R.E., Blomeyer Jr, R., & Moran, J. (2005). The effects of technology on
reading performances in the middle-school grades: A meta-analysis with
recommendations for policy. Learning Point Associates.
Posner, G., Strike, K., Hewson, P., & Gertzog, W. (1982). Accommodation of a scientific
conception: Toward a theory of conceptual change. Science Education, 66, 221-227.
Raudenbush, S.W., & Bryk, A.S. (2002). Hierarchical linear models: Applications and data
analysis methods (2nd ed.). Thousand Oaks, CA: Sage.
Roth, W.-M., & Roychoudhury, A. (1994). Physics students' epistemologies and views about
knowing and learning. Journal of Research in Science Teaching, 31(1), 5-30.
Sandoval, W. A., & Millwood, K. A. (2007). What can argumentation tell us about epistemology?
In S. Erduran & M. P. Jiménez-Aleixandre (Eds.), Argumentation in science education:
Perspectives from classroom-based research (pp. 68-85). Springer.
Schwab, J. J. (1962). The teaching of science as enquiry. In J. J. Schwab & P. Brandwein (Eds.),
The teaching of science. Cambridge, MA: Harvard University Press.
Shymansky, J. A., Hedges, L. V., & Woodworth, G. (1990). A re-assessment of the effects of
inquiry-based science curricula of the sixties on student achievement. Journal of
Research in Science Teaching, 27(2), 127-144.
Shymansky, J. A., Kyle, W. C., & Alport, J. M. (1983). The effects of new science curricula on
student performance. Journal of Research in Science Teaching, 20, 387-404.
Tobin, K., Tippins, D. J., & Hook, K. S. (1995). Students' beliefs about epistemology, science, and
classroom learning: A question of fit. In S. M. Glynn & R. Duit (Eds.), Learning science
in the schools: Research reforming practice (pp. 85-110). Mahwah, NJ: Erlbaum.
U.S. Commission on National Security in the Twenty-First Century. (2001). Road map for
national security: Imperative for change.
United States Department of Education. (2006). The facts about science achievement.
Retrieved October 6, 2006, from http://www.ed.gov/nclb/methods/science/science.html
Weinstein, T., Boulanger, F. D., & Walberg, H. J. (1982). Science curriculum effects in high
school: A quantitative synthesis. Journal of Research in Science Teaching, 19(6), 511-522.
Wise, K. C., & Okey, J. R. (1983). A meta-analysis of the effects of various science teaching
strategies on achievement. Journal of Research in Science Teaching, 20(5), 419-435.
APPENDIX
Influence of Inquiry-Based Science Interventions on Middle School Students’
Cognitive, Behavioral, and Affective Outcomes:
A Campbell Collaboration Systematic Review
Education Review Group
PROPOSED: INQUIRY SCIENCE INSTRUCTION CODING MANUAL
 September 20, 2008
Reviewers
Louis S. Nadelson
Curriculum, Instruction, Foundational Studies
College of Education
Boise State University
910 University Drive
Boise, Idaho 83725
(208) 426-2856
[email protected]
Susan Williams
Metiri Group
600 Corporate Pointe
Suite 1180
Culver City, CA 90230
Office: 310.945.5150
Mobile: 615-364-7787
[email protected]
Methodological Consultant:
Herbert M. Turner, III
ANALYTICA, Inc.
35 Goldfinch Circle
Phoenixville, PA 19460
610.933.1005
[email protected]
ELIGIBILITY CRITERIA
To be eligible, interventions must meet the criteria that define inquiry science
instruction. Additional instruction may be included, but, at a minimum, all
instruction must include the following:
• A scientifically oriented question(s)
• A methodology for gathering empirical evidence or data that can be used to address the
question(s), and
• An explanation or interpretation of the evidence in the context of the research question.
Setting and participants. The intervention must have been carried out with students in science
classes in grades 5, 6, 7, or 8 in public, private, parochial, or alternative schools. Studies that
included students in grades other than grades 5-8 will be included if they report results
separately for each grade so that the target grades can be separated out. The intervention must
have taken place in the school setting during the traditional academic schedule (i.e., Monday
through Friday, during regularly scheduled school year). Interventions that took place in an
after school program will not be considered. Studies with special needs or gifted students will be
included in the review if they were in immersion programs that routinely placed them in the
regular classrooms during science lessons or if they are being compared to an equivalent group
of students, and the treatment curriculum was consistent and reproducible and not modified to
meet individual needs as required in the conditions of IEPs.
Study design. Reviewers will evaluate research studies that report the impact of a well-defined
intervention by distinguishing between treatment and comparison groups, using a randomized,
controlled trial design or a quasi-experimental design (specifically, the non-equivalent control
group design). If a quasi-experimental design was employed, participants must have been
matched on relevant characteristics prior to the delivery of the intervention, or statistically
equated, in order to ensure that groups were as similar as possible at baseline. Comparison
groups can represent no treatment, "treatment as usual" conditions, or a different level of
inquiry than the identified treatment condition.
Types of outcomes included. At least one cognitive student outcome measure must be reported
(e.g., academic achievement, critical thinking/transfer). Depending on availability, the
reviewers also expect to report on cognitive engagement and/or affective (e.g., student
interest and/or emotional engagement) and behavioral (e.g., time on task, homework
completion, behavioral engagement) outcomes. All measures must be quantitative in nature
and report reliability statistics.
Standardized measures of achievement with established validity and reliability will be
included in the review (e.g., National Assessment of Educational Progress, Partnership for
the Assessment of Standards-based Science, Cornell Critical Thinking Test). Unstandardized
measures of achievement, and of other domains, with adequate face validity will also be
included (e.g., student interest inventories, self-report measures of affective and cognitive
engagement, and interest). Measures reported in the Buros Institute's Mental Measurements
Yearbook, published by the University of Nebraska Press, will be considered standardized
measures; all others will be considered unstandardized measures.
Dates of publication. The date of publication of the study must be 1990 or later, although the
research itself may have been conducted prior to 1990. All studies up through September 2008
will be considered for inclusion.
STUDY IDENTIFIERS
Each study will be assigned a three-digit identifying code [STUDYID]. A study is
defined as a research investigation in which two or more groups are compared with each
other and includes the treatment, materials, measures, analyses, and results. If there is
more than one paper describing the same study, the papers will be combined and coded
as a single study. Each paper within the study will also have an identifying code with the
first three digits representing the study and the last two digits representing the paper,
e.g., 123.02 designating study #123 and paper #2 describing this study.
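To make the identifier convention concrete, the following minimal Python sketch (illustrative only; it is not part of the protocol's tooling, and the function names are hypothetical) shows how a combined study/paper code such as 123.02 could be generated and split back into its parts.

```python
# Sketch of the STUDYID / paper-ID convention described above.
# make_paper_id and parse_paper_id are hypothetical helper names.

def make_paper_id(study_number: int, paper_number: int) -> str:
    """Combine a 3-digit study code and a 2-digit paper code, e.g. (123, 2) -> '123.02'."""
    if not (0 < study_number < 1000 and 0 < paper_number < 100):
        raise ValueError("study must be 1-999 and paper 1-99")
    return f"{study_number:03d}.{paper_number:02d}"

def parse_paper_id(paper_id: str) -> tuple[int, int]:
    """Split '123.02' back into (study=123, paper=2)."""
    study_part, paper_part = paper_id.split(".")
    return int(study_part), int(paper_part)

if __name__ == "__main__":
    pid = make_paper_id(123, 2)      # '123.02'
    print(pid, parse_paper_id(pid))  # 123.02 (123, 2)
```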
Each study will be coded independently by two trained reviewers.
Each reviewer will create a separate coding form.
Each Reviewer will be assigned a PIN (or may choose to use their initials) and this PIN
will be entered for every investigation they code. [ReviewerID]
The date that coding began will be entered for each study.
The full citation for the study will be entered using APA style.
SCREENING FORMS
Phase I Screening Form: Topic, Setting, Population, and General Design
Relevance
(1) How was this study identified?
a. Electronic search of online database
b. Bibliography of relevant study
c. Cited in meta-analysis or systematic review
d. Hand-search of journal
e. Search of conference proceedings
f. Web search (e.g., Google)
g. Organizational website
h. Contacting Expert in the Field
i. Contacting a Referred Author/Researcher
j. Other: _________________________________
(2) Does this study address inquiry science?
To be eligible, at least one group in each study must meet the criteria that
define inquiry science instruction. Additional instruction may be included,
but, at a minimum, all instruction must include the following:
• A scientifically oriented question(s)
• A methodology for gathering empirical evidence or data that can be used to address
the question(s), and
• An explanation or interpretation of the evidence in the context of the research
question.
__ Author identified
__ Screener identified
YES
UNSURE
NO (STOP/exclude)
(3) Does this study take place in a school setting?
Setting and participants. The intervention must have been carried out with students in
science classes in public, private, parochial, or alternative schools. The intervention must
have taken place in the school setting during the traditional academic schedule (i.e.,
Monday through Friday). Interventions that took place in an after school program will
not be considered. Studies with special needs or gifted students will be included in the
review if they were in immersion programs that routinely placed them in the regular
classrooms during science lessons or if they are being compared to an equivalent group
of special needs or gifted students. The curriculum for these students must not be
individualized, as is frequently required by an IEP, but generalized and reproducible.
YES
UNSURE
NO (STOP/exclude)
(4) Are the participants grade 5, 6, 7, or 8 students?
Studies that included students in grades other than grades 5-8 will be included if they
report results separately for each grade so that the target grades can be separated out.
YES
UNSURE
NO (STOP/exclude)
(5) Are there at least two groups of participants?
Reviewers will evaluate research studies that report the impact of a well-defined
intervention by distinguishing between treatment and comparison groups, using a
randomized, controlled trial design or a quasi-experimental design (specifically, the
non-equivalent control group design). If a quasi-experimental design was employed,
participants must have been matched on relevant characteristics prior to the delivery of
the intervention, or statistically equated, in order to ensure that groups were as similar
as possible at baseline. Comparison groups can represent no treatment, "treatment as
usual", or a different level of inquiry than the identified treatment condition.
YES UNSURE
NO (STOP/exclude)
(6) Is the date of publication of the paper(s) describing this study 1990 or later?
YES
UNSURE
NO (STOP/exclude)
Comments:
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
____________________________________________
Decision:
__ Exclude
__ Include/progress to Phase II screening
Phase II Screening Form: Research Design, Intervention and Outcome
Relevance
As in Phase I, each study will be screened by two reviewers working independently. Each
reviewer will create a form that will be identified by study # and the reviewer’s code.
• Study ID number [STUDYID]* ______________
• Reviewer's initials or PIN* [ReviewerID] ___________
• Date Phase II review was begun: ___________
• First author* ________________________
• Year* ________________________
(1) Research design:
__ Randomized controlled trial (individuals randomly assigned to condition)
__ Group randomized controlled trial
__ Quasi-experimental design
__ Single group pretest/posttest design (Stop/exclude)
__ Case study (Stop/exclude)
__ Correlational or ex post facto design (Stop/exclude)
__ Other: _____________________________________(Stop/exclude)
__ Unsure/unable to determine
Location of identifying information:
__ Title page:___________
__ Abstract
page:____________
__ Method section
page:____________
(2) Is this an inquiry science intervention?
YES
UNSURE
NO (Stop/Exclude)
Page:_______________
__ Author identified
__ Screener identified
Comment:
_______________________________________________________
_______________________________________________________
_______________________________________________________
_________________________________
(3) Do the comparison and treatment groups receive conceptually different
interventions? (Comparison groups can represent no treatment, "treatment as usual",
or a different level of inquiry than the treatment condition.)
YES
UNSURE
NO (Stop/Exclude)
Page:________________
Comment:
_______________________________________________________
_______________________________________________________
_______________________________________________________
_________________________________
(4) Indicate below whether cognitive outcomes (academic achievement or
critical thinking/transfer) for students are measured.
YES
UNSURE
NO (Stop/Exclude)
Page:________________
Comment:
_______________________________________________________
_______________________________________________________
_______________________________________________________
_______________________________________________________
______________________________________
(5) Is a standardized measure used to record the outcome?
(a) Academic Achievement?        YES (Page # ______)   UNSURE   NO
(b) Critical thinking/transfer?  YES (Page # ______)   UNSURE   NO
Comment:
_______________________________________________________
_______________________________________________________
_______________________________________________________
_________________________________
(6) Does this study report outcomes such that an effect size may be
calculated?
YES
UNSURE
NO (Stop/Exclude) Page: _____________
Comment:
_______________________________________________________
_______________________________________________________
_______________________________________________________
_________________________________
Additional Comments
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
_________________________________
Decision:
____ Exclude
____ Include/Continue to Coding Phase
Phase III Coding
CODING ELEMENTS FOR INQUIRY SCIENCE STUDIES
Coding:
 Reviewers initials or code___________
 Date Phase III review was begun_____________
 Study ID number [STUDYID] ______________
Study Information
 Author(s)* (Report last name, first; e.g., Doe, John).
o _________________________________________

Year of Study (Report year of study; e.g., 2000).
o _____________

Publication Features
1. Juried journal
2. non-juried journal
3. Doctoral dissertation
4. Conference proceedings
5. Book
6. Government Report
7. Organizational Report
8. Other ______________________________________
Methodological Characteristics
 Type of Design
1. Quasi-experimental/nonrandomized pre-post control group
2. Quasi-experimental time series
3. Randomized post-test only control group
4. Randomized pre-post control group
5. Other

Number of Comparisons Within Study (Report number; e.g., 1 or 2 or 3).
o _______________________________

Unit of Analysis
1. Unspecified
2. Individual
3. Class
4. School
5. District
6. State
7. Mixed

Does the unit of analysis match the unit of assignment?
1. Yes
2. No
9. Cannot determine

Method of Assignment:
1. Unspecified
2. Random, with or without matching
3. Non-random, matched ONLY on student characteristics or demographics
4. Non-random matched ONLY on pretest measures of outcome variables
5. Non-random matched on both of above
6. Other
TREATMENT
Demographic Information for treatment group

Student Sample Size (Report actual sample size; e.g., 3086). ––––––––––––––––

School Sample Size (Report actual sample size; e.g., 73). –––––––––––––––––––

Student Sex (Check all that apply and provide numbers or percentages if available)
1. Males _____
2. Females _____
3. Mixed or not specified_____

Grade Level (Check all that apply and provide numbers or percentages if available)
1. 5th grade_____
2. 6th grade; _____
3. 7th grade _____
4. 8th grade _____

Students' Ethnicity; Check all that apply and provide numbers or percentages if
available)
1. Unspecified_____
2. Black _____
3. Hispanic_____
4. Asian_____
5. White_____
6. Mixed_____
7. Other_____

Students' Socioeconomic Status (Check all that apply and provide numbers or
percentages if available)
1. Unspecified_____
2. Lower _____
3. Lower middle_____
4. Middle _____
5. Upper middle_____
6. Upper _____
7. Mixed_____

Country;
1. Unspecified
2. USA
3. Canada
4. Mexico/Latin America
5. Europe
6. Asia
7. South America
8. Cross-Cultural
9. Other _________________________________

Geographical Region in USA; Check all that apply and provide numbers or percentages
if available)
1. Northeast _____
2. Southeast _____
3. Midwest _____
4. South Central _____
5. Southwest _____
6. Northwest _____
7. Other_____

School Type; Check all that apply and provide numbers or percentages if available)
1. Unspecified_____
2. Public _____
3. Private _____
4. Special school_____
5. Other_____

Community Type; Check all that apply and provide numbers or percentages if
available)
1. Unspecified_____
2. Urban_____
3. Rural_____
4. Suburban_____
5. Other_____
Instructional Characteristics for treatment group

Number of Inquiry Science Sessions (Unspecified = 0; List number of sessions [e.g., 12]).
o ________________

Duration of Inquiry Science Sessions (Unspecified = 0; List number of average minutes
per sessions [e.g., 40]).
o ________________
Duration of Study (Unspecified = 00; List the number of months that the
implementation of the curriculum occurred).
o _________________


Percentage of inquiry during research/instructional period;
1. Unspecified
2. 1-25%
3. 26%-50%
4. 51%-75%
5. 76% - 99%
6. 100%
7. Other

Learning Responsibility:
1. Level 0
2. Level 1
3. Level 2
4. Level 3
5. Unidentifiable

Type of Learning Task(Inquiry group);
1. Unspecified
2. Basic skills/factual learning
3. Problem solving
4. Investigation
5. Project-based
6. Mixed types
7. Other

Level of Technology
1. Unspecified
2. Supplementary/Digital Content
3. Primary/Digital Content
4. Regular Digital Productivity requirements
5. Other

Feedback and Assessment Practices (Teacher/Student)
1. Unspecified
2. No feedback
3. Minimal feedback
4. Elaborate feedback
5. Other
Instructional/Teaching Characteristics for treatment group
• Joint Productive Activity/Collaboration (e.g., Designs instructional activities requiring
student collaboration to accomplish a joint product; monitors and supports students
collaboration in positive ways.)
1. No evidence
2. Some evidence
3. Extensive evidence
• Contextualization/Making Meaning (e.g., Begins activities with what students already
know from home, community, and school; encourages students to use content vocabulary
to express their understanding.)
1. No evidence
2. Some evidence
3. Extensive evidence
• Challenging Activities (e.g., Designs instructional tasks that advance students'
understanding to more complex levels. Assures that students—for each instructional topic—
see the whole picture as a basis for understanding the parts.)
1. No evidence
2. Some evidence
3. Extensive evidence
• Instructional Conversation: (e.g., Arranges the classroom to accommodate conversation
between the teacher and a small group of students on a regular and frequent basis.
Guides conversation to include students' views, judgments, and rationales using text
evidence and other substantive support.)
1. No evidence
2. Some evidence
3. Extensive evidence
• Mode of Instruction
1. Unspecified
2. Whole-group instruction
3. Paired
4. Small-group instruction [3-5 members]
5. Individualized
6. Mixed
7. Other
• Basis of grouping
1. Teacher selected purposeful
2. Teacher selected random
3. Student selected
4. No grouping
• Role of Teacher
1. Unspecified
2. Deliverer of knowledge
3. Facilitator of groups/student learning
4. Modeling processes [e.g., problem solving]
5. Mixed
6. Other
Experience and training context

Reported Teachers' Experience with Inquiry Science:
1. Unspecified
2. None
3. Some
4. Experienced

Reported Students' Experience with Inquiry Science
1. Unspecified
2. None
3. Some
4. Experienced

Teacher Training in Inquiry Science (Unspecified = 0; List hours of training (e.g., 15).
o ________________
• Teacher Qualifications
1. Unspecified
2. Alternatively certified or provisional certificate
3. Certified in content area
4. Not certified in content area
5. Other
Policy related to Inquiry Science
 Level of Enacted Policy
1. Unspecified
2. School
3. District
4. State
5. Federal
6. Other

Focus
1.
2.
3.
4.
5.
Unspecified
Reducing achievement gaps
Increased use of inquiry science
Increased critical thinking
Other
COMPARISON #1

Treatment for 1st Comparison Condition;
1. Unspecified
2. Student receives nothing/treatment as usual
3. Teacher receives equivalent PD/Treatment as usual
4. Alternate Treatment (Specify)
Demographic Information for comparison group #1

Student Sample Size (Report actual sample size; e.g., 3086). ––––––––––––––––

School Sample Size (Report actual sample size; e.g., 73). –––––––––––––––––––

Student Sex (Check all that apply and provide numbers or percentages if available)
1. Males _____
2. Females _____
3. Mixed or not specified_____

Grade Level (Check all that apply and provide numbers or percentages if available)
1. 5th grade_____
2. 6th grade_____
3. 7th grade_____
4. 8th grade_____

Students' Ethnicity; Check all that apply and provide numbers or percentages if
available)
1. Unspecified_____
2. Black _____
3. Hispanic_____
4. Asian_____
5. White_____
6. Mixed_____
7. Other_____

Students' Socioeconomic Status (Check all that apply and provide numbers or
percentages if available)
1. Unspecified_____
2. Lower _____
3. Lower middle_____
4. Middle _____
5. Upper middle_____
6. Upper _____
7. Mixed_____

Country;
1. Unspecified_____
2. USA_____
3. Canada_____
4. Mexico/Latin America_____
5. Europe_____
6. Asia_____
7. South America_____
8. Cross-Cultural_____
9. Other_____

Geographical Region in USA; Check all that apply and provide numbers or percentages
if available)
1. Northeast _____
2. Southeast _____
3. Midwest
4. South Central _____
5. Southwest _____
6. Northwest _____
7. Other_____

School Type; Check all that apply and provide numbers or percentages if available)
1. Unspecified_____
2. Public _____
3. Private _____
4. Special school_____
5. Other_____

Community Type; Check all that apply and provide numbers or percentages if
available)
1. Unspecified_____
2. Urban_____
3. Rural_____
4. Suburban_____
5. Other_____
Instructional Characteristics for comparison group #1

Number of Comparison Instruction Science Sessions (Unspecified = 0; List number of
sessions [e.g., 12]).
o ________________

Duration of Comparison Instruction Science Sessions (Unspecified = 0; List number of
average minutes per sessions [e.g., 40]).
o ________________
Duration of Study (Unspecified = 00; List the number of months that the
implementation of the comparison curriculum occurred).
o _________________


Percentage of comparison science instruction during research/instructional period;
1. Unspecified
2. 1-25%
3. 26%-50%
4. 51%-75%
5. 76% - 99%
6. 100%
7. Other

Learning Responsibility:
1. Level 0
2. Level 1
3. Level 2
4. Level 3
5. Unidentifiable

Type of Learning Task(Comparison Instruction group);
1. Unspecified
2. Basic skills/factual learning
3. Problem solving
4. Investigation
5. Project-based
6. Mixed types
7. Other

Level of Technology
1. Unspecified
2. Supplementary/Digital Content
3. Primary/Digital Content
4. Regular Digital Productivity requirements
5. Other

Feedback and Assessment Practices (Teacher/Student)
1. Unspecified
2. No feedback
3. Minimal feedback
4. Elaborate feedback
5. Other
Instructional/Teaching Characteristics
• Joint Productive Activity/Collaboration (e.g., Designs instructional activities requiring
student collaboration to accomplish a joint product; monitors and supports students
collaboration in positive ways.)
1. No evidence
2. Some evidence
3. Extensive evidence
• Contextualization/Making Meaning (e.g., Begins activities with what students already
know from home, community, and school; encourages students to use content vocabulary
to express their understanding.)
1. No evidence
2. Some evidence
3. Extensive evidence
• Challenging Activities (e.g., Designs instructional tasks that advance students'
understanding to more complex levels. Assures that students—for each instructional topic—
see the whole picture as a basis for understanding the parts.)
1. No evidence
2. Some evidence
3. Extensive evidence
• Instructional Conversation: (e.g., Arranges the classroom to accommodate conversation
between the teacher and a small group of students on a regular and frequent basis.
Guides conversation to include students' views, judgments, and rationales using text
evidence and other substantive support.)
1. No evidence
2. Some evidence
3. Extensive evidence
• Mode of Instruction
1. Unspecified
2. Whole-group instruction
3. Paired
4. Small-group instruction [3-5 members]
5. Individualized
6. Mixed
7. Other
• Basis of grouping
1. Teacher selected purposeful
2. Teacher selected random
3. Student selected
4. No grouping
• Role of Teacher
1. Unspecified
2. Deliverer of knowledge
3. Facilitator of groups/student learning
4. Modeling processes [e.g., problem solving]
5. Mixed
6. Other
Experience and training context for comparison group #1

Reported Teachers' Experience with Comparison Science Instruction:
1. Unspecified
2. None
3. Some
4. Experienced

Reported Students' Experience with Comparison Science Instruction
1. Unspecified
2. None
3. Some
4. Experienced

Teacher Training in Comparison Science Instruction (Unspecified = 0; List hours of
training (e.g., 15).
o ________________
• Teacher Qualifications
1. Unspecified
2. Alternatively certified or provisional certificate
3. Certified in content area
4. Not certified in content area
5. Other
Policy related to Comparison Science Instruction
 Level of Enacted Policy
1. Unspecified
2. School
3. District
4. State
5. Federal
6. Other

Focus
1. Unspecified
2. Reducing achievement gaps
3. Increased use of inquiry science
4. Increased critical thinking
5. Other
COMPARISON #2

Treatment for 2nd Comparison Condition;
1. Unspecified
2. Student receives nothing/treatment as usual
3. Teacher receives equivalent PD/Treatment as usual
4. Alternate Treatment (Specify)
Demographic Information for comparison group #2

Student Sample Size (Report actual sample size; e.g., 3086). ––––––––––––––––

School Sample Size (Report actual sample size; e.g., 73). –––––––––––––––––––

Student Sex (Check all that apply and provide numbers or percentages if available)
1. Males _____
2. Females _____
3. Mixed or not specified_____

Grade Level (Check all that apply and provide numbers or percentages if available)
1. 5th grade_____
2. 6th grade_____
3. 7th grade; _____
4. 8th grade _____

Students' Ethnicity; Check all that apply and provide numbers or percentages if
available)
1. Unspecified_____
2. Black _____
3. Hispanic_____
4. Asian_____
5. White_____
6. Mixed_____
7. Other_____

Students' Socioeconomic Status (Check all that apply and provide numbers or
percentages if available)
1. Unspecified_____
2. Lower _____
3. Lower middle_____
4. Middle _____
5. Upper middle_____
6. Upper _____
7. Mixed_____

Country;
1. Unspecified
2. USA
3. Canada
4. Mexico/Latin America
5. Europe
6. Asia
7. South America
8. Cross-Cultural
9. Other

Geographical Region in USA; Check all that apply and provide numbers or percentages
if available)
1. Northeast _____
2. Southeast _____
3. Midwest
4. South Central _____
5. Southwest _____
6. Northwest _____
7. Other_____

School Type; Check all that apply and provide numbers or percentages if available)
1. Unspecified_____
2. Public _____
3. Private _____
4. Special school_____
5. Other_____

Community Type; Check all that apply and provide numbers or percentages if
available)
1. Unspecified_____
2. Urban_____
3. Rural_____
4. Suburban_____
5. Other_____
Instructional Characteristics for comparison group #2

Number of Comparison Instruction Science Sessions (Unspecified = 0; List number of
sessions [e.g., 12]).
o ________________

Duration of Comparison Instruction Science Sessions (Unspecified = 0; List number of
average minutes per sessions [e.g., 40]).
o ________________
Duration of Study (Unspecified = 00; List the number of months that the
implementation of the comparison curriculum occurred).
o _________________


Percentage of comparison science instruction during research/instructional period;
1. Unspecified
2. 1-25%
3. 26%-50%
4. 51%-75%
5. 76% - 99%
6. 100%
7. Other

Learning Responsibility:
1. Level 0
2. Level 1
3. Level 2
4. Level 3
5. Unidentifiable

Type of Learning Task(Comparison Instruction group);
1. Unspecified
2. Basic skills/factual learning
3. Problem solving
4. Investigation
5. Project-based
6. Mixed types
7. Other

Level of Technology
1. Unspecified
2. Supplementary/Digital Content
3. Primary/Digital Content
4. Regular Digital Productivity requirements
5. Other

Feedback and Assessment Practices (Teacher/Student)
1. Unspecified
2. No feedback
3. Minimal feedback
4. Elaborate feedback
5. Other
Instructional/Teaching Characteristics for comparison group #2
• Joint Productive Activity/Collaboration (e.g., Designs instructional activities requiring
student collaboration to accomplish a joint product; monitors and supports students
collaboration in positive ways.)
1. No evidence
2. Some evidence
3. Extensive evidence
• Contextualization/Making Meaning (e.g., Begins activities with what students already
know from home, community, and school; encourages students to use content vocabulary
to express their understanding.)
1. No evidence
2. Some evidence
3. Extensive evidence
• Challenging Activities (e.g., Designs instructional tasks that advance students'
understanding to more complex levels. Assures that students—for each instructional topic—
see the whole picture as a basis for understanding the parts.)
1. No evidence
2. Some evidence
3. Extensive evidence
• Instructional Conversation: (e.g., Arranges the classroom to accommodate conversation
between the teacher and a small group of students on a regular and frequent basis.
Guides conversation to include students' views, judgments, and rationales using text
evidence and other substantive support.)
1. No evidence
2. Some evidence
3. Extensive evidence
• Mode of Instruction
1. Unspecified
2. Whole-group instruction
3. Paired
4. Small-group instruction [3-5 members]
5. Individualized
6. Mixed
7. Other
• Basis of grouping
1. Teacher selected purposeful
2. Teacher selected random
3. Student selected
4. No grouping
• Role of Teacher
1. Unspecified
2. Deliverer of knowledge
3. Facilitator of groups/student learning
4. Modeling processes [e.g., problem solving]
5. Mixed
6. Other
Experience and training context for comparison group #2

Reported Teachers' Experience with Comparison Science Instruction:
1. Unspecified
2. None
3. Some
4. Experienced

Reported Students' Experience with Comparison Science Instruction
1. Unspecified
2. None
3. Some
4. Experienced

Teacher Training in Comparison Science Instruction (Unspecified = 0; List hours of
training (e.g., 15).
o ________________
• Teacher Qualifications
1. Unspecified
2. Alternatively certified or provisional certificate
3. Certified in content area
4. Not certified in content area
5. Other
Policy related to Comparison Science Instruction for comparison group #2
 Level of Enacted Policy
1. Unspecified
2. School
3. District
4. State
5. Federal
6. Other

Focus
1. Unspecified
2. Reducing achievement gaps
3. Increased use of inquiry science
4. Increased critical thinking
5. Other
COMPARISON GROUP #3

Treatment for 3rd Comparison Condition;
1. Unspecified
2. Student receives nothing/treatment as usual
3. Teacher receives equivalent PD/Treatment as usual
4. Alternate Treatment (Specify)
___________________________________________________
___________________________________________________
Demographic Information for comparison group #3

Student Sample Size (Report actual sample size; e.g., 3086). ––––––––––––––––

School Sample Size (Report actual sample size; e.g., 73). –––––––––––––––––––

Student Sex (Check all that apply and provide numbers or percentages if available)
1. Males _____
2. Females _____
3. Mixed or not specified_____

Grade Level (Check all that apply and provide numbers or percentages if available)
1. 5th grade_____
2. 6th grade_____
3. 7th grade; _____
4. 8th grade _____

Students' Ethnicity; Check all that apply and provide numbers or percentages if
available)
1. Unspecified_____
2. Black _____
3. Hispanic_____
4. Asian_____
5. White_____
6. Mixed_____
7. Other_____

Students' Socioeconomic Status (Check all that apply and provide numbers or
percentages if available)
1. Unspecified_____
2. Lower _____
3. Lower middle_____
4. Middle _____
5. Upper middle_____
6. Upper _____
7. Mixed_____

Country;
1. Unspecified
2. USA
3. Canada
4. Mexico/Latin America
5. Europe
6. Asia
7. South America
8. Cross-Cultural
9. Other

Geographical Region in USA; Check all that apply and provide numbers or percentages
if available)
1. Northeast _____
2. Southeast _____
3. Midwest
4. South Central _____
5. Southwest _____
6. Northwest _____
7. Other_____

School Type; Check all that apply and provide numbers or percentages if available)
1. Unspecified_____
2. Public _____
3. Private _____
4. Special school_____
5. Other_____

Community Type; Check all that apply and provide numbers or percentages if
available)
1. Unspecified_____
2. Urban_____
3. Rural_____
4. Suburban_____
5. Other_____
Instructional Characteristics for comparison group #3

Number of Comparison Instruction Science Sessions (Unspecified = 0; List number of
sessions [e.g., 12]).
o ________________

Duration of Comparison Instruction Science Sessions (Unspecified = 0; List number of
average minutes per sessions [e.g., 40]).
o ________________
Duration of Study (Unspecified = 00; List the number of months that the
implementation of the comparison curriculum occurred).
o _________________


Percentage of comparison science instruction during research/instructional period;
1. Unspecified
2. 1-25%
3. 26%-50%
4. 51%-75%
5. 76% - 99%
6. 100%
7. Other

Learning Responsibility:
1. Level 0
2. Level 1
3. Level 2
4. Level 3
5. Unidentifiable

Type of Learning Task(Comparison Instruction group);
1. Unspecified
2. Basic skills/factual learning
3. Problem solving
4. Investigation
5. Project-based
6. Mixed types
7. Other

Level of Technology
1. Unspecified
2. Supplementary/Digital Content
3. Primary/Digital Content
4. Regular Digital Productivity requirements
5. Other

Feedback and Assessment Practices (Teacher/Student)
1. Unspecified
2. No feedback
3. Minimal feedback
4. Elaborate feedback
5. Other
Instructional/Teaching Characteristics for comparison group #3
• Joint Productive Activity/Collaboration (e.g., Designs instructional activities requiring
student collaboration to accomplish a joint product; monitors and supports students
collaboration in positive ways.)
1. No evidence
2. Some evidence
3. Extensive evidence
• Contextualization/Making Meaning (e.g., Begins activities with what students already
know from home, community, and school; encourages students to use content vocabulary
to express their understanding.)
1. No evidence
2. Some evidence
3. Extensive evidence
• Challenging Activities (e.g., Designs instructional tasks that advance students'
understanding to more complex levels. Assures that students—for each instructional topic—
see the whole picture as a basis for understanding the parts.)
1. No evidence
2. Some evidence
3. Extensive evidence
• Instructional Conversation: (e.g., Arranges the classroom to accommodate conversation
between the teacher and a small group of students on a regular and frequent basis.
Guides conversation to include students' views, judgments, and rationales using text
evidence and other substantive support.)
1. No evidence
2. Some evidence
3. Extensive evidence
• Mode of Instruction
1. Unspecified
2. Whole-group instruction
3. Paired
4. Small-group instruction [3-5 members]
5. Individualized
6. Mixed
7. Other
• Basis of grouping
1. Teacher selected purposeful
2. Teacher selected random
3. Student selected
4. No grouping
• Role of Teacher
1. Unspecified
2. Deliverer of knowledge
3. Facilitator of groups/student learning
4. Modeling processes [e.g., problem solving]
5. Mixed
6. Other
Experience and training context for comparison group #3

Reported Teachers' Experience with Comparison Science Instruction:
1. Unspecified
2. None
3. Some
4. Experienced

Reported Students' Experience with Comparison Science Instruction
1. Unspecified
2. None
3. Some
4. Experienced

Teacher Training in Comparison Science Instruction (Unspecified = 0; List hours of
training (e.g., 15).
o ________________
• Teacher Qualifications
1. Unspecified
2. Alternatively certified or provisional certificate
3. Certified in content area
4. Not certified in content area
5. Other
Policy related to Comparison Science Instruction #3
 Level of Enacted Policy
1. Unspecified
2. School
3. District
4. State
5. Federal
6. Other

Focus
1. Unspecified
2. Reducing achievement gaps
3. Increased use of inquiry science
4. Increased critical thinking
5. Other
Outcome Measure(s)
 Cognitive Outcomes: Science Achievement;
1. Unspecified
2. Testing company standardized achievement test
3. Federal/national standardized test
4. State-level achievement test
5. District-level achievement test
6. School-level test
7. Grade-level test
8. Teacher-made test
9. Researcher-developed test
10. Authentic assessment
11. Other

Cognitive Outcomes: Critical Thinking in Science;
1. Unspecified
2. Testing company standardized achievement test
3. Federal/national standardized test
4. State-level achievement test
5. District-level achievement test
6. School-level test
7. Grade-level test
8. Teacher-made test
9. Researcher-developed test
10. Authentic assessment
11. Other

Affective Outcomes: Science
1. Unspecified
2. Student attitudes toward science, or instruction
3. Academic self-concept or motivation
4. Other

Behavioral Outcomes: Science
1. Unspecified
2. Student time-on-task
3. Student perseverance
4. Tasks attempted
5. Tasks completed
6. Success rate
7. Positive peer interaction
8. Interactivity with computers
9. Other
Quality of Study Indicators (Rigor of statistical design and analysis)

Method of Observation of Independent Variable:(i.e., fidelity of implementation,
data collection) Select all that apply.
1. Unspecified
2. Systematic observation
3. Informal observation
4. Student survey or interview
5. Teacher survey or interview
6. Administrator survey or interview
7. Computer logs
8. Other

Pretest Equivalency: Have the initial differences between the two groups been
accounted for?
1. Unspecified
2. Statistical Control (e.g., ANCOVA, regression)
3. Random Assignment
4. Statistical Control and Random Assignment
5. Gain Scores
6. Matching
7. Other

Reported Reliability of Measures (Unspecified = 00; record the actual reliability
statistic, e.g., .70 or .83).
o ___________________
Effect size information:
Manner in Which Outcome Scores Are Reported
(Score type codes: Unspecified = 0; Standard scores = 1; Raw scores = 2; Percentile ranks = 3; Gain scores = 4; Other = 5)

Outcome        Reference to Outcome Measure        Score type code
Cognitive      ____________________________        _____
Cognitive      ____________________________        _____
Cognitive      ____________________________        _____
Affective      ____________________________        _____
Affective      ____________________________        _____
Affective      ____________________________        _____
Behavioral     ____________________________        _____
Behavioral     ____________________________        _____
Behavioral     ____________________________        _____

Effect size information:
Groups compared:
Group 1: __________________________________________
Group 2: ______________________________________________
Outcome A:
                              Group 1      Group 1       Group 1               Group 2      Group 2       Group 2
                              Pretest      Posttest      Follow-up (Time: )    Pretest      Posttest      Follow-up (Time: )
Mean
SD
n
d (author reported)
Other statistics reported:

Outcome B:
                              Group 1      Group 1       Group 1               Group 2      Group 2       Group 2
                              Pretest      Posttest      Follow-up (Time: )    Pretest      Posttest      Follow-up (Time: )
Mean
SD
n
d (author reported)
Other statistics reported:
Copy this page as needed to report on multiple comparison groups or multiple outcomes.
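Where a study reports only group means, standard deviations, and sample sizes, the standardized mean difference can be computed and corrected for small-sample bias in the manner of Hedges (1981) and Lipsey and Wilson (2001). The sketch below is illustrative only; the function and variable names are ours, not part of the protocol, and it shows one way the quantities recorded in the tables above could be converted into Hedges' g and its approximate variance.

```python
import math

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference (Hedges' g) with small-sample correction.

    Inputs are the treatment- and comparison-group means, standard deviations,
    and sample sizes recorded on the coding form; returns (g, variance_of_g).
    """
    # Pooled standard deviation across the two groups
    sp = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
    d = (mean_t - mean_c) / sp                    # Cohen's d
    j = 1 - 3 / (4 * (n_t + n_c - 2) - 1)         # Hedges' small-sample correction
    g = j * d
    var_g = j**2 * ((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return g, var_g

# Hypothetical example using made-up posttest values
g, var_g = hedges_g(mean_t=78.2, sd_t=10.5, n_t=52, mean_c=74.1, sd_c=11.0, n_c=49)
print(round(g, 3), round(var_g, 4))
```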
Potential Confounds/Sources of Invalidity
1. History: Have specific events occurred between the first and second measurement in
addition to the treatment variable?
1. Adequately controlled by design
2. Definite weakness of design
3. Possible source of concern
4. Not a relevant factor
2. Maturation: Are there processes within the participants operating as a function of the
passage of time [e.g., growing older, more tired] that might account for changes in the
dependent measure?
1. Adequately controlled by design
2. Definite weakness of design
3. Possible source of concern
4. Not a relevant factor
3. Testing: Is there an effect of taking a test upon the scores of a second testing?
1. Adequately controlled by design
2. Definite weakness of design
3. Possible source of concern
4. Not a relevant factor
4. Instrumentation: Do changes in calibration or observers' scores produce changes in
the obtained measurement?
1. Adequately controlled by design
2. Definite weakness of design
3. Possible source of concern
4. Not a relevant factor
5. Statistical Regression: Have groups been selected on the basis of their extreme
scores?
1. Adequately controlled by design
2. Definite weakness of design
3. Possible source of concern
4. Not a relevant factor
6. Selection Bias: Have biases resulted in the differential selection of comparison groups?
1. Adequately controlled by design
2. Definite weakness of design
3. Possible source of concern
4. Not a relevant factor
7. Mortality: Has there been a differential loss of participants from the treatment and
comparison groups?
1. Adequately controlled by design
2. Definite weakness of design
3. Possible source of concern
4. Not a relevant factor
8. Selection-Maturation Interaction: Is there an interaction between extraneous
factors such as history, maturation, or testing and the specific selection differences that
distinguish the treatment and comparison groups?
1. Adequately controlled by design
2. Definite weakness of design
3. Possible source of concern
4. Not a relevant factor
9. Reactive or Interaction Effect of Testing: Does pretesting influence the
participants' responsiveness to the treatment variable, making the results for a pretested
population unrepresentative of the effects of the treatment variable for the unpretested
universe from which the participants were selected?
1. Adequately controlled by design
2. Definite weakness of design
3. Possible source of concern
4. Not a relevant factor
10. Interaction of Selection Biases and Treatment: Are there selective factors upon
which sampling was based which interact differentially with the treatment variable?
1. Adequately controlled by design
2. Definite weakness of design
3. Possible source of concern
4. Not a relevant factor
11. Reactive Effects of Experimental Arrangements: Are there effects of the
experimental setting that would preclude generalizing about the effect of the
experimental variable upon persons being exposed to it in non-experimental settings?
1. Adequately controlled by design
2. Definite weakness of design
3. Possible source of concern
4. Not a relevant factor
12. Multiple-Treatment Interference: Are there non-erasable effects of previous
treatments applied to the same participants?
1. Adequately controlled by design
2. Definite weakness of design
3. Possible source of concern
4. Not a relevant factor
13. Statistical Power: Is the sample size large enough to reject the null hypothesis at a
given level of probability, or are the estimated coefficients within reasonably small
margins of error? [A sample > 60 for groups such as classes, schools, or districts; a
sample >100 for individuals].
1. Probable threat [< 60 for groups or < 100 for individuals as the unit of analysis]
2. Adequately minimized [> 60 for groups; > 100 for individuals]
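As a rough illustration of the rule of thumb above (illustrative only; the helper name and example values are ours, not part of the protocol), the check reduces to comparing the analytic sample size against 60 for group-level units or 100 for individual-level units.

```python
def power_threshold_met(n_units: int, unit_of_analysis: str) -> bool:
    """Apply the protocol's statistical-power rule of thumb.

    Groups (classes, schools, districts) need more than 60 units;
    individual students need more than 100.
    """
    group_units = {"class", "school", "district", "state"}
    threshold = 60 if unit_of_analysis.lower() in group_units else 100
    return n_units > threshold

print(power_threshold_met(72, "school"))      # True  -> adequately minimized
print(power_threshold_met(85, "individual"))  # False -> probable threat
```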