SCIENTIFIC REASONING SKILLS DEVELOPMENT
IN THE INTRODUCTORY BIOLOGY COURSES FOR UNDERGRADUATES
DISSERTATION
Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy
in the Graduate School of The Ohio State University
By
Melissa S. Schen, M.S.
*****
The Ohio State University
2007
Dissertation Committee:

Professor Anita Roychoudhury, Co-Advisor

Professor Arthur L. White, Co-Advisor

Professor David Haury

Approved by

______________________________
Co-Advisor

______________________________
Co-Advisor

Graduate Program in Education
Copyright by
Melissa S. Schen
2007
ABSTRACT
Scientific reasoning is a skill of critical importance to those students who seek to
become professional scientists. Yet, there is little research on the development of such
reasoning in science majors. In addition, scientific reasoning is often investigated as two
separate entities: hypothetico-deductive reasoning and argumentation, even though these
skills may be linked. With regard to argumentation, most investigations look at its use in
discussing socioscientific issues, not in analyzing scientific data. As scientists often use
the same argumentation skills to develop and support conclusions, this avenue needs to
be investigated. This study seeks to address these issues and establish a baseline of both
hypothetico-deductive reasoning and argumentation of scientific data of biology majors
through their engagement in introductory biology coursework.
This descriptive study investigated the development of undergraduates’ scientific
reasoning skills by assessing them multiple times throughout a two-quarter introductory
biology course sequence for majors. Participants were assessed at the beginning of the
first quarter, end of the first quarter, and end of the second quarter. A split-half version of
the revised Lawson Classroom Test of Scientific Reasoning (LCTSR) and a paper-and-pencil
argumentation instrument developed for this study were used to assess student
hypothetico-deductive reasoning and argumentation skills, respectively. To identify
factors that may influence scientific reasoning development, demographic information
regarding age, gender, science coursework completed, and future plans was collected.
Evidence for course emphasis on scientific reasoning was found in lecture notes,
assignments, and laboratory exercises.
This study did not find any trends of improvement in the students’ hypothetico-deductive reasoning or argumentation skills either during the first quarter or over both
quarters. Specific difficulties in the control of variables and direct hypothetico-deductive
reasoning were found through analysis of the LCTSR data. Students were also found to
have trouble identifying and rebutting counterarguments, compared to generating initial
arguments from scientific data sets. Although no overall improvement was found, a
moderate, positive relationship was detected between LCTSR and argumentation scores
at each administration, affirming the predicted association. Lastly, no difference was
found between biology majors and the other students enrolled in the courses.
Overall, the results found here are similar to those reported in the literature for both
hypothetico-deductive reasoning and argumentation, indicating that biology majors may
be similar to other populations studied. Also, as no explicit attention was paid to
scientific reasoning skills in the two courses, these findings complement those that
illustrate a need for direct attention to foster the development of these skills. These results
suggest the need to develop direct and explicit methods in order to improve the scientific
reasoning skills of future biological scientists early in their undergraduate years.
DEDICATION
Dedicated to my grandfather
ACKNOWLEDGMENTS
I wish to thank my co-advisor, Arthur White, for his steady support, patient
editing, and generous sharing of his wealth of statistical knowledge.
I am also grateful to my co-advisor, Anita Roychoudhury, for her constant
encouragement and guidance, modeling both good teaching and good research.
I want to thank David Haury for his continuous support and guidance as a
committee member, teacher, and section head.
I would like to thank the faculty and staff of the Introductory Biology Program,
especially Judy Ridgway, John Cogan, and Amy Kovach for their support of this research
and providing access to the students. I am also thankful for the logistical support
provided by Rosemarie Thornton and the teaching assistants. Without their patience and
help, it would have been impossible to collect data.
I also wish to thank the students who participated in this study, particularly those
students who completed the instruments in all three administrations.
Lastly, I thank my friends and family for their love and support through this
journey. I am especially grateful for Denny, Scott, Kristi, Tim, and Helen Schen, as well
as Jessica Auman, Cynthia Bill, and Rhiannon Light.
VITA
March 15, 1976 .....................Born – Canton, Ohio
1998 ......................................B.A. Biology, Case Western Reserve University
2000 ......................................M.S. Biology, Case Western Reserve University
2000 – 2003 ..........................Lecturer, Case Western Reserve University
2006 – 2007 ..........................Adjunct Instructor, Columbus State Community College
FIELDS OF STUDY
Major Field: Education
Minor Field: Research Methods
TABLE OF CONTENTS
Page
Abstract...........................................................................................................................ii
Dedication......................................................................................................................iv
Acknowledgments...........................................................................................................v
Vita ................................................................................................................................vi
List of tables....................................................................................................................x
List of figures...............................................................................................................xiii
Chapter 1: Introduction....................................................................................................1
Background and setting .............................................................................................1
The call................................................................................................................1
Scientific reasoning overview ..............................................................................4
Scientific reasoning in education..........................................................................5
Scientific reasoning in college biology.................................................................7
Statement of problem.................................................................................................9
Research questions ..................................................................................................11
Definition of terms ..................................................................................................12
Argumentation (dialogic) ...................................................................................12
Constitutive definition....................................................................................12
Operational definition ....................................................................................12
Deductive reasoning ..........................................................................................12
Constitutive definition....................................................................................12
Operational definition ....................................................................................12
Inductive reasoning............................................................................................13
Constitutive definition....................................................................................13
Operational definition ....................................................................................13
Hypothetico-deductive reasoning .......................................................................13
Constitutive definition....................................................................................13
Operational definition ....................................................................................13
Scientific reasoning............................................................................................13
Constitutive definition....................................................................................13
Operational definition ....................................................................................13
Chapter 2: Literature Review.........................................................................................15
Scientific reasoning................................................................................................15
Deductive aspects of science..............................................................................16
The philosophy of Karl Popper. .....................................................................16
Deduction and hypothetico-deductive reasoning. ...........................................18
Inductive aspects of science ...............................................................................20
The philosophy of Thomas Kuhn and Imre Lakatos. ......................................20
Induction and argumentation. ........................................................................22
How do scientists actually reason? .....................................................................23
Hypothetico-deductive reasoning ability.................................................................27
Hypothetico-deductive reasoning ability and achievement in
college biology ..............................................................................................28
Hypothetico-deductive reasoning ability, student beliefs, and
conceptual change .........................................................................................31
Hypothetico-deductive reasoning ability and hypothesis testing .........................34
Summary ...........................................................................................................36
Argumentation ability ............................................................................................38
Informal reasoning, ability, and expertise...........................................................39
Informal reasoning and college students.............................................................43
Research on general improvements................................................................43
Interventions..................................................................................................44
Problems with informal reasoning. ................................................................46
Reasoning through argumentation in science..........................................................47
Coordination of theory and evidence..................................................................47
Interventions......................................................................................................50
Summary................................................................................................................53
Chapter 3: Methods .......................................................................................................56
Research design .....................................................................................................56
Participants ............................................................................................................58
Outcome measures .................................................................................................61
Dependent variables data collection ...................................................................61
Hypothetico-Deductive reasoning..................................................................61
Argumentation...............................................................................................66
Independent variables data collection.................................................................68
Chapter 4: Results and Conclusions...............................................................................70
Characterization of courses ....................................................................................70
Hypothetico-deductive reasoning ...........................................................................73
Initial distributions.............................................................................................73
Relationship of background characteristics and LCTSR autumn
quarter scores ................................................................................................77
Change in LCTSR scores ...................................................................................80
Change in overall total scores. .......................................................................81
Change in LCTSR item scores.......................................................................85
Argumentation .......................................................................................................89
Initial distributions.............................................................................................89
Relationship of background characteristics and argumentation autumn
quarter scores ................................................................................................94
Change in argumentation scores.........................................................................95
Change in overall total scores. .......................................................................95
Change in argumentation subscale scores. .....................................................99
Correlation of hypothetico-deductive reasoning and argumentation...................... 103
Three-time participants ........................................................................................ 104
Change in overall SR scores............................................................................. 108
Change in LCTSR scores............................................................................. 108
Change in argumentation scores. ................................................................. 110
Change in LCTSR and argumentation subscale scores ..................................... 111
LCTSR item comparison. ............................................................................ 111
Argumentation subscale comparison............................................................ 115
Correlation of HD reasoning and argumentation .............................................. 117
Summary of key findings ..................................................................................... 118
Chapter 5: Discussion and Implications ....................................................................... 120
Limitations of the study........................................................................................ 121
The assumption of natural development of SR skills ............................................ 123
Particular findings for hypothetico-deductive reasoning ....................................... 125
Particular findings for argumentation ................................................................... 126
Biology majors as a population ............................................................................ 127
Future work.......................................................................................................... 129
References................................................................................................................... 131
Appendices:
A. Argumentation instrument ...................................................................... 139
B. Argumentation instrument scoring rubric ................................................ 142
C. Student demographic information instrument (winter 2007)..................... 144
LIST OF TABLES
Table                                                                                            Page
1   Demographics of all participants by major .............................................. 59
2   Distribution of future plans for all participants by major .......................... 60
3   Original LCTSR question distribution to forms A and B ........................... 62
4   Principal components analysis rotated factor loadings of LCTSR form A ... 64
5   Principal components analysis rotated factor loadings of LCTSR form B ... 65
6   Principal components analysis rotated factor loadings of argumentation
    forms A and B ........................................................................................ 67
7   Reliability of argumentation forms A and B subscales .............................. 68
8   Laboratory exercises characterization by level of inquiry .......................... 71
9   Average total LCTSR scores by administration and major ........................ 74
10  Average AU1 and AU2 LCTSR scores by lab section ............................... 75
11  Average WI LCTSR scores by lab section ................................................ 77
12  Summary data of regression of AU2 LCTSR scores on college major ........ 79
13  Stepwise entry regression of AU2 LCTSR scores on college major ............ 80
14  Total number of individuals who completed the LCTSR instrument by
    administration ......................................................................................... 81
15  Descriptive statistics of AU1 and AU2 LCTSR scores by major ................ 82
16  Repeated measures MANOVA comparison of AU1 and AU2 LCTSR scores .. 82
17  Descriptive statistics of AU1 and WI LCTSR scores by major ................... 84
18  Repeated measures MANOVA comparison of AU1 and WI LCTSR scores .... 84
19  Repeated measures MANOVA comparison of AU1 LCTSR item-pair scores .. 86
20  P-values from MANOVA post-hoc comparison of AU1 LCTSR
    item-pair scores ...................................................................................... 87
21  Repeated measures MANOVA comparison of AU2 LCTSR item-pair scores .. 88
22  P-values from MANOVA post-hoc comparison of AU2 LCTSR
    item-pair scores ...................................................................................... 88
23  Average total argumentation scores by administration and major .............. 90
24  Average AU1 and AU2 argumentation scores by lab section ..................... 91
25  Average WI argumentation scores by lab section ...................................... 93
26  Total number of individuals who completed the argumentation instrument by
    administration ......................................................................................... 95
27  Descriptive statistics of AU1 and AU2 argumentation scores by major ...... 96
28  Repeated measures MANOVA comparison of AU1 and AU2 argumentation
    scores ..................................................................................................... 97
29  Descriptive statistics of AU1 and WI argumentation scores by major ........ 98
30  Repeated measures MANOVA comparison of AU1 and WI argumentation
    scores ..................................................................................................... 98
31  Repeated measures MANOVA comparison of AU1 and AU2 argumentation
    subscale scores by major ......................................................................... 102
32  Repeated measures MANOVA comparison of AU1 and WI argumentation
    subscale scores by major ......................................................................... 103
33  Pearson product moment correlations between LCTSR and argumentation
    scores by administration .......................................................................... 104
34  MANOVA demographics comparison of three-time participants with all other
    participants ............................................................................................. 106
35  Independent t-test comparison of number of science courses taken by three-time
    participants and all other participants ...................................................... 106
36  Number of three-time participants who completed the LCTSR and argumentation
    instruments ............................................................................................. 107
37  Descriptive statistics of AU1, AU2, and WI LCTSR scores by major for
    three-time participants ............................................................................ 109
38  Repeated measures MANOVA comparison of AU1, AU2, and WI LCTSR
    scores for three-time participants ............................................................ 109
39  Descriptive statistics of AU1, AU2, and WI argumentation scores by major for
    three-time participants ............................................................................ 110
40  Repeated measures MANOVA comparison of AU1, AU2, and WI
    argumentation scores for three-time participants ..................................... 111
41  Repeated measures MANOVA comparison of AU1 LCTSR item-pair scores
    for three-time participants ....................................................................... 112
42  P-values from MANOVA post-hoc comparison of AU1 LCTSR item-pair scores
    for three-time participants ....................................................................... 113
43  Repeated measures MANOVA comparison of AU2 LCTSR item-pair scores
    for three-time participants ....................................................................... 114
44  P-values from MANOVA post-hoc comparison of AU2 LCTSR item-pair scores
    for three-time participants ....................................................................... 115
45  Repeated measures MANOVA comparison of AU1 and AU2 argumentation
    subscale scores by major for three-time participants ................................ 116
46  Pearson product moment correlations between LCTSR and argumentation
    scores by administration for three-time participants ................................. 117
LIST OF FIGURES
Figure                                                                                           Page
1   Schema of scientific reasoning and epistemology, identifying the roles of
    hypothetico-deductive and inductive reasoning ........................................ 5
2   The relationship between hypothetico-deductive reasoning and argumentation
    to compose scientific reasoning is connected through the relationship between
    evidence and theory ................................................................................. 14
3   Giere et al.’s (2006) model of an ideally complete report of a scientific
    episode (p. 29) ........................................................................................ 17
4   Schematic of scientific reasoning, linking deductive and inductive reasoning .. 25
5   Distribution of all 460 participants by major ............................................ 61
6   Mean AU1 LCTSR item-pair scores by major .......................................... 86
7   Mean AU2 LCTSR item-pair scores by major .......................................... 87
8   Mean AU1, AU2, and WI argumentation subscale scores by major ........... 100
9   Mean AU1 LCTSR item-pair scores by major for three-time participants ... 112
10  Mean AU2 LCTSR item-pair scores by major for three-time participants ... 114
11  Mean AU1 and AU2 argumentation subscale scores by major for three-time
    participants ............................................................................................. 116
CHAPTER 1
INTRODUCTION
“…'To know' science is a statement that one knows not only what a phenomenon
is, but also how it relates to other events, why it is important, and how this
particular view of the world came to be. Knowing any of these aspects in isolation
misses the point. Therefore, in learning science, students, as well as having the
opportunity to learn about the concepts of science, must also be given some
insight into its epistemology, the practices and methods of science, and its nature
as a social practice…” (Driver, Newton, & Osborne, 2000, p. 297)
Background and Setting
The Call
The field of biology is rapidly expanding, more often utilizing concepts and
theory from other disciplines, such as physics, mathematics, chemistry, engineering, and
computer science. As students choose a science major, they need and expect the course
and laboratory work that will develop them into a scientist. This work includes the
content knowledge and skills necessary to be able to design a solid experiment, analyze
the results, and apply the findings to future work both within and across disciplines
(Committee on Undergraduate Biology Education to Prepare Research Scientists for the
21st Century [CUBE], 2003). However, freshmen biology majors are entering college
under-prepared both in content and cognitive abilities. The National Science Foundation
(NSF) (1996) stated that only 22% of high school graduates had taken biology, chemistry, and
physics in 1992. Seymour and Hewitt (1997) also found approximately 40% of science,
mathematics, engineering, and technology majors reported inadequate high school
preparation as a problem in their current coursework. These courses are necessary
prerequisites for freshmen in their introductory biology courses. In addition, Lawson
(1992a) states that as many as 50% of students in freshman-level biology do not use
formal reasoning patterns, which include the abilities to develop hypotheses, control
variables, and design an experimental protocol, all skills crucial to the scientific process.
Other studies of everyday, informal reasoning have found that undergraduates have
difficulty evaluating evidence fairly and without bias (Baron, 1991; Perkins, Farady, &
Bushey, 1991; Toplak & Stanovich, 2003), developing arguments with adequate evidence
(Baron; Cerbin, 1988; Perkins et al.), and differentiating between and linking evidence
and claims (D. Kuhn, 1992; 1993a; 1993b; Shaw, 1996). Taken together, these
characteristics suggest that students are not prepared to undertake the demands of an
introductory biology course, which has traditionally assumed the necessary content and
skills background to be in place.
An additional difficulty for biology majors in the first year is simply staying with
their declared major. Seymour and Hewitt (1997) completed a large ethnographic study
to determine why approximately 50% of students leave their intended or declared
science, mathematics, engineering, or technology (SMET) majors. They found that the
group of students who “switch” comprised both low-ability and high-ability students,
and that the switching rate was even higher among women and minorities. When
analyzing the data, Seymour and Hewitt found the most common factors related to
switching included faculty pedagogy and curriculum design/student assessment. Faculty
pedagogy was found to be an issue for 83% of students, including those who switched
and those who stayed in the SMET majors. The problems associated with pedagogy were
characterized as poor faculty involvement, attitude toward students, attitude toward
teaching, and teaching methodology that centered on direct lecture or even reading from
the textbook. In addition, students were overwhelmed by the large amounts of work and
rote memorization that was expected of them. Overall, students found the faculty to be
creating an atmosphere that abandoned them with copious amounts of content to
memorize and no aid or support. The students perceived this alienating atmosphere as one
which was designed to “weed out” students with lesser academic abilities. Those students
of higher ability who should not have been “weeded out” instead perceived the work as
neither challenging nor intellectually stimulating. These attitudes are not unique to American
students, as they were corroborated by Marbach-Ad (2004) in a survey study of first-year
students at Tel-Aviv University.
Sigma Xi (1990) surveyed students and faculty in entry-level courses in SMET
and found many of the difficulties described by the students in Seymour and Hewitt’s
1997 study were centered in these courses. In addition to those already identified, faculty
believed that students did not leave these courses with an understanding of the nature of
science. The National Science Foundation’s (NSF) Advisory Committee to the NSF
Directorate for Education and Human Resources (1996) also found that students
perceived the introductory SMET courses as the major barrier to a continued major in
SMET. With this information in hand, the focus must then turn to find ways in which to
improve these courses and lessen attrition of science majors.
To address these issues, Seymour and Hewitt (1997), Marbach-Ad (2004),
Sigma Xi (1990), the Committee on Undergraduate Biology Education to Prepare
Research Scientists for the 21st Century (2003), and NSF (1996) all call for more
information and research on the introductory biology courses, especially with regard to
biology majors. Aside from the causes of attrition, little
research has been completed on biology majors’ experiences and requirements as a
unique group. As the first experiences are so crucial and students are entering college
underprepared, an emphasis should be placed on exactly how biology majors are affected
and prepared by their early coursework.
Scientific Reasoning Overview
Scientific reasoning (SR), i.e., evidence-based reasoning, is the heart of scientific
knowledge generation. It is the method through which evidence is collected and analyzed
and linkages between concepts and theories are created. This reasoning has two
general aspects: deduction and induction. Science has traditionally been positivist,
utilizing deductive logic in the testing of hypotheses to remain objective and collect
evidence. One of the strongest proponents of this view of science, Karl Popper, believed
that searching for refutations of theories through hypothetico-deductive experimentation
and reasoning was the only way to truly conduct science: once a theory was
refuted, it was no longer useful in that form (Popper, 1965; 1993; Thornton, 2005).
However, as science became increasingly viewed as a social endeavor, the importance of
induction was implicitly legitimized by philosophers such as Thomas Kuhn (1993; 1996)
and Imre Lakatos (1993). This new view of science recognized the crucial collection of
evidence through traditional hypothetico-deductive methods, but also established the
importance of a collective use of both confirming and refuting evidence to bear on and
modify theoretically driven research paradigms/programs. This more current view of
science then takes into account both deductively-derived evidence and inductively-derived theories (Figure 1).
[Figure 1 is a diagram: hypothetico-deductive reasoning tests the current theory, producing
confirming or refuting evidence; inductive reasoning incorporates refuting evidence to
yield a new, modified theory.]

Figure 1. Schema of scientific reasoning and epistemology, identifying the roles of
hypothetico-deductive and inductive reasoning.
Scientific Reasoning in Education
The importance of SR in education is recognized by the “scientific ways of
knowing” standards in both Science for All Americans (American Association for the
Advancement of Science [AAAS], 1990) and The National Science Education Standards
(National Research Council [NRC], 1996). Even though these books are not geared
toward undergraduate education, they attest to the importance of SR for all individuals,
not just science majors, throughout their education. Both of these books emphasize the
nature of science through the linkage of true scientific evidence with theories, the
predictive ability of theories, and the changing nature of theories as new evidence
becomes available. The underlying ability to understand and participate in these
characteristics of science is SR.
Evidence of SR in education is not limited to the educational goals and standards.
It is also illustrated in several learning theories. Inhelder and Piaget (1958) characterized
several stages of intellectual development. The latter two stages, concrete operational and
formal operational, are relevant to SR in that these are the stages during which advanced
reasoning skills, as described previously, begin to develop. Concrete operational
individuals are dependent on concrete experiences and are unable to develop hypotheses
based on those experiences. Those individuals who are able to generate and test
hypotheses, i.e. use hypothetico-deductive reasoning, have reached formal operational
thinking. The attainment of this level of reasoning may not occur for all individuals or in
all subject areas (Piaget, 1972), but is of crucial interest in the sciences. Those who study
informal reasoning have also found many parallels with SR. Informal reasoning, once
relegated to simple “everyday reasoning,” was studied only for its use in day-to-day
needs and social issues. However, with the belief and evidence that informal reasoning
really was a form of internal argumentation, interest in its place in the sciences increased
(D. Kuhn, 1993a). The process of argumentation, especially the framework identified by
Toulmin (1958), mirrors the inductive process in science: multiple pieces of
data/evidence are given to support a claim/theory and are backed by
warrants/justifications. Although not directly Baconian, the principle of a theory
emerging from a collection of data is retained. Lastly, individual constructivism and
learning through scientific inquiry are based on the premise that knowledge is actively
built from observations and interactions with the world. This knowledge (i.e. evidence) is
analyzed and understood in regard to current information (i.e. theory), which in turn is
modified by the new knowledge (von Glasersfeld, 1993). Scientific inquiry generates
new information in a similar fashion. Current knowledge informs and shapes the conduct of
inquiry, which, through reflection on the results, sharpens the current base of knowledge
(Hodson, 1996; T. Kuhn, 1996; Leonard, 2000). All three of these learning theories relate
to the generation of scientific knowledge through reasoning, from generating a
hypothesis, to deducing predictions, to analyzing evidence in light of and bearing on
current theory. With this epistemological evidence lending theoretical support to SR in
education, much information has been gathered on ways to promote SR, both deductive
and inductive, in the classroom. However, this research has generally been limited to
hypothetico-deductive reasoning or argumentation per investigation and most often
involves K-12 students, preservice, and inservice teachers.
Scientific Reasoning in College Biology
The most important goal for undergraduate biology curriculums is content
acquisition and understanding. However, as noted by Science for All Americans (AAAS,
1990), The National Science Education Standards (NRC, 1996), and Transforming
Undergraduate Education in Science, Mathematics, Engineering, and Technology (NRC,
1999), there is more to a science education than just content knowledge. The acquisition
of SR skills and content knowledge are concurrent and dependent on each other (Means
& Voss, 1996). Strong reasoners need a breadth and depth of content knowledge from
which to identify relevant and consequential information. Studies have demonstrated that
reasoners who display advanced argumentation skills also generally have a higher level
of content knowledge from which to draw evidence and alternative explanations (Perkins,
Allen, & Hafner, 1983; Voss & Means, 1991). Conversely, strong reasoning skills also
aid individuals in making connections between concepts. This is implied by the studies
which reveal that those individuals who are identified with advanced reasoning skills also
demonstrate greater conceptual understanding (Daempfle, 2002; Johnson & Lawson,
1998; Lawson, 1980; Sadler & Zeidler, 2005; Zohar, 1996; Zohar & Nemet, 2002).
The content knowledge students obtain is generally acquired in lecture courses.
However, the practice of reasoning skills is rarely observed in the lecture.
Instead, lectures have remained occasions when students sit passively to be inundated with
terms and facts with very little connection to other disciplines or ideas (Carter, Heppner,
Saigo, Twitty, & Walker, 1990; CUBE, 2003; Seymour & Hewitt, 1997). The most likely
opportunity for students to practice and develop their scientific reasoning skills is in the
laboratory, both in research and in courses (Bender, 2005). However, most introductory
biology laboratory exercises are still primarily composed of “cookbook” exercises
designed with a goal of concept affirmation instead of concept construction. On the other
hand, the use of investigative, inquiry-oriented laboratories has been demonstrated to
increase students’ use of science process skills, concept comprehension, and achievement
(Leonard, 1989). These laboratories have also been the focus of many studies to
determine ways in which to help accelerate and improve students’ reasoning skills and
overall cognitive development (Daempfle, 2002; Lawson, 1992a). Reasoning skills
development in these courses may have stemmed from participating in a process similar
to the three phases of scientific inquiry described by Windschitl and Buttemer (2000):
generating experiments, analyzing data, and defending their conclusions against critiques
from peers. By modeling this process in inquiry-oriented laboratories, students then replicate
the way in which scientists generate knowledge. The critical aspect of these phases is the
defense of conclusions against critiques by peers, whereby scientists learn the socioculturally
acceptable standards for generating and backing a conclusion within the realm of science.
Lawson (1995) complements this finding, indicating that hypothetico-deductive thought
develops through conflict with others and a desire to prove. In addition,
Hogan and Maglienti (2001) and Nersessian (1995) assert that the way in which scientists
develop their reasoning skills is through authentic practice within the science culture,
including defending conclusions and resolving alternative explanations. Therefore, the
use of inquiry in the laboratory not only allows students to practice and refine their
reasoning skills, it also provides them with direct experience of the manner in which
scientific inquiry builds knowledge.
Statement of Problem
Students who choose to major in biology should expect to be guided into the
culture of science. To this end, students should be given the content and the skills
necessary, especially SR, which enables students to understand how the conceptual
knowledge they are given has been constructed and modified. However, curriculums
appear to have an overt focus on content given through “cookbook” laboratories and
lectures (CUBE, 2003; NRC, 1999; NSF, 1996; Seymour & Hewitt, 1997). It has
therefore been generally assumed that biology majors’ reasoning skills will be developed
through the natural participation in science course work and later in research. In support
of this belief, there is evidence that scientists’ reasoning skills develop through actual
participation in the social aspects of the science culture (Hogan & Maglienti, 2001).
However, as the literature describes initial difficulties with the coordination of
theory and evidence, as well as with hypothetico-deductive reasoning, among comparable
students, one can infer that biology majors may have similar problems with reasoning
skills (e.g. D. Kuhn, 1993a; Lawson, 1992a). As biology majors may experience similar initial difficulties in
SR and a significant portion of this reasoning may not be honed until the individual
enters the science community, this leaves a large portion of their preparation with regard
to reasoning unaccounted for. For instance, how well do biology majors reason,
inductively and deductively, when they begin their curriculum? How well does this
reasoning develop? What aspects of the curriculum appear to influence this development?
Currently, these questions are difficult to answer as the research in SR often looks
individually at deductive and inductive reasoning skills, focuses on socioscientific issues,
and utilizes primarily young individuals (K-12), non-science majors, or lay adults (e.g. D.
Kuhn, 1993b; D. Kuhn & Pearsall, 2000; Lawson, 1992a; Sadler, 2004; Zohar & Nemet,
2002). Taken together, the current literature implies, but does not directly provide,
knowledge concerning the SR of biology majors with regard to scientific problems. This baseline
information is critical in order to address the development of the reasoning skills that will
allow these future scientists to participate and practice within the culture of science.
Research Questions
This descriptive study will attempt to address the following questions:
1. What is the initial distribution of hypothetico-deductive reasoning skills of
undergraduate biology majors?
2. What is the predictive relationship of student background characteristics, such as
gender, major, university rank, years of college attendance, number of previous
college-level and Advanced Placement science courses, and future academic plans
with hypothetico-deductive reasoning skills?
3. What is the change (if any) in hypothetico-deductive reasoning skills of
undergraduate biology majors during their first two courses?
4. What is the initial distribution of argumentation skills of undergraduate biology
majors?
5. What is the relationship of student background characteristics, such as gender,
major, university rank, years of college attendance, number of previous college-level
and Advanced Placement science courses, and future academic plans with
argumentation skills?
6. What is the change (if any) in argumentation skills of undergraduate biology
majors during their first two courses?
7. What is the relationship between hypothetico-deductive reasoning and
argumentation skills of undergraduate biology majors during their first two
courses?
Definition of Terms
Argumentation (Dialogic)
Constitutive definition. Dialogic argumentation recognizes opposition among
assertions, relates supporting and refuting evidence to each assertion, and weighs all of
the evidence “in an integrative evaluation of the relative merit of the opposing views”
(Kuhn, 1992, p. 157). It is a measure of “the cognitive and affective processes involved in
the negotiation of complex issues and the formation or adoption of a position” (Sadler &
Zeidler, 2005, p. 73). Strong argumentation displays internal consistency and analysis of
evidence through multiple perspectives, while poor argumentation is evidenced by
unclear, contradictory, and simple arguments with a single perspective (Sadler & Zeidler,
2005).
Operational definition. Argumentation quality is assessed based on Toulmin’s
argumentation pattern (TAP). Subscale and composite scores on the argumentation
instrument, based on the five aspects of TAP (claims made, evidence used, warrants
given, alternative explanations identified, and rebuttals given), will serve as measures of the
quality of argumentation. The process and coherency of argumentation, not the actual
content of the argument, is the focus of the assessment.
Deductive Reasoning
Constitutive definition. Deductive reasoning is reasoning through the process of
deduction; “reasoning from the general to the specific, or from premises to a logically
valid conclusion” (Agnes, 2002, p. 377).
Operational definition. Deductive reasoning is assessed by the score on the
Lawson Classroom Test of Scientific Reasoning (LCTSR). This assessment focuses on
deduction in the form of hypothetico-deductive reasoning, as conceived by Inhelder and
Piaget (1958).
Inductive Reasoning
Constitutive definition. Inductive reasoning is reasoning through the process of
induction; “reasoning from particular facts or individual cases to a general conclusion”
(Agnes, 2002, p. 729).
Operational definition. Inductive reasoning is assessed through the argumentation
quality sub-scale and composite scores on the argumentation instrument (see
Argumentation).
Hypothetico-Deductive Reasoning
Constitutive definition. Hypothetico-deductive (HD) reasoning is the cognitive
process used to generate a hypothesis, devise an appropriate experiment to test the
hypothesis, deduce a prediction, and determine the agreement of the evidence with the
prediction.
Operational definition. Hypothetico-deductive reasoning is assessed by the score
on the Lawson Classroom Test of Scientific Reasoning (LCTSR).
Scientific Reasoning
Constitutive definition. Scientific reasoning (SR) is evidence-based reasoning.
Scientific reasoning encompasses aspects of both deduction and induction to generate,
modify, and validate theories based on evidence, which in turn has been discovered
through experimentation within a theoretical framework (Figure 2).
Operational definition. Scientific reasoning is assessed through scores on both the
LCTSR and argumentation instruments.
Figure 2. The relationship between hypothetico-deductive reasoning and argumentation
in composing scientific reasoning, connected through the relationship between evidence and theory.
CHAPTER 2
LITERATURE REVIEW
Scientific Reasoning
What is scientific reasoning? As previously defined, scientific reasoning (SR) is
the process of producing scientific knowledge through evidence-based reasoning. This
definition, however, is rather broad. Exactly how do scientists use the evidence they have
gathered to generate new scientific knowledge? The philosophy of science recognizes
two broad types of reasoning using evidence: deduction and induction. Deduction, with
its basis from Aristotle, uses general theories to create hypotheses and predict the
outcomes, generally using an “if, and, then” model. For example, if all birds have two
legs, and the finch is a bird, then it will have two legs. On the other hand, induction,
championed by Sir Francis Bacon, uses multiple pieces of evidence from which to create
theories. For example, since finches, jays, wrens, eagles, etc. all have two legs, then all
birds have two legs. Whether scientists use either or both types of reasoning is still
debated in philosophy of science circles. In addition, which type of SR is best employed in
the classroom has been the subject of some debate in science education. However, very little
research has looked at both types concurrently (D. Kuhn & Pearsall, 2000) and for the
purposes of this study, both types will be considered.
Deductive Aspects of Science
The philosophy of Karl Popper. Science has traditionally been viewed through a
strictly positivistic lens whereby the process of evidence collection was believed to be
completely objective and dictated by a deductive method. One of the preeminent
philosophers of science who touted this viewpoint was Sir Karl Popper. Popper’s (1965;
1993) main interest was in determining the criterion of demarcation between science and
pseudo-science. He stated that it was easy to find confirming evidence for theories when
looking for confirming evidence. However, the truly genuine test of a theory was its
falsifiability; therefore, any theory that cannot be refuted by any conceivable event is
nonscientific and the only truly confirming evidence is that found through an attempt to
refute a theory. Popper also disapproved of the use of ad hoc adjustments to a theory
following the discovery of refuting evidence. He believed that once a theory was refuted,
it could no longer be useful—a belief criticized for its ignorance of the day-to-day
workings of actual scientific practice (T. Kuhn, 1993; Thorton, 2005). However, Popper
later qualified this belief by recognizing the importance of the modification of theories by
auxiliary hypotheses, provided that the rationale behind the modifications was scientific
and not designed simply to evade falsification of the theory (Thorton, 2005). Following
these beliefs, Popper is also known for his staunch disapproval of induction as part of the
scientific method. If the true demarcation between science and pseudo-science is the
ability to disconfirm a theory, then theories built through induction of only confirming
evidence are not scientific. Popper stated, “Induction, i.e. inference based on many
observations is a myth. It is neither a psychological fact, nor a fact of ordinary life, nor
one of scientific procedure” (1965, p. 53). This rebuke of induction has itself been one of
the main criticisms leveled at Popper and is therefore not reflected in the current
epistemology of science.
Taken together, Popper’s views of the process of science are readily summed up
by Giere, Bickle, and Mauldin’s (2006) diagram of an ideal report of a scientific episode
(Figure 3). In this diagram, the model is related to the real world by a hypothesis. The
hypothesis is supported if the data, derived from the real world and generated
through experimentation, are found to agree with the prediction derived from the model.
Figure 3. Giere et al.’s (2006) model of an ideally complete report of a scientific episode
(p. 29).
This is the hypothetico-deductive process. The scientist observes the real world and
conceives of a hypothetical model to represent it, from which a prediction is deduced
regarding how the real world functions. Data are generated through an experiment and
checked against the prediction. Those data that match confirm the model for the
moment, and those that disagree with the prediction refute the model, similar to Popper’s
(1965) belief. The best model, then, is the one that withstands continued attempts at refutation.
Deduction and hypothetico-deductive reasoning. How has deduction been used in
science education? Through his research into cognitive development, Jean Piaget
categorized four stages of intellectual development from infancy to adulthood. These
stages are defined by the mental structures and operations the individual acquires and
integrates with previous experiences and operations as they mature. They are often
discussed in terms of “reasoning patterns” or “reasoning abilities,” especially with regard
to science education (to avoid other uses of “operation” in science terminology) (Karplus,
1977). As this study focuses on college students, the first two stages, sensory-motor and
pre-operational, will not be reviewed. However, the next two stages are of importance in
describing students’ reasoning at the post-secondary level. When individuals are able to
modify an object of knowledge through interiorized actions, Piaget states that these are
the creations of mental operations. The concrete operational stage (ages 7 to 10 years)
bases these operations on objects that are observable and able to be referenced, forming
the basis of elementary logic, though not extending beyond the observable. When a child
is able to construct new operations that are combinatorial and to reason based on
hypotheses that he or she is able to test, the formal operational stage is reached
(beginning approximately ages 11-15 years) (Inhelder & Piaget, 1958; Karplus, 1977;
Piaget, 1972; 2003). “In other words, formal thinking is hypothetico-deductive” (Inhelder
& Piaget, 1958, p. 251). An individual’s speed of development through the stages has been
found to be dependent upon many different factors, including societal, cultural, and
intellectual stimulations. Although most individuals can expect to advance through the
first three stages, the attainment of the fourth stage is not as well defined, depending on
the stimulation by environmental factors. Piaget hypothesizes that, between 15 and 20
years, most normal individuals can expect to attain the formal operational stage, although
in different areas according to their individual aptitudes and professional
specializations (Piaget, 1972). Interestingly, it has been found that only 25% of entering
freshmen have attained the formal level of reasoning abilities, 50% are in transition from
concrete to formal, and the remaining students are still concrete reasoners (Chiappetta,
1976; Lawson, 1980; 1992a).
At the college level, then, it would appear that only the concrete and formal
operations are of interest; however, current research has led to the characterization of a
possible fifth operational stage. Arlin (1975) demonstrated that Piaget’s formal level may
actually be two separately characterized levels: the formal level, defined by problem
solving, and a fifth level, defined by problem-finding or the ability to discover questions
from ill-defined problems. In addition, Arlin provided evidence that these two stages are
sequential, dependent on mastery of one stage to progress to the next, as with the first four
of Piaget’s operational stages. Arlin’s work has been taken further to demonstrate that
similar to the first four stages, there is a fifth change in the brain’s electroencephalogram
pattern in young adults that correlates with the development of the ability to derive
alternative hypotheses involving unseen entities (Lawson, Drake, Johnson, Kwon, &
Scarpone, 2000). This “post-formal” skill is a progression from the formal stage. Lawson,
Clark et al. (2000) characterized this progression as the ability to develop hypotheses
based on unseen entities from an ability to develop hypotheses only based on observable
entities.
As both the upper levels of Piagetian reasoning skills and Popper’s philosophy
regarding the nature of scientific knowledge construction center on hypothetico-deductive
thinking, much research has been conducted, primarily led by Lawson,
utilizing Piagetian reasoning levels to assess students in the sciences. The intersection of
these two knowledge construction theories in the science classroom has produced a
plethora of data linking Piagetian reasoning skills to teaching and learning factors such as
science achievement, conceptual change, instructional methods, and learning styles
(Lawson, 1992a). Lawson (2003a; 2005) has even gone so far as to utilize some of his
research methods and results to affirm Popper’s deductive viewpoint to the exclusion of
all other possible interpretations, although he has not escaped criticism in this matter with
regard to “shoehorning” his data (Allchin, 2003). With this criticism in mind, does the
connection between the assertions of cognitive psychologist Piaget and science
philosopher Popper tell the whole story with regard to students’ scientific reasoning
skills?
Inductive Aspects of Science
The philosophy of Thomas Kuhn and Imre Lakatos. The importance of deduction
in science is recognized without hesitation. However, the use of induction in science has
not been so readily accepted because of reasons similar to the refutations given by Popper
(1965). In addition, science has traditionally been seen as an individual endeavor instead
of a collaborative effort, which pools evidence from different quarters to formulate
theories. However, the importance of social interactions in scientific knowledge
construction has been recently acknowledged. This has primarily been due to two
philosophers, Thomas Kuhn and Imre Lakatos, who give implicit regard to induction in
their view of scientific knowledge generation. T. Kuhn (1993; 1996) asserted that
scientific knowledge is dominated by research paradigms that drive both the direction for
and method from which science knowledge is created. These paradigms, which are akin
to a theory, influence the manner in which evidence is evaluated and persist until
challenged by anomalous data – the process of “normal science.” The escalation of
anomalous data confronts the paradigm/theory so that it must then be reconsidered,
evaluated, modified, and/or replaced – creating a paradigm shift or “revolution.” The
view of Lakatos (1993) is similar to that of T. Kuhn in that “research programs” exist
which consist of an accepted “hard core” and “positive heuristic” which guide how and
what research is conducted, similar to a paradigm. Lakatos also envisions a protective
belt of auxiliary hypotheses, which allows the research program to continue in light of
anomalous data so long as it keeps predicting and generating novel information, i.e.
“progressing.” Once the program no longer supports the theoretical with the empirical, it
is determined to be “degenerating” and may be superseded, or “shelved,” by another
research program that is progressing in that area. In each of these theories, Popper’s
falsification is not an absolute blow to the theory at hand; rather, refutations are
considered anomalous data to either be set aside for the time being or possibly
incorporated into auxiliary modifications (T. Kuhn, 1993). Instead, rejection occurs when
all the data are considered and inductively give rise to a new and better theory.
The viewpoints of T. Kuhn and Lakatos have been also supported on an
individual level by Chinn and Brewer (1993; 1998) who found that there are eight ways
in which individuals deal with anomalous data: ignoring the data; rejecting the data out of
disbelief; expressing uncertainty about the data’s validity; excluding the data as irrelevant
to the theory; holding the data in abeyance, accepting them as valid but being unsure what
to do with them; reinterpreting the data so they are explained away within the current
theory; making peripheral theory changes, i.e. minor modifications to the theory; and
undergoing theory change, abandoning the old theory and adopting the new one. Chinn
and Brewer found that complete
theory change upon an encounter with anomalous data is the most difficult for individuals
to readily enact. This can also be easily inferred and transferred to the vision of scientists
working within a paradigm/program, which is consistent with the views of T. Kuhn and
Lakatos.
Induction and argumentation. Informal reasoning skills were originally
considered “everyday thinking skills.” However, with the claim that informal reasoning
skills took the form of an internal dialogic argument, a new definition with regard to
scientific reasoning emerged. Deanna Kuhn (1992; 1993a; 1993b) put forth that dialogic
argumentation recognizes two opposing assertions, weighs the evidence, and determines
the relative merit of each assertion in light of the evidence. This consideration of
assertions and the corresponding evidence is similar to that put forward as scientific
reasoning by Thomas Kuhn and Imre Lakatos. The implicitly inductive epistemology of
science provided by philosophers such as T. Kuhn and Lakatos, as well as theories
regarding the structure of arguments have provided a framework with which to
investigate inductive reasoning skills in science.
Around the same time that T. Kuhn and Lakatos were developing their theories,
Stephen Toulmin (1958) elucidated a general structure in which to view argumentation.
This structure begins with the grounds, or data, used to establish a claim. In order to
bridge the grounds and claim appropriately and legitimately, one needs to provide a
warrant, i.e. a general rule. As warrants themselves may have different degrees of
appropriateness and legitimacy related to the problem at hand, backings need to be
provided for the warrants, establishing the precise relation of the warrant to the particular
data and claim in question. Finally, the claims may need to be modified as to the degree
of force that the data have on them. To this end, qualifiers and conditions of
exceptions/rebuttals may also need to be offered. Toulmin, Rieke, and Janik (1984) took
this information and created a framework from which to investigate and teach higher
order thinking, including consideration and integration of the grounds, claims, warrants,
and backing, as well as counterarguments and rebuttals used in argumentation. Although
this framework has been around for decades, it has more recently been applied in
numerous studies investigating students’ higher order thinking and reasoning in the
sciences, specifically looking at those factors contributing to students’ ability to generate
and evaluate sound arguments.
Toulmin and T. Kuhn/Lakatos’ theories relate at the heart of their respective
processes – a metacognitive approach to the evidence. This approach encompasses
evaluating the evidence on its own merit, separate from the theory, yet bearing upon it.
The challenging and evaluation of evidence is similar to that used in argumentation – for
the new claim/paradigm to be accepted, scientists must first be able to envision and
consider alternative arguments. They must then provide the confirming and anomalous
data and utilize the appropriate warrants, backing, and qualifiers to convince others of its
appropriateness and veracity while being prepared for counterarguments and rebuttals.
How Do Scientists Actually Reason?
Currently, the nature of science is generally seen as an intermingling of both
deductive and inductive aspects. Science for All Americans (AAAS, 1990) describes
science as a blend of logic and imagination – logic helps to generate the data and
imagination finds ways to connect the evidence. It also states, “Science can use deductive
logic if general principles about phenomena have been hypothesized, but such logic
cannot lead to those general principles” (p. 142). This characterization of the process of
science includes both aspects of Popper’s logical deductive testing of hypotheses and T.
Kuhn/Lakatos’ linkage of evidence to form general principles. The hypothetico-deductive
work of science is carried out at the core, generating evidence, which is then interpreted
in a more inductive scheme, consistent with T. Kuhn’s “normal science”. D. Kuhn and
Pearsall (2000) corroborate this view in noting that there are two categories of
scientific-thinking skills which are investigated: investigative skills and inferential skills. The
investigative skills are those that are deductive, looking at the path from theory to
evidence through the design of experiments. On the other hand, the inferential skills
entail those that construct theory from evidence. Driver et al. (2000) also support this
view in their discussion of the place of argument in the science classroom. “…Students
need to appreciate that scientific theories are human constructs and that they will not
generate a theory, or reach a conclusion, by deduction from the data alone. Instead, they
need to postulate possible interpretations and then examine the arguments for each in the
light of evidence that is available to them” (p. 299). Even Lawson (2003b), in addressing
the applicability of Toulmin’s argumentation framework for developing hypothetico-deductive
skills, appears to support this schema, although it has not been his purpose. He
emphasizes the use of “if/and/then” reasoning to test alternative hypotheses of a problem;
utilizing the claim in Toulmin’s framework as a tentative explanation, students must then
supply the data, backing, and warrants through experimentation. Ironically, he also states
that students “must once again engage in argumentation to convince others of the veracity
of their conclusion” (p. 1389). In each instance, the importance of deduction to generate
evidence and the implication of induction to aid in generating theory are present, along
with recognition of the critical connection between the two, as shown in Figure 4.
Figure 4. Schematic of scientific reasoning, linking deductive and inductive reasoning.
Although this schema gives the relationship between the two types of reasoning to
be investigated here, it does not identify the qualities of good scientific reasoning. With
regard to strong formal operational reasoning skills, Lawson (1982) identified five
schemata involved in advanced formal reasoning: generating expectations, control of
variables, generating possible combinations of causes, probabilistic/correlational
reasoning, and proportional reasoning. In addition, those who can reason with unseen
entities are considered to have achieved the highest level of Piagetian formal reasoning
skills. Lawson (2003b) has also assigned high value to the recognition and testing of
alternative explanations, similar to that seen in the argumentation literature (D. Kuhn,
1992; 1993a). This type of scientific thinking has also been described as an evaluative
epistemology that values thinking, evaluation, and argument in knowing. Scientists who
follow this epistemology have been found to value specific, conservative claims based
on a range of solid evidence and consideration of alternative explanations (Hogan &
Maglienti, 2001). Tangential to this idea is the recognition and accommodation of
anomalous evidence into theory. Weaker reasoners tend to either dismiss anomalous
evidence outright or manipulate this evidence to maintain a confirming bias (Phillips,
1978; Zeidler, 1997). Chinn and Brewer (1993) reaffirm this by noting “theory change”
as the highest and most difficult reaction to anomalous data. All in all, strong scientific
reasoning is seen as taking into account all evidence, confirming and refuting, to evaluate
initial and alternative hypotheses in reference to the current theory. This evaluation must
also include a willingness to discard or modify the current theory in light of the accumulated
evidence.
In addition to the above findings, strong argumentation skills also seem to mirror
those mental skills identified in expert/novice literature. Perkins et al. (1983) found that
skilled reasoners were able to use pattern and model recognition to relate the appropriate
evidence to a social issue. Additionally, these reasoners were more likely to evaluate their
mental models metacognitively for fit to the problem at hand. Voss and Means (1991)
also identified good reasoners as having a “mental style” that actively analyzed
arguments and reflected on progress. The expert/novice literature reflects these findings
in that experts notice meaningful patterns in information, due in part to their ability to
organize information mentally. In addition, experts monitor their own approach to
problem solving metacognitively, constantly evaluating and adapting to new problems
(Bransford, Brown, & Cocking, 2000). These findings can readily be connected to the
previously stated scientific-thinking characteristics. All in all, D. Kuhn and Pearsall
(2000) sum up this relationship when they describe the essence of mature scientific
thought as the “coordination of theory and [empirical] evidence in a consciously
controlled manner” (p. 114). Evidence is interpreted in light of the current theoretical
framework/paradigm/program with awareness of the distinction between the theories and
evidence; however, it is also metacognitively evaluated for its fit within the current state
and flux of theories, based on other emerging evidence.
Hypothetico-Deductive Reasoning Ability
Piaget’s operational levels are often discussed in terms of “reasoning patterns” or
“reasoning abilities,” especially with regard to science education (Karplus, 1977).
Science education researchers have recognized that the upper levels of Piagetian
reasoning are defined by hypothetico-deductive thinking and science is a field dependent
on the creation and testing of hypotheses. These researchers have therefore been
investigating the relationship of Piaget's intellectual/reasoning stages with factors such as
science achievement, conceptual change, instructional methods, and learning styles for
varying ages and years of schooling (Johnson & Lawson, 1998; Lawson, 1983; 1992b;
Lawson & Wollman, 2003). In addition, some researchers have proposed a fifth level of
intellectual abilities, the “post-formal operational stage,” whereby the development of
hypotheses based on observable entities is the central characteristic of the fourth stage
and the construction of those based on unobservable entities is the criterion for the fifth,
post-formal, stage (Lawson, 2003b; Lawson, Alkhoury, Benford, Clark, & Falconer,
2000; Lawson, Clark et al., 2000; Lawson, Drake et al., 2000).
Hypothetico-Deductive Reasoning Ability and Achievement in College Biology
Due to previous conflicting results considering the correlation of Piagetian
intellectual stage and grades, Lawson (1980) investigated the effects of reasoning ability
level and cognitive style, as defined by field-dependence/disembedding ability, in relation
to course achievement. In this study, Lawson pre-tested the students for Piagetian
intellectual stage, grouping them into concrete, transitional, or formal reasoners, as well
as cognitive style, grouping them into field-dependent, transitional, or field-independent
categories. Course achievement was measured by cumulative exam grades on open book,
open note essay exams. He found that formal reasoning significantly correlated with
field-independence. Lawson also found that Piaget test scores correlated positively and
strongly with cumulative exam scores, both with and without the cognitive-style scores
accounted for, demonstrating that reasoning abilities do appear to have a positive
relationship with course achievement. A few years later, Lawson (1983) was not able to
support his earlier findings when he determined the only consistent correlating factor
among several exam question types was cognitive style. Reasoning ability was found to
have a low positive association with achievement on a computational question, although
not to the degree found in the 1980 study. These studies suggest that reasoning ability
may have a greater effect with regard to open essay examination questions than multiple-choice or computational assessments.
Although open essay questions may be the preferred method for demonstrating
differences in reasoning skills, traditionally there is a demand on content knowledge
measured by high-stakes multiple-choice testing, which emphasizes the importance of a
large body of stored information and recall. This type of testing, along with the focus on
prerequisites by colleges and universities, led Johnson and Lawson (1998) to investigate
the effect of prior knowledge compared to reasoning ability on achievement. This study
also took into account method of instruction, hypothesizing that the inquiry method
advocated by the National Science Education Standards (NRC, 1996) would better serve
those students with formal-reasoning abilities while an expository instructional method
would best serve concrete-reasoning students. It was believed that students with formal-reasoning abilities could generate and test alternative hypotheses on their own, while
concrete-reasoning students may need a more direct, step-by-step method for laboratory
exercises (Karplus, 1977). Through multiple-choice exam scores, reasoning ability was
determined to be the only significant predictive factor with regard to achievement on both
the semester exams/quizzes and the cumulative final examination. In addition, reasoning
ability, not prior knowledge, demonstrated a correlation with achievement by instructional
method, though to a greater extent with expository than with inquiry-based
instruction. Lawson and Johnson (2002) repeated this study, finding similar results
regarding the correlation of instructional method with reasoning levels.
The results from both of these studies conflicted somewhat with previous work by
Lawson (1983) where he found a positive relationship between prior knowledge and
achievement on multiple-choice exam questions, but no correlation with reasoning
ability. However, the positive correlation of reasoning ability with achievement on a
computational question was greater than the positive correlation found between prior
knowledge and the same question type, similar to the results of his later work. This study
was also only based on one exam, not averaged over three, making the results difficult to
compare. In addition, although the evidence looks strong regarding the prevalence of
reasoning ability over prior knowledge, this conclusion needs to be moderated as the
validity and reliability of the instruments used to test for prior knowledge in all three
studies (Johnson & Lawson, 1998; Lawson, 1983; Lawson & Johnson, 2002) were weak,
especially in those done by Johnson and Lawson in 1998 where the reliability coefficient
was r = 0.2. Finally, in all studies investigating the relationship of reasoning ability and
other factors to achievement, the reliability and validity of the semester exams and
quizzes are never noted. Understandably, exams must change with the class; however,
perhaps some of the differences seen in the results may be attributed to differences in
exam questions for that semester.
Lawson (1993) also examined the mutual, relative impact that homogeneous and
heterogeneous pairings based on reasoning abilities had on student laboratory partner
academic performances and enjoyment of learning in a lecture-laboratory combination
introductory biology course for non-majors. Within each lab section, teaching assistants
partnered students by ability with the intent to generate the greatest number of pairing
combinations per class: concrete-concrete, transitional-transitional, formal-formal,
concrete-transitional, concrete-formal, and transitional-formal. Lawson found a
significant increase in the reasoning abilities of the concrete and transitional students
paired together; however, laboratory partner reasoning level was not found to be the
source of the effect. In addition, laboratory partner reasoning ability did not significantly
affect achievement, although some gains were seen in both partners of concrete-formal
pairings. When analyzing the student surveys regarding the laboratory experience, a
reciprocal positive influence of working with either a much more or less able reasoner
was discovered: the concrete students benefited from more able tutoring and formal
students benefited from teaching their less able partner. Although concrete and formal
reasoners described their experiences as positive, the transitional students significantly
differed on nearly all aspects of the laboratory experience with regard to their concrete or
formal partner. This finding indicated that the perceived reasoning ability of one’s partner
may be critical to a student who is struggling with his or her own reasoning ability.
Lawson inferred that perhaps this difference was not due to the preference for working
with the formal reasoners, but rather the preference not to work with concrete reasoners.
The transitional students may not have been secure in their own abilities and could
become frustrated when working with less able students. Overall, Lawson demonstrated
that although partnering based on reasoning ability may not have an impact on student
achievement, it does have some influence on total laboratory experiences.
Hypothetico-Deductive Reasoning Ability, Student Beliefs, and Conceptual Change
Central characteristics of formal reasoning are the ability to consider multiple,
alternative explanations for phenomena and to generate hypotheses to test such
alternative explanations (Piaget, 1972). These skills are crucial to the process of
conceptual change, whereby students first must differentiate between existing concepts,
then exchange old, nonviable concepts for new, plausible ones, and finally place these
abstract concepts into the appropriate context (Hewson & Hewson, 2003). The reasoning
skills found in the formal operational stage aid this process through the testing of old and
new concepts, as similarly described for deduction in science knowledge generation. It is
therefore desirable to understand the relationship between reasoning level and conceptual
change.
Lawson and Weser (1990) explored this association by investigating the degree
that nonscientific beliefs regarding evolution changed throughout a semester in relation to
the students’ initial levels of reasoning ability. It was predicted that those students who
had higher reasoning skills would be more likely to hold scientific beliefs initially and
also more likely to change nonscientific beliefs. Conversely, concrete reasoners would be
more likely to have and retain nonscientific beliefs. The authors did detect the predicted
correlation between initial reasoning ability and initially-held scientific beliefs, with
significant differences found on test items regarding evolutionary perspective and
Darwin's theory. In addition, although most students tended to believe that evolution took
place, those students identified as formal reasoners were more likely to hold these beliefs
strongly. Lawson and Weser also found that more skilled reasoners were better adapted to
change nonscientific beliefs to scientific ones. They argued that formal reasoners’ ability
to consider alternative views strengthens their ability for conceptual change. However,
mixed results demonstrating regression to more nonscientific beliefs on some test items,
as well as the lack of many significant differences among reasoning abilities and belief
changes, imply that nonscientific beliefs mixed with religious convictions may be more
difficult to change due to the depth of emotional investment.
In contrast to investigating nonscientific beliefs which may have emotional
components, Lawson, Baker, DiDonato, and Verdi (1993) used the introduction of
competing theoretical concepts regarding molecular polarity, bonding, and diffusion in
order to determine the impact of hypothetico-deductive reasoning skills on conceptual
change of scientific concepts. After students were exposed to an inappropriate concept, the
researchers found that reasoning skill group difference was positively and significantly
related to performance on the post-test, which required students to explain diffusion
through the use of the appropriate concept. These group differences were not seen on the
pre-test, as may be initially expected, especially when considering the observations by
Lawson and Weser (1990) that demonstrated a relationship of initial scientific beliefs to
reasoning abilities. However, as the post-test required the students to consider both the
inappropriate and appropriate theoretical explanations, the correlation between formal
reasoning skill and performance revealed the importance of these hypothetico-deductive
skills in conceptual change. In addition, the authors more closely examined those students
who initially scored at a high misconception level on the pre-test. They found that, of
those students, none of the concrete reasoners were able to change from holding a
misconception to a correct conception, while both transitional and formal reasoners
demonstrated some movement towards correct conceptions.
However, with regard to addressing misconceptions through reasoning ability,
one missing piece of information is the degree to which the experimenter actively pursues
an increase in reasoning abilities as one of the teaching goals, as is often the case in
studies where the experimenter is the teacher. If there is a significant focus on developing
reasoning skill through practice, possibly on the particular misconception investigated,
then the premise of the effect of initial reasoning ability on conceptual change is less
valid. The replicability of these experiments is severely limited without knowledge of the
degree to which reasoning skills are incorporated in the everyday classroom. Even
without this information, these studies, taken together, demonstrate the variety of
conceptions that can be influenced by reasoning ability.
Hypothetico-Deductive Reasoning Ability and Hypothesis Testing
Previous work focusing on Piaget’s intellectual development stages, including
those already reviewed here, has been on the four levels already described by Piaget
himself. However, recent studies have found there is a possible fifth level of reasoning
ability. The progression of development through the first four Piagetian levels has been
associated with changes in brain function, as noted by changes in electroencephalograms
(EEGs). A fifth alteration in an individual’s EEG has also been found to take place.
Lawson, Drake, et al. (2000) hypothesize this change is associated with a post-formal
operational stage, which they term “theoretical,” denoted by the ability to formulate
hypotheses involving unseen entities such as water or gas molecules. They found that
when quizzes asked for hypotheses of increasing difficulty, corresponding to the concrete,
formal, and theoretical levels of reasoning ability, the percentage of students who
answered the questions correctly decreased. Although no
statistical analyses were performed, the percentages of those students who succeeded on
progressively difficult tasks roughly equaled the average percentages of students found at
the corresponding reasoning levels.
Lawson, Clark, et al. (2000) validated these findings by adjusting the previous
Lawson Classroom Test of Scientific Reasoning and demonstrated there are two
significantly different levels of hypothesis-testing skills. The first corresponds to the
formal reasoning level and hypotheses based on visible entities and the second
corresponds to the theoretical reasoning level and hypotheses generated for unseen causal
agents. In addition, these two levels have a significantly increasing impact on exam
scores and on transfer problems designed to measure theoretical reasoning skills.
Lawson, Alkhoury, et al. (2000) found a similar correlation between higher
theoretical reasoning ability and a proposed new hierarchy of scientific concepts. The
authors offered a designation of three levels of scientific concepts: descriptive, which
involves directly observable entities; hypothetical, which involves unseen entities that
cannot be observed due to a restricted time frame (e.g., evolution); and theoretical,
which involves concepts that are not directly observable. Using multivariate
analysis and pairwise comparisons, the authors found that, overall, the higher the
reasoning ability, the greater the proficiency in answering questions regarding more
difficult concepts.
These three studies all demonstrate the presence and characteristics of a new level
of intellectual development, which Lawson and colleagues have termed “theoretical.”
Piaget’s basic premise of the necessity of completing one stage before advancing to the
next is retained in this new schema. However, if one needs to acquire the critical skills to
advance through the stages, then those individuals who have not attained these skills
should not demonstrate any proficiency in the tests for levels above their current
level. Interestingly, all three studies do demonstrate that within each reasoning stage there
is proficiency variability, with lower-ability reasoners succeeding on higher-order
questions. This does call into question the validity and reliability of the test of reasoning
skills. However, Lawson and his colleagues postulate it is possible that lower-ability
students may have enough pieces of knowledge to correctly answer a few higher-level
questions or that personological factors, such as confidence and creativity, may be
interacting with reasoning ability (Lawson, Alkhoury et al., 2000; Lawson, Clark et al.,
2000; Lawson, Drake et al., 2000). Overall, though, the area is still relatively new and it
is possible that this intellectual level may be characterized by more than the ability to
consider and test unseen entities.
These studies regarding hypothesis testing and the presence of a fifth Piagetian
intellectual stage are fairly new but appear promising. The validity of the newer items
testing for the categorization of the theoretical level has been established through their
repeated use in studies that demonstrate there is a significant difference in the abilities
and achievement of students in the formal and theoretical stages. Staver and Pascarella
(1984) increase this confidence by demonstrating that the method and format of the
Piagetian tasks, which comprise the reasoning ability test, have no impact on the overall
responses of subjects. Perhaps the greatest weakness in this area of study is the possibility
of the interaction of other factors that could account for the appearance of upper-level
reasoning skills in lower-level reasoners. Although this is a concern in all studies
regarding Piagetian levels, more care and scrutiny are needed when
establishing a new level. Future research needs to be directed at these possible
interactions, if only to discern newer individual characteristics that could have impacts on
the research presented here or could further characterize Piaget’s intellectual stages.
Summary
The literature demonstrates that reasoning ability, when compared to cognitive
style and prior knowledge, has a greater overall effect on student achievement, regardless
of instructional or testing method. This statement is supported by Cavallo, Rozman,
Blickenstaff, and Walker (2003/2004) who also found reasoning ability to be the best
predictor of achievement in a group of sophomore and junior science majors. When
taking the strength of the effect of reasoning ability on achievement into account,
partnering in laboratories based on reasoning ability appears to be a possible extension of
this impact from higher-level to lower-level reasoners. However, Lawson (1993) has
been unable to offer evidence in support of this, although the process still reveals some
positive effects on student perceptions of their overall laboratory experiences. With
regard to conceptual and belief change, the studies establish an overall positive
correlation between reasoning ability and conceptual change. However, this correlation is
primarily associated with post-test change and not initial reasoning of alternative
conceptions, as would be expected. This is not a pressing point, as each study was
measuring the degree of conceptual change for two topics that are fundamentally
different in regard to their personal nature for different individuals. Concepts regarding
evolution are intricately tied to religious beliefs, which are very difficult to disentangle
and disembed. On the other hand, misconceptions regarding diffusion are easier to
address, as the emotional ties are few. Nevertheless, as each study examines different
types of misconceptions yet still demonstrates a positive correlation with the appropriate
concept on post-test data, they also reveal the common influence of higher reasoning
ability with regard to conceptual change. In addition, they demonstrate that nonscientific
beliefs with an emotional commitment are more difficult to alter while those without an
emotional connection do not demonstrate this same resistance.
As an additional note, all the studies in this review, save Cavallo et al.
(2003/2004), have focused on non-major introductory biology students from Arizona.
Although this allows for a more consistent comparison among the results of the studies, it
does not offer information regarding the distributions and relationships of reasoning
abilities in science majors. As aptitude may be a factor in the progression through the
stages, it is possible that science majors may have a higher proportion of formal and
theoretical reasoners and demonstrate different relationships among the factors discussed:
achievement, cognitive style, prior knowledge, instructional method, and conceptual
change (Piaget, 1972). If reasoning ability has the pervasive impact demonstrated here
and hypothesis-testing skills, for both seen and unseen entities, are crucial for developing scientists
to acquire, then information regarding the distribution and impact of the stages in the
population of future scientists, i.e. science majors, is critical.
Argumentation Ability
The use of argumentation theory and Toulmin’s argumentation pattern (TAP) is
relatively new in science education. Argumentation and TAP have generally been
investigated as a measure of informal, everyday reasoning about social issues, due to the
similar nature of each. Both argumentation and informal reasoning recognize opposing
assertions, weigh evidence for and against each assertion, generally use inductive
reasoning, and adopt a justified position in the end (D. Kuhn, 1992; Sadler & Zeidler,
2005; Zohar & Nemet, 2002). D. Kuhn (1993a; 1993b), in her analysis of how
individuals conceive of theories and the evidence that bears upon them, identified the
parallel nature between informal and scientific reasoning using TAP as a framework. She
found that in both informal reasoning of social issues and scientific reasoning during
experimentation, individuals had difficulty recognizing the possible falsehood of a
theory, identifying evidence that can confirm and refute the theory, and resolving
alternative explanations. This finding and its parallel nature to the current epistemology
of science proposed by T. Kuhn (1993; 1996) and Lakatos (1993) has led to the use of
TAP to analyze individual ability to coordinate theory and evidence, recognize alternative
explanations, and offer rebuttals in science. True to its roots in social issues however, the
expansion of argumentation in the sciences has often focused on its use in the resolution
of socioscientific issues (e.g., Sadler, 2004; Sadler & Zeidler, 2005; Zohar & Nemet, 2002). Although
the research regarding the use of argumentation with scientific issues is of increasing
interest (e.g., Osborne, Erduran, & Simon, 2004), there is still less information on its use
in resolving scientific problems. However, one can infer characteristics of argumentation
in a strictly scientific context from the literature on informal reasoning of social and
socioscientific issues.
Informal Reasoning, Ability, and Expertise
In a shift from the focus on formal, deductive-based reasoning, Perkins et al.
(1983) investigated the problems individuals encounter using everyday, informal
reasoning. This study utilized a large group of subjects that ranged from ninth-grade
students to doctoral students to non-students, with and without college degrees. Perkins et
al. interviewed individuals as they pondered over a given social issue, such as
reinstatement of the draft, television violence, the effectiveness of bottle deposits on
littering, and the definition of art. The authors wanted to determine the types and
frequencies of different objections that the individuals used for lines of reasoning.
Although this study was in its preliminary stages and categorization of the objections did
not follow a formal scheme, two major factors emerged related to the skilled reasoners:
large knowledge repertoire and efficient knowledge evocation in the form of pattern and
model recognition. Perkins et al. also suggested that a third component, a type of
metacognition, must be important. This suggestion came from the discovery of two types
of epistemologies: naïve reasoners who had a “makes-sense” epistemology and skilled
reasoners who demonstrated a “critical epistemology.” The naïve reasoners tended to
accept “truths” more readily, provided they made intuitive sense. Skilled reasoners, on
the other hand, held higher standards with regard to adequate justifications, exhibited
metacognition through questioning different mental models, and seemed to have general
skills with regard to logical thinking and heuristics. These results began to delineate the
characteristics of good informal reasoning.
Using the same data, Perkins (1985) investigated the effect of educational
level on informal-reasoning ability. Looking at students and non-students, he analyzed the
results in terms of ability to explain the logic of the argument with breadth and depth on
six scales. He also gave an additional overall intuitive rating for argument quality.
Looking at trends in the data, graduate students performed better than college students
who scored higher than high school students. Upon further analysis of actual gains at
each education level, he found that, although all gains were small, the greatest gains were
seen in the high school years. Perkins also performed a regression analysis of prior
thought on the different problem topics, IQ, age, and years of education on each of the six
scales. When pooling each of the student and non-student groups, IQ was found to be the
most significant contributing variable. Perkins argues several conclusions. First, general
ability is a major factor with regard to the use of informal reasoning for general social
issues; however, as Perkins reviewed several studies that demonstrate an improvement in
reasoning skills through instruction, general intelligence or ability cannot be the only
factor. Second, students do improve significantly in informal-reasoning ability during the
high school years; however, this gain is small. Although the gains demonstrated by the
college and graduate students were not as great as the high school students, Perkins
suggests that student gains during the postsecondary years are most likely in their
particular field of expertise and therefore more context-specific than the general topics
used to assess them in this study. Taking these results together, Perkins surmises that
general informal-reasoning skill gains may have a ceiling due to the combination of
general intelligence influences and context-specific expertise as individuals progress
through their education. However, as Perkins does not give any demographic information
regarding the participants, it is difficult to discern if the apparent ceiling could be due to
the prompts used in investigating informal-reasoning ability.
Including a consideration of Perkins (1985), Voss and Means (1991) conducted a
literature review to further identify the important characteristics found in good informal
reasoners through argumentation. They also determined that general ability appears to be
a major factor in informal reasoning. In addition, age/experience does not affect informal-reasoning ability, as may be expected. Domain knowledge, as suggested previously by
Perkins, does appear to have some importance when considering problems that require an
informed person. Additionally, Voss and Means summarized the characteristics of good
reasoners in an argumentative context as having a “mental style” that actively analyzed
arguments, could generate different types of arguments, weigh evidence, and reflect on
progress. These characteristics are similar to those displayed by scientists when
evaluating data and theories (D. Kuhn, 1992; Hogan & Maglienti, 2001).
Means and Voss (1996) built on this information, studying informal reasoning
skill and its relationship to grade, ability, and knowledge levels. Students in grades 5, 7,
9, and 11 were selected from pre-established groupings of high, average, and low ability
and interviewed individually regarding ill-structured problems, problem solutions, and
problem difficulty. Similar to the scoring by Perkins (1985), each student was assigned a
numerical score for several scales to determine the complexity and clarity with which the
students discussed the problems. Overall, Means and Voss found that high-ability
students were significantly better reasoners than average- and low-ability students, who
most often did not differ between themselves. Although there was some significance
attributed to grade level, it was not to the extent or frequency observed with regard to the
effect of ability. It was postulated that this finding could be considered a general
knowledge effect. When considering the effect of content knowledge, Means and Voss
demonstrated that with regard to justification, sound argumentation was a function of
ability level, not knowledge. However, increased knowledge was positively associated
with the number of reasons and qualifiers generated. Although not all the data are
presented, the authors also indicate that when knowledge was statistically partialled out,
high ability once again became the significant predictor of sound reasoning. Taken
together, these results agree with Perkins (1985) and Voss and Means
(1991) that general ability is the most significant factor with regard to informal-reasoning
skills. Additionally, although age/grade has some effect, this could possibly be explained
by the subject matter used in the problems.
Means and Voss (1996) synthesized their findings to create a two-component
model for informal reasoning. The first component is a general informal-reasoning skill,
possibly related to one’s learned language structures that allow a person to store, search
for, and evaluate information. The second component is content knowledge, which can
include subject and personal knowledge combined in mental representations. These
components work in tandem; higher-ability individuals are able to readily access and
construct models from their content knowledge through their informal-reasoning skills
while lower-ability individuals are able to recall content, but unable to mold it to fit
different situations.
Informal Reasoning and College Students
Research on general improvements. Several studies have been conducted to
determine the development of informal-reasoning skills in undergraduate students
throughout their college experience. These studies sometimes had a focus on “critical
thinking” or “everyday reasoning,” were primarily conducted in regard to social issues,
and used mixed majors populations. Overall, a general trend of improvement was seen
through the four years of college. Going as far back as 1963, Lehmann studied the 1958
entering freshman class of more than 1,000 students at Michigan State University at the
beginning and end of their four years. With some improvement on Lehmann’s method
and using a much smaller population of 47 individuals with a matched control group of
non-college individuals from the same high school, Pascarella (1987) investigated the
development of critical thinking during the first year of college. Both studies determined
that there was a significant improvement in critical-thinking ability in students who
attended college in regard to considering, analyzing, and evaluating multiple pieces of
evidence and arguments as a whole; however, no specific aspects of the college
experience were identified. In addition, Lehmann found the greatest improvement
occurred during the first two years and that students became more flexible in their beliefs,
open-minded, and receptive to new ideas throughout the four-year experience. Although
the results of these studies are not easily transferable to today’s students, they indicate that
there is some development of reasoning skills in the undergraduate experience.
In a study which looked at a specific aspect of informal reasoning, “myside” bias,
Toplak and Stanovich (2003) also found improvement over the span of four years of
undergraduate work. This study utilized 112 students from a diverse background of
majors of which only 25% were in a science or engineering program. They were asked to
argue both their own side (“myside”) and another position (“otherside”) to three social
issues comprised of tuition subsidy, organ donation, and gasoline pricing. In each issue,
the number of “myside” arguments was significantly greater than the number of
“otherside” arguments. The bias towards “myside” arguments was significantly greater
for the issue regarding tuition than organ donation and gasoline prices, indicating a
relationship between “myside bias” and more personal issues. Controlling for age and
cognitive ability, Toplak and Stanovich also found that the tendency towards “myside”
bias significantly decreased with each year in post-secondary education. Overall, these
results indicate that length of time as a student at the university has a dampening effect on
“myside” bias, allowing for individuals to give more consideration to multiple sides of an
issue. This progression appeared to have a greater impact on more personal issues, such
as tuition costs and gasoline prices, which may be assumed to reflect more fiscally-minded undergraduate students. This conclusion agrees with Lehmann’s (1963)
quantitative findings where students became more open-minded and increased their
critical thinking throughout their student years at the university.
Interventions. Although the above three studies (Lehmann, 1963; Pascarella,
1987; Toplak & Stanovich, 2003) illustrate a general trend of improvement, there is very
little research as to what specifically and naturally influences this improvement in
critical-thinking/informal-reasoning skills during the four years of college. However,
more research can be found as to what types of interventions lead to an increase in
informal-reasoning skills. Research on younger individuals indicated that the explicit
instruction of reasoning models may have a direct effect on informal-reasoning skills
(Jiménez-Aleixandre, Rodríguez, & Duschl, 2000; Osborne, et al., 2004; Zohar, 1996;
Zohar & Nemet, 2002). However, due to maturation, these cannot be taken as a given for
college-age individuals. There are very few studies on informal reasoning and college
students aside from those above, which can describe only general trends. In a critical
literature review for the improvement of informal and Piagetian hypothetico-deductive
reasoning in introductory college biology courses, Daempfle (2002) found only nine
studies published in a research journal when limiting his search to those that included
college students in an introductory biology course, empirical research on instructional
methods, and reasoning as an output variable. Altogether, Daempfle found no outstanding
studies, but did find general characteristics of those studies that demonstrated an increase
in both informal- and Piagetian hypothetico-deductive-reasoning skills. These important
characteristics included a focus on writing, direct teaching of reasoning models, and
length of time on instruction. In addition, most of the studies were conducted in a non-traditional/inquiry/collaborative environment. Therefore, it is difficult to determine if the
characteristics or the environment elicited the positive effects. However, a study by van
Gelder and Bissett (2004) investigated the effect of deliberate practice, which included
modeling and feedback, by undergraduates in an introductory reasoning course. They
found a significant moderate correlation between hours engaged in deliberate practice
and an increase in informal-reasoning skills. Although not all confounding variables
could be accounted for due to the voluntary effort of the participants, this study combined
with the previously mentioned studies gives a strong indication for the positive effect of
direct instruction and/or modeling.
Problems with informal reasoning. Although there is a general trend of
improvement throughout the four years of college, there still remains much to be desired
and encouraged in undergraduate informal-reasoning skills. Baron (1991) and Perkins, et
al. (1991) both assert that there are two major areas of difficulty, independent of age,
when it comes to informal reasoning: bias/fairness and incomplete evidence/lack of
“optimal search.” As seen in Toplak and Stanovich (2003), individuals tend to view and
create arguments primarily from a “myside” bias. Those individuals who demonstrate
better informal-reasoning skills are praised for their ability to consider multiple sides to
an argument. In addition, those individuals who consider and seek out multiple pieces of
evidence also, by definition, demonstrate better informal reasoning. Perkins, et al. (1983)
had touched on both these concepts with the proposition that most individuals are poor
reasoners who utilize a “makes-sense epistemology,” not going beyond their own beliefs
or knowledge to discern a problem. This is supported by additional studies that illustrate
a significant increase in the number of sentences and quality with regard to one’s pro-side
and personal interest for an issue (Perkins, 1985; Woll, Navarrete, Sussman, & Marcoux,
1998). Ironically, when Baron investigated what undergraduates value with regard to
quality of thinking, students valued two-sided arguments.
Cerbin (1988) discusses several areas in which he believes undergraduate students
have difficulties in informal reasoning. One area for improvement is in undergraduates’
failure to explore the consequences of claims. Cerbin also agreed with Baron (1991) and
Perkins et al. (1991) that undergraduates present underdeveloped arguments with
inadequate use of evidence; however, Cerbin posited that this could partially be due to a
lack of well-organized content knowledge. This belief is similar to Means and Voss’
(1996) two-component model for informal reasoning, which held content knowledge as
important for good argumentation. Both these assertions could also be due to the findings
of research demonstrating the difficulty that undergraduates and other individuals have
with differentiating between evidence and claims, and the consequential poor evaluation
of the connection between the two. This poor evaluation is possibly due to a focus on the
accuracy of the evidence/claims over the plausibility of their linkage (D. Kuhn, 1992;
1993a; 1993b; Shaw, 1996). Once again, ironically, students in Baron’s study highly
valued correctness of content together with consistency between conclusions and arguments.
Reasoning Through Argumentation in Science
Coordination of theory and evidence. Deanna Kuhn (1992) focused on the use of
argument as a way of reasoned, everyday thinking. She theorized that everyday, informal
reasoning mentally took the form of a dialogic argument: recognizing two opposing
assertions, weighing the evidence, and determining the relative merit of each assertion.
Using social issues, such as prisoners returning to a life of crime, failure of students in
school, and causes of unemployment, D. Kuhn investigated the presence and quality of
argumentation in subjects' thinking on each of the problems. Subjects, ranging from
ninth-grade students to individuals in their 20s, 40s, and 60s, were grouped according to
education level and sex. Overall, D. Kuhn found that the characteristics of dialogic
argument were present in the subjects’ reasoning regarding the social issues, although
not to a great extent. The percentage of success ranged from 25% to 60% for each
measure: quality of evidence, generation of alternative theories, creation of
counterarguments, and development of rebuttals. When investigating contributing factors
to reasoning success, only education level demonstrated a significant correlation. The
absence of significant correlations with age, gender, and topic knowledge suggested that
informal reasoning via argumentation constitutes a general set of innate abilities/skills,
agreeing with the work of Perkins (1985) and Voss and Means (1991). It is also
interesting to note that the adolescents were prospectively classified as to education level,
yet still showed no significant difference from the other age groups, implying that this
general reasoning skill may have peaked in late adolescence, also agreeing with Perkins.
D. Kuhn (1992) also found that only 15% of the subjects displayed an evaluative
epistemology that values thinking, evaluation, and argument in knowing, similar to that
epistemology of science theorized by T. Kuhn (1996) and Lakatos (1993). This result
agreed with findings from her other studies that looked specifically at individuals’
scientific thinking (D. Kuhn, 1993a; 1993b). In these studies, lay adults, adolescents, and
children were observed as they created experiments and analyzed data. The results
demonstrated that subjects had difficulty differentiating between evidence and theories,
especially recognizing that the theory was something separate to reflect upon using both
confirming and refuting evidence. D. Kuhn found that in both a social (1992) and
scientific context (1993a; 1993b), the confusion between evidence and theory led subjects
to typically and selectively assimilate new positive evidence into their own theory
without much evaluation. Individuals were even found to adjust their theories, albeit
subconsciously, to fit the new evidence. Without this separation and coordination, it is
very difficult to evaluate theories and therefore employ, by definition, an
evaluative/scientific epistemology and approach to problem solving. Together, these
findings provided the link, through dialogic argumentation, between everyday, informal
reasoning and scientific reasoning. In both instances, individuals recognized opposing
assertions, weighed the evidence, and determined the relative merit of each assertion to
come to a conclusion, although not without difficulty.
Taking this further, Hogan and Maglienti (2001) attempted to understand where
students and scientists currently stand in their abilities to coordinate theory and evidence
and how scientists came to understand the fundamentals of this type of reasoning. They
investigated these differences by comparing the responses of middle school students and
working scientists to questions judging the validity of hypothetical conclusions based on
hypothetical data. By comparing the students, who represented lay people, with the
scientists, Hogan and Maglienti found that lay individuals demonstrate faulty scientific
reasoning by valuing causal claims without appropriate grounds or warrants, attaching
their own personal inferences to the conclusions, and using those inferences to validate
vague claims. Scientists, on the other hand, valued conservative claims based on clear
empirical evidence, appreciated the consideration of alternative explanations, and poorly
rated those conclusions with vague statements, illustrating an evaluative epistemology as
defined by D. Kuhn (1992). In addition, Hogan and Maglienti found that scientists
claimed to have developed their own reasoning skills through observing and modeling
mentors, as well as considering critiques by experts in their field. Overall, this study
elucidated the current dichotomy between novice and expert scientific reasoners as well
as the general path through which one becomes an advanced scientific reasoner.
Although this study provides critical information regarding the current state of science
education, it does not offer a readily feasible way in which teachers can aid students in
increasing their higher-order thinking skills and scientific reasoning.
Interventions. Some researchers have taken the information regarding difficulties
in coordinating theory and evidence to determine what types of intervention may help.
Zohar (1996) looked to determine if a learning environment, designed with
constructivism and cognitive conflict in mind, could help increase the number of valid
inferences that eighth- and ninth-grade students made regarding causal evidence. Similar
to D. Kuhn (1993a; 1993b), students were encouraged to design experiments and analyze
data during a four class-period lesson. Students were first allowed to investigate as they
desired, but then given some instruction and cuing regarding variable control. Comparing
pre- and post-tests, Zohar found that the number of valid inferences had increased from
11% to 77% with a decrease in the total number of inferences, indicating a more
systematic and focused approach. The students who scored perfectly on the post-test were
also given a transference problem and delayed post-test. These students also
demonstrated an increase in the number of valid inferences with 87.5% on the transfer
problem and 85% on the retention problem. However, the number of students included in
the transfer/retention portion was small and not representative of the population, no
statistical test outcomes were given for either result, and the learning environment
was a combination of many different constructivist methodologies. Yet, the results may
be conservatively regarded as positive for the influence of explicit instruction with regard
to reasoning through data and conclusions.
Building on this information, Zohar and Nemet (2002) investigated the outcomes
of specific, explicit instruction of general reasoning patterns within science content in
ninth-grade students. Before instruction, students were able to create arguments,
counterarguments, and rebuttals, albeit often consisting of a simple structure with only
one justification. This was similar to findings by Jiménez-Aleixandre, et al. (2000) who
determined that ninth-grade students’ arguments in genetics were dominated by the
identification of claims but not the justifications and warrants to support them. Zohar and
Nemet discovered that those students who received the additional training in
argumentation skills within a unit on genetics displayed increased content knowledge,
were significantly better at applying that knowledge correctly in socioscientific
arguments regarding genetics, and significantly increased the complexity of their
arguments in comparison to the control group. Although it is difficult to determine the
exact influence of the explicit instruction versus the increase in the social aspects of the
experimental classroom, the conclusions were much more sound than Zohar (1996) with
regard to the positive effects of explicit instruction.
Findings from Zohar and Nemet (2002) are complemented by those uncovered by
Osborne, et al. (2004) who also investigated the usefulness of specific intervention on
reasoning skills of eighth-grade students. This study was the second phase of a
professional-development program designing argumentation materials for teachers to use
in their classrooms. Osborne et al. studied teacher-led group discussions of both scientific
and socioscientific issues. They found that students appeared to increase the quality of
their argumentation after a year of intervention as asserted from results that were positive,
but not statistically significant. However, this study also presents two more interesting
points. First, both the experimental and the control groups, which were taught by the
same teacher, increased in the quality of their argumentation, suggesting that
improvement may be teacher-specific. Second, although eight lessons were taught as
argumentation in a scientific context, the quality of argumentation was significantly
better in the socioscientific context over the scientific context. This once again possibly
indicates the importance of content-specific knowledge in the creation of strong
arguments. In both of these studies, it is not known whether it was the instruction or the
classroom environment encouraging argumentation that contributed to the increased
reasoning skills; however, these findings tend to suggest the value of explicit instruction
in reasoning skills.
Taken together, these four studies (Jiménez-Aleixandre et al., 2000; Osborne et
al., 2004; Zohar, 1996; Zohar & Nemet, 2002) also elucidate two important points
regarding the ability of students to generate sound scientific reasoning in the form of
arguments: without intervention, students tend to develop simple arguments with few
justifications and students encounter less difficulty in arguing socioscientific issues than
scientific issues. These two findings are corroborated by a critical literature review of
studies linking informal reasoning and socioscientific issues by Sadler (2004). When
regarding the first point, Sadler found that, in general, students tend to display shallow
analyses when considering evidence. This coincides with the findings of previous studies
whereby students value vague, weakly substantiated conclusions, create simple
arguments with single justifications, and tend to lack sufficient warrants and rebuttals. If
students do not consider multiple pieces of data at a reasonable depth, then their
arguments will be weak and simple in turn. Secondly, Sadler points out that personal
connections to socioscientific issues tend to demonstrate a higher correlation with higher-order thinking/informal reasoning, similar to the positive findings by both Osborne et al.
(2004) and Zohar and Nemet (2002). This point is also highlighted by the lack of
improvement in students’ reasoning in scientific issues determined by Osborne et al.,
although weakly contradicted by Zohar. Two possibilities for this positive connection
with socioscientific issues are available. First, students may be more knowledgeable and
personally related to issues with a social component, granting them a broader source of
information from which to draw grounds, warrants, and backing. Second, although both
supporting and contradictory evidence is supplied by Sadler’s literature review and
Sadler and Zeidler’s (2005) work demonstrating an increase in content knowledge
associated with a decrease in flawed reasoning, it is possible that students’ lack of
conceptual understanding may impede their ability to generate sound and complex
arguments regarding scientific issues. This may be due to a lack of known grounds,
warrants, and backing.
Summary
Two points need to be made regarding the current state of research on
argumentation and informal reasoning. First, there are strong studies that link informal-reasoning ability to general ability and education (D. Kuhn, 1992; Means & Voss, 1996;
Perkins, 1985; Voss & Means, 1991). For these results, it is important to consider that
there is some selectivity with regard to ability that occurs with those individuals who
pursue higher education. Therefore, those studies linking informal-reasoning ability to
educational level may be informed by those regarding general ability. However, the data
that demonstrate a positive correlation between education and informal-reasoning ability
are also supported by those studies illustrating an improvement in critical thinking
throughout four years in college (Lehmann, 1963; Pascarella, 1987; Toplak & Stanovich,
2003). Therefore, it is likely that there is some sort of interaction between ability and
education level to improve informal-reasoning and argumentation skills in informal
settings.
Second, it is difficult to determine the influence of content knowledge on
informal-reasoning ability and argumentation. Most of the studies discussed here,
especially those regarding undergraduates, utilized social or socioscientific issues in
order to determine students’ informal-reasoning abilities. As informal reasoning is often
considered part of the everyday reasoning domain, it is obvious and appropriate to utilize
issues from everyday life. Some of these studies have tried to take into account prior
thought and knowledge about the topics to mixed results (Means & Voss, 1996; Perkins,
1985; Perkins et al., 1983). Means and Voss (1996) have even tried to explain the
confusion with their two-component model of reasoning, separating general informal-reasoning skills and content knowledge. Support for this view can also be found in
Perkins and Salomon’s (1989) historical review of the debate as to the context-dependency of cognitive skills. They proposed that there are general cognitive skills that
adapt to domain-specific areas, with some domains more readily adapted to than others
because of use and knowledge base. This view is also supported by those studies that
have demonstrated that the more personal the issue and the more intimate the knowledge,
the better the reasoning (Perkins, 1985; Sadler & Zeidler, 2005; Toplak & Stanovich,
2003; Woll et al., 1998). Overall, the relationship of content knowledge to informal-reasoning ability and argumentation is tenuous, depending on the topic and participants
used.
Taken together, this information could have an impact when expanding this
research to a scientific context for biology majors. One might expect that science majors,
who have chosen to go to college and who should have greater knowledge and interest in
their field of study, may exhibit strong argumentation skills. However, the studies that
looked at reasoning in science problems did not examine college students and
demonstrated difficulties in science-issue argumentation (Hogan & Maglienti, 2001;
Osborne et al., 2004). Therefore, it is problematic to extrapolate results with any
certainty. Altogether, the data from studies in informal reasoning and argumentation
provides very little information with regard to undergraduate science majors’ actual
abilities in argumentation, especially with regard to its use in science. One can only infer
similar difficulties with coordinating theory and evidence and displaying an evaluative
epistemology to be discovered in biology majors, as is found in both lay and younger
individuals.
CHAPTER 3
METHODS
Research Design
This study is descriptive in nature. The research design was created to elicit
information regarding the development of students’ scientific reasoning (SR) abilities through their natural
participation in the first two quarters of an undergraduate introductory biology course
sequence. No particular intervention with regard to SR was given. To this end,
participants were assessed during their regularly scheduled lab section at the beginning of
their first quarter of biology study, the end of the first quarter, and the end of the second
quarter. At the university setting chosen, this corresponded to the autumn 2006 and
winter 2007 quarters. This pattern of assessment was designed to determine the students’
SR skills proficiency at the beginning of their study, at the mid-point of the course
sequence, and at the end of the course sequence.
With this research design, several threats to the internal validity of the study must
be addressed. Foremost, the threat due to repeated testing was addressed by the use of
two parallel forms for each instrument and the length of time between administrations. In
the first administration at Week 1, Form A of each instrument was given; at Week 10,
Form B of each instrument was administered. Form A was then given again at the end of
the second quarter, a total of 22 weeks after the first administration. Although the format
of the instrument forms was identical, the items were not. This format, combined with the
long periods of time between testing, served to reduce the testing threat. In relation to the
preparation of two forms, the threat due to instrumentation is of concern. To address this,
both forms of each instrument were piloted to ascertain reliability and correlated for
similarity. The argumentation instrument was also face validated, while the hypothetico-deductive reasoning instrument has been repeatedly validated, as reported in the
literature. Lastly, due to the passage of 23 weeks between the data collection events,
threat due to maturation needs to be addressed. This threat is decreased by the relative
maturity of the subjects in the study. Although the study focuses on SR development,
most participants in the study needed to have reached a higher level of maturation to be
enrolled in a biology-majors class at the main campus of the university. In addition, only
participants over the age of 18 were considered.
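The pilot reliability analysis of the two half-test forms can be related to full-test reliability through the Spearman-Brown prophecy formula, a standard psychometric correction. The sketch below is purely illustrative; the input value is invented and is not a statistic from this study.

```python
# Spearman-Brown prophecy formula: estimates the reliability of a
# full-length test from the correlation between its two parallel halves.
# Standard psychometric relation; the example input is hypothetical.

def spearman_brown(r_half: float) -> float:
    """Predicted full-test reliability given the half-test correlation."""
    return (2 * r_half) / (1 + r_half)

# e.g., a half-test correlation of 0.65 predicts full-test reliability near 0.79
print(round(spearman_brown(0.65), 3))  # -> 0.788
```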
The larger threats to this research design lay in the threats to external validity.
This is a mostly descriptive study designed to give some indication of a baseline of SR
development. The results cannot be generalized much beyond
the population used. This is due to the threat of the interaction of testing and the students’
natural development. It is difficult to generalize results to individuals who have not been
pre-tested, especially when it is assumed that testing SR skills directly is not a very
common experience. In addition, the interaction of self-selection with the study must be
considered. The participation in this study was strictly voluntary. This may skew the
results as they specifically relate to those students who were very enthusiastic about
science and were eager to determine their own aptitude with regard to SR skills.
However, the large number of participants for the first two administrations and the
diversity of the university may ameliorate this threat.
Participants
The target population for this study was biology majors in the first year of
coursework in the major curriculum. Biology majors include those declaring a major in
biology, biochemistry, entomology, evolution and ecology, microbiology, molecular
genetics, plant cellular and molecular biology, and zoology. The accessible population
used was voluntary students, over the age of 18, enrolled in the introductory biology
course sequence for majors at a large Midwestern university during the autumn quarter
2006 and winter quarter 2007. This accessible population is believed to be appropriate for
this study due to the diverse student population available at one of the largest universities
in the United States. Participants were recruited verbally and in writing during their
lecture class and completed the instruments during their lab section. To maintain the
anonymity of those choosing to participate, all students were asked to complete the
instruments. Data from all participants who volunteered were utilized, including 412
students at the first autumn administration, 344 students at the second autumn
administration, and 59 students at the winter administration. Of the original 460
volunteers, 30 participants completed either the hypothetico-deductive (HD) reasoning or
argumentation portion of all three administrations. To account for participants who did
not participate in all aspects of the study, all statistical tests were selected to exclude
individuals list-wise, i.e. those who did not give complete data for both of the two
dependent variables tested were not included. A summary of all the participant
characteristics is presented in Tables 1 and 2, as well as Figure 5. Overall, the typical
participant was a 20-year-old sophomore non-biology major with plans to earn a B.S. and
go to professional school, having taken three previous college-level science courses and
one Advanced Placement (AP) course.
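The list-wise exclusion rule described above can be sketched as follows; the field names and score values are hypothetical and serve only to illustrate the rule.

```python
# List-wise exclusion: a participant enters an analysis only with
# complete data on BOTH dependent variables. None marks a missed
# administration; all identifiers and scores here are hypothetical.
scores = [
    {"id": 1, "hd": 9,    "arg": 4},
    {"id": 2, "hd": None, "arg": 5},     # missing HD score -> excluded
    {"id": 3, "hd": 7,    "arg": None},  # missing argumentation score -> excluded
    {"id": 4, "hd": 10,   "arg": 6},
]

complete = [s for s in scores if s["hd"] is not None and s["arg"] is not None]
print([s["id"] for s in complete])  # -> [1, 4]
```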
Variable                                    Biology Majors    Not Biology Majors
Total n                                          141                 319
Gender
  Female                                          74                 179
  Male                                            67                 140
Age (years)                      M             19.96               20.36
                                 SD             2.01                2.32
University Rank                  Mode      Sophomore           Sophomore
                                 n                78                 162
Number of Years as an
  Undergraduate                  M              2.01                2.36
                                 SD             1.15                1.30
Number of Science
  Courses Taken                  M              3.49                3.25
                                 SD             2.26                2.08
Number of AP* Courses
  Taken                          M              0.97                0.71
                                 SD             1.30                1.21

Note. Age, number of years as undergraduate, number of science courses taken, and number of AP courses
taken are normally distributed.
*AP = Advanced Placement.
Table 1. Demographics of All Participants by Major
Variable                             Biology Majors    Not Biology Majors
Total n                                   141                319
Degree Sought
  A.S.                                      0                  1
  B.A.                                     23                 50
  B.S.                                    116                239
  Graduate                                  5                 14
Post-Baccalaureate Plans
  Job                                      11                104
  Bachelor Degree                           2                 23
  Science Grad School                      35                 67
  Other Grad School                         4                 29
  Professional School                      96                129

Note. Total n in each variable category does not add up to overall total n as some individuals indicated
multiple degrees sought or several post-baccalaureate plans.
Table 2. Distribution of Future Plans for All Participants by Major
[Figure 5 is a bar chart of participant counts by major. Biology was the largest group at 141 participants; the remaining categories (Allied Medical, Agriculture, Pharmacy, Humanities, Engineering, Other Science, Natural Resources, Undecided, Education, Business, and Other) ranged from 76 down to 12 participants.]
Figure 5. Distribution of all 460 participants by major.
Outcome Measures
Dependent Variables Data Collection
Hypothetico-deductive reasoning. Student hypothetico-deductive (HD) reasoning
ability was assessed using the Lawson (2000) revised, multiple-choice edition of the
Lawson Classroom Test of Scientific Reasoning (LCTSR). The instrument has well-established validity and reliability. For example, Lawson, Banks, and Logvin (2007)
demonstrated a posttest Kuder-Richardson 20 internal-consistency reliability coefficient
of 0.79. The LCTSR is used to assess several aspects related to HD reasoning, such as
conservation of weight and volume, probability, proportionality, correlations, control of
variables, and HD reasoning directly. The LCTSR consists of 12 scenarios followed by
two questions each. There are six pairs of similar scenarios, each pair addressing one of
the aspects related to HD reasoning. With each scenario, the first question focuses on the
scenario content specifically, while the second question asks for the reason the first
answer is correct. Each answer for the first question has a corresponding reason in the
second question.
To address the testing threat to internal validity, the LCTSR was piloted to
determine the reliability of each half of the split test and was administered as
equivalent forms. It should be noted, however, that the full LCTSR has regularly been
used in the literature as a pre- and post-test with college students (e.g. Johnson &
Lawson, 1998; Lawson, 1980; 1992a). The split divided the six pairs of similar scenarios
between two forms, each containing six scenarios and twelve questions. Participants
were given a score equal to the total number of items answered correctly according to a
right/wrong answer key. The order of the forms for the autumn administrations was
randomly decided, and the first form (A) was given again at the end of the winter quarter.
Original LCTSR      Form A           Original LCTSR      Form B
Question Pair       Question Pair    Question Pair       Question Pair
1, 2                A1, A2           3, 4                B1, B2
7, 8                A3, A4           5, 6                B3, B4
9, 10               A5, A6           15, 16              B5, B6
11, 12              A7, A8           13, 14              B7, B8
17, 18              A9, A10          19, 20              B9, B10
21, 22              A11, A12         23, 24              B11, B12
Table 3. Original LCTSR Question Distribution to Forms A and B
The Cronbach’s alpha test for reliability for the 12 items of Form A was α = 0.53,
p = 0.000 (n = 391), and the 12 items of Form B was α = 0.67, p = 0.000 (n = 318). There
was a moderate correlation between the two forms (r = 0.42, n = 299, p = 0.000).
Although these alphas are lower than is generally accepted for a social science
instrument, Cronbach's alpha assesses the homogeneity of an instrument with respect to
what it purports to measure. Splitting the instrument into two halves of what was
originally one measure reduced, as expected, the redundancy that had provided
substantial interrelatedness among the item responses. To confirm this, separate
principal components analyses on Form A and
Form B demonstrated six factors corresponding to the six scenarios/pairs of questions
(Tables 4 and 5). The results from these analyses corroborate the low Cronbach’s alpha
results for the LCTSR forms by illustrating some of the lack of homogeneity as inherent
to the instrument and the process of dividing it into two forms.
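Cronbach's alpha as used here can be computed directly from an item-score matrix; the sketch below uses simulated right/wrong data for a 12-item form rather than the study's responses:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Illustrative 0/1 right-wrong data driven by a common ability factor
# (synthetic, not the study's data).
rng = np.random.default_rng(0)
ability = rng.normal(size=200)
scores = (ability[:, None] + rng.normal(size=(200, 12)) > 0).astype(float)
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```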
Item    Component    Loading    Communality
A5      1            .96        .94
A6      1            .96        .94
A9      2            .91        .84
A10     2            .90        .83
A1      3            .88        .79
A2      3            .88        .78
A7      4            .83        .69
A8      4            .83        .69
A3      5            .76        .64
A4      5            .85        .73
A11     6            .78        .63
A12     6            .80        .65

Component           1       2       3       4       5       6       Total
Eigenvalue          2.25    1.69    1.56    1.36    1.19    1.10    9.15
% Total Variance    18.8    14.1    13.0    11.4    10.0    9.0     76.3
% Trace             24.6    18.5    17.0    14.9    13.0    12.0    100.0
Note. Varimax orthogonal rotation used. n = 391. Loadings less than 0.3 are not printed. Component 1 =
Control of Variables, Component 2 = Probabilistic Thinking, Component 3 = Conservation of Mass,
Component 4 = Advanced Control of Variables, Component 5 = Advanced Proportional Thinking, and
Component 6 = HD Thinking.
Table 4. Principal Components Analysis Rotated Factor Loadings of LCTSR Form A
Item    Component    Loading      Communality
B1      1            .97          .96
B2      1            .95          .95
B3      2            .90          .86
B4      2            .88          .86
B9      3            .85          .78
B10     3            .89          .80
B5      4            .72          .61
B6      4            .80          .69
B7      5            .64          .54
B8      5            .75          .62
B11     6            .42          .86
B12     6            .50, .52     .79

Component           1       2       3       4       5       6       Total
Eigenvalue          2.97    1.67    1.31    1.07    1.06    0.91    8.99
% Total Variance    24.7    13.9    10.9    8.9     8.8     7.6     74.8
% Trace             33.0    18.6    14.6    11.9    11.8    10.1    100.0
Note. Varimax orthogonal rotation used. n = 318. Loadings less than 0.3 are not printed. Component 1 =
Conservation of Volume, Component 2 = Proportional Thinking, Component 3 = Correlational Thinking,
Component 4 = Probabilistic Thinking, Component 5 = Advanced Control of Variables, and Component 6
= HD Reasoning.
Table 5. Principal Components Analysis Rotated Factor Loadings of LCTSR Form B
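A principal components analysis with varimax rotation, of the kind reported in Tables 4 and 5, can be sketched as follows. The data are synthetic with a planted two-factor structure, and the varimax routine is a standard textbook implementation, not the study's actual SPSS procedure:

```python
import numpy as np

def varimax(loadings, tol=1e-8, max_iter=100):
    """Varimax orthogonal rotation of a (p, k) loading matrix."""
    L = loadings.copy()
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p))
        R = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):
            break
        d = d_new
    return L @ R

# Synthetic item responses: three items load on each of two factors.
rng = np.random.default_rng(1)
f = rng.normal(size=(300, 2))
X = np.column_stack([f[:, 0] + 0.3 * rng.normal(size=300) for _ in range(3)] +
                    [f[:, 1] + 0.3 * rng.normal(size=300) for _ in range(3)])

# PCA on the correlation matrix, eigenvalues sorted descending.
Rcorr = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(Rcorr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

keep = 2  # retain two components, as with the argumentation forms
loadings = eigvecs[:, :keep] * np.sqrt(eigvals[:keep])
rotated = varimax(loadings)
# Communalities (variance of each item reproduced by the kept components)
# are invariant under orthogonal rotation.
communalities = (rotated**2).sum(axis=1)
```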
Argumentation. Student argumentation ability was assessed via a paper and pencil
instrument developed specifically for this study (Appendix A). Multiple forms of the
instrument were piloted to assess equivalence, content validity, and reliability. Experts in
biology and education established face validity. Participants were asked to respond to a
set of questions designed to extract their inductive-reasoning patterns. These patterns
were assessed through the argumentation support of participant-derived conclusions from
a given scenario and data set relative to evolution (Form A) and ecology (Form B).
Open-ended questions examined students' abilities to express five aspects of Toulmin's
argumentation pattern (TAP): the grounds used, the claims made, the warrants used, and
the ability both to identify and to rebut counterarguments. Argumentation quality
was assessed using a rubric with scores ranging from 0 to 2 per item (Appendix B).
The assessment focus was not on the correct content of each item, but rather the presence
and articulation of each aspect assessed by that item, as well as internal coherence among
answers. Each item was scored independently and then summed to create a composite
score for each student. The reliability of the five-item forms was assessed by Cronbach’s
alpha: for Form A α = 0.68, p = 0.000, and n = 412 and for Form B α = 0.72, p = 0.000,
and n = 344. Both forms also demonstrated two subscales relating to the types of
questions asked when analyzed by a principal components analysis (Table 6).
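The rubric scoring described above can be sketched as follows; the item names follow the TAP aspects and the scores shown are hypothetical, not drawn from Appendix B:

```python
# Illustrative scoring of one participant's argumentation form.
# Each item is scored 0-2 independently, then summed into a composite.
rubric_scores = {
    "claim": 2,            # clear, data-grounded claim
    "grounds": 1,          # partial citation of the given data
    "warrant": 1,          # warrant stated but weakly linked to the claim
    "counterargument": 2,  # alternative explanation identified
    "rebuttal": 0,         # no rebuttal offered
}

composite = sum(rubric_scores.values())
assert 0 <= composite <= 2 * len(rubric_scores)
print(composite)  # 6
```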
Form A
Item    Component    Loading    Communality
A4      1            .87        .56
A5      1            .89        .56
A1      2            .73        .56
A2      2            .75        .56
A3      2            .66        .47

Component           1       2       Total
Eigenvalue          2.22    .98     3.20
% Total Variance    44.4    19.7    64.1
% Trace             69.4    30.6    100.0

Form B
Item    Component    Loading    Communality
B4      1            .87        .81
B5      1            .90        .83
B1      2            .72        .62
B2      2            .80        .65
B3      2            .70        .55

Component           1       2       Total
Eigenvalue          2.38    1.07    3.45
% Total Variance    47.6    21.3    68.9
% Trace             69.0    31.0    100.0
Note. Varimax orthogonal rotation used. Loadings less than 0.3 are not printed. n = 412 for Form A and n =
345 for Form B. For each Form A and B, Component 1 = Alternative Explanations and Component 2 =
Initial Argument.
Table 6. Principal Components Analysis Rotated Factor Loadings of Argumentation
Forms A and B
The first argumentation subscale (represented by Component 2), comprising
items 1, 2, and 3 on each form, represents the participants' ability to generate and
support an argument. The other subscale (represented by Component 1), comprising items 4
and 5, represents the participants' ability to recognize and rebut an alternative
explanation. Each subscale on each form was checked for reliability using Cronbach's
alpha (Table 7). Although the values do not all reach the commonly accepted threshold of
α > 0.7, they are sufficient for use in this study.
                Form A                                          Form B
                Initial       Alternative                       Initial       Alternative
                Argument      Explanations                      Argument      Explanations
n               413           412                               346           345
Cronbach's α    0.54          0.76                              0.63          0.80
Table 7. Reliability of Argumentation Forms A and B Subscales
Independent Variables Data Collection
Participants were asked for demographic information to correlate with dependent
variable data. This information was collected during the LCTSR and argumentation
instrument administration at the beginning of the autumn 2006 quarter (AU1) and at the
end of the winter 2007 quarter (WI). Participants were asked for their declared or planned
major, university rank, number of years at the university, post-graduation goals, number
of previous advanced placement (AP) and undergraduate science courses taken, gender,
and age. In the winter quarter, participants were also asked when they enrolled in the first
quarter of the introductory course sequence or its equivalent and the type of institution
where they took the equivalent (Appendix C). Each of these variables was collected to
account for possible influences and intervening variables affecting the dependent
variables.
Because the introductory courses at the Midwestern university where the study
was conducted have dual lecture/laboratory components, instructional evidence was
collected in the form of lecture and laboratory syllabi, assignment instructions,
PowerPoint lecture notes, sample exams, and laboratory manuals. These materials were
examined for evidence of HD reasoning and argumentation emphasis, as well as level of
inquiry encouraged in the laboratory portion of the course. Laboratory exercises were
scored on a scale of 1 to 5, based on the five levels of inquiry identified by Bonstetter
(1998), to give an indication of the average level of inquiry involved.
CHAPTER 4
RESULTS AND CONCLUSIONS
Characterization of Courses
The two courses in the introductory biology sequence in this study each
consisted of a lecture section and a corresponding lab section. Students attended lecture
twice per week and laboratory twice per week for the first biology course (“Biology 1”),
but attended laboratory once per week for the second biology course (“Biology 2”). The
courses traditionally have had a large enrollment with more than 20 lab sections per
quarter. The enrollment in autumn 2006 for Biology 1 was approximately 600 and
decreased to approximately 400 in winter 2007 for Biology 2. With these class sizes, the
two lecture instructors for Biology 1 and the one instructor for Biology 2 relied on
PowerPoint presentations with a focus on factual information. These presentations were
provided to the students through the online classroom.
Assessments for the students primarily consisted of multiple-choice exams in
lecture and short multiple-choice quizzes for lecture/laboratory information. The
questions on the Biology 1 exams were typically factual-recall with some application. No
exam samples were collected for Biology 2. Any data presented on exams were for
calculation-based problems, such as Mendelian genetics. Other assignments consisted of
worksheets for laboratory exercises and a New York Times assignment for Biology 1.
This essay assignment required students to investigate any topic in biology based on
science articles found in the New York Times. The assignment was largely fact-based,
and students were not asked to investigate the socioscientific aspects of their topics,
although news articles were also discussed on a regular basis in the lectures for
both quarters. Students also completed the laboratory exercises as characterized in Table
8 according to Bonstetter (1998).
                          Level of Inquiry
Course                    Traditional    Structured    Guided       Mean Level
                          (Level 1)      (Level 2)     (Level 3)
Biology 1 (Autumn)        3              4             1            1.75
Biology 2 (Winter)        7              7             0            1.50
Note. Mean level calculated by multiplying the number of laboratory exercises by the level of inquiry,
summing those values, and dividing by the total number of laboratory exercises.
Table 8. Laboratory Exercises Characterization by Level of Inquiry
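The weighted mean described in the note to Table 8 can be verified with a short calculation, using the counts from Table 8:

```python
# Weighted mean level of inquiry: multiply the number of exercises at each
# level by that level, sum, and divide by the total number of exercises.
def mean_inquiry_level(counts_by_level):
    total = sum(counts_by_level.values())
    weighted = sum(level * n for level, n in counts_by_level.items())
    return weighted / total

# Level -> number of exercises, from Table 8.
biology_1 = {1: 3, 2: 4, 3: 1}   # Traditional, Structured, Guided
biology_2 = {1: 7, 2: 7, 3: 0}
print(mean_inquiry_level(biology_1))  # 1.75
print(mean_inquiry_level(biology_2))  # 1.5
```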
Traditional laboratory exercises were more “cookbook” (Level 1) where the
instructor provided everything from the topic to the conclusions. Examples of these labs
were microscope identification of cell organelles in Biology 1 and multiple dissections in
Biology 2. In the structured labs (Level 2), students were given all materials, a step-by-step procedure, and data analysis guidance, but they determined their own conclusions.
Setting aside the three dissection labs, this was the dominant type of
laboratory exercise employed during this study. In the guided inquiry (Level 3), students
were provided with topics, questions, and materials, but allowed to develop their own
procedure and data analysis. The example of this type of laboratory was the
characterization of enzyme kinetics in Biology 1. Evidence of HD reasoning and
argumentation was found in those laboratory exercises classified as a Level 2 or 3. With
Level 2 exercises, students were encouraged to recognize the hypotheses and predictions
and to generate a conclusion based on the evidence they collected. They were often asked
to explain what they “expect[ed]” and “why,” though students were only irregularly
required to give the “reasoning” for their answers. However, one Biology 1 and two
Biology 2 Level 2 exercises also asked students to determine or explain away alternative
explanations. The Level 3 exercise in Biology 1 was more intellectually demanding with
regard to scientific reasoning, as students were required to explain their experimental
designs and address multiple hypotheses for their questions of interest. This process was socially constructed
within their laboratory groups and as a class, mimicking the process of scientific evidence
sharing.
Overall, the pedagogical design and execution of the Biology 1 and Biology 2
courses was what would be expected as typical of a large university lecture course.
Factual recall was emphasized with laboratories focused primarily on structured inquiry.
This is most likely a common logistical solution for classes with large numbers of
students. However, it is also emphasized in the course objectives given in the Biology 2
syllabus. The courses do encourage students to seek science information outside of
lecture with the New York Times assignments; however, they are still seeking more
factual information on a topic rather than focusing on the way in which that information
is gathered and displayed. These pedagogical choices have not been demonstrated to have
significant effects on scientific reasoning, although they are implicitly assumed to do so.
Hypothetico-Deductive Reasoning
Initial Distributions
The initial distribution of the LCTSR scores for each administration was first
investigated to determine the normality of the data and discern any readily distinguished
patterns (Table 9). It was noted that the second autumn administration (AU2) appears
lower than either the first autumn administration (AU1) or the winter administration
(WI). In addition, the AU1 and WI administrations appear similar, while the standard
deviations for both the AU2 and WI administrations are 0.5 to 1 point larger, indicating
more spread of scores with the later administrations. The average scores are
approximately equal to 8. This total score on the LCTSR, adjusted for using a total point
score and the split-half version, corresponds to the low end of scores for formal reasoning
ability (Lawson et al., 2007). In addition, the percentage of individuals who scored in this
range was slightly higher than the previously reported 50% of non-major students in
biology (Johnson & Lawson, 1998; Lawson, 1992a). This finding is encouraging but also
could be expected due to the assumption that students enrolled in a science major course
would have a natural aptitude for science and its related skills. Lastly, because the
introductory biology course sequence is important for other majors, a comparison
between biology majors and the other participants was examined. There appeared to be
no difference between the biology majors and other students in the course.
                  AU1                    AU2                    WI
                  Bio       Not Bio      Bio       Not Bio      Bio       Not Bio
n                 116       275          84        234          32        27
M                 8.29      7.98         7.65      7.82         8.28      8.85
SD                1.81      1.86         2.56      2.37         2.80      2.60
% Scores ≥ 8      69.0      62.8         53.0      56.2         65.6      81.5
Note. All populations for each administration are normally distributed, save the WI administration for non-biology majors, which is negatively skewed (-1.69). Bio = biology majors, Not Bio = non-biology majors.
AU1 = 1st administration in autumn quarter, AU2 = 2nd administration in autumn quarter, and WI = 3rd
administration in winter quarter.
Table 9. Average Total LCTSR Scores by Administration and Major
Because the instruments were administered during the students' regularly
scheduled lab sections, a repeated measures MANOVA was performed for the two
autumn administrations, and a one-way ANOVA was performed on the winter
administration, to determine whether the different laboratory instructors had any effect
on the LCTSR scores and their change. For the autumn administrations, the
assumptions of equality of variances were met (Table 10). No significant interaction was
found between lab section and LCTSR test-score change (F(21, 277) = 3.80,
p = 0.098). There were, however, significant differences among the 22 autumn lab
sections (F(21, 277) = 1.78, p = 0.021, effect size = 0.12). With a Bonferroni adjustment for
the repeated post-hoc tests, section 3 (overall M = 6.05) was found to differ from both
section 12 (overall M = 9.25, p = 0.005) and section 14 (overall M = 8.71, p = 0.026).
                          AU1                        AU2
Autumn Lab Section        M        SD       n        M        SD       n
1                         7.95     1.67     20       7.10     2.25     20
2                         8.17     1.34     12       8.25     1.49     12
3                         7.30     1.42     10       4.80     2.20     10
4                         8.29     2.63     7        6.86     2.55     7
5                         7.93     1.64     14       7.71     2.30     14
6                         7.94     1.57     16       7.44     1.93     16
7                         8.60     2.70     5        7.80     1.79     5
8                         8.50     1.65     10       6.10     2.77     10
9                         8.40     1.35     10       7.70     2.75     10
10                        7.50     1.60     8        6.63     1.77     8
11                        7.88     2.13     16       7.19     2.37     16
12                        9.58     1.00     12       8.25     1.49     12
13                        7.82     1.89     11       7.36     2.06     11
14                        8.84     1.86     19       8.21     2.15     19
15                        8.18     1.47     17       6.65     1.73     17
16                        8.38     2.14     13       8.00     2.20     13
17                        8.17     1.76     18       7.83     2.36     18
18                        8.20     1.85     20       8.00     2.13     20
19                        8.20     1.64     5        6.80     2.28     5
20                        7.63     1.86     16       6.25     2.57     16
21                        8.58     1.61     19       7.68     2.47     19
22                        7.53     1.84     19       7.68     2.21     19

Table 10. Average AU1 and AU2 LCTSR Scores by Lab Section
For the winter quarter lab sections, a one-way ANOVA was performed. However,
the assumption of equality of variances did not hold, and 5 of the 13 sections had only
one or two participants, which did not permit post-hoc tests to identify individual
significant differences (Table 11). Therefore, even though the ANOVA demonstrated a
significant difference due to section (F(12, 46) = 2.89, p = 0.006), no practical
information could be gleaned to describe this influence. Because the only statistical
differences in the autumn data were between the lowest and highest average scores, and
there was no interaction between score change and lab section, the different laboratory
instructors and sections do not appear to be an intervening variable.
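A one-way ANOVA of this kind can be sketched with a short stdlib-only computation. The section scores below are illustrative, not the study's data; a tiny section like `section_c` (n = 2) is the kind that made post-hoc tests impossible in winter:

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA: between-groups mean square
    divided by within-groups mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical LCTSR scores for three lab sections.
section_a = [8, 9, 7, 8, 10, 9]
section_b = [6, 7, 5, 6, 7, 6]
section_c = [11, 8]
f_stat = one_way_anova_f(section_a, section_b, section_c)
print(round(f_stat, 2))
```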
Winter Lab Section        M        SD       n
23                        8.71     3.15     7
24                        11.00    0.00     2
25                        8.00     -        1
26                        8.88     1.36     8
27                        8.50     2.12     2
28                        11.50    0.72     2
29                        7.80     2.17     5
30                        5.43     3.87     7
31                        8.00     2.10     6
32                        9.67     1.63     6
33                        9.83     1.47     6
34                        10.40    1.52     5
35                        4.00     1.41     2
Table 11. Average WI LCTSR Scores by Lab Section
Relationship of Background Characteristics and LCTSR Autumn Quarter Scores
An exploratory multiple regression analysis was undertaken to determine whether
any demographic characteristic, or set of characteristics, influenced LCTSR scores.
Given the exploratory nature of the analysis, multiple regressions were performed on
both the AU1 and the AU2 LCTSR scores to identify any recurring characteristics and
provide cross-validation; the similarity of responses to the two instruments, and the
absence of an apparent significant score increase, warranted the use of the second
administration as a cross-validation sample. To keep the ratio of variables to number of
individuals low, several stepwise multiple regression analyses were conducted. In
addition, because the AU1 and AU2 LCTSR assessments were so similar, the AU1
scores were entered by force before the stepwise entry of the other independent variables
when the AU2 scores were regressed on them. This accounted for any variance due to a
test-retest factor.
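The forced-entry-then-increment logic can be sketched as follows. The data are simulated with hypothetical names and effect sizes, and the fit uses plain least squares rather than the study's SPSS stepwise procedure:

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an ordinary least squares fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

# Illustrative data: AU2 scores generated from AU1 scores plus a binary
# "other science major" indicator (names and coefficients hypothetical).
rng = np.random.default_rng(2)
au1 = rng.normal(8, 2, size=300)
major = rng.integers(0, 2, size=300).astype(float)
au2 = 3.0 + 0.5 * au1 + 1.0 * major + rng.normal(0, 2, size=300)

# Force AU1 in first, then add the candidate predictor; the R^2 increment
# isolates variance explained beyond the test-retest factor.
r2_base = r_squared(au1[:, None], au2)
r2_full = r_squared(np.column_stack([au1, major]), au2)
print(round(r2_full - r2_base, 3))
```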
The LCTSR scores for each autumn administration were regressed on the
demographic variables including age, gender, years as an undergraduate, number of
Advanced Placement courses, and number of college science courses as a set, as well as
university rank separately, to determine if maturation and previous experience influenced
LCTSR scores. These were chosen based on the relationship of the LCTSR to Piagetian
theory, which holds that physical maturation and experience are the factors that influence
HD reasoning skills (Piaget, 1972). No consistent predictors were found in either
analysis. The LCTSR scores were also regressed on the future plans variables, including
degree sought and post-baccalaureate plans, to determine if future interests may influence
HD reasoning. These variables were considered possible intervening variables on the
expectation that individuals seeking more science-related degrees and careers would have
a greater interest in sharpening their HD skills through alternative means, for example,
admission test preparation and independent research; such preparation could therefore be
considered part of a student's HD reasoning experience. These variables also did not
demonstrate any consistent relationship with LCTSR scores.
When considering experience as a factor, a final analysis regressed LCTSR scores
on participants’ choice of major. In both analyses of the AU1 and AU2 data, it was found
that a non-biology science major, such as chemistry or physics, demonstrated a positive
increase of approximately one point on LCTSR scores, as shown in Tables 12 and 13
(only the regression of AU2 LCTSR scores is shown for brevity). Although the choice
of a non-biology science major appears in both the AU1 and AU2 analyses and is
considered cross-validated, other majors were also found to have an effect on LCTSR
scores. The analysis of the AU1 data found a negative effect on LCTSR scores due to
choice of an allied health or agriculture major, while the AU2 data demonstrated a
positive effect due to declaring a business major. However, these three types of major
have very little in common and cannot be considered cross-validation for each other.
                              Intercorrelation
Variables                     X1      X2       X3       Y        M       SD
AU1 LCTSR Scores (X1)         1.00    0.15     0.03     0.42     8.14    1.76
Other Science Major (X2)              1.00     -0.04    0.17     0.06    0.23
Business Major (X3)                            1.00     0.11     0.02    0.15
AU2 LCTSR Score (Y)                                     1.00     7.77    2.45
Note. n = 299. For the Other Science Major, 1 = Declared Major and 0 = Not Declared Major. Only Other
Science Major is consistent when both the AU1 and AU2 LCTSR Scores are regressed on College Major.
Table 12. Summary Data of Regression of AU2 LCTSR Scores on College Major
Step    Variables               R²      R² Change    B       β       t       p
        (Constant)                                   3.14
1       AU1 LCTSR Scores        0.18    0.18         0.56    0.40    7.58    0.000*
2       Other Science Major     0.19    0.01         1.23    0.12    2.22    0.028*
3       Business Major          0.20    0.01         1.75    0.11    2.07    0.039*

Note. n = 299.
* Significant at α = 0.05.
Standard error for predicted AU2 LCTSR scores from full model = 2.21
Adjusted R² for full model = 0.19
For model: F(3, 295) = 24.47, p = 0.000*
Table 13. Stepwise Entry Regression of AU2 LCTSR Scores on College Major
In each of the statistical analyses, the residuals were checked and no violations of
the assumptions for multiple regression were found. The stepwise regression model fits
well and the combination of the three variables accounted for a total of 19.9% of the
variance in the LCTSR scores, with the AU1 LCTSR scores contributing the majority
(17.5%) and the other science major variable contributing an additional 1.2%. Although
these results are cross-validated by a multiple regression analysis on the AU1 LCTSR
scores, the small contribution to the variance, the weak intercorrelations, and the large
sample size limit the practical significance of the results regarding college major.
Change in LCTSR Scores
To examine any significant developmental change in HD reasoning, LCTSR
scores were analyzed with a repeated measures MANOVA, also considering the possible
differences between biology majors and all other participants for comparison. To best
discern any developmental differences, two time periods were investigated. The first time
period included the first quarter of the introductory biology sequence while the second
time period encompassed the entire two-quarter sequence. Participants lacking data for
either administration were excluded list-wise from the analysis. This exclusion left a
total of only 27 individuals who completed all three administrations
(Table 14). Therefore, two separate analyses were run to better take advantage of the
greater number of participants in the autumn quarter. These separate analyses also
allowed for an investigation into the influence due to time in the introductory biology
sequence (one quarter versus two) and to better compare overall development by
comparing the same form at the AU1 and WI administrations. In each analysis, the
assumption for homogeneity of variance was met.
LCTSR    AU1    AU2    WI    AU1 & AU2    AU1 & WI    ALL
n        391    318    59    299          30          27
Table 14. Total Number of Individuals who Completed the LCTSR Instrument by
Administration
Change in overall total scores. Over the course of the first quarter, there was a
significant decrease in scores from the beginning to the end of the quarter, but no
difference between biology majors and non-biology majors (Tables 15 and 16). Driven
primarily by the difference between the AU1 and AU2 LCTSR scores, there at first
appears to be a significant interaction effect between LCTSR score and major; however,
this interaction explains only 1.4% of the variance and is therefore of little practical
interest. When a Bonferroni correction is applied for the two separate MANOVAs,
reducing α to 0.025, the interaction only approaches significance, but a statistically
significant difference in the LCTSR scores remains.
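The Bonferroni logic applied here can be checked with a short calculation, using the interaction p-value reported in Table 16:

```python
# Bonferroni adjustment for two separate repeated-measures MANOVAs:
# the family-wise alpha is divided by the number of tests in the family.
alpha = 0.05
n_tests = 2
alpha_corrected = alpha / n_tests  # 0.025

p_interaction = 0.039  # interaction p-value from Table 16
significant_uncorrected = p_interaction < alpha            # significant at 0.05
significant_corrected = p_interaction < alpha_corrected    # not at 0.025
print(significant_uncorrected, significant_corrected)  # True False
```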
                            M       SD      n
AU1    Biology              8.49    1.73    79
       Not Biology          8.01    1.76    220
AU2    Biology              7.66    2.59    79
       Not Biology          7.81    2.40    220

Mean Difference
Biology – Not Biology       0.16
AU2 – AU1 Overall           -0.52
Table 15. Descriptive Statistics of AU1 and AU2 LCTSR Scores by Major
Variables Tested                                   F        df        p         Effect Size
AU1 vs. AU2 LCTSR Scores                           11.47    1, 297    0.001a    0.04
Biology Majors vs. Not Biology Majors              0.48     1, 297    0.489     0.00
Interaction between LCTSR Scores and Major         4.32     1, 297    0.039b    0.01

Note. Wilks' lambda was utilized for the LCTSR scores and interaction comparisons' F tests.
a Statistically significant difference at α = 0.025. b Statistically significant difference at α = 0.05, but not at
the Bonferroni-corrected α = 0.025.
Table 16. Repeated Measures MANOVA Comparison of AU1 and AU2 LCTSR Scores
As it is highly unlikely that students lost reasoning skills over the course of the
quarter, some other factors must be considered to explain this drop in scores. One
possibility is that students, after encountering the workload of their first college-level
biology course, began to doubt their ability to do well in a college-level science major
course. This may have led to a reduction in self-efficacy with regard to their reasoning
ability and they therefore scored poorly. The plausibility of this explanation is difficult to
determine, as no data were available on the students' self-efficacy. However, Lawson et
al. (2007) found that scientific reasoning ability positively influenced self-efficacy, but
not vice versa, reducing the probability of a negative self-efficacy effect in this instance. Also,
the finding that the participants completed an average of more than three college-level
science courses weakens this argument, as the LCTSR is not solely biology-specific. The
more likely explanation is two-fold. First, Forms A and B were not as equivalent as
previously thought. The equivalency of the forms was difficult to determine, given the
nature of the instrument and the degree of independence of items when split. However,
the Pearson correlation between the two forms, as used in this study, was moderate to
substantial (biology majors: r = 0.54, p = 0.000, n = 79; non-biology majors: r = 0.38, p = 0.000, n = 220).
Second, the AU2 administration was given during the lab section in the last week of the
quarter before the holiday break. Upon data entry, an increase in random answer patterns
and a lack of sincerity in responses were noted for this administration. It is likely
that the students opted to complete the instruments as quickly as possible without much
effort or interest. A comparison of the AU1 and WI LCTSR scores was used to further
illustrate these possibilities (Tables 17 and 18).
                            M       SD      n
AU1    Biology              9.00    1.88    14
       Not Biology          8.69    1.40    16
WI     Biology              8.21    3.04    14
       Not Biology          9.19    2.56    16

Mean Difference
Biology – Not Biology       0.33
WI – AU1 Overall            -0.14
Table 17. Descriptive Statistics of AU1 and WI LCTSR Scores by Major
Variables Tested                                   F       df       p        Effect Size
AU1 vs. WI LCTSR Scores                            0.12    1, 28    0.735    0.00
Biology Majors vs. Not Biology Majors              0.21    1, 28    0.653    0.00
Interaction between LCTSR Scores and Major         2.36    1, 28    0.136    0.08
Note. Wilks’ lambda was utilized for the LCTSR scores and interaction comparisons’ F tests. No
statistically significant difference at the Bonferroni-corrected α = 0.025.
Table 18. Repeated Measures MANOVA Comparison of AU1 and WI LCTSR Scores
There were no significant differences found between the AU1 and WI LCTSR
scores or between biology and non-biology majors. Also, no interactive effect was found
between LCTSR scores and major. The lack of difference between the pre- (AU1) and
post-test (WI) administrations of the same form (A), even when the mid-test (Form B)
demonstrated a significant decrease, lends support to the explanation that the drop in
AU2 scores was most likely due to non-measured intervening variables (such as attitude
or self-efficacy). It must be acknowledged, however, that if self-efficacy in biology is a
factor, another quarter of coursework may have reinstated students' previous belief in
themselves and their abilities. Another factor lends credence to this explanation of the AU2 scores. The
administration of the instrument in the winter quarter was completed optionally at the end
of a laboratory meeting in which the students completed their final practical exam.
Therefore, the few students who completed the instrument during this administration
were more likely to be committed to the research process and answer to the best of their
ability. These results are then likely to be indicative of the apparent lack of HD reasoning
skills development during the two-quarter introductory biology sequence. The lack of
development even in this group points to the need for specific attention to improve these
skills.
Change in LCTSR item scores. With the finding that the AU2 LCTSR scores were
significantly lower than the AU1 scores, an investigation into differences among actual
item scores was undertaken to identify any particular types of scenarios that led to the
lower scores. For the AU1 and AU2 LCTSR administrations, a repeated measures
MANOVA was completed on the total scores for each scenario/pair of questions.
Participants were given a 1 for each correct answer, with the possible item-pair total
scores of 0, 1, or 2. The assumption of equality of variances was not met for the AU1
LCTSR item-pair scores; however, the sample sizes were equal and all multivariate test
statistics demonstrated the same F, p, and effect size values. The assumption of equality
of variances was met for the AU2 LCTSR item-pair scores. Overall, no distinct patterns
were found, as nearly all items in each administration were significantly different from
each other (Tables 19 – 22, Figures 6 and 7).
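The item-pair scoring rule described above (each item scored 0/1, consecutive items summed into pair totals of 0, 1, or 2) can be sketched as follows. This is an illustrative computation only; the response pattern is hypothetical and not drawn from the study data.

```python
# Sketch of the LCTSR item-pair scoring: each item is scored 0/1 and
# consecutive items (A1+A2, A3+A4, ...) are summed into pair totals of 0-2.

def pair_scores(item_scores):
    """Collapse a list of 0/1 item scores into totals for consecutive pairs."""
    assert len(item_scores) % 2 == 0
    return [item_scores[i] + item_scores[i + 1]
            for i in range(0, len(item_scores), 2)]

# A hypothetical participant's 12 LCTSR item scores:
items = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0]
print(pair_scores(items))  # six pair totals, each 0, 1, or 2
```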
[Figure: bar chart of mean score (scale 0 to 2.2) for each AU1 LCTSR item pair (A1, A2 through A11, A12), plotted separately for Biology and Not Biology majors.]

Figure 6. Mean AU1 LCTSR item-pair scores by major.
Variables Tested                                       F        df      p       Effect Size
AU1 LCTSR Item-Pair Scores                             319.73   5, 385  0.000*  0.81
Biology majors vs. Not Biology Majors                  6327.62  1, 389  0.124   0.01
Interaction between LCTSR Item-Pair Scores and Major   1.58     5, 385  0.164   0.02

Note. Wilks' lambda was utilized for the LCTSR scores and interaction comparisons' F tests. Biology n = 116, Not Biology n = 275, and Total n = 391. *Experiment-wise statistically significant difference at α = 0.05.

Table 19. Repeated Measures MANOVA Comparison of AU1 LCTSR Item-Pair Scores
Item Pair   A3, A4   A5, A6   A7, A8   A9, A10   A11, A12
A1, A2      0.000*   0.000*   0.000*   0.020*    0.000*
A3, A4               0.000*   0.000*   0.000*    0.000*
A5, A6                        0.000*   0.057     0.000*
A7, A8                                 0.000*    0.000*
A9, A10                                          0.000*

Note. Post-hoc comparison using a Bonferroni correction gives a statistically significant difference at α = 0.05.

Table 20. P-values from MANOVA Post-hoc Comparison of AU1 LCTSR Item-Pair Scores
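The Bonferroni-corrected post-hoc logic behind comparisons like those in Table 20 can be sketched as below: with m pairwise comparisons, each raw p-value is judged against α/m (equivalently, adjusted p-values are multiplied by m and capped at 1). The raw p-values and pair labels here are invented for illustration; the actual analysis was run in a statistical package.

```python
# Hedged sketch of Bonferroni-corrected pairwise comparisons.
from itertools import combinations

def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p, significant?) for each raw pairwise p-value."""
    m = len(p_values)
    return [(min(p * m, 1.0), p < alpha / m) for p in p_values]

# Four hypothetical item pairs give C(4,2) = 6 pairwise comparisons:
pairs = list(combinations(["A1,A2", "A3,A4", "A5,A6", "A7,A8"], 2))
raw_p = [0.0001, 0.004, 0.020, 0.0001, 0.057, 0.0001]  # one per comparison
for (a, b), (p_adj, sig) in zip(pairs, bonferroni(raw_p)):
    print(a, "vs", b, round(p_adj, 3), "significant" if sig else "n.s.")
```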
[Figure: bar chart of mean score (scale 0 to 2) for each AU2 LCTSR item pair (B1, B2 through B11, B12), plotted separately for Biology and Not Biology majors.]

Figure 7. Mean AU2 LCTSR item-pair scores by major.
Variables Tested                                       F        df      p       Effect Size
AU2 LCTSR Item-Pair Scores                             313.21   5, 312  0.000*  0.68
Biology majors vs. Not Biology Majors                  2520.61  1, 316  0.582   0.00
Interaction between LCTSR Item-Pair Scores and Major   1.51     5, 312  0.186   0.02

Note. Wilks' lambda was utilized for the LCTSR scores and interaction comparisons' F tests. Biology n = 84, Not Biology n = 234, and Total n = 318. *Experiment-wise statistically significant difference at α = 0.05.

Table 21. Repeated Measures MANOVA Comparison of AU2 LCTSR Item-Pair Scores
Item Pair   B3, B4   B5, B6   B7, B8   B9, B10   B11, B12
B1, B2      0.446    0.000*   0.000*   1.000     0.000*
B3, B4               0.000*   0.000*   0.045*    0.000*
B5, B6                        0.000*   0.000*    0.000*
B7, B8                                 0.000*    0.000*
B9, B10                                          0.000*

Note. Post-hoc comparison using a Bonferroni correction gives a statistically significant difference at α = 0.05.

Table 22. P-values from MANOVA Post-hoc Comparison of AU2 LCTSR Item-Pair Scores
As can be seen in both Figures 6 and 7, item pairs 7, 8 and 11, 12 demonstrate the
lowest mean scores. On both forms, Items 7 and 8 are related to an experimental scenario
regarding the preference of fruit flies for two different variables. These problems are
designed to address the identification and control of variables. In addition, Items 11 and
12 on each form are created to directly assess HD thinking and reasoning. It is possible
that the scores on Items 7 and 8 are lower due to difficulty reading the numerical values
on the figure associated with the scenario. Several individuals indicated this issue on their
instrument. However, if studied closely, especially on Form B, the actual numerical
values are not critical to understanding the problem. With regard to Items 11 and 12,
Form B's scenario focused on the consequence of placing red blood cells in a hypertonic
solution and its cause. The low scores on these items are somewhat surprising as the
participants completed this experiment earlier in the quarter. These results give some
indication of the particular trouble the participants have with HD reasoning. They do
appear to be competent in basic conservation of material and in proportional and
probabilistic thinking. However, in both of the item pairs highlighted as particularly
difficult, the skills investigated are directly related to experimental design, control, and
understanding, a critical skill set for science majors.
Argumentation
Initial Distributions
The initial distribution of the argumentation scores for each administration was
first investigated to determine the normality of the data and discern any readily
distinguished patterns (Table 23). As seen in the LCTSR scores, the AU2 administration
scores appear lower than either the AU1 or WI scores, with a noticeable decrease in the
WI scores of biology majors compared to non-biology majors. However, the standard
deviations of all three administrations appear to be constant, although a little large,
indicating a wide range of scores. The mean score was approximately between 5 and 6.
This finding is interesting as participants would be expected to do better on the first three
questions of the instrument relating to the initial argument. Based on the current
literature, individuals are more adept at creating an initial argument and have difficulties
recognizing, creating, and addressing alternative explanations (D. Kuhn, 1992; Toplak &
Stanovich, 2003; Zohar & Nemet, 2002). As the items are scored from 0 to 2, a strong
initial argument would yield a total score of 6, which is similar to the current mean.
Unfortunately, as this instrument was created for this particular study, there was no
comparison available to determine the exact meaning of an average score in this range. It
is also important to note again, however, that when scoring the instruments, if an
individual earnestly attempted any of the items, any remaining blanks were given a 0
instead of a "no response." This scoring method could also be skewing the scores lower,
especially for the alternative explanation items, 4 and 5.
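The scoring rule just described can be sketched as follows: each of the six argumentation items is scored 0 to 2, and blanks count as 0 whenever the participant earnestly attempted any item, while a fully blank instrument remains a "no response." The item responses below are hypothetical.

```python
# Illustrative sketch of the argumentation scoring rule (not the actual rubric).

def total_score(item_scores):
    """item_scores: list of six entries, each 0, 1, 2, or None for a blank."""
    attempted = any(s is not None for s in item_scores)
    if not attempted:
        return None  # a fully blank instrument stays "no response"
    return sum(0 if s is None else s for s in item_scores)

print(total_score([2, 2, 2, None, None, 0]))  # blanks count as 0 -> 6
print(total_score([None] * 6))                # never attempted -> None
```

Scoring skipped items as 0 (rather than excluding them) pulls the totals down for participants who gave up on the alternative explanation items, which is the skewing effect noted above.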
       AU1               AU2               WI
       Bio    Not Bio    Bio    Not Bio    Bio    Not Bio
n      119    293        91     253        30     29
M      6.17   6.08       5.42   5.40       5.17   6.00
SD     2.43   2.41       2.29   2.53       2.23   2.70

Note. All populations for each administration are normally distributed. AU1 = 1st administration in autumn quarter, AU2 = 2nd administration in autumn quarter, and WI = 3rd administration in winter quarter.

Table 23. Average Total Argumentation Scores by Administration and Major
As with the LCTSR scores, a repeated measures MANOVA was performed for
the two autumn administrations and a one-way ANOVA was performed on the winter
administration to determine if there was any effect due to different lab sections. For the
autumn administrations, the assumptions of equality of variances were met (Table 24). It
was found that there was a significant interaction between lab section and argumentation
score change (F(21, 318) = 1.92, p = 0.010) with an effect size accounting for 11.3% of the
variance in scores. There were also significant differences found among the autumn lab
sections (F(21, 318) = 2.32, p = 0.001) with an effect size accounting for 13.3% of the
variance. However, with a Bonferroni adjustment for the post-hoc tests, only section 10,
which had the lowest mean score (overall M = 4.63), was found to differ from section 12,
which had the highest mean score (overall M = 7.29, p = 0.034). This may be of some
concern because section 12 was also one of the highest scoring sections for the LCTSR.
                      AU1                AU2
Autumn Lab Section    M      SD    n     M      SD    n
1                     7.00   2.32  21    6.43   2.25  21
2                     5.07   2.49  15    4.53   3.18  15
3                     6.62   2.10  13    3.38   2.79  13
4                     5.82   3.19  11    5.73   2.24  11
5                     6.47   2.40  17    6.35   1.94  17
6                     6.25   2.02  16    6.06   2.74  16

Continued

Table 24. Average AU1 and AU2 Argumentation Scores by Lab Section
Table 24 continued

                      AU1                AU2
Autumn Lab Section    M      SD    n     M      SD    n
7                     6.57   2.23  7     6.29   2.29  7
8                     5.57   1.70  14    4.86   2.38  14
9                     6.42   2.50  12    6.50   2.24  12
10                    5.13   2.16  16    4.13   2.42  16
11                    6.53   2.64  15    5.40   2.64  15
12                    7.33   2.06  12    7.25   1.55  12
13                    4.10   2.18  10    6.00   2.67  10
14                    5.75   2.05  20    5.95   1.85  20
15                    7.16   2.12  19    6.00   2.19  19
16                    7.21   2.86  14    4.64   2.79  14
17                    6.30   2.11  20    5.20   2.42  20
18                    5.36   2.48  22    5.86   2.12  22
19                    4.90   3.38  10    4.20   1.32  10
20                    5.78   2.13  18    4.89   2.52  18
21                    7.59   2.06  17    4.76   2.17  17
22                    5.57   2.38  21    4.67   2.75  21
For the winter quarter lab sections, a one-way ANOVA was performed, meeting
the assumption of equality of variances (Table 25). There were no significant differences
found among the sections (F(11, 58) = 1.43, p = 0.193). Overall, because the only statistical
differences found in the autumn data were between the lowest and highest average
scores, there once again appears to be very little practical concern that different
laboratory instructors and sections were an intervening variable.
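The one-way ANOVA used for these section comparisons, along with the eta-squared effect size behind the "percent of variance" figures quoted above, can be sketched in a few lines of pure Python. The section data below are invented for illustration and are much smaller than the actual samples.

```python
# Minimal one-way ANOVA sketch with eta-squared as the effect size.

def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for lists of scores."""
    all_scores = [x for g in groups for x in g]
    grand = sum(all_scores) / len(all_scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_b, df_w = len(groups) - 1, len(all_scores) - len(groups)
    f = (ss_between / df_b) / (ss_within / df_w)
    eta_sq = ss_between / (ss_between + ss_within)  # proportion of variance
    return f, df_b, df_w, eta_sq

sections = [[6, 7, 5, 6], [4, 5, 4, 3], [7, 8, 6, 7]]  # hypothetical sections
f, df_b, df_w, eta = one_way_anova(sections)
print(round(f, 2), df_b, df_w, round(eta, 3))
```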
Winter Lab Section    M      SD    n
23                    6.33   2.25  6
24                    8.50   0.71  2
26                    5.78   2.05  9
27                    5.50   0.71  2
28                    6.00   1.41  2
29                    3.80   2.59  5
30                    3.67   3.50  6
31                    4.80   0.45  5
32                    6.14   2.12  7
33                    5.63   2.78  8
34                    7.60   2.70  5
35                    4.00   2.83  2

Note. Lab section 25 did not have any participants complete the argumentation instrument.

Table 25. Average WI Argumentation Scores by Lab Section
Relationship of Background Characteristics and Argumentation Autumn Quarter Scores
An exploratory multiple regression analysis was undertaken to determine if any
demographic characteristics had an influence on argumentation scores. Due to the
exploratory nature of the analysis, a multiple regression was performed on both the AU1
and the AU2 argumentation scores to identify any recurring characteristics and provide
cross-validation. Similar to the LCTSR, the responses on the two administrations were
sufficiently repetitive, and there was no significant score increase, to warrant the use of
the second administration as a cross-validation sample. To keep the ratio of variables to
number of individuals low, several stepwise multiple regression analyses were
conducted. In addition, due to the repetitive nature of the AU1 and AU2 argumentation
assessments, the AU1 scores were entered by force before the stepwise entry of the other
independent variables when regressing the AU2 scores on them. This accounted for any
variance due to a test-retest factor.
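A simplified sketch of this forced-entry logic is below. It is an illustration only: the scores are invented, a single hypothetical "courses" variable stands in for the demographic set, and the candidate's unique contribution is computed as the increment in R-squared beyond the forced AU1 predictor via residualization (the Frisch-Waugh idea), not the actual stepwise procedure of a statistical package.

```python
# Sketch: force AU1 into the model first, then ask how much additional
# variance (delta R^2) a candidate predictor explains in AU2 scores.

def simple_fit(xs, ys):
    """Least-squares fit of y ~ x (with intercept); returns predictions."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    b = num / den
    a = my - b * mx
    return [a + b * x for x in xs]

def r_squared(ys, preds):
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

au1 = [5, 7, 6, 9, 4, 8]       # forced predictor (test-retest factor)
courses = [2, 3, 2, 5, 1, 4]   # hypothetical candidate demographic predictor
au2 = [6, 7, 6, 10, 4, 9]      # outcome

base = r_squared(au2, simple_fit(au1, au2))
# Residualize both the outcome and the candidate on AU1; the fit between the
# residuals is the squared partial correlation, which scales to delta R^2.
res_y = [y - p for y, p in zip(au2, simple_fit(au1, au2))]
res_x = [x - p for x, p in zip(courses, simple_fit(au1, courses))]
delta = r_squared(res_y, simple_fit(res_x, res_y)) * (1 - base)
print(round(base, 3), round(delta, 3))
```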
As with the LCTSR scores, the argumentation total scores for each autumn
administration were regressed on the group of demographic variables, including age,
gender, years as an undergraduate, number of Advanced Placement courses, and number
of college science courses, as well as on university rank separately, to determine if
maturation and previous experience influenced the scores. Even though these variables
were originally chosen based on the relationship of the LCTSR to Piagetian theory, it
was believed that experience, especially the number of college science courses, might be
a positive influencing factor on argumentation scores. No consistent factors were found
in either autumn analysis. Multiple regression of the argumentation scores on the future
plans variables, including degree sought and post-baccalaureate plans, or on choice of
major also did not demonstrate any consistent influence. Overall, no consistent factors
were found between the two administrations or in common with the results found for the
LCTSR scores.
Change in Argumentation Scores
To examine any significant developmental change in argumentation skills,
argumentation scores were analyzed with a repeated measures MANOVA. Once again,
the possible differences between biology majors and all other participants were
considered for comparison. The two time periods between administrations, AU1 to AU2
and AU1 to WI, were examined to determine any effects due to time and form. This was
also important because participants lacking data for either administration were eliminated
using the list-wise option in the analysis. This elimination left a total of only 29
individuals who completed all three instruments (Table 26). Therefore, two separate
analyses were completed to better take advantage of the greater number of participants in
the autumn quarter. In all cases, the assumption for homogeneity of variance was met.
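The list-wise elimination described above can be sketched as follows: a participant is kept for a given comparison only if scores exist for every administration in that comparison. The participant records here are hypothetical.

```python
# Sketch of list-wise deletion across administrations.

def listwise(records, administrations):
    """Keep records with non-missing scores for all named administrations."""
    return [r for r in records
            if all(r.get(a) is not None for a in administrations)]

records = [
    {"id": 1, "AU1": 7, "AU2": 6, "WI": None},
    {"id": 2, "AU1": 5, "AU2": None, "WI": 6},
    {"id": 3, "AU1": 8, "AU2": 7, "WI": 9},
]
print(len(listwise(records, ["AU1", "AU2"])))        # 2 usable for autumn
print(len(listwise(records, ["AU1", "AU2", "WI"])))  # 1 three-time participant
```

Running the autumn and winter comparisons separately, as done here, keeps the larger autumn sample instead of shrinking every analysis to the few three-time participants.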
Administration(s) Completed    AU1    AU2    WI    AU1 & AU2    AU1 & WI    ALL
n                              412    344    59    340          31          29

Table 26. Total Number of Individuals Who Completed the Argumentation Instrument by Administration
Change in overall total scores. Over the course of the first quarter, there was a
significant decrease in argumentation scores from the beginning to the end of the quarter,
with the effect size explaining only 5% of the variance in the scores. There was no
difference between biology majors and non-biology majors or interactive effect due to
major (Tables 27 and 28). However, the decrease in the argumentation scores is troubling
yet again, as it is highly unlikely that students lost reasoning skills over the course of the
quarter. This repetition in the argumentation scores of the effect seen in the LCTSR
scores increases the likelihood that an intervening factor was at work. Once again, the
possibility of a self-efficacy decrease must be considered, although there is no additional
reason to assume it is a factor for the argumentation scores if it is unlikely to be a factor
influencing the LCTSR scores. As no factors were found with the regression, this leaves
the format of the administration and the different forms of the instrument as the most
likely influences.
                            M      SD     n
AU1    Biology              6.39   2.46   89
       Not Biology          6.05   2.40   251
AU2    Biology              5.43   2.29   89
       Not Biology          5.40   2.54   251

Mean Difference
Biology – Not Biology       0.18
AU2 – AU1 Overall          -0.81

Table 27. Descriptive Statistics of AU1 and AU2 Argumentation Scores by Major
Variables Tested                                      F       df      p       Effect Size
AU1 vs. AU2 Argumentation Scores                      17.93   1, 338  0.000*  0.05
Biology majors vs. Not Biology Majors                 0.62    1, 338  0.433   0.00
Interaction between Argumentation Scores and Major    0.69    1, 338  0.407   0.00

Note. Wilks' lambda was utilized for the argumentation scores and interaction comparisons' F tests. *Statistically significant difference at the Bonferroni-corrected α = 0.025.

Table 28. Repeated Measures MANOVA Comparison of AU1 and AU2 Argumentation Scores
The possibility of Form A and Form B not being equivalent is the stronger
possible intervening variable in this instance. The two forms each contained a scenario
and data table from a different topic in biology: Form A focused on evolution and
Form B focused on ecology. Even though the scenarios and corresponding data sets were
designed to be more non-specific in nature, this could have had an effect. When
correlating the two forms, this possibility becomes a concern. The correlations between
AU1 and AU2 argumentation scores by major were not significant for biology majors
(r = 0.20, p = 0.055, n = 89) and only weakly associated for non-biology majors
(r = 0.20, p = 0.002, n = 251). However, if the content of the forms was a true
problem, a difference in scores would be expected between biology majors and
non-biology majors, which was not the case. Another likely cause of the drop in the
scores is, once again, the desire of the participants to put less effort into the instruments
at the end of the quarter. Both these possibilities are supported by the comparison of the
AU1 and WI argumentation scores, where the significant decrease is no longer present
when comparing the same form (Tables 29 and 30).
                            M      SD     n
AU1    Biology              6.57   2.14   14
       Not Biology          6.82   2.70   17
WI     Biology              5.57   2.85   14
       Not Biology          6.53   2.85   17

Mean Difference
Biology – Not Biology      -0.61
WI – AU1 Overall           -0.65

Table 29. Descriptive Statistics of AU1 and WI Argumentation Scores by Major
Variables Tested                                      F      df     p       Effect Size
AU1 vs. WI Argumentation Scores                       2.67   1, 29  0.113   0.08
Biology majors vs. Not Biology Majors                 0.51   1, 29  0.480   0.02
Interaction between Argumentation Scores and Major    0.59   1, 29  0.380   0.03

Note. Wilks' lambda was utilized for the argumentation scores and interaction comparisons' F tests, Bonferroni-corrected α = 0.025.

Table 30. Repeated Measures MANOVA Comparison of AU1 and WI Argumentation Scores
There was no significant difference found between the AU1 and WI
argumentation scores or between biology and non-biology majors. Also, no interactive
effect was found between argumentation scores and major. The lack of difference
between the pre- (AU1) and post-test (WI) administrations of the same form (A), even
when the mid-test (Form B) demonstrated a significant decrease, lends support to the
explanation that the drop in the AU2 scores was most likely due to non-equivalent forms
or a non-measured intervening variable. Also, the few individuals who completed the WI
administration were probably more dedicated to providing valid data. Their lack of score
change over two complete quarters lends credence to the likelihood that the second
administration score decrease was influenced by an outside variable. Once again, though,
it must be acknowledged that, if self-efficacy in biology is a factor, it is possible that
another quarter may have reinstated previous belief in one's self and abilities. Regardless,
the overall lack of improvement in argumentation scores through two quarters of study
and the lack of any specific attention to argumentation implies the need for directed
intervention to improve these skills.
Change in argumentation subscale scores. The argumentation subscale scores
were examined to further clarify the composition of the participants’ argumentation
scores and difficulties encountered. Overall, with a visual inspection, it appears that
participants in this study are similar to those reported in the literature. Subscale scores for
alternative explanations seem lower than those for argument generation in each
administration (Figure 8). A repeated measures MANOVA was utilized to determine if
the ability to generate an argument or the ability to identify and rebut alternative
explanations were significantly different or changed over the course of the two quarters.
As with the other analyses, scores were compared in two time periods: AU1 to AU2 and
AU1 to WI. In each analysis, the assumption of equality of variances was met.
[Figure: bar chart of mean score (scale 0 to 2.2) for the argument and alternative explanation subscales at each administration (AU1, AU2, and WI), plotted separately for Biology and Not Biology majors.]

Figure 8. Mean AU1, AU2, and WI argumentation subscale scores by major.
The results of the repeated measures MANOVA indicate that there were
significantly lower average alternative explanation subscale scores than argument
subscale scores for each of the AU1 and AU2 administrations (Table 31). These score
differences do not interact with choice of major. This result was expected based on the
current literature (e.g., D. Kuhn, 1992; Toplak & Stanovich, 2003; Zohar & Nemet,
2002). Students were better able to identify and support one conclusion from the given
data than to identify any conflicting counterarguments. Difficulties in the argument
subscale primarily rested on conclusions that were too broad and on the lack of specific
data identification (e.g., the individual identified "the data table" as the source of
support). One possible consideration for the difference between the subscales is that
some participants seemed
unable to consider another person’s point of view with regard to the data set. Item 4 on
Form B asked, “A graduate student in your lab has an alternative conclusion with regard
to the data. What does she believe and why?” Several individuals replied on each
administration as individual 13-3 did for AU2, “I still don’t know how I am suppost (sic)
to know what someone else thinks." Taking this as a serious answer, it is interesting to
note the strong bias the participant has for his or her own point of view. Item 5 asked the
participants how they would counter the viewpoint offered in Item 4. Many individuals
recognized the need for strong empirical evidence to support an argument, often
answering that "more research needs to be done" instead of using the data given.
Although it is difficult to determine whether this was a serious answer or simply a quick
way to finish the instrument, it was not the intended purpose of the item, and such
answers were scored a 0. The combination of these two types of answers may have
contributed to the lower alternative explanation subscale scores. However, as the
earnestness of such answers could not be determined, it is assumed that participants had
difficulty recognizing and countering alternative conclusions with the information
provided.
Another finding from the MANOVA is that the initial decrease in total
argumentation score from AU1 to AU2 is due to a drop in each subscale score. This helps
characterize the drop in scores from AU1 to AU2 as likely due to an overarching
intervening variable that tended to affect all participants equally. This intervening
variable could be any of those previously described, such as a decrease in self-efficacy,
the different content area used in the two different forms, or lack of effort.
Variables Tested                                              F        df      p       Effect Size
AU1 Argument vs. AU1 Alternative Explanation (a)              18.42    1, 338  0.000*  0.05
AU2 Argument vs. AU2 Alternative Explanation (a)              151.03   1, 338  0.000*  0.31
Interaction among AU1, AU2, and Major (a)                     0.01     1, 338  0.929   0.00
AU1 Argument vs. AU2 Argument (b)                             9.63     1, 338  0.002*  0.03
AU1 Alternative Explanation vs. AU2 Alternative
  Explanation (b)                                             14.71    1, 338  0.000*  0.04
Interaction of Argumentation Subscales and Major (a)          0.35     2, 337  0.706   0.00

Note. (a) Wilks' lambda was utilized for the argumentation intra-subscale comparisons' F tests. (b) Inter-subscale comparisons were completed as univariate tests with a Bonferroni correction for alpha. *Statistically significant difference at α = 0.05. Biology n = 89, Not Biology n = 251, and Total n = 340.

Table 31. Repeated Measures MANOVA Comparison of AU1 and AU2 Argumentation Subscale Scores by Major
When comparing the AU1 and WI administrations, a slightly different pattern of
results was found, with no significant difference between the AU1 subscales. The smaller
number of individuals used for this analysis reduces the power to detect the differences
between the argumentation subscales that resulted from the previous analysis. The WI
scores still illustrated the significant difference between the argument and alternative
explanation subscales. However, just as the AU1 results in this comparison must be
treated with some skepticism due to a low n, so must this finding as well. It is also
interesting to note that there was a significant increase in the WI argument subscale from
the AU1 argument subscale, the first positive increase indicated. It was likely due to the
low n, though, as there was no difference between the alternative explanation subscales
and there was no overall difference between the two administrations. Lastly, there were
no interactions due to major for either combination.
Variables Tested                                              F       df     p       Effect Size
AU1 Argument vs. AU1 Alternative Explanation (a)              2.01    1, 29  0.167   0.07
WI Argument vs. WI Alternative Explanation (a)                7.79    1, 29  0.009*  0.21
Interaction among AU1, WI, and Major (a)                      0.03    1, 29  0.869   0.00
AU1 Argument vs. WI Argument (b)                              4.57    1, 29  0.041*  0.14
AU1 Alternative Explanation vs. WI Alternative
  Explanation (b)                                             0.39    1, 29  0.537   0.01
Interaction of Argumentation Subscales and Major (a)          0.57    2, 28  0.571   0.04

Note. (a) Wilks' lambda was utilized for the argumentation intra-subscale comparisons' F tests. (b) Inter-subscale comparisons were completed as univariate tests with a Bonferroni correction for alpha. Biology n = 14, Not Biology n = 17, and Total n = 31. *Statistically significant difference at α = 0.05.

Table 32. Repeated Measures MANOVA Comparison of AU1 and WI Argumentation Subscale Scores by Major
Correlation of Hypothetico-Deductive Reasoning and Argumentation
To determine the relationship between HD reasoning and argumentation skills, a
Pearson correlation was completed for each administration of the instruments by major
(biology or non-biology). In each instance, the resulting correlation was significantly
different from 0 and positive, as expected (Table 33). The values for the autumn
administrations were low to moderately correlated, while the winter administration
demonstrated substantial correlations. This discrepancy was most likely due to the
greater number of individuals participating in the autumn administrations, which
increased the variability of the scores. Even though the winter administration values were
much higher for each major, the n in each case was over 25, and therefore the
correlations can be considered relatively unbiased. Overall, these findings support the
hypothesis that HD reasoning and argumentation skills are positively related and
moderately correlated.
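The Pearson product-moment correlation used here can be computed directly from paired scores, as sketched below. The six paired LCTSR and argumentation scores are invented for illustration; the study's actual correlations appear in Table 33.

```python
# Pure-Python sketch of the Pearson product-moment correlation.
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

lctsr = [6, 8, 5, 9, 7, 10]  # hypothetical LCTSR totals
argum = [4, 6, 5, 8, 6, 9]   # hypothetical argumentation totals
print(round(pearson_r(lctsr, argum), 2))
```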
Correlation                             Major          n      r      p
AU1 LCTSR and AU1 Argumentation         Biology        114    0.23   0.015*
                                        Not Biology    270    0.20   0.001*
AU2 LCTSR and AU2 Argumentation         Biology        82     0.35   0.001*
                                        Not Biology    231    0.27   0.000*
WI LCTSR and WI Argumentation           Biology        28     0.55   0.003*
                                        Not Biology    24     0.50   0.013*

Note. All significance tests are two-tailed. *Statistically significant difference at α = 0.05.

Table 33. Pearson Product Moment Correlations between LCTSR and Argumentation Scores by Administration
Three-Time Participants
Of the 460 initial volunteer participants, 27 gave complete data for the LCTSR
instrument and 29 gave complete data for the argumentation instrument for all three
administrations. Of these small samples, 26 individuals completed both instruments at
each administration. As these few individuals out of the larger sample were the most
dedicated to the study, they warranted a closer look. As previously stated, the collection
of data during the winter quarter occurred at the end of the participants' last lab section
meeting, when they completed their final practical exam. Therefore, without any
incentive to participate, it can be inferred that those who chose to complete the
instruments took the research more seriously and responded to the best of their abilities.
With this in mind, this sub-set of participants may give a more precise picture of the
nature of SR development. To ensure that the three-time participants were representative
of all the participants, a MANOVA comparison of demographics was performed (Table
34). The assumption of equality of variances was not met; however, the robustness of the
MANOVA with a large sample size reduces this threat. In addition, one three-time
participant did not provide information on the number of undergraduate science courses
completed. Therefore, this variable was compared using an independent samples t-test
(Table 35). This analysis also did not meet the assumption of equality of variances, so the
test statistic that does not assume equality of variances was utilized.
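The unequal-variance test statistic referred to here is Welch's t, with the Welch-Satterthwaite approximation for the degrees of freedom (note the fractional df = 30.09 in Table 35). A minimal sketch follows; the two small groups of science-course counts are hypothetical stand-ins for the actual samples.

```python
# Sketch of Welch's t statistic (no equal-variance assumption) with the
# Welch-Satterthwaite degrees of freedom.
from math import sqrt

def welch_t(xs, ys):
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)  # sample variances
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    se_sq = vx / nx + vy / ny
    t = (mx - my) / sqrt(se_sq)
    df = se_sq ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df

t, df = welch_t([5, 7, 3, 8, 6], [3, 4, 3, 2, 4, 3])
print(round(t, 2), round(df, 1))
```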
For all of the demographic variables investigated, the three-time participants were
not significantly different from all other participants. A slight but non-significant
increase in the number of undergraduate science courses taken by three-time participants
was found. As this variable was hypothesized to have an intervening influence on the
results, this may be a cause for concern. However, the previous multiple regression
conducted on each of the LCTSR and argumentation scores found no consistent influence
due to this factor. This finding, coupled with the non-significance of the t-test, leads to
the conclusion that the three-time participants could be considered similar in
demographics to the other participants.
Demographic                   Population    n     M      SD    F     df      p
Biology Major                 Three-Time    30    0.43   0.50  2.33  1, 451  0.128
(Bio Mj = 1, Other = 0)       All Other     423   0.30   0.46
Age                           Three-Time    30    20.73  3.08  1.67  1, 451  0.194
                              All Other     423   20.19  2.14
Gender                        Three-Time    30    0.37   0.49  0.82  1, 451  0.367
(Female = 0, Male = 1)        All Other     423   0.45   0.50
Years as an Undergraduate     Three-Time    30    2.32   1.15  0.08  1, 451  0.772
                              All Other     423   2.25   1.23
Seeking a B.S.                Three-Time    30    0.97   0.18  3.63  1, 451  0.058
(BS = 1, Other = 0)           All Other     423   0.84   0.37
Go to Professional School     Three-Time    30    0.37   0.49  1.89  1, 451  0.170
(Yes = 1, Other = 0)          All Other     423   0.50   0.50

Table 34. MANOVA Demographics Comparison of Three-Time Participants with All Other Participants
Demographic                   Population    n     M     SD    t     df     p
Number of Science             Three-Time    29    4.34  2.88  2.00  30.09  0.054
Courses Taken                 All Other     405   3.25  2.06

Table 35. Independent T-test Comparison of Number of Science Courses Taken by Three-Time Participants and All Other Participants
106
Although there didn’t appear to be any demographic difference between the threetime participants and all other participants, the small sample size and disparity between
the number of those who completed the LCTSR and argumentation instruments needed to
be more closely examined. Table 36 presents the distribution of the three-time
participants by instrument type.
                                     LCTSR and
                                     Argumentation    LCTSR Only    Argumentation Only    Total
Number of Three-Time Participants    26               1             3                     30

Table 36. Number of Three-Time Participants Who Completed the LCTSR and Argumentation Instruments
A closer examination of the participants who completed only one instrument at all three
administrations did not reveal any particular patterns. The four individuals were a
combination of three sophomores and one senior, three females and one male, from a
variety of majors, and had different post-baccalaureate plans. They were also from
different lab sections. No consistent pattern was found in their individual LCTSR and
argumentation scores. However, the individual who only completed the three
administrations of the LCTSR instrument did score very low on the WI administration.
As this individual was not the only participant to score low on the WI LCTSR
administration, it is unlikely that this had a significant effect.
Change in Overall SR Scores
Because all previous statistical analyses eliminated participants' data using the
list-wise option, comparing all three administrations in one statistical test was limited.
With only approximately 30 individuals completing either WI instrument, a list-wise
elimination of the entire data set when comparing the three administrations together
would restrict even the AU1 to AU2 analyses to those 30 individuals and thus provide
limited information. With a focus on the three-time participants, this is no longer an
issue. In comparing the three-time participants as a subsample of the total participants,
the main interest is in determining if there was a different pattern of results than those
found when all participants were included. To this end, the previous statistical analyses
were conducted again on the three-time participant population.
Change in LCTSR scores. The LCTSR scores for all three administrations were
analyzed using a repeated measures MANOVA, also taking into consideration biology
majors versus non-biology majors (Tables 37 and 38). The assumption of equal variances
was met. It was first noted that the mean scores for the three-time participants appeared
slightly higher than those of all participants for each of the biology and non-biology
majors. Only the biology majors appeared to exhibit the same decrease in AU2 scores.
When compared in the MANOVA, there was no significant difference among the three
administrations and no interaction with biology major. This lack of difference also
supports the possibility that the two LCTSR forms were not as different as originally
thought and that the original decrease can be attributed to participant attitudes at the end
of the quarter. However, it also demonstrates that no increase was seen. If these
participants are hypothesized to be more dedicated students, it is possible that there may
be a ceiling effect. On the other hand, mean scores averaging 3 to 4 points below the
maximum score are unlikely to be the highest the "better" student participants can
achieve.
                            M      SD     n
AU1    Biology              9.00   1.86   15
       Not Biology          8.67   1.45   12
AU2    Biology              7.75   3.08   15
       Not Biology          9.13   2.03   12
WI     Biology              8.67   3.06   15
       Not Biology          9.13   2.64   12

Table 37. Descriptive Statistics of AU1, AU2, and WI LCTSR Scores by Major for Three-Time Participants
Variables Tested                                 F      df     p      Effect Size
AU1 vs. AU2 vs. WI LCTSR Scores                  0.44   2, 24  0.652  0.04
Biology majors vs. Not Biology Majors            0.46   1, 25  0.506  0.02
Interaction between LCTSR Scores and Major       2.13   2, 24  0.141  0.15

Note. Wilks' lambda was utilized for the LCTSR scores and interaction comparisons' F tests.

Table 38. Repeated Measures MANOVA Comparison of AU1, AU2, and WI LCTSR Scores for Three-Time Participants
Change in argumentation scores. The total argumentation scores were also
analyzed via a repeated measures MANOVA, considering all three administrations and
differences related to major (Tables 39 and 40). Similar to the LCTSR score results for
the three-time participants, the assumption of equality of variances was met and there
were no significant differences among the three administrations. There was also no
interaction between argumentation scores and major. In addition, unlike the
LCTSR scores, the overall means of each administration appear similar to those of all
participants, possibly indicating a general difficulty with the instrument. Regardless, this
second lack of development in SR scores, in a population believed to have earnestly
completed the instruments, lends credence to the previous finding that without a specific
instructional focus, no improvement in SR will be observed.
Administration  Major        M     SD    n
AU1             Biology      6.69  2.18  16
                Not Biology  6.63  2.67  13
AU2             Biology      5.85  2.51  16
                Not Biology  5.63  3.14  13
WI              Biology      5.62  2.60  16
                Not Biology  6.31  2.80  13

Table 39. Descriptive Statistics of AU1, AU2, and WI Argumentation Scores by Major for Three-Time Participants
Variables Tested                                     F     df     p      Effect Size
AU1 vs. AU2 vs. WI Argumentation Scores              1.50  2, 26  0.243  0.10
Biology majors vs. Not Biology Majors                0.03  1, 27  0.869  0.00
Interaction between Argumentation Scores and Major   0.78  2, 26  0.471  0.06

Note. Wilks' lambda was utilized for the argumentation scores and interaction comparisons' F tests.
Table 40. Repeated Measures MANOVA Comparison of AU1, AU2, and WI Argumentation Scores for Three-Time Participants
Change in LCTSR and Argumentation Subscale Scores
LCTSR item comparison. As no difference was found between the AU1 and AU2
LCTSR scores for the three-time participants, a repeated measures MANOVA was
conducted to determine if there was a difference in mean score patterns of the item pairs
(Table 41). The assumption of equality of variances was not met for this analysis.
However, MANOVA is relatively robust regarding this assumption. The pattern of AU1
Form A scores is similar to that of all participants (Figure 9). Item pairs A7, A8 and
A11, A12 again received the lowest scores, demonstrating difficulty with control of
variables and direct HD reasoning. An overall significant difference was found among
the item pairs with an effect size of 0.83, but no interactions among item-pair scores and
major were found. However, the pattern of significant differences found with item-pair
pairwise comparisons changed distinctly (Table 42). The only item pair retaining its
significant difference from all other item pairs was A7, A8. The difficulty of this item
pair in this population can signify that either the problem was truly difficult to decipher
or the participants sincerely had difficulty with this aspect of HD reasoning.
[Figure: line graph of mean scores (y-axis "Mean Score," 0 to 2.2) across the six AU1 LCTSR item pairs (A1, A2 through A11, A12), with separate lines for Biology and Not Biology majors.]
Figure 9. Mean AU1 LCTSR item-pair scores by major for three-time participants.
Variables Tested                                        F      df     p       Effect Size
AU1 LCTSR Item-Pair Scores                              22.99  5, 24  0.000*  0.83
Biology majors vs. Not Biology Majors                   0.82   1, 28  0.372   0.03
Interaction between LCTSR Item-Pair Scores and Major    0.91   5, 24  0.493   0.16

Note. Wilks' lambda was utilized for the LCTSR scores and interaction comparisons' F tests. Biology n = 13, Not Biology n = 17, and Total n = 30. *Statistically significant difference at α = 0.05.
Table 41. Repeated Measures MANOVA Comparison of AU1 LCTSR Item-Pair Scores for Three-Time Participants
Item Pair   A3, A4   A5, A6   A7, A8   A9, A10   A11, A12
A1, A2      0.033*   1.000    0.000*   1.000     0.023*
A3, A4               0.799    0.000*   1.000     1.000
A5, A6                        0.000*   1.000     0.329
A7, A8                                 0.000*    0.009*
A9, A10                                          0.210

Note. *Post-hoc comparison using a Bonferroni correction gives a statistically significant difference at α = 0.05. Underlined values represent those pairwise comparisons that are different from the same comparisons found using all participants' data.
Table 42. P-values from MANOVA Post-hoc Comparison of AU1 LCTSR Item-Pair Scores for Three-Time Participants
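The Bonferroni-corrected pairwise comparisons summarized above can be sketched as paired t-tests between every pair of item-pair scores, with each p-value multiplied by the number of comparisons and capped at 1. The data below are randomly generated for illustration; only the procedure matches the analysis described:

```python
from itertools import combinations

import numpy as np
from scipy import stats

def bonferroni_pairwise(data, labels):
    """Paired t-tests between all column pairs, Bonferroni adjusted.

    data: (n subjects) x (m measures) array.
    Returns {(label_i, label_j): adjusted p-value}.
    """
    data = np.asarray(data, dtype=float)
    pairs = list(combinations(range(data.shape[1]), 2))
    n_comp = len(pairs)                              # 15 for 6 measures
    adjusted = {}
    for i, j in pairs:
        t, p = stats.ttest_rel(data[:, i], data[:, j])
        adjusted[(labels[i], labels[j])] = min(p * n_comp, 1.0)
    return adjusted

rng = np.random.default_rng(0)
labels = ["A1,A2", "A3,A4", "A5,A6", "A7,A8", "A9,A10", "A11,A12"]
scores = rng.integers(0, 3, size=(30, 6)).astype(float)  # 0-2 per item pair
adj = bonferroni_pairwise(scores, labels)
for pair, p in sorted(adj.items()):
    print(pair, round(p, 3))
```

Capping the adjusted p-values at 1.0 is why many table cells read exactly 1.000.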
When scrutinizing the item-pair data for the AU2 Form B, a new pattern of mean
scores was revealed. This graph was more flattened in shape and distinct differences
between biology and non-biology majors were evident (Figure 10). As the assumption of
equality of variances was met, an overall significant difference among the item pairs with
a very large effect size (0.77) was still found by the MANOVA and no significant
interaction with major choice was identified (Table 43). However, as in Form A, a new
pattern of item-pair pairwise comparisons was found (Table 44). For the three-time
participants, the only item pair that retained its significant difference from nearly all
other item pairs was B11, B12. This once again highlights difficulties participants had with
direct HD reasoning.
[Figure: line graph of mean scores (y-axis "Mean Score," 0 to 2.2) across the six AU2 LCTSR item pairs (B1, B2 through B11, B12), with separate lines for Biology and Not Biology majors.]
Figure 10. Mean AU2 LCTSR item-pair scores by major for three-time participants.
Variables Tested                                        F      df     p       Effect Size
AU2 LCTSR Item-Pair Scores                              15.98  5, 24  0.000*  0.77
Biology majors vs. Not Biology Majors                   0.93   1, 28  0.343   0.03
Interaction between LCTSR Item-Pair Scores and Major    1.70   5, 24  0.173   0.26

Note. Wilks' lambda was utilized for the LCTSR scores and interaction comparisons' F tests. Biology n = 13, Not Biology n = 17, and Total n = 30. *Statistically significant difference at α = 0.05.
Table 43. Repeated Measures MANOVA Comparison of AU2 LCTSR Item-Pair Scores for Three-Time Participants
Item Pair   B3, B4   B5, B6   B7, B8   B9, B10   B11, B12
B1, B2      1.000    0.139    1.000    0.274     0.148
B3, B4               0.424    1.000    1.000     0.001*
B5, B6                        0.269    1.000     0.000*
B7, B8                                 1.000     0.000*
B9, B10                                          0.000*

Note. *Post-hoc comparison using a Bonferroni correction gives a statistically significant difference at α = 0.05. Underlined values represent those pairwise comparisons that are different from the same comparisons found using all participants' data.
Table 44. P-values from MANOVA Post-hoc Comparison of AU2 LCTSR Item-Pair Scores for Three-Time Participants
Argumentation subscale comparison. As with the total LCTSR scores, no
differences were found among the administrations of the argumentation instrument. It
was unknown whether this lack of difference was due to a score change in either the
subscale score or overall score. To better discern this pattern, in particular between AU1
and AU2, repeated measures MANOVAs looking at the differences between subscales
and quarters were used (Figure 11 and Table 45). The assumption for equality of
variances was met and no significant interactions due to major were found. In each
administration, scores on the alternative explanation subscale were significantly lower
than those on the argument subscale. However, neither subscale differed between AU1
and AU2. This implies that Form B was no more difficult than Form A and that any
overall change affected both subscale scores, not just one in particular.
[Figure: line graph of mean scores (y-axis "Mean Score," 0 to 2.2) across the four argumentation subscale administrations (AU1 Argument, AU1 Alter. Explan., AU2 Argument, AU2 Alter. Explan.), with separate lines for Biology and Not Biology majors.]
Figure 11. Mean AU1 and AU2 argumentation subscale scores by major for three-time participants.
Variables Tested                                            F     df     p       Effect Size
AU1 Argument vs. AU1 Alternative Explanation^a              9.95  1, 28  0.004*  0.26
AU2 Argument vs. AU2 Alternative Explanation^a              4.31  1, 28  0.047*  0.13
Interaction among AU1, AU2, and Major^a                     0.53  2, 27  0.595   0.04
AU1 Argument vs. AU2 Argument^b                             2.65  1, 28  0.115   0.09
AU1 Alternative Explanation vs. AU2 Alternative
  Explanation^b                                             1.12  1, 28  0.300   0.04
Interaction of Argumentation Subscales and Major^a          0.04  2, 27  0.962   0.00

Note. ^a Wilks' lambda was utilized for the argumentation intra-subscale comparisons' F tests. ^b Inter-subscale comparisons were completed as univariate tests with a Bonferroni correction for alpha.
*Statistically significant difference at α = 0.05. Biology n = 13, Not Biology n = 17, and Total n = 30.
Table 45. Repeated Measures MANOVA Comparison of AU1 and AU2 Argumentation Subscale Scores by Major for Three-Time Participants
Correlation of HD Reasoning and Argumentation
A Pearson correlation between the LCTSR and argumentation scores for each
administration by major was completed to determine if there were any differences
between the three-time participants and all other participants (Table 46). However, due to
the limited sample sizes (less than 25 per group), bias was likely a consideration. In
fact, only four of the six correlations were found to be significantly different from zero.
The significant correlations, though, were all positive and stronger than those found
using all participants; all could be characterized as substantial. Practically, although the
pattern of results is similar to that already found, the differences between the three-time
participants and all participants were likely due to bias and are not very informative.
Correlation                         Major        n   r     p
AU1 LCTSR and AU1 Argumentation     Biology      13  0.22  0.472
                                    Not Biology  17  0.63  0.007*
AU2 LCTSR and AU2 Argumentation     Biology      13  0.76  0.003*
                                    Not Biology  17  0.60  0.012*
WI LCTSR and WI Argumentation       Biology      12  0.77  0.004*
                                    Not Biology  14  0.44  0.112

Note. All significance tests are two-tailed. *Statistically significant difference at α = 0.05.
Table 46. Pearson Product Moment Correlations between LCTSR and Argumentation Scores by Administration for Three-Time Participants
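Each cell of the correlation table above is a two-tailed test of a Pearson product moment correlation. A minimal sketch with hypothetical paired scores (the numbers below are invented for illustration, not the study data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical paired scores for n = 13 students (e.g., one major group)
lctsr = rng.normal(loc=8.0, scale=2.0, size=13)
argumentation = 0.7 * lctsr + rng.normal(loc=0.0, scale=1.5, size=13)

# pearsonr returns r and the two-tailed p-value for the test that r = 0
r, p = stats.pearsonr(lctsr, argumentation)
print(f"r = {r:.2f}, p = {p:.3f}, n = {len(lctsr)}")
```

With samples this small, r is a high-variance estimate, which is the bias concern raised in the text.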
Summary of Key Findings
Overall, several key findings were determined:
1. Students enrolled in an introductory biology course initially had slightly higher
LCTSR scores than those individuals described in the literature. These scores did
not change overall during the course of the two-quarter sequence. Students
appeared to have difficulty with the control of variables and direct HD reasoning.
2. Declaring another science major may have been an influencing factor for LCTSR
scores; however, declaring a biology major or non-biology major was not a factor
in any score comparison.
3. Students taking an introductory biology course demonstrated difficulties with
generating and rebutting alternative explanations compared to creating an initial
argument. These scores did not change overall during the course of the twoquarter sequence.
4. No influencing factor could be determined for argumentation scores. In addition,
declaring a biology major or non-biology major was not a factor in any score
comparison.
5. There was a moderate positive relationship between HD reasoning and
argumentation scores.
6. When looking at a subsample of participants who completed instruments in all
three administrations, no improvement was seen in LCTSR scores and similar
difficulties with control of variables and direct HD reasoning were evident.
7. When looking at a subsample of participants who completed instruments in all
three administrations, no improvement was seen in argumentation scores and
similar difficulties with identifying and rebutting alternative explanations were
found.
8. Overall, the course offered no direct attention to HD reasoning or argumentation.
The lack of improvement in scores over the first quarter and entire two-quarter
sequence may have reflected this.
CHAPTER 5
DISCUSSION AND IMPLICATIONS
This study has addressed three gaps in the literature. First, it focused on
biology majors and how they compare to other
students taking the same introductory coursework. Very little is known about the
education of science majors in general and biology majors in particular. This is critical
due to the attrition of science majors and the increasing impact that science has on daily
life. This study has attempted to establish a baseline regarding biology majors’ scientific
reasoning skills from which interventions could be designed and evaluated. It was found
that biology majors are not much different from other undergraduate populations in the
literature or other individuals in their introductory courses. Second, the deductive aspects
and inductive aspects of scientific reasoning were investigated concurrently in this study.
Each type of reasoning is utilized in science and it is important that both aspects are
attended to in science education. By studying them concurrently, the relationships
between the two types of reasoning can be established and considered. This study
revealed that deductive and inductive reasoning, as measured by the LCTSR and
argumentation instrument, are moderately correlated. Lastly, this study focused on
individuals’ argumentation abilities regarding scientific scenarios. As argumentation of
experimental findings is an important aspect of the culture of science, this skill needs to
also be assessed using scientific data. The students in this study demonstrated similar
difficulties with the recognition and rebuttal of alternative explanations, as seen in other
populations in the literature. Overall, this study offers information to begin to fill these
gaps in the science education literature.
Limitations of the Study
Although most of the study results are statistically strong, there are several
limitations to this study that restrict the practical extension of the findings. The first
limitation is due to the instrumentation utilized in this study. The LCTSR has well-documented validity and reliability, and it has been used repeatedly in studies of
undergraduates. However, its original strength and internal reliability lie in the
redundant nature of the items assessing the six aspects of HD reasoning. By splitting the
instrument in half, this internal reliability was reduced and some validity was lost.
is also the first use of the argumentation instrument. The validity and reliability of the
instrument are promising, but need to be further developed and studied, especially in
comparison to the more regular use of oral interviews or discourse analysis to assess
argumentation skills. Lastly, the administration of two different forms for each
instrument likely influenced the AU2 results. To account for differences among lab
sections and to ensure that three-time participants all completed the same order of
instruments, it was decided to present only one form at each administration. An
alternative would have been to randomly assign the two forms for each instrument to lab
sections for the AU1 administration and reverse the forms for the AU2
administration. This method would have accounted for a difference in the equivalency of
the forms. However, the process of assuring that each participant would then receive the
correct form for the WI administration was prohibitive due to the inability to know which
participants would continue in the study in the second quarter.
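The reliability cost of halving the LCTSR can be made concrete with the standard split-half estimate and the Spearman-Brown correction: the raw correlation between two half-tests understates full-test reliability, and halving the instrument forfeits that length-based boost. A sketch with simulated dichotomous item responses (the data and the simple odd-even split are assumptions for illustration; this is not how the LCTSR halves were actually formed):

```python
import numpy as np

def split_half_reliability(items):
    """Odd-even split-half reliability with Spearman-Brown correction.

    items: (n respondents) x (k items) array of item scores. Returns the
    raw correlation between the two half-test totals and the corrected
    full-test estimate r_sb = 2r / (1 + r).
    """
    items = np.asarray(items, dtype=float)
    odd = items[:, 0::2].sum(axis=1)
    even = items[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd, even)[0, 1]
    return r, 2 * r / (1 + r)

rng = np.random.default_rng(3)
ability = rng.normal(size=100)                  # latent respondent ability
# 12 simulated 0/1 items that all load on the same ability
responses = (ability[:, None] + rng.normal(size=(100, 12)) > 0).astype(float)
r_half, r_full = split_half_reliability(responses)
print(f"half-test r = {r_half:.2f}, Spearman-Brown corrected = {r_full:.2f}")
```

The corrected value always exceeds the raw half-test correlation (for 0 < r < 1), which is the boost a half-length form gives up.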
Another aspect of the study that places limitations on the findings is the
instrument administration process. As previously noted, the teaching assistants
distributed the instruments in the regular lab sections of the introductory biology courses.
Students were given no incentive to participate and this may have limited the validity of
the results by limiting the motivation of the participants. This is especially a concern for
the winter administration. Out of approximately 400 students enrolled in the second
introductory biology course, only 59 completed the instruments. By offering the students
the instrument as an option at the end of their laboratory practical exam during the last
week of the course, there was very little impetus for the students to participate in the
study. The limited number of individuals who then completed instruments at all three
administrations possibly introduced bias to the results that concern both quarters.
However, it is difficult to assess this bias as no significant difference was found over the
two quarters. Lastly, although many studies have demonstrated an improvement in SR
skills over a short period of time (e.g. Lawson, 1992a; Zohar & Nemet, 2002), it is
possible that two quarters was not a long enough time to detect any improvement in SR
skills.
A last limitation of the study regards the support for the theoretical model given in
Figure 2. Because the study focused solely on introductory-course students, whose SR
skills are limited, only a low positive relationship could be established between HD
reasoning and argumentation skills. The relationship between these two aspects of SR needs to be
further examined with individuals at presumed higher levels of SR skill, such as upper-level undergraduates, graduate students, and scientists. Studying individuals with higher
levels of SR ability may also make it easier to tease out relationships among the various
underlying skills identified by the LCTSR and argumentation instruments. Overall,
though, the findings in this study cannot be extended much beyond the particular
circumstances identified here.
The Assumption of Natural Development of SR Skills
Professors of science have generally assumed that participation in regular
undergraduate science coursework will implicitly foster individuals’ scientific reasoning
as they memorize facts and understand concepts. Yet, these same professors are troubled
time and again when students have difficulties writing laboratory reports or applying new
information to an experimental situation. Hogan and Maglienti (2001) established that
scientists do not come to truly understand what characterizes good scientific reasoning
until they participate in the science process themselves, arguing their conclusions to their
peers. This study complements this finding, as the participants did not have an experience
even close to that of working in an actual research laboratory. In the coursework, students
were examined with primarily factual-recall questions and had a low level of inquiry in
their laboratory exercises. This two-quarter introductory biology course could be
considered typical at a large university. Very little attention was paid to scientific
reasoning, even implicitly. With little to no attention, the students in the study did not
increase their scores over one or both quarters. Even the biology majors and three-time
participants, who would have been expected to improve due to their interest in their own
scientific reasoning, did not exhibit any significant improvement. In addition, the number
of previous science courses was not found to be an influencing factor on either HD
reasoning or argumentation skills. However, although the findings do not establish any
SR skills improvement, this conclusion needs to be moderated as the validity and
reliability of the instruments used was weakened due to their administration.
These findings do imply that to improve SR skills, specific explicit attention
needs to be focused on them as part of the learning goals of the course. Previous studies
(Jiménez-Aleixandre et al., 2000; Osborne et al., 2004; Zohar, 1996; Zohar & Nemet,
2002) have demonstrated that directed intervention improves argumentation skills. The
control group individuals in these studies did not significantly improve their
argumentation skills compared to those who experienced an explicit intervention. At the
college level, this type of intervention could occur in several different ways. One could
be a specific course, in addition to their regular coursework, designed to teach these types
of skills to biology or all science majors. A more likely scenario is one where instructors
consciously focus on aspects of good reasoning and test students on the use of these
skills. Even though this could more readily occur in laboratory courses through practical
use and application of SR skills, the lecture can and should be utilized as well through
discussing and modeling strong SR skills. Another possibility is for students to become
involved in laboratory research much earlier than their junior and senior years, when
most students enroll in independent research. Hopefully, the focus on improvements in
students’ scientific reasoning earlier in their college careers would also allow them to be
more connected to science as a process, reducing the feelings of alienation that
commonly lead to attrition (Seymour & Hewitt, 1997).
Particular Findings for Hypothetico-Deductive Reasoning
One of the expectations of this study was that biology majors would have a higher
initial HD reasoning score than the other populations in the literature. According to
Piaget (1972), HD reasoning in different topic areas develops based on experience and
aptitude. It was assumed that students interested in taking a major-level biology course
would have a greater aptitude for SR than students enrolled in a non-major course. It was
found that the mean LCTSR score placed the students in the low range of formal-reasoning ability, slightly higher than the concrete to transitional range generally found
with students enrolled in non-majors biology courses. Also, the percentage of students
who scored in this range or higher (53 – 81%) was higher than the 50% of students
previously reported for non-majors in biology (Johnson & Lawson, 1998; Lawson,
1992a). This implies that instructors of biology majors’ courses can expect their students
to come in with a higher level of HD reasoning, although not by a great deal.
One of the other key findings regarding HD reasoning in this study was the
students’ lower scores on the LCTSR item pairs assessing control of variables and direct
HD reasoning. These results indicate that biology majors and other students enrolled in
introductory biology courses have difficulty with two skills central to good scientific
research. This finding, coupled with the lack of instructional attention, does not bode well
for students’ future success in laboratory courses or independent research. These skills
need to be practiced for students to improve. This is difficult when students are primarily
given laboratory exercises in which the design of an experiment is already provided. HD
reasoning links the hypothesis, design, and predictions of an experiment together. If the
hypothesis and design are already provided, it may be difficult for students to practice
linking the reasoning behind them. Overall, students need more opportunities to design
their own experiments to help improve their HD reasoning, preferably in a Level 3 and
above inquiry setting.
Particular Findings for Argumentation
The analysis of argumentation in this study focused on the overall structure and
support of the argument, not the correctness of the content used. Overall, the
argumentation findings were similar to that in the literature. Students in this study
demonstrated a significant decrease in average scores for the recognition and rebuttal of
alternative explanations compared to the development of an initial argument (Osborne et
al., 2004; Zohar & Nemet, 2002). This finding reflects a "myside" bias and difficulty
recognizing other explanations for the same data set (D. Kuhn, 1992; Toplak &
Stanovich, 2003). In addition, the difficulties students displayed for initial argument
generation reflected those reported by Hogan and Maglienti (2001): overly broad claims
and a lack of specific data to support them. However, students did recognize the
importance of empirical data in supporting their own claim by often citing that “more
research was needed” to choose between their conclusion and the alternative.
Unfortunately, this failure to use the data given also indicates that students may have
difficulty taking a stand with regard to their conclusions. This could be due to the
perception that science is a collection of correct facts, not empirically supported theories
– a point emphasized by the focus on factual-recall in college biology instruction. Further
refinement of the argumentation instrument and a more focused study could help
determine whether students truly had difficulty identifying and rebutting alternative
explanations or simply were tired of answering questions on an instrument that did not
benefit them specifically.
The findings in the study imply that students first need to learn to develop a
strong scientific argument. To this end, students must also understand that science is
based on a preponderance of empirical data that is repeatedly argued to develop useful
theories and laws. Instructors could implement this in their classrooms by simply
modeling a classic biology experiment and the argument the researchers created for their
theory or model. Students should also be encouraged to analyze their own experimental
data in laboratories and to specifically support their analysis with theories learned in
class. In the laboratory exercises used by the courses in this study, the most common type
was the structured inquiry, which does provide students the opportunity to analyze their
own results (albeit with instructor/laboratory manual guidance) and come to their own
conclusions. In fact, one exercise focused on the historical development of the structure
of DNA and discovery of DNA replication through the analysis of alternative
explanations. However, often students are not expected to explicitly explain the theory
behind their conclusions or identify any possible alternative scientific explanations. This
may be precisely what is needed.
Biology Majors as a Population
This study adds to the little that is known about biology majors as a population
and begins to establish a baseline for future work. The most consistent result in the
statistical tests used was that the biology majors were not significantly different from the
other students enrolled in the introductory biology courses. This is somewhat surprising
as many of the other students in the course were alternative majors who needed only one
quarter of biology coursework. However, this does characterize biology majors as being
no more adept at HD reasoning or argumentation with a biology data set than other
students enrolled in the same course. This may also imply that biology majors have
difficulty recognizing these important underlying aspects of science as a process. Instead,
they are left to view biology as a science that is primarily a compilation of facts. This is
of particular concern for the retention of biology majors. Seymour and Hewitt (1997)
documented that one of the leading causes of attrition of science majors is a feeling of
isolation due to high loads of necessary fact memorization. If biology majors could
understand that science is more of a process than a collection of facts, this may help
alleviate some of the perceived isolation. It is also well understood that many students
who choose to major in biology often do so with a desire to enter medical school after
graduation. For both this population and the remaining biology majors who may wish to
go on to graduate school in biology, the strong development of SR skills is crucial to help
mold better doctors and better researchers.
This study then provides some evidence that the process by which the large
lecture introductory biology course is generally taught has not been helping biology
majors develop their SR skills early in the curriculum. “Methods of science” is often one
of the top objectives of the introductory courses. However, students in these classes may
be having difficulty understanding these methods well because they are not developing
the mental reasoning skills, both HD and argumentation, that underlie them. In addition,
the main learning goal of the introductory biology course is to understand basic content
knowledge. Yet, even this aspect of biology education is influenced by SR skills. Johnson
and Lawson (1998) found that HD reasoning skills, not prior knowledge, were a better
predictor of course achievement. Implicitly addressing these SR aspects of biology in the
laboratory and lecture does not appear to be enough to produce measurable improvement. SR
skills, both hypothetico-deductive and argumentative, should be specifically and
explicitly addressed in order to ensure that biology majors are reaching these course and
curriculum objectives. By placing more emphasis on SR skills early in the program, the
foundation can be laid for the remaining curriculum and work beyond.
Future Work
Future work related to this study will focus on two main aspects: instrument
refinement and factors that impact SR skills. The first goal is to further refine and
validate the argumentation instrument, particularly with oral interviews. Typically, the
research of argumentation is completed through oral interviews or discourse analysis. To
truly make the argumentation instrument useful, it will need to provide a similar level of
information as the current analytical systems. It will also be important to compare the
instrument as it stands with scientific data sets to a similar version with a socioscientific
topic. This could help bridge the gap between argumentation research on socioscientific
issues and scientific data. An end goal of the instrument refinement is not only to make it
a better academic research tool, but also to make it a useful action research tool for
teachers. The LCTSR, due to its reliability, validity, and ease of use, is often noted in
action research conducted both formally and informally. It would be useful if a pencil and
paper instrument could also be developed to assess students’ inductive reasoning of
scientific data. This could help classroom teachers formatively assess their students’
abilities and focus on areas of need.
The other main area of future study will be to follow the development of biology
majors’ scientific reasoning skills throughout the entire curriculum. This research will
allow for the identification of any key types of courses, experiences, or points in the
curriculum that particularly aid in SR skills development. This research may also help to
identify characteristics of these factors that could be developed for interventions to aid in
SR skills development. For example, given the intimate connection between SR and the nature of
science, it will be important to identify correlations between SR skills and students’
understanding of the nature of science. From this and the literature, future work can focus
on developing and implementing course and curriculum interventions. Part of this work
will also be identifying the impact of instructors’ attitudes and attentions related to giving
explicit instruction on scientific reasoning in both the lecture and laboratory. It will also
be of interest then to determine if these types of influences and intervention have any
rolling impact on the K-12 classrooms of biology majors who wish to teach.
Overall, the work in this study and beyond is only beginning to unravel the needs
of biology majors as a population. More information is needed regarding the
development of SR skills throughout the undergraduate program and factors that affect
this development, along with determining methods of intervention. From this point,
biology departments can more readily prepare their students for their future, whether it is
as researchers, medical professionals, teachers, or just scientifically literate individuals.
REFERENCES
Agnes, M. (Ed.). (2002). Webster's new world college dictionary (4th ed.). Cleveland,
OH: Wiley Publishing, Inc.
Allchin, D. (2003). Lawson's shoehorn, or should the philosophy of science be rated 'x'?
Science & Education, 12, 315-329.
American Association for the Advancement of Science. (1990). Science for all
Americans. New York: Oxford University Press.
Arlin, P. K. (1975). Cognitive development in adulthood: A fifth stage? Developmental
Psychology, 11(5), 602-606.
Baron, J. (1991). Beliefs about thinking. In J. F. Voss, D. N. Perkins & J. W. Segal
(Eds.), Informal reasoning and education (pp. 169-186). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Bender, H. (2005). The study of biology. Change, 37(2), 42-43.
Bonnstetter, R. J. (1998). Inquiry: Learning from the past with an eye on the future.
Electronic Journal of Science Education, 3(1), Guest Editorial.
Bransford, J. D., Brown, A. L., & Cocking, R. R. (Eds.). (2000). How people learn:
Brain, mind, experience, and school (Expanded ed.). Washington, D.C.: National
Academy Press.
Carter, J. L., Heppner, F., Saigo, R. H., Twitty, G., & Walker, D. (1990). The state of the
biology major. BioScience, 40(9), 678-683.
Cavallo, A. M. L., Rozman, M., Blickenstaff, J., & Walker, N. (2003/2004). Learning,
reasoning, motivation, and epistemological beliefs: Differing approaches in
college science courses. Journal of College Science Teaching, 33(3), 18-22.
Cerbin, B. (1988). The nature and development of informal reasoning skills in college
students. Paper presented at the National Institute on Issues in Teaching and
Learning, Chicago, IL. (ERIC Document Reproduction Service No. ED 298805).
Chiappetta, E. L. (1976). A review of Piagetian studies relevant to science instruction at
the secondary and college level. Science Education, 60(2), 253-261.
Chinn, C. A., & Brewer, W. F. (1993). The role of anomalous data in knowledge
acquisition: A theoretical framework and implications for science instruction.
Review of Educational Research, 63(1), 1-49.
Chinn, C. A., & Brewer, W. F. (1998). An empirical test of a taxonomy of responses to
anomalous data in science. Journal of Research in Science Teaching, 35(6), 623-654.
Committee on Undergraduate Biology Education to Prepare Research Scientists for the
21st Century. (2003). BIO2010: Transforming undergraduate education for future
research biologists. Retrieved May 8, 2006, from
http://www.nap.edu/catalog/10497.html
Daempfle, P. A. (2002). Instructional approaches for the improvement of reasoning in
introductory college biology courses: A review of the research. (ERIC Document
Reproduction Service No. ED468720)
Driver, R., Newton, P., & Osborne, J. (2000). Establishing the norms of scientific
argumentation in classrooms. Science Education, 84(3), 287-312.
Giere, R. N., Bickle, J., & Mauldin, R. F. (2006). Understanding scientific reasoning (5th
ed.). Belmont, CA: Thomson Wadsworth.
Harker, A. R. (1999). Full application of the scientific method in an undergraduate
teaching laboratory. Journal of College Science Teaching, 29(2), 97-100.
Hewson, M. G., & Hewson, P. W. (2003). Effect of instruction using students' prior
knowledge and conceptual change strategies on science learning. Journal of
Research in Science Teaching, 40(Supplement), S86-S98.
Hodson, D. (1996). Laboratory work as scientific method: Three decades of confusion
and distortion. Journal of Curriculum Studies, 28(2), 115-135.
Hogan, K., & Maglienti, M. (2001). Comparing the epistemological underpinnings of
students’ and scientists’ reasoning about conclusions. Journal of Research in
Science Teaching, 38(6), 663-687.
Inhelder, B., & Piaget, J. (1958). The growth of logical thinking from childhood to
adolescence: An essay on the construction of formal operational structures (A.
Parsons & S. Milgram, Trans.). New York: Basic Books, Inc.
Jiménez-Aleixandre, M. P., Rodríguez, A. B., & Duschl, R. (2000). "Doing the lesson" or
"doing science": Argument in high school genetics. Science Education, 84(6),
757-792.
Johnson, M. A., & Lawson, A. E. (1998). What are the relative effects of reasoning
ability and prior knowledge on biology achievement in expository and inquiry
classes? Journal of Research in Science Teaching, 35(1), 89-103.
Karplus, R. (1977). Science teaching and the development of reasoning. Journal of
Research in Science Teaching, 14(2), 169-175.
Kuhn, D. (1992). Thinking as argument. Harvard Educational Review, 62(2), 155-178.
Kuhn, D. (1993a). Connecting scientific and informal reasoning. Merrill-Palmer
Quarterly, 39(1), 74-103.
Kuhn, D. (1993b). Science as argument: Implications for teaching and learning scientific
thinking. Science Education, 77(3), 319-337.
Kuhn, D., & Pearsall, S. (2000). Developmental origins of scientific thinking. Journal of
Cognition and Development, 1(1), 113-129.
Kuhn, T. S. (1993). Logic of discovery or psychology of research? In J. H. Fetzer (Ed.),
Foundations of philosophy of science: Recent developments (pp. 364-380). New
York: Paragon House.
Kuhn, T. S. (1996). The structure of scientific revolutions (3rd ed.). Chicago: The
University of Chicago Press.
Lakatos, I. (1993). History of science and its rational reconstructions. In J. H. Fetzer
(Ed.), Foundations of philosophy of science: Recent developments (pp. 381-413).
New York: Paragon House.
Lawson, A. E. (1980). Relationships among level of intellectual development, cognitive
style, and grades in a college biology course. Science Education, 64(1), 95-102.
Lawson, A. E. (1982). The nature of advanced reasoning and science instruction. Journal
of Research in Science Teaching, 19(9), 743-760.
Lawson, A. E. (1983). Predicting science achievement: The role of developmental level,
disembedding ability, mental capacity, prior knowledge, and beliefs. Journal of
Research in Science Teaching, 20(2), 117-129.
Lawson, A. E. (1992a). The development of reasoning among college biology students:
A review of research. Journal of College Science Teaching, 21, 338-344.
Lawson, A. E. (1992b). What do tests of "formal" reasoning actually measure? Journal of
Research in Science Teaching, 29(9), 965-983.
Lawson, A. E. (1993). Using reasoning ability as the basis for assigning laboratory
partners in nonmajors biology. Journal of Research in Science Teaching, 29(7),
729-741.
Lawson, A. E. (1995). Science teaching and the development of thinking. Belmont, CA:
Wadsworth Publishing Company.
Lawson, A. E. (2000). Classroom test of scientific reasoning: Multiple choice version,
based on Lawson, A. E. (1978). Development and validation of the classroom test
of formal reasoning. Journal of Research in Science Teaching, 15(1), 11-24.
Lawson, A. E. (2003a). Allchin's shoehorn, or why science is hypothetico-deductive.
Science & Education, 12, 331-337.
Lawson, A. E. (2003b). The nature and development of hypothetico-predictive
argumentation with implications for science teaching. International Journal of
Science Education, 25(11), 1387-1408.
Lawson, A. E. (2005). What is the role of induction and deduction in reasoning and
scientific inquiry? Journal of Research in Science Teaching, 42(6), 716-740.
Lawson, A. E., Alkhoury, S., Benford, R., Clark, B. R., & Falconer, K. A. (2000). What
kinds of scientific concepts exist? Concept construction and intellectual
development in college biology. Journal of Research in Science Teaching, 37(9),
996-1018.
Lawson, A. E., Baker, W. P., DiDonato, L., & Verdi, M. P. (1993). The role of
hypothetico-deductive reasoning and physical analogues of molecular interactions
in conceptual change. Journal of Research in Science Teaching, 30(9), 1073-1085.
Lawson, A. E., Banks, D. L., & Logvin, M. (2007). Self-efficacy, reasoning ability, and
achievement in college biology. Journal of Research in Science Teaching, 44(5),
706-724.
Lawson, A. E., Clark, B., Cramer-Meldrum, E., Falconer, K. A., Sequist, J. M., & Kwon,
Y.-J. (2000). Development of scientific reasoning in college biology: Do two
levels of general hypothesis-testing skills exist? Journal of Research in Science
Teaching, 37(1), 81-101.
Lawson, A. E., Drake, N., Johnson, J., Kwon, Y.-J., & Scarpone, C. (2000). How good
are students at testing alternative explanations of unseen entities? American
Biology Teacher, 62(4), 249-255.
Lawson, A. E., & Johnson, M. (2002). The validity of Kolb learning styles and
neo-Piagetian developmental levels in college biology. Studies in Higher Education,
27(1), 79-90.
Lawson, A. E., & Weser, J. (1990). The rejection of nonscientific beliefs about life:
Effects of instruction and reasoning skills. Journal of Research in Science
Teaching, 27(6), 589-606.
Lawson, A. E., & Wollman, W. T. (2003). Encouraging the transition from concrete to
formal cognitive functioning - an experiment. Journal of Research in Science
Teaching, 40(Supplement), S33-S50.
Lehmann, I. J. (1963). Changes in critical thinking, attitudes, and values from freshman
to senior years. Journal of Educational Psychology, 54(6), 305-315.
Leonard, W. H. (1989). Ten years of research on investigative laboratory instruction
strategies. Journal of College Science Teaching, 18, 304-306.
Leonard, W. H. (2000). How do college students best learn science? An assessment of
popular teaching styles and their effectiveness. Journal of College Science
Teaching, 29(6), 385-388.
Marbach-Ad, G. (2004). Expectations and difficulties of first-year biology students.
Journal of College Science Teaching, 33(5), 18-23.
Means, M. L., & Voss, J. F. (1996). Who reasons well? Two studies of informal
reasoning among children of different grade, ability, and knowledge levels.
Cognition and Instruction, 14(2), 139-178.
National Research Council. (1996). National science education standards. Washington,
D.C.: National Academy Press.
National Research Council. (1999). Transforming undergraduate education in science,
mathematics, engineering, and technology. Washington, DC: National Academy
Press.
National Science Foundation. (1996). Shaping the Future: New expectations for
undergraduate education in science, mathematics, engineering, and technology.
Arlington, VA: National Science Foundation.
Nersessian, N. J. (1995). Should physicists preach what they practice? Constructive
modeling in doing and learning physics. Science & Education, 4(3), 203-226.
Osborne, J., Erduran, S., & Simon, S. (2004). Enhancing the quality of argumentation in
school science. Journal of Research in Science Teaching, 41(10), 994-1020.
Pascarella, E. T. (1987). The development of critical thinking: Does college make a
difference? Paper presented at the Annual Meeting of the Association for the
Study of Higher Education, Baltimore, MD. (ERIC Document Reproduction
Service No. ED292417)
Perkins, D. N. (1985). Postprimary education has little impact on informal reasoning.
Journal of Educational Psychology, 77(5), 562-571.
Perkins, D. N., Allen, R., & Hafner, J. (1983). Difficulties in everyday reasoning. In W.
Maxwell (Ed.), Thinking: The expanding frontier (pp. 177-189). Philadelphia, PA:
Franklin Institute Press.
Perkins, D. N., Farady, M., & Bushey, B. (1991). Everyday reasoning and the roots of
intelligence. In J. F. Voss, D. N. Perkins & J. W. Segal (Eds.), Informal reasoning
and education (pp. 83-106). Hillsdale, NJ: Lawrence Erlbaum Associates.
Perkins, D. N., & Salomon, G. (1989). Are cognitive skills context-bound? Educational
Researcher, 18(1), 16-25.
Phillips, D. C. (1978). The Piagetian child and the scientist: Problems of assimilation and
accommodation. Educational Theory, 28(1), 3-15.
Piaget, J. (1972). Intellectual evolution from adolescence to adulthood. Human
Development, 15(1), 1-12.
Piaget, J. (2003). PART I. Cognitive development in children: Piaget - Development and
learning. Journal of Research in Science Teaching, 40(Supplement), S8-S18.
Popper, K. R. (1965). Conjectures and refutations: The growth of scientific knowledge.
New York: Harper & Row.
Popper, K. R. (1993). Science: Conjectures and refutations. In J. H. Fetzer (Ed.),
Foundations of philosophy of science: Recent developments (pp. 341-363). New
York: Paragon House.
Sadler, T. D. (2004). Informal reasoning regarding socioscientific issues: A critical
review of research. Journal of Research in Science Teaching, 41(5), 513-536.
Sadler, T. D., & Zeidler, D. L. (2005). The significance of content knowledge for
informal reasoning regarding socioscientific issues: Applying genetics knowledge
to genetic engineering issues. Science Education, 89(1), 71-93.
Seymour, E., & Hewitt, N. M. (1997). Talking about leaving: Why undergraduates leave
the sciences. Boulder, CO: Westview Press.
Shaw, V. F. (1996). The cognitive processes in informal reasoning. Thinking and
Reasoning, 2(1), 51-80.
Sigma Xi. (1990). Entry-level undergraduate courses in science, mathematics, and
engineering: An investment in human resources. Research Triangle Park, NC:
Sigma Xi, The Scientific Research Society.
Staver, J. R., & Pascarella, E. T. (1984). The effect of method and format on the
responses of subjects to a Piagetian reasoning problem. Journal of Research in
Science Teaching, 21(3), 305-314.
Thornton, S. (2005). Karl Popper. Retrieved June 6, 2006, from
http://plato.stanford.edu/archives/sum2005/entries/popper/
Toplak, M. E., & Stanovich, K. E. (2003). Associations between myside bias on an
informal reasoning task and amount of post-secondary education. Applied
Cognitive Psychology, 17, 851-860.
Toulmin, S. E. (1958). The uses of argument. Cambridge, Great Britain: Cambridge
University Press.
Toulmin, S., Rieke, R., & Janik, A. (1984). An introduction to reasoning (2nd ed.). New
York: Macmillan Publishing Company.
van Gelder, T., & Bissett, M. (2004). Cultivating expertise in informal reasoning.
Canadian Journal of Experimental Psychology, 58(2), 142-152.
von Glasersfeld, E. (1993). Questions and answers about radical constructivism. In K.
Tobin (Ed.), The practice of constructivism in science education (pp. 24-38).
Hillsdale, NJ: Lawrence Erlbaum Associates.
Voss, J. F., & Means, M. L. (1991). Learning to reason via instruction in argumentation.
Learning and Instruction, 1, 337-350.
Windschitl, M., & Buttemer, H. (2000). What should the inquiry experience be for the
learner? American Biology Teacher, 62(5), 346-350.
Woll, S. B., Navarrete, J. B., Sussman, L. J., & Marcoux, S. (1998). College students'
ability to reason about personally relevant issues. Paper presented at the
American Psychological Association Annual Convention, San Francisco, CA.
(ERIC Document Reproduction Service No. ED424556)
Zeidler, D. L. (1997). The central role of fallacious thinking in science education. Science
Education, 81(4), 483-496.
Zohar, A. (1996). Transfer and retention of reasoning strategies taught in biological
contexts. Research in Science & Technological Education, 14(2), 205-219.
Zohar, A., & Nemet, F. (2002). Fostering students' knowledge and argumentation skills
through dilemmas in human genetics. Journal of Research in Science Teaching,
39(1), 35-62.
APPENDIX A
ARGUMENTATION INSTRUMENT
Form A
While climbing in the Rocky Mountains, you find an isolated valley that seems to have
been separated from the surrounding area by a landslide centuries ago. You are surprised
to find an unknown species of yellow flower from the family Asteraceae. You know that
there are several species nearby that could possibly be the original source of this
population. You collect data from these populations to try to determine the evolutionary
source of the new flower.
Flower       Height (m)   Petal Tip   Leaf Shape   Leaf Edge   Seed Dispersal
Unknown      1.0          Round       Uneven       Ragged      Bird
Daisy        0.75         Pointed     Uneven       Ragged      Wind
Sunflower    2.0          Round       Teardrop     Smooth      Bird
Dandelion    0.25         Pointed     Uneven       Ragged      Wind
Goatsbeard   0.5          Round       Uneven       Smooth      Bird
1. What is a conclusion that you can draw from the data regarding the evolutionary
source of this new flower?
2. What data are you using to support this conclusion?
3. What rationale links this data to your conclusion?
4. The local naturalist is excited by your find, but takes an alternative viewpoint with
regards to the data. What does he conclude?
5. How would you respond to this viewpoint?
Form B
You have been surveying Lake Erie for the last five years to better understand the
relationship among the zebra mussel, quagga mussel, round goby (fish), and small mouth
bass. To answer this question, you have collected the following data:
                   Average Mature Organism Density
                   per Year (#/sample)                 Preferred Food              Preferred Spawning
Organism           '00   '01   '02   '03   '04        Sources                     Habitat
Zebra Mussel       125   117    98    87    74        Protozoa                    Rocky, shallow
Quagga Mussel      143   135   100    84   121        Protozoa                    Rocky, shallow
Round Goby           8    11    17    21    10        Mussel larvae, Algae        Sandy, shallow
Small Mouth Bass     6     9    10    18    22        Immature Round Goby,        Rocky, shallow
                                                      Immature Mussels, Minnows
1. What is a conclusion that you can draw from the data regarding these
relationships?
2. What data are you using to support this relationship?
3. What rationale links this data to your conclusion?
4. A graduate student in your lab takes an alternative viewpoint with regards to the
data. What does she conclude?
5. How would you respond to this viewpoint?
APPENDIX B
ARGUMENTATION INSTRUMENT SCORING RUBRIC
Item 1 – Claim made
  0 – No claim made, or claim made is irrelevant to the data/scenario presented
  1 – Claim made is weakly related to or supported by the data/scenario presented, or is
      too broad/general
  2 – Claim made is clearly related to the data/scenario presented and is conservative

Item 2 – Grounds used
  0 – No grounds used, or grounds used are irrelevant to the data/scenario presented
      (“all data”)
  1 – Grounds given weakly support the claim made and/or are too general
  2 – Grounds given sufficiently support the claim made, identifying specific data and
      trends

Item 3 – Warrants given
  0 – No warrants given, or warrant given is irrelevant to the data/scenario presented or
      is completely unclear
  1 – Warrant weakly relates grounds to claim or is somewhat unclear
  2 – Warrant is valid in light of the grounds used and claim made

Item 4 – Counterargument generated
  0 – No counterargument generated, or counterargument generated is irrelevant to the
      data/scenario presented or is not opposed to the initial claim at all
  1 – Counterargument given is weakly opposed to the initial claim or weakly supported
      by/related to the data/scenario presented (no answer to “why”)
  2 – Counterargument given is clearly related to the data/scenario presented and
      opposes the initial claim

Item 5 – Rebuttal offered
  0 – No rebuttal offered, or rebuttal offered is irrelevant to the data/scenario presented
      (“both valid” or “more research needed”)
  1 – Weak rebuttal offered, not supported by grounds or merely an expansion on the
      warrant/claim
  2 – Rebuttal is clearly identified and supported by grounds, offers a new viewpoint
If the individual begins the assessment in earnest and leaves items blank, those items are
scored as 0.
APPENDIX C
STUDENT DEMOGRAPHIC INFORMATION INSTRUMENT
(WINTER 2007)
1. Are you willing to participate in the study entitled, “Scientific Reasoning Skills
Development in the First Courses of Undergraduate Biology Majors”?
YES        NO
2. What is your age?   ____________
3. What is your gender?
MALE ______
FEMALE ______
4. What is your university rank?
FRESHMAN   SOPHOMORE   JUNIOR   SENIOR   SENIOR+   GRADUATE STUDENT
5. How many years have you attended college as an undergraduate? __________
6. What is your declared or planned major?
____________________________
7. How many advanced placement (AP) science courses have you taken? ________
8. How many college-level science courses have you taken? __________________
9. *What quarter did you take Biology 1 or its equivalent? __________________
10. *If you did not take Biology 1 at OSU, where did you complete its equivalent?
_________________________________________________________________
This school follows (please check one): Semesters _____ Quarters _____
11. What degree are you working toward?
A.S. _____   B.A. _____   B.S. _____   M.S. _____
12. What are your plans after completing your degree?
JOB   BACHELOR’S DEGREE   GRADUATE SCHOOL: SCIENCE
GRADUATE SCHOOL: OTHER THAN SCIENCE   PROFESSIONAL SCHOOL (e.g., MEDICAL, DENTAL, ETC.)
If you are not pursuing a career in science, in what field are you planning on
pursuing a career?
___________________________________________
* These questions were only asked during the winter administration to Biology 2.