The Big-Fish-Little-Pond Effect - Research Center for Group Dynamics

[AERJ306728_correx2]
American Educational Research Journal
Month XXXX, Vol. XX, No. X, pp. X –X
DOI: 10.3102/0002831207306728
© AERA. http://aerj.aera.net
The Big-Fish-Little-Pond Effect: Persistent
Negative Effects of Selective High Schools
on Self-Concept After Graduation
Herbert W. Marsh
University of Oxford
Ulrich Trautwein
Oliver Lüdtke
Jürgen Baumert
Max Planck Institute for Human Development
Olaf Köller
Humboldt University
According to the big-fish-little-pond effect (BFLPE), attending academically
selective high schools negatively affects academic self-concept. Does the BFLPE
persist after graduation from high school? In two large, representative samples of German high school students (Study 1: 2,306 students, 147 schools;
Study 2: 1,758 students, 94 schools), the predictive effects of individual
achievement test scores and school grades on math self-concept are very positive, whereas the predictive effects of school-average achievement are negative (the BFLPE). Both studies showed that the BFLPE was substantial at the
end of high school and was still substantial 2 years (Study 1) or 4 years (Study
2) later. In addition, because of the highly salient system of school tracks
within the German education system, the authors are able to show that negative effects associated with school type (highly academically selective schools,
the Gymnasium) were similar—but smaller—than the BFLPE based on
school-average achievement.
KEYWORDS: academically selective schools, big-fish-little-pond effect, frameof-reference effects, German educational system, grade-on-a-curve effect,
multilevel modeling, self-concept
T
he concept of self is one of the oldest (James, 1890/1963) and most important constructs in the social sciences (Branden, 1994; Marsh & Craven,
2002, 2006). However, psychologists from the time of William James have recognized that objective accomplishments are evaluated in relation to frames of
reference (also see Festinger, 1954). Thus, James (1890/1963) indicated, “We
Marsh et al.
have the paradox of a man shamed to death because he is only the second
pugilist or the second oarsman in the world” (p. 310). Marsh (1974, 1984b,
1991, 1993, 2005; Marsh & Craven, 2002; Marsh & Parker, 1984) proposed the
big-fish-little-pond effect (BFLPE) to encapsulate frame-of-reference effects in
educational settings.
According to the BFLPE model, students compare their own academic
ability with the academic abilities of their classmates and use this social comparison impression as one basis for forming their own academic self-concepts. A negative BFLPE (a contrast effect) occurs where equally able
students have lower academic self-concepts when they compare themselves
to more able students and higher academic self-concepts when they compare themselves to less able students. For average-ability students who
attend a school where the average achievement level of other students is
high (hereafter referred to as a high-ability school), their academic abilities
HERB ERT W. MARSH is a professor of education at Oxford University, Department of HERBERT /
Education, Oxford OX2 6PY UK; e-mail: [email protected]. He is widely [cap]
published (350 articles in 70 journals, 55 chapters, 13 monographs, 350 conference
papers) and coedits the International Advances in Self Research monograph series
and is an ISI highly cited researcher. He founded the SELF Research Centre, with
members and satellite centers around the world. His areas of specialization include
self-concept, students’ evaluation of university teaching, and application of advanced
quantitative research methods to diverse areas of education and psychology.
ULRICH TRAUTWEIN is a research scientist at the Center for Educational Research at
the Max Planck Institute for Human Development, Lentzeallee 94, 14195 Berlin,
Germany; e-mail: [email protected]. His main research interests include
the effects of various characteristics of the learning environments
on student
achieve[author names
should all
have full cap initial
ment, self-concept, and interest.
cap for first letter of first and last name and
OLIVER LÜDTKE is a research scientist at the Center
Research
small for
capsEducational
for other letters
-- see other
at the Max Planck Institute for Human Development,
Lentzeallee
Berlin,
AERJ
articles 94,
for 14195
examples.]
Germany; e-mail: [email protected]. His main research interests include
the modeling of effects of classroom factors on individual development and the role
of personality development in school and university contexts.
JÜRGEN BAUMERT is the director of the Center for Educational Research at
the Max Planck Institute for Human Development, Lentzeallee 94, 14195 Berlin,
Germany; e-mail: [email protected]. His major research interests
include research on teaching and learning, large-scale assessment, cognitive and
motivational development in adolescence, and educational institutions as developmental environments.
OLAF KÖLLER is a full professor of educational research and director of the
Institute for Educational Progress at Humboldt University Berlin, Unterden Linden 6,
D-10099 Berlin, Germany; e-mail: [email protected]. His main research interests include educational assessment, the association between school achievement
and motivation, and social comparison processes.
2
Stability of the BFLPE
are below the average of other students in their school. According to the
BFLPE, this educational context will foster social comparison processes leading to academic self-concepts that are lower than if the same students
attended an average-ability school. Conversely, if these students attend a
low-ability school, their abilities will be above average in relation to other
students in the school, and the social comparison processes will result in
higher academic self-concepts. Hence, academic self-concepts depend not
only on a student’s academic accomplishments but also on the accomplishments of those in the educational setting that the student attends.
The BFLPE is typically operationalized (see Marsh & Craven, 2002) as a
path model (see Figure 1A) in which the effects of individual student achievement on academic self-concept are predicted to be positive and the effects of
school-average achievement are predicted to be negative. Empirical support
for this negative effect of school-average achievement on academic self-concept (the BFLPE) comes from numerous studies based on a variety of experimental and analytical approaches (see reviews by Marsh, 2005; Marsh &
Craven, 2002). Good support for the cross-cultural generalizability of the
BFLPE comes from studies conducted in Germany (e.g., Jerusalem, 1984;
Marsh, Köller, & Baumert, 2001), Hong Kong (e.g., Marsh, Kong, & Hau, 2000),
Israel (e.g., Zeidner & Schleyer, 1999), the United States (e.g., Marsh, 1991),
and Australia (e.g., Marsh, Chessor, Craven, & Roche, 1995). Whereas most
research has been limited to results from a single country, Marsh and Hau
(2003) tested the cross-cultural generalizability of the BFLPE with nationally
representative samples of approximately 4,000 15-year olds from each of 26
countries who completed the same self-concept instrument and achievement
tests as part of the Organization for Economic Cooperation and Development
(OECD) Program for International Student Assessment (PISA) study. Consistent
with a priori predictions, the standardized effects of individual student
achievement were substantial and positive (.38), and the effects of schoolaverage achievement were smaller and negative (–.21). Although the results
varied somewhat from country to country, this variation was small.
Distinctive Characteristics of the BFLPE
Multidimensionality and Domain Specificity
In self-concept research, there is growing support for a multidimensional
perspective of self-concept rather than a unidimensional perspective that
places primary reliance on a single global component of self-concept or selfesteem (Marsh & Craven, 2006). Particularly in educational settings, there is
increasing evidence that global self-esteem is nearly unrelated to a wide variety of educational outcomes (Marsh, 1993; Marsh & Craven, 1997). Consistent
with theoretical predictions and this growing support for a multidimensionsal
perspective, the BFLPE is very specific to academic components of selfconcept. Marsh and Parker (1984; Marsh, 1987) showed that there were large
negative BFLPEs for academic self-concept, but little or no BFLPEs on general
3
Marsh et al.
self-concept or self-esteem. Marsh et al. (1995) reported two studies of the
effects of participation in gifted and talented programs on different components of self-concept over time and in relation to a matched comparison group.
There was clear evidence for negative BFLPEs in that the academic self-concepts of students in the gifted and talented programs declined over time and
in relation to the comparison group. These BFLPEs were consistently large for
academic components of self-concept but were small and largely nonsignificant for four nonacademic self-concepts and for global self-esteem. Whereas
different studies have evaluated the BFLPE for different academic domains, the
most frequent tests are based on a global measure of academic self-concept
or on math self-concept (see review by Marsh & Craven, 2002).
Relation to Individual Student Ability
Do both the best and poorest students suffer the BFLPE? Marsh (1984a,
1987, 1991, 1993; Marsh et al., 1995; Marsh & Rowe, 1996) argued that attending selective schools should lead to reduced academic self-concepts for
students of all achievement levels, based on several different theoretical perspectives. In their review of BFLPE research, Marsh and Craven (2002; also
see Marsh & Hau, 2003) concluded that there is little evidence that the size
of the BFLPE varies systematically with individual student ability levels.
Juxtaposition Between Standardized Test Scores,
School Grades, and Grading on a Curve
It is important that tests of the BFLPE are based on achievement indicators that vary along a metric that is common to all individual students, classes,
and schools. For this reason, BFLPE studies typically use standardized achievement tests as the basis of both individual student and school-average achievement. This, however, creates a potential dilemma in that there is a growing
body of research demonstrating that academic self-concept is more strongly
related to school-based measures (e.g., school grades) than to standardized
achievement (Marsh, 1993; Marsh & Craven, 2006). This follows in that school
grades are a more direct source of feedback to students about their academic
accomplishments than test scores are—particularly ones that are not part of
the formal assessment process for which students do not prepare and do not
even receive feedback about their results. Also, school grades are likely to be
responsive to motivational processes (e.g., effort, conscientiousness, appropriate preparation, self-regulation, etc.) that mediate the relation between selfconcept and achievement—particularly when such characteristics are part of
the basis for assigning school grades. However, school grades typically reflect
an idiosyncratic metric that varies depending on the particular teacher, class,
or school. In particular, there is a widespread grading-on-a-curve phenomenon such that the percentage of students getting each grade does not vary
much in different schools—even when there are substantial differences in the
ability levels of students in the different schools; equally able students are
4
Stability of the BFLPE
likely to get lower grades in schools where the school-average achievement
levels are high than in schools where achievement levels of other students
are low.
An early BFLPE study (Marsh, 1987) shows how this grading-on-a-curve
phenomenon influences the BFLPE. Based on a large representative sample
of U.S. high schools, Marsh (1987; also see Marsh & Rowe, 1996) noted that
the BFLPE and the grading-on-a-curve effect have a similar rationale and are
mutually reinforcing. He found that the effects of school-average achievement
on academic self-concept were negative. The results also clarified the distinction between academic ability and grade-point average, their respective
influences on self-concept, and how this influenced the BFLPE. Consistent
with the grading-on-a-curve effect, equally able students have lower gradepoint averages in high-ability schools than in low-ability schools. This phenomenon is so widespread that university admission officers have to take it
into account when evaluating the high school transcripts of university applicants (Espenshade, Hale, & Chung, 2005). However, Marsh demonstrated that
this frame-of-reference effect influencing grade-point average was separate
from, but contributed to, the BFLPE on academic self-concept. More recently,
Trautwein, Lüdtke, Marsh, and Köller (2006) demonstrated a similar phenomenon in German schools.
Stability and Persistence of Effects
A critical issue with theoretical and practical implications is whether
BFLPEs are short term and ephemeral or stable and long lasting. In response
to the Marsh and Hau (2003) OECD PISA study, both Dai (2004) and Plucker
et al. (2004) suggested that the BFLPE might be temporary. Marsh and Hau
reviewed a number of studies suggesting that the size of the BFLPE remained
stable or even increased in size over time for students who remained in the
same selective school setting. In evaluating the stability of the BFLPE over time
(see Figure 1B), school-average ability is related to academic self-concept, collected on at least two occasions. Hence, the critical effects are the total, direct,
and mediated effects of school-average ability on academic self-concept at
Time 2 (T2). If the BFLPE were highly stable over time, then most of the effect
of school-average ability on T2 self-concept would be mediated through its
effect on Time 1 (T1) self-concept so that the direct effects of school-average
ability on T2 self-concept (represented by the corresponding path coefficient
in Figure 1B) would be close to zero. The direct effect of school-average ability could be positive even though the total effect was negative. This would
indicate that the negative indirect effect of school-average ability that was
mediated through T1 self-concept was offset to some extent by a positive direct
effect. To the extent that the direct effect of school-average ability is negative,
it would mean that there is a new (additional) negative effect of schoolaverage ability beyond the negative effect that can be explained in terms of
T1 academic self-concept.
5
Marsh et al.
A
Positive effects of Time 1
achievement on
academic self-concept
Time 1 individual
student
achievement
++
-
++
Time 1 schoolaverage
achievement
Time 1 individual
student academic
self-concept
negative effects of Time 1
school-average
achievement on academic
self-concept (BFLPE)
???
B
Time 1 individual
student
achievement
Positive effects of Time 1
achievement on
academic self-concept
++
++
Time 1 schoolaverage
achievement
-
Time 1 individual
student academic
self-concept
++
Time 2 individual
student academic
self-concept
Negative effectsof Time 1
school-average
achievement on academic
self-concept (BFLPE)
???
Figure 1. The Big-Fish-Little-Pond Effect (BFLPE). Theoretical predictions: (A)
traditional analysis based on a single wave of data, and (B) stability of BFLPEs
over time based on longitudinal data.
The Marsh, Köller, and Baumert (2001) study was particularly important
in demonstrating the temporal evolvement of the BFLPE. In 1991, former East
and West German students experienced a remarkable social experiment—the
reunification of very different school systems after the fall of the Berlin Wall.
Self-concept scores were collected at the start, middle, and end of the first
school year after reunification. East German students had not previously been
grouped according to achievement. For them, over the three data collection
waves, the BFLPE was initially small, then moderate, and then substantial by
the end of the year. West German students had attended schools based on
achievement grouping for the 2 years prior to reunification. For them, the
BFLPE was substantial at all three times. A large East-West difference in the
size of the BFLPE at the start of the year disappeared completely by the end
of the year. The BFLPE was stable for West German students who had previously been in academically selective schools, but for East German students
who had previously been in heterogeneous, mixed-ability schools, the size
of the BFLPE grew steadily larger during the first year of school following the
introduction of an ability-tracked school system.
6
Stability of the BFLPE
In a large Hong Kong study of students entering selective schools in
Grade 7 (Marsh, Kong & Hau, 2000), there was a substantial negative effect
of school-average ability in Grade 9 even after controlling for the substantial
negative effects in Grade 8. Also, in the large U.S. High School and Beyond
Study, Marsh (1991) demonstrated that for many outcomes there were new
negative effects of school-average achievement at the end of high school even
after controlling for those already experienced earlier in high school. This
research suggests that the BFLPE is likely to increase in size during the initial
period of adjustment after students are introduced to a major shift in the frame
of reference and provides no evidence that the effect declines during the
period students are in the same frame of reference.
Clearly, as long as students remain in the same high school and schoolaverage achievement is relatively stable so that the immediate frame of reference remains reasonably stable, there is ample evidence that the BFLPE
persists or may even increase in size. This is not surprising and is consistent
with the rationale underpinning the imposed social comparison paradigm
posited by Diener and Fujita (1997). Thus, for example, in classic research on
conformity to group norms, Sherif and Hovland (1961) reported that experimentally induced changes in the frame of reference were still evident when
participants subsequently made judgments after they were no longer part of
the original group. However, a more demanding challenge is to evaluate the
stability of the BFLPE on academic self-concept several years after graduation
from high school, when the frame of reference based on other students from
the same high school is not so salient and is no longer imposed by the immediate context. The current study is apparently unique in evaluating the stability of the BFLPE formed in high school for self-concept measures collected
at the end of high school and several years after graduation.
The Present Investigation
Our major focus is to evaluate the long-term stability and persistence of
the BFLPE. Based on two large, representative studies of German high school
students, we evaluate the effects of high-school-average achievement levels
on academic self-concepts at the end of high school and several years after
graduation, a time interval of 2 years in Study 1 and 4 years in Study 2. In
addition, because of the highly salient system of school tracking within the
German education system, we are able to contrast the effects of school-average achievement with those of school type (i.e., the effects associated with
attending highly academically selective schools, called the Gymnasium in the
German system).
Although the primary focus of our study is on individual and schoolaverage achievement based on a standardized achievement test, we also
examine the effects of school grades. Of particular interest is the extent to
which the negative effect of school-average achievement on academic selfconcept can be explained by the typical grade-on-a-curve phenomenon in
which equally able students receive lower grades in more academically
selective settings (Espenshade et al., 2005; Marsh, 1987).
7
Marsh et al.
An important feature of this research is the appropriate implementation
of a multilevel model analysis that is suitable for hierarchical data, in which
participants (e.g., students) are nested with naturally occurring groups (e.g.,
schools). Taking advantage of the appropriate tests of cross-level interactions
that are possible in multilevel models, we evaluate the generalizability of the
BFLPE by testing the extent to which the negative effects of school-average
achievement levels and school type (school-level variables) interact with individual student gender and student achievement levels (individual student-level
variables).
Based on research reviewed earlier (also see Figure 1), we propose the
following a priori hypotheses and research questions to guide our analyses
and presentation of results:
1. Both individual student achievement and school grades at the end of high
school have positive effects on academic self-concept at T1 (the end of high
school) and T2 (after graduation from high school). We ask whether additional
effects of T1 achievement on T2 academic self-concept remain after controlling for the effects of T1 academic self-concept (i.e., how changes in academic
self-concept over time are related to prior levels of achievement).
2. The effects of high-school-average achievement and high school type are negative for academic self-concept at T1 and T2. The effects of school-average
achievement are predicted to be systematically larger than school-type effects
and to be evident even after controlling for school type. However, we pose a
research question regarding the size and direction of school-type effects after
controlling for school-average achievement—whether the effects are merely
diminished substantially, as might be expected if school-type functions as a
rough, dichotomous proxy measure of school-average achievement, or whether
attending a prestigious school type contributes positively to self-concept (assimilation) after controlling for the negative BFLPE (contrast) associated with
school-average achievement. We also pose the research question of whether
these effects of school-average achievement and school type on T2 academic
self-concept remain after controlling for the effects of T1 academic self-concept
(i.e., how changes in academic self-concept over time are related to school-average achievement and school type).
3. School grades are a salient source of feedback to students about their academic
accomplishments. Because teachers typically grade on a curve, there is a frameof-reference effect for school grades (equally able students get lower school
grades in schools and classes where other students are particularly able) that is
similar to the BFLPE. Based on Marsh (1987), we predict that the BFLPE will be
partially mediated by school grades—that the sizes of the negative effects of
school-average achievement and school type are expected to be statistically significant but smaller after controlling for the effects of school grades.
4. As a research question, we ask whether the negative effects of school-average
achievement and school type are generalizable over responses by boys and girls
and over different levels of academic achievement. Although cross-level interactions between these school-level variables and individual student-level variables are typically small (see Marsh & Craven, 2002), there is not a sufficient
basis for making directional hypotheses. However, the generalizability of the
BFLPE across different groups of students has potentially important practical
implications for understanding the BFLPE.
8
Stability of the BFLPE
Study 1
Method
The German School Setting
The German school system is well known for its early and selective differentiation of students in different school types. Selection takes place after
Grade 4 (in a few states after Grade 6), when students are about age 10.
Although there is considerable variation across the German states in terms of
the number and quality of tracks (Baumert, Trautwein, & Artelt, 2003;
Trautwein et al., 2006), the majority of states have adopted a variant of the
tripartite system of Hauptschule (least-demanding track), Realschule (intermediate track), and Gymnasium (highest track). Hauptschule and Realschule
students graduate after Grade 9 or 10 and then typically enter the dual system, which combines part-time education at vocational school with on-thejob training. Students from traditional Gymnasium graduate after Grades 12
or 13, which is a prerequisite for university entrance. To overcome the strict
differentiation between Gymnasium and the other school types, several
German states now allow the best students from different tracks to enter
upper-secondary education at the vocational Gymnasium and the comprehensive school in order to qualify for university. Approximately 30% of students in the German school system qualify to go to university. Because
non-Gymnasium students leave secondary school after Grade 9 or 10, it was
not possible to include these students in this investigation.
Description of the Study and Sample
The data for Study 1 come from a large, ongoing German study (Transformation of the Secondary School System and Academic Careers, or TOSCA),
conducted at the Max Planck Institute for Human Development, Berlin, and
the Institute for Educational Progress at the Humboldt University, Berlin (see
Köller, Watermann, Trautwein, & Lüdtke, 2004). The data considered here are
based on students from 147 randomly selected upper secondary schools in a
single German state. The schools are representative of the traditional and vocational Gymnasium school types, which provide students with the qualifications
to attend university. Students in the traditional Gymnasium are a particularly
highly selected sample in terms of academic achievement.
A multistage sampling procedure was implemented to ensure that the
data were representative. Schools and students were randomly selected. The
participation rate at the school level was 99%, and a satisfactory participation
rate of more than 80% was achieved at the student level. At T1, the students
in the sample were in their final year of upper secondary schooling, with a
mean age of 19.51 years (SD = 0.77). Two trained research assistants administered materials in each school between February and May 2002. Students participated voluntarily, without any financial reward. At T1, all students were
asked to provide written consent to be contacted again later for a second wave
9
Marsh et al.
of data collection. At T2, participants were asked to complete an extensive
questionnaire, taking about 2 hours, in exchange for a financial reward of 10€
(about U.S. $12). Because the focus of this investigation is on the stability of
effects over time, actual analyses are based on responses by the 2,306 (48%)
students who completed the math self-concept instrument at T1 (final year of
high school) and T2 (2 years after graduation from high school). However, T1
school-average achievement was based on responses by all students from each
of the 147 schools who completed the achievement tests administered at T1.
Comparing students in the final sample with those in the original sample, students responding at T2 were more likely to be female, to be younger in age,
to be more mathematically able based on both test scores and school grades,
to come from a more academically selective school, and to have a higher math
self-concept (all effect sizes were modest, varying between .1 and .3).
Instruments
Math self-concept. Math self-concept was based on the German adaptation of the Self Description Questionnaire III, a multidimensional self-concept
instrument for late adolescents and young adults based on the Shavelson,
Hubner, and Stanton (1976; Marsh & Shavelson, 1985) model. In the German
adaptation (Schwanzer, Trautwein, Lüdtke, & Sydow, 2005), four researchers
with English as a second language translated all original items independently
of each other. Because the translation was intended to be used with a diverse
sample of students, there was an emphasis on using simple, common words
that would be easily understood by all students. Subsequently, using the assistance of a professional translator, the most appropriate translation was chosen
(and in some instances refined). Extensive pilot testing resulted in a short
German instrument with four items per scale with a 4-point (disagree to agree)
response format. The four items chosen per scale emphasized cognitive evaluations (e.g., “I’m good at mathematics”) rather than affective items (e.g., “I
like mathematics”) and had the highest factor loadings on their respective factors. The math self-concept items were administered at both T1 and T2. Marsh,
Trautwein, Lüdtke, Köller, and Baumert (2006) demonstrated strong support
for the convergent and discriminant validity of responses to this math self-concept scale. Thus, for example, math self-concept was substantially positively
related to math school grades (.71), math standardized achievement test scores
(.59), and taking advanced math courses (.51) but was nearly unrelated or even
negatively related to English and German outcomes.
Academic outcome measures. The mathematics achievement test consisted
of original items from the Third International Mathematics and Science Study
(TIMSS; e.g., Baumert, Bos, & Lehmann, 2000). A total mathematics achievement score was constructed using the original metric of the TIMSS study.
Although the test was low stakes (i.e., the results had no consequences and
student received no feedback on the results), seperate experimental studies
10
separate
Stability of the BFLPE
with random assignment to conditions showed that the addition of rewards
contingent on informational feedback, grading, and performance had no effect
on test performances (Baumert & Demmrich, 2001). However, in a separate
experimental study with random assignment to conditions, Baumert and
Demmrich (2001) showed that the addition of rewards contingent on informational feedback, grading, and performance had no effect on test performances.
All schools used a school grade scale that ranged between 0 points (very
low achievement) to 15 points (very high achievement). Grades were based
on the school grades that students had received on their report cards that covered work completed over approximately 6 months. Actual grades (based on
official school records) were available for students from a majority—but not
all—of schools considered in Study 1. The correlation between self-reported
grades and documented grades was r = .93 (p < .001) for students who had
nonmissing values for both constructs. However, to minimize missing values
and to maximize the similarity between Studies 1 and 2 (actual grades were
not available in Study 2), self-reported grades were used in both studies.
School-average measures of achievement were always based on test scores
rather than on school grades.
Statistical Analyses
Recent BFLPE studies (e.g., Lüdtke, Köller, Marsh, & Trautwein, 2005;
Marsh & Hau, 2003) have used multilevel modeling approaches. In most studies conducted in school settings, individual participant characteristics are confounded with those associated with groups (e.g., classrooms, schools, etc.)
because such groups typically are not established according to random
assignment. This clustering effect entails problems with respect to the appropriate levels of analysis, aggregation bias, and heterogeneity of regression
(Raudenbush & Bryk, 2002). Participants in the same group are typically more
similar to other participants in the same group than they are to participants in
other groups. Even when participants are initially assigned at random, they
tend to become more similar to each other over time. Furthermore, a variable
may have a very different meaning when measured at different levels. For
example, the BFLPE research reviewed here suggests that a measure of ability
at the student level provides an indicator of a student attribute, whereas
school-average achievement at the school level becomes a proxy measure of
a school’s normative environment. Thus, the average achievement of a school
has an effect on student self-concept above and beyond the effect of the individual student’s ability. Multilevel modeling is designed to resolve the confounding of these two effects by facilitating a decomposition of any
observed relationship among variables, such as self-concept and ability, into
separate within-school and between-school components (see Goldstein, 2003;
Raudenbush & Bryk, 2002; Snijders & Bosker, 1999). The juxtaposition of the
effects of individual achievement and class-average achievement is inherently
a multilevel issue that cannot be represented adequately at either the individual
11
Marsh et al.
or the classroom level. A detailed presentation of multilevel modeling (also
known as hierarchical linear modeling) is available elsewhere (e.g., Raudenbush
& Bryk, 2002; Snijders & Bosker, 1999).
By basing the analyses on responses by participants with both T1 and
T2 data, there were very few missing data (less than 1% missing responses).
Although missing data was not an important problem for the final sample,
we used the expectation maximization algorithm, a widely recommended
approach to imputation for missing data, as operationalized in SPSS (Version
11.5) to impute missing values for T1 math self-concept (T1MSC), math
grades, and math test scores. This procedure is preferable to traditional procedures such as listwise and pairwise deletion for missing data.
Based on strong theoretical models, it is appropriate and desirable to
posit causal effects and to test causal predictions based on appropriate statistical analyses. However, even in rigorous studies, causality cannot be
proven; it can only be shown that the data are consistent with strong tests of
causality. Hence, we have been careful to use the generic term predictive
effect, which does not imply causality, when discussing the actual results of
our statistical analyses.
Multilevel Models
All multilevel analyses in this investigation were conducted with the commercially available MLwiN statistical package (Version 2; see Rasbash, Steele,
Browne, & Prosser, 2005; also see Goldstein, 2003) that is specifically designed
to analyze multilevel data. To test the a priori predictions and research questions, a set of eight multilevel models was evaluated. In each model, either
T1MSC or T2 math self-concept (T2MSC) was the dependent variable. The
overall strategy was to begin with the most basic tests of the BFLPE and then
evaluate the consequences of adding variables or interaction terms in relation
to a priori predictions or research questions. In this sense, all the models are
a priori models.
In Models 1, 2, and 3, we tested the effects of the BFLPE based on individual achievement test scores in combination with school-average achievement (Model 1, the traditional basis of the BFLPE), school type (1 = traditional
Gymnasium, a selective school; 0 = other), and the combination of both
school-average achievement and school type. The purpose of these comparisons was to determine whether attending a selective school has a positive
(assimilation) effect on self-concept that offsets the negative (contrast) effect
of school-average achievement.
For each of these models, separate analyses were conducted for T1MSC
(Models 1A, 2A, and 3A), T2MSC (Models 1B, 2B, and 3B), and T2MSC with
T1MSC included as a predictor variable (Models 1C, 2C, and 3C). The purpose of these models was to compare the sizes of effects near the end of high
school (T1) and several years after graduation from high school (T2) and to
determine whether the effects of school-average achievement continued to
have negative effects at T2 after controlling for the effects observed at T1.
12
Stability of the BFLPE
Models 4, 5, and 6 paralleled the first three models except that they
included both achievement tests scores (as in the first three models) and high
school grades as predictor variables. The purpose of these models was
to compare the effects of test scores and school grades on academic selfconcept at T1, T2, or T2 controlling for T1. Particularly relevant is the extent
to which the BFLPE associated with school-average achievement is mediated
by school grades.
In Model 7, we explore whether the size of the BFLPE based on schoolaverage achievement is moderated by gender or individual achievement,
whether the effect is larger or smaller for girls than for boys, and whether the
effect is larger or smaller for students who are initially more able. This is
accomplished by adding appropriate main and interaction effects to those predictor variables in Model 4. In Model 8, we conduct parallel analyses in which
the BFLPE is based on school type (selective vs. nonselective) rather than
school-average achievement.
To facilitate the interpretation of the results and to reduce potential multicollinearity problems, we began by standardizing (z scoring) all variables to
have M = 0 and SD = 1 across the entire sample (Appendix, Study 1). Schoolaverage achievement was determined by taking the average of achievement
scores for students in each school (but not restandardizing these scores so that
individual student and school-average achievement scores were in the same
metric). Product terms were used to test interaction effects. In constructing
these product variables, we used the product of standardized (z score) variables, but the product terms were not restandardized. In this respect, all parameter estimates based on multilevel models in this investigation are
standardized parameter estimates in which the standardized metric is based on
the metric of the individual-level variables (for further discussion, see Marsh &
Rowe, 1996; also see Aiken & West, 1991; Raudenbush & Bryk, 2002). Total
effects can be divided into direct and indirect (or mediated) effects when there
is a potential mediating variable. In this investigation, for example, we predict
that the negative effects of school-average ability on T2MSC will be largely
mediated by T1MSC. It is, of course, relevant to evaluate the statistical significance of mediated effects using appropriate procedures such as those outlined
by Krull and MacKinnon (2001) that are applied to multilevel models. In this
investigation, whenever we discuss mediated effects, we tested the statistical
significance of these effects using what Krull and MacKinnon (p. 259) refer to
as the 2-1-1 model in which the first variable in the causal chain is a grouplevel variable (school-average ability in this investigation).
Preliminary Analyses: School-Level Variation in Test Scores,
Grades, and Self-Concept
In preliminary analyses, we evaluated the multilevel structure of math
achievement, math school grades, and math self-concept. Consistent with a
priori predictions based on previous research, there was substantial variation
between schools in terms of math achievement but substantially less variation
13
Marsh et al.
between schools in math school grades and particularly in math self-concept.
These differences are illustrated in terms of the caterpillar plots (Figure 2).
Thus, for example, in Figure 2A, the 147 schools (each school is represented
by one of the 147 vertical lines) are ranked in terms of math achievement, and
a 95% error bar shows how variable scores are within a given school. For math
achievement, many schools are clearly above or below the grand mean across
all schools (zero, because these scores were standardized). School intercepts
vary from more than a standard deviation below the mean to half a standard
deviation above the mean. Consistent with the grade-on-a-curve effect, there
is much less variation in school grades (Figure 2B). In contrast to math achievement, there are only a few error bars representing different schools that do not
overlap with the mean grade across all schools. Consistent with the BFLPE,
there is even less school-to-school variation in math self-concepts (Figure 2C).
The intraschool correlation is an index of the amount of variation in each outcome that can be explained in terms of differences between schools. Whereas
the intraschool correlation is substantial for math achievement (.26), it is much
smaller for math school grades (.04) and even smaller for math self-concept
(.02). In summary, consistent with the grade-on-a-curve effect and the BFLPE,
there is substantial school-to-school variation in math achievement scores but
much less variation in math school grades and math self-concept.
Results and Discussion
The BFLPE
In each of a set of multilevel models, math self-concept responses are
predicted from different combinations of individual-level and school-level
variables specifically constructed to assess a priori hypotheses and research
questions. For each model, three sets of analyses were conducted in which
the dependent variable was T1MSC (Table 1, Models 1A, 2A, and 3A), T2MSC
(Table 1, Models 1B, 2B, and 3B), and T2MSC in which T1MSC was included
as a predictor variable to assess change in math self-concept (Table 1, Models
1C, 2C, and 3C).
The predictive effect of individual achievement is substantial for T1MSC
(.68; Model 1A in Table 1) and for T2MSC (.63; Model 1B, Table 1). In Model
1C (Table 1), T1MSC has the largest predictive effect on T2MSC (.73), but the
predictive effect of individual achievement is still highly significant (.15).
Hence, individual achievement test scores from T1 have a significant direct
predictive effect on T2MSC beyond the substantial indirect predictive effect
that is mediated by T1MSC.
For present purposes, the most important results of Model 1 are the predictive effects of school-average achievement, the BFLPE. This predictive
effect is statistically significant, substantial, and negative for T1MSC (–.39,
Model 1A) and nearly as large for T2MSC (–.34, Model 1B). Hence, there is
a substantial negative predictive effect of school-average achievement on
T2MSC collected 2 years after graduation from high school. Furthermore,
14
Stability of the BFLPE
Figure 2. School-to-school variation. (A) Math achievement: Each of the 147
vertical lines represents the school intercept (in terms of math achievement) with
+/– 1.96 SEs). Schools are ranked in terms of math achievement (from
an error bar (+
lowest to highest). (B) Math school grades: Each of the 147 vertical lines repre+/– 1.96
sents the school intercept (in terms of math grades) with an error bar (+
SEs). (C) Math self-concept: Each of the 147 vertical lines represents the school
+/– 1.96 SEs). Individual
intercept (in terms of math self-concept) with an error bar (+
student scores for achievement, grades, and self-concept were standardized (M =
0, SD = 1) across all students.
15
Marsh et al.
even after controlling for the substantial negative predictive effect of schoolaverage achievement on T1MSC, school-average achievement still has a
small, statistically significant, negative predictive effect on T2MSC (–.07, Model
1C). Hence, school-average achievements from T1 have a small (statistically
significant) negative direct predictive effect on T2MSC beyond the substantial indirect negative predictive effect that is mediated by T1MSC. This implies
that school-average achievement has new negative predictive effects on
math self-concept after graduation from high school, beyond what can be
explained in terms of the earlier negative predictive effects on T1MSC collected at the end of high school. Hence, at least from this perspective, the
negative predictive effect of school-average achievement grows larger after
graduation from high school rather than diminishing.
Model 2 (Table 1) is essentially parallel to Model 1 except that the BFLPE
is represented by school type (i.e., attending a highly selective traditional
Gymnasium) rather than school-average achievement. Whereas the negative
predictive effects of school type in Model 2 are systematically smaller than
the corresponding predictive effects based on school-average achievement
in Model 1, the pattern and direction of predictive effects are similar. The
only major difference is that the negative predictive effect of school type on
T2MSC after controlling for T1MSC (Model 2C) is not statistically significant
(whereas the corresponding predictive effect of school-average achievement
in Model 1 is significantly negative).
In Model 3 (Table 1) we included both school-average achievement and
school type as predictor variables. Not surprisingly, given that these variables
are substantially correlated, the unique predictive effects of each are systematically smaller than in the corresponding analysis, in which only one or the
other of these school-level variables was included. However, what is interesting is the fact that both of these school-level variables have significantly negative predictive effects on T1MSC (Model 3A) and T2MSC (Model 3B). Thus,
for example, whereas the negative predictive effect of school-average achievement was –.39 in Model 1A (which did not include school type), the corresponding predictive effect was –.25 in Model 3A (which did include school
type). Similarly, whereas the negative predictive effect of school type was –.20
in Model 2A (which did not include school-average achievement), the corresponding predictive effect was –.12 in Model 3A (which did include schoolaverage achievement). In summary, the results of Models 1, 2, and 3 provide
clear support for the BFLPE and for the main a priori prediction of the present
investigation. The predictive effect of school-average achievement (the BFLPE)
is substantially negative at the end of high school and continues to be substantially negative 2 years after graduation from high school. Furthermore,
there are new negative predictive effects of school-average achievement on
T2MSC beyond the substantial negative predictive effects that are mediated by
T1MSC. Hence, the predictive effects of school-average achievement and
school type continue to have substantial negative predictive effects on math
self-concept several years after graduation from high school. On this basis, we
conclude that the BFLPE is a persistent, long-lasting phenomenon.
16
17
Model 1A
T1MSC
b (SE)
Model 1B
T2MSC
b (SE)
Model 1C
∆T2MSC
b (SE)
Model 1: SchoolAverage Achievement
Model 2A
T1MSC
b (SE)
Model 2B
T2MSC
b (SE)
Model 2C
∆T2MSC
b (SE)
Model 2: School Type
Model 3A
T1MSC
b (SE)
Model 3B
T2MSC
b (SE)
Model 3C
∆T2MSC
b (SE)
Model 3: School-Average
Achievement and School Type
(continued)
Fixed effects
Level 1: Individual student
predictors
Math tests
.68 (.02)*
.63 (.02)*
.15 (.02)*
.65 (.01)*
.60 (.02)*
.14 (.02)*
.68 (.04)*
.64 (.02)*
.15 (.02)*
T1 math self-concept
.71 (.02)*
.71 (.02)*
.71 (.04)
Level 2: School-level
predictors
School-average math test –.39 (.04)* –.34 (.04)* –.07 (.03)*
–.25 (.04)* –.23 (.05)* –.05 (.03)
School type
–.20 (.02)* –.17 (.02)* –.03 (.02)
–.12 (.02)* –.10 (.02)* –.01 (.02)
Residual variance components
Level 2 school
.02 (.007)* .01 (.006)
.00 (.003)
.01 (.006)* .01 (.006)
.00 (.003)
.01 (.005)
.00 (.005)
.00 (.003)
Level 1 students
.63 (.019)* .68 (.021)* .36 (.013)* .63 (.019)* .68 (.021)* .36 (.011)* .63 (.019)* .68 (.021)* .36 (.011)*
Deviance summary
5,530.0
5,673.2
4,176.0
5,530.7
5,679.0
4,178.0
5,505.2
5,656.3
4,175.6
Variables
Table 1
Study 1: Stability of the Big-Fish-Little-Pond Effect: Effects of School-Average Achievement,
School Type, and Stability Over Time
18
.44 (.02)*
.44 (.02)*
.15 (.02)*
.13 (.02)*
.63 (.02)*
Model 4C
∆T2MSC
b (SE)
.42 (.02)*
.44 (.02)
Model 5B
T2MSC
b (SE)
.14 (.02)*
.13 (.02)*
.63 (.02)*
Model 5C
∆T2MSC
b (SE)
.46 (.01)*
.48 (.01)*
Model 6A
T1MSC
b (SE)
.44 (.02)*
.43 (.02)*
Model 6B
T2MSC
b (SE)
.15 (.02)*
.13 (.02)*
.63 (.02)
Model 6C
∆T2MSC
b (SE)
–.13 (.03)* –.12 (.04)* –.04 (.03)
–.10 (.01)* –.11 (.02)* –.03 (.01)* –.10 (.02)* –.08 (.02)* –.01 (.02)
.44 (.01)*
.49 (.01)*
Model 5A
T1MSC
b (SE)
Model 6: School-Average
Achievement and School Type
.01 (.005)* .01 (.005) .00 (.003) .01 (.004)* .00 (.004) .00 (.003) .00 (.004) .00 (.004) .00 (.003)
.44 (.013)* .52 (.016)* .35 (.010)* .44 (.013)* .52 (.016)* .35 (.010)* .44 (.013)* .52 (.016)* .35 (.010)*
4,687.0
5,058.9
4,110.2
4,674.7
5,054.6
4,111.0
4,663.1
5,046.0
4,109.5
–.25 (.03)* –.21 (.04)* –.06 (.03)*
.46 (.02)*
.49 (.02)*
Model 4B
T2MSC
b (SE)
Model 5: School Type
Note. T1MSC = Time 1 math self-concept; T2MSC = Time 2 math self-concept ; ∆T2MSC = Time 2 math self-concept controlling for T1MSC; schoolaverage math test = school average of math achievement test scores; school type: 1 = selective Gymnasium, 0 = other. All parameter estimates are statistically significant when they differ from zero by more than 2 standard errors. All outcome and predictor variables were standardized (M = 0, SD = 1) at the
individual student level so that parameter estimates are standardized in relation the mean and standard deviation of individual-level variables. Analyses are
based on responses by 2,306 students who completed the math self-concept instrument at Time 2.
* p < .01.
Fixed effects
Level 1: Individual student
predictors
Math tests
Math grades
T1MSC
Level 2: School-level
predictors
School-average math test
School type
Residual variance components
Level 2 school
Level 1 students
Deviance summary
Variables
Model 4A
T1MSC
b (SE)
Model 4: SchoolAverage Achievement
Table 1 (continued)
Stability of the BFLPE
The Predictive Effect of School Grades on the BFLPE
Models 4, 5, and 6 (Table 1) largely parallel Models 1, 2, and 3. The main
difference is that individual student grades are also included in each of the
models as a predictor of math self-concept. Not surprisingly, math grades
have a substantial positive predictive effect on math self-concept, and a substantial portion of the predictive effect of math achievement on math selfconcept (as shown in Model 1) can be explained in terms of math grades.
Thus, for example, in Model 4A the combined predictive effect of individual
achievement (.46) and grades (.49) is substantially larger than the predictive
effect of either of these individual student variables considered alone. Of particular relevance is the BFLPE—the predictive effects of the school-level variables. As predicted on the basis of the grade-on-a-curve effect, the BFLPE is
systematically smaller in models that include school grades. Equally able students get lower grades in schools where the school-average achievement
level is higher and, in this investigation, in academically selective schools (i.e.,
the Gymnasium schools). Hence, part of the negative predictive effect of these
school-level variables is mediated by school grades. However, the predictive
effects of school-average achievement (Model 4) and school type (Model 5)
are still significantly negative for both T1MSC and T2MSC. Furthermore, the
predictive effects of school-average achievement (Model 4C) and school type
(Model 5C) continue to have a significant predictive effect on T2MSC even
after controlling for T1MSC.
Generalizability of the BFLPE: Interactions With
Gender and Individual Achievement
In Models 7 and 8 (Table 2), we added individual student gender and
interaction effects involving gender and individual student ability to models
already considered. Whereas girls tend to have lower math self-concepts than
boys do, this gender difference only reaches statistical significance in Model
7A. Gender does, however, interact with individual student achievement such
that the predictive effect of math achievement on math self-concept is somewhat stronger for girls than for boys. The BFLPE (negative predictive effects
of school-average achievement in Model 7 and school type in Model 8) also
varies somewhat as a function of gender. In each case, the BFLPE is somewhat larger for girls than for boys. Of particular relevance to this investigation, the negative predictive effects of school-average achievement (Model 7)
and school type (Model 8) do not vary substantially with individual student
achievement. Whereas there is a marginally significant interaction with school
type for T1MSC (Model 8A), the predictive effect is not significant for T2MSC,
nor is the interaction with school-average achievement significant for T1MSC
or T2MSC (Model 7). In summary, the results of these extended models
suggest that the BFLPE is robust, generalizing reasonably well at different
levels of individual student achievement and gender.
19
20
–.20 (.04)*
–.10 (.04)*
.00 (.03)
–.24 (.04)*
–.10 (.02)*
.01 (.03)
.01 (.005)
.52 (.016)*
5,044.7
–.03 (.02)
.06 (.02)*
–.03 (.01)*
.04 (.02)*
.01 (.004)*
.44 (.013)*
4,674.0
.46 (.02)*
.44 (.01)*
Model 7B
T2MSC
b (SE)
.45 (.02)*
.49 (.02)*
Model 7A
T1MSC
b (SE)
.00 (.003)
.35 (.010)*
4,104.8
–.04 (.03)
.00(.03)
–.06 (.03)
.15 (.02)*
.13 (.02)*
.63 (.02)*
–.01 (.01)
.03 (.01)*
Model 7C
∆T2MSC
b (SE)
Model 7: School-Average Achievement
.01 (.004)*
.44 (.013)*
4,649.7
–.05(.02)*
.04 (.02)*
.13 (.02)*
–.01 (.02)
.04 (.02)*
.44 (.02)*
.48 (.01)*
Model 8A
T1MSC
b (SE)
.00 (.004)
.51 (.016)*
5,024.6
–.07(.02)*
.03 (.02)
–.11(.02)*
–.01 (.02)
.06 (.02)*
.43 (.02)*
.43 (.02)*
Model 8B
T2MSC
b (SE)
Model 8: School Type
.00 (.003)
.35 (.010)*
4,100.1
–.04(.02)*
.01 (.01)
–.03(.01)*
.15 (.02)*
.13 (.02)*
.62 (.02)*
–.00 (.01)
.03 (.01)*
Model 8C
∆T2MSC
b (SE)
Note.T1MSC = Time 1 math self-concept; T2MSC = Time 2 math self-concept; ∆T2MSC = Time 2 math self-concept controlling for T1MSC. School-average
math test = school average of math achievement test scores; school type: 1 = selective Gymnasium, 0 = other. All parameter estimates are statistically significant when they differ from zero by more than 2 standard errors. All outcome and predictor variables were standardized (M = 0, SD = 1) at the individual student level so that parameter estimates are standardized in relation the mean and standard deviation of individual-level variables. Analyses are based
on responses by 2,306 students who completed the math self-concept instrument at Time 2.
*p < .05.
Fixed effects
Level 1: Individual student predictors
Math test
Math grades
T1MSC
Sex (0 = male, 1 = female)
Sex × Math Test
Level 2: School-level predictors
School-average math test
School type
Cross-level interactions
School-Average Math Test × Sex
School-Average Math Test × Math Test
School Type × Sex
School Type × Individual Achievement
Residual variance components
Level 2 school
Level 1 students
Deviance summary
Variables
Table 2
Study 1: Generalizability of the Big-Fish-Little-Pond Effect: Interactions With Gender and Individual Achievement
Stability of the BFLPE
Study 2
As described earlier, the primary purpose of Study 2 is to extend tests of
the stability of the BFLPE over a longer period of time. The main difference
between the two studies is that the T1-T2 interval is 4 years in Study 2 compared
to 2 years in Study 1. Despite this substantially longer interval, a priori predictions and research questions are the same in both studies (see earlier discussion).
Method
Description of the Study and Sample
Study 2 is based on a subsample of the longitudinal Learning Processes,
Educational Careers and Psychosocial Development in Adolescence study.
This investigation was conducted by the Max Planck Institute for Human
Development in Berlin (for more general descriptions of the study and resulting database, see Baumert et al., 1996). The data analyzed in this article are
based on students from 94 randomly selected upper secondary schools in four
German states: two in East Germany and two in West Germany. Schools were
randomly sampled in each participating state. The schools are representative of
the traditional Gymnasium school type and the comprehensive upper secondary school; both school types provide students with the qualifications to
attend university. Gymnasium students in the traditional Gymnasium are a particularly highly selected sample in terms of academic achievement. Trained
research assistants administered materials in each school at the end of the
1996–1997 school year (T1). At T1, all students were asked to provide written
consent to be contacted again for a second wave of data collection. At T2,
4 years later, participants were asked to complete an extensive questionnaire
taking about 2 hours in exchange for a financial reward of 15€ (about U.S. $18).
Because our focus is on the stability of predictive effects over time,
actual analyses are based on responses by the 1,758 (42%) students who
completed the math self-concept instrument at both T1 (1 year prior to graduation from high school) and T2 (3 years after graduation from high school).
The mean age of students at T1 was 18.55 (SD = 0.56). However, as in Study
1, T1 school-average achievement was based on responses by all students
from each of the 94 schools who completed the achievement tests administered at T1. As in Study 1, students in the final sample compared to students
in the original sample were more likely to be female, to be younger in age,
to be more mathematically able based on both test scores and school grades,
to come from a more academically selective school, and to have a higher
math self-concept (all effect sizes were modest, varying between .1 and .3).
Instruments
Math self-concept. T1MSC was assessed using a short, 5-item scale. This
standard German measure of the construct has been shown to be reliable and
valid in large-scale survey studies conducted over the past 25 years (Möller &
21
Marsh et al.
Köller, 2001, 2004; Trautwein, Lüdtke, Köller, & Baumert, 2006). An example
item is, “Nobody’s perfect, but I’m just not good at math.” Students responded
to each item on a 4-point scale (1 = totally agree to 4 = totally disagree). Total
scores were computed by aggregating all items, resulting in scores ranging
from 1 (low self-concept) to 4 (high self-concept). T2MSC was assessed by
the German adaptation of the Self Description Questionnaire III math selfconcept instrument that was used in Study 1. To test the comparability of the
math self-concept scales used at T1 and T2 in Study 2, a small subsequent
study was conducted. A self-concept questionnaire that included the two
scales was administered to 115 university students (54.8% women) from various fields of study at a Berlin university. The nine math self-concept items
were subjected to a confirmatory factor analysis in which the five items from
one scale were used to define one latent factor and the four items from the
second scale defined a second latent factor. The latent correlation between
the two factors was r = .97, indicating that both scales measured a similar
underlying math self-concept construct.
Academic outcome measures. The items on the math achievement test
were taken from previous national and international studies, in particular the
International Association for the Evaluation of Educational Achievement’s
First and Second International Mathematics Study, the Third International
Mathematics and Science Study (Beaton et al., 1996; Husén, 1967; Robitaille
& Garden, 1989) and an investigation conducted at the Max Planck Institute
for Human Development (Baumert, Roeder, Sang, & Schmitz, 1986).
Curriculum experts had assessed the curricular validity of all items beforehand. Individual achievement scores were calculated on the basis of item
response theory. Internal consistency estimates of reliability were above .80.
School grades in mathematics were based on student self-reports of grades
from their previous report cards.
Statistical Analyses
Statistical analyses were based on multilevel analyses of the same set of
a priori path models considered in Study 1. Because we based analyses on
responses by participants with both T1 and T2 data, there were few missing
data, and we again used the expectation maximization algorithm to impute
missing values for T1MSC, math grades, and math test scores. As described in
Study 1, we began by standardizing (z scoring) all variables to have M = 0 and
SD = 1 across the entire sample (Appendix, Study 2). School-average achievement was determined by taking the average of achievement scores for students
in each school (but not restandardizing these scores so that individual student
and school-average achievement scores were in the same metric). Product
terms were used to test interaction. Again, intraschool correlations indicated
that school-level variation was substantial for math achievement test scores
(.23) but substantially less for math school grades (.07) and particularly for
math self-concept (.02).
22
Stability of the BFLPE
Results and Discussion
The same set of models evaluated in Study 1 is considered in Study 2.
Again, the main focus of these analyses is on the predictive effects of schoollevel variables (school-average achievement and school type) on math selfconcept assessed at 1 year prior to the completion of high school (T1) and
3 years after graduation from high school (T2).
The BFLPE
The predictive effect of individual achievement is substantial on T1MSC
(.47; Model 1A in Table 3) and T2MSC (.46; Model 1B, Table 3). In Model 1C
(Table 3), T1MSC has the largest predictive effect on T2MSC (.61), but the predictive effects of individual achievement are still highly significant (.18). Hence,
individual achievement test scores from T1 have a significant direct predictive
effect on T2MSC beyond the substantial indirect predictive effect that is mediated by T1MSC.
The BFLPE (the negative predictive effect of school-average achievement
in Model 1) is statistically significant and substantial for T1MSC (–.28, Model
1A) and for T2MSC (–.21, Model 1B). Hence, there is a substantial negative
predictive effect of school-average achievement on T2MSC collected 4 years
later. However, after controlling for the substantial negative predictive effect
of school-average achievement on T1MSC, the negative predictive effect of
school-average achievement (based on T1 achievement) on T2MSC (–.03,
Model 1C) is not statistically significant. This implies that school-average
achievement has no additional negative predictive effects on math selfconcept after graduation from high school beyond what can be explained in
terms of the earlier negative predictive effects on T1MSC collected 1 year
before the end of high school.
Model 2 (Table 3) is essentially parallel to Model 1 except that the BFLPE
is represented by school type (i.e., attending an academically selective high
school) rather than school-average achievement. Whereas the negative predictive effects of school type in Model 2 are systematically smaller than the
corresponding predictive effects based on school-average achievement in
Model 1, the pattern and direction of predictive effects are similar.
In Model 3, both school-average achievement and school type are
included as predictor variables. Because these variables are substantially correlated, the unique predictive effects of each are smaller than in the parallel
analysis in which the other was not included. Nevertheless, both of these
school-level variables had significantly negative predictive effects on T1MSC
(Model 3A) and T2MSC (Model 3B). Thus, for example, whereas the negative predictive effect of school-average achievement was –.28 in Model 1A
(which did not include school type), the corresponding predictive effect was
–.24 in Model 3A (which did include school type).
In summary, the results of Models 1, 2, and 3 provide clear support for
the BFLPE and for the main a priori prediction of this investigation. The predictive effects of school-average achievement and school type (the BFLPE) are
23
24
Model 1A
T1MSC
b (SE)
Model 1B
T2MSC
b (SE)
Model 1C
∆T2MSC
b (SE)
Model 1: SchoolAverage Achievement
Model 2A
T1MSC
b (SE)
Model 2B
T2MSC
b (SE)
Model 2C
∆T2MSC
b (SE)
Model 2: School Type
Model 3A
T1MSC
b (SE)
Model 3B
T2MSC
b (SE)
Model 3C
∆T2MSC
b (SE)
Model 3: School-Average
Achievement and School Type
(continued)
Fixed effects
Level 1: Individual student
predictors
Math grades
.47 (.02)*
.46 (.02)*
.18 (.02)*
.45 (.02)*
.44 (.02)*
.18 (.02)*
.47 (.02)*
.46 (.02)*
.18 (.02)*
T1MSC
.61 (.02)*
.61 (.02)*
.61 (.02)*
Level 2: School-level
predictors
School-average math test –.28 (.06)* –.21 (.05)* –.03 (.04)
–.24 (.05)* –.15 (.06)* –.01(.05)
School type
–.09 (.02)* –.08 (.02)* –.03 (.02)
–.04 (.03)
–.05 (.02)* –.02 (.02)
Residual variance
components
Level 2 school
.01 (.008)
.00 (.006)
.00 (.004)
.02 (.009)* .00 (.006)
.00 (.001)
.01 (.008)
.00 (.001)
.00 (.004)
Level 1 students
.80 (.028)* .81 (.028)* .51 (.017)* .80 (.028)* .82 (.028)* .51 (.018)* .80 (.028)* .81 (.027)* .51 (.018)*
Deviance summary
4,628.2
4,630.6
3,812.7
4,637.1
4,633.1
3,811.4
4,626.1
4,626.9
3,811.43
Variables
Table 3
Study 2: Stability of the Big-Fish-Little-Pond Effect: Effects of School-Average
Achievement, School Type, and Stability Over Time
25
–.02(.04)
.16 (.02)*
.06 (.02)*
.58 (.02)*
Model 4C
∆T2MSC
b (SE)
.00 (.001) .00 (.004)
.74 (.025)* .51 (.017)*
4,454.1
3,802.7
–.13 (.05)*
–.18 (.06)*
.02 (.008)*
.66 (.023)*
4,289.8
.35 (.02)*
.30 (.02)*
.32 (.02)*
.40 (.02)*
Model 4B
T2MSC
b (SE)
.02 (.008)*
.66 (.023)*
4,291.1
–.07 (.02)*
.31 (.01)*
.41 (.01)*
Model 5A
T1MSC
b (SE)
.00 (.001)
.74 (.025)*
4,452.2
–.06 (.02)*
.34 (.02)*
.30 (.02)
Model 5B
T2MSC
b (SE)
.00 (.004)
.51 (.017)*
3,801.4
–.02 (.02)
.17 (.02)*
.06 (.02)*
.58 (.02)*
Model 5C
∆T2MSC
b (SE)
Model 5: School Type
.02 (.008)*
.66 (.023)*
4,287.7
–.13 (.06)*
–.04 (.03)
.32 (.02)*
.40 (.02)*
Model 6A
T1MSC
b (SE)
.00(.05)
–.03(.02)
.17 (.02)*
.06 (.02)*
.58 (.02)
Model 6C
∆T2MSC
b (SE)
.00 (.001) .00 (.004)
.74 (.025)* .51 (.017)*
4,450.5
3,801.4
–.08(.06)
–.05 (.02)*
.35 (.02)*
.30 (.02)*
Model 6B
T2MSC
b (SE)
Model 6: School-Average
Achievement and School Type
Note. T1MSC = Time 1 math self-concept; T2MSC = Time 2 math self-concept; ∆T2MSC = Time 2 math self-concept controlling for T1MSC; school-average math test = school average of math achievement test scores; school type: 1 = selective Gymnasium, 0 = other. All parameter estimates are statistically
significant when they differ from zero by more than 2 standard errors. All outcome and predictor variables were standardized (M = 0, SD = 1) at the individual student level so that parameter estimates are standardized in relation the mean and standard deviation of individual-level variables. Analyses are based
on responses by 1,758 students who completed the math self-concept instrument at Time 2.
*p < .01.
Fixed Effects
Level 1: Individual student
predictors
Math tests
Math grades
T1MSC
Level 2: School-level
predictors
School-average math test
School type
Residual variance
components
Level 2 school
Level 1 students
Deviance summary
Variables
Model 4A
T1MSC
b (SE)
Model 4: School-Average
Achievement
Marsh et al.
substantially negative near the end of high school and continue to be substantial and negative 4 years later. Whereas there are no new negative predictive effects of school-average achievement or school type on T2MSC after
controlling for the negative predictive effects of these school-level variables on
T1MSC, there are no indirect positive predictive effects of these school-level
variables to offset the substantial negative predictive effects that are largely
mediated by T1MSC. Hence, the predictive effects of school-average achievement and school type continue to have substantial negative predictive effects
on math self-concept 3 years after graduation from high school. On this basis,
we argue that the BFLPE is a persistent, long-lasting phenomenon.
The Predictive Effect of School Grades on the BFLPE
In Models 4, 5, and 6 (Table 3) individual student grades are added to
variables considered in Models 1, 2, and 3. Both individual achievement and
grades have substantial positive predictive effects on math self-concept. Thus,
for example, the predictive effect of individual math achievement on T1MSC
is .47 in Model 1A, but only .32 in Model 4A. What is surprising, perhaps, is
that the predictive effect of math achievement in Model 4A is nearly as large
as the predictive effect of math grades (.40). The combined predictive effects
of individual math achievement and math school grades on math self-concept
are substantially larger than the predictive effect of either of these individual
student variables considered alone.
For our purposes, the most important component of Models 4, 5, and
6 is the BFLPE—the predictive effects of the school-level variables. As predicted on the basis of the grade-on-a-curve effect, the BFLPE is systematically smaller in models that include school grades. Importantly, however, the
predictive effects of school-average achievement (in Model 4) and school
type (Model 5) are still significantly negative for both T1MSC and T2MSC.
Indeed, whereas the sizes of these negative predictive effects are smaller, the
pattern of significant predictive effects is nearly the same in Models 3, 4, and
5 (which include math grades) as the corresponding predictive effects in
Models 1, 2, and 3 (which do not include math grades).
Generalizability of the BFLPE: Interactions
With Gender and Individual Achievement
In Models 7 and 8 (Table 4), we added individual student gender, and interaction effects involving gender and individual student ability, to models already
considered. Although (based on previous findings) it is not surprising that girls
tend to have lower math self-concepts than boys do, the predictive effect of math
achievement on math self-concept does not vary as a function of gender. Of
more particular relevance to this investigation is the question of whether the
BFLPE (the negative predictive effects of school-average achievement or school
type) varies as a function of individual student achievement or individual
student gender. Of the total of 12 interactions in Models 7 and 8, only 1 is marginally significant. For T1MSC, the negative predictive effect of school-average
26
27
–.12 (.05)*
–.06 (.05)
–.05 (.05)
–.18 (.05)*
.02 (.05)
–.09 (.04)*
.00 (.001)
.72 (.024)*
4,416.1
–.12 (.02)*
.03 (.02)
–.15 (.02)*
.00 (.02)
.01 (.007)*
.64 (.022)*
4,232.2
.33 (.02)*
.30 (.02)*
Model 7B
T2MSC
b (SE)
.29 (.02)*
.41 (.02)*
Model 7A
T1MSC
b (SE)
.00 (.004)
.50 (.017)*
3,795.1
–.06 (.04)
.00 (.04)
–.02 (.04)
.16 (.02)*
.07 (.02)*
.58 (.02)*
–.04 (.02)*
.03 (.02)
Model 7C
∆T2MSC
b (SE)
Model 7: School-Average Achievement
.02 (.007)*
.64 (.022)*
4,238.4
.00 (.02)
–.02 (.02)
–.07 (.03)*
–.15 (.02)*
.01 (.02)
.28 (.02)*
.42 (.01)*
Model 8A
T1MSC
b (SE)
.00 (.001)
.72 (.024)*
4,416.5
–.02 (.02)
–.01 (.03)
–.06 (.03)*
–.12 (.02)
.03 (.02)
.32 (.02)*
.30 (.02)*
Model 8B
T2MSC
b (SE)
Model 8: School Type
.00 (.004)
.50 (.017)*
3,794.0
–.02 (.02)
.01 (.02)
–.02 (.02)
.16 (.02)*
.074 (.02)*
.58 (.02)*
–.04 (.02)*
.02 (.02)
Model 8C
∆T2MSC
b (SE)
Note. T1MSC = Time 1 math self-concept; T2MSC = Time 2 math self-concept; ∆T2MSC = Time 2 math self-concept controlling for T1MSC; school average math test = school average of math achievement test scores; school type: 1 = selective Gymnasium, 0 = other. All parameter estimates are statistically significant when they differ from zero by more than 2 standard errors. All outcome and predictor variables were standardized (M = 0, SD = 1) at the individual
student level so that parameter estimates are standardized in relation the mean and standard deviation of individual-level variables. Analyses are based on
responses by 1,758 students who completed the math self-concept instrument at Time 2.
*p < .05.
Fixed effects
Level 1: Individual student predictors
Math test
Math grades
T1MSC
Sex (0 = male, 1 = female)
Sex × Math Test
Level 2: School-level predictors
School-average math test
School type
Cross-level interactions
School-Average Math Test × Sex
School-Average Math Test × Math Test
School Type × Sex
School Type × Individual Achievement
Residual variance components
Level 2 school
Level 1 students
Deviance summary
Variables
Table 4
Study 2: Generalizability of the Big-Fish-Little-Pond Effect: Interactions With Gender and Individual Achievement
Marsh et al.
achievement is greater for more able students. This predictive effect is not,
however, significant for T2MSC, nor do the negative predictive effects of
school type vary with individual levels of student achievement. None of the
interactions with individual student gender is significant, indicating that the
size of the BFLPE is similar for boys and for girls. In summary, the results of
these extended models suggest that the BFLPE is robust, generalizing reasonably well at different levels of individual student achievement and gender.
General Discussion for Studies 1 and 2
Is the BFLPE a Persistent, Long-Lasting Phenomenon?
The overarching purpose of this investigation is to test a priori predictions that the BFLPE associated with school-average achievement and school
type is clearly evident at the end of high school and that these effects are still
evident for participants several years after graduation from high school. In
both studies, the BFLPE is substantial for both T1MSC and for T2MSC. Hence,
there is clear support for a priori predictions that the BFLPE is a persistent,
long-lasting phenomenon.
A critical and apparently unique contribution of this investigation is the
evaluation of the BFLPE (negative predictive effects of school-average
achievement and school type) on academic self-concept while students were
in high school (T1) and again several years after they had graduated from high
school (T2). Obviously, this is a more demanding test of the BFLPE than simply assessing the size of the BFLPE on a single occasion while students are in
high school or even tests of the BFLPE based on more than one occasion during high school. Whereas there now exists clear support for the BFLPE during high school, a focus of this investigation was whether this predictive effect
remained after students had graduated from high school and were no longer
in the high school setting in which the BFLPE was established. Although it
might be reasonable to suggest that the BFLPE should dissipate over time, the
results of the present investigation show that the predictive effects are persistent and continue to have a substantial predictive effect on academic selfconcepts long after graduation from high school.
In evaluating the stability of the BFLPE, it is critical to interpret carefully
the results from Model A (based on T1MSC), Model B (based on T2MSC), and
Model C (based on T2MSC controlling for the predictive effects of T1MSC).
Results based on Models A and B show that the negative predictive effects of
the BFLPE are statistically significant and substantial near the end of high school
and again several years after graduation. If the BFLPE was a short-term, transitory phenomena, one might expect that the predictive effect of school-level
achievement and school type would only be substantial for T1MSC (Model A)
but not for T2MSC (Model B). However, once the negative predictive effects of
these school-level variables on T1MSC had been controlled, a substantial
diminution of the BFLPE should result in a positive direct effect of these schoollevel variables for T2MSC (Model C, also see earlier discussion of Figure 1B).
28
Stability of the BFLPE
Thus, the indirect negative BFLPE mediated by T1MSC would be offset to a
large extent by a positive, direct effect of high-school-average achievement on
T2MSC. Hence, even though the BFLPE might be negative for T2MSC when
T1MSC is not controlled, the direct predictive effect of school-average ability
might be positive for T2MSC after controlling for the negative predictive effects
mediated through T1MSC. There was, however, absolutely no support for such
alternative speculations. Indeed, for Study 1 there was a small but statistically
negative predictive effect of school-average achievement (Model 1C, Table 1)
on T2MSC even after controlling for the substantial negative predictive effects
of school-average achievement on T1MSC. This implies that there were new,
more negative predictive effects of school-average achievement after graduation from high school (T2) in addition to the substantial negative predictive
effects already experienced during high school (T1). Whereas this pattern has
been found in previous research based on two data collections during high
school (e.g., Marsh, 1991), this is apparently the first such finding following
graduation from high school.
In Study 2, the corresponding negative direct predictive effect on T2MSC
after controlling for T1MSC was not statistically significant (Model 1C in Table
3). Importantly, however, there was no statistically significant positive predictive effect of school-average achievement, as would be expected if there
were a systematic diminution of the BFLPE over time. Hence, whereas there
was no evidence that the size of the BFLPE increased during the period following high school in Study 2, it did not decrease. Furthermore, there are differences between the two studies that might account for these different
results. In particular, the time lag in Study 1 was only about 2 years (from final
year of high school to 2 years after graduation from high school), whereas the
time lag in Study 2 was nearly twice as long (from the year before the final
year in high school to 3 years after graduation from high school).
School Grades, Grading on a Curve, and the BFLPE
The BFLPE presents a dilemma for academic self-concept researchers. On
one hand, there is clear evidence that academic self-concept is more strongly
related to school-based performance measures, such as school grades, compared to standardized test scores. On the other hand, because grades tend to
be idiosyncratic to a particular setting and teachers tend to grade on a curve,
school grades do not provide a common metric that is generalizable over different schools and classes. Hence, most BFLPE studies have been based on
standardized achievement test scores. However, Marsh (1987) noted that the
BFLPE and a grading-on-a-curve effect have a similar rationale and are mutually reinforcing, such that the BFLPE is mediated in part by school grades.
Results of the present investigation replicate these earlier results but also
extend them in some interesting ways. Not surprisingly, in both Studies 1 and
2, school grades have a substantial influence on T1MSC beyond the substantial predictive effect of individual student achievement. What may be more
surprising is that the achievement test scores continue to have such a substantial predictive effect on academic self-concept beyond the predictive
29
Marsh et al.
effect of school grades. This suggests that students know their relative abilities in relation to a broader, more generalizable frame of reference in addition to the more narrowly focused frame of reference provided by other
students in their school. It is also interesting to note that the relative contribution of test scores is as high or higher for models based on T2MSC, collected several years after graduation from high school (e.g., Models 4B and
4C), than for models based on T1MSC, collected near the end of high school
(e.g., Model 4A). Nevertheless, it is relevant to note that both school grades
and, in particular, test scores continue to have significant predictive effects on
T2MSC even after controlling for T1MSC. As such, the results of our study
demonstrate that each of these sources of information about achievement
continues to have substantial predictive effects on academic self-concept.
Whereas the predictive effects of test scores and grades in T2MSC are similar
in Study 1, the predictive effects of test scores are significantly larger than high
school grades after students have graduated from high school. Because test
scores reflect a broader frame of reference than school grades that are highly
dependent on the achievement levels of other students in the same high
school, it is not surprising that test scores are strongly related to academic selfconcept after graduation from high school.
The BFLPEs, the negative predictive effects of school-average achievement and school type, are substantially smaller after controlling for school
grades. This implies that the grading-on-a-curve effect and the BFLPE are substantially overlapping, mutually reinforcing processes that have independent
predictive effects on academic self-concept. Hence, results based on these two
large, representative samples of German high schools and the distinctive form
of tracking students into different school types in the German school system
replicate Marsh’s (1987) results based on a large representative sample of U.S.
high schools. Although beyond the scope of our study, a useful direction for
further research would be to disentangle the apparently confounded effects of
these two processes. Whereas typically these processes are positively related,
it should be possible to find naturally occurring situations in which the two
processes are not confounded (e.g., where school grades are based on
absolute criteria or are externally moderated in relation to external criteria that
are measured along a common metric) or, perhaps, to experimentally manipulate grades in a laboratory setting so that they are independent of achievement test scores. The critical issue is the extent to which the size of the BFLPE
varies as a function of the grading standards.
Generalizability of the Predictive Effects
The results of this investigation indicate that the BFLPE is reasonably
robust over two different studies, over time, over gender, and over individual student ability levels. The major focus of this investigation is to demonstrate that the BFLPE that is widely demonstrated in high school settings is
long lasting and persistent after students have graduated from high school.
We were also interested, however, in determining how robust the BFLPE is
30
Stability of the BFLPE
over gender and individual student achievement levels. The inclusion of additional variables representing gender, and interactions between individual
achievement with gender and with school-average achievement, had almost
no effect on the size of the BFLPE in either Study 1 or Study 2. Although the
BFLPE was marginally larger for girls than for boys, this interaction was only
significant for some models in Study 1 and was not statistically significant for
any analyses in Study 2. Consistent with previous results, the BFLPE did not
vary much as a function of individual student achievement. In Study 1, individual student achievement did not interact significantly with school-average
achievement in Models 6A, 6B, or 6C, but there was a marginally significant
interaction with school type in Model 7A (but not 7B or 7C), suggesting that
the BFLPE was slightly smaller for more able students. In Study 2, individual
student achievement did not interact significantly with school type in Models
7A, 7B, or 7C, but there was a marginally significant interaction with schoolaverage achievement in Model 6A (but not in 6B or 6C), suggesting that the
BFLPE was slightly larger for more able students. Although the precise nature
of the few small interactions that reached statistical significance (due in part
to the large sample sizes) was not entirely consistent across the two studies,
both studies provided reasonable support for the robustness of the BFLPE
over time, gender, and individual student achievement levels.
Given that Studies 1 and 2 were based on different materials, different
cohorts of students, and different time frames, there was good consistency in
the pattern of results. In particular, both studies clearly demonstrate that the
BFLPE is persistent in that the predictive effects are clearly evident even several years after graduation from high school. However, even though the patterns of predictive effects are largely similar in the two studies, the predictive
effects are systematically larger in Study 1 than in Study 2. The positive predictive effects of individual student characteristics (student achievement and
school grades) and the negative predictive effects of school-level characteristics (school-average achievement and school type) are all larger in Study 1
than in Study 2. Although there are several potential sources of difference, the
most likely seems to be the time frame of the two studies. Study 2 began earlier (T1 was the second-to-the-last year of high school) than Study 1 (T1 was
the last year of high school) and lasted longer (the T1-T2 interval was 2 years
in Study 1 and 4 years in Study 2). It is not, perhaps, surprising that the predictive effects of school grades and achievement in high school have smaller
predictive effects on self-concept measures collected 4 years later in Study 2
than parallel predictive effects after only 2 years in Study 1. More surprising,
perhaps, is the finding that the predictive effects in Study 2 are also smaller
at T1. The results may reflect the fact that the academic self-concepts of
students near the end of high school (Study 1) are more closely aligned to
objective indicators of academic accomplishment (including the relative performances of classmates) than they are in the penultimate year of high school
(in Study 2). However, the smaller BFLPE on T1 self-concept in Study 2 than
in Study 1 may also reflect the different self-concept instruments used at T1
in the two studies. Although clearly beyond the scope of this investigation, it
31
Marsh et al.
would be useful to assess the strength of the BFLPE more frequently over a
longer span of time to test the predictions that (a) the BFLPE grows larger the
longer the same students remain in the same school and (b) that the BFLPE
is reasonably stable over time even after students have graduated from high
school.
Limitations and Directions for Further Research
It is relevant to address potential limitations of this investigation in terms
of their implications for interpretation of our results and directions for future
research. BFLPE theory offers causal predictions about the effects of schoolaverage ability, and the combination of sophisticated statistical analyses and
longitudinal data provide strong tests of these causal predictions. It is, nevertheless, inappropriate to claim that our results prove that the predictive effects
that we have found represent true causal effects. This limitation appears to
be an inevitable consequence of the nature of this research, but convergence
of results based on alternative experimental designs and even stronger statistical techniques would strengthen the interpretation of the results.
In this investigation, we considered math self-concept only, and this has
been the focus of a number of other BFLPE studies (e.g., Marsh & Craven,
2002). On the basis of existing theory and a very limited amount of research,
there is no reason to expect that the BFLPE does not apply equally well to
other academic domains. Because school-average abilities in different academic domains (e.g., school-average abilities in mathematics, science, history,
language) are likely to be very highly correlated, it is unlikely that different
school-average abilities can be readily differentiated in the general population.
It is possible, however, that schools that are highly selective in a specific academic domain (e.g., mathematics, sport, performing arts) will have BFLPEs that
are specific to that domain (e.g.., Marsh, 1994; also see Marsh & Craven, 2002)
that will undermine some of the domain-specific attributes that such schools
seek to reinforce. A number of issues related to the representativeness of the
sample limit the generalizability of the results. Although the original samples
in both studies were representative of German students attending the final 2
years of high school, typically in preparation for subsequent university attendance, this only represents about 30% of the total population of this age cohort.
As is inevitable in large-scale longitudinal studies—particularly ones that
attempt to follow high school students after graduation from high school—the
representativeness of the sample was further compromised by nonresponse in
the second waves of each of the studies.
Although beyond the scope of this investigation, it would also be useful
to pursue how academic self-concept and the BFLPE affected the subsequent
life choices that students make after high school graduation and how, in turn,
these decisions affect subsequent self-concept. Thus, for example, Marsh and
O’Mara (2006) report that there is a reciprocal pattern of relations between
academic self-concept in high school, high school achievement, and educational attainment 5 years after high school.
32
Stability of the BFLPE
Implications for Policy Practice
In many educational systems across the world, there is an ongoing policy debate about the provision of highly segregated educational settings for
very bright students. This policy direction is based in part on a labeling theory
perspective, suggesting that bright students will have higher self-concepts and
experience other psychological benefits from being educated in the company
of other academically gifted students. Yet, our BFLPE and empirical evaluation
of the predictive effects of academically selective settings (e.g., Marsh et al.,
1995) shows exactly the opposite pattern of results. Placement of gifted students in academically selective settings results in lower academic self-concepts,
not higher academic self-concepts. Coupled with other research showing that
school-average ability has negative effects on other educational outcomes
(coursework selection, educational aspirations, effort; see Marsh, 1991) and
that academic self-concept has reciprocal effects with achievement for students
of all ability levels (Marsh & Craven, 2006), this finding has important policy
implications.
Whereas not all gifted and talented students will suffer lower academic
self-concepts when attending academically selective high schools, many will.
BFLPE research, however, provides an important alternative perspective to
existing policy directions that have not been adequately evaluated in relation
to current educational and psychological research. Hence, we urge parents,
policy makers, and practitioners to think carefully about the implications of
school placements and to reflect on potential negative side effects of current
policy toward segregation of school and classes on the basis of academic ability. A compromise position might be to recognize the negative implications of
the BFLPE and to develop policies to try to counter the negative effects of highschool-average achievement. Hence, Marsh and Craven (2002) suggest that the
BFLPE is reinforced by highly competitive classroom environments that
emphasize normative feedback that rank orders students. In the present investigation, the grade-on-the-curve effect, such that students in academically
selective schools get lower school grades than they would get if they were in
mixed-ability schools, also reinforces the BFLPE. Hence, even though the
BFLPE appears to be stable and pervasive, it may be possible to alter the
school environment in such a way as to undermine its negative effects.
The focus of our investigation has been on academically advantaged
students. However, the BFLPE has implications for special education at both
ends of the ability spectrum. In particular, consistent with BFLPE predictions,
research with academically disadvantaged students shows that moving academically disadvantaged students from special classes with other disadvantaged students to mixed-ability classes (mainstreaming or inclusion) lowers
not only math, verbal, and academic self-concepts but also social self-concept (Marsh, Tracey, & Craven, 2006; Tracey, Marsh & Craven, 2003; also see
Chapman, 1988).
33
34
1
2
3
4
5
6
7
8
9
10
11
12
T1MSC
T2MSC
S-AM
MTest
Math grade
Sex
ST
Sex × MTest
Sex × S-AM
S-AM × MTest
Sex × ST
MTest × ST
1.00
.79
.14
.57
.64
–.16
.03
–.07
–.03
–.03
–.04
.07
1
.79
1.00
.14
.54
.59
–.15
.05
–.05
–.02
–.03
–.05
.06
2
.14
.14
1.00
.52
.07
–.20
.65
.03
.19
–.42
.25
–.40
3
.57
.54
.52
1.00
.36
–.35
.36
–.11
.07
–.23
.05
–.12
4
.64
.59
.07
.36
1.00
.01
.03
–.06
–.01
.03
–.01
.08
5
–.16
–.15
–.20
–.35
.01
1.00
.00
.19
–.02
.07
–.00
.05
6
Study 1
.50
.47
.15
.72
.20
–.21
.05
–.12
–.03
–.03
–.06
.04
7
–.07
–.05
.03
–.11
–.06
.19
.05
1.00
.46
–.26
.35
–.12
8
–.03
–.02
.19
.07
–.01
–.02
.26
.46
1.00
–.37
.57
–.20
9
–.03
–.03
–.42
–.23
.03
.07
–.42
–.26
–.37
1.00
–.25
.68
10
–.04
–.05
.25
.05
–.01
–.00
–.00
.35
.57
–.25
1.00
–.40
11
.07
.06
–.40
–.12
.08
.05
–.23
–.12
–.20
.68
–.40
1.00
12
.00
.00
–.07
.00
.00
.00
.00
–.35
–.11
.28
.00
.36
M
APPENDIX
Correlations, Means, and Standard Deviations for Variables Considered in the Two Studies
1.00
1.00
.54
1.00
1.00
1.00
1.00
.96
.50
.55
1.00
.97
SD
35
T1MSC
T2MSC
S-AM
MTest
Math grade
Sex
ST
Sex × MTest
Sex × S-AM
S-AM × MTest
Sex × ST
MTest × ST
1.00
.68
.06
.41
.50
–.21
.01
–.04
.01
–.03
–.01
–.00
.68
1.00
.10
.42
.41
–.19
.02
–.02
.03
–.05
–.03
.01
2
.06
.10
1.00
.42
.03
–.03
.53
–.03
–.12
–.02
–.05
–.31
3
.41
.42
.42
1.00
.32
–.18
.23
–.10
.07
–.05
–.05
–.09
4
.50
.41
.03
.32
1.00
–.04
.02
–.02
.04
–.03
–.02
.02
5
–.21
–.19
–.03
–.18
–.04
1.00
.04
.09
–.05
.12
–.03
–.05
6
.01
.02
.53
.23
.02
.04
1.00
–.06
–.30
–.05
–.12
–.59
7
–.04
–.02
–.03
–.10
–.02
.09
–.06
1.00
–.04
.43
.26
.08
8
.01
.03
–.12
.07
.04
–.05
–.30
–.04
1.00
–.14
–.01
.57
9
–.03
–.05
–.02
–.05
–.03
.12
–.05
.43
–.14
1.00
.53
.01
10
–.01
–.03
–.05
–.05
–.02
–.03
–.12
.26
–.01
.53
1.00
–.03
11
–.00
.01
–.31
–.09
.02
–.05
–.59
.08
.57
.01
–.03
1.00
12
.00
.00
–.03
.00
.00
.00
.00
–.18
.18
–.01
.04
.23
M
1.00
1.00
.45
1.00
1.00
1.00
1.00
1.01
.46
.44
1.03
1.11
SD
Note. T1MSC = Time 1 math self-concept; T2MSC = Time 2 math self-concept; S-AM = school-average math achievement (test scores); MTest = math
achievement (test scores); math grade = school mark; sex = gender (1 = male, 2 = female); ST = school type (selective school: 1 = no, 2 = yes). For purposes
of analysis (see Method section for further discussion), all individual student terms were standardized (M = 0, SD = 1). Cross-product and school averages
of individual student responses were based on standardized individual student scores but not restandardized (so that they are in the same metric as the individual student scores).
1
2
3
4
5
6
7
8
9
10
11
12
1
Study 2
Marsh et al.
Note
Work on this investigation was conducted, in part, while Professor Marsh was a visiting
scholar at the Center for Educational Research at the Max Planck Institute for Human
Development and was supported in part by the University of Western Sydney and the Max Planck
Institute.
References
Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.
Baumert, J., Bos, W., & Lehmann, R. (Eds.). (2000). TIMSS/III: Dritte Internationale
Mathematik- und Naturwissenschaftsstudie—Mathematische und naturwissenschaftliche Bildung am Ende der Schullaufbahn [TIMSS/III: Third international
mathematics and science study—Students’ knowledge of mathematics and science at the end of secondary education]. Opladen, Germany: Leske and Budrich.
Baumert, J., & Demmrich, A. (2001). Test motivation in the assessment of student skills:
The effects of incentives on motivation and performance. European Journal of
Psychology of Education, 16, 441–462.
Baumert, J., Roeder, P. M., Gruehn, S., Heyn, S., Köller, O., Rimmele, R., et al. (1996).
Bildungsverläufe und psychosoziale Entwicklung im Jugendalter [Educational
careers and psychosocial development during adolescence]. In K. P. Treumann,
G. Neubauer, R. Möller, & J. Abel (Eds.), Methoden und Anwendungen empirisch
pädagogischer Forschung (pp. 170–180). Münster, Germany: Waxmann.
Baumert, J., Roeder, P. M., Sang, F., & Schmitz, B. (1986). Leistungsentwicklung und
Ausgleich von Leistungsunterschieden in Gymnasialklassen [Development of
achievement and leveling of achievement differences in Gymnasium classes].
Zeitschrift für Pädagogik, 32, 639–660.
Baumert, J., Trautwein, U., & Artelt, C. (2003). Schulumwelten—institutionelle
Bedingungen des Lehrens und Lernens [School environments—Institutional
conditions for learning and instruction]. In J. Baumert, C. Artelt, E. Klieme, M.
Neubrand, M. Prenzel, U. Schiefele, et al. (Eds.), PISA 2000—Ein differenzierter
Blick auf die Länder der Bundesrepublik Deutschland (pp. 259–330). Opladen,
Germany: Leske + Budrich.
Beaton, A. E., Mullis, I. V. S., Martin, M. O., Gonzales, E. J., Kelly, D. L., & Smith, T.
A. (1996). Mathematics achievement in the middle school years: IEA’s third international mathematics and science study. Chestnut Hill, MA: Boston College.
Branden, N. (1994). Six pillars of self-esteem. New York: Bantam.
Chapman, J. W. (1988). Learning disabled children’s self-concepts. Review of Educational
Research, 58, 347–371.
Dai, D. Y. (2004). How universal is the big-fish-little-pond effect? American
Psychologist, 59, 267–268.
Diener, E., & Fujita, F. (1997). Social comparison and subjective well-being. In
B. P. Buunk & F. X. Gibbons (Eds.), Health, coping, and well-being: Perspectives
from social comparison theory (pp. 329–358). Mahwah, NJ: Lawrence Erlbaum.
Espenshade, T. J., Hale, L. E., & Chung, C. Y. (2005). The frog pond revisited: High
school academic context, class rank, and elite college admission. Sociology of
Education, 78, 269–293.
Festinger, L. (1954). A theory of social comparison processes. Human Relations, 7,
117–140.
Goldstein, H. (2003). Multilevel statistical models (3rd ed.). London: Hodder Arnold.
Husén, T. (1967). International study of achievement in mathematics: A comparison
of 12 countries (Vols. 1–2). Stockholm: Almqvist & Wiksell.
36
Stability of the BFLPE
James, W. (1963). The principles of psychology. New York: Holt, Rinehart & Winston.
(Original work published 1890)
Jerusalem, M. (1984). Reference group, learning environment and self-evaluations: A
dynamic multi-level analysis with latent variables. In R. Schwarzer (Ed.), The self
in anxiety, stress and depression (pp. 61–73). New York: Elsevier North-Holland.
Köller, O., Watermann, R., Trautwein, U., & Lüdtke, O. (2004). Wege zur Hochschulreife
in Baden-Württemberg. TOSCA—Eine Untersuchung an allgemein bildenden und
beruflichen Gymnasien [Educational pathways to college in Baden-Württemberg.
TOSCA—A study at upper secondary level of traditional and vocational
Gymnasiums]. Opladen, Germany: Leske + Budrich.
Krull, J. L., & MacKinnon, D. P. (2001). Multilevel modeling of individual and group
level mediated effects. Multivariate Behavioral Research, 36, 249–277.
Lüdtke, O., Köller, O., Marsh, H. W., & Trautwein, U. (2005). Teacher frame of reference and the big-fish-little-pond effect. Contemporary Educational Psychology,
30, 263–285.
Marsh, H. W. (1974). Judgmental anchoring: Stimulus and response variables.
Unpublished doctoral dissertation, University of California, Los Angeles.
Marsh, H. W. (1984a). Self-concept: The application of a frame of reference model to
explain paradoxical results. Australian Journal of Education, 28, 165–181.
Marsh, H. W. (1984b). Self-concept, social comparison and ability grouping: A reply
to Kulik and Kulik. American Educational Research Journal, 21, 799–806.
Marsh, H. W. (1987). The big-fish-little-pond effect on academic self-concept. Journal
of Educational Psychology, 79, 280–295.
Marsh, H. W. (1991). The failure of high ability high schools to deliver academic benefits: The importance of academic self-concept and educational aspirations.
American Educational Research Journal, 28, 445–480.
Marsh, H. W. (1993). Academic self-concept: Theory, measurement and research. In
J. Suls (Ed.), Psychological perspectives on the self (Vol. 4, pp. 59–98). Hillsdale,
NJ: Lawrence Erlbaum.
Marsh, H. W. (1994). Using the National Educational Longitudinal Study of 1988 to
evaluate theoretical models of self-concept: The Self-Description Questionnaire.
Journal of Educational Psychology, 86, 439–456.
Marsh, H. W. (2005). Big fish little pond effect on academic self-concept. German
Journal of Educational Psychology, 19, 141–144.
Marsh, H. W., Chessor, D., Craven, R. G., & Roche, L. (1995). The effects of gifted
and talented programs on academic self-concept: The big fish strikes again.
American Educational Research Journal, 32, 285–319.
Marsh, H. W., & Craven, R. (1997). Academic self-concept: Beyond the dustbowl.
In G. Phye (Ed.), Handbook of classroom assessment: Learning, achievement,
and adjustment (pp. 131-198). Orlando, FL: Academic Press.
Marsh, H. W. & Craven, R. (2002). The pivotal role of frames of reference in academic self-concept formation: The big fish little pond effect. In F. Pajares &
T. Urdan (Eds.), Adolescence and education (Vol. 2, pp. 83–123). Greenwich,
CT: Information Age.
Marsh, H. W., & Craven, R. G. (2006). Reciprocal effects of self-concept and performance from a multidimensional perspective: Beyond seductive pleasure and unidimensional perspectives. Perspectives on Psychological Science, 1, 133–163.
Marsh, H. W., & Hau, K. T. (2003). Big fish little pond effect on academic self-concept:
A crosscultural (26 country) test of the negative effects of academically selective
schools. American Psychologist, 58, 364–376.
Marsh, H. W., Köller, O., & Baumert, J. (2001). Reunification of East and West German
school systems: Longitudinal multilevel modeling study of the big fish little pond
37
Marsh et al.
effect on academic self-concept. American Educational Research Journal, 38,
321–350.
Marsh, H. W., Kong, C.-K., Hau, K.-T. (2000). Longitudinal multilevel modeling of the
big fish little pond effect on academic self-concept: Counterbalancing social
comparison and reflected glory effects in Hong Kong high schools. Journal of
Personality and Social Psychology, 78, 337–349.
Marsh, H. W., & O’Mara, A. (2006). Reciprocal effects between academic self-concept,
self-esteem, achievement and attainment over seven adolescent-adult years:
Unidimensional and multidimensional perspectives of self-concept. Unpublished
manuscript, Department of Educational Studies, Oxford University, UK.
Marsh, H. W., & Parker, J. (1984). Determinants of student self-concept: Is it better to
be a relatively large fish in a small pond even if you don’t learn to swim as well?
Journal of Personality and Social Psychology, 47, 213–231.
Marsh, H. W., & Rowe, K. J. (1996). The negative effects of school-average ability on
academic self-concept: An application of multilevel modeling. Australian Journal
of Education, 40, 65–87.
Marsh, H. W., & Shavelson, R. (1985) Self-concept: Its multifaceted, hierarchical structure.
Educational Psychologist, 20, 107–125.
Marsh, H. W., Tracey, D. K., & Craven, R. G. (2006). Multidimensional self-concept
structure for preadolescents with mild intellectual disabilities: A hybrid multigroup-mimic approach to factorial invariance and latent mean differences.
Educational and Psychological Measurement, 66, 795–818.
Marsh, H. W., Trautwein, U., Lüdtke, O., Köller, O., & Baumert, J. (2006). Integration of
multidimensional self-concept and core personality constructs: Construct validation
and relations to well-being and achievement. Journal of Personality, 74, 403–455.
Möller J., & Köller, O. (2001). Frame of reference effects following the announcement
of exam results. Contemporary Educational Psychology, 26, 277–287.
Möller, J., & Köller, O. (2004). Die Genese akademischer Selbstkonzepte: Effekte
dimensionaler und sozialer Vergleiche [On the development of academic selfconcepts: The impact of social and dimensional comparisons]. Psychologische
Rundschau, 55, 19–27.
Plucker, J. A., Robinson, N. M., Greenspon, T. S., Feldhusen, J. F., McCoach, D. B.,
& Subotnik, R. F. (2004). It’s not how the pond makes you feel, but rather how
high you can jump. American Psychologist, 59, 268–269.
Rasbash, J., Steele, F., Browne, W., & Prosser, B. (2005). A user’s guide to MLwiN - Version
3.0. Bristol, UK: University of Bristol.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications
and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.
Robitaille, D., & Garden, R. (1989). The IEA study of mathematics: II. Contents and
outcomes of school mathematics. Oxford, UK: Pergamon.
Schwanzer, A. D., Trautwein, U., Lüdtke, O., & Sydow, H. (2005). Entwicklung eines
Instruments zur Erfassung des Selbstkonzepts junger Erwachsener [Development
of a questionnaire on young adults’ self-concept]. Diagnostica, 51, 183–194.
Shavelson, R. J., Hubner, J. J., & Stanton, G. C. (1976). Validation of construct interpretations. Review of Educational Research, 46, 407–441.
Sherif, M., & Hovland, C. W. (1961). Social judgment. New Haven, CT: Yale University
Press.
Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic
and advanced multilevel modeling. London: Sage.
Tracey, D. K., Marsh, H. W., & Craven, R. G. (2003). Self-Concepts of preadolescent
students with mild intellectual disabilities: Issues of measurement and educational
placement. In H. W. Marsh, R. G. Craven, & D. M. McInerney (Eds.), International
advances in self research (Vol. 1, pp. 203–230). Greenwich, CT: Information Age.
38
Stability of the BFLPE
Trautwein, U., Lüdtke, O., Marsh, H. W., & Köller, O. (2006). Tracking, grading, and
student motivation: Using group composition and status to predict self-concept
and interest in ninth-grade mathematics. Journal of Educational Psychology, 98,
788–806.
Trautwein, U., Lüdtke, O., Köller, O., & Baumert, J. (2006). Self-esteem, academic
self-concept, and achievement: How the learning environment moderates the
dynamics of self-concept. Journal of Personality and Social Psychology, 90,
334–349.
Zeidner, M., & Schleyer, E. J. (1999). The big-fish-little-pond effect for academic selfconcept, test anxiety and school grades in gifted children. Contemporary
Educational Psychology, 24, 305–329.
Manuscript received May 15, 2006
Revision received September 12, 2006
Accepted September 15, 2006
39