American Educational Research Journal
Month XXXX, Vol. XX, No. X, pp. X–X
DOI: 10.3102/0002831207306728
© AERA. http://aerj.aera.net

The Big-Fish-Little-Pond Effect: Persistent Negative Effects of Selective High Schools on Self-Concept After Graduation

Herbert W. Marsh
University of Oxford

Ulrich Trautwein, Oliver Lüdtke, Jürgen Baumert
Max Planck Institute for Human Development

Olaf Köller
Humboldt University

According to the big-fish-little-pond effect (BFLPE), attending academically selective high schools negatively affects academic self-concept. Does the BFLPE persist after graduation from high school? In two large, representative samples of German high school students (Study 1: 2,306 students, 147 schools; Study 2: 1,758 students, 94 schools), the predictive effects of individual achievement test scores and school grades on math self-concept are very positive, whereas the predictive effects of school-average achievement are negative (the BFLPE). Both studies showed that the BFLPE was substantial at the end of high school and was still substantial 2 years (Study 1) or 4 years (Study 2) later. In addition, because of the highly salient system of school tracks within the German education system, the authors are able to show that negative effects associated with school type (highly academically selective schools, the Gymnasium) were similar to, but smaller than, the BFLPE based on school-average achievement.

KEYWORDS: academically selective schools, big-fish-little-pond effect, frame-of-reference effects, German educational system, grade-on-a-curve effect, multilevel modeling, self-concept

The concept of self is one of the oldest (James, 1890/1963) and most important constructs in the social sciences (Branden, 1994; Marsh & Craven, 2002, 2006). However, psychologists from the time of William James have recognized that objective accomplishments are evaluated in relation to frames of reference (also see Festinger, 1954).
Thus, James (1890/1963) indicated, “We have the paradox of a man shamed to death because he is only the second pugilist or the second oarsman in the world” (p. 310).

Marsh (1974, 1984b, 1991, 1993, 2005; Marsh & Craven, 2002; Marsh & Parker, 1984) proposed the big-fish-little-pond effect (BFLPE) to encapsulate frame-of-reference effects in educational settings. According to the BFLPE model, students compare their own academic ability with the academic abilities of their classmates and use this social comparison impression as one basis for forming their own academic self-concepts. A negative BFLPE (a contrast effect) occurs where equally able students have lower academic self-concepts when they compare themselves to more able students and higher academic self-concepts when they compare themselves to less able students. For average-ability students who attend a school where the average achievement level of other students is high (hereafter referred to as a high-ability school), their academic abilities are below the average of other students in their school. According to the BFLPE, this educational context will foster social comparison processes leading to academic self-concepts that are lower than if the same students attended an average-ability school. Conversely, if these students attend a low-ability school, their abilities will be above average in relation to other students in the school, and the social comparison processes will result in higher academic self-concepts. Hence, academic self-concepts depend not only on a student’s academic accomplishments but also on the accomplishments of those in the educational setting that the student attends.

HERBERT W. MARSH is a professor of education at Oxford University, Department of Education, Oxford OX2 6PY UK; e-mail: [email protected]. He is widely published (350 articles in 70 journals, 55 chapters, 13 monographs, 350 conference papers), coedits the International Advances in Self Research monograph series, and is an ISI highly cited researcher. He founded the SELF Research Centre, with members and satellite centers around the world. His areas of specialization include self-concept, students’ evaluation of university teaching, and application of advanced quantitative research methods to diverse areas of education and psychology.

ULRICH TRAUTWEIN is a research scientist at the Center for Educational Research at the Max Planck Institute for Human Development, Lentzeallee 94, 14195 Berlin, Germany; e-mail: [email protected]. His main research interests include the effects of various characteristics of the learning environments on student achievement, self-concept, and interest.

OLIVER LÜDTKE is a research scientist at the Center for Educational Research at the Max Planck Institute for Human Development, Lentzeallee 94, 14195 Berlin, Germany; e-mail: [email protected]. His main research interests include the modeling of effects of classroom factors on individual development and the role of personality development in school and university contexts.

JÜRGEN BAUMERT is the director of the Center for Educational Research at the Max Planck Institute for Human Development, Lentzeallee 94, 14195 Berlin, Germany; e-mail: [email protected]. His major research interests include research on teaching and learning, large-scale assessment, cognitive and motivational development in adolescence, and educational institutions as developmental environments.

OLAF KÖLLER is a full professor of educational research and director of the Institute for Educational Progress at Humboldt University Berlin, Unter den Linden 6, D-10099 Berlin, Germany; e-mail: [email protected]. His main research interests include educational assessment, the association between school achievement and motivation, and social comparison processes.

The BFLPE is typically operationalized (see Marsh & Craven, 2002) as a path model (see Figure 1A) in which the effects of individual student achievement on academic self-concept are predicted to be positive and the effects of school-average achievement are predicted to be negative. Empirical support for this negative effect of school-average achievement on academic self-concept (the BFLPE) comes from numerous studies based on a variety of experimental and analytical approaches (see reviews by Marsh, 2005; Marsh & Craven, 2002). Good support for the cross-cultural generalizability of the BFLPE comes from studies conducted in Germany (e.g., Jerusalem, 1984; Marsh, Köller, & Baumert, 2001), Hong Kong (e.g., Marsh, Kong, & Hau, 2000), Israel (e.g., Zeidner & Schleyer, 1999), the United States (e.g., Marsh, 1991), and Australia (e.g., Marsh, Chessor, Craven, & Roche, 1995). Whereas most research has been limited to results from a single country, Marsh and Hau (2003) tested the cross-cultural generalizability of the BFLPE with nationally representative samples of approximately 4,000 15-year-olds from each of 26 countries who completed the same self-concept instrument and achievement tests as part of the Organization for Economic Cooperation and Development (OECD) Program for International Student Assessment (PISA) study. Consistent with a priori predictions, the standardized effects of individual student achievement were substantial and positive (.38), and the effects of school-average achievement were smaller and negative (–.21). Although the results varied somewhat from country to country, this variation was small.
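The path model in Figure 1A can be illustrated with a small simulation. The sketch below is not the authors' analysis: it uses ordinary single-level least squares on simulated data rather than a multilevel model, and the function names (`simulate`, `ols_two_predictors`) and the generating coefficients (0.6 and -0.4) are hypothetical values chosen only to reproduce the predicted pattern of a positive individual-achievement effect and a negative school-average effect.

```python
import random
from statistics import mean


def simulate(n_schools=60, n_students=40, seed=0):
    """Simulate students nested in schools (all parameters are hypothetical)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_schools):
        school_level = rng.gauss(0, 1)  # how selective this school is
        scores = [school_level + rng.gauss(0, 1) for _ in range(n_students)]
        school_avg = mean(scores)  # school-average achievement
        for ach in scores:
            # Assumed generating model: positive individual effect,
            # negative school-average effect (the BFLPE), plus noise.
            asc = 0.6 * ach - 0.4 * school_avg + rng.gauss(0, 1)
            rows.append((ach, school_avg, asc))
    return rows


def ols_two_predictors(rows):
    """Least-squares slopes for (individual achievement, school average)."""
    x1 = [r[0] for r in rows]
    x2 = [r[1] for r in rows]
    y = [r[2] for r in rows]
    m1, m2, my = mean(x1), mean(x2), mean(y)
    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s11 * s22 - s12 ** 2
    b_individual = (s22 * s1y - s12 * s2y) / det
    b_school_avg = (s11 * s2y - s12 * s1y) / det
    return b_individual, b_school_avg


if __name__ == "__main__":
    b_ind, b_avg = ols_two_predictors(simulate())
    print(f"individual achievement: {b_ind:+.2f}")
    print(f"school-average achievement: {b_avg:+.2f}")
```

Because the simulated students are clustered in schools, a real analysis would use a multilevel model so that standard errors for the school-level predictor are not understated; the single-level fit here only shows the direction of the two effects.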
Distinctive Characteristics of the BFLPE

Multidimensionality and Domain Specificity

In self-concept research, there is growing support for a multidimensional perspective of self-concept rather than a unidimensional perspective that places primary reliance on a single global component of self-concept or self-esteem (Marsh & Craven, 2006). Particularly in educational settings, there is increasing evidence that global self-esteem is nearly unrelated to a wide variety of educational outcomes (Marsh, 1993; Marsh & Craven, 1997). Consistent with theoretical predictions and this growing support for a multidimensional perspective, the BFLPE is very specific to academic components of self-concept. Marsh and Parker (1984; Marsh, 1987) showed that there were large negative BFLPEs for academic self-concept, but little or no BFLPEs on general self-concept or self-esteem. Marsh et al. (1995) reported two studies of the effects of participation in gifted and talented programs on different components of self-concept over time and in relation to a matched comparison group. There was clear evidence for negative BFLPEs in that the academic self-concepts of students in the gifted and talented programs declined over time and in relation to the comparison group. These BFLPEs were consistently large for academic components of self-concept but were small and largely nonsignificant for four nonacademic self-concepts and for global self-esteem. Whereas different studies have evaluated the BFLPE for different academic domains, the most frequent tests are based on a global measure of academic self-concept or on math self-concept (see review by Marsh & Craven, 2002).

Relation to Individual Student Ability

Do both the best and poorest students suffer the BFLPE?
Marsh (1984a, 1987, 1991, 1993; Marsh et al., 1995; Marsh & Rowe, 1996) argued that attending selective schools should lead to reduced academic self-concepts for students of all achievement levels, based on several different theoretical perspectives. In their review of BFLPE research, Marsh and Craven (2002; also see Marsh & Hau, 2003) concluded that there is little evidence that the size of the BFLPE varies systematically with individual student ability levels.

Juxtaposition Between Standardized Test Scores, School Grades, and Grading on a Curve

It is important that tests of the BFLPE are based on achievement indicators that vary along a metric that is common to all individual students, classes, and schools. For this reason, BFLPE studies typically use standardized achievement tests as the basis of both individual student and school-average achievement. This, however, creates a potential dilemma in that there is a growing body of research demonstrating that academic self-concept is more strongly related to school-based measures (e.g., school grades) than to standardized achievement (Marsh, 1993; Marsh & Craven, 2006). This follows in that school grades are a more direct source of feedback to students about their academic accomplishments than test scores are, particularly test scores that are not part of the formal assessment process, for which students do not prepare and about which they do not even receive feedback. Also, school grades are likely to be responsive to motivational processes (e.g., effort, conscientiousness, appropriate preparation, self-regulation, etc.) that mediate the relation between self-concept and achievement, particularly when such characteristics are part of the basis for assigning school grades. However, school grades typically reflect an idiosyncratic metric that varies depending on the particular teacher, class, or school.
In particular, there is a widespread grading-on-a-curve phenomenon such that the percentage of students getting each grade does not vary much across schools, even when there are substantial differences in the ability levels of students in the different schools; equally able students are likely to get lower grades in schools where the school-average achievement levels are high than in schools where achievement levels of other students are low.

An early BFLPE study (Marsh, 1987) shows how this grading-on-a-curve phenomenon influences the BFLPE. Based on a large representative sample of U.S. high schools, Marsh (1987; also see Marsh & Rowe, 1996) noted that the BFLPE and the grading-on-a-curve effect have a similar rationale and are mutually reinforcing. He found that the effects of school-average achievement on academic self-concept were negative. The results also clarified the distinction between academic ability and grade-point average, their respective influences on self-concept, and how this influenced the BFLPE. Consistent with the grading-on-a-curve effect, equally able students have lower grade-point averages in high-ability schools than in low-ability schools. This phenomenon is so widespread that university admission officers have to take it into account when evaluating the high school transcripts of university applicants (Espenshade, Hale, & Chung, 2005). However, Marsh demonstrated that this frame-of-reference effect influencing grade-point average was separate from, but contributed to, the BFLPE on academic self-concept. More recently, Trautwein, Lüdtke, Marsh, and Köller (2006) demonstrated a similar phenomenon in German schools.

Stability and Persistence of Effects

A critical issue with theoretical and practical implications is whether BFLPEs are short term and ephemeral or stable and long lasting. In response to the Marsh and Hau (2003) OECD PISA study, both Dai (2004) and Plucker et al.
(2004) suggested that the BFLPE might be temporary. Marsh and Hau reviewed a number of studies suggesting that the size of the BFLPE remained stable or even increased in size over time for students who remained in the same selective school setting.

In evaluating the stability of the BFLPE over time (see Figure 1B), school-average ability is related to academic self-concept, collected on at least two occasions. Hence, the critical effects are the total, direct, and mediated effects of school-average ability on academic self-concept at Time 2 (T2). If the BFLPE were highly stable over time, then most of the effect of school-average ability on T2 self-concept would be mediated through its effect on Time 1 (T1) self-concept so that the direct effects of school-average ability on T2 self-concept (represented by the corresponding path coefficient in Figure 1B) would be close to zero. The direct effect of school-average ability could be positive even though the total effect was negative. This would indicate that the negative indirect effect of school-average ability that was mediated through T1 self-concept was offset to some extent by a positive direct effect. To the extent that the direct effect of school-average ability is negative, it would mean that there is a new (additional) negative effect of school-average ability beyond the negative effect that can be explained in terms of T1 academic self-concept.

[Figure 1 appears here: path diagrams (Panels A and B) relating Time 1 individual student achievement and Time 1 school-average achievement to individual student academic self-concept at Time 1 and, in Panel B, at Time 2.]

Figure 1. The Big-Fish-Little-Pond Effect (BFLPE). Theoretical predictions: (A) traditional analysis based on a single wave of data, and (B) stability of BFLPEs over time based on longitudinal data.

The Marsh, Köller, and Baumert (2001) study was particularly important in demonstrating the temporal evolution of the BFLPE. In 1991, former East and West German students experienced a remarkable social experiment: the reunification of very different school systems after the fall of the Berlin Wall. Self-concept scores were collected at the start, middle, and end of the first school year after reunification. East German students had not previously been grouped according to achievement. For them, over the three data collection waves, the BFLPE was initially small, then moderate, and then substantial by the end of the year. West German students had attended schools based on achievement grouping for the 2 years prior to reunification. For them, the BFLPE was substantial at all three times. A large East-West difference in the size of the BFLPE at the start of the year disappeared completely by the end of the year. The BFLPE was stable for West German students who had previously been in academically selective schools, but for East German students who had previously been in heterogeneous, mixed-ability schools, the size of the BFLPE grew steadily larger during the first year of school following the introduction of an ability-tracked school system.
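The decomposition into total, direct, and mediated effects described for Figure 1B can be written out explicitly. The notation below is introduced here for illustration only and does not appear in the original; it assumes linear models with standardized coefficients and the same additional predictors (indicated by the dots, e.g., individual achievement) in both equations, with $\overline{\mathrm{ACH}}_{T1}$ denoting T1 school-average achievement and $\mathrm{ASC}$ denoting academic self-concept.

```latex
\begin{align}
\mathrm{ASC}_{T1} &= a\,\overline{\mathrm{ACH}}_{T1} + \dots + e_{1} \\
\mathrm{ASC}_{T2} &= c'\,\overline{\mathrm{ACH}}_{T1} + b\,\mathrm{ASC}_{T1} + \dots + e_{2} \\
c &= c' + a\,b
\end{align}
```

Here $c$ is the total effect of school-average achievement on T2 self-concept, $c'$ is the direct effect, and $a\,b$ is the effect mediated through T1 self-concept. With purely illustrative values $a = -.20$ and $b = .70$ (not estimates from either study), a fully mediated effect ($c' = 0$) would give a total effect of $c = -.14$; a negative $c'$ would make the total effect more negative than that, and a positive $c'$ would partly offset it.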
In a large Hong Kong study of students entering selective schools in Grade 7 (Marsh, Kong, & Hau, 2000), there was a substantial negative effect of school-average ability in Grade 9 even after controlling for the substantial negative effects in Grade 8. Also, in the large U.S. High School and Beyond Study, Marsh (1991) demonstrated that for many outcomes there were new negative effects of school-average achievement at the end of high school even after controlling for those already experienced earlier in high school. This research suggests that the BFLPE is likely to increase in size during the initial period of adjustment after students are introduced to a major shift in the frame of reference and provides no evidence that the effect declines during the period students are in the same frame of reference.

Clearly, as long as students remain in the same high school and school-average achievement is relatively stable so that the immediate frame of reference remains reasonably stable, there is ample evidence that the BFLPE persists or may even increase in size. This is not surprising and is consistent with the rationale underpinning the imposed social comparison paradigm posited by Diener and Fujita (1997). Thus, for example, in classic research on conformity to group norms, Sherif and Hovland (1961) reported that experimentally induced changes in the frame of reference were still evident when participants subsequently made judgments after they were no longer part of the original group. However, a more demanding challenge is to evaluate the stability of the BFLPE on academic self-concept several years after graduation from high school, when the frame of reference based on other students from the same high school is not so salient and is no longer imposed by the immediate context.
The current study is apparently unique in evaluating the stability of the BFLPE formed in high school for self-concept measures collected at the end of high school and several years after graduation.

The Present Investigation

Our major focus is to evaluate the long-term stability and persistence of the BFLPE. Based on two large, representative studies of German high school students, we evaluate the effects of high-school-average achievement levels on academic self-concepts at the end of high school and several years after graduation, a time interval of 2 years in Study 1 and 4 years in Study 2. In addition, because of the highly salient system of school tracking within the German education system, we are able to contrast the effects of school-average achievement with those of school type (i.e., the effects associated with attending highly academically selective schools, called the Gymnasium in the German system).

Although the primary focus of our study is on individual and school-average achievement based on a standardized achievement test, we also examine the effects of school grades. Of particular interest is the extent to which the negative effect of school-average achievement on academic self-concept can be explained by the typical grade-on-a-curve phenomenon in which equally able students receive lower grades in more academically selective settings (Espenshade et al., 2005; Marsh, 1987).

An important feature of this research is the appropriate implementation of a multilevel model analysis that is suitable for hierarchical data, in which participants (e.g., students) are nested within naturally occurring groups (e.g., schools).
Taking advantage of the appropriate tests of cross-level interactions that are possible in multilevel models, we evaluate the generalizability of the BFLPE by testing the extent to which the negative effects of school-average achievement levels and school type (school-level variables) interact with individual student gender and student achievement levels (individual student-level variables).

Based on research reviewed earlier (also see Figure 1), we propose the following a priori hypotheses and research questions to guide our analyses and presentation of results:

1. Both individual student achievement and school grades at the end of high school have positive effects on academic self-concept at T1 (the end of high school) and T2 (after graduation from high school). We ask whether additional effects of T1 achievement on T2 academic self-concept remain after controlling for the effects of T1 academic self-concept (i.e., how changes in academic self-concept over time are related to prior levels of achievement).

2. The effects of high-school-average achievement and high school type are negative for academic self-concept at T1 and T2. The effects of school-average achievement are predicted to be systematically larger than school-type effects and to be evident even after controlling for school type. However, we pose a research question regarding the size and direction of school-type effects after controlling for school-average achievement: whether the effects are merely diminished substantially, as might be expected if school type functions as a rough, dichotomous proxy measure of school-average achievement, or whether attending a prestigious school type contributes positively to self-concept (assimilation) after controlling for the negative BFLPE (contrast) associated with school-average achievement. We also pose the research question of whether these effects of school-average achievement and school type on T2 academic self-concept remain after controlling for the effects of T1 academic self-concept (i.e., how changes in academic self-concept over time are related to school-average achievement and school type).

3. School grades are a salient source of feedback to students about their academic accomplishments. Because teachers typically grade on a curve, there is a frame-of-reference effect for school grades (equally able students get lower school grades in schools and classes where other students are particularly able) that is similar to the BFLPE. Based on Marsh (1987), we predict that the BFLPE will be partially mediated by school grades; that is, the negative effects of school-average achievement and school type are expected to be statistically significant but smaller after controlling for the effects of school grades.

4. As a research question, we ask whether the negative effects of school-average achievement and school type are generalizable over responses by boys and girls and over different levels of academic achievement. Although cross-level interactions between these school-level variables and individual student-level variables are typically small (see Marsh & Craven, 2002), there is not a sufficient basis for making directional hypotheses. However, the generalizability of the BFLPE across different groups of students has potentially important practical implications for understanding the BFLPE.

Study 1

Method

The German School Setting

The German school system is well known for its early and selective differentiation of students in different school types. Selection takes place after Grade 4 (in a few states after Grade 6), when students are about age 10.
Although there is considerable variation across the German states in terms of the number and quality of tracks (Baumert, Trautwein, & Artelt, 2003; Trautwein et al., 2006), the majority of states have adopted a variant of the tripartite system of Hauptschule (least-demanding track), Realschule (intermediate track), and Gymnasium (highest track). Hauptschule and Realschule students graduate after Grade 9 or 10 and then typically enter the dual system, which combines part-time education at vocational school with on-the-job training. Students from traditional Gymnasium graduate after Grade 12 or 13, which is a prerequisite for university entrance. To overcome the strict differentiation between Gymnasium and the other school types, several German states now allow the best students from different tracks to enter upper-secondary education at the vocational Gymnasium and the comprehensive school in order to qualify for university. Approximately 30% of students in the German school system qualify to go to university. Because non-Gymnasium students leave secondary school after Grade 9 or 10, it was not possible to include these students in this investigation.

Description of the Study and Sample

The data for Study 1 come from a large, ongoing German study (Transformation of the Secondary School System and Academic Careers, or TOSCA), conducted at the Max Planck Institute for Human Development, Berlin, and the Institute for Educational Progress at the Humboldt University, Berlin (see Köller, Watermann, Trautwein, & Lüdtke, 2004). The data considered here are based on students from 147 randomly selected upper secondary schools in a single German state. The schools are representative of the traditional and vocational Gymnasium school types, which provide students with the qualifications to attend university. Students in the traditional Gymnasium are a particularly highly selected sample in terms of academic achievement.
A multistage sampling procedure was implemented to ensure that the data were representative. Schools and students were randomly selected. The participation rate at the school level was 99%, and a satisfactory participation rate of more than 80% was achieved at the student level. At T1, the students in the sample were in their final year of upper secondary schooling, with a mean age of 19.51 years (SD = 0.77). Two trained research assistants administered materials in each school between February and May 2002. Students participated voluntarily, without any financial reward. At T1, all students were asked to provide written consent to be contacted again later for a second wave of data collection. At T2, participants were asked to complete an extensive questionnaire, taking about 2 hours, in exchange for a financial reward of 10€ (about U.S. $12).

Because the focus of this investigation is on the stability of effects over time, actual analyses are based on responses by the 2,306 (48%) students who completed the math self-concept instrument at T1 (final year of high school) and T2 (2 years after graduation from high school). However, T1 school-average achievement was based on responses by all students from each of the 147 schools who completed the achievement tests administered at T1. Comparing students in the final sample with those in the original sample, students responding at T2 were more likely to be female, to be younger in age, to be more mathematically able based on both test scores and school grades, to come from a more academically selective school, and to have a higher math self-concept (all effect sizes were modest, varying between .1 and .3).

Instruments

Math self-concept. Math self-concept was based on the German adaptation of the Self Description Questionnaire III, a multidimensional self-concept instrument for late adolescents and young adults based on the Shavelson, Hubner, and Stanton (1976; Marsh & Shavelson, 1985) model.
In the German adaptation (Schwanzer, Trautwein, Lüdtke, & Sydow, 2005), four researchers with English as a second language translated all original items independently of each other. Because the translation was intended to be used with a diverse sample of students, there was an emphasis on using simple, common words that would be easily understood by all students. Subsequently, using the assistance of a professional translator, the most appropriate translation was chosen (and in some instances refined). Extensive pilot testing resulted in a short German instrument with four items per scale with a 4-point (disagree to agree) response format. The four items chosen per scale emphasized cognitive evaluations (e.g., “I’m good at mathematics”) rather than affective items (e.g., “I like mathematics”) and had the highest factor loadings on their respective factors. The math self-concept items were administered at both T1 and T2. Marsh, Trautwein, Lüdtke, Köller, and Baumert (2006) demonstrated strong support for the convergent and discriminant validity of responses to this math self-concept scale. Thus, for example, math self-concept was substantially positively related to math school grades (.71), math standardized achievement test scores (.59), and taking advanced math courses (.51) but was nearly unrelated or even negatively related to English and German outcomes. Academic outcome measures. The mathematics achievement test consisted of original items from the Third International Mathematics and Science Study (TIMSS; e.g., Baumert, Bos, & Lehmann, 2000). A total mathematics achievement score was constructed using the original metric of the TIMSS study. 
Although the test was low stakes (i.e., the results had no consequences and students received no feedback on the results), a separate experimental study with random assignment to conditions showed that the addition of rewards contingent on informational feedback, grading, and performance had no effect on test performance (Baumert & Demmrich, 2001).

All schools used a school grade scale that ranged from 0 points (very low achievement) to 15 points (very high achievement). Grades were based on the school grades that students had received on their report cards that covered work completed over approximately 6 months. Actual grades (based on official school records) were available for students from a majority, but not all, of the schools considered in Study 1. The correlation between self-reported grades and documented grades was r = .93 (p < .001) for students who had nonmissing values for both constructs. However, to minimize missing values and to maximize the similarity between Studies 1 and 2 (actual grades were not available in Study 2), self-reported grades were used in both studies. School-average measures of achievement were always based on test scores rather than on school grades.

Statistical Analyses

Recent BFLPE studies (e.g., Lüdtke, Köller, Marsh, & Trautwein, 2005; Marsh & Hau, 2003) have used multilevel modeling approaches. In most studies conducted in school settings, individual participant characteristics are confounded with those associated with groups (e.g., classrooms, schools, etc.) because such groups typically are not established according to random assignment.
This clustering effect entails problems with respect to the appropriate levels of analysis, aggregation bias, and heterogeneity of regression (Raudenbush & Bryk, 2002). Participants in the same group are typically more similar to other participants in the same group than they are to participants in other groups. Even when participants are initially assigned at random, they tend to become more similar to each other over time. Furthermore, a variable may have a very different meaning when measured at different levels. For example, the BFLPE research reviewed here suggests that a measure of ability at the student level provides an indicator of a student attribute, whereas school-average achievement at the school level becomes a proxy measure of a school’s normative environment. Thus, the average achievement of a school has an effect on student self-concept above and beyond the effect of the individual student’s ability. Multilevel modeling is designed to resolve the confounding of these two effects by facilitating a decomposition of any observed relationship among variables, such as self-concept and ability, into separate within-school and between-school components (see Goldstein, 2003; Raudenbush & Bryk, 2002; Snijders & Bosker, 1999). The juxtaposition of the effects of individual achievement and class-average achievement is inherently a multilevel issue that cannot be represented adequately at either the individual or the classroom level. A detailed presentation of multilevel modeling (also known as hierarchical linear modeling) is available elsewhere (e.g., Raudenbush & Bryk, 2002; Snijders & Bosker, 1999).

By basing the analyses on responses by participants with both T1 and T2 data, there were very few missing data (less than 1% missing responses).
Although missing data were not an important problem for the final sample, we used the expectation maximization algorithm, a widely recommended approach to imputation for missing data, as operationalized in SPSS (Version 11.5), to impute missing values for T1 math self-concept (T1MSC), math grades, and math test scores. This procedure is preferable to traditional procedures such as listwise and pairwise deletion for missing data.

Based on strong theoretical models, it is appropriate and desirable to posit causal effects and to test causal predictions based on appropriate statistical analyses. However, even in rigorous studies, causality cannot be proven; it can only be shown that the data are consistent with strong tests of causality. Hence, we have been careful to use the generic term predictive effect, which does not imply causality, when discussing the actual results of our statistical analyses.

Multilevel Models

All multilevel analyses in this investigation were conducted with the commercially available MLwiN statistical package (Version 2; see Rasbash, Steele, Browne, & Prosser, 2005; also see Goldstein, 2003), which is specifically designed to analyze multilevel data. To test the a priori predictions and research questions, a set of eight multilevel models was evaluated. In each model, either T1MSC or T2 math self-concept (T2MSC) was the dependent variable. The overall strategy was to begin with the most basic tests of the BFLPE and then evaluate the consequences of adding variables or interaction terms in relation to a priori predictions or research questions. In this sense, all the models are a priori models. In Models 1, 2, and 3, we tested the effects of the BFLPE based on individual achievement test scores in combination with school-average achievement (Model 1, the traditional basis of the BFLPE), school type (Model 2; 1 = traditional Gymnasium, a selective school; 0 = other), and the combination of both school-average achievement and school type (Model 3).
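As a compact summary of the first model described above, a two-level random-intercept specification can be written out as follows. This is a sketch under stated assumptions rather than the authors' own notation: the symbols are introduced here for illustration, with $i$ indexing students and $j$ indexing schools, $\mathrm{MSC}$ denoting math self-concept, $\mathrm{ACH}$ the individual achievement test score, and $\overline{\mathrm{ACH}}_j$ the school-average achievement. The normality of the two residual terms is likewise assumed.

```latex
\begin{align}
\mathrm{MSC}_{ij} &= \beta_0 + \beta_1\,\mathrm{ACH}_{ij}
                   + \beta_2\,\overline{\mathrm{ACH}}_{j} + u_j + e_{ij}, \\
u_j &\sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2).
\end{align}
```

In this notation the BFLPE corresponds to $\beta_2 < 0$ alongside $\beta_1 > 0$. Model 2 would replace the school mean with the school-type indicator, Model 3 would include both school-level predictors, and the variants that control for earlier self-concept would add T1MSC as a further student-level predictor.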
The purpose of these comparisons was to determine whether attending a selective school has a positive (assimilation) effect on self-concept that offsets the negative (contrast) effect of school-average achievement. For each of these models, separate analyses were conducted for T1MSC (Models 1A, 2A, and 3A), T2MSC (Models 1B, 2B, and 3B), and T2MSC with T1MSC included as a predictor variable (Models 1C, 2C, and 3C). The purpose of these analyses was to compare the sizes of effects near the end of high school (T1) and several years after graduation from high school (T2) and to determine whether school-average achievement continued to have negative effects at T2 after controlling for the effects observed at T1.

Models 4, 5, and 6 paralleled the first three models except that they included both achievement test scores (as in the first three models) and high school grades as predictor variables. The purpose of these models was to compare the effects of test scores and school grades on academic self-concept at T1, T2, or T2 controlling for T1. Particularly relevant is the extent to which the BFLPE associated with school-average achievement is mediated by school grades.

In Model 7, we explore whether the size of the BFLPE based on school-average achievement is moderated by gender or individual achievement, that is, whether the effect is larger or smaller for girls than for boys and whether it is larger or smaller for students who are initially more able. This is accomplished by adding appropriate main and interaction effects to the predictor variables in Model 4. In Model 8, we conduct parallel analyses in which the BFLPE is based on school type (selective vs. nonselective) rather than school-average achievement.

To facilitate the interpretation of the results and to reduce potential multicollinearity problems, we began by standardizing (z scoring) all variables to have M = 0 and SD = 1 across the entire sample (Appendix, Study 1).
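The standardization step can be illustrated with a short Python sketch. It is not the authors' procedure (their analyses were run in SPSS and MLwiN); the function names, the toy data, and the use of the population standard deviation are assumptions made here for illustration. The sketch z-scores a variable across the whole sample and then averages the standardized scores within each school without restandardizing, so that student-level and school-level scores share one metric.

```python
from collections import defaultdict
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to M = 0 and SD = 1 across the whole sample."""
    m = mean(values)
    s = pstdev(values)  # assumption: population SD; the article does not specify
    return [(v - m) / s for v in values]


def school_averages(z_values, school_ids):
    """Average the already standardized scores within each school.

    The school means are deliberately not restandardized, so they stay
    on the same metric as the individual-level z scores.
    """
    by_school = defaultdict(list)
    for z, school in zip(z_values, school_ids):
        by_school[school].append(z)
    return {school: mean(zs) for school, zs in by_school.items()}


# Hypothetical toy data: six students in two schools.
raw_scores = [40.0, 50.0, 60.0, 70.0, 55.0, 65.0]
schools = ["A", "A", "A", "B", "B", "B"]

z = z_scores(raw_scores)
school_mean = school_averages(z, schools)

# Student-level and school-level predictors side by side.
rows = [(s, round(zi, 2), round(school_mean[s], 2)) for s, zi in zip(schools, z)]
print(rows)
```

Keeping the school means on the student-level metric means a coefficient for the school mean can be read against the coefficient for the individual score without rescaling.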
School-average achievement was determined by taking the average of achievement scores for students in each school (but not restandardizing these scores, so that individual student and school-average achievement scores were in the same metric). Product terms were used to test interaction effects. In constructing these product variables, we used the product of standardized (z score) variables, but the product terms were not restandardized. In this respect, all parameter estimates based on multilevel models in this investigation are standardized parameter estimates in which the standardized metric is based on the metric of the individual-level variables (for further discussion, see Marsh & Rowe, 1996; also see Aiken & West, 1991; Raudenbush & Bryk, 2002).

Total effects can be divided into direct and indirect (or mediated) effects when there is a potential mediating variable. In this investigation, for example, we predict that the negative effects of school-average ability on T2MSC will be largely mediated by T1MSC. It is, of course, relevant to evaluate the statistical significance of mediated effects using appropriate procedures such as those outlined by Krull and MacKinnon (2001) that are applied to multilevel models. In this investigation, whenever we discuss mediated effects, we tested the statistical significance of these effects using what Krull and MacKinnon (p. 259) refer to as the 2-1-1 model, in which the first variable in the causal chain is a group-level variable (school-average ability in this investigation).

Preliminary Analyses: School-Level Variation in Test Scores, Grades, and Self-Concept

In preliminary analyses, we evaluated the multilevel structure of math achievement, math school grades, and math self-concept. Consistent with a priori predictions based on previous research, there was substantial variation between schools in terms of math achievement but substantially less variation between schools in math school grades and particularly in math self-concept. These differences are illustrated in the caterpillar plots (Figure 2). Thus, for example, in Figure 2A, the 147 schools (each school is represented by one of the 147 vertical lines) are ranked in terms of math achievement, and a 95% error bar shows how variable scores are within a given school. For math achievement, many schools are clearly above or below the grand mean across all schools (zero, because these scores were standardized). School intercepts vary from more than a standard deviation below the mean to half a standard deviation above the mean. Consistent with the grade-on-a-curve effect, there is much less variation in school grades (Figure 2B). In contrast to math achievement, there are only a few error bars representing different schools that do not overlap with the mean grade across all schools. Consistent with the BFLPE, there is even less school-to-school variation in math self-concepts (Figure 2C). The intraschool correlation is an index of the amount of variation in each outcome that can be explained in terms of differences between schools. Whereas the intraschool correlation is substantial for math achievement (.26), it is much smaller for math school grades (.04) and even smaller for math self-concept (.02). In summary, consistent with the grade-on-a-curve effect and the BFLPE, there is substantial school-to-school variation in math achievement scores but much less variation in math school grades and math self-concept.

Results and Discussion

The BFLPE

In each of a set of multilevel models, math self-concept responses are predicted from different combinations of individual-level and school-level variables specifically constructed to assess a priori hypotheses and research questions.
For each model, three sets of analyses were conducted in which the dependent variable was T1MSC (Table 1, Models 1A, 2A, and 3A), T2MSC (Table 1, Models 1B, 2B, and 3B), and T2MSC in which T1MSC was included as a predictor variable to assess change in math self-concept (Table 1, Models 1C, 2C, and 3C). The predictive effect of individual achievement is substantial for T1MSC (.68; Model 1A in Table 1) and for T2MSC (.63; Model 1B, Table 1). In Model 1C (Table 1), T1MSC has the largest predictive effect on T2MSC (.73), but the predictive effect of individual achievement is still highly significant (.15). Hence, individual achievement test scores from T1 have a significant direct predictive effect on T2MSC beyond the substantial indirect predictive effect that is mediated by T1MSC.

For present purposes, the most important results of Model 1 are the predictive effects of school-average achievement, the BFLPE. This predictive effect is statistically significant, substantial, and negative for T1MSC (–.39, Model 1A) and nearly as large for T2MSC (–.34, Model 1B). Hence, there is a substantial negative predictive effect of school-average achievement on T2MSC collected 2 years after graduation from high school. Furthermore, even after controlling for the substantial negative predictive effect of school-average achievement on T1MSC, school-average achievement still has a small, statistically significant, negative predictive effect on T2MSC (–.07, Model 1C). Hence, school-average achievement from T1 has a small (statistically significant) negative direct predictive effect on T2MSC beyond the substantial indirect negative predictive effect that is mediated by T1MSC. This implies that school-average achievement has new negative predictive effects on math self-concept after graduation from high school, beyond what can be explained in terms of the earlier negative predictive effects on T1MSC collected at the end of high school. Hence, at least from this perspective, the negative predictive effect of school-average achievement grows larger after graduation from high school rather than diminishing.

Figure 2. School-to-school variation. (A) Math achievement: Each of the 147 vertical lines represents the school intercept (in terms of math achievement) with an error bar (+/– 1.96 SEs). Schools are ranked in terms of math achievement (from lowest to highest). (B) Math school grades: Each of the 147 vertical lines represents the school intercept (in terms of math grades) with an error bar (+/– 1.96 SEs). (C) Math self-concept: Each of the 147 vertical lines represents the school intercept (in terms of math self-concept) with an error bar (+/– 1.96 SEs). Individual student scores for achievement, grades, and self-concept were standardized (M = 0, SD = 1) across all students.

Model 2 (Table 1) is essentially parallel to Model 1 except that the BFLPE is represented by school type (i.e., attending a highly selective traditional Gymnasium) rather than school-average achievement. Whereas the negative predictive effects of school type in Model 2 are systematically smaller than the corresponding predictive effects based on school-average achievement in Model 1, the pattern and direction of predictive effects are similar. The only major difference is that the negative predictive effect of school type on T2MSC after controlling for T1MSC (Model 2C) is not statistically significant (whereas the corresponding predictive effect of school-average achievement in Model 1C is significantly negative).

In Model 3 (Table 1) we included both school-average achievement and school type as predictor variables. Not surprisingly, given that these variables are substantially correlated, the unique predictive effects of each are systematically smaller than in the corresponding analysis, in which only one or the other of these school-level variables was included.
What is interesting, however, is that both of these school-level variables have significantly negative predictive effects on T1MSC (Model 3A) and T2MSC (Model 3B). Thus, for example, whereas the negative predictive effect of school-average achievement was –.39 in Model 1A (which did not include school type), the corresponding predictive effect was –.25 in Model 3A (which did include school type). Similarly, whereas the negative predictive effect of school type was –.20 in Model 2A (which did not include school-average achievement), the corresponding predictive effect was –.12 in Model 3A (which did include school-average achievement).

In summary, the results of Models 1, 2, and 3 provide clear support for the BFLPE and for the main a priori prediction of the present investigation. The predictive effect of school-average achievement (the BFLPE) is substantially negative at the end of high school and continues to be substantially negative 2 years after graduation from high school. Furthermore, there are new negative predictive effects of school-average achievement on T2MSC beyond the substantial negative predictive effects that are mediated by T1MSC. Hence, school-average achievement and school type continue to have substantial negative predictive effects on math self-concept 2 years after graduation from high school. On this basis, we conclude that the BFLPE is a persistent, long-lasting phenomenon.
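The split between direct and mediated effects discussed above can be checked with simple arithmetic on the reported Study 1 coefficients. The sketch below uses the product-of-coefficients logic as an illustrative assumption; it is not the significance test the authors ran, and the function and variable names are hypothetical. The path from T1MSC to T2MSC is taken as .71, the value printed in Table 1 (the running text gives .73, which leads to the same rounded result).

```python
def indirect_effect(a, b):
    """Product-of-coefficients estimate of a mediated effect (illustrative)."""
    return a * b


# Coefficients reported for Study 1 (standardized estimates).
a = -0.39               # school-average achievement -> T1MSC (Model 1A)
b = 0.71                # T1MSC -> T2MSC (Model 1C, as printed in Table 1)
direct = -0.07          # school-average achievement -> T2MSC given T1MSC (Model 1C)
reported_total = -0.34  # school-average achievement -> T2MSC (Model 1B)

indirect = indirect_effect(a, b)
implied_total = direct + indirect
share_mediated = indirect / implied_total

print(f"indirect = {indirect:.3f}")
print(f"implied total = {implied_total:.3f} (reported total = {reported_total:.2f})")
print(f"share mediated by T1MSC = {share_mediated:.0%}")
```

Under these assumptions the implied total effect lands close to the total effect reported for Model 1B, and most of it runs through T1MSC, which matches the description of a small direct effect on top of a substantial mediated one.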
Table 1
Study 1: Stability of the Big-Fish-Little-Pond Effect: Effects of School-Average Achievement, School Type, and Stability Over Time

Variables                     Model 1: School-Average Achievement       Model 2: School Type                      Model 3: School-Average Achievement and School Type
                              Model 1A      Model 1B      Model 1C      Model 2A      Model 2B      Model 2C      Model 3A      Model 3B      Model 3C
                              T1MSC         T2MSC         ∆T2MSC        T1MSC         T2MSC         ∆T2MSC        T1MSC         T2MSC         ∆T2MSC
                              b (SE)        b (SE)        b (SE)        b (SE)        b (SE)        b (SE)        b (SE)        b (SE)        b (SE)
Fixed effects
Level 1: Individual student predictors
  Math tests                  .68 (.02)*    .63 (.02)*    .15 (.02)*    .65 (.01)*    .60 (.02)*    .14 (.02)*    .68 (.04)*    .64 (.02)*    .15 (.02)*
  T1 math self-concept                                    .71 (.02)*                                .71 (.02)*                                .71 (.04)
Level 2: School-level predictors
  School-average math test    –.39 (.04)*   –.34 (.04)*   –.07 (.03)*                                             –.25 (.04)*   –.23 (.05)*   –.05 (.03)
  School type                                                           –.20 (.02)*   –.17 (.02)*   –.03 (.02)    –.12 (.02)*   –.10 (.02)*   –.01 (.02)
Residual variance components
  Level 2 school              .02 (.007)*   .01 (.006)    .00 (.003)    .01 (.006)*   .01 (.006)    .00 (.003)    .01 (.005)    .00 (.005)    .00 (.003)
  Level 1 students            .63 (.019)*   .68 (.021)*   .36 (.013)*   .63 (.019)*   .68 (.021)*   .36 (.011)*   .63 (.019)*   .68 (.021)*   .36 (.011)*
Deviance summary              5,530.0       5,673.2       4,176.0       5,530.7       5,679.0       4,178.0       5,505.2       5,656.3       4,175.6

Table 1 (continued)

Variables                     Model 4: School-Average Achievement       Model 5: School Type                      Model 6: School-Average Achievement and School Type
                              Model 4A      Model 4B      Model 4C      Model 5A      Model 5B      Model 5C      Model 6A      Model 6B      Model 6C
                              T1MSC         T2MSC         ∆T2MSC        T1MSC         T2MSC         ∆T2MSC        T1MSC         T2MSC         ∆T2MSC
                              b (SE)        b (SE)        b (SE)        b (SE)        b (SE)        b (SE)        b (SE)        b (SE)        b (SE)
Fixed effects
Level 1: Individual student predictors
  Math tests                  .46 (.02)*    .44 (.02)*    .15 (.02)*    .44 (.01)*    .42 (.02)*    .14 (.02)*    .46 (.01)*    .44 (.02)*    .15 (.02)*
  Math grades                 .49 (.02)*    .44 (.02)*    .13 (.02)*    .49 (.01)*    .44 (.02)     .13 (.02)*    .48 (.01)*    .43 (.02)*    .13 (.02)*
  T1MSC                                                   .63 (.02)*                                .63 (.02)*                                .63 (.02)
Level 2: School-level predictors
  School-average math test    –.25 (.03)*   –.21 (.04)*   –.06 (.03)*                                             –.13 (.03)*   –.12 (.04)*   –.04 (.03)
  School type                                                           –.10 (.01)*   –.11 (.02)*   –.03 (.01)*   –.10 (.02)*   –.08 (.02)*   –.01 (.02)
Residual variance components
  Level 2 school              .01 (.005)*   .01 (.005)    .00 (.003)    .01 (.004)*   .00 (.004)    .00 (.003)    .00 (.004)    .00 (.004)    .00 (.003)
  Level 1 students            .44 (.013)*   .52 (.016)*   .35 (.010)*   .44 (.013)*   .52 (.016)*   .35 (.010)*   .44 (.013)*   .52 (.016)*   .35 (.010)*
Deviance summary              4,687.0       5,058.9       4,110.2       4,674.7       5,054.6       4,111.0       4,663.1       5,046.0       4,109.5

Note. T1MSC = Time 1 math self-concept; T2MSC = Time 2 math self-concept; ∆T2MSC = Time 2 math self-concept controlling for T1MSC; school-average math test = school average of math achievement test scores; school type: 1 = selective Gymnasium, 0 = other. All parameter estimates are statistically significant when they differ from zero by more than 2 standard errors. All outcome and predictor variables were standardized (M = 0, SD = 1) at the individual student level so that parameter estimates are standardized in relation to the mean and standard deviation of individual-level variables. Analyses are based on responses by 2,306 students who completed the math self-concept instrument at Time 2.
*p < .01.

The Predictive Effect of School Grades on the BFLPE

Models 4, 5, and 6 (Table 1) largely parallel Models 1, 2, and 3. The main difference is that individual student grades are also included in each of the models as a predictor of math self-concept. Not surprisingly, math grades have a substantial positive predictive effect on math self-concept, and a substantial portion of the predictive effect of math achievement on math self-concept (as shown in Model 1) can be explained in terms of math grades. Thus, for example, in Model 4A the combined predictive effect of individual achievement (.46) and grades (.49) is substantially larger than the predictive effect of either of these individual student variables considered alone.
Of particular relevance is the BFLPE, that is, the predictive effects of the school-level variables. As predicted on the basis of the grade-on-a-curve effect, the BFLPE is systematically smaller in models that include school grades. Equally able students get lower grades in schools where the school-average achievement level is higher and, in this investigation, in academically selective schools (i.e., the Gymnasium schools). Hence, part of the negative predictive effect of these school-level variables is mediated by school grades. However, the predictive effects of school-average achievement (Model 4) and school type (Model 5) are still significantly negative for both T1MSC and T2MSC. Furthermore, school-average achievement (Model 4C) and school type (Model 5C) continue to have a significant predictive effect on T2MSC even after controlling for T1MSC.

Generalizability of the BFLPE: Interactions With Gender and Individual Achievement

In Models 7 and 8 (Table 2), we added individual student gender and interaction effects involving gender and individual student ability to models already considered. Whereas girls tend to have lower math self-concepts than boys do, this gender difference only reaches statistical significance in Model 7A. Gender does, however, interact with individual student achievement such that the predictive effect of math achievement on math self-concept is somewhat stronger for girls than for boys. The BFLPE (negative predictive effects of school-average achievement in Model 7 and school type in Model 8) also varies somewhat as a function of gender. In each case, the BFLPE is somewhat larger for girls than for boys. Of particular relevance to this investigation, the negative predictive effects of school-average achievement (Model 7) and school type (Model 8) do not vary substantially with individual student achievement.
Whereas there is a marginally significant interaction with school type for T1MSC (Model 8A), the predictive effect is not significant for T2MSC, nor is the interaction with school-average achievement significant for T1MSC or T2MSC (Model 7). In summary, the results of these extended models suggest that the BFLPE is robust, generalizing reasonably well at different levels of individual student achievement and gender. 19 20 –.20 (.04)* –.10 (.04)* .00 (.03) –.24 (.04)* –.10 (.02)* .01 (.03) .01 (.005) .52 (.016)* 5,044.7 –.03 (.02) .06 (.02)* –.03 (.01)* .04 (.02)* .01 (.004)* .44 (.013)* 4,674.0 .46 (.02)* .44 (.01)* Model 7B T2MSC b (SE) .45 (.02)* .49 (.02)* Model 7A T1MSC b (SE) .00 (.003) .35 (.010)* 4,104.8 –.04 (.03) .00(.03) –.06 (.03) .15 (.02)* .13 (.02)* .63 (.02)* –.01 (.01) .03 (.01)* Model 7C ∆T2MSC b (SE) Model 7: School-Average Achievement .01 (.004)* .44 (.013)* 4,649.7 –.05(.02)* .04 (.02)* .13 (.02)* –.01 (.02) .04 (.02)* .44 (.02)* .48 (.01)* Model 8A T1MSC b (SE) .00 (.004) .51 (.016)* 5,024.6 –.07(.02)* .03 (.02) –.11(.02)* –.01 (.02) .06 (.02)* .43 (.02)* .43 (.02)* Model 8B T2MSC b (SE) Model 8: School Type .00 (.003) .35 (.010)* 4,100.1 –.04(.02)* .01 (.01) –.03(.01)* .15 (.02)* .13 (.02)* .62 (.02)* –.00 (.01) .03 (.01)* Model 8C ∆T2MSC b (SE) Note.T1MSC = Time 1 math self-concept; T2MSC = Time 2 math self-concept; ∆T2MSC = Time 2 math self-concept controlling for T1MSC. School-average math test = school average of math achievement test scores; school type: 1 = selective Gymnasium, 0 = other. All parameter estimates are statistically significant when they differ from zero by more than 2 standard errors. All outcome and predictor variables were standardized (M = 0, SD = 1) at the individual student level so that parameter estimates are standardized in relation the mean and standard deviation of individual-level variables. Analyses are based on responses by 2,306 students who completed the math self-concept instrument at Time 2. *p < .05. 
Fixed effects Level 1: Individual student predictors Math test Math grades T1MSC Sex (0 = male, 1 = female) Sex × Math Test Level 2: School-level predictors School-average math test School type Cross-level interactions School-Average Math Test × Sex School-Average Math Test × Math Test School Type × Sex School Type × Individual Achievement Residual variance components Level 2 school Level 1 students Deviance summary Variables Table 2 Study 1: Generalizability of the Big-Fish-Little-Pond Effect: Interactions With Gender and Individual Achievement

Study 2

As described earlier, the primary purpose of Study 2 is to extend tests of the stability of the BFLPE over a longer period of time. The main difference between the two studies is that the T1-T2 interval is 4 years in Study 2 compared to 2 years in Study 1. Despite this substantially longer interval, a priori predictions and research questions are the same in both studies (see earlier discussion).

Method

Description of the Study and Sample

Study 2 is based on a subsample of the longitudinal Learning Processes, Educational Careers and Psychosocial Development in Adolescence study. This investigation was conducted by the Max Planck Institute for Human Development in Berlin (for more general descriptions of the study and resulting database, see Baumert et al., 1996). The data analyzed in this article are based on students from 94 randomly selected upper secondary schools in four German states: two in East Germany and two in West Germany. Schools were randomly sampled in each participating state. The schools are representative of the traditional Gymnasium school type and the comprehensive upper secondary school; both school types provide students with the qualifications to attend university. Students in the traditional Gymnasium are a particularly highly selected sample in terms of academic achievement.
Trained research assistants administered materials in each school at the end of the 1996–1997 school year (T1). At T1, all students were asked to provide written consent to be contacted again for a second wave of data collection. At T2, 4 years later, participants were asked to complete an extensive questionnaire taking about 2 hours in exchange for a financial reward of 15€ (about U.S. $18). Because our focus is on the stability of predictive effects over time, actual analyses are based on responses by the 1,758 (42%) students who completed the math self-concept instrument at both T1 (1 year prior to graduation from high school) and T2 (3 years after graduation from high school). The mean age of students at T1 was 18.55 (SD = 0.56). However, as in Study 1, T1 school-average achievement was based on responses by all students from each of the 94 schools who completed the achievement tests administered at T1. As in Study 1, students in the final sample compared to students in the original sample were more likely to be female, to be younger in age, to be more mathematically able based on both test scores and school grades, to come from a more academically selective school, and to have a higher math self-concept (all effect sizes were modest, varying between .1 and .3).

Instruments

Math self-concept. T1MSC was assessed using a short, 5-item scale. This standard German measure of the construct has been shown to be reliable and valid in large-scale survey studies conducted over the past 25 years (Möller & Köller, 2001, 2004; Trautwein, Lüdtke, Köller, & Baumert, 2006). An example item is, “Nobody’s perfect, but I’m just not good at math.” Students responded to each item on a 4-point scale (1 = totally agree to 4 = totally disagree). Total scores were computed by aggregating all items, resulting in scores ranging from 1 (low self-concept) to 4 (high self-concept).
T2MSC was assessed by the German adaptation of the Self Description Questionnaire III math self-concept instrument that was used in Study 1. To test the comparability of the math self-concept scales used at T1 and T2 in Study 2, a small subsequent study was conducted. A self-concept questionnaire that included the two scales was administered to 115 university students (54.8% women) from various fields of study at a Berlin university. The nine math self-concept items were subjected to a confirmatory factor analysis in which the five items from one scale were used to define one latent factor and the four items from the second scale defined a second latent factor. The latent correlation between the two factors was r = .97, indicating that both scales measured a similar underlying math self-concept construct.

Academic outcome measures. The items on the math achievement test were taken from previous national and international studies, in particular the International Association for the Evaluation of Educational Achievement’s First and Second International Mathematics Study, the Third International Mathematics and Science Study (Beaton et al., 1996; Husén, 1967; Robitaille & Garden, 1989) and an investigation conducted at the Max Planck Institute for Human Development (Baumert, Roeder, Sang, & Schmitz, 1986). Curriculum experts had assessed the curricular validity of all items beforehand. Individual achievement scores were calculated on the basis of item response theory. Internal consistency estimates of reliability were above .80. School grades in mathematics were based on student self-reports of grades from their previous report cards.

Statistical Analyses

Statistical analyses were based on multilevel analyses of the same set of a priori path models considered in Study 1.
Because we based analyses on responses by participants with both T1 and T2 data, there were few missing data, and we again used the expectation maximization algorithm to impute missing values for T1MSC, math grades, and math test scores. As described in Study 1, we began by standardizing (z scoring) all variables to have M = 0 and SD = 1 across the entire sample (Appendix, Study 2). School-average achievement was determined by taking the average of achievement scores for students in each school (but not restandardizing these scores, so that individual student and school-average achievement scores were in the same metric). Product terms were used to test interaction effects. Again, intraschool correlations indicated that school-level variation was substantial for math achievement test scores (.23) but substantially less for math school grades (.07) and particularly for math self-concept (.02).

Results and Discussion

The same set of models evaluated in Study 1 is considered in Study 2. Again, the main focus of these analyses is on the predictive effects of school-level variables (school-average achievement and school type) on math self-concept assessed at 1 year prior to the completion of high school (T1) and 3 years after graduation from high school (T2).

The BFLPE

The predictive effect of individual achievement is substantial on T1MSC (.47; Model 1A in Table 3) and T2MSC (.46; Model 1B, Table 3). In Model 1C (Table 3), T1MSC has the largest predictive effect on T2MSC (.61), but the predictive effects of individual achievement are still highly significant (.18). Hence, individual achievement test scores from T1 have a significant direct predictive effect on T2MSC beyond the substantial indirect predictive effect that is mediated by T1MSC. The BFLPE (the negative predictive effect of school-average achievement in Model 1) is statistically significant and substantial for T1MSC (–.28, Model 1A) and for T2MSC (–.21, Model 1B).
Hence, there is a substantial negative predictive effect of school-average achievement on T2MSC collected 4 years later. However, after controlling for the substantial negative predictive effect of school-average achievement on T1MSC, the negative predictive effect of school-average achievement (based on T1 achievement) on T2MSC (–.03, Model 1C) is not statistically significant. This implies that school-average achievement has no additional negative predictive effects on math self-concept after graduation from high school beyond what can be explained in terms of the earlier negative predictive effects on T1MSC collected 1 year before the end of high school.

Model 2 (Table 3) is essentially parallel to Model 1 except that the BFLPE is represented by school type (i.e., attending an academically selective high school) rather than school-average achievement. Whereas the negative predictive effects of school type in Model 2 are systematically smaller than the corresponding predictive effects based on school-average achievement in Model 1, the pattern and direction of predictive effects are similar.

In Model 3, both school-average achievement and school type are included as predictor variables. Because these variables are substantially correlated, the unique predictive effects of each are smaller than in the parallel analysis in which the other was not included. Nevertheless, both of these school-level variables had significantly negative predictive effects on T1MSC (Model 3A) and T2MSC (Model 3B). Thus, for example, whereas the negative predictive effect of school-average achievement was –.28 in Model 1A (which did not include school type), the corresponding predictive effect was –.24 in Model 3A (which did include school type).

In summary, the results of Models 1, 2, and 3 provide clear support for the BFLPE and for the main a priori prediction of this investigation.
Table 3
Study 2: Stability of the Big-Fish-Little-Pond Effect: Effects of School-Average Achievement, School Type, and Stability Over Time

Entries are b (SE). Model 1: School-Average Achievement; Model 2: School Type; Model 3: School-Average Achievement and School Type.

| Variables | Model 1A T1MSC | Model 1B T2MSC | Model 1C ∆T2MSC | Model 2A T1MSC | Model 2B T2MSC | Model 2C ∆T2MSC | Model 3A T1MSC | Model 3B T2MSC | Model 3C ∆T2MSC |
|---|---|---|---|---|---|---|---|---|---|
| Fixed effects, Level 1: Individual student predictors | | | | | | | | | |
| Math test | .47 (.02)* | .46 (.02)* | .18 (.02)* | .45 (.02)* | .44 (.02)* | .18 (.02)* | .47 (.02)* | .46 (.02)* | .18 (.02)* |
| T1MSC | | | .61 (.02)* | | | .61 (.02)* | | | .61 (.02)* |
| Fixed effects, Level 2: School-level predictors | | | | | | | | | |
| School-average math test | –.28 (.06)* | –.21 (.05)* | –.03 (.04) | | | | –.24 (.05)* | –.15 (.06)* | –.01 (.05) |
| School type | | | | –.09 (.02)* | –.08 (.02)* | –.03 (.02) | –.04 (.03) | –.05 (.02)* | –.02 (.02) |
| Residual variance components | | | | | | | | | |
| Level 2 school | .01 (.008) | .00 (.006) | .00 (.004) | .02 (.009)* | .00 (.006) | .00 (.001) | .01 (.008) | .00 (.001) | .00 (.004) |
| Level 1 students | .80 (.028)* | .81 (.028)* | .51 (.017)* | .80 (.028)* | .82 (.028)* | .51 (.018)* | .80 (.028)* | .81 (.027)* | .51 (.018)* |
| Deviance summary | 4,628.2 | 4,630.6 | 3,812.7 | 4,637.1 | 4,633.1 | 3,811.4 | 4,626.1 | 4,626.9 | 3,811.43 |

Table 3 (continued)

Entries are b (SE). Model 4: School-Average Achievement; Model 5: School Type; Model 6: School-Average Achievement and School Type.

| Variables | Model 4A T1MSC | Model 4B T2MSC | Model 4C ∆T2MSC | Model 5A T1MSC | Model 5B T2MSC | Model 5C ∆T2MSC | Model 6A T1MSC | Model 6B T2MSC | Model 6C ∆T2MSC |
|---|---|---|---|---|---|---|---|---|---|
| Fixed effects, Level 1: Individual student predictors | | | | | | | | | |
| Math tests | .32 (.02)* | .35 (.02)* | .16 (.02)* | .31 (.01)* | .34 (.02)* | .17 (.02)* | .32 (.02)* | .35 (.02)* | .17 (.02)* |
| Math grades | .40 (.02)* | .30 (.02)* | .06 (.02)* | .41 (.01)* | .30 (.02) | .06 (.02)* | .40 (.02)* | .30 (.02)* | .06 (.02)* |
| T1MSC | | | .58 (.02)* | | | .58 (.02)* | | | .58 (.02) |
| Fixed effects, Level 2: School-level predictors | | | | | | | | | |
| School-average math test | –.18 (.06)* | –.13 (.05)* | –.02 (.04) | | | | –.13 (.06)* | –.08 (.06) | .00 (.05) |
| School type | | | | –.07 (.02)* | –.06 (.02)* | –.02 (.02) | –.04 (.03) | –.05 (.02)* | –.03 (.02) |
| Residual variance components | | | | | | | | | |
| Level 2 school | .02 (.008)* | .00 (.001) | .00 (.004) | .02 (.008)* | .00 (.001) | .00 (.004) | .02 (.008)* | .00 (.001) | .00 (.004) |
| Level 1 students | .66 (.023)* | .74 (.025)* | .51 (.017)* | .66 (.023)* | .74 (.025)* | .51 (.017)* | .66 (.023)* | .74 (.025)* | .51 (.017)* |
| Deviance summary | 4,289.8 | 4,454.1 | 3,802.7 | 4,291.1 | 4,452.2 | 3,801.4 | 4,287.7 | 4,450.5 | 3,801.4 |

Note. T1MSC = Time 1 math self-concept; T2MSC = Time 2 math self-concept; ∆T2MSC = Time 2 math self-concept controlling for T1MSC; school-average math test = school average of math achievement test scores; school type: 1 = selective Gymnasium, 0 = other. Parameter estimates are statistically significant when they differ from zero by more than 2 standard errors. All outcome and predictor variables were standardized (M = 0, SD = 1) at the individual student level so that parameter estimates are standardized in relation to the mean and standard deviation of individual-level variables. Analyses are based on responses by 1,758 students who completed the math self-concept instrument at Time 2.
*p < .01.

The predictive effects of school-average achievement and school type (the BFLPE) are substantially negative near the end of high school and continue to be substantial and negative 4 years later. Whereas there are no new negative predictive effects of school-average achievement or school type on T2MSC after controlling for the negative predictive effects of these school-level variables on T1MSC, there are also no direct positive predictive effects of these school-level variables to offset the substantial negative predictive effects that are largely mediated by T1MSC. Hence, school-average achievement and school type continue to have substantial negative predictive effects on math self-concept 3 years after graduation from high school.
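As a rough arithmetic check on this mediation argument, the standardized estimates for school-average achievement in Table 3 (Models 1A, 1B, and 1C) can be combined with the usual single-level path-tracing rule, in which the total effect is approximately the direct effect plus the indirect effect. This is only a sketch under that assumption: the estimates come from separately fitted multilevel models, so the decomposition is not exact, and the symbols a, b, c, and c' are shorthand introduced here rather than notation from the analyses.

```latex
\begin{align*}
a  &= -.28 && \text{school-average achievement} \rightarrow \text{T1MSC (Model 1A)}\\
b  &= .61  && \text{T1MSC} \rightarrow \text{T2MSC (Model 1C)}\\
c' &= -.03 && \text{direct effect on T2MSC, controlling for T1MSC (Model 1C)}\\
ab &= (-.28)(.61) \approx -.17 && \text{indirect effect mediated by T1MSC}\\
c' + ab &\approx -.03 - .17 = -.20 && \text{compare } c = -.21 \text{ (Model 1B)}
\end{align*}
```

On these numbers, most of the negative effect on T2MSC runs through T1MSC, and the small direct term is negative rather than positive, so nothing offsets the mediated effect.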
On this basis, we argue that the BFLPE is a persistent, long-lasting phenomenon.

The Predictive Effect of School Grades on the BFLPE

In Models 4, 5, and 6 (Table 3), individual student grades are added to the variables considered in Models 1, 2, and 3. Both individual achievement and grades have substantial positive predictive effects on math self-concept. Thus, for example, the predictive effect of individual math achievement on T1MSC is .47 in Model 1A but only .32 in Model 4A. What is surprising, perhaps, is that the predictive effect of math achievement in Model 4A is nearly as large as the predictive effect of math grades (.40). The combined predictive effects of individual math achievement and math school grades on math self-concept are substantially larger than the predictive effect of either of these individual student variables considered alone.

For our purposes, the most important component of Models 4, 5, and 6 is the BFLPE, that is, the predictive effects of the school-level variables. As predicted on the basis of the grade-on-a-curve effect, the BFLPE is systematically smaller in models that include school grades. Importantly, however, the predictive effects of school-average achievement (Model 4) and school type (Model 5) are still significantly negative for both T1MSC and T2MSC. Indeed, whereas the sizes of these negative predictive effects are smaller, the pattern of significant predictive effects is nearly the same in Models 4, 5, and 6 (which include math grades) as in the corresponding Models 1, 2, and 3 (which do not include math grades).

Generalizability of the BFLPE: Interactions With Gender and Individual Achievement

In Models 7 and 8 (Table 4), we added individual student gender, and interaction effects involving gender and individual student ability, to models already considered.
Although (based on previous findings) it is not surprising that girls tend to have lower math self-concepts than boys do, the predictive effect of math achievement on math self-concept does not vary as a function of gender. Of more particular relevance to this investigation is the question of whether the BFLPE (the negative predictive effects of school-average achievement or school type) varies as a function of individual student achievement or individual student gender. Of the total of 12 interactions in Models 7 and 8, only 1 is marginally significant. For T1MSC, the negative predictive effect of school-average achievement is greater for more able students. This predictive effect is not, however, significant for T2MSC, nor do the negative predictive effects of school type vary with individual levels of student achievement. None of the interactions with individual student gender is significant, indicating that the size of the BFLPE is similar for boys and for girls. In summary, the results of these extended models suggest that the BFLPE is robust, generalizing reasonably well across levels of individual student achievement and across gender.

Table 4
Study 2: Generalizability of the Big-Fish-Little-Pond Effect: Interactions With Gender and Individual Achievement

Entries are b (SE). Model 7: School-Average Achievement; Model 8: School Type.

| Variables | Model 7A T1MSC | Model 7B T2MSC | Model 7C ∆T2MSC | Model 8A T1MSC | Model 8B T2MSC | Model 8C ∆T2MSC |
|---|---|---|---|---|---|---|
| Fixed effects, Level 1: Individual student predictors | | | | | | |
| Math test | .29 (.02)* | .33 (.02)* | .16 (.02)* | .28 (.02)* | .32 (.02)* | .16 (.02)* |
| Math grades | .41 (.02)* | .30 (.02)* | .07 (.02)* | .42 (.01)* | .30 (.02)* | .074 (.02)* |
| T1MSC | | | .58 (.02)* | | | .58 (.02)* |
| Sex (0 = male, 1 = female) | –.15 (.02)* | –.12 (.02)* | –.04 (.02)* | –.15 (.02)* | –.12 (.02) | –.04 (.02)* |
| Sex × Math Test | .00 (.02) | .03 (.02) | .03 (.02) | .01 (.02) | .03 (.02) | .02 (.02) |
| Fixed effects, Level 2: School-level predictors | | | | | | |
| School-average math test | –.18 (.05)* | –.12 (.05)* | –.02 (.04) | | | |
| School type | | | | –.07 (.03)* | –.06 (.03)* | –.02 (.02) |
| Cross-level interactions | | | | | | |
| School-Average Math Test × Sex | .02 (.05) | –.06 (.05) | –.06 (.04) | | | |
| School-Average Math Test × Math Test | –.09 (.04)* | –.05 (.05) | .00 (.04) | | | |
| School Type × Sex | | | | .00 (.02) | –.02 (.02) | –.02 (.02) |
| School Type × Individual Achievement | | | | –.02 (.02) | –.01 (.03) | .01 (.02) |
| Residual variance components | | | | | | |
| Level 2 school | .01 (.007)* | .00 (.001) | .00 (.004) | .02 (.007)* | .00 (.001) | .00 (.004) |
| Level 1 students | .64 (.022)* | .72 (.024)* | .50 (.017)* | .64 (.022)* | .72 (.024)* | .50 (.017)* |
| Deviance summary | 4,232.2 | 4,416.1 | 3,795.1 | 4,238.4 | 4,416.5 | 3,794.0 |

Note. T1MSC = Time 1 math self-concept; T2MSC = Time 2 math self-concept; ∆T2MSC = Time 2 math self-concept controlling for T1MSC; school-average math test = school average of math achievement test scores; school type: 1 = selective Gymnasium, 0 = other. Parameter estimates are statistically significant when they differ from zero by more than 2 standard errors. All outcome and predictor variables were standardized (M = 0, SD = 1) at the individual student level so that parameter estimates are standardized in relation to the mean and standard deviation of individual-level variables. Analyses are based on responses by 1,758 students who completed the math self-concept instrument at Time 2.
*p < .05.

General Discussion for Studies 1 and 2

Is the BFLPE a Persistent, Long-Lasting Phenomenon?

The overarching purpose of this investigation is to test a priori predictions that the BFLPE associated with school-average achievement and school type is clearly evident at the end of high school and that these effects are still evident for participants several years after graduation from high school. In both studies, the BFLPE is substantial for both T1MSC and for T2MSC.
Hence, there is clear support for a priori predictions that the BFLPE is a persistent, long-lasting phenomenon.

A critical and apparently unique contribution of this investigation is the evaluation of the BFLPE (negative predictive effects of school-average achievement and school type) on academic self-concept while students were in high school (T1) and again several years after they had graduated from high school (T2). Obviously, this is a more demanding test of the BFLPE than simply assessing its size on a single occasion while students are in high school, or even than tests of the BFLPE based on more than one occasion during high school. Whereas there now exists clear support for the BFLPE during high school, a focus of this investigation was whether this predictive effect remained after students had graduated from high school and were no longer in the high school setting in which the BFLPE was established. Although it might be reasonable to suggest that the BFLPE should dissipate over time, the results of the present investigation show that the predictive effects are persistent and remain substantial for academic self-concepts long after graduation from high school.

In evaluating the stability of the BFLPE, it is critical to interpret carefully the results from Model A (based on T1MSC), Model B (based on T2MSC), and Model C (based on T2MSC controlling for the predictive effects of T1MSC). Results based on Models A and B show that the negative predictive effects of the BFLPE are statistically significant and substantial near the end of high school and again several years after graduation. If the BFLPE were a short-term, transitory phenomenon, one might expect that the predictive effect of school-level achievement and school type would be substantial only for T1MSC (Model A) and not for T2MSC (Model B).
Furthermore, once the negative predictive effects of these school-level variables on T1MSC had been controlled, a substantial diminution of the BFLPE should result in a positive direct effect of these school-level variables on T2MSC (Model C; also see earlier discussion of Figure 1B). Thus, the indirect negative BFLPE mediated by T1MSC would be offset to a large extent by a positive, direct effect of high school-average achievement on T2MSC. Hence, even though the BFLPE might be negative for T2MSC when T1MSC is not controlled, the direct predictive effect of school-average ability might be positive for T2MSC after controlling for the negative predictive effects mediated through T1MSC.

There was, however, absolutely no support for such alternative speculations. Indeed, for Study 1 there was a small but statistically significant negative predictive effect of school-average achievement (Model 1C, Table 1) on T2MSC even after controlling for the substantial negative predictive effects of school-average achievement on T1MSC. This implies that there were new, more negative predictive effects of school-average achievement after graduation from high school (T2) in addition to the substantial negative predictive effects already experienced during high school (T1). Whereas this pattern has been found in previous research based on two data collections during high school (e.g., Marsh, 1991), this is apparently the first such finding following graduation from high school. In Study 2, the corresponding negative direct predictive effect on T2MSC after controlling for T1MSC was not statistically significant (Model 1C in Table 3). Importantly, however, there was no statistically significant positive predictive effect of school-average achievement, as would be expected if there were a systematic diminution of the BFLPE over time.
Hence, whereas there was no evidence that the size of the BFLPE increased during the period following high school in Study 2, it did not decrease. Furthermore, there are differences between the two studies that might account for these different results. In particular, the time lag in Study 1 was only about 2 years (from final year of high school to 2 years after graduation from high school), whereas the time lag in Study 2 was nearly twice as long (from the year before the final year in high school to 3 years after graduation from high school).

School Grades, Grading on a Curve, and the BFLPE

The BFLPE presents a dilemma for academic self-concept researchers. On one hand, there is clear evidence that academic self-concept is more strongly related to school-based performance measures, such as school grades, than to standardized test scores. On the other hand, because grades tend to be idiosyncratic to a particular setting and teachers tend to grade on a curve, school grades do not provide a common metric that is generalizable over different schools and classes. Hence, most BFLPE studies have been based on standardized achievement test scores. However, Marsh (1987) noted that the BFLPE and a grading-on-a-curve effect have a similar rationale and are mutually reinforcing, such that the BFLPE is mediated in part by school grades. Results of the present investigation replicate these earlier results but also extend them in some interesting ways.

Not surprisingly, in both Studies 1 and 2, school grades have a substantial influence on T1MSC beyond the substantial predictive effect of individual student achievement. What may be more surprising is that the achievement test scores continue to have such a substantial predictive effect on academic self-concept beyond the predictive effect of school grades.
This suggests that students know their relative abilities in relation to a broader, more generalizable frame of reference in addition to the more narrowly focused frame of reference provided by other students in their school. It is also interesting to note that the relative contribution of test scores is as high or higher for models based on T2MSC, collected several years after graduation from high school (e.g., Models 4B and 4C), than for models based on T1MSC, collected near the end of high school (e.g., Model 4A). Nevertheless, it is relevant to note that both school grades and, in particular, test scores continue to have significant predictive effects on T2MSC even after controlling for T1MSC. As such, the results of our study demonstrate that each of these sources of information about achievement continues to have substantial predictive effects on academic self-concept. Whereas the predictive effects of test scores and grades on T2MSC are similar in Study 1, in Study 2 the predictive effects of test scores are significantly larger than those of high school grades after students have graduated from high school. Because test scores reflect a broader frame of reference than school grades, which are highly dependent on the achievement levels of other students in the same high school, it is not surprising that test scores are strongly related to academic self-concept after graduation from high school.

The BFLPEs, the negative predictive effects of school-average achievement and school type, are substantially smaller after controlling for school grades. This implies that the grading-on-a-curve effect and the BFLPE are substantially overlapping, mutually reinforcing processes that have independent predictive effects on academic self-concept.
Hence, results based on these two large, representative samples of German high schools and the distinctive form of tracking students into different school types in the German school system replicate Marsh's (1987) results based on a large representative sample of U.S. high schools. Although beyond the scope of our study, a useful direction for further research would be to disentangle the apparently confounded effects of these two processes. Whereas typically these processes are positively related, it should be possible to find naturally occurring situations in which the two processes are not confounded (e.g., where school grades are based on absolute criteria or are externally moderated in relation to external criteria that are measured along a common metric) or, perhaps, to experimentally manipulate grades in a laboratory setting so that they are independent of achievement test scores. The critical issue is the extent to which the size of the BFLPE varies as a function of the grading standards.

Generalizability of the Predictive Effects

The results of this investigation indicate that the BFLPE is reasonably robust over two different studies, over time, over gender, and over individual student ability levels. The major focus of this investigation is to demonstrate that the BFLPE, which is widely documented in high school settings, is long lasting and persistent after students have graduated from high school. We were also interested, however, in determining how robust the BFLPE is over gender and individual student achievement levels. The inclusion of additional variables representing gender, and interactions between individual achievement with gender and with school-average achievement, had almost no effect on the size of the BFLPE in either Study 1 or Study 2.
Although the BFLPE was marginally larger for girls than for boys, this interaction was only significant for some models in Study 1 and was not statistically significant for any analyses in Study 2. Consistent with previous results, the BFLPE did not vary much as a function of individual student achievement. In Study 1, individual student achievement did not interact significantly with school-average achievement in Models 6A, 6B, or 6C, but there was a marginally significant interaction with school type in Model 7A (but not 7B or 7C), suggesting that the BFLPE was slightly smaller for more able students. In Study 2, individual student achievement did not interact significantly with school type in Models 8A, 8B, or 8C (Table 4), but there was a marginally significant interaction with school-average achievement in Model 7A (but not in 7B or 7C), suggesting that the BFLPE was slightly larger for more able students. Although the precise nature of the few small interactions that reached statistical significance (due in part to the large sample sizes) was not entirely consistent across the two studies, both studies provided reasonable support for the robustness of the BFLPE over time, gender, and individual student achievement levels.

Given that Studies 1 and 2 were based on different materials, different cohorts of students, and different time frames, there was good consistency in the pattern of results. In particular, both studies clearly demonstrate that the BFLPE is persistent in that the predictive effects are clearly evident even several years after graduation from high school. However, even though the patterns of predictive effects are largely similar in the two studies, the predictive effects are systematically larger in Study 1 than in Study 2.
The positive predictive effects of individual student characteristics (student achievement and school grades) and the negative predictive effects of school-level characteristics (school-average achievement and school type) are all larger in Study 1 than in Study 2. Although there are several potential sources of difference, the most likely seems to be the time frame of the two studies. Study 2 began earlier (T1 was the second-to-last year of high school) than Study 1 (T1 was the last year of high school) and lasted longer (the T1-T2 interval was 2 years in Study 1 and 4 years in Study 2). It is not, perhaps, surprising that school grades and achievement in high school have smaller predictive effects on self-concept measures collected 4 years later in Study 2 than the parallel predictive effects after only 2 years in Study 1. More surprising, perhaps, is the finding that the predictive effects in Study 2 are also smaller at T1. The results may reflect the fact that the academic self-concepts of students near the end of high school (Study 1) are more closely aligned to objective indicators of academic accomplishment (including the relative performances of classmates) than they are in the penultimate year of high school (in Study 2). However, the smaller BFLPE on T1 self-concept in Study 2 than in Study 1 may also reflect the different self-concept instruments used at T1 in the two studies. Although clearly beyond the scope of this investigation, it would be useful to assess the strength of the BFLPE more frequently over a longer span of time to test the predictions that (a) the BFLPE grows larger the longer the same students remain in the same school and (b) the BFLPE is reasonably stable over time even after students have graduated from high school.
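The cross-level interactions discussed in this section can be read as simple slopes (cf. Aiken & West, 1991): with standardized variables, the predictive effect of school-average achievement for a student at a given achievement level is the main effect plus the interaction coefficient times that level. The sketch below is illustrative only. The function name and the choice of 1 SD below and above the mean are assumptions made for the example, and the two coefficients are the Study 2 estimates for T1MSC in Model 7 of Table 4 (–.18 for school-average achievement and –.09 for its interaction with individual math test scores).

```python
def conditional_bflpe(main_effect: float, interaction: float, achievement_z: float) -> float:
    """Simple slope of school-average achievement at a given student achievement level.

    Assumes all variables are standardized (M = 0, SD = 1), so achievement_z is
    the student's distance from the mean in SD units. Hypothetical helper, not
    part of the original analyses.
    """
    return main_effect + interaction * achievement_z


# Study 2 estimates for T1MSC (Model 7, Table 4).
MAIN_EFFECT = -0.18   # school-average math test
INTERACTION = -0.09   # School-Average Math Test x Math Test

# Illustrative achievement levels (an assumption): 1 SD below the mean, the mean, 1 SD above.
slopes = {
    z: round(conditional_bflpe(MAIN_EFFECT, INTERACTION, z), 2)
    for z in (-1.0, 0.0, 1.0)
}

for z, slope in slopes.items():
    print(f"achievement z = {z:+.1f}: BFLPE slope = {slope:+.2f}")
```

Under these estimates the slope is about –.09 one SD below the mean and about –.27 one SD above it. It is negative at all three levels, which is the sense in which the T1 BFLPE is somewhat larger for more able students while still applying across the achievement range.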
Limitations and Directions for Further Research

It is relevant to address potential limitations of this investigation in terms of their implications for interpretation of our results and directions for future research. BFLPE theory offers causal predictions about the effects of school-average ability, and the combination of sophisticated statistical analyses and longitudinal data provides strong tests of these causal predictions. It is, nevertheless, inappropriate to claim that our results prove that the predictive effects that we have found represent true causal effects. This limitation appears to be an inevitable consequence of the nature of this research, but convergence of results based on alternative experimental designs and even stronger statistical techniques would strengthen the interpretation of the results.

In this investigation, we considered math self-concept only, and this has been the focus of a number of other BFLPE studies (e.g., Marsh & Craven, 2002). On the basis of existing theory and a very limited amount of research, there is no reason to expect that the BFLPE does not apply equally well to other academic domains. Because school-average abilities in different academic domains (e.g., school-average abilities in mathematics, science, history, language) are likely to be very highly correlated, it is unlikely that different school-average abilities can be readily differentiated in the general population. It is possible, however, that schools that are highly selective in a specific academic domain (e.g., mathematics, sport, performing arts) will have BFLPEs that are specific to that domain (e.g., Marsh, 1994; also see Marsh & Craven, 2002) that will undermine some of the domain-specific attributes that such schools seek to reinforce.

A number of issues related to the representativeness of the sample limit the generalizability of the results.
Although the original samples in both studies were representative of German students attending the final 2 years of high school, typically in preparation for subsequent university attendance, this represents only about 30% of the total population of this age cohort. As is inevitable in large-scale longitudinal studies, particularly ones that attempt to follow high school students after graduation from high school, the representativeness of the sample was further compromised by nonresponse in the second waves of each of the studies.

Although beyond the scope of this investigation, it would also be useful to pursue how academic self-concept and the BFLPE affected the subsequent life choices that students make after high school graduation and how, in turn, these decisions affect subsequent self-concept. Thus, for example, Marsh and O'Mara (2006) report that there is a reciprocal pattern of relations between academic self-concept in high school, high school achievement, and educational attainment 5 years after high school.

Implications for Policy Practice

In many educational systems across the world, there is an ongoing policy debate about the provision of highly segregated educational settings for very bright students. This policy direction is based in part on a labeling theory perspective, suggesting that bright students will have higher self-concepts and experience other psychological benefits from being educated in the company of other academically gifted students. Yet, BFLPE theory and empirical evaluation of the predictive effects of academically selective settings (e.g., Marsh et al., 1995) show exactly the opposite pattern of results. Placement of gifted students in academically selective settings results in lower academic self-concepts, not higher academic self-concepts.
Coupled with other research showing that school-average ability has negative effects on other educational outcomes (coursework selection, educational aspirations, effort; see Marsh, 1991) and that academic self-concept has reciprocal effects with achievement for students of all ability levels (Marsh & Craven, 2006), this finding has important policy implications. Whereas not all gifted and talented students will suffer lower academic self-concepts when attending academically selective high schools, many will. BFLPE research, however, provides an important alternative perspective to existing policy directions that have not been adequately evaluated in relation to current educational and psychological research. Hence, we urge parents, policy makers, and practitioners to think carefully about the implications of school placements and to reflect on potential negative side effects of current policy toward segregation of schools and classes on the basis of academic ability.

A compromise position might be to recognize the negative implications of the BFLPE and to develop policies to try to counter the negative effects of high school-average achievement. In this regard, Marsh and Craven (2002) suggest that the BFLPE is reinforced by highly competitive classroom environments that emphasize normative feedback that rank orders students. In the present investigation, the grade-on-a-curve effect, such that students in academically selective schools get lower school grades than they would get if they were in mixed-ability schools, also reinforces the BFLPE. Hence, even though the BFLPE appears to be stable and pervasive, it may be possible to alter the school environment in such a way as to undermine its negative effects.

The focus of our investigation has been on academically advantaged students. However, the BFLPE has implications for special education at both ends of the ability spectrum.
In particular, consistent with BFLPE predictions, research with academically disadvantaged students shows that moving academically disadvantaged students from special classes with other disadvantaged students to mixed-ability classes (mainstreaming or inclusion) lowers not only math, verbal, and academic self-concepts but also social self-concept (Marsh, Tracey, & Craven, 2006; Tracey, Marsh, & Craven, 2003; also see Chapman, 1988).

APPENDIX
Correlations, Means, and Standard Deviations for Variables Considered in the Two Studies

Study 1

| Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | M | SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. T1MSC | 1.00 | .79 | .14 | .57 | .64 | –.16 | .50 | –.07 | –.03 | –.03 | –.04 | .07 | .00 | 1.00 |
| 2. T2MSC | .79 | 1.00 | .14 | .54 | .59 | –.15 | .47 | –.05 | –.02 | –.03 | –.05 | .06 | .00 | 1.00 |
| 3. S-AM | .14 | .14 | 1.00 | .52 | .07 | –.20 | .15 | .03 | .19 | –.42 | .25 | –.40 | –.07 | .54 |
| 4. MTest | .57 | .54 | .52 | 1.00 | .36 | –.35 | .72 | –.11 | .07 | –.23 | .05 | –.12 | .00 | 1.00 |
| 5. Math grade | .64 | .59 | .07 | .36 | 1.00 | .01 | .20 | –.06 | –.01 | .03 | –.01 | .08 | .00 | 1.00 |
| 6. Sex | –.16 | –.15 | –.20 | –.35 | .01 | 1.00 | –.21 | .19 | –.02 | .07 | –.00 | .05 | .00 | 1.00 |
| 7. ST | .03 | .05 | .65 | .36 | .03 | .00 | .05 | .05 | .26 | –.42 | –.00 | –.23 | .00 | 1.00 |
| 8. Sex × MTest | –.07 | –.05 | .03 | –.11 | –.06 | .19 | –.12 | 1.00 | .46 | –.26 | .35 | –.12 | –.35 | .96 |
| 9. Sex × S-AM | –.03 | –.02 | .19 | .07 | –.01 | –.02 | –.03 | .46 | 1.00 | –.37 | .57 | –.20 | –.11 | .50 |
| 10. S-AM × MTest | –.03 | –.03 | –.42 | –.23 | .03 | .07 | –.03 | –.26 | –.37 | 1.00 | –.25 | .68 | .28 | .55 |
| 11. Sex × ST | –.04 | –.05 | .25 | .05 | –.01 | –.00 | –.06 | .35 | .57 | –.25 | 1.00 | –.40 | .00 | 1.00 |
| 12. MTest × ST | .07 | .06 | –.40 | –.12 | .08 | .05 | .04 | –.12 | –.20 | .68 | –.40 | 1.00 | .36 | .97 |

Study 2

| Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | M | SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. T1MSC | 1.00 | .68 | .06 | .41 | .50 | –.21 | .01 | –.04 | .01 | –.03 | –.01 | –.00 | .00 | 1.00 |
| 2. T2MSC | .68 | 1.00 | .10 | .42 | .41 | –.19 | .02 | –.02 | .03 | –.05 | –.03 | .01 | .00 | 1.00 |
| 3. S-AM | .06 | .10 | 1.00 | .42 | .03 | –.03 | .53 | –.03 | –.12 | –.02 | –.05 | –.31 | –.03 | .45 |
| 4. MTest | .41 | .42 | .42 | 1.00 | .32 | –.18 | .23 | –.10 | .07 | –.05 | –.05 | –.09 | .00 | 1.00 |
| 5. Math grade | .50 | .41 | .03 | .32 | 1.00 | –.04 | .02 | –.02 | .04 | –.03 | –.02 | .02 | .00 | 1.00 |
| 6. Sex | –.21 | –.19 | –.03 | –.18 | –.04 | 1.00 | .04 | .09 | –.05 | .12 | –.03 | –.05 | .00 | 1.00 |
| 7. ST | .01 | .02 | .53 | .23 | .02 | .04 | 1.00 | –.06 | –.30 | –.05 | –.12 | –.59 | .00 | 1.00 |
| 8. Sex × MTest | –.04 | –.02 | –.03 | –.10 | –.02 | .09 | –.06 | 1.00 | –.04 | .43 | .26 | .08 | –.18 | 1.01 |
| 9. Sex × S-AM | .01 | .03 | –.12 | .07 | .04 | –.05 | –.30 | –.04 | 1.00 | –.14 | –.01 | .57 | .18 | .46 |
| 10. S-AM × MTest | –.03 | –.05 | –.02 | –.05 | –.03 | .12 | –.05 | .43 | –.14 | 1.00 | .53 | .01 | –.01 | .44 |
| 11. Sex × ST | –.01 | –.03 | –.05 | –.05 | –.02 | –.03 | –.12 | .26 | –.01 | .53 | 1.00 | –.03 | .04 | 1.03 |
| 12. MTest × ST | –.00 | .01 | –.31 | –.09 | .02 | –.05 | –.59 | .08 | .57 | .01 | –.03 | 1.00 | .23 | 1.11 |

Note. T1MSC = Time 1 math self-concept; T2MSC = Time 2 math self-concept; S-AM = school-average math achievement (test scores); MTest = math achievement (test scores); math grade = school mark; sex = gender (1 = male, 2 = female); ST = school type (selective school: 1 = no, 2 = yes). For purposes of analysis (see Method section for further discussion), all individual student terms were standardized (M = 0, SD = 1). Cross-product and school averages of individual student responses were based on standardized individual student scores but not restandardized (so that they are in the same metric as the individual student scores).

Note

Work on this investigation was conducted, in part, while Professor Marsh was a visiting scholar at the Center for Educational Research at the Max Planck Institute for Human Development and was supported in part by the University of Western Sydney and the Max Planck Institute.

References

Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.
Baumert, J., Bos, W., & Lehmann, R. (Eds.). (2000). TIMSS/III: Dritte Internationale Mathematik- und Naturwissenschaftsstudie – Mathematische und naturwissenschaftliche Bildung am Ende der Schullaufbahn [TIMSS/III: Third international mathematics and science study – Students' knowledge of mathematics and science at the end of secondary education]. Opladen, Germany: Leske + Budrich.
Baumert, J., & Demmrich, A. (2001). Test motivation in the assessment of student skills: The effects of incentives on motivation and performance.
European Journal of Psychology of Education, 16, 441–462.
Baumert, J., Roeder, P. M., Gruehn, S., Heyn, S., Köller, O., Rimmele, R., et al. (1996). Bildungsverläufe und psychosoziale Entwicklung im Jugendalter [Educational careers and psychosocial development during adolescence]. In K. P. Treumann, G. Neubauer, R. Möller, & J. Abel (Eds.), Methoden und Anwendungen empirisch pädagogischer Forschung (pp. 170–180). Münster, Germany: Waxmann.
Baumert, J., Roeder, P. M., Sang, F., & Schmitz, B. (1986). Leistungsentwicklung und Ausgleich von Leistungsunterschieden in Gymnasialklassen [Development of achievement and leveling of achievement differences in Gymnasium classes]. Zeitschrift für Pädagogik, 32, 639–660.
Baumert, J., Trautwein, U., & Artelt, C. (2003). Schulumwelten – institutionelle Bedingungen des Lehrens und Lernens [School environments – Institutional conditions for learning and instruction]. In J. Baumert, C. Artelt, E. Klieme, M. Neubrand, M. Prenzel, U. Schiefele, et al. (Eds.), PISA 2000 – Ein differenzierter Blick auf die Länder der Bundesrepublik Deutschland (pp. 259–330). Opladen, Germany: Leske + Budrich.
Beaton, A. E., Mullis, I. V. S., Martin, M. O., Gonzales, E. J., Kelly, D. L., & Smith, T. A. (1996). Mathematics achievement in the middle school years: IEA's third international mathematics and science study. Chestnut Hill, MA: Boston College.
Branden, N. (1994). Six pillars of self-esteem. New York: Bantam.
Chapman, J. W. (1988). Learning disabled children's self-concepts. Review of Educational Research, 58, 347–371.
Dai, D. Y. (2004). How universal is the big-fish-little-pond effect? American Psychologist, 59, 267–268.
Diener, E., & Fujita, F. (1997). Social comparison and subjective well-being. In B. P. Buunk & F. X. Gibbons (Eds.), Health, coping, and well-being: Perspectives from social comparison theory (pp. 329–358). Mahwah, NJ: Lawrence Erlbaum.
Espenshade, T. J., Hale, L. E., & Chung, C. Y. (2005).
The frog pond revisited: High school academic context, class rank, and elite college admission. Sociology of Education, 78, 269–293.
Festinger, L. (1954). A theory of social comparison processes. Human Relations, 7, 117–140.
Goldstein, H. (2003). Multilevel statistical models (3rd ed.). London: Hodder Arnold.
Husén, T. (1967). International study of achievement in mathematics: A comparison of 12 countries (Vols. 1–2). Stockholm: Almqvist & Wiksell.
James, W. (1963). The principles of psychology. New York: Holt, Rinehart & Winston. (Original work published 1890)
Jerusalem, M. (1984). Reference group, learning environment and self-evaluations: A dynamic multi-level analysis with latent variables. In R. Schwarzer (Ed.), The self in anxiety, stress and depression (pp. 61–73). New York: Elsevier North-Holland.
Köller, O., Watermann, R., Trautwein, U., & Lüdtke, O. (2004). Wege zur Hochschulreife in Baden-Württemberg. TOSCA – Eine Untersuchung an allgemein bildenden und beruflichen Gymnasien [Educational pathways to college in Baden-Württemberg. TOSCA – A study at upper secondary level of traditional and vocational Gymnasiums]. Opladen, Germany: Leske + Budrich.
Krull, J. L., & MacKinnon, D. P. (2001). Multilevel modeling of individual and group level mediated effects. Multivariate Behavioral Research, 36, 249–277.
Lüdtke, O., Köller, O., Marsh, H. W., & Trautwein, U. (2005). Teacher frame of reference and the big-fish-little-pond effect. Contemporary Educational Psychology, 30, 263–285.
Marsh, H. W. (1974). Judgmental anchoring: Stimulus and response variables. Unpublished doctoral dissertation, University of California, Los Angeles.
Marsh, H. W. (1984a). Self-concept: The application of a frame of reference model to explain paradoxical results. Australian Journal of Education, 28, 165–181.
Marsh, H. W. (1984b). Self-concept, social comparison and ability grouping: A reply to Kulik and Kulik. American Educational Research Journal, 21, 799–806.
Marsh, H. W. (1987). The big-fish-little-pond effect on academic self-concept. Journal of Educational Psychology, 79, 280–295. Marsh, H. W. (1991). The failure of high ability high schools to deliver academic benefits: The importance of academic self-concept and educational aspirations. American Educational Research Journal, 28, 445–480. Marsh, H. W. (1993). Academic self-concept: Theory, measurement and research. In J. Suls (Ed.), Psychological perspectives on the self (Vol. 4, pp. 59–98). Hillsdale, NJ: Lawrence Erlbaum. Marsh, H. W. (1994). Using the National Educational Longitudinal Study of 1988 to evaluate theoretical models of self-concept: The Self-Description Questionnaire. Journal of Educational Psychology, 86, 439–456. Marsh, H. W. (2005). Big fish little pond effect on academic self-concept. German Journal of Educational Psychology, 19, 141–144. Marsh, H. W., Chessor, D., Craven, R. G., & Roche, L. (1995). The effects of gifted and talented programs on academic self-concept: The big fish strikes again. American Educational Research Journal, 32, 285–319. Marsh, H. W., & Craven, R. (1997). Academic self-concept: Beyond the dustbowl. In G. Phye (Ed.), Handbook of classroom assessment: Learning, achievement, and adjustment (pp. 131-198). Orlando, FL: Academic Press. Marsh, H. W. & Craven, R. (2002). The pivotal role of frames of reference in academic self-concept formation: The big fish little pond effect. In F. Pajares & T. Urdan (Eds.), Adolescence and education (Vol. 2, pp. 83–123). Greenwich, CT: Information Age. Marsh, H. W., & Craven, R. G. (2006). Reciprocal effects of self-concept and performance from a multidimensional perspective: Beyond seductive pleasure and unidimensional perspectives. Perspectives on Psychological Science, 1, 133–163. Marsh, H. W., & Hau, K. T. (2003). Big fish little pond effect on academic self-concept: A crosscultural (26 country) test of the negative effects of academically selective schools. 
American Psychologist, 58, 364–376. Marsh, H. W., Köller, O., & Baumert, J. (2001). Reunification of East and West German school systems: Longitudinal multilevel modeling study of the big fish little pond 37 Marsh et al. effect on academic self-concept. American Educational Research Journal, 38, 321–350. Marsh, H. W., Kong, C.-K., Hau, K.-T. (2000). Longitudinal multilevel modeling of the big fish little pond effect on academic self-concept: Counterbalancing social comparison and reflected glory effects in Hong Kong high schools. Journal of Personality and Social Psychology, 78, 337–349. Marsh, H. W., & O’Mara, A. (2006). Reciprocal effects between academic self-concept, self-esteem, achievement and attainment over seven adolescent-adult years: Unidimensional and multidimensional perspectives of self-concept. Unpublished manuscript, Department of Educational Studies, Oxford University, UK. Marsh, H. W., & Parker, J. (1984). Determinants of student self-concept: Is it better to be a relatively large fish in a small pond even if you don’t learn to swim as well? Journal of Personality and Social Psychology, 47, 213–231. Marsh, H. W., & Rowe, K. J. (1996). The negative effects of school-average ability on academic self-concept: An application of multilevel modeling. Australian Journal of Education, 40, 65–87. Marsh, H. W., & Shavelson, R. (1985) Self-concept: Its multifaceted, hierarchical structure. Educational Psychologist, 20, 107–125. Marsh, H. W., Tracey, D. K., & Craven, R. G. (2006). Multidimensional self-concept structure for preadolescents with mild intellectual disabilities: A hybrid multigroup-mimic approach to factorial invariance and latent mean differences. Educational and Psychological Measurement, 66, 795–818. Marsh, H. W., Trautwein, U., Lüdtke, O., Köller, O., & Baumert, J. (2006). Integration of multidimensional self-concept and core personality constructs: Construct validation and relations to well-being and achievement. 
Journal of Personality, 74, 403–455. Möller J., & Köller, O. (2001). Frame of reference effects following the announcement of exam results. Contemporary Educational Psychology, 26, 277–287. Möller, J., & Köller, O. (2004). Die Genese akademischer Selbstkonzepte: Effekte dimensionaler und sozialer Vergleiche [On the development of academic selfconcepts: The impact of social and dimensional comparisons]. Psychologische Rundschau, 55, 19–27. Plucker, J. A., Robinson, N. M., Greenspon, T. S., Feldhusen, J. F., McCoach, D. B., & Subotnik, R. F. (2004). It’s not how the pond makes you feel, but rather how high you can jump. American Psychologist, 59, 268–269. Rasbash, J., Steele, F., Browne, W., & Prosser, B. (2005). A user’s guide to MLwiN - Version 3.0. Bristol, UK: University of Bristol. Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage. Robitaille, D., & Garden, R. (1989). The IEA study of mathematics: II. Contents and outcomes of school mathematics. Oxford, UK: Pergamon. Schwanzer, A. D., Trautwein, U., Lüdtke, O., & Sydow, H. (2005). Entwicklung eines Instruments zur Erfassung des Selbstkonzepts junger Erwachsener [Development of a questionnaire on young adults’ self-concept]. Diagnostica, 51, 183–194. Shavelson, R. J., Hubner, J. J., & Stanton, G. C. (1976). Validation of construct interpretations. Review of Educational Research, 46, 407–441. Sherif, M., & Hovland, C. W. (1961). Social judgment. New Haven, CT: Yale University Press. Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. London: Sage. Tracey, D. K., Marsh, H. W., & Craven, R. G. (2003). Self-Concepts of preadolescent students with mild intellectual disabilities: Issues of measurement and educational placement. In H. W. Marsh, R. G. Craven, & D. M. McInerney (Eds.), International advances in self research (Vol. 1, pp. 203–230). 
Greenwich, CT: Information Age. 38 Stability of the BFLPE Trautwein, U., Lüdtke, O., Marsh, H. W., & Köller, O. (2006). Tracking, grading, and student motivation: Using group composition and status to predict self-concept and interest in ninth-grade mathematics. Journal of Educational Psychology, 98, 788–806. Trautwein, U., Lüdtke, O., Köller, O., & Baumert, J. (2006). Self-esteem, academic self-concept, and achievement: How the learning environment moderates the dynamics of self-concept. Journal of Personality and Social Psychology, 90, 334–349. Zeidner, M., & Schleyer, E. J. (1999). The big-fish-little-pond effect for academic selfconcept, test anxiety and school grades in gifted children. Contemporary Educational Psychology, 24, 305–329. Manuscript received May 15, 2006 Revision received September 12, 2006 Accepted September 15, 2006 39