Educational tracking, inequality and performance: new evidence using differences-in-differences

Jeroen Lavrijsen & Ides Nicaise
Leuven, VFO-SSL studiedag 18/9/2014

Abstract

Educational systems differ dramatically in the age at which students are placed in different tracks. We examine the effects of early tracking on student achievement, using a collection of recent waves of internationally standardized student assessments (PISA, TIMSS, PIRLS). To control for unobserved country heterogeneity, we adopt a differences-in-differences approach, correcting secondary school results for differences already present before the introduction of tracking (i.e. in primary school). Our results show that early tracking has a negative effect on the mean performance of students, particularly in reading and science. Moreover, by separating out groups with different abilities, we show that early tracking has a very strong negative effect on low-achieving students, suggesting that negative peer and environmental effects in the lower tracks can have detrimental consequences for their academic achievement. By contrast, we find a null effect for top-achieving students, suggesting that comprehensive systems can equally challenge high performers to learn at a high pace.

1. Introduction

One of the most influential characteristics of an educational system is the way it deals with differences in capacities between pupils. A common approach in many countries has been to place pupils with different abilities in separate tracks, usually academic or vocational in nature (OECD 2007; OECD 2012). While some countries do not track students until age 16, others already have different tracks starting at age 10. Based on the wealth of internationally standardized student tests that have become available in the last two decades (e.g.
PISA), it has been convincingly argued that tracking students at a young age reinforces the impact of social background on achievement (Ammermüller 2005; Brunello and Checchi 2007; Van de Werfhorst and Mijs 2010). The interesting issue is whether this disadvantage of early tracking regarding equity is traded off against an efficiency gain. At first sight, sorting pupils according to ability may hold the promise of facilitating tailored instruction at the right level and pace for everyone. This might boost the efficiency of the educational system (the specialization effect). However, empirical cross-national research has not yet clearly detected such an efficiency gain from early tracking. On the contrary, the bulk of the literature (see Section 2 for an overview) reports null or even negative relationships between early tracking and performance. The apparent lack of an efficiency advantage of early specialization has been attributed to issues such as the sizeable misallocation of students (biased by social background) and to non-linear peer or environmental effects. Still, the evidence on the effect of early tracking on performance leaves some ground to be covered. Firstly, most studies have focused solely on its effect on mean performance. As we can expect tracking to have different effects on low and high achievers, concentrating on averages may conceal an underlying heterogeneity. Secondly, the estimated effect of tracking on the mean varies across studies, from negative to null, with one exceptional study even detecting a positive effect (Rindermann and Ceci 2009). A closer inspection of this literature reveals that differences in the estimated effects of tracking seem related to the essential problem of cross-national studies: the possible bias induced by unobserved confounding country variables. Across the literature, this is typically dealt with by controlling for a selection of possible confounders.
Wealth (GDP/capita), expenditure on education and educational characteristics such as school accountability have been popular choices of controls. However, in the end no one can ever be sure that all relevant variables have been taken into account. Some factors, e.g. cultural influences, may even prove quite difficult to measure in a comparative way for all countries under study. Moreover, the limited degrees of freedom (with typical samples consisting of 20 to 30 countries) preclude a simultaneous control for many confounders. An interesting solution to this problem is the differences-in-differences approach. Essentially, diff-in-diff corrects outcomes for differences in starting position. Hence, it does not have to include all confounding factors explicitly. When the starting position can be assumed to be influenced by the same unobserved variables as the outcome, the resulting estimates will be unbiased by unobserved differences. The effect of tracking can thus be determined by comparing secondary school results (which are influenced by tracking age) with results from primary school (which are not) in both tracked and non-tracked systems. Clearly, diff-in-diff also relies on certain assumptions, the most important being that confounding factors influence primary and secondary school results in the same way. Still, it remains a promising tool to deal with unobserved country heterogeneity. The frequently cited article by Hanushek and Woessmann (2006) made use of this diff-in-diff approach. Their results indicated a weak, but not very consistent, negative effect of early tracking on average performance. Moreover, tracking amplified the spread between weak and strong students. While their findings attracted considerable attention, recent waves of student assessments have not yet been analysed with the diff-in-diff approach.
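The diff-in-diff correction described above can be illustrated with a toy computation (all numbers hypothetical): any country-level factor that shifts primary and secondary scores equally cancels out of the comparison of gains.

```python
# Toy differences-in-differences computation (all numbers hypothetical).
# Each country contributes a primary-school score and a secondary-school score;
# unobserved country factors (e.g. wealth, culture) that shift BOTH scores
# equally drop out of the within-country gain.

def did_effect(tracked, untracked):
    """Difference in the primary-to-secondary gain between the two systems."""
    gain_tracked = tracked["secondary"] - tracked["primary"]
    gain_untracked = untracked["secondary"] - untracked["primary"]
    return gain_tracked - gain_untracked

# Hypothetical early-tracking country: high baseline, modest gain.
early = {"primary": 540, "secondary": 545}
# Hypothetical late-tracking country: lower baseline, larger gain.
late = {"primary": 510, "secondary": 525}

print(did_effect(early, late))  # -10: early tracking associated with a smaller gain
```

The higher baseline of the first country never enters the estimate; only the gains are compared, which is what makes the design robust to level differences between countries.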
For example, to the best of our knowledge no diff-in-diff analyses have been undertaken of the two most recent waves of PISA (2006 and 2009), in sharp contrast with analyses based on the inclusion of control variables in the model (e.g. the analysis of PISA 2009 by Bol and Van de Werfhorst (2013)). Hence, our article adds to the literature in the following ways. First, we apply the diff-in-diff design to the set of recently released student assessments. This is important not only because of the timeliness of the analysis, but also because the number of participating countries has increased over time, enlarging sample sizes and augmenting statistical power. Secondly, while the literature has focused mainly on average performance, we explicitly model the effect of early tracking on both low and top achievers. Finally, we take into account earlier criticisms raised against the article by Hanushek and Woessmann (2006), in particular the observation that the countries under study differ in the average age of the tested participants.

2. Earlier research

Across education systems, large differences exist in the age at which students are streamed into educational tracks with different endpoints (academically or vocationally oriented). For example, in Germany students are tracked after Grade 4 into three school types (Hauptschule, Realschule, Gymnasium). By contrast, many countries have systematically postponed tracking towards the end of lower secondary education. For example, in Nordic Europe the comprehensive school lasts until age 16. Theoretically, there is some support for both positive and negative effects of early tracking. On the positive side, it has been argued that when pupils are sorted according to ability, instruction at the right level and pace becomes more feasible (the specialization effect). On the negative side, it has been stressed that an early assignment of pupils to tracks is far from noiseless (Dustmann 2004; Brunello and Checchi 2007).
Obviously, such a misallocation implies a waste of talent. Moreover, shifting struggling pupils to less demanding tracks may lead to ignoring their problems instead of adequately addressing them. It has also been argued that classroom peers influence individual performance, although the size and direction of this effect are less clear (Hanushek et al. 2003): proponents of early tracking argue that both high- and low-ability students are better off with peers of their own level (Dobbelsteen et al. 2002), while their opponents maintain that weak students benefit more from the presence of stronger peers than the stronger pupils lose (if they lose at all). A related argument generalizes this peer effect to the broader learning environment: teachers and students in lower tracks tend to have lower expectations and act accordingly (e.g. teachers devote less time to actual instruction in lower tracks). Moreover, in many countries the more experienced and more capable teachers are often assigned to the higher-level tracks (OECD 2012). Finally, comprehensive systems have also adopted mechanisms to differentiate instruction by ability level without having to resort to rigid tracking, e.g. within-class differentiation (Kulik and Kulik 1992; Dupriez et al. 2008). With theoretical effects going in both directions, empirical evaluations are necessary to determine the net effect of early tracking. During the last two decades this has been greatly facilitated by the availability of large sets of internationally standardized student assessments. The three most influential assessments are PISA (measuring mathematics, science and reading proficiency at age 15), TIMSS (measuring mathematics and science proficiency at two moments, namely in 4th and in 8th grade) and PIRLS (measuring reading proficiency in 4th grade). Comparing average student performance on these assessments with key characteristics of the educational system yields important policy guidelines on how a successful educational system can be designed.
Hence, the effect of early tracking on average performance has been the subject of a number of cross-national studies. However, an important pitfall in cross-national studies is that (1) educational systems have many different features that all influence performance (besides tracking age, one could think of school autonomy or education budgets), and (2) socio-economic or cultural factors influence educational outcomes as well (e.g. wealth or income inequality). Cross-national research has to take these confounders into account in order to provide unbiased estimates. Most studies have tried to address unobserved-heterogeneity bias by including some of the confounding country variables in the model. For example, Horn (2009) examined the effect of tracking age on student performance in PISA 2003 with a multilevel set-up. While he adequately kept other differences between educational systems under control (e.g. school autonomy, accountability or the size of vocational education), he did not address the possible bias induced by differences in the socio-economic context of nations. Conversely, while Duru-Bellat and Suchaut (2005) analysed PISA 2003 with GDP and the educational level of society as context controls, they did not address the bias that correlations with other educational system characteristics could induce in their estimates of the tracking effect. While these and other studies (Dupriez, Dumay, and Vause 2008; Schütz et al. 2008; Woessmann et al. 2009) reported negative or null effects of early tracking on average performance, Rindermann and Ceci (2009) came to the opposite conclusion on the basis of an elaborate comparison of aggregate national scores over an array of student assessments from different sources. This aggregation allowed the construction of a very large dataset, far exceeding the scope of the rest of the literature, which was typically limited to Western countries.
However, a reanalysis of the data (available upon request from the author) showed that the reported positive effect would have vanished had the dataset been restricted to OECD countries. This indicates that context control is ultimately critical to the reliability of cross-sectional studies, and that discrepancies between study results may be caused by a failure to adequately take those confounding differences between countries into account. Obviously, the more heterogeneous the datasets (as in Rindermann and Ceci 2009), the more such a failure will bias estimates. Still, even within reasonably homogeneous cross-sectional studies, unobserved variables remain a cause for concern. First, it can never be guaranteed that all (unknown) confounders have been taken into account. Moreover, while some scholars have mentioned cultural influences (e.g. the value attributed to education or to discipline and diligence) as possible confounders of school performance, it is very difficult to develop accurate indicators to control for them. Hence, such issues have often been neglected. Secondly, confounding variables may have non-linear effects as well. For example, it has been argued that wealth or the size of the educational budget does not influence performance once a certain threshold has been exceeded (Hanushek and Luque 2003). A final restriction is that, because of limited dataset sizes (with typical samples of 20 to 30 countries), a simultaneous control for many confounders at once is impossible.

3. Differences-in-differences

By contrast, the differences-in-differences approach used by Hanushek and Woessmann (2006) does not rely on the inclusion of explicit controls for all individual confounders. Instead, Hanushek and Woessmann employed the following specification on the country level:

Y(secondary) = a + b · Y(primary) + c · T + d · X + ε        (1)

Here, Y is the outcome under study in a primary resp. secondary school assessment (e.g. the average math performance), T is the tracking indicator (e.g.
age of tracking) and ε is a normally distributed error term. Hence, the effect of tracking (c) is determined by the difference between tracked and untracked systems in the achievement gain between primary and secondary school. To check the efficacy of this implicit control for cross-country differences, it is possible to additionally include explicit cross-country controls in X. Indeed, if a confounder influenced primary and secondary school results in different ways, this would be signalled by d being significantly different from zero. Using this specification, Hanushek and Woessmann (2006) studied eight combinations of primary and secondary school assessments. These combinations are listed in Table 1. As can be seen in the last column of this table, the overall results for the effects of tracking on performance were rather mixed. Three combinations produced significantly negative estimates. However, one specification yielded a significantly positive estimate. Hanushek and Woessmann concluded that "there is a tendency for early tracking to reduce mean performance" but admitted that this part of the conclusion was "less clear". Furthermore, Hanushek and Woessmann found that early tracking enlarged the gap between strong and weak students. This finding, by contrast, was reproduced more consistently across different specifications. The increase in the spread of outcomes was attributed to the particularly damaging effects of early tracking on the achievement of weak students. [TABLE 1 AROUND HERE] As can be observed in Table 1, Hanushek and Woessmann used PISA only in the first two specifications, with the other six employing TIMSS (8th grade) as the secondary assessment. It is important to note that participants in TIMSS-8 are substantially younger than those in PISA (on average 14.3 years compared to 15.8).
Hence, the use of PISA as the secondary measurement point seems preferable, as it leaves more time for early tracking to influence performance: the more time a student has spent in the tracked system, the clearer the effect on his or her performance should be. Interestingly, this is exactly what we observe in the results of Hanushek and Woessmann. Indeed, the two combinations that used PISA as the secondary endpoint yielded the most significant results. Unfortunately, the number of countries in these two combinations was rather small. Since the publication of the article by Hanushek and Woessmann, two new waves of PISA have become available. These recent waves contain information on an increasing number of countries. Hence, this calls for an analysis of the recent sets of student assessments with the diff-in-diff design, in order to validate the observations by Hanushek and Woessmann on larger and newer datasets. Finally, Jakubowski (2010) has noted that the participants' average age differed drastically across nations, in particular in PIRLS and TIMSS. Thus, the time between the first and the second measurement point is not equal across nations. If these discrepancies were correlated with tracking age, they could distort the estimate of the tracking effect.

4. Data and methodology

We employ the differences-in-differences design given in (1), but use a new set of combinations of recent student assessments to determine the tracking effect. We used data from PIRLS (2006 and 2011) and TIMSS 4th grade (2007 and 2011) as primary measurement points, and TIMSS 8th grade (2007 and 2011) and PISA (2006 and 2009) as secondary points. Of course, only performance scores on the same domain (reading, mathematics, or science) were compared. In total, 26 combinations of primary and secondary measurement points were examined.
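Estimating specification (1) amounts to an ordinary least-squares regression on country-level observations. A minimal sketch on synthetic data (all values hypothetical, generated so that early tracking lowers the primary-to-secondary gain by 10 points):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30  # number of countries in the sample

# Synthetic country-level data (all values hypothetical).
y_prim = rng.normal(500, 30, n)               # primary-school assessment score
track = (rng.random(n) < 0.4).astype(float)   # T: 1 = early tracking
gdp = rng.normal(30, 8, n)                    # an explicit control X (e.g. GDP/capita)
noise = rng.normal(0, 5, n)

# Data-generating process: early tracking lowers the gain by 10 points.
y_sec = 20 + 0.9 * y_prim - 10 * track + noise

# Specification (1): Y_sec = a + b*Y_prim + c*T + d*X + error
design = np.column_stack([np.ones(n), y_prim, track, gdp])
a, b, c, d = np.linalg.lstsq(design, y_sec, rcond=None)[0]
print(round(c, 1))  # estimate of the tracking effect c, close to -10
```

Because the lagged score y_prim sits on the right-hand side, any country factor that shifts both assessments equally is absorbed, which is the diff-in-diff logic of the design.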
However, because only results for countries participating in both measurement points could be used, some of the combinations relied on only small datasets. Hence, we focused our attention on the eight combinations with the largest N, as these provide the most reliable estimates. While we will limit our discussion primarily to these eight combinations, results for the other combinations (available from the authors) were in line with our findings, although with lower power (as expected). The eight principal combinations under study are depicted in Table 2. As can be seen, these combinations fortunately all have PISA as the secondary endpoint, which is preferable to TIMSS for the reasons explained above. While sample sizes were restricted to 18 to 26 countries in Hanushek and Woessmann (2006), we now have sets of 23 to 35 nations at our disposal. Moreover, Table 2 shows that our combinations involve both cohort-following combinations (in which the pupils at the second endpoint belong more or less to the same cohort as those in the associated primary test, such as combinations 3 and 5) and contemporaneous combinations (administered at more or less the same point in time). The cohort-following combinations ensure that we are comparing populations with roughly the same characteristics (e.g. ethnic or social composition), while the contemporaneous combinations are robust against sudden shocks in exogenous variables (e.g. economic developments). [TABLE 2 AROUND HERE] With specification (1), we can now examine the effect of tracking age both on mean country performance and on any national quantile score. For the tracking indicator we used the tracking ages reported in OECD (2010), which contains information on over 60 countries. In line with Hanushek and Woessmann (2006), we define early tracking as tracking at ages under 15 years (i.e. before the secondary measurement took place).
We also included the participants' average ages to prevent age discrepancies between nations from distorting our estimates (Jakubowski 2010).

5. Mean performance

Table 3 gives an overview of the regression analyses with mean performance as the outcome variable. As expected, average results in primary school assessments correlate strongly with later secondary school assessments, while the age difference between the two tests also enters significantly, validating the caution issued by Jakubowski (2010) regarding the original results of Hanushek and Woessmann (2006). For our variable of interest, early tracking, the estimates clearly point to a negative effect on average performance. In contrast to Hanushek and Woessmann (2006), who reported rather mixed effects, our results are more consistent. Six out of eight specifications yield a negative effect of early tracking on mean performance, with the other two producing a null effect. The negative effect becomes significant in two specifications, with a maximal difference of 22 PISA points between early and late tracking countries in specification (4). When we calculate the weighted average (taking into account the number of countries used in each combination), the loss in mean performance due to early tracking is 9 points [1]. Hence, reapplying the diff-in-diff approach of Hanushek and Woessmann to more recent sets of student assessments strongly confirms their message that early tracking offers no benefits for average performance, rather the contrary. [TABLE 3 AROUND HERE] Note that the two principal combinations referring to mathematical performance (models 1 and 6) yielded null effects. However, when we take into account the other eight combinations concerning mathematical performance, which we did not explicitly report because of their smaller N, the weighted average effect on mathematical performance is minus 7 points (with three estimates being significantly negative).
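The weighted averaging used above is a simple country-weighted mean of the per-combination coefficients; with the early-tracking estimates and sample sizes from Table 3:

```python
# Country-weighted average of the early-tracking estimates across the eight
# principal combinations (coefficients and N per combination from Table 3).
estimates = [-0.11, -7.65, -10.55, -21.81, -16.17, 0.10, -7.72, -11.99]
n_countries = [23, 23, 23, 27, 30, 35, 35, 35]

weighted = sum(e * n for e, n in zip(estimates, n_countries)) / sum(n_countries)
print(round(weighted))  # -9: the 9-point loss reported in the text
```

Weighting by N gives the combinations with broader country coverage proportionally more influence on the summary figure.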
Hence, we observe a negative effect of early tracking on mean mathematical performance as well, even if it seems somewhat smaller than the effect on literacy (which has a weighted average of minus 16 points over all combinations). Additionally, we checked the adequacy of the diff-in-diff design in controlling for cross-national confounders by including an additional control for wealth (GDP/capita). Wealth proved not to be consistently related (either positively or negatively) to the performance gain between the two measurement points. Hence, the additional control altered neither the sign nor the size of the effect of early tracking; weighted over the eight combinations, the effect was now estimated at a loss of 8 PISA points, virtually equal to the effect reported above.

[1] When taking into account all 26 possible combinations, the weighted estimate of the effect of early tracking is identical to the one obtained in the eight depicted specifications (-9 points).

Note that this does not imply that wealth exercises no significant effect on assessment scores: it only means that wealth influences both primary and secondary assessments in the same way and is thus not associated with the performance gain. Figure 1 visualizes this phenomenon for specification 6. Wealth is obviously positively associated with both the primary and the secondary assessment scores; however, it does not have a clear-cut association with the difference between the two (when added to model (6), the estimate for wealth has a p-value of 0.97). [FIGURE 1 AROUND HERE] Hence, the diff-in-diff design adequately prevents wealth from biasing the estimation of the effect of early tracking. Here diff-in-diff shows its advantages over explicitly controlling for possible confounders. First, we may assume that the diff-in-diff design accommodates all the bias that could be induced by any confounding variable, whether cultural, economic or social in nature.
Hence, we do not have to worry about whether we have accounted for all possible biases. Secondly, neither do we have to worry about non-linear effects of confounding variables, as diff-in-diff accommodates higher-order effects as well.

6. Low and high achievers

We now turn to the effect of early tracking on different groups in the achievement distribution. We follow the same estimation strategy as above, but our inputs (Y primary) and outcomes (Y secondary) now refer to quantile scores instead of mean scores. This gives an indication of the effect of tracking on the performance of various groups across the performance distribution. For example, the 25% quantile (Q25) is the score separating the bottom 25% of the participants from the rest. Hence, the effect of early tracking on Q25 indicates the effect of tracking on performance growth for the lowest quartile of the achievement distribution. We estimated the effect of early tracking on four quantiles of the achievement distribution, ranging from Q5 (very low achievers) to Q95 (very high achievers). Table 4 contains the results. [TABLE 4 AROUND HERE] First, note that in all specifications the effects of early tracking are most negative in the groups with the lowest achievement. This is what we would expect on the basis of peer and environmental effects, which have a particularly negative impact on those grouped in the lower tracks. Early tracking has a strong negative effect on the performance of low achievers. The estimates for the 5% and the 25% quantile are consistently negative in all specifications and gain statistical significance in three of the combinations. The weighted average of the effect of tracking is minus 14 points (for both quantiles). Hence, in terms of the results of low achievers, there should be no doubt about the negative effects of early tracking. Secondly, note that even for high achievers, early tracking offers no tangible benefits.
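The quantile inputs used in this section can be sketched as follows (synthetic student scores; in the actual analysis the inputs would be the per-country PIRLS/TIMSS and PISA score distributions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic student-level scores for one country at both measurement points
# (all values hypothetical).
primary_scores = rng.normal(500, 100, 4000)
secondary_scores = rng.normal(520, 100, 4000)

# Instead of country means, take the same quantile of both distributions,
# e.g. Q25 for the low achievers; these country-level quantile scores then
# replace the means in specification (1).
q25_primary = np.quantile(primary_scores, 0.25)
q25_secondary = np.quantile(secondary_scores, 0.25)

print(round(q25_secondary - q25_primary, 1))  # Q25 gain fed into the regression
```

Repeating this for Q5, Q25, Q75 and Q95 in every country yields the inputs for the quantile-level regressions reported in Table 4.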
In most specifications, the effect of early tracking on the upper quartile (Q75) is even negative (with a weighted effect of minus 5 points). None of the specifications yields significant estimates (whether positive or negative). The same holds for the effect on Q95, which indicates the level of the top achievers. Three specifications show a positive (but not significant) effect of early tracking on the strongest performers, the others still show small negative effects, and the weighted average is practically zero (+0.1). This indicates that early tracking is not beneficial even for the strongest performers.

7. Conclusion

In this paper, we examined the effects of early tracking on achievement by combining a diff-in-diff design with data from an array of recent waves of student assessments (PISA, TIMSS, PIRLS). Our results showed consistently negative effects of early tracking on mean performance. Hence, the hypothesis that early tracking offers specialization benefits seems implausible. By separating out the effects on groups with different abilities, we showed that early tracking has particularly strong negative effects on the group of low achievers. Apparently, grouping weak pupils together invokes negative peer and environmental effects with possibly detrimental consequences for their academic achievement. We did not find a consistent effect of early tracking on the achievement of top performers. Hence, comprehensive systems seem able to challenge high performers to learn at a high pace without having to isolate them in separate tracks, e.g. through more flexible differentiation mechanisms such as "individualized integration" or "à la carte integration" (Dupriez, Dumay, and Vause 2008).

Reference List

Ammermüller, A. 2005. "Educational Opportunities and the Role of Institutions." ZEW Discussion Papers 05-44.
Bol, Thijs, and Herman Van de Werfhorst. 2013.
"Educational Systems and the Trade-off Between Labor Market Allocation and Equality of Educational Opportunity." Comparative Education Review 57: 285-308.
Brunello, Giorgio, and Daniele Checchi. 2007. "Does school tracking affect equality of opportunity? New international evidence." Economic Policy 22: 781-861.
Dobbelsteen, Simone, Jesse Levin, and Hessel Oosterbeek. 2002. "The causal effect of class size on scholastic achievement: distinguishing the pure class size effect from the effect of changes in class composition." Oxford Bulletin of Economics and Statistics 64: 17-38.
Dupriez, Vincent, Xavier Dumay, and Anne Vause. 2008. "How Do School Systems Manage Pupils' Heterogeneity?" Comparative Education Review 52: 245-273.
Duru-Bellat, Marie, and Bruno Suchaut. 2005. "Organisation and Context, Efficiency and Equity of Educational Systems: what PISA tells us." European Educational Research Journal 4: 181-194.
Dustmann, Christian. 2004. "Parental background, secondary school track choice, and wages." Oxford Economic Papers 56: 209-230.
Hanushek, Eric A., and Ludger Woessmann. 2006. "Does educational tracking affect performance and inequality? Differences-in-differences evidence across countries." Economic Journal 116: C63-C76.
Hanushek, Eric A., John F. Kain, Jacob M. Markman, and Steven G. Rivkin. 2003. "Does peer ability affect student achievement?" Journal of Applied Econometrics 18: 527-544.
Hanushek, Eric A., and Javier A. Luque. 2003. "Efficiency and equity in schools around the world." Economics of Education Review 22: 481-502.
Horn, Daniel. 2009. "Age of selection counts: a cross-country analysis of educational institutions." Educational Research and Evaluation 15: 343-366.
Jakubowski, Maciej. 2010. "Institutional Tracking and Achievement Growth: Exploring Difference-in-Differences Approach to PIRLS, TIMSS, and PISA Data." In Quality and Inequality of Education, ed. J. Dronkers. Springer.
Kulik, James A., and Chen L. Kulik. 1992.
"Meta-analytic findings on grouping programs." Gifted Child Quarterly 36: 73-77.
OECD. 2007. No More Failures. Paris: OECD.
OECD. 2010. PISA 2009 Results: What Makes a School Successful? Paris: OECD.
OECD. 2012. Equity and Quality in Education. Paris: OECD.
Rindermann, Heiner, and Stephen J. Ceci. 2009. "Educational Policy and Country Outcomes in International Cognitive Competence Studies." Perspectives on Psychological Science 4: 551-568.
Schütz, Gabriela, Heinrich W. Ursprung, and Ludger Woessmann. 2008. "Education policy and equality of opportunity." Kyklos 61: 279-308.
Van de Werfhorst, Herman, and Jonathan J. Mijs. 2010. "Achievement inequality and the institutional structure of educational systems: A comparative perspective." Annual Review of Sociology 36: 407-428.
Woessmann, Ludger, Elke Luedemann, Gabriela Schuetz, and M. West. 2009. School accountability, autonomy, and choice around the world. Cheltenham: Edward Elgar Publishing.

TABLES AND FIGURES

Table 1: The eight combinations of primary and secondary assessments studied by Hanushek and Woessmann (2006)

Model  Primary school assessment   Secondary school assessment   Domain    N    Estimate
A.     PIRLS 2001 (4th grade)      PISA 2003 (15-year-olds)      Reading   18   -1.1***
B.     PIRLS 2001 (4th grade)      PISA 2000 (15-year-olds)      Reading   20   -1.0***
C.     TIMSS 1995 (4th grade)      TIMSS 1995 (8th grade)        Math      26   -0.1
D.     TIMSS 1995 (4th grade)      TIMSS 1995 (8th grade)        Science   26    0.6**
E.     TIMSS 2003 (4th grade)      TIMSS 2003 (8th grade)        Math      25   -0.0
F.     TIMSS 2003 (4th grade)      TIMSS 2003 (8th grade)        Science   25   -0.0
G.     TIMSS 1995 (4th grade)      TIMSS 1999 (8th grade)        Math      18   -0.4*
H.     TIMSS 1995 (4th grade)      TIMSS 1999 (8th grade)        Science   18    0.2

* p < 0.1  ** p < 0.05  *** p < 0.01

Table 2: Our eight combinations of primary and secondary assessments

Model  Primary school assessment   Secondary school assessment   Domain    N
1.     TIMSS 2007 (4th grade)      PISA 2006 (15-year-olds)      Math      23
2.     TIMSS 2007 (4th grade)      PISA 2006 (15-year-olds)      Science   23
3.     PIRLS 2001 (4th grade)      PISA 2006 (15-year-olds)      Reading   23
4.     PIRLS 2006 (4th grade)      PISA 2006 (15-year-olds)      Reading   27
5.     PIRLS 2006 (4th grade)      PISA 2009 (15-year-olds)      Reading   30
6.     TIMSS 2011 (4th grade)      PISA 2009 (15-year-olds)      Math      35
7.     TIMSS 2011 (4th grade)      PISA 2009 (15-year-olds)      Science   35
8.     PIRLS 2011 (4th grade)      PISA 2009 (15-year-olds)      Reading   35

Table 3: Regression of mean performance on early tracking

                     Model 1    Model 2    Model 3    Model 4
Constant              17.58       2.73    -193.54    -206.98
Primary assessment     0.74***    0.71***    0.84***    0.97***
Early tracking        -0.11      -7.65     -10.55     -21.81*
Age difference        17.93      24.48*     42.30**    33.03**
R²                     0.86       0.85       0.61       0.72
N                     23         23         23         27

                     Model 5    Model 6    Model 7    Model 8
Constant             -87.65     -89.21    -159.25    -224.10
Primary assessment     0.75***    0.84***    0.89***    1.01***
Early tracking       -16.17**     0.10      -7.72     -11.99
Age difference        33.56***   25.87**    35.69***   31.49**
R²                     0.81       0.81       0.75       0.76
N                     30         35         35         35

* p < 0.1  ** p < 0.05  *** p < 0.01

Table 4: Coefficients of the effect of early tracking on the scores for different quantiles

        Model 1    Model 2    Model 3    Model 4
Q05      -3.44      -2.24      -9.08     -36.51***
Q25      -3.95      -8.67     -11.75     -26.17**
Mean     -0.11      -7.65     -10.55     -21.81*
Q75       3.63      -8.03      -9.33     -13.79
Q95       9.78      -2.59      -7.52      -9.88

        Model 5    Model 6    Model 7    Model 8
Q05     -23.83**    -5.01      -6.71**   -20.51
Q25     -21.82***   -6.51     -11.11**   -19.69
Mean    -16.17**     0.10      -7.72     -11.99
Q75     -10.55       6.02      -3.35      -4.75
Q95      -8.80      15.11       1.30      -0.21

* p < 0.1  ** p < 0.05  *** p < 0.01

Figure 1: Mathematical performance in PISA 2009 and TIMSS 4th grade 2011 (below) as a function of GDP/capita