Educational tracking, inequality and performance. New evidence using
differences-in-differences.
Jeroen Lavrijsen & Ides Nicaise
Leuven, VFO-SSL study day, 18/9/2014
Abstract
Educational systems differ dramatically in the age at which students are placed in different tracks.
We examine the effects of early tracking on student achievement, making use of a collection of
recent waves of internationally standardized student assessments (PISA, TIMSS, PIRLS). In order to
control for unobserved country heterogeneity, we adopt a differences-in-differences approach,
controlling secondary school results for differences already present before the introduction of
tracking (in primary school). Our results show that early tracking has a negative effect on mean
performance of students, particularly regarding reading and science competencies. Moreover, by
separating out groups with different abilities, we show that early tracking has a very strong negative
effect on the group of low achieving students, suggesting that negative peer- and environmental
effects in the lower tracks can have detrimental consequences for their academic achievement. By
contrast, we find a null effect on the group of top achieving students, suggesting that comprehensive
systems can equally challenge high performers to learn at a high pace.
1. Introduction
One of the most influential characteristics of an educational system is the way it deals with
differences in capacities between pupils. A common approach in many countries has been to place
pupils with different abilities in separate tracks, usually academic or vocational in nature (OECD
2007; OECD 2012). While some countries do not track students until age 16, others already have
different tracks starting at age 10. Based on the wealth of internationally standardized student tests
that have become available in the last two decades (e.g. PISA), it has been convincingly argued that
tracking students at a young age reinforces the impact of social background on achievement
(Ammermüller 2005; Brunello and Checchi 2007; Van de Werfhorst and Mijs 2010).
The interesting issue is whether this equity disadvantage of early tracking is traded off
against an efficiency gain. At first sight, sorting pupils according to ability may hold the promise of
facilitating tailored instruction at the right level and pace for everyone. This might boost the
efficiency of the educational system (specialization effect). However, empirical cross-national
research has not yet clearly detected such an efficiency gain of early tracking. On the contrary, the bulk of
the literature (see section 2 for an overview) has reported null or even negative relationships between
early tracking and performance. The apparent lack of an efficiency advantage of early specialization
has been attributed to issues such as the sizeable misallocation of students (biased by social
background) and to non-linear peer- or environmental effects.
Still, the evidence of the effect of early tracking on performance leaves some ground to be covered.
Firstly, most studies have focused solely on its effect on mean performance. As we can expect
tracking to have different effects on low and high achievers, concentrating on averages may conceal
an underlying heterogeneity. Secondly, the reported effect of tracking on the mean varies across studies
from negative to null, with one exceptional study even detecting a positive effect (Rindermann and
Ceci 2009). A closer inspection of this literature reveals that differences in the estimated effects of tracking
seem related to the essential problem of cross-national studies: the possible bias induced by
unobserved confounding country variables. Across the literature, this is typically dealt with by
controlling for a selection of possible confounders. Wealth (GDP/capita), expenditure on education
and educational characteristics such as school accountability have been popular choices for controls.
However, in the end no-one can ever be sure that all relevant variables have been taken into
account. Some factors, e.g. cultural influences, may even prove quite difficult to measure in a
comparative way for all countries under study. Moreover, the limited degrees of freedom (with
typical samples consisting of 20 to 30 countries) preclude a simultaneous control for many
confounders.
An interesting solution to this problem is the differences-in-differences approach. Essentially, diff-in-diff corrects outcomes for differences in starting position. Hence, it does not have to attempt to
include all confounding factors themselves. When the starting position can be assumed to be
influenced by the same unobserved variables as the outcome, the resulting estimates would be
unbiased by unobserved differences. Hence, the effect of tracking can be determined by comparing
secondary school results (which are influenced by tracking age) with results from primary school
(which are not) in both tracked and non-tracked systems. Clearly, diff-in-diff also relies on certain
assumptions, the most important being that confounding factors influence primary and secondary
school results in the same way. Still, it remains a promising tool to deal with unobserved country
heterogeneity.
The frequently cited article of Hanushek and Woessmann (2006) made use of this diff-in-diff approach. Their results indicated a weak, but not very consistent, negative effect of early tracking on
average performance. Moreover, tracking amplified the spread between weak and strong students.
While their findings received considerable resonance, recent waves of student assessments have not
yet been analysed with the diff-in-diff approach. For example, to the best of our knowledge no diff-in-diff
analyses have been undertaken of the two most recent waves of PISA (2006 and 2009), in sharp
contrast with analyses based on the inclusion of control variables into the model (e.g. the analysis of
PISA 2009 by Bol and Van de Werfhorst (2013)).
Hence, our article adds to the literature in the following ways. First, we apply the diff-in-diff design to
the set of recently made available student assessments. This is not only important because of the
timeliness of the analysis, but also because the number of participating countries has increased over
time, enlarging sample sizes and augmenting statistical power. Secondly, while the literature has
focused mainly on average performance, we explicitly model the effect of early tracking on both low
and top achievers. Finally, we take into account earlier criticisms raised against the article by
Hanushek and Woessmann (2006), in particular the observation that the countries under study differ
in the average age of the tested participants.
2. Earlier research
Across education systems, large differences exist in the age at which students are streamed into educational
tracks with different endpoints (academically or vocationally oriented). For example, in
Germany students are tracked after Grade 4 into three school types (Hauptschule, Realschule,
Gymnasium). By contrast, many countries have systematically postponed tracking towards the end of
lower secondary. For example, in Nordic Europe the comprehensive school lasts until age 16.
Theoretically, there is some support for both positive and negative effects of early tracking. On the
positive side, it has been argued that when pupils are sorted according to ability, this makes
instruction at the right level and pace more feasible (specialization effect). On the negative side, it
has been stressed that an early assignment of pupils to tracks is far from noiseless (Dustmann 2004;
Brunello and Checchi 2007). Obviously, such a misallocation implies a waste of talent. Moreover,
shifting struggling pupils to less demanding tracks may lead to ignoring their problems instead of
adequately addressing them. It has also been argued that class-room peers influence individual
performance, although the size and the direction is less clear (Hanushek et al. 2003): proponents of
early tracking argue that both high- and low-ability students are better off with peers of their own
level (Dobbelsteen et al. 2002), while their opponents maintain that weak students benefit more
from the presence of stronger peers than the stronger pupils lose (if they lose at all). A related argument
generalizes this peer effect to the broader learning environment: teachers and students tend to have
lower expectations and act likewise (e.g. teachers devote less time to actual instruction in lower
tracks). Moreover, in many countries the more experienced and more capable teachers are often
assigned to the higher level tracks (OECD 2012). Finally, comprehensive systems have also
adopted mechanisms to differentiate instruction by ability level without having to resort to rigid
tracking, e.g. within-class differentiation (Kulik and Kulik 1992; Dupriez et al. 2008).
With theoretical effects going in both directions, empirical evaluations are necessary to determine
the net effect of early tracking. During the last two decades this has been greatly facilitated by the
availability of large sets of internationally standardized student assessments. The three most influential
assessments are PISA (measuring mathematics, science and reading proficiency at age 15), TIMSS
(measuring mathematics and science proficiency at two moments, namely in 4th and in 8th grade) and
PIRLS (measuring reading proficiency in 4th grade).
Comparing average student performance on these assessments with key characteristics of the
educational system yields important policy guidelines on how a successful educational system can be
designed. Hence, the effect of early tracking on average performance has been the subject of a
number of cross-national studies. However, an important pitfall in cross-national studies is that (1)
educational systems have many different features all influencing performance (besides tracking age,
one could think of school autonomy or education budgets), and (2) socio-economic or cultural factors
influence educational outcomes as well (e.g. wealth or income inequality). Cross-national research
has to take these confounders into account in order to provide unbiased estimates.
Most studies have tried to address unobserved heterogeneity bias by including some of the
confounding country variables in the model. For example, Horn (2009) examined the effect of
tracking age on student performance in PISA 2003 with a multilevel set-up. While he adequately kept
other differences between educational systems under control (e.g. school autonomy, accountability
or the size of vocational education), he did not address the possible bias induced by differences in
the socio-economic context of nations. On the other hand, while Duru-Bellat and Suchaut (2005)
analysed PISA 2003 with GDP and the educational level of society as context controls, they did not
address the possible bias exerted on their estimates of the tracking effect by correlations with other
educational system characteristics.
While these and other studies (Dupriez, Dumay, and Vause 2008; Schütz et al. 2008; Woessmann et
al. 2009) reported negative or null effects of early tracking on average performance, Rindermann and
Ceci (2009) came to the opposite conclusion on the basis of an elaborate comparison of the
aggregate national score over an array of student assessments from different sources. This
aggregation allowed the construction of a very large dataset, heavily expanding the scope of the rest
of the literature which was typically limited to Western countries. However, a reanalysis of the data
(available upon request from the author) showed that the reported positive effect would have vanished
had the datasets been restricted to OECD countries. This indicates that context control is
ultimately critical to the reliability of cross-sectional studies, and that discrepancies between study
results may be caused by a failure to adequately take into account those confounding differences
between countries.
Obviously, the more heterogeneous datasets are (as in Rindermann and Ceci 2009), the more such a
failure will bias estimates. Still, even within reasonably homogeneous cross-sectional studies,
the unobserved variables problem is a cause of concern. First, it can never be guaranteed that all
(unknown) confounders have been taken into account. Moreover, while some scholars have
mentioned cultural influences (e.g. the value attributed to education or to discipline and diligence) as
possible confounders of school performance, it is very difficult to develop accurate indicators to
control them out. Hence, such issues have often been neglected. Secondly, confounding variables
may have non-linear effects as well. For example, it has been argued that wealth or the size of the
educational budget does not influence performance once a certain threshold has been exceeded
(Hanushek and Luque 2003). A final restriction is that, because of limited dataset sizes (with typical
samples of 20 to 30 countries), a simultaneous control for many confounders at once is
impossible.
3. Differences-in-differences
By contrast, the differences-in-differences approach used by Hanushek and Woessmann (2006) does
not rely on the inclusion of explicit controls for all individual confounders. Instead, Hanushek and
Woessmann employed the following specification on the country level:
(1)   Y_secondary = a + b·Y_primary + c·T + d·X + ε

Here, Y is the outcome under study in a primary resp. secondary school assessment (e.g. the average
math performance), T is the tracking indicator (e.g. age of tracking), and ε is a normally distributed
error term. Hence, the effect of tracking (c) is determined by the difference between tracked and
untracked systems in the achievement gain between primary and secondary school. To check the
efficacy of the control for cross-country differences, it is possible to additionally include
cross-country differences as controls in X. Indeed, if a confounder influenced primary and
secondary school results in different ways, this would be signalled by d being significantly different
from zero.
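On synthetic data, specification (1) can be sketched as follows; the variable names and the simulated effect size are purely illustrative, not the authors' actual dataset:

```python
import numpy as np

# Country-level sketch of specification (1): Y_sec = a + b*Y_prim + c*T + d*X + e.
# The data are simulated; the "true" tracking effect is set to -10 points.
rng = np.random.default_rng(0)
n = 30                                                # number of countries

y_primary = rng.normal(500, 30, n)                    # primary-school mean score
early_tracking = rng.integers(0, 2, n).astype(float)  # T: tracked before age 15?
gdp = rng.normal(30, 8, n)                            # optional control X
eps = rng.normal(0, 2, n)                             # error term

y_secondary = 50 + 0.9 * y_primary - 10 * early_tracking + 0.0 * gdp + eps

# Ordinary least squares for the four coefficients a, b, c, d
design = np.column_stack([np.ones(n), y_primary, early_tracking, gdp])
a, b, c, d = np.linalg.lstsq(design, y_secondary, rcond=None)[0]
print(round(c, 1))  # estimate of the tracking effect c, close to -10
```

Because y_primary enters as a regressor, any country-level factor that shifts primary and secondary scores alike is absorbed, which is the essence of the diff-in-diff control.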
Using this specification, Hanushek and Woessmann (2006) studied eight combinations of primary and
secondary school assessments. These combinations are listed in Table 1. As can be seen in the last
column of this table, the overall results for the effects of tracking on performance were rather mixed.
Three combinations produced significantly negative estimates. However, one specification yielded a
significantly positive estimate. Hanushek and Woessmann concluded that “there is a tendency for
early tracking to reduce mean performance” but admitted that this part of the conclusion was “less
clear”. Furthermore, Hanushek and Woessmann found that early tracking enlarged the gap between
strong and weak students. By contrast, this finding was reproduced more consistently across
different specifications. The increase of the spread in outcomes was attributed to the particularly
damaging effects of early tracking on the achievement of weak students.
[TABLE 1 AROUND HERE]
As can be observed in Table 1, Hanushek and Woessmann used PISA only in the first two
specifications, with the other six employing TIMSS (8th grade) as the secondary assessment. It is
important to note that participants in TIMSS-8 are considerably younger than those in PISA (on average 14.3
years compared to 15.8). Hence, the use of PISA as secondary measurement point seems preferable
as it leaves more time for early tracking to influence performance: the more time a student has spent
in the tracked system, the clearer the effect on his performance should be. Interestingly, this is
exactly what we observe in the results of Hanushek and Woessmann. Indeed, the two combinations
that used PISA as the secondary endpoint yielded the most significant results. Unfortunately, the
number of countries in the two combinations using PISA as the secondary endpoint was rather small.
Since the publication of the article by Hanushek and Woessmann, two new waves of PISA have become
available. These recent waves contain information on an increasing number of countries. Hence, this
calls for an analysis of the recent sets of student assessments with the diff-in-diff design, in order to
validate the observations by Hanushek and Woessmann on larger and newer datasets.
Finally, Jakubowski (2010) has noted that the participants’ average age drastically differed across
nations, in particular in PIRLS and TIMSS. So, the time between the first and the second
measurement point is not equal across nations. If these discrepancies were correlated with
tracking age, this could distort the estimate of the tracking effect.
4. Data and methodology
We employ the differences-in-differences design given in (1), but use a new set of combinations of
recent student assessments to determine the tracking effect. We used data from PIRLS (2006 and
2011) and TIMSS – 4th grade (2007 and 2011) as primary measurement points and TIMSS-8th grade
(2007 and 2011) and PISA (2006 and 2009) as secondary points. Of course, only performance scores
on the same domain (reading, mathematics, or science) were compared. In total, 26 combinations of
primary and secondary measurement points were examined. However, because only results for
countries participating in both measurement points could be used, some of the combinations relied
on only small datasets. Hence, we focused our attention on the eight combinations with the largest
N, as these provide the most reliable estimates. While we will limit our discussion primarily to these
eight combinations, results for the other combinations (available from the authors) were in line with
our findings, although with lower power (as expected).
The eight principal combinations under study are depicted in Table 2. As can be seen, these
combinations fortunately all have PISA as the second endpoint, which is preferable to TIMSS for
reasons explained above. While sample sizes were restricted to 18 to 26 countries in Hanushek and
Woessmann (2006), we now have sets of 23 to 35 nations. Moreover, Table 2 shows that our
combinations involved both cohort-following combinations (in which the pupils at the second
endpoint belong more or less to the same cohort as those in the associated primary test, such as in
combinations 3 and 5) and contemporaneous combinations (delivered at more or less the same time point). The cohort-following combinations ensure that we are comparing populations with roughly
the same characteristics (e.g. ethnic or social composition), while the contemporaneous
combinations are robust against sudden shocks in exogenous variables (e.g. economic
developments).
[TABLE 2 AROUND HERE]
With specification (1), we can now examine the effect of tracking age both on mean country
performance and on any national quantile score. For the tracking indicator we used the tracking ages
reported in OECD 2010, containing information from over 60 countries. In line with Hanushek and
Woessmann (2006), we define early tracking as tracking ages under 15 years (i.e. before the
secondary measurement took place). We also included average participant ages to prevent age
discrepancies between nations from distorting our estimates (Jakubowski 2010).
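As an illustration, the construction of one such combination could look as follows; the country codes, scores and tracking ages below are invented, and only countries present at both measurement points enter the sample:

```python
import pandas as pd

# Hypothetical analysis file for one primary/secondary combination.
primary = pd.DataFrame({
    "country":     ["AAA", "BBB", "CCC", "DDD"],
    "y_primary":   [540.0, 510.0, 495.0, 520.0],
    "age_primary": [10.2, 10.9, 10.3, 11.1],
})
secondary = pd.DataFrame({
    "country":       ["AAA", "BBB", "CCC", "EEE"],
    "y_secondary":   [505.0, 470.0, 488.0, 501.0],
    "age_secondary": [15.8, 15.7, 15.9, 15.6],
})
tracking = pd.DataFrame({
    "country":      ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "tracking_age": [10, 16, 12, 16, 15],
})

# Inner merges keep only countries observed at both measurement points
df = primary.merge(secondary, on="country").merge(tracking, on="country")
df["early_tracking"] = (df["tracking_age"] < 15).astype(int)    # dummy, cut-off age 15
df["age_difference"] = df["age_secondary"] - df["age_primary"]  # Jakubowski-style control
print(df[["country", "early_tracking", "age_difference"]])
```

Countries DDD and EEE drop out of the merged file, which is why some of the 26 combinations end up with only small datasets.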
5. Mean performance
Table 3 gives an overview of the regression analyses with mean performance as the outcome
variable. As expected, average results in primary school assessments correlate strongly with later
secondary school assessments, while the age difference between the two tests also enters
significantly, validating the caution issued by Jakubowski (2010) on the original results by Hanushek
and Woessmann (2006).
For our variable of interest, early tracking, the estimates clearly point to a negative effect of early
tracking on average performance. In contrast with Hanushek and Woessmann (2006), who reported
rather mixed effects, our results are more consistent. Six out of eight specifications yield a negative
effect of early tracking on mean performance, with the other two producing a null effect. The
negative effect becomes significant in two specifications, with a maximal difference of 22 PISA-points
between early and late tracking countries in specification (4). When we calculate the weighted
average (taking into account the number of countries used in each combination), the loss in mean
performance due to early tracking is 9 points¹. Hence, reusing the diff-in-diff approach of Hanushek
and Woessmann with more recent sets of student assessments strongly confirms their message that
early tracking offers no benefits for average performance, rather to the contrary.
[TABLE 3 AROUND HERE]
Note that the two principal combinations involving mathematics (models 1 and 6) yielded null
effects. However, when we take into account the eight other combinations involving
mathematical performance, which we did not explicitly report because of their smaller N, the weighted
average effect in mathematical performance is minus 7 points (with three estimates being
significantly negative). Hence, we observe a negative effect of early tracking on mean mathematical
performance as well, even if it seems somewhat smaller in size than the effect on literacy (which has
a weighted average of minus 16 points over all combinations).
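The weighted average quoted above follows directly from the early-tracking estimates and sample sizes reported in Table 3 for the eight principal combinations:

```python
# Early-tracking estimates and country counts for models 1-8 (Table 3)
estimates = [-0.11, -7.65, -10.55, -21.81, -16.17, 0.10, -7.72, -11.99]
countries = [23, 23, 23, 27, 30, 35, 35, 35]

# Average weighted by the number of countries in each combination
weighted = sum(e * n for e, n in zip(estimates, countries)) / sum(countries)
print(round(weighted))  # -9: the mean-performance loss due to early tracking
```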
Additionally, we checked the adequacy of the diff-in-diff design to control for cross-national
confounders by including an additional control for wealth (GDP/capita). Wealth proved not to be
consistently related (either positively or negatively) to the performance gain between the two
measurement points. Hence, the additional control did alter neither the sign nor the size of the effect
1
When we taking into account all the 26 possible combinations, the weighted estimate of the effect of early
tracking is identical to the one obtained in the eight depicted specifications (-9 points).
10
of early tracking; weighted over the 8 combinations, the effect was now estimated at a loss of 8 PISApoints, virtually equal to the effect reported above.
Note that this does not imply that wealth would not exercise a significant effect on assessment
scores: it only means that wealth influences both primary and secondary assessments in the same
way and thus is not associated with the performance gain. Figure 1 visualizes this phenomenon for
specification 6. Wealth is obviously positively associated with both the primary assessment scores and
secondary assessment scores; however, it does not have a clear-cut association with the difference
between both (when added to model (6), the estimate for wealth has a p-value of 0.97).
[FIGURE 1 AROUND HERE]
Hence, the diff-in-diff design adequately prevents wealth from biasing the estimation of the effect of
early tracking. Here diff-in-diff shows its advantages over explicitly controlling for possible confounders.
First, we may assume that the diff-in-diff design accommodates all the bias that could be induced
by any confounding variable, whether cultural, economic or social in nature. Hence, we do not have to
worry about whether we have accounted for all possible bias. Secondly, neither do we
have to worry about non-linear effects of confounding variables, as diff-in-diff accommodates
higher-order effects as well.
6. Low and high achievers
We now turn to the effect of early tracking on different groups in the achievement distribution. We
follow the same estimation strategy as above, but our inputs (Y_primary) and outcomes (Y_secondary) now
refer to quantile scores instead of mean scores. This gives an indication of the effect of tracking on
the performance of various groups across the performance distribution. For example, the
25%-quantile (Q25) is the score below which the bottom 25% of the participants fall. Hence, the
effect of early tracking on Q25 indicates the effect of tracking on performance growth for the lowest
quartile of the achievement distribution.
We estimated the effect of early tracking on four quantiles in the achievement distribution, ranging
from Q5 (very low achievers) to Q95 (very high achievers). Table 4 contains the results.
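A minimal sketch of how such national quantile scores can be derived from student-level results, using synthetic scores on an illustrative PISA-like scale:

```python
import numpy as np

# One country's (synthetic) student-level scores on a PISA-like scale
rng = np.random.default_rng(1)
scores = rng.normal(500, 100, 5000)

# National quantile scores used as Y in specification (1), next to the mean
q = {p: float(np.quantile(scores, p / 100)) for p in (5, 25, 75, 95)}
mean_score = float(scores.mean())
print(q[5] < q[25] < mean_score < q[75] < q[95])  # True: quantiles are ordered
```

Regressing, say, the secondary Q05 on the primary Q05 and the tracking dummy then estimates the tracking effect for the weakest students specifically.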
[TABLE 4 AROUND HERE]
First, note that in all specifications, the effects of early tracking are the most negative in the groups
with the lowest achievement. This is what we would expect on the basis of peer- and environmental
effects, which have a negative impact in particular on those grouped in the lower tracks. Early
tracking has a strong negative effect on the performance of lower achievers. The estimates for the
5%- and the 25%-quantile are consistently negative in all specifications and gain statistical
significance in three of the combinations. The weighted average of the effect of tracking is minus 14
points (for both quantiles). Hence, as regards the results of low achievers, there should be no doubt
about the negative effects of early tracking.
Secondly, note that even for high achievers, early tracking offers no tangible benefits. In most
specifications, the effect of early tracking on the upper quarter (Q75) is even negative (with a
weighted effect of minus 5 points). None of the specifications yields significant estimates (neither
positive nor negative). The same holds for the effect on Q95, which is an indication of the level of the top
achievers. Three specifications show a positive (but not significant) effect of early tracking on the
strongest performers, others still show small negative effects, and the weighted average is practically
zero (+0.1). This indicates that early tracking is not even beneficial for the strongest performers.
7. Conclusion
In this paper, we examined the effects of early tracking on achievement by combining a diff-in-diff
design with data from an array of recent waves of student assessments (PISA, TIMSS, PIRLS). Our
results showed consistently negative effects of early tracking on mean performance. Hence, the
hypothesis that early tracking offers specialization benefits seems implausible.
By separating out the effects on groups with different abilities, we showed that early tracking has
particularly strong negative effects on the group of low achievers. Apparently, grouping weak pupils
together invokes negative peer- and environmental effects with possibly detrimental consequences
for their academic achievement. We did not find a consistent effect of early tracking on the
achievement of top performers. Hence, comprehensive systems seem to be able to challenge high
performers to learn at a high pace, without having to isolate them in separate tracks, e.g. by more
flexible differentiation mechanisms such as “individualized integration” or “à la carte integration”
(Dupriez, Dumay, and Vause 2008).
Reference List
Ammermüller, A. 2005. "Educational Opportunities and the Role of Institutions." ZEW Discussion
Papers 05-44.
Bol, Thijs, and Herman Van de Werfhorst. 2013. "Educational Systems and the Trade-off Between
Labor Market Allocation and Equality of Educational Opportunity." Comparative Education
Review 57: 285-308.
Brunello, Giorgio, and Daniele Checchi. 2007. "Does school tracking affect equality of opportunity?
New international evidence." Economic Policy 22: 781-861.
Dobbelsteen, Simone, Jesse Levin, and Hessel Oosterbeek. 2002. "The causal effect of class size on
scholastic achievement: distinguishing the pure class size effect from the effect of changes in
class composition." Oxford Bulletin of Economics and Statistics 64: 17-38.
Dupriez, Vincent, Xavier Dumay, and Anne Vause. 2008. "How Do School Systems Manage Pupils'
Heterogeneity?" Comparative Education Review 52: 245-273.
Duru-Bellat, Marie, and Bruno Suchaut. 2005. "Organisation and Context, Efficiency and Equity of
Educational Systems: what PISA tells us." European Educational Research Journal 4: 181-194.
Dustmann, Christian. 2004. "Parental background, secondary school track choice, and wages." Oxford
Economic Papers 56: 209-230.
Hanushek, E. A., and L. Woessmann. 2006. "Does educational tracking affect performance and
inequality? Differences-in-differences evidence across countries." Economic Journal 116:
C63-C76.
Hanushek, Eric A., John F. Kain, Jacob M. Markman, and Steven G. Rivkin. 2003. "Does peer ability
affect student achievement?" Journal of Applied Econometrics 18: 527-544.
Hanushek, Eric A., and Javier A. Luque. 2003. "Efficiency and equity in schools around the world."
Economics of Education Review 22: 481-502.
Horn, Daniel. 2009. "Age of selection counts: a cross-country analysis of educational institutions."
Educational Research and Evaluation 15: 343-366.
Jakubowski, Maciej. 2010. "Institutional Tracking and Achievement Growth: Exploring Difference-in-Differences Approach to PIRLS, TIMSS, and PISA Data." In Quality and Inequality of Education,
ed. J. Dronkers. Springer.
Kulik, James A., and Chen L. Kulik. 1992. "Meta-analytic findings on grouping programs." Gifted Child
Quarterly 36: 73-77.
OECD. 2007. No More Failures. Paris: OECD.
OECD. 2010. PISA 2009 Results: What Makes a School Successful? Paris: OECD.
OECD. 2012. Equity and Quality in Education. Paris: OECD.
Rindermann, Heiner, and Stephen J. Ceci. 2009. "Educational Policy and Country Outcomes in
International Cognitive Competence Studies." Perspectives on Psychological Science 4: 551-568.
Schütz, Gabriela, Heinrich W. Ursprung, and Ludger Woessmann. 2008. "Education policy and
equality of opportunity." Kyklos 61: 279-308.
Van de Werfhorst, Herman, and Jonathan J. Mijs. 2010. "Achievement inequality and the institutional
structure of educational systems: A comparative perspective." Annual Review of Sociology
36: 407-428.
Woessmann, Ludger, Elke Luedemann, Gabriela Schuetz, and M. West. 2009. School accountability,
autonomy, and choice around the world. Cheltenham: Edward Elgar Publishing.
TABLES AND FIGURES
Table 1: The eight combinations of primary and secondary assessments studied by Hanushek and Woessmann (2006)

Model  Primary school assessment  Secondary school assessment  Domain   N   Estimate
A.     PIRLS 2001 (4th grade)     PISA 2003 (15-year-olds)     Reading  18  -1.1***
B.     PIRLS 2001 (4th grade)     PISA 2000 (15-year-olds)     Reading  20  -1.0***
C.     TIMSS 1995 (4th grade)     TIMSS 1995 (8th grade)       Math     26  -0.1
D.     TIMSS 1995 (4th grade)     TIMSS 1995 (8th grade)       Science  26   0.6**
E.     TIMSS 2003 (4th grade)     TIMSS 2003 (8th grade)       Math     25  -0.0
F.     TIMSS 2003 (4th grade)     TIMSS 2003 (8th grade)       Science  25  -0.0
G.     TIMSS 1995 (4th grade)     TIMSS 1999 (8th grade)       Math     18  -0.4*
H.     TIMSS 1995 (4th grade)     TIMSS 1999 (8th grade)       Science  18   0.2

* p < 0.1   ** p < 0.05   *** p < 0.01
Table 2: Our eight combinations of primary and secondary assessments

Model  Primary school assessment  Secondary school assessment  Domain   N
1.     TIMSS 2007 (4th grade)     PISA 2006 (15-year-olds)     Math     23
2.     TIMSS 2007 (4th grade)     PISA 2006 (15-year-olds)     Science  23
3.     PIRLS 2001 (4th grade)     PISA 2006 (15-year-olds)     Reading  23
4.     PIRLS 2006 (4th grade)     PISA 2006 (15-year-olds)     Reading  27
5.     PIRLS 2006 (4th grade)     PISA 2009 (15-year-olds)     Reading  30
6.     TIMSS 2011 (4th grade)     PISA 2009 (15-year-olds)     Math     35
7.     TIMSS 2011 (4th grade)     PISA 2009 (15-year-olds)     Science  35
8.     PIRLS 2011 (4th grade)     PISA 2009 (15-year-olds)     Reading  35
Table 3: Regression of mean performance on early tracking

                    Model 1    Model 2    Model 3    Model 4
Constant             17.58       2.73    -193.54    -206.98
Primary assessment    0.74***    0.71***    0.84***    0.97***
Early tracking       -0.11      -7.65     -10.55     -21.81*
Age difference       17.93      24.48*     42.30**    33.03**
R²                    0.86       0.85       0.61       0.72
N                    23         23         23         27

                    Model 5    Model 6    Model 7    Model 8
Constant            -87.65     -89.21    -159.25    -224.10
Primary assessment    0.75***    0.84***    0.89***    1.01***
Early tracking      -16.17**     0.10      -7.72     -11.99
Age difference       33.56***   25.87**    35.69***   31.49**
R²                    0.81       0.81       0.75       0.76
N                    30         35         35         35

* p < 0.1   ** p < 0.05   *** p < 0.01
Table 4: Coefficients of the effect of early tracking on the scores for different quantiles

        Model 1    Model 2    Model 3    Model 4
Q05      -3.44      -2.24      -9.08     -36.51***
Q25      -3.95      -8.67     -11.75     -26.17**
Mean     -0.11      -7.65     -10.55     -21.81*
Q75       3.63      -8.03      -9.33     -13.79
Q95       9.78      -2.59      -7.52      -9.88

        Model 5    Model 6    Model 7    Model 8
Q05     -23.83**    -5.01      -6.71**   -20.51
Q25     -21.82***   -6.51     -11.11**   -19.69
Mean    -16.17**     0.10      -7.72     -11.99
Q75     -10.55       6.02      -3.35      -4.75
Q95      -8.80      15.11       1.30      -0.21

* p < 0.1   ** p < 0.05   *** p < 0.01
Figure 1: Mathematical performance in PISA 2009 and TIMSS 4th grade 2011 (below) as a function of GDP/capita