Response Bias in England in PISA 2000 and 2003

John Micklewright
Sylke V. Schnepf

Southampton Statistical Sciences Research Institute (S3RI)
University of Southampton

Research Report RR771
The views expressed in this report are the authors’ and do not necessarily reflect those of the Department
for Education and Skills.
© The University of Southampton 2006
ISBN 1 84478 764 8
Contents

Executive summary
1  Introduction
2  Data
   2.1  Data sets used for the analysis
   2.2  Weights
   2.3  Summary
3  Bias analysis – 2003 data
   3.1  Analysis at five stages of sampling and response
   3.2  Response bias at school and pupil levels
   3.3  Using OECD weights and our own response weights
   3.4  Response bias in PISA scores
4  Bias analysis – 2000 data
   4.1  Analysis at five different stages of sampling and response
   4.2  Response bias at school and pupil levels
   4.3  Using the OECD weights and our own response weights
   4.4  Response bias in PISA scores
5  Changes over time
   5.1  Summary of results for different stages of sampling and response
   5.2  Response bias at school and pupil levels
   5.3  Associations between PISA scores and KS3 and KS4 scores
6  Conclusions and recommendations
References
Appendix 1: Summary statistics for five groups of pupils in 2003
Appendix 2: Summary statistics for five groups of pupils in 2000
Appendix 3: Summary statistics for private and public schools
Appendix 4: Summary statistics with adjusted data for 2003
Appendix 5: Bias analysis with adjusted data for 2003
Appendix 6: Are peer group effects biased downwards in PISA data?
Executive summary
The Programme for International Student Assessment (PISA) is an international
survey of the learning achievement of 15 year olds. Each participating country
(mainly OECD members) draws samples of schools and then samples of pupils from
within these schools.
Response rates in England to PISA 2003 were well below the average for OECD
countries: 64 percent for schools initially sampled and 77 percent for pupils,
compared to an OECD average of 90 percent at both school and pupil levels.
Results for the UK were not included in the international report prepared by OECD on
the first results from PISA 2003 on account of concerns that the data for England
suffered from response bias. Our analysis investigates the pattern of response for
England in 2003 in much more depth than was possible at the time of the decision
taken by the OECD to exclude the results for the UK. We also investigate response in
England to PISA in 2000, the first year of the survey, when response rates were at
similar levels to those in 2003. In contrast to 2003, results for the UK were included
in the international report for 2000 prepared by the OECD.
We consider the degree of bias in estimates of average scores, the dispersion of
scores, and the percentage of the population with scores at or above a given threshold.
The analysis exploits information on measures of learning achievement that are
available for all schools and all pupils, both those that respond and those that do not:
results from Key Stage 3 (KS3) tests and Key Stage 4 (KS4) exams (GCSEs or their
equivalent). These and other data, such as Free School Meals receipt, are taken from
the Pupil Level Annual Schools Census and the national pupil database.
We show that KS3 and KS4 scores for respondents are strongly correlated with PISA
scores. The average correlation coefficient is 0.76 in 2003.
For each year, 2003 and 2000, we first provide results for five groups:
i) the population of all 15 year old pupils in England schools
ii) all pupils in sampled schools
iii) all pupils in responding schools
iv) sampled pupils in responding schools
v) responding pupils
Comparison of these different groups requires that the data are appropriately
weighted. We focus on results using weights that only take into account the design of
the survey. However, we provide some limited comparisons for responding pupils that
use the weights developed by the OECD. These allow for the levels of response by
schools and by pupils and also for the variation of response by schools with their
average GCSE results. We also develop our own weights that go beyond those of the
OECD by taking into account the variation of the response probability of individual
pupils with their GCSE results.
The impact of pupil response is similar in the two years: in moving from group (iv) to
group (v), mean achievement scores rise and the standard deviation falls. The
combination of these effects results in under-representation of pupils with low
achievement. The impact of school response is less clear and its assessment is
complicated by the phenomenon of the replacement of schools that do not respond. In
2000 mean scores rise between group (ii) and group (iii) by a similar amount to
between groups (iv) and (v), whereas in 2003 they fall slightly.
In 2003, the percentage of pupils with 5 or more GCSEs at grades A*-C rises by 5.2
percentage points between groups (i) and (v), with the bulk of the increase coming as
a result of pupil response. In 2000, the same figure rises by 7.1 percentage points,
with the increases between groups (ii) and (iii) and between groups (iv) and (v)
accounting for 2.2 and 2.7 percentage points respectively.
We test for statistically significant differences in average scores and the dispersion of
scores between responding and non-responding schools and between responding and
non-responding pupils. In both years, responding pupils have higher average scores
and less dispersed scores than pupils who do not respond. The differences between
these two groups of students are statistically significant, meaning that they are very
unlikely to have arisen by chance. This is true both as regards the mean and the
variance of scores.
By contrast, the differences in the means between responding and non-responding
schools are typically insignificant in both years if we take the school as the unit of
analysis. If we analyse the pupils within the schools, then significant differences in
means are found in 2000 (with average scores higher in responding schools) but
results for 2003 are mixed. On the other hand, the variance of scores appears higher in
non-responding schools in 2003 but not in 2000, again taking the school as the unit of
the analysis.
We then estimate the degree of response bias in summary statistics of PISA scores
recorded for responding pupils. We focus mainly on trying to estimate bias that results
from the pattern of response by pupils rather than the pattern of response by their
schools, the latter having been established to be less important, at least in 2003. We
use three methods: a z-score transformation of KS3 or KS4 scores (which takes no
account of the actual relationship between these domestic achievement variables and
PISA scores), a regression adjustment method (based on a regression of PISA scores
on KS3 or KS4 scores of respondents), and the application of our own response
weights. Our own weights are based on a statistical model that summarises the
relationship between pupil response and domestic achievement variables. (These
weights also take account of the differences in the response rates of schools by their
average achievement levels.) The three methods give reasonably similar results.
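As an illustration, the regression-adjustment idea can be sketched as follows. This is a simplified, hypothetical version (made-up scores, a single KS4 predictor, ordinary least squares), not the procedure actually used in the report:

```python
import numpy as np

def regression_adjusted_mean(ks4_respondents, pisa_respondents, ks4_all_sampled):
    """Fit PISA scores on KS4 scores using respondents only, then
    predict a PISA score for every sampled pupil (respondent or not)
    and average the predictions."""
    slope, intercept = np.polyfit(ks4_respondents, pisa_respondents, 1)
    return (intercept + slope * np.asarray(ks4_all_sampled)).mean()

# Hypothetical data in which non-respondents have lower KS4 scores,
# so the respondent-only mean overstates achievement.
rng = np.random.default_rng(0)
ks4_resp = rng.normal(52, 10, 1000)
pisa_resp = 500 + 3.0 * (ks4_resp - 50) + rng.normal(0, 20, 1000)
ks4_nonresp = rng.normal(45, 12, 300)
ks4_all = np.concatenate([ks4_resp, ks4_nonresp])

naive = pisa_resp.mean()
adjusted = regression_adjusted_mean(ks4_resp, pisa_resp, ks4_all)
print(f"respondent-only mean: {naive:.1f}, adjusted mean: {adjusted:.1f}")
```

The gap between the two means is the estimated upward bias; the z-score and response-weight methods attack the same gap from different directions.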
Mean PISA scores in reading, maths and science are estimated to be biased upwards
by about 6 points in both years. And the standard deviations of scores are estimated to
be biased downwards by about 3 or 4 points, again in both years. These amounts are
both about twice the size of the standard errors estimated by the OECD for the means
and standard deviations. In contrast to the OECD view stated in the international
report for the 2003 PISA round – a view based on less extensive evidence – we
believe the direction and magnitude of the biases to be reliably estimated.
The OECD reports unfortunately do not make explicit the criteria for inclusion for a
country that has been asked to investigate the extent of response bias. However, the
2003 report appears to suggest that the estimated bias in key parameters should be less
than the estimated standard error. On this criterion, the England data for 2003 fail the
test. However, we estimate that the bias in mean scores would have shifted England’s
position in a ranking of countries by only one place.
The 2000 data also appear to fail a test based on a criterion of bias relative to standard
error (although the impact on England’s position in a ranking by mean scores is again
only slight). However, the response rate for pupils in England in 2000 did just clear
the minimum required threshold. This illustrates the dangers of the OECD practice of
setting arbitrary thresholds for satisfactory response rates. It also raises the question as
to the likely response biases in other countries participating in PISA with pupil
response rates that were little higher than those in England. The same question could
be asked of the 2003 data for other countries.
We judge the data for England for both 2000 and 2003 to be valuable despite the
biases. The return on the investment made in the collection of the data can be
enhanced by the provision of suitable weights to adjust for the variation of pupil
response with domestic achievement scores along the lines of those we have
experimented with in this report.
Summary of recommendations:
1. Analysis of response in future waves of PISA should take place on data sets
that are prepared and documented at an early stage.
2. Future analyses of response bias should investigate the impact of (i) school
replacement and (ii) the use of weights that take into account different
response levels within school strata.
3. Consideration should be given to whether it is practical to stratify samples of
pupils within schools by a domestic measure of achievement (KS3 or KS4).
4. DfES should consider ways of raising response among pupils and not just
among schools. It is the response biases at the pupil level that emerge most
clearly in our study.
5. Criteria for exclusion of any country from the international reports for PISA
need to be made explicit by OECD and clearly justified. Evidence for a
decision on exclusion needs to be published.
6. Response weights for responding pupils in England in both 2000 and 2003
that are based on a statistical model of pupil response of the type we present
in this report should be provided for users of the data.
7. The OECD should be engaged in discussion of whether adjustment for
response bias using post-stratification response weights could be used in the
future to avoid excluding a country from the international report.
1 Introduction
The Programme for International Student Assessment (PISA) co-ordinated by the
OECD is a cross-national survey of the learning achievement of 15 year olds.
Together with other international surveys such as TIMSS and PIRLS, PISA makes it
possible to compare children’s learning achievement across different countries.
Results for the UK were not included in the international report on the first results
from PISA 2003 (OECD 2004) on account of concerns that data for England suffered
from response bias. (Scotland and Northern Ireland are surveyed separately in PISA
but the data are combined with those for England to provide estimates for the UK in
the international reports. There is no separate survey for Wales, which participated in
the 2003 survey for England pro rata to its share of the population in the two
countries.)
The present report investigates the pattern of response for England in 2003 in order to
look more closely into possible biases than was possible when the decision to exclude
results for the UK from OECD (2004) was taken. It also considers the possible
response bias in England in 2000, which was the first round of PISA, when results
for the UK were included in the international report on the survey (OECD 2001a).
1.1  The PISA survey and response in England
PISA has a two stage sampling design, with sampling of schools and then of pupils
within schools. At the first stage, schools with 15 year olds are sampled with
probability proportional to their size. In 2003 the sample for England comprised 570
schools: 190 ‘initial schools’, all of which were approached to take part in PISA,
together with two potential replacements for each of these initial schools drawn from
the same stratum. If an initial school declined to participate in the survey, its ‘first
replacement’ school was approached. If this first replacement school did not respond,
the second replacement school was asked to participate. The final school sample
comprised all the responding schools.
At the second stage, a simple random sample of 35 pupils aged 15 was drawn from
each responding school, after excluding certain small categories of pupils not eligible
for inclusion in the survey. If a school had fewer than 35 pupils of this age, all of them
were approached to take part in the survey. There was no replacement of non-responding pupils.
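In outline, the two sampling stages can be sketched as below. This is a stripped-down illustration with invented school sizes; the actual PISA design also stratifies schools and draws two replacement schools per initial school:

```python
import random

def pps_systematic_sample(school_sizes, n_schools, seed=1):
    """First stage: sample schools with probability proportional to
    size (PPS) by the systematic method, taking a random start and a
    fixed interval along the cumulative size scale."""
    rng = random.Random(seed)
    total = sum(school_sizes)
    interval = total / n_schools
    start = rng.uniform(0, interval)
    points = [start + k * interval for k in range(n_schools)]
    chosen, cum, i = [], 0, 0
    for school_id, size in enumerate(school_sizes):
        cum += size
        while i < len(points) and points[i] <= cum:
            chosen.append(school_id)   # bigger schools span more points
            i += 1
    return chosen

def sample_pupils(pupil_ids, n=35, seed=1):
    """Second stage: simple random sample of 35 eligible pupils, or
    all of them if the school has fewer than 35."""
    pupil_ids = list(pupil_ids)
    if len(pupil_ids) <= n:
        return pupil_ids
    return random.Random(seed).sample(pupil_ids, n)

size_rng = random.Random(0)
sizes = [size_rng.randint(30, 300) for _ in range(500)]
schools = pps_systematic_sample(sizes, n_schools=190)
print(len(schools), "schools sampled")
```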
Table 1.1 shows the levels of response in England at both school and pupil levels in
2000 and 2003. In both years, response at both school and pupil levels fell well below
the average for all OECD countries in the survey.
Automatic inclusion of countries in the international report for PISA depends on the
level of response achieved:

• Minimum response rate of 85 per cent of the schools initially selected or, where
the initial response rate of schools was between 65 and 85 per cent, an 'acceptable'
school response rate after replacement, where this level varies depending on the
initial response rate achieved. (The level rises by one percentage point for every
half a percentage point the initial response rate falls below the 85 per cent
threshold.) An initial school response rate smaller than 65 per cent is 'not
acceptable'.

• Minimum response rate of 80 per cent of students within participating schools.
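Read literally, the school-level criterion amounts to the following small function (our own rendering of the rule as stated above, not official OECD code):

```python
def school_response_acceptable(initial_rate, after_replacement_rate):
    """School response criterion as stated above (rates in percent).

    Below 65% initial response: not acceptable. At 85% or above:
    acceptable outright. In between, the required after-replacement
    rate rises by one percentage point for every half point the
    initial rate falls short of 85%.
    """
    if initial_rate >= 85:
        return True
    if initial_rate < 65:
        return False
    required = 85 + 2 * (85 - initial_rate)
    return after_replacement_rate >= required

# England 2003: a 64% initial response falls below the 65% floor.
print(school_response_acceptable(64, 77))  # False
```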
If a country does not meet these requirements it is requested to give evidence that its
sample of pupils is representative. This request was made of England in both 2000
and 2003.
In 2000, England did not meet the 85 percent response rate requirement at the school
level but met (just) the 80 percent threshold for student response. After evidence was
provided on the characteristics of responding schools, the UK was included in the
2000 international report.
In 2003, England met neither the school nor the student response requirement. The
evidence from the analysis of response bias that was provided by DfES to OECD in
early 2004 (an analysis undertaken by us) was judged by the latter to indicate
potential problems:
“The results of a bias analysis provided no evidence for any significant bias of
school-level performance but did suggest that there was potential non-response
bias at student levels. The PISA Consortium concluded that it was not possible
to reliably assess the magnitude, or even the direction, of this non-response
bias and to correct for this….The uncertainties surrounding the sample and its
bias are such that scores for the United Kingdom cannot be reliably compared
with those of other countries. They can also not be compared with the
performance scores for the United Kingdom from PISA 2000” (OECD 2004:
328).
The result was that the UK was excluded from the 2003 international report, although
the microdata for England, as well as for Scotland and Northern Ireland, are made
available on the OECD website.
Table 1.1: Response rates at school and student levels in 2000 and 2003 (%)

                                       PISA 2000               PISA 2003
                                  England  OECD average   England  OECD average
School response
  before replacement                 59         86            64        90
School response
  after replacement                  82         92            77        95
Student response                     81         90            77        90

Source: response rates are averaged across all OECD countries participating in PISA in 2000
and 2003. Response rates for OECD countries are given in OECD (2001a) and OECD (2004);
response rates for England are given in Gill et al (2000) and DfES (2005).
1.2  Aims of the study
Our aims are to analyse the patterns of response to PISA in England in 2000 and
2003, both among schools and among pupils, and to estimate the extent of any
resulting response biases. One key aim is to show that the magnitude and direction of
biases in the 2003 data are possible to estimate reliably, contrary to the statement in
OECD (2004) cited above.
The focus is on how response varies with measures of learning achievement that are
available for all schools and all pupils, both those that respond and those that do not.
England is unusual for having national tests and public exam records: scores from
Key Stage 3 (KS3) tests and Key Stage 4 (KS4) exams (GCSEs or their equivalent).
The KS3 and KS4 scores of respondents are strongly correlated with their PISA
scores. We find correlation coefficients in the range 0.7 to 0.8. These measures
therefore provide a very useful source of information with which to compare
respondents and non-respondents and to construct weights that adjust for differential
response by achievement levels.
The possibility of response bias in PISA data (or in data from any other such survey)
is often discussed without specifying the population parameter for which there is
concern that there might be a biased estimate. But ‘bias’ only has a meaning if we
state what it is that might have bias. Implicitly it seems that many people are
concerned with possible bias in the estimate of population mean achievement.
Average achievement is clearly of interest and is the most common basis for ranking
countries in surveys such as PISA, at least in popular debate. But we should also be
interested in the degree of inequality in achievement within countries. We therefore
consider both bias in estimates of the mean and bias in estimates of dispersion, the
latter measured by the standard deviation of achievement scores. One might also be
interested in bias in the estimate of the percentage of pupils at or above a given level
of achievement, e.g. with 5 or more GCSEs (or equivalents) at grades A*-C, and we
include some results on this basis.
We consider the impact of non-response on summary statistics of ‘domestic’ measures
of achievement (KS3 and KS4). We also estimate the extent of response bias in
summary statistics of PISA scores that can be calculated from the final sample of
responding pupils.
Besides considering possible response biases, we were also asked to investigate how
the characteristics of all sampled schools and all sampled pupils, whether or not they
responded, differed from the populations from which they were drawn. These
comparisons show the impact of sampling error on the estimates of population
parameters, not of response bias. While these results are of interest, it should be noted
that sampling error is a feature of any sample survey and that there are well established
methods for taking its impact into account, notably through the calculation of
confidence intervals.
The analysis of response bias in England in 2003 that DfES reported to OECD and
which led to the UK data being excluded from the international report was undertaken
by us in early 2004. This work was done over a brief period of time and was not
published. The analysis we have undertaken in 2005-6 and presented here goes into
much greater depth. One major advance over our earlier work is that we compare
the domestic achievement scores of respondents with their PISA scores. Our work in
2004 took place before PISA scores for respondents were available.
Our terms of reference for the current report have also included analysis of response
in 2000 and a comparison with 2003. In the case of the 2000 data we have been able
to exploit information on domestic achievement scores of sampled pupils that was not
available to the team undertaking the bias analysis at the time of the 2000 survey
(which had access only to summary scores for schools). We thus are able to
substantially extend that earlier analysis of the 2000 data.
1.3  Limitations of the study
The work we have carried out has several limitations. As a result, this report is neither
the last word in the analysis of the pattern of response to PISA in 2000 or 2003 nor is
it comprehensive in its analysis of actual or potential adjustments to the data for
response bias.
First, we remained uncertain about aspects of the national databases with which we
were provided, of the weights supplied with the PISA files, and of a number of other
aspects of the data. (These issues are documented in Chapter 2.) These uncertainties
mean that we have more confidence in our comparisons of sampled respondents with
non-respondents (whether schools or pupils) than the comparisons between either of
these groups and the populations from which the samples were drawn.
Second, we have not considered how biases introduced by the pattern of response at
the school level are affected by the strategy of replacement of non-responding schools
by other schools. As noted by Sturgis et al (2006), this is an open issue: the
assumption that replacement schools are a perfect match for the non-responding
schools they replace (because they come from the same achievement stratum) remains
untested.
Third, we have made only limited analysis of whether weights calculated by OECD
help resolve problems of response bias at school or pupil level (a subject not included
in our terms of reference). Schools were stratified by GCSE results in the school
sampling frame. In calculating weights to apply to the data, OECD took into account
the average school response rate within each stratum, i.e. within each of several bands
of summary school GCSE scores. (Details are given in Chapter 2.) Suppose that low
achievement schools (in GCSE terms) have response rates that on average are lower
than those of high achievement schools. In the absence of any adjustment to the data,
this would introduce an upward bias into an estimate of mean achievement based on
the responding schools only. However, application of the OECD school weights
should in principle compensate for this. The low achieving schools will receive a
larger weight, thus bringing down the estimate of overall mean achievement.
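In a simplified form, the school-level adjustment amounts to a single factor per stratum, the reciprocal of that stratum's school response rate. The sketch below uses invented schools and field names, not the OECD's actual weighting code:

```python
from collections import defaultdict

def stratum_nonresponse_factors(schools):
    """One adjustment factor per GCSE stratum: the reciprocal of the
    stratum's school response rate. Responding schools in low-response
    strata are scaled up, pulling the weighted mean back down."""
    sampled, responded = defaultdict(int), defaultdict(int)
    for s in schools:
        sampled[s["stratum"]] += 1
        responded[s["stratum"]] += s["responded"]
    return {st: sampled[st] / responded[st]
            for st in sampled if responded[st] > 0}

# Invented example: low-achieving schools respond at half the rate.
schools = [
    {"stratum": "low_gcse", "responded": 1},
    {"stratum": "low_gcse", "responded": 0},
    {"stratum": "high_gcse", "responded": 1},
    {"stratum": "high_gcse", "responded": 1},
]
print(stratum_nonresponse_factors(schools))  # {'low_gcse': 2.0, 'high_gcse': 1.0}
```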
At the pupil level, the weights calculated by OECD take into account the response rate
of pupils in each school (details in Chapter 2). In other words, pupils in schools where
a below average number of pupils respond receive a higher weight than pupils in other
schools. However, in contrast to the school level there is no stratification of sampling
at the pupil level by domestic achievement scores within schools. Hence the
adjustment for non-response within each school does not vary according to the
achievement levels of the responding pupils. Every pupil within the school receives
the same weight.
Most of our results are based on use of simple design weights only, which account for
the design of the survey (these weights are described in Chapter 2). However, we have
included some limited comparisons of results using these design weights with results
using the OECD weights (Chapters 3 and 4).
We also experiment with an approach to adjustment for non-response at the pupil
level that goes beyond that of the OECD. We estimate logistic regression models of
the probability that a sampled student responds to the survey as a function of
individual and school characteristics, including domestic achievement scores (Key
Stage 3 and Key Stage 4 scores). A new adjustment factor for the weights of
respondents is then computed from the reciprocal of the predicted probabilities of
response. We think that these weights perform well and their application provides an
additional method of estimating the extent of response bias in PISA scores. Their use
also allows response bias to be taken into account in other analyses of the data.
However, the experimental nature of these weights represents another limitation of
our study.
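A minimal version of this propensity-weighting step might look as follows. It is illustrative only: we simulate response with a single hypothetical KS4 score as predictor, whereas the models in this report also include school characteristics:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
ks4 = rng.normal(50, 10, 2000)                    # hypothetical KS4 point scores
p_true = 1 / (1 + np.exp(-(-1.5 + 0.05 * ks4)))   # higher achievers respond more
responded = rng.random(2000) < p_true

# Model response as a function of achievement, then weight each
# respondent by the reciprocal of the predicted response probability.
model = LogisticRegression().fit(ks4.reshape(-1, 1), responded)
p_hat = model.predict_proba(ks4.reshape(-1, 1))[:, 1]
weights = 1 / p_hat[responded]

unweighted = ks4[responded].mean()
reweighted = np.average(ks4[responded], weights=weights)
print(f"respondents' mean KS4: {unweighted:.2f}, reweighted: {reweighted:.2f}")
```

Because low achievers respond less often, their reciprocal weights are larger, and the reweighted mean falls back towards the mean for all sampled pupils.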
A final limitation is that we have not tried to comment in detail on the OECD’s
criteria for inclusion or exclusion of countries from the published PISA reports. We
do have something to say on the matter at the end of Chapter 3, where the reason for
the decision to exclude the UK data from the 2003 report is deduced. (This seems to
be based on a consideration of the components of ‘total survey error’.) We comment
again briefly on the subject in Chapter 6 where we note that criteria for exclusion need
to be spelt out clearly in future OECD reports.
1.4  Outline of the report
Our terms of reference asked us to compare five different groups of pupils.
i) the population of all 15 year old pupils in England schools
ii) all pupils in sampled schools
iii) all pupils in responding schools
iv) all sampled pupils in responding schools
v) responding pupils (the final sample of pupils in the OECD database)
Chapter 2 describes the data we use to do this, including the available weights.
Chapters 3 and 4 provide results for KS3 and KS4 scores for the five groups listed
above for PISA 2003 and PISA 2000 data respectively. Comparison of groups (ii) and
(iii) and of groups (iv) and (v) provides implicitly a comparison of respondents and
non-respondents. We also explicitly compare results for the samples of responding
and non-responding schools and for the samples of responding and non-responding
pupils. These two types of comparison are represented by the vertical and horizontal
lines in Figure 1.1. Chapters 3 and 4 each conclude with our estimates of the extent of
response bias in summary measures of PISA scores in the year concerned. We also
comment on the decision taken to include or exclude the data for the UK in the
international reports.
Chapter 5 compares results for the two years and investigates whether there was a
change in the relationship between PISA scores and domestic English achievement
scores between 2000 and 2003.
Chapter 6 concludes and gives our recommendations. These relate to future analyses
of response bias, the use of the existing data from 2000 and 2003, and the design of
the survey in England in the future.
Figure 1.1: PISA survey design and groups of pupils analysed

[Flow diagram: all 15 year old pupils in England schools (i) are subject to sampling
variation, giving all pupils in sampled PISA schools (ii); school response divides
these into all pupils in responding schools (iii) and all pupils in non-responding
schools; sampling variation within responding schools gives the sampled pupils (iv);
pupil response divides these into responding pupils (v) and non-responding pupils.]
2  Data
We first describe the various data sets we use in the report and then discuss the
weights that can be applied to the data.
2.1  Data sets used for the analysis
2.1.1 Different data sets received
Data on pupils in all English schools derive from two sources of national pupil data.
The first is the Pupil Level Annual School Census (PLASC), a census of pupils in
English schools conducted in January of each year since 2001. PLASC covers
information such as pupils’ dates of birth, ethnicity, free school meal (FSM)
eligibility, entry to GCSEs, statementing for special educational needs (SEN) etc, as
well as identifying their school. However, the PLASC data do not contain any
information on results of achievement tests and exams – KS3 and KS4. Information
on test and exam results are contained in the national pupil database, the second
source of national pupil data that we use. The main data base used in our analysis,
which we refer to as the ‘national database’, is a product of merging PLASC data with
achievement data from the national pupil database.
Pupils eligible for PISA 2000 were all children born in calendar year 1984. Pupils
eligible for PISA 2003 were all children born in calendar year 1987. However, in the
case of 2003, the PISA testing window in England was extended by two months. In
line with this, the national database with which we have worked for that year
constitutes all pupils born in a 14 month window, January 1987 to February 1988. In
this national database, 14 percent of pupils have a 1988 birthdate. However, just under
3 percent of pupils in the PISA sample were born in 1988.
The population level data and PISA data we have used were supplied to us on behalf
of DfES by the Fischer Family Trust and the Office of National Statistics (ONS).
Table 2.1: Data used in the analysis

Data set                                Level of analysis    2000    2003
All 15 year olds in England schools     Pupil                  X       X
All sampled schools                     School                 X       X
All sampled students                    Pupil                  X       X
School average of pupils eligible
  for free school meals                 School                 X
We have a number of data sets for the two years.
First, the national database for each year (supplied by the Fischer Family Trust)
contains information on all 15 year old students (less exclusions discussed below).
Since PLASC was first collected in 2001, data on FSM eligibility and on pupils with
SEN are not in the 2000 file. Only pupils with at least one GCSE (or equivalent) entry
(not necessarily a pass) are in the file. This means that we do not have a national pupil
data set that is the same for the two years we want to analyse, 2000 and 2003. The
national database includes unique student and school identifiers allowing the file to be
merged with the other two data files (or three in 2000).
Second, the school file (supplied by ONS) contains information on all schools that
were sampled for PISA. The file includes the same school identifier as in the national
pupil database. It gives information on whether the school was an initial school in the
PISA sampling procedure, a first replacement school, or a second replacement,
whether the school was actually approached to participate, and, if approached,
whether the school responded or not. The file also contains a variable indicating the
level of within-school pupil response, since schools with pupil response rates of below
25% are treated as non-responding schools in PISA.
Third, the PISA student file (supplied by ONS) contains information on all sampled
pupils in schools that responded to PISA. The file contains a unique student identifier,
so that it can be merged with the national database in order to add information on
pupils’ domestic achievement scores etc. The file also includes a variable indicating
whether the student responded to PISA.
Since we compare domestic achievement scores with the PISA scores of the
responding pupils, we needed in addition to be able to merge the file of sampled PISA
students supplied by ONS with the OECD data set for responding pupils available on
the OECD website. The 2000 data for all sampled students contained the appropriate
unique PISA pupil identifier, so that it was possible to merge these data with the
OECD data. In 2003, the file of sampled students provided by ONS did not contain
this identifier. Instead, an additional file on responding pupils was provided that
included their PISA scores and that contained the unique ONS pupil identifier.
As noted earlier, the 2000 national database lacked information on FSM, GCSE
entries and statementing of pupils. However, we were supplied with an additional
school file for 2000 giving the percentage of all pupils in the school eligible for free
school meals (FSM). This meant that while we still lacked information for 2000 for
individual pupils on FSM receipt, we at least had available information at the school
level. (Note that this information is based on all pupils in the schools concerned and
not just the 15 year olds, while our school level analysis involving FSM for 2003 is
based on the percentage of 15 year olds in the school in receipt of FSM.)
The process of obtaining, checking, cleaning and merging the data files took about
three months, which was much longer than planned. In total we were sent over 20
different versions of one or other of the data files. Files were often inadequately
documented, with insufficient information on the rows and/or the columns of data, i.e.
information on which schools or pupils the file contained (rows), and on the
definitions of the variables in the data (the columns). Critical variables (e.g. on
response) were sometimes missing, or a file turned out to be composed only of a
certain type of student. We conclude that the problems of using administrative
databases (PLASC and the national pupil database) as a research resource were
underestimated by both ourselves and DfES. Staff turnover at ONS meant that people
who had been responsible for the 2000 and 2003 PISA sampling had left the
organisation with obvious consequences for knowledge of the relevant files for our
analysis.
2.1.2 Target and survey populations
The ideal national data set on 15 year olds in England schools for our analysis would
include all those pupils and schools and only those pupils and schools in the PISA
‘survey population’, i.e. the target population less permitted exclusions. PISA allows
the exclusion of certain schools and pupils so that the survey population is smaller
than the target population.
Such an ideal is very difficult to achieve in practice. First, there is the issue of whether
the national database includes all those pupils and schools in the target population. In
the case of the 2003 national database, this seems to be the case. However, the 2000
national database excluded the small minority of students who had not been entered
for any GCSEs (or their equivalent).
Second, there is the issue of whether the national database includes only those in the
survey population. The identification of pupils who met PISA exclusion criteria is not
straightforward. Exclusions of pupils in PISA took place after sampling of pupils.
That is, sampled pupils who met the criteria for exclusion were discarded. Our
problem was to try to exclude pupils from the national database who, if sampled,
would have been excluded from PISA.
Exclusion criteria
Countries were advised to keep the overall exclusion rate below 5% of their pupil
population (OECD 2005a: 47). Pupils or entire schools could be excluded if the
following criteria were met1:
• Exclusion of schools from the target population

Schools could be excluded if they were in a particularly remote area, if they had very
few eligible students or if it was not feasible to conduct the assessment for whatever
reason. In addition, an entire school could be excluded if it catered exclusively for
students with special educational needs or for students who were non-native speakers
of the language of assessment.
• Exclusion of pupils from the target population
Pupils within a sampled school could be excluded if they had limited proficiency in
English, if they had special educational needs or learning difficulties or if they were
permanently functionally disabled such that they could not perform in the PISA test.
1 We refer here to the criteria described in the 2003 report (OECD 2004: 320-322).
Exclusion criteria described in the 2000 report (OECD 2001a: 232) seem to be similar
but in some cases are formulated somewhat differently.
However, students could not be excluded due to their poor academic performance
alone, discipline problems or temporary physical disability.
Table 2.2: Reported exclusions from PISA for the UK (2003) and England (2000)

                                        2003 (UK)             2000 (England)
                                     Obs.     As % of      Obs.     As % of
                                              population            population
Total population of 15
year olds                          736,785       100      580,846      100
School exclusions                   24,773      3.36       16,121     2.78
Weighted pupil exclusions           15,062      2.04       15,309     2.64
School and pupil exclusions         39,835      5.41       31,430     5.41
Final survey population            696,950     94.59      549,416    94.59

Source: OECD (2001b: 136), OECD (2005a: 168). Figures for 2003 refer to the UK. Figures for 2000
refer to England only. Note: The calculation of percentages in the reports differs slightly. In the 2003
report the percentage of weighted pupil exclusions is estimated on the basis of the total
population minus school exclusions. In 2000, percentages are estimated based on weighted numbers of
included and excluded pupils. The population total refers to the number of 15 year olds enrolled in
schools in grades 7 or above, which is referred to as the eligible population. The student exclusion took
place after PISA sampling. The weighted number of excluded students (15,062) gives the overall
number of students in the nationally defined target population represented by the number of students
excluded (270) from the sample.
The school-level exclusions in the UK in 2003 accounted for 3.36 percent and the
pupil exclusion for 2.04 percent of the target population.2 The overall exclusion rate
of pupils was 5.4 percent in both 2003 and 2000.
Adaptation of the national database to the PISA survey population
Our aim was to mirror as far as possible these exclusions in our national database
files. The school exclusion was done by omitting all schools from the analysis that
were marked as special schools. In the case of the 2000 data, this was done by the
Fischer Family Trust before we received the data.3 In the case of the 2003 data the
exclusion was done by us. Proxying the pupil level exclusions was harder to effect.
The decision was made with DfES counterparts that we would exclude all statemented
pupils. However, this could only be done (and was done by us with the data we
received) for 2003, since the 2000 national database lacked SEN status of pupils.
Another possibility would have been to exclude also pupils who have no GCSE
entries. Indeed as noted the 2000 national database already excluded such pupils.
However, these pupils were retained for the 2003 analyses, except when we wished to
put the 2003 data on the same basis as those for 2000 (see below).
2 Exclusion of pupils took place after pupils were sampled (hence at the second stage
of sampling). Hence, the exclusion rate for the population derives from weighting
these pupils in responding schools up in terms of how many pupils they represent in
the population.
3 Other than 101 pupils in schools remaining in the database that were identified as
special schools, which we excluded.
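The exclusion rules described above can be mirrored in code. The sketch below is illustrative only: the data frame, the column names (special_school, statemented, gcse_entries) and the values are hypothetical and are not those of the actual national database files.

```python
import pandas as pd

# Hypothetical national-database extract; column names and values are
# invented for illustration, not taken from the Fischer Family Trust file.
pupils = pd.DataFrame({
    "pupil_id":       [1, 2, 3, 4, 5],
    "school_id":      [10, 10, 11, 12, 12],
    "special_school": [False, False, True, False, False],
    "statemented":    [False, True, False, False, False],
    "gcse_entries":   [8, 5, 0, 0, 9],
})

# 2003-style survey population: drop pupils in special schools,
# then drop statemented pupils.
pop_2003 = pupils[~pupils["special_school"] & ~pupils["statemented"]]

# 2000-comparable data set (used for the Chapter 5 trend analysis):
# drop special schools and pupils with no GCSE entries, but keep
# statemented pupils, mirroring the constraints of the 2000 data.
pop_trend = pupils[~pupils["special_school"] & (pupils["gcse_entries"] > 0)]
```

The two filters deliberately overlap only on the special-schools criterion, which is why the 2003 and trend-analysis populations differ.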
Table 2.3 shows the effect of our 2003 exclusions of schools and pupils, which led to
the dropping of 2.41 percent (special schools criterion) and 2.31 percent (statemented
criterion) of pupils respectively, leading to an overall exclusion rate of 4.73 percent.
This rate is almost as high as the actual rate of 5.40 percent in 2003 described above.
The last row of the table shows how many pupils would have been excluded in 2003
if, in addition, pupils with no GCSE entries had been excluded.
Table 2.3: Exclusions of pupils from the national database4

                                     2003              2003 (Ch. 5          2000
                                                       trend analysis)
                                  Obs     % of       Obs     % of        Obs     % of
                                          popn               popn                popn
All 15 year olds in
educational institutions        732,550            732,550             556,645
School exclusion
(special schools)                17,664   2.41      17,664   2.41          101   0.02
Pupil exclusion
(statemented)                    16,951   2.31         -       -       Not possible to
                                                                       identify and exclude
Pupil exclusion
(no GCSE entry)                     -       -       17,559   2.40      Not included in
                                                                       the data set
School and pupil
exclusion                        34,615   4.73      35,223   4.81          -       -
Estimated final
survey population               697,935  95.27     697,327  95.19      556,544
School exclusion, pupil
exclusion (statemented,
no GCSEs)                        50,206   6.85         -       -           -       -
To summarise, the 2003 and 2000 national databases have important differences. The
2000 data set on the one hand excludes pupils with no GCSE entries and on the other
includes statemented pupils (who cannot be identified). These differences would make
a direct comparison of 2003 with 2000 inconsistent. We therefore constructed another
2003 data set that was adjusted to the constraints of the 2000 data. For this data set,
we excluded pupils with no GCSE entries from the 2003 data but included
statemented pupils. (All pupils in special schools remained excluded.) This dataset
was used only when we directly compare 2000 and 2003 results in Chapter 5.
4 The data refer to the national data set as we received it, not to the cleaned national
data set on which calculations are based. However, only slight cleaning was
undertaken, so results should change little. (The data merging important for cleaning
would probably artificially increase the population size (due to observations that
could not be matched), which would lead to an underestimation of exclusion rates.)
Finally, we have to recognise that 91 statemented pupils were included in the 2003
PISA sample. We retain these pupils in our analyses of 2003 data.5
2.1.3 Exclusion of sampled pupils from the analysis
Responding schools with a within-school pupil response rate of less than 25 percent
were excluded from the PISA sample. Consequently, responding pupils in these
schools were not included in our analyses of non-response at the pupil level.
However, these pupils were included in our analyses up to that stage in the survey
process (sampling and response of schools and sampling of students). In total, 105
pupils in 2003 were excluded from the pupil response analysis, which is about 2
percent of all sampled pupils. The 2000 pupil file supplied by ONS did not contain
any pupils in schools with less than a 25 % within-school response rate. Hence we do
not know how many pupils were excluded in that year.
2.1.4 Merging and cleaning of data sets
The different data sets at the pupil and school level described in Section 2.1.1 were
merged for the analysis. This merging served also to help check the data in so far as
overlapping information was present in the different data sets. (For example a pupil
who sat the PISA test and is included in the OECD file should also be marked as a
‘participating student’ in the ONS file.)
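A minimal sketch of this kind of cross-file consistency check is given below. The identifiers and column names are hypothetical; the real ONS and OECD files use different variable names.

```python
import pandas as pd

# Hypothetical ONS file of sampled pupils and OECD file of respondents.
ons_pupils = pd.DataFrame({
    "pupil_id":      [1, 2, 3, 4],
    "participating": [True, True, False, True],
})
oecd_pupils = pd.DataFrame({
    "pupil_id":   [1, 2, 4],
    "pisa_score": [512.3, 478.9, 530.1],   # invented scores
})

# Left merge keeps every sampled pupil; indicator marks the merge outcome.
merged = ons_pupils.merge(oecd_pupils, on="pupil_id",
                          how="left", indicator=True)

# A pupil with a PISA score in the OECD file should also be marked as
# participating in the ONS file; any mismatch is flagged for cleaning.
conflicts = merged[(merged["_merge"] == "both") & ~merged["participating"]]
```

Here `_merge == "left_only"` identifies sampled pupils with no OECD record (non-respondents), while a non-empty `conflicts` frame would signal the kind of inconsistent response-status information described in the text.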
Over a process of about three months we obtained and checked files that included
conflicting information and therefore needed replacement. At the end of the process,
the 2000 data sets were free of any conflicting information. The 2003 data files
however still contained some problems, so some moderate cleaning of the data was
conducted.
2000
In 2000, 543 schools were sampled (including replacement schools) of which 155
schools both responded themselves and then had a pupil response rate of greater than or
equal to 25 percent (and were therefore defined to be eligible schools). 5,406 pupils in
these schools were sampled and 4,120 pupils responded.
In a first step, we merged the original OECD PISA data file (downloadable from the
OECD website and containing pupils’ achievement scores, OECD weights etc.) with
the ONS data file on all sampled PISA pupils. We could successfully attribute PISA
scores and weights from the OECD file for England to all 4,120 responding pupils in
the ONS pupil file.
5 There are two possible explanations why some statemented pupils who were not part of the PISA
target population took part in the PISA assessment. Schools sent a list of all their 15 year old students
to ONS, where the random sample of pupils was selected and sent back to the schools. ONS advised
schools not to include those sampled pupils who were statemented. Some schools might not have
followed this advice. Another explanation could be that pupils were statemented after they sat the
PISA test.
In a second step, we merged the resulting pupil file from the first step with the ONS
school file giving information on all sampled PISA schools. All 155 responding
schools of the pupil file could be merged successfully with the ONS school file.
Overlapping information provided in the two files was coherent.
The resulting file from step two was then merged with the national data set
(containing national achievement scores for all 15 year old pupils in England)
provided by the Fischer Family Trust. As a result, 340 pupils sampled for PISA could
not be merged because their student identifier was not included in the national
database. In practical terms this means that it was impossible to obtain any information on
domestic achievement scores (KS3 and KS4) for this group of PISA pupils. As a
consequence, these pupils had to be excluded from the subsequent analysis.
Another problem was found in relation to the school identifiers. In the national data
set, the school identifiers refer to the school where the pupil took the KS4 public
exams (GCSEs and GNVQs). This happened a few months after the PISA test was
conducted. Some pupils might have changed schools before taking their KS4 exams.
First, pupils could have moved to a PISA school after the PISA test was conducted.
Those pupils cannot be identified, although their numbers should be low. Second,
pupils sampled for PISA might have moved to another school for the KS4 exams. As
far as we could identify students with conflicting information on school identifiers we
corrected their school identifier in the national file.
539 out of 543 sampled PISA schools could be identified and therefore successfully
merged with the national pupil data set. The final school file for the school analysis
included therefore 539 sampled PISA schools and 3,531 other schools.
The proportion of all pupils with Free School Meal eligibility in schools (provided by
an additional file from DfES in 2000) could be successfully attributed to all
responding and non-responding PISA schools.
2003
In 2003, 570 PISA schools were sampled (including replacement schools). 159
schools (with 5,123 sampled pupils) responded and had pupil response greater than or
equal to 25 percent. In these schools, 3,766 pupils responded and were therefore
included in the final OECD PISA pupil file.
In a first step, the ONS file containing sampled pupils was merged with the OECD
file on PISA pupils for England (downloadable from the OECD website). 14 out of
3,766 responding pupils could not be merged on the basis of the student identifiers
provided. For a small number of pupils, the data files provided conflicting information
regarding pupils’ response status and we changed some values as we saw fit.
In a second step, the pupil data from the first step were merged with the ONS data file
on sampled PISA schools. One school identifier was wrongly coded in the file of
sampled pupils. After cleaning, all 175 schools of the pupil file could be matched with
the file for sampled PISA schools.
In a third step, we merged the file on sampled PISA pupils and schools with the
national file by using the supplied student identifiers. As a result, 71 out of 3,766
responding pupils could not be matched with the national file and therefore had to be
excluded from the analysis. The merged data set also revealed that 50 responding
PISA pupils were either statemented or in special schools and should therefore in
principle not have been part of the survey. However, given that these pupils had been
included in the PISA sampling we also included them in the analysis.
The national file was missing information for 6 out of 570 schools given in the ONS
school sample file. The final school file for the school analysis included therefore 564
sampled PISA schools and 3,637 other schools.
Final sample sizes and missing values for the entire data set are given in Table A1.1
of Appendix 1 for 2003 and Table A2.1 of Appendix 2 for 2000.
The purpose of the merging of data from PISA files and from national pupil files is to
be able to compare characteristics of respondents and non-respondents, whether
schools or pupils. Appendix 6 gives an example of a further use of the merged data,
which is to investigate whether peer group effects estimated with PISA data are biased
downwards.
2.1.5 Information on students
Table 2.4 lists key information we have on students from the national databases.
There is a large array of measures of achievement provided by KS3 and KS4 (GCSE
or equivalent) scores. In subsequent chapters we focus in particular on ‘Ks3_aps’,
‘Ks4_pts’, and ‘Ks4_mean’. These are respectively (i) the average points score across
the three KS3 subjects – English, Maths and Science, (ii) the total points scored in
GCSEs (or equivalent), and (iii) the average points per subject scored in those GCSEs
(or equivalent) that were taken by the student. Other measures listed in the table
include the popular summary measure ‘Ks4_5ac’, which is a 0/1 variable indicating
whether the student has 5 or more GCSE passes at grades A* to C. (Students in
general will have taken their GCSEs after the period when the PISA assessment was
carried out – the database includes all GCSE results irrespective of when taken.)
Table 2.4: Information on students from the national databases

Variable    Definition
Male        = 1 if male (0 = female)
FSM         = 1 if pupil is eligible for free school meals (0 otherwise)
ELevel5     = 1 if pupil achieved English KS3 level 5 or above (simple level data)
MLevel5     = 1 if pupil achieved Maths KS3 level 5 or above (simple level data)
SLevel5     = 1 if pupil achieved Science KS3 level 5 or above (simple level data)
ELevel5fg   = 1 if pupil achieved English KS3 level 5 or above (fine grade data)
MLevel5fg   = 1 if pupil achieved Maths KS3 level 5 or above (fine grade data)
SLevel5fg   = 1 if pupil achieved Science KS3 level 5 or above (fine grade data)
Ks4_5ac     = 1 if pupil has 5 or more GCSE/GNVQ A*-C passes
Ks3Efg      English Key Stage 3 level (fine grade data)
Ks3Mfg      Maths Key Stage 3 level (fine grade data)
Ks3Sfg      Science Key Stage 3 level (fine grade data)
Ks3_aps     KS3 average point score over English, Maths and Science
Ks4_pts     Total GCSE/GNVQ point score
Ks4_ptc     Capped GCSE/GNVQ point score (best 8 subjects)
Ks4_mean    Mean GCSE/GNVQ point score across all subjects where GCSE/GNVQ taken
Ks4_ac      Number of GCSE/GNVQ A*-C passes
2.2
Weights
PISA samples schools with probability proportional to their size and then samples 35
pupils aged 15 with equal probability within each school. All 15 year old pupils in the
school are selected if a school has less than 35 pupils of this age.
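The two-stage design can be sketched as follows. The school names and sizes are invented, and `random.choices` draws with replacement, a simplification of PISA's systematic PPS draw, which samples schools without replacement.

```python
import random

# Sketch of PISA's two-stage design: schools drawn with probability
# proportional to size (PPS), then up to 35 pupils per school.
random.seed(1)
school_sizes = {"A": 200, "B": 150, "C": 40, "D": 20}  # 15 year olds per school

n_schools = 2
# random.choices weights the draw by school size (PPS with replacement --
# a simplification of PISA's systematic without-replacement PPS draw).
sampled = random.choices(list(school_sizes),
                         weights=school_sizes.values(), k=n_schools)

# Within each sampled school, 35 pupils are drawn with equal probability;
# all 15 year olds are taken if the school has fewer than 35.
pupils_per_school = {s: min(35, school_sizes[s]) for s in sampled}
```

Larger schools such as A are more likely to be drawn, but each of their pupils is less likely to be among the 35 selected, which is the self-weighting property discussed next.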
In principle, such a sample design is self-weighting. The higher selection probability
of schools with many pupils is balanced by the lower probability of any individual
pupil within a larger school being among the 35 selected for that school. The lower
selection probability of small schools is balanced by the higher probability of any
individual pupil within these schools being among the 35 selected.
However, in practice the self-weighting is not perfect. School size may turn out to
differ from that indicated in the school sampling frame when the sampling
probabilities were determined. And there is the issue of the very small schools with
less than 35 pupils. Hence in practice weights are needed at both school and pupil
levels to take account of the survey’s design. These are ‘design weights’.
In addition, weights may be constructed to allow for the levels of response by schools
and by pupils, and also for the pattern of this response.
Our terms of reference asked us to compare five different groups of pupils at different
stages of the sample design.
i)   the population of all 15 year old pupils in England schools
ii)  all pupils in sampled schools
iii) all pupils in the sample of responding schools
iv)  all sampled pupils in responding schools
v)   responding pupils (the final sample of pupils in the OECD database)
The issue of the appropriate weights to apply at each stage (ii)-(v) arises.
The data sets we received from ONS contained various weights at the school and
individual level. However, the files lacked sufficient documentation on what these
weights are intended to be and how they were constructed. For example, we were
unclear whether the school weights are intended to be pure design weights or whether
they also take into account response levels within the various strata, most notably
average levels of achievement within schools.
We therefore decided to investigate the weights supplied with the data and to compare
them with suitable benchmarks.
There are two sets of weights that we can use as benchmarks. First, the final OECD
samples of schools and pupils in England in 2000 and 2003 (i.e. the data available
from the OECD website) contain both school and pupil weights, the construction of
which is described in the international reports for PISA published by the OECD.
Second, given that we have population data available, we can construct what we call
‘theoretical’ design weights in the light of the PISA sample design.
2.2.1 Theoretical design weights in the PISA two stage design
2.2.1.1 School weights
Let N be the number of schools in the population and n the number of sampled
schools. Let Mi be the total number of 15 year olds in school i and mi the number of
sampled pupils in the school (which should be 35, unless the school has fewer than
this number of 15 year olds).
The design weight for any school i should be the reciprocal of the school’s selection
probability, pi. The latter is given by:

(1)    pi = n · Mi / Σ Mi

where the sum Σ Mi runs over all N schools in the population. Hence, the school
design weight, wi, should be:

(2)    wi = Σ Mi / (n · Mi)

We call this weight wi the “theoretical school weight”. We do not apply it in any of
the analyses that follow in subsequent chapters of this report but it serves as a
benchmark for examining the weights supplied by ONS.
The other benchmark is the OECD school weights. These adjust for all other relevant
factors besides sample design, notably the difference between actual school size and
the size indicated in the frame and average levels of response within the strata.
Some of our analysis at stages (ii) and (iii) uses the individual pupils within schools as
the unit of analysis. The question arises of which weight to apply. The correct weight
is the product of the reciprocals of the school selection probability and the pupil
selection probability. Since the analyses at stages (ii) and (iii) comprise all pupils in
the selected schools, a pupil’s probability of being included given their school is
included equals 1.0 in all cases. Hence, when conducting analyses at these stages, we
can simply apply whatever school weight we use to each pupil.
2.2.1.2 Pupil weights
The probability of pupil j in school i being selected for the sample, pij, is the product
of the probability of his/her school being selected, pi, and the probability of the pupil
being among the (in general) 35 pupils sampled in the pupil’s school. The latter is
equal (in general) to 35 divided by the total number of 15 year olds in the school:

(3)    pij = pi · (mi / Mi) = [n · Mi / Σ Mi] · (mi / Mi) = n · mi / Σ Mi

The pupil weight, wij, is then the reciprocal of pij:

(4)    wij = Σ Mi / (n · mi)
We call this weight the ‘theoretical student weight’. Note that given the PISA sample
design, the theoretical student weight, wij, is a constant for almost all pupils, since mi
equals 35 in almost all schools. This reflects the ‘self-weighting’ principle of the two
stage design. However, in small schools with less than 35 pupils aged 15, wij will be
bigger than in other schools, producing some variation in the values of the theoretical
student weight. Given that small schools had a lower probability of being sampled,
only a small number of pupils in the sample will have a weight different from that of
the great majority of pupils in the sample, who are in schools with 35 or more 15 year
olds. This means that applying the theoretical student weight will yield very similar
results to not applying any weight at all.
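The near-constancy of the theoretical student weight can be illustrated with made-up school sizes, following equation (4):

```python
# Illustrative M_i values: three ordinary schools and one small school
# with fewer than 35 fifteen year olds. All values are invented.
school_sizes = [200, 150, 80, 20]
total_pupils = sum(school_sizes)      # sum of M_i over the population
n = len(school_sizes)                 # here every school counts as sampled

weights = []
for M in school_sizes:
    m = min(35, M)                    # pupils sampled in the school
    weights.append(total_pupils / (n * m))   # w_ij = sum(M_i) / (n * m_i)
```

The weight is identical wherever 35 pupils are sampled and larger only in the small school, which is why applying it changes results very little relative to using no weight.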
Again, as with the theoretical school weight, we do not use the theoretical student
weight in any analyses in subsequent chapters but it does give us a comparator for the
values of the pupil weights supplied by ONS.
2.2.2 ONS weights received with the data
Table 2.5 describes the ONS weights we received in the 2000 and 2003 databases.6
Table 2.5: ONS school and pupil weights in 2000 and 2003

                       School weight             Pupil weight
Name of variable       sch_bwt, sch_wt           STU_WT in 2003,
                                                 stu_wt in 2000
Availability           All schools sampled       All sampled pupils
                       for PISA
In both years two school weight variables, ‘sch_wt’ and ‘sch_bwt’, were included in
the ONS files. These weights were present for all sampled schools, including all
initial, first and second-replacement schools, irrespective on whether the school was
actually approached. The ONS pupil weight (‘STU_WT’ in 2003 and ‘stu_wt’ in
2000) was present for all sampled pupils (whether responding or not responding) in
responding schools. We were not provided with a description of these weights.
The school weight variable sch_wt was equal to 1.0 for almost all schools in both
2000 and 2003. This variable could not have been constructed as described in
equation (2). The alternative was the other variable that was supplied with the data,
sch_bwt. In both 2000 and 2003 we could reproduce its values for each school, giving
us confidence that it is a suitable weighting variable.
2.2.2.1 School weight
Although we could reproduce the values of the ONS school weight that we received
with the data, its calculation did not in fact follow exactly that of the theoretical
weight described above in equation (2). Its calculation is shown in equation (5):

(5)    sch_bwti = 1 / Mi
Values of Mi were provided in the database in a variable called ‘mos’ which gives the
estimated number of 15 year old pupils in schools at the time of the construction of
the sampling frame. Our calculation of the theoretical school weight is also based on
this ‘mos’ variable.
(5) differs from (2) only by a constant factor: the number of pupils in the population
divided by the number of sampled schools. (We do not know why this constant factor
was not included in the ONS calculation.) As a consequence, the values of sch_bwt
are very small and do not correspond to the inverse of a school’s probability of being
sampled. However, this difference by a constant factor only means that sch_bwt,
based on (5), is perfectly correlated with the theoretical weight, based on (2). We
therefore used sch_bwt in most of our school level analyses.

6 The school weight was provided in the school file including all sampled PISA schools. The pupil
weight was provided in the student file containing all sampled PISA pupils.
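The perfect correlation between the two weights can be verified numerically; the 'mos' values below are invented:

```python
# The ONS weight in (5) is 1/M_i; the theoretical weight in (2) is
# sum(M_i)/(n*M_i). They differ only by the constant factor sum(M_i)/n,
# so they are perfectly correlated. School sizes are illustrative.
mos = [180, 220, 95, 40]          # invented 'mos' values: 15 year olds per school
n = len(mos)
total = sum(mos)

sch_bwt = [1 / M for M in mos]                  # equation (5)
theoretical = [total / (n * M) for M in mos]    # equation (2)

# The ratio of the two weights is the same constant for every school.
ratios = [t / b for t, b in zip(theoretical, sch_bwt)]
```

Because the ratio is constant across schools, any analysis that only uses relative weights gives identical results with either variable.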
2.2.2.2 Student weight
The ONS student weight is meant to be the product of the ONS school weight and the
reciprocal of the within-school selection probability (the number of 15 year old pupils
sampled divided by the number of 15 year old pupils attending the school):
(6)    stu_wti = sch_bwti · (Mi / mi) = (1 / Mi) · (Mi / mi) = 1 / mi
In general the same number of pupils was sampled in each school: 35. Hence, given
(6) the student weight for each pupil in each school is in general the same.
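A quick check of this collapse to a constant, with an invented school size:

```python
# Equation (6): sch_bwt_i = 1/M_i multiplied by M_i/m_i collapses to 1/m_i,
# so wherever 35 pupils are sampled the student weight is the constant 1/35.
M = 240                      # 15 year olds in the school (invented)
m = min(35, M)               # pupils sampled in the school
sch_bwt = 1 / M              # equation (5)
stu_wt = sch_bwt * (M / m)   # equation (6)
```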
Using equation (6), we were able to reproduce all the values of the supplied ONS
student weight for 2000, ‘stu_wt’. However, we could not reproduce the values of the
2003 weight, ‘STU_WT’. This leads to some uncertainty as to how the 2003 variable
was calculated. For example, we do not know whether the student weight for 2003
takes the level of response within schools into account.
Figure 2.1 and Table 2.6 shed light on the differences between a student weight
calculated with (6) and the student weight variable actually provided for 2003,
‘STU_WT’.
Figure 2.1: Scatter plot of student weight ‘STU_WT’ provided by ONS for 2003
and a student weight estimated with equation (6)
[Two scatter plots; in both, the vertical axis is the weight based on equation (6) and
the horizontal axis is the ONS student weight (STU_WT). The left panel shows values
up to the 95th percentile of the equation (6) estimates; the right panel shows values
above the 95th percentile.]
The first scatter plot shows values up to the 95th percentile of a variable based on
equation (6). This weight is a constant within this range of values, equal to 1/35.
However, the student weight ‘STU_WT’ provided by ONS varies for these pupils.
Hence it must take account of other things than those entering equation (6).
The second scatter plot is restricted to those pupils with values of the weight based on
equation (6) that are above the 95th percentile. These are students in schools with less
than 35 pupils aged 15.
Table 2.6: Summary statistics of 2003 student weight variables

                                                     Std.                          Percentiles
Variable              Obs     Mean    Dev.    Min     Max     5th     25th    75th    95th
STU_WT                5653    .0302   .0048   .0092   .0643   0.024   0.0285  0.0314  0.0355
Values based on
equation (6)          5653    .0290   .0029   .0286   .05     0.0286  0.0286  0.0286  0.0286
Faced with these findings, our first guess was that the values of STU_WT were
perhaps the same as those of the OECD student weight, that takes non-response at
both school and pupil levels into account (see below). However, these two weights,
STU_WT and the OECD student weight, have a correlation of only 0.62.
In our student level analyses we have used the supplied 2000 weight, stu_wt, which
we can replicate, and the supplied 2003 weight, STU_WT, which we cannot replicate.
The fact that we could not replicate the 2003 weight causes us some disquiet.
2.2.3 OECD weights
Table 2.7 presents the school weights and pupil weights available in the OECD
database (available for download from the OECD website).
Table 2.7: OECD school and pupil weights in 2000 and 2003

                       School weight             Pupil weight
Name of variable       scweight in 2003,         w_fstuwt
                       wnrschbw in 2000
Availability           All schools responding    All responding students
                       in PISA
The calculation of the OECD weights differs from that of the theoretical design weights in
equations (2) and (4). The details of the calculations are given in OECD (2005a: 107-117
and 2001b: 89-98). We focus here only on those additional factors included in the
OECD weight that adjust for non-response.
School non-response adjustment
The OECD school weight adjusts for school non-response. We give only the gist of
the calculation here (details are given in OECD 2005a: 111-113). Schools were
grouped into LEA maintained, grant maintained and independent schools. (The
situation described is that for 2003.) The LEA maintained schools were further
stratified into six groups and grant maintained schools into three groups based on the
proportion of students in the school gaining five or more GCSEs at grades A*-C.
Independent schools were divided more simply into coeducational schools, boys only
schools and girls only schools (DfES 2005). (These groups often corresponded to the
strata that were used in sampling.)
For each of these groups, or ‘weighting classes’, the school non-response adjustment
factor in the OECD weight is the number of pupils in (a) participating schools and (b)
non-responding schools in the initial sample that were not replaced divided by the
number of pupils in the participating schools.
Hence, if a school in the initial school sample could not be replaced by a first or
second-replacement school, schools in the same weighting class as this school
received a greater weight. The weighting factor depends on the number of pupils in
the school that could not be replaced.
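Within one weighting class, the factor can be sketched in a few lines of code; the enrolment figures below are invented for illustration:

```python
# School non-response adjustment factor for a single weighting class:
# pupils in (a) participating schools plus (b) non-responding initial-sample
# schools that were not replaced, divided by pupils in participating schools.
participating = [180, 210, 195]   # enrolments of participating schools (invented)
unreplaced = [220]                # enrolments of non-responding, unreplaced schools

factor = (sum(participating) + sum(unreplaced)) / sum(participating)
# Each participating school in the class has its weight scaled up by this factor.
print(round(factor, 3))  # → 1.376
```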
Given that these weighting classes were constructed on the basis of GCSE results, this
school non-response adjustment is de facto an adjustment based on schools’ average
achievement levels.
Student non-response adjustment
Achievement levels of students were not taken into account for the OECD student
non-response adjustment. In general, the student response adjustment factor was the
ratio of the number of students who were sampled to the number of students who
responded. For example, in a school where 35 pupils were sampled but only 25 pupils
responded, the OECD student weight would include a factor equal to 35/25. All pupils
who responded in the school would receive the same weight. The final OECD student
weights take into account both the non-response adjustment at the school level and at
the student level.
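In code, the worked example above is simply the ratio of sampled to responding pupils:

```python
# Student non-response adjustment factor for one school: sampled / responded.
sampled, responded = 35, 25
factor = sampled / responded
# The same factor applies identically to every responding pupil in the school.
print(factor)  # → 1.4
```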
2.2.4 Comparison of ONS weights, OECD weights and theoretical
design weights
How do the weights we have described compare with each other? The theoretical
student weight is equal to 1.0 for all pupils in schools with 35 or more 15 year old
pupils. The ONS student weight should be perfectly correlated with this theoretical
student weight, and we have found that to be the case for 2000 but not 2003. The
OECD weights take non-response at school and student level into account, and we
therefore expect them to show more variation than the other weights.
2.2.4.1 Weights in 2003
Table 2.8 shows values of the school weights for all schools sampled in 2003 (original
sample of 190 together with the first and second replacement schools) and Table 2.9
shows values for all responding schools. The correlations are shown in Table 2.10.

Note: for the estimation of the theoretical student weight, the population size was set to
697,935, which equals the final survey population minus exclusions (see Table 2.3). The
number of sampled schools was set to 190.
Table 2.8: Summary statistics of weights for all sampled schools (initial, first
replacement, second replacement), 2003

Variable                      Obs   Mean    Std. Dev.   Min     Max      5th     25th    75th    95th
Theoretical weight (wi)       560   25.83   17.45       6.09    104.95   12.49   16.15   28.04   61.24
ONS school weight (sch_bwt)   560   .0070   .0048       .0017   .0286    .0034   .0044   .0076   .0167
Table 2.9: Summary statistics of weights for responding schools, 2003

Variable                        Obs   Mean    Std. Dev.   Min     Max      5th     25th    75th    95th
Theoretical weight (wi)         161   23.82   16.52       6.34    104.95   12.67   15.97   24.01   44.80
ONS school weight (sch_bwt)     161   .0065   .0045       .0017   .0286    .0034   .0043   .0065   .0122
OECD school weight (scweight)   161   23.72   17.13       5.61    125.66   12.34   15.10   25.30   43.73
Table 2.10: Correlation matrix, school weights, 2003

                                 ONS school weight   Theoretical school   OECD school weight
                                 (sch_bwt)           weight (wi)          (scweight)
ONS school weight (sch_bwt)      1.000
Theoretical school weight (wi)   1.000               1.000
OECD school weight (scweight)    0.848               0.848                1.000

Note: the size of the sample used in the calculation of the correlation coefficients varies according to
the number of valid values for the variables concerned.
The values of all three weights vary substantially due to differences in the sizes of the
sampled schools. The values of the ONS weight are very small since it lacks the
constant factor in equation (2), as discussed above. However, it is perfectly correlated
with the theoretical school weight. The mean and standard deviation of the OECD
school weight are similar to those of the theoretical school weight. The correlation
between the two is relatively high, at 0.85. The left-hand scatter plot in Figure 2.2 shows
the pattern of association behind this correlation, which is reduced by a few outliers.
One factor that leads to differences between the theoretical school weight and the
OECD weight is the adjustment for school non-response incorporated in the latter.

Note: in the original OECD database the number of schools is 159, not 161. The higher
number here derives from the merging process: some pupils changed schools after sitting
the PISA test, which affects the attribution of pupils to schools. However, results using
the OECD weight from the OECD database are very similar to those given here, with a
mean of 23.83 and a standard deviation of 17.20.
The right hand scatter plot shows the association of the OECD school weight with
school size. (The theoretical school weight and the ONS weight have an even closer
inverse relationship with school size.)
Figure 2.2: ONS and OECD school weights and school size, 2003
[Left panel: OECD school weight (scweight) against ONS school weight (sch_bwt), r=0.85.
Right panel: OECD school weight against school size (mos).]
Note: the unit of analysis is the school.
Summary statistics for the student weights in 2003 are given in Table 2.11 and
correlations in Table 2.12.
Table 2.11: Summary statistics for student weights in the final student sample, 2003

Variable                         Obs    Mean     Std. Dev.   Min      Max      5th      25th     75th     95th
Theoretical pupil weight (wij)   3631   106.96   11.61       104.95   183.67   104.95   104.95   104.95   104.95
ONS weight (STU_WT)              3617   .0302    .0052       .0092    .0643    .0241    .0285    .0313    .0347
OECD weight (w_fstuwt)           3631   154.86   50.34       61.46    579.69   97.58    124.11   176.48   236.71
As discussed earlier, the theoretical student weight is a constant up to the 95th
percentile. By contrast, both the ONS and OECD weights show considerable
variation. As discussed above, we do not know why the ONS student weight varies in
this way. However, for the OECD student weight we know that non-response at
school and student level will help generate variation.
Correlation coefficients between the three weights are relatively low: 0.58 between
ONS and theoretical student weight and 0.62 between the OECD and ONS student
weights.
Table 2.12: Correlation matrix, student weights, 2003

                                   ONS student       Theoretical student   OECD student
                                   weight (STU_WT)   weight (wij)          weight (w_fstuwt)
ONS student weight (STU_WT)        1.000
Theoretical student weight (wij)   0.578             1.000
OECD student weight (w_fstuwt)     0.618             0.306                 1.000

Note: the size of the sample used in the calculation of the correlation coefficients varies according to
the number of valid values for the variables concerned.
The association between the ONS weight and the OECD weight is shown in Figure
2.3.
Figure 2.3: OECD student weight and ONS student weight, 2003
[Scatter plot of the ONS student weight (STU_WT) against the OECD student weight (w_fstuwt).]
2.2.4.2 Weights in 2000
Tables 2.13 and 2.14 present summary statistics for school weights in 2000 for all
sampled schools and for all responding schools.
-25-
Table 2.13: Summary statistics of weights for all sampled schools (initial, first
replacement, second replacement), 2000

Variable                      Obs   Mean    Std. Dev.   Min     Max     5th     25th    75th    95th
Theoretical weight (wi)       543   22.36   14.25       5.08    87.85   11.06   14.37   25.00   51.25
ONS school weight (sch_bwt)   543   .0073   .0046       .0017   .0286   .0036   .0047   .0081   .0167
Table 2.14: Summary statistics of weights for responding schools, 2000

Variable                        Obs   Mean    Std. Dev.   Min     Max      5th     25th    75th    95th
Theoretical weight (wi)         153   20.66   11.55       5.08    87.85    10.60   13.85   25.00   38.44
ONS school weight (sch_bwt)     153   .0067   .0037       .0017   .0286    .0034   .0045   .0081   .0125
OECD school weight (wnrschbw)   154   27.35   42.70       0       523.17   8.84    15.83   29.08   51.15

Note: for 105 pupils in the national data set no pupil identifiers are available, so they cannot be
merged with the OECD data. Data for all pupils in one school are not available, leading to the smaller
sample of 153 schools in the England data set. The merging of the databases is not perfect due to
missing school identifiers. The mean of the OECD school weight (wnrschbw) is 27.26 and the standard
deviation 42.55 if the original OECD data file is used.
As for 2003, we see considerable variation in the values of all three weights. (Oddly,
five schools in the 2000 data have a value of 0 for the OECD weight, meaning that those
schools are excluded from the analysis if school weights are applied. Pupils in these
schools nevertheless have an OECD student weight greater than 0.)
Table 2.15 presents the correlation coefficients. The ONS school weight and the
theoretical school weight are perfectly correlated. The OECD school weight has a
correlation of 0.72 with both other weights, an association that is demonstrated
graphically in Figure 2.4 and which is affected by one obvious outlier.
Figure 2.4 also illustrates the inverse relationship of the OECD weight with school size.
-26-
Table 2.15: Correlation matrix, school weights, 2000

                                 ONS school weight   Theoretical school   OECD school weight
                                 (sch_bwt)           weight (wi)          (wnrschbw)
ONS school weight (sch_bwt)      1.000
Theoretical school weight (wi)   1.000               1.000
OECD school weight (wnrschbw)    0.720               0.720                1.000

Note: the size of the sample used in the calculation of the correlation coefficients varies according to
the number of valid values for the variables concerned.
Figure 2.4: ONS and OECD school weights and school size, 2000
[Left panel: ONS school weight (sch_bwt) against OECD school weight (wnrschbw), r=0.72.
Right panel: OECD school weight (wnrschbw) against school size (mos).]
Tables 2.16 and 2.17 give summary statistics of the student weights in 2000. The
theoretical student weight is constant for most pupils. The ONS student weight is also
almost a constant in 2000 and is correlated perfectly with the theoretical student
weight. However, the OECD weight shows considerable variation. Figure 2.5
compares ONS and OECD weights. The ONS student weight appears constant but for
two data points, which represent two schools with 41 pupils in total. The OECD
student weight displays considerable variation.
Table 2.16: Summary statistics of student weights in final student sample, 2000

Variable                   Obs    Mean     Std. Dev.   Min     Max      5th     25th     75th     95th
ONS weight (stu_wt)        4015   .0287    .0012       .0286   .05      .0286   .0286    .0286    .0286
Theoretical weight (wij)   4015   88.13    3.84        87.85   153.74   87.85   87.85    87.85    87.85
OECD weight (w_fstuwt)     4120   135.98   48.61       33.14   588.57   54.58   115.91   155.56   206.89
-27-
Table 2.17: Correlation matrix, pupil weights, 2000

                           ONS weight   Theoretical    OECD weight
                           (stu_wt)     weight (wij)   (w_fstuwt)
ONS weight (stu_wt)        1.000
Theoretical weight (wij)   1.000        1.000
OECD weight (w_fstuwt)     0.501        0.501          1.000
Figure 2.5: OECD student weight and ONS student weight, 2000
[Scatter plot of the ONS student weight (stu_wt) against the OECD student weight (w_fstuwt).]
Note: the unit of analysis is the pupil.
2.2.5 Summary of available weights and their properties
• The PISA sample design is approximately self-weighting as far as analysis at the
student level is concerned. Weights that are the inverse of the product of the ex-ante
school and pupil selection probabilities generally take the same value, with
departures only for the small minority of pupils in schools attended by fewer than
35 15 year olds.

• In both 2000 and 2003 we could reproduce the ONS school weights which were
supplied with the data. These weights are pure sampling weights that make no
allowance for levels of school response.

• We could reproduce the ONS student weight in 2000. Bar a constant factor, the
values of this weight are identical to the student design weight that we calculated
for the survey in that year, which displays very little variation due to the
approximately self-weighting nature of the survey design. However, we could not
reproduce the ONS 2003 student weight, which displays more variation.

• The values of the OECD school and student weights differ considerably from the
ONS weights. One reason for this is that the OECD weights incorporate
adjustment for non-response at the school level (an adjustment that varies with
schools' GCSE results) and at the pupil level (where the adjustment does not vary
with individual pupil achievement).
2.3 Summary
We use a variety of datasets that have come from different sources. The reader is
reminded of the following features of these data:
• For both 2000 and 2003 we have national databases that attempt to mirror the
PISA survey population – the target population less exclusions. The exclusions
cannot be perfectly implemented but we presume a greater degree of success with
the 2003 data. The 2000 dataset includes statemented pupils. It excludes all pupils
with no GCSE (or equivalent) entries (who are in the PISA survey population).

• The 2003 national database is composed of pupils who had their 15th birthday
during a 14 month period. The additional two months beyond the 12 months taken
for the 2000 data were to allow for the extension of the PISA study window in
England in 2003. While 14 percent of the national database had birthdays in the
additional two months, this was the case for less than 3 percent of the PISA
sample of responding pupils.

• The national databases for both 2000 and 2003 included information on both KS3
and KS4 scores. KS3 tests are not typically conducted in private schools, hence
this information is missing for most private school pupils in both years (see
Appendix 3).
We explored the school and student weights supplied with the files of PISA data at
some length. These ‘ONS weights’ are available for all sampled schools and pupils,
irrespective of response. The purpose and construction of these weights were unclear.
Two school weight variables were provided with the data. We discarded one of these
as being unsuitable. The other seemed to be a suitable design weight that we were
able to reproduce. We could also reproduce the ONS student weight for 2000, which
also appears to be a pure design weight, but we could not reproduce the ONS student
weight for 2003.
Given the PISA sampling design, the ONS student weight in 2000 displays virtually
no variation and hence the application of this weight in the student level analyses –
stages (iv) and (v) in the list at the start of this chapter – has virtually no effect. The
student weight provided for the 2003 database displays more variation. We apply this
weight for the 2003 analyses but we also report unweighted results.
Our terms of reference did not include any comparison of the impact of the ONS
weights and the OECD weights. The latter incorporate adjustment for non-response –
an adjustment that in the case of schools varies with their GCSE exam results. Hence
there is a good case for comparing the use of ONS and OECD weights when
calculating summary statistics of achievement scores and assessing the extent of
possible response bias in estimates of key population parameters. We include some
limited comparisons of this type in the chapters that follow.
We also compute student response adjustment factors that go beyond those that enter
the OECD weight calculations, drawing on the results of logistic regression models of
the probability of pupil response. These logistic regression models are described in
Chapters 3 and 4 for the 2003 and 2000 data respectively. This gives us an alternative
set of weights that includes a more detailed adjustment for pupil response.
3 Bias analysis – 2003 data
Our analysis for each year of data, 2000 and 2003, is in four parts. First, following our
terms of reference, we summarise key domestic achievement variables (KS3 and
KS4) at five stages of the PISA survey process, from the full population of all pupils
in all schools through to the final sample of responding pupils in responding schools.
Second, we conduct a more detailed analysis of differences in domestic achievement
scores between responding schools and non-responding schools, and between
responding pupils and non-responding pupils. This includes tests for statistically
significant differences between respondents and non-respondents.
Third, we investigate the impact of weights that adjust for response on results for the
domestic achievement variables. In doing so we briefly examine the extent to which
results using the OECD weights differ from results using the ONS weights, which do
not adjust for non-response. This sheds light on whether the OECD weights help
resolve any problem of response bias in the data. We also show results with a
weighting variable that we construct ourselves which goes further than the OECD
weights in its adjustment for the pattern of response at the pupil level.
Finally, we estimate the extent of response bias in mean PISA scores in the final
sample of responding pupils with three different methods, which include the
application of our weights. In the light of these results, we then comment on the
decision to exclude the UK from the international report on PISA 2003.
3.1 Analysis at five stages of sampling and response
Table 3.1 lists the five stages in the PISA survey process (see Chapter 1, Figure 1.1).
Table 3.1: Description of five stages of PISA

Stage                                     Description
i)   All 15 year olds in schools in       All pupils in the population who were not
     England                              excluded (see Chapter 2)
ii)  All pupils in all sampled schools    All pupils in all sampled schools, including
                                          initial schools and first and second
                                          replacement schools
iii) All pupils in responding schools     All pupils in all responding schools.
                                          (Responding schools with fewer than 25% of
                                          pupils responding were excluded.)
iv)  All sampled pupils                   All sampled pupils in responding schools
                                          covered at (iii). (Statemented pupils in the
                                          PISA sample are included.)
v)   All responding pupils                All responding pupils, i.e. pupils in the
                                          final PISA data set. (Statemented pupils in
                                          the PISA sample are included.)

The description of each stage summarises how we implemented empirically the
concepts shown in the first column. We have commented on some of the issues in
Chapter 2. But we have yet to mention a key choice concerning group (ii), ‘all pupils
in all sampled schools’.
PISA has a system of replacement of schools that do not respond, as described in
Chapter 1. Hence there is the issue of whether ‘sampled schools’ should refer to just
the schools in the initial sample (all of which were asked to participate in the survey),
all schools that were sampled irrespective of whether they were approached or not
(including replacement schools never approached because the initial school agreed to
participate), or just the ‘last’ school in each case to be approached – an initial school if
it participated, a first replacement school if it in turn participated after a refusal, or a
second replacement school whether it participated or not.
We chose to define group (ii) as all schools that were sampled, irrespective of whether
they were approached or not. In retrospect, this was probably a bad decision. It means
that results for groups (ii) and (iii) will differ not only due to the impact of school
response. They will also differ due to sampling variation and replacement, since group
(iii) contains only schools that were actually approached (and responded) while group
(ii) contains in addition schools never approached. As noted by Sturgis et al (2006),
the issue of the impact of replacement on the results is not clear-cut and it would have
been sensible to have tried to isolate it. We partly make up for this decision in the
subsequent analysis of non-response, where we do compare directly the responding
and the non-responding schools (with various treatments of replacement). But as far
as our comparison of groups (i) to (v) is concerned, it is important to realise that the
difference in results between groups (ii) and (iii) does not give a clean estimate of the
effect of non-response by schools, although this is probably the major influence on
any differences.
Table 3.2 lists the variables that we analyse.
Table 3.2: Variables relating to individual students used in the analysis

Variable    Definition
Male        = 1 if male (0 = female)
FSM         = 1 if pupil is eligible for free school meals (0 otherwise)
ELevel5     = 1 if pupil achieved English KS3 level 5 or above (simple level data)
MLevel5     = 1 if pupil achieved Maths KS3 level 5 or above (simple level data)
SLevel5     = 1 if pupil achieved Science KS3 level 5 or above (simple level data)
ELevel5fg   = 1 if pupil achieved English KS3 level 5 or above (fine grade data)
MLevel5fg   = 1 if pupil achieved Maths KS3 level 5 or above (fine grade data)
SLevel5fg   = 1 if pupil achieved Science KS3 level 5 or above (fine grade data)
Ks4_5ac     = 1 if pupil has 5 or more GCSE/GNVQ A*-C passes
Ks3Efg      English Key Stage 3 level (fine grade data)
Ks3Mfg      Maths Key Stage 3 level (fine grade data)
Ks3Sfg      Science Key Stage 3 level (fine grade data)
Ks3_aps     KS3 average point score over English, Maths and Science
Ks4_pts     Total GCSE/GNVQ point score
Ks4_ptc     Capped GCSE/GNVQ point score (best 8 subjects)
Ks4_mean    Mean GCSE/GNVQ point score across all subjects where GCSE/GNVQ taken
Ks4_ac      Number of GCSE/GNVQ A*-C passes
We start with a summary of mean values of each variable for the five different groups
in Table 3.3. (More details are given in Appendix 1 including the number of missing
values and the standard deviations.) We remind the reader that KS3 variables are
typically missing for private school pupils (see Appendix 3). Figures 3.1 to 3.3 show
the changes in both the means and in standard deviations between the five groups for
three continuous achievement variables, one for KS3 (which averages across the three
subjects for each student) and two for GCSE (a ‘quality’ variable, ks4_mean, and a
‘quantity’ variable, ks4_pts). A dashed line is shown between values for groups (iii)
and (iv) as a reminder of a small change in the exclusion criteria of statemented
pupils. We included 91 statemented and sampled pupils in all groups (i) to (v). In
practice this means that we included all statemented pupils present in groups (iv) and
(v) but only a very small fraction of all statemented pupils in groups (i) to (iii).
At stage (i) we analyse values across all pupils in the population and hence no weights
are required. At stages (ii) and (iii) we apply the school weight provided by ONS
(‘sch_bwt’). At stages (iv) and (v) we use the ONS student weight (‘STU_WT’). As
noted in Chapter 2, we are unable to reproduce the ONS student weight for 2003.
Looking first at the bottom half of Table 3.3, the mean values of the continuous
achievement variables in the final sample of responding students (v) are clearly all
higher than the mean values for the population (i). For example, the mean of ks3_aps
is 1.7% higher and that of ks4_pts is 6.8% higher. When comparing these figures one
needs to bear in mind that the degree of dispersion in the two measures differs. Using
the standard deviations of the population, group (i), as the units, the ks3_aps mean
rises by 0.09 units between groups (i) and (v) and the ks4_pts mean by 0.14 units, i.e.
by a fairly similar amount. The main change comes between columns (iv) and (v), i.e.
at the last stage of the sampling process: student response.10
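The standardised comparison in this paragraph can be reproduced directly from the means in Table 3.3; the population standard deviations used below (6.6 for ks3_aps, 20.8 for ks4_pts) are approximate values read from the axes of Figures 3.1 and 3.3, since the exact figures are in Appendix 1:

```python
# Change in the mean between groups (i) and (v), expressed both as a percentage
# of the population mean and in population standard deviation units.
stats = {
    "ks3_aps": {"mean_pop": 34.162, "mean_resp": 34.757, "sd_pop": 6.6},
    "ks4_pts": {"mean_pop": 42.854, "mean_resp": 45.769, "sd_pop": 20.8},
}

for var, s in stats.items():
    diff = s["mean_resp"] - s["mean_pop"]
    pct = 100 * diff / s["mean_pop"]   # percentage change in the mean
    sd_units = diff / s["sd_pop"]      # change expressed in population SDs
    print(f"{var}: +{pct:.1f}% of the mean, +{sd_units:.2f} population SDs")
```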
Looking at the top half of the table, as far as FSM eligibility is concerned, the sample
of pupils appears to become less and less representative of the population of all pupils
in English schools the further we proceed through the stages of the PISA survey
process. There is a fall of three percentage points in the FSM rate between stages (i)
and (v). Note also the quite large differences in the percentages of students above given
levels of achievement: level 5+ in KS3 subjects or with 5 or more GCSEs at A*-C (or
equivalent) – typically a 4 to 5 percentage point difference between columns (i) and (v);
61 percent of the PISA sample of responding pupils have 5 or more GCSEs at A*-C
compared to 56 percent of the population. Again, the big increase tends to come
between columns (iv) and (v), but there are often smaller increases at earlier stages.
Figures 3.1 to 3.3 show clearly that, with all three of the achievement variables
concerned, there is:

• a clear rise in the mean between groups (iv) and (v) as a result of pupil response;

• a slight decline in the mean between (ii) and (iii), which could be due to school
response or school replacement;

• less dispersion of scores among responding pupils (v) than in the population (i).
The standard deviation falls to a greater or smaller degree as a result of school
response, the sampling of pupils within schools, and the response by pupils.

Overall, and especially as a result of pupil response, the final sample of responding
pupils has higher mean achievement and less dispersion of achievement than is found
in the population.

10 We noted in Chapter 2 that our definition of group (i), the population, included all persons born in
the 14 months January 1987-February 1988 rather than the 12 months of the 1987 calendar year. If we
restrict attention just to those persons born in 1987, the achievement score means for group (i) in fact
change very little. Most are very slightly lower: e.g. 55.70% of pupils have 5 or more GCSEs at A*-C,
compared to the figure of 55.79% in Table 3.3, and the mean ks3_aps score is 34.128 compared to the
figure in the table of 34.162.
Table 3.3: Mean values of achievement variables at five stages, 2003

                  i            ii           iii          iv           v
                  All pupils   All pupils   All pupils   Sampled      Responding
                  in English   in all       in           pupils in    pupils
                  schools      sampled      responding   responding
                               schools      schools      schools

Variable (mean *100)
Male              50.017       49.332       47.395       46.280       46.267
FSM               13.778       12.563       11.861       11.357       10.399
ELevel5           69.674       70.073       70.538       71.190       74.406
MLevel5           70.537       71.075       71.135       71.681       75.280
SLevel5           69.825       70.422       70.805       70.991       74.276
ELevel5fg         71.079       71.444       71.958       72.308       75.495
MLevel5fg         72.026       72.616       72.715       73.007       76.632
SLevel5fg         71.400       72.026       72.487       72.340       75.487
ks4_5ac           55.788       55.912       55.309       56.354       60.948

Variable (mean)
ks3Efg            5.512        5.531        5.521        5.534        5.619
ks3Mfg            5.785        5.810        5.799        5.802        5.908
ks3Sfg            5.541        5.564        5.556        5.555        5.636
ks3_aps           34.162       34.301       34.214       34.232       34.757
ks4_pts           42.854       42.879       42.627       43.496       45.769
ks4_ptc           36.264       36.451       36.172       36.975       38.744
ks4_mean          4.350        4.373        4.334        4.414        4.605
ks4_ac            5.377        5.370        5.317        5.396        5.799

Note: data behind the results in columns (ii) and (iii) are weighted by the ONS school weight (sch_bwt)
and those for columns (iv) and (v) by the ONS student weight (STU_WT).
Figure 3.1: Mean and standard deviation of average KS3 score (ks3_aps), 2003
[Line chart: mean (left axis) and standard deviation (right axis) of ks3_aps for pupil groups (i) to (v).]
Figure 3.2: Mean and standard deviation of GCSE point score per subject
(ks4_mean), 2003
[Line chart: mean (left axis) and standard deviation (right axis) of ks4_mean for pupil groups (i) to (v).]
Figure 3.3: Mean and standard deviation of total GCSE point score (ks4_pts),
2003
[Line chart: mean (left axis) and standard deviation (right axis) of ks4_pts for pupil groups (i) to (v).]
What implications does this have for the percentages of pupils in each group that are
above (or below) given thresholds of achievement e.g. level 5 in KS3 or 5 or more
GCSEs at A*-C? These percentages are affected by both the changes in the mean and
in the standard deviation (SD).
Assume for the sake of argument that changes in the SD come about through both tails of
the distribution shrinking in towards the mean. This will tend to reduce the proportion
of pupils found below any given score level that is below the mean. Hence it will tend
to increase the proportion of pupils above KS3 level 5, since this is a level of
achievement that comes below the mean level. If the mean score is also increasing –
as it clearly does between (iv) and (v) – this will tend to further push up the proportion
of students above KS3 level 5. By contrast, at score levels above the mean, the effects
of the higher mean and lower SD will tend to cancel each other out. The distribution
shifts to the right (higher mean) but if the top tail is pulled in (lower SD) then it is
possible that the percentage of students above a given level could even be unchanged.
Tables 3.4 and 3.5 shed more light on this. We take four threshold levels of score
from across the distribution, chosen to divide the population into five slices: at or
below the bottom decile, between the bottom decile and the lower quartile, between
the lower and upper quartile, between the upper quartile and the top decile and above
the top decile. In the case of the variable ks4_mean, the population divides almost
exactly in this way. But for variable ks3_aps the population does not divide quite so
neatly due to lumpiness in the distribution.
The results are clearest for the GCSE measure, ks4_mean. In group (v), almost
exactly the same percentage of pupils is found in the top group as in the population:
9.9% compared to 10.1%. The effects of higher mean and lower SD have cancelled
out. But only 4.9% of group (v) are in the bottom tail of lowest achievement,
compared to 10.0% in the population. Restricting attention to the change between
stages (iv) and (v) as a result of pupil response, the percentage in the bottom range of
GCSE scores falls by one third. The changes are smaller for the KS3 variable. For
example, the percentage in the bottom range falls by about one seventh between
stages (iv) and (v).
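The construction behind Tables 3.4 and 3.5 can be sketched as follows: cut-points are fixed at the population's 10th, 25th, 75th and 90th percentiles, and each group is then tabulated against those same cut-points. The scores below are simulated stand-ins for the actual data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-ins: respondents have a higher mean and lower SD, as in the text.
population = rng.normal(4.35, 1.8, size=100_000)   # ks4_mean-like scores (invented)
respondents = rng.normal(4.60, 1.6, size=3_600)

# Cut-points are fixed at the *population* percentiles.
cuts = np.percentile(population, [10, 25, 75, 90])

def share_per_range(scores, cuts):
    """Percent of scores falling in each population percentile range."""
    counts = np.histogram(scores, bins=[-np.inf, *cuts, np.inf])[0]
    return 100 * counts / len(scores)

print(share_per_range(population, cuts))   # ~[10, 15, 50, 15, 10] by construction
print(share_per_range(respondents, cuts))  # thinner bottom tail, similar top tail
```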
Table 3.4: Percent of pupils in KS3 (ks3_aps) percentile range of the population,
2003

Percentile   Point score   i            ii           iii          iv            v
range        2003          All pupils   All pupils   All pupils   All sampled   Responding
                           in England   in all       in           pupils in     pupils
                           schools      sampled      responding   responding
                                        schools      schools      schools
<=10         <=25          11.65        11.33        11.09        11.40          9.59
>10-25       >25-29        15.01        14.86        14.91        14.74         12.93
>25-75       >29-39        53.36        53.02        54.29        54.25         56.33
>75-90       >39-43        13.40        13.64        13.56        13.61         14.58
>90          >43            6.59         7.15         6.15         6.00          6.56

Note: data are weighted with the ONS school weight (sch_bwt) for (ii) and (iii), and with the ONS
student weight (STU_WT) in (iv) and (v).
Table 3.5: Percent of pupils in GCSE points per subject (ks4_mean) percentile
range of the population, 2003

Percentile   Point score     i            ii           iii          iv            v
range        2003            All pupils   All pupils   All pupils   All sampled   Responding
                             in England   in all       in           pupils in     pupils
                             schools      sampled      responding   responding
                                          schools      schools      schools
<=10         <=1.769          9.99         9.74         9.51         7.57          4.92
>10-25       >1.769-3.158    15.12        15.02        15.60        16.80         14.93
>25-75       >3.158-5.684    49.81        49.51        50.67        51.11         53.56
>75-90       >5.684-6.636    15.04        15.25        14.93        15.24         16.67
>90          >6.636          10.05        10.48         9.30         9.27          9.92

Note: percentages are weighted with the ONS school weight (sch_bwt) for (ii) and (iii), and with the
ONS pupil weight in (iv) and (v).
Another way of summarising what is happening at the final stage of response by
pupils is to compare the figures for group (v) in these two tables with the figures for
non-responding pupils. The latter are shown in Tables 3.6 and 3.7. For example,
while 9.6% of respondents are in the lowest range on the basis of the KS3 variable,
this is the case for 16.1% of non-respondents. The figures are 4.9% and 14.6%
respectively for the GCSE variable.
Table 3.6: Percent of non-responding pupils in KS3 (ks3_aps) population
percentile range, 2003

Percentile range   %
<=10               16.14
>10-25             19.48
>25-75             48.79
>75-90             11.08
>90                 4.51
Table 3.7: Percent of non-responding pupils in GCSE (ks4_mean) population
percentile range, 2003

Percentile range   %
<=10               14.59
>10-25             21.76
>25-75             44.63
>75-90             11.46
>90                 7.56
Figure 3.4 shows the shape of the distribution of ks4_mean in each group (i)-(v). The
distribution for responding students (v) is clearly seen to be different from that for the
population (i). The combination of higher mean and lower SD can be seen to have
contrasting effects at the two ends of the distribution: producing under-representation
among respondents of pupils at or below low levels of ks4_mean (e.g. a value of 2)
but with the distributions coinciding at higher levels (from about 6.5 upwards).
In the next section we restrict attention to the impact of response, whether by schools
or by pupils. We do not investigate any further the impact of sampling. The issue of
sampling is not without interest. We have seen that means and standard deviations do
change between stages (i) and (ii) due to the sampling of schools and between stages
(iii) and (iv) due to the sampling of pupils. Of course, change is entirely to be
expected – sampling error results in values of sample statistics that depart from the
population values. However, the extent of the changes in means and standard
deviations could be investigated further. For example, do population values lie within
the 95 or 99 percent confidence intervals that can be estimated from the sample
information? If not then questions could be asked as to whether the sampling was
done in an appropriate manner and whether, for example, exclusions of pupils from
the PISA survey population were being appropriately carried out.
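The check proposed here is easy to sketch. The function below treats the sample as a simple random sample, which the PISA design is not, so the intervals are only indicative; the function names are ours, not the report's:

```python
import math

def mean_ci(sample, level=2.576):
    """Approximate confidence interval for a mean, assuming simple
    random sampling (level=1.96 for 95%, 2.576 for 99%). The PISA
    design's clustering and stratification are ignored, so this is
    only an indicative check."""
    n = len(sample)
    m = sum(sample) / n
    var = sum((x - m) ** 2 for x in sample) / (n - 1)   # sample variance
    se = math.sqrt(var / n)                              # SE of the mean
    return m - level * se, m + level * se

def population_value_covered(pop_mean, sample, level=2.576):
    """Does the known population mean lie inside the sample's CI?"""
    lo, hi = mean_ci(sample, level)
    return lo <= pop_mean <= hi
```

If the population value fell outside such intervals repeatedly, that would be the signal, suggested above, to look more closely at how the sampling and exclusions were carried out.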
Figure 3.4: Kernel density estimates of ks4_mean distribution for all 5 groups of children in 2003

[Figure: kernel density curves of KS4 mean score (0-9, density 0-0.3) for the five groups: all 15 year olds in the population; pupils in all sampled PISA schools; pupils in responding schools; sampled pupils in responding schools; responding pupils.]

Note: Pupil groups (ii) and (iii) weighted with ONS school weight and groups (iv) and (v) with ONS pupil weight.
3.2 Response bias at school and pupil levels
We now conduct a more formal analysis of possible biases due to response at the
school and pupil levels for a selection of achievement variables. In terms of Figure 1.1
at the end of Chapter 1, we are now looking at the comparisons horizontally rather
than vertically.
a) School response bias. We first compare responding and non-responding
schools (the school as the unit of analysis) and then pupils in responding and
non-responding schools (the pupil as the unit of analysis). We allow for the
phenomenon of school replacement in various ways.
b) Student response bias. We compare responding and non-responding pupils.
We consider possible biases in both average achievement (as measured by the mean)
and in the dispersion of achievement (as measured by the standard deviation). We
also consider biases in the percentage of students above (or below) given score levels,
which are affected by any biases in either average scores or the dispersion of scores.
The issue of the appropriate weights to apply to the data is less important than in the
previous section. For example, large schools are over-represented in the sample of
schools and this was important to adjust for when comparing pupils in sampled
schools with pupils in all schools. But, conditional on having drawn such a sample, it
is less important to apply weights when we are comparing the responding with the
non-responding schools. Some of the results in this section are based on weighted data
and some are not.
In most of the statistical tests involving the pupil as the unit of analysis we adjust the estimated standard errors for the clustering of pupils within schools, recognising that the PISA sample design is not a simple random sample. However, we do not allow for stratification, and for this reason our estimated standard errors are likely to be somewhat too large.11 This means that we may sometimes fail to reject the null hypothesis of no difference between responding and non-responding schools or pupils when it is in fact false (Type 2 error). In some tests (especially of differences in variances between groups of pupils) we do not allow for clustering of pupils within schools and run the risk of rejecting the null hypothesis when it is true (Type 1 error) on account of the resulting underestimation of the standard errors.
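The clustering adjustment can be illustrated for the simplest case, the standard error of a mean. This is a bare-bones CR0-type sandwich estimator with a G/(G-1) small-sample factor, a simplification of what Stata's 'svy' commands compute (our own sketch, ignoring stratification and weights):

```python
import math
from collections import defaultdict

def clustered_se_of_mean(values, clusters):
    """Cluster-robust standard error of a sample mean. With every
    observation in its own cluster this reduces to the usual
    simple-random-sampling standard error."""
    n = len(values)
    mean = sum(values) / n
    sums = defaultdict(float)
    for x, g in zip(values, clusters):
        sums[g] += x - mean            # within-cluster sum of residuals
    G = len(sums)                      # number of clusters (schools)
    var = G / (G - 1) * sum(s * s for s in sums.values()) / n ** 2
    return math.sqrt(var)
```

When pupils within a school resemble each other, the within-cluster residual sums do not cancel and the clustered standard error exceeds the simple-random-sampling one, which is the phenomenon the text describes.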
3.2.1 Comparison of responding and non-responding schools
Arguably, the natural unit of analysis when analysing school response is the school
itself. It is schools that decide whether or not to respond at this stage of the survey.
For each school we therefore compute school values of all the variables by taking
their averages across each school’s 15 year old students. However, since PISA is
ultimately a survey of pupils' learning achievement, one can make the argument for using the pupil as the unit of analysis and we include some results on this basis. (Our analysis in the previous section of groups (ii) and (iii) was also based on the pupil as the unit of analysis.)

11 We use the 'svy' command in Stata and declare the schools as clusters. See Brown and Micklewright (2004:25) for a comparison of standard errors of PISA scores (i) estimated by the OECD, (ii) allowing for clustering but not stratification with the Stata 'svy' commands, and (iii) assuming simple random sampling.
A total of 570 schools were sampled for PISA: 190 ‘initial’ schools, 190 first
replacement schools, and 190 second replacement schools. The replacement schools
may or may not have been actually approached, depending on the response of the
initial school for which they were a replacement. Of the 570 schools, a total of 281
were approached at some stage: all 190 initial schools (of which 187 were effectively approached), 55 of the first replacement schools and 39 of the second replacements. Table 3.8 shows how they responded:
Table 3.8: School response (numbers of schools), 2003

                                     Initial   First         Second        Total
                                     school    replacement   replacement
Refused                              54        38            27            119
Participated (>50% student rate)     118       14            12            144
Participated (25 to 50% student rate) 13        2             0             15
Participated (<25% student rate)      2         1             0              3
Total                                187       55            39            281
Schools that participated but achieved a student response rate below 25% are treated as non-responding schools in the analysis.
There are several possible ways to allow for the phenomenon of replacement.
a) Comparison of responding with non-responding schools in the initial
sample (without taking first and second replacement into account). We can
focus on the initial school sample of 187 schools. (This was the approach
we were asked to follow in the bias analysis we conducted for DfES in
2004.) This means we ignore any resolution to response bias that may
come from replacement of a non-responding school by another school from within the same achievement stratum.
b) Comparison of all schools that responded with all schools that did not respond, irrespective of stage. Another approach is to compare all non-responding schools, whether initial, first or second replacement, with all responding schools, whether initial, first or second replacement. This analysis compares the 159 responding PISA schools (the 144 with student rates of >50% and the 15 with rates of 25-50%) with the 122 non-responding schools (the 119 with refusals and the 3 with pupil response rates of <25%). The advantage of this approach is that we include in the analysis all schools that were asked to respond. It also boosts the sample size.
c) Comparison of all schools that responded with second replacement schools that did not respond. One might argue that it is the characteristics of non-responding schools that were not replaced that are of most importance. These are the non-responding second-replacement schools. The final approach we adopt is therefore to compare the 159 responding schools (at whatever stage – initial, first or second replacement) with the 27 non-responding second replacement schools. (One problem with this analysis is that schools with low within-school student response which were not replaced cannot be identified. Hence, this analysis does not include schools with pupil response rates below 25%.)
Table 3.9 shows the characteristics of pupils in schools with low student response
rates – pupils included in approaches (a) and (b) but not (c). Pupils in these schools
have much lower average achievement, e.g. only 36 percent have 5 or more GCSEs at grades A*-C compared with 56 percent in the population of all 15 year old pupils.
Their schools were treated as not responding in the analyses that follow.
Table 3.9: Characteristics of pupils in schools with low pupil response rates (<25%), 2003

Variable     n     Missing   Mean*100
Male         624   0         50.439
FSM          621   3         25.427
Elevel5      611   13        49.002
MLevel5      610   14        53.479
Slevel5      612   12        48.561
Elevel5fg    604   20        50.093
MLevel5fg    604   20        55.562
Slevel5fg    600   24        50.572
ks4_5ac      624   0         36.074

Variable     n     Missing   Mean     SD
ks3Efg       604   20        4.903    1.188
ks3Mfg       604   20        5.201    1.382
ks3Sfg       600   24        4.970    1.181
ks3_aps      591   33        30.578   7.174
ks4_pts      624   0         31.430   21.189
ks4_ptc      624   0         28.012   17.605
ks4_mean     624   0         3.387    2.044
ks4_ac       624   0         3.524    3.830

Note: ONS school weights (sch_bwt) applied.
a) All schools in the initial school sample
Table 3.10 summarises the sample of schools used in this approach.
Table 3.10: School type by response (number of schools): approach (a), 2003

School type      Non-responding   Responding   Total
LEA maintained   49               123          172
Independent      7                6            13
Total            56               129          185

Note: for two responding schools no information on school type is available.
Differences in means and differences in standard deviations are shown in Tables 3.11
and 3.12 respectively.
We find no significant differences in mean values between responding and non-responding schools, other than in the FSM eligibility rate (at the 5% level), where responding schools have a lower rate. Differences in dispersion are statistically significant, however: responding schools display lower variation in KS3 and KS4 results (NB this relates to variation in their school-average values of pupils' results). The differences in the variance of KS4_mean are clearly visible in the distributions shown in Figure 3.5.
Table 3.11: Differences in means: approach (a), 2003

             Responding   Non-responding
             school       school
             Mean*100     Mean*100         Difference   p      Observations
Male         47.79        45.96            -1.83        0.77   185
FSM          11.42        19.92             8.50        0.04   175
ELevel5fg    72.87        72.86            -0.01        0.99   182
MLevel5fg    73.44        72.38            -1.06        0.82   183
SLevel5fg    73.23        73.40             0.17        0.98   182
KS4_5ac      53.58        57.47             3.89        0.56   185

             Mean         Mean
KS3_aps      34.53        34.54             0.01        0.99   182
KS4_mean     4.52         4.53              0.01        0.97   185
KS4_pts      41.10        42.63             1.53        0.66   185

Note: p gives the significance level at which we can just reject the null hypothesis that the values are the same between responding and non-responding schools (adjusted Wald-test). Results are weighted by ONS school weight sch_bwt.
Table 3.12: Differences in variances: approach (a), 2003

             Responding   Non-responding   Test of    Responding & non-
             school       school           SD1=SD2    responding schools
             SD1          SD2              p          SD
Male         16.32        25.66            0.00       19.55
FSM          8.94         16.51            0.00       11.72
ELevel5fg    15.14        20.67            0.01       16.94
MLevel5fg    13.75        19.65            0.00       15.71
SLevel5fg    14.85        21.89            0.00       17.19
KS4_5ac      18.87        26.57            0.00       21.43
KS3_aps      2.99         4.63             0.00       3.54
KS4_mean     0.91         1.37             0.00       1.07

Note: results are unweighted.
Figure 3.5: Distributions of KS3 and KS4 scores: approach (a), 2003

[Figure: kernel density estimates of KS3 average point score (left panel, 0-60) and mean GCSE point score (right panel, 0-8) for responding and non-responding schools.]

Note: unweighted results. Kernel density estimates.
b) All responding and non-responding schools (initial, first, second replacement)
Table 3.13 summarises the sample of schools used in this second approach.
Table 3.13: School type by response (number of schools): approach (b), 2003

School type      Non-responding   Responding   Total
LEA maintained   106              148          254
Independent      15               9            24
Total            121              157          278

Note: for two non-responding and one responding school no information is available on school type.
Table 3.14: Differences in means: approach (b), 2003

             Responding   Non-responding
             school       school
             Mean*100     Mean*100         Difference   p      Observations
Male         48.73        50.24             1.51        0.73   278
FSM          11.58        20.32             8.74        0.00   258
ELevel5fg    72.15        64.92            -7.23        0.10   271
MLevel5fg    73.19        66.29            -6.90        0.09   272
SLevel5fg    73.52        65.97            -7.55        0.10   271
KS4_5ac      55.52        53.67            -1.85        0.72   278

             Mean         Mean
KS3_aps      34.54        33.38            -1.16        0.15   271
KS4_mean     4.55         4.31             -0.24        0.36   278
KS4_pts      42.09        39.81            -2.28        0.42   278

Note: ONS school weights applied. Adjusted Wald-test.
Table 3.15: Differences in variances: approach (b), 2003

             Responding   Non-responding   Test of    Responding & non-
             school       school           SD1=SD2    responding schools
             SD1          SD2              p          SD
Male         17.40        26.11            0.00       21.65
FSM          9.49         17.91            0.00       13.88
ELevel5fg    15.46        23.43            0.00       19.39
MLevel5fg    13.81        21.96            0.00       17.83
SLevel5fg    15.17        24.00            0.00       19.53
KS4_5ac      19.25        26.99            0.00       22.89
KS3_aps      3.08         4.63             0.00       3.83
KS4_mean     0.94         1.43             0.00       1.18

Note: results are unweighted.
The conclusions are largely unchanged from approach (a), as regards both the means and the variances. There are no significant differences in means (except for FSM) but there are significant differences in the variances. The larger sample size does, however, begin to push some of the differences in means towards statistical significance at the 10% level.
c) All responding schools and non-responding second replacement schools
Table 3.16 summarises the sample of schools used in this third approach.
Table 3.16: School type by response: approach (c), 2003

School type      Non-responding   Responding   Total
LEA maintained   22               148          170
Independent      4                9            13
Total            26               157          183

Note: for one non-responding and two responding schools school type information is missing.
Table 3.17: Differences in means: approach (c), 2003

             Responding   Non-responding
             school       school
             Mean*100     Mean*100         Difference   p      Observations
Male         48.73        52.33             3.60        0.57   183
FSM          11.58        15.82             4.24        0.36   175
ELevel5fg    72.15        58.22            -13.93       0.12   179
MLevel5fg    73.19        59.17            -14.02       0.14   179
SLevel5fg    73.52        59.14            -14.38       0.15   179
KS4_5ac      55.52        48.91            -6.61        0.50   183

             Mean         Mean
KS3_aps      34.54        32.32            -2.22        0.16   179
KS4_mean     4.55         4.22             -0.33        0.47   183
KS4_pts      42.09        37.01            -5.08        0.38   183

Note: ONS school weights applied. Adjusted Wald-test.
Table 3.18: Differences in variances: approach (c), 2003

             Responding   Non-responding   Test of    Responding & non-
             school       school           SD1=SD2    responding schools
             SD1          SD2              p          SD
Male         17.40        25.33            0.02       18.76
FSM          9.49         19.31            0.00       11.35
ELevel5fg    15.46        23.43            0.00       16.85
MLevel5fg    13.81        22.89            0.00       15.41
SLevel5fg    15.17        24.49            0.00       16.77
KS4_5ac      19.25        26.56            0.00       20.36
KS3_aps      3.08         4.45             0.00       3.30
KS4_mean     0.94         1.35             0.00       1.01

Note: results are unweighted.
With the smaller sample size in approach (c) we cannot reject the hypothesis that mean values are the same for the populations from which the responding and non-responding schools are drawn. But we can once again reject the hypothesis that the SDs are the same.
Note that the differences in means in Table 3.17 are much larger than in Table 3.11 with approach (a), although they are still insignificant. The key thing to note here is that the non-responding schools in approach (c) are all different schools from those non-responding in approach (a). In approach (c), non-responding schools are all second replacement schools. In approach (a) they are all initial schools. The values of the means for non-responding schools in Table 3.17 are notably lower than those in Table 3.11.
In all three approaches, (a)-(c), we have so far used the school as the unit of analysis, on the grounds that we are investigating response by schools. However, one might argue that what matters are the characteristics of the pupils in the responding schools compared to those of the pupils in the non-responding schools. This would imply using the pupil as the unit of analysis. In Table 3.19 we therefore compare means for pupils in responding schools with those for pupils in the second replacement non-responding schools (approach (c)).
Table 3.19: Differences in means: approach (c), pupil as unit of analysis, 2003

             Responding   Non-responding
             school       school
             Mean*100     Mean*100         Difference   p      Observations
Male         47.39        53.68             6.29        0.00   40,095
FSM          11.86        16.93             5.07        0.00   38,622
ELevel5fg    71.96        65.27            -6.69        0.00   38,325
MLevel5fg    72.71        69.47            -3.24        0.00   38,475
SLevel5fg    72.49        67.30            -5.19        0.00   38,376
KS4_5ac      55.31        55.80             0.49        0.57   40,095

             Mean         Mean
KS3_aps      34.21        33.78            -0.43        0.00   38,018
KS4_mean     4.33         4.40              0.07        0.07   40,095
KS4_pts      42.63        42.74              0.11       0.78   40,095

Note: ONS school weights applied. Adjusted Wald-test. The clustering of the students within schools is taken into account.
With the pupil as the unit of analysis there is a huge increase in sample size – from about 180 in the case of the school to around 40,000. Not surprisingly, significant differences are easier to find, e.g. the five percentage point difference in FSM receipt between pupils in responding and non-responding schools. Nevertheless, it is still the case that for all three GCSE variables we find no significant difference at the 5% level between pupils in responding and non-responding schools.
Note that the significant difference in the means for the continuous KS3 variable, ks3_aps, is the opposite of what one would expect from Figure 3.1. That graph showed group (iii), pupils in responding schools, to have a lower mean for this variable than group (ii), all pupils in all sampled schools. But Table 3.19 shows the respondents to have a higher mean than the non-respondents. The explanation is linked to the point made above that the non-responding schools in approach (c) are different schools from those in approach (a). Group (ii), all pupils in all sampled schools, includes pupils in schools not responding to the initial sample and in schools not approached at all. Against that base, the pupils in responding schools have a lower mean score on ks3_aps (Figure 3.1). But compared to the pupils in second replacement schools which do not respond, their mean is higher. All this underlines the need for a separate investigation of the phenomenon of replacement, which we have not undertaken.
3.2.2 Comparison of responding and non-responding pupils
The analysis of response bias at the student level includes 5,213 sampled pupils in 159
schools of which 3,766 responded (72.2%) and 1,447 did not (27.8%).12
The differences in means and in percentages in different categories are now clearly significant (with the exception of the difference in the percentage male).13 The differences in dispersion are also significant, although the lack of adjustment for clustering when testing for differences in the standard deviations implies that the p value of 0.02 for ks3_aps should be viewed as a sizeable underestimate – it is plausible that there is in fact no significant difference at the 5% level. Respondents display a significantly higher mean score and a significantly lower dispersion in scores, at least for the KS4 variables. Both differences contribute to the large gaps between respondents and non-respondents in the percentages at or above KS3 level 5 or with 5+ GCSEs A*-C: around 11 and 17 percentage points respectively.
The higher mean and lower dispersion of respondents is particularly clear in the graph of the distribution of KS4_mean scores – the right hand side of Figure 3.6.
Table 3.20: Differences in means and percentages (weighted), 2003

             Responding pupil    Non-responding pupil
             Obs.   Mean*100     Obs.   Mean*100        Difference   p      All pupils Obs.
Male         3695   46.27        1396   46.32            0.05        0.98   5091
FSM          3475   10.40        1341   13.84            3.44        0.00   4816
ELevel5fg    3481   75.49        1328   64.00           -11.49       0.00   4809
MLevel5fg    3481   76.63        1332   63.53           -13.10       0.00   4813
SLevel5fg    3481   75.49        1331   64.11           -11.38       0.00   4812
KS4_5ac      3695   60.95        1396   44.19           -16.76       0.00   5091

             Obs.   Mean         Obs.   Mean
KS3_aps      3455   34.76        1317   32.85           -1.91        0.00   4772
KS4_mean     3695   4.60         1396   3.91            -0.69        0.00   5091
KS4_pts      3695   45.77        1396   37.48           -8.29        0.00   5091

Note: ONS student weights (STU_WT) are applied to the data. We use the adjusted Wald-test for continuous variables (ks3_aps, ks4_mean) and allow for clustering by school. The χ2-tests for the binary variables are based on the unweighted data and with no allowance for clustering, but the means displayed are the weighted values.
12 The sampled pupils are in 159 schools in the ONS file on all sampled pupils. But once they are merged with the national file using student identifiers they are in as many as 175 schools. We presume that the reason for this difference is that some pupils changed schools after sitting the PISA test, with the new school recorded in the national file. For the student analysis we use the school identifiers of the pupil file.
13 Chapter 2 noted our concern with the ONS student weight for 2003 and so we also produced unweighted results. Differences between weighted and unweighted results are only slight.
Table 3.21: Differences in variances (unweighted), 2003

             Responding   Non-responding   Test of    Responding & non-
             pupils       pupils           SD1=SD2    responding pupils
             SD1          SD2              p          SD
KS3_aps      6.28         6.63             0.02       6.43
KS4_mean     1.61         1.92             0.00       1.72
KS4_pts      18.6         21.5             0.00       19.8

Note: results are for unweighted data. Clustering of pupils within schools is not taken into account and hence standard errors are understated.
Figure 3.6: Distributions of KS3 and KS4 scores for responding and non-responding pupils, 2003

[Figure: kernel density estimates of KS3 average point score (left panel, 0-60) and mean GCSE point score (right panel, 0-8) for responding and non-responding pupils.]

Note: data weighted by ONS student weight. Kernel density estimates.
3.3 Using OECD weights and our own response weights
In the two previous sections we applied the ONS weights. These take the sample design into account but not the response levels of either schools or students, or their patterns of response (e.g. the association with achievement levels).
In this section we investigate the impact of two sets of weights that in addition allow for non-response. In both cases we are interested in whether the non-response weights remove any of the apparent response bias in the mean or variance of KS3 and KS4 scores.
The first set of weights are the OECD weights described in Chapter 2. These adjust for the levels of response by both schools and students and, in the case of the school weights, for the average achievement levels (in GCSE terms) of responding schools. The second set of weights are constructed by us and are intended to go further than the OECD weights in the adjustment for response at the pupil level. These weights take into account the relationship between the probability of pupil response and achievement levels.
3.3.1 Calculation of our response weights
A standard approach to adjusting for non-response bias when one has additional information on respondents and non-respondents is to estimate a statistical model of the response probability and to use this to construct response weights. Estimates of parameters of interest, e.g. mean PISA scores, standard deviations, proportions below given thresholds, etc., can then be calculated with and without these non-response weights, thus assessing the degree of bias in data to which only design weights have been applied.
a) Modelling the response probability
We estimate the probability that a sampled pupil responds to PISA with a logistic regression model. The dependent variable, Yi, takes the value 0 if a pupil did not respond and 1 if a pupil responded. The probability that Yi=1 in the logit model is specified as 1/[1+exp(-βXi)], where Xi is a vector of explanatory variables. The model is estimated by maximum likelihood.
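The estimation can be sketched as follows. This is a self-contained Newton-Raphson fit in Python (the report does not say which software was used for estimation; the function names and the inverse-probability weights at the end are ours, anticipating section 3.3.1(b)):

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Maximum-likelihood logit fit by Newton-Raphson:
    P(Y=1) = 1 / (1 + exp(-X @ beta)). X should include a constant
    column; y is 0/1 response."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)                  # logistic variance weights
        grad = X.T @ (y - p)               # score vector
        hess = X.T @ (X * W[:, None])      # observed information
        beta += np.linalg.solve(hess, grad)
    return beta

def response_weights(X, beta):
    """Inverse predicted response probability for each pupil:
    1/p = 1 + exp(-X @ beta)."""
    return 1.0 + np.exp(-X @ beta)
```

In the report's application, X would contain the covariates listed below (gender, FSM eligibility, the ks4_pts quadratic, and so on); here it is left generic.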
We use two variants of the specification of the model. First, we estimate response as a function of gender, free school meal eligibility, achievement (a KS4 measure), private school attendance and region (Models 1a and 1b). Second, we exclude the regional variables but include a dummy variable for each school (Model 2). The inclusion of school dummies takes into account the possibility that individual response behaviour is determined in part by characteristics of the school that we are unable to measure.
The only continuous variable in the models is the GCSE variable, ks4_pts, for which a quadratic is included to pick up non-linearities. Conditional on the inclusion of the ks4_pts variable, other measures of achievement, such as the KS3_aps measure or other GCSE variables, were much less important or were insignificant and were not included in the final specification.14
One disadvantage of including school dummies in the model is that we need to exclude pupils in schools where either no sampled pupil responded or all sampled pupils responded. In practical terms, this means that we exclude 77 sampled pupils in schools where all pupils responded and 49 sampled pupils in schools where no pupil responded. The number of sampled pupils available is therefore reduced from 5,213 to 5,087.

14 We also tried including the school mean value of the KS4 measure (calculated over all 15 year olds and not just the sampled pupils) but it proved insignificant.
Table 3.22 shows the estimated β coefficients for both models. In Model 1a, the standard errors are adjusted to take into account the clustering of pupils within schools, while in Model 1b the standard errors are unadjusted. Note that we do not present the estimated coefficients for the school dummy variables in Model 2. These are significant at least at the 5% level in 43 out of 156 cases.
Table 3.22: Logistic regression models of pupil response, 2003

                             Model 1a             Model 1b             Model 2
Male                         0.088 (0.075)        0.088 (0.066)        0.138 (0.077)*
FSM                          0.014 (0.109)        0.014 (0.103)        -0.074 (0.117)
Ks4_pts                      0.060 (0.006)***     0.060 (0.006)***     0.073 (0.007)***
Square of ks4_pts/100        -0.047 (0.007)***    -0.047 (0.006)***    -0.056 (0.008)***
Private school               0.187 (0.365)        0.187 (0.160)        0.300 (0.687)
Missing value for at
least one variable           -0.656 (0.321)**     -0.656 (0.193)***    -0.866 (0.313)***
North West England           0.392 (0.227)*       0.392 (0.116)***
East Midlands                0.412 (0.172)**      0.412 (0.092)***
East of England              0.301 (0.214)        0.301 (0.092)***
Constant                     -0.721 (0.145)***    -0.721 (0.124)***    -1.292 (0.407)***
Observations                 5213                 5213                 5087
Pseudo R-squared             0.04                 0.04                 0.16
Log-likelihood               -2940.82             -2940.82             -2526.25
F statistic                                                            19.32

Note: estimates of standard errors in Model 1a take clustering within schools into account while estimates in Model 1b do not. Model 2 excludes regional dummies but includes 156 school dummies (results for school dummies not presented) and allows for clustering of the pupils within schools. A missing dummy variable was included and set to 1 if values of any variable were missing. (Most of the missing variables were missing for the same pupils.) Data are unweighted.
Model 1 explains only a small fraction of pupils' response (pseudo R² = 0.04). Not surprisingly, given results shown earlier in the report, the quadratic in ks4_pts performs very well (with a t-value of 10 for the linear term and almost 7 for the quadratic term). Of the other variables, only the regional dummies are significant. Hence, once we control for GCSE scores, the FSM and private school dummies are both insignificant, as is gender.
Results of Model 2 are very similar to those of Model 1. However, given the significance of about a quarter of the school dummy variables, this model is better at explaining the variation in response (pseudo R² = 0.16).
Figure 3.7 shows the variation in the predicted probability of response with the total GCSE points score, holding other variables constant at their mean values. We plot the predicted probability for the range of GCSE point scores that spans the 5th to the 95th percentiles in the sample. The turning point of the quadratic is close to the top of this range, suggesting that there is little if any real downturn in response at high levels of GCSE scores. (Note that while we showed earlier that the variance of scores was lower for respondents than for non-respondents, we did not investigate separately response at either end of the score distribution.) The predicted probability of response rises, ceteris paribus, from below 0.5 at the bottom of the GCSE score range to about 0.8 towards the top of the range.
Figure 3.7: Predicted probability of pupil response by GCSE point score, 2003

[Figure: predicted probability of response (vertical axis, 0-0.9) plotted against KS4_pts (horizontal axis, 0-80), rising from below 0.5 to about 0.8.]

Note: All other variables are set to their mean values. The graph displays predicted probabilities for GCSE points scores between the 5th and 95th percentiles of the sample.
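The shape of this curve follows directly from the Model 1 coefficients in Table 3.22. A small check (we drop the other covariates rather than holding them at their means, so the level of the curve is only approximate, but with these coefficients the probability still rises from roughly 0.46 at a score of 10 to about 0.77 at 60, broadly consistent with the figure):

```python
import math

# Model 1 coefficients from Table 3.22: constant, ks4_pts, (ks4_pts**2)/100.
# Other covariates are ignored here, so the level is only illustrative.
B0, B1, B2 = -0.721, 0.060, -0.047

def p_response(ks4_pts):
    """Predicted response probability from the quadratic logit index."""
    index = B0 + B1 * ks4_pts + B2 * ks4_pts ** 2 / 100
    return 1.0 / (1.0 + math.exp(-index))

# Turning point of the quadratic index: d/dx (B1*x + B2*x**2/100) = 0
turning_point = -B1 * 100 / (2 * B2)   # about 63.8 GCSE points
```

The turning point near 64 points sits close to the 95th percentile of the score distribution, which is the basis for the statement above that there is little real downturn in response among high achievers.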
b) Weighting for response
We use the results of the logistic regression models to calculate a response weight for each respondent. This weight is simply the inverse of the predicted probability of response. Hence a respondent with a low GCSE score has a higher weight than one with a high GCSE score. Summary statistics are given in Table 3.23. (There is clearly a case for trimming extreme values of the weights but we have not done this.) Weight1 is derived from Model 1 and weight2 from Model 2.
Table 3.23: Summary statistics of non-response weights derived from logistic regression, 2003

Variable   Obs    Mean       Std. Dev.   Min        Max
weight1    3766   1.383427   0.2367021   1.158141   3.05661
weight2    3703   1.38499    0.5953457   1          19.99076

Note: weight1 gives the inverse of the response probability derived from Model 1 (Table 3.22); weight2 refers to Model 2.
c) Construction of a combined weight
Besides allowing for non-response, any weight needs to allow for the sample design. We also wish to incorporate the adjustment for the level and pattern of response at the school level that is incorporated in the OECD weights. This adjustment is through the OECD school weight 'scweight' described in Chapter 2. Hence our alternative weight to the OECD student weight for pupil j in school i is calculated as follows, where Mi is the total number of 15 year old pupils in school i and mi is the number of sampled pupils (see equation (3) in Chapter 2 section 2.2.1):

R_WEIGHT1ij = scweighti * (Mi/mi) * weight1ij
R_WEIGHT2ij = scweighti * (Mi/mi) * weight2ij
The weights R_WEIGHT1 and R_WEIGHT2 allow for the variation of the pupil
response probability with pupil achievement levels, something not present in the
OECD weight. This should be an important advance since much of any response bias
in achievement scores comes through pupil response rather than school response, as
demonstrated earlier (e.g. Figure 3.3). R_WEIGHT1 has the disadvantage compared
to the OECD student weight of not explicitly adjusting for the average level of pupil
response within the school, while in the case of R_WEIGHT2 this is picked up by the
school dummies in Model 2.
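Putting the pieces together, the combined weight and the coefficient of variation reported in Table 3.24 below can be computed as follows (only the formulas come from the text; the input values in the test are purely illustrative):

```python
def combined_weight(scweight, M, m, resp_weight):
    """R_WEIGHT for a pupil: OECD school weight x within-school
    inverse sampling fraction (M/m) x inverse response probability,
    as in the equations above."""
    return scweight * (M / m) * resp_weight

def coefficient_of_variation(weights):
    """CV = sample standard deviation / mean, the summary used in
    Table 3.24 to compare the spread of the different weights."""
    n = len(weights)
    mean = sum(weights) / n
    sd = (sum((w - mean) ** 2 for w in weights) / (n - 1)) ** 0.5
    return sd / mean
```

A higher CV means the weighted estimates lean more heavily on a subset of respondents, which is why the untrimmed outliers in R_WEIGHT2 matter for the comparison below.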
Table 3.24 compares summary statistics for the ONS student weight, the OECD
student weight and our two new weights. The final column gives the coefficient of
variation – the standard deviation divided by the mean. This shows that the OECD
weight has about twice the variation of the ONS weight. R_WEIGHT1 has in fact
somewhat less variation than the OECD weight, which we presume is due to it not
taking explicit account of the average level of response within each school. This
average level is taken into account in R_WEIGHT2, which displays the most variation
of any of the weights (this may well be affected by outlier values that we have not
trimmed).
Table 3.24: Summary statistics of different student weights, 2003

Variable                       Obs    Mean     Std. Dev.   Min      Max       CV
ONS student weight STU_WT      3752   0.0302   0.0053      0.0092   0.0643    0.175
OECD student weight w_fstuwt   3766   154.83   51.08       61.46    579.69    0.330
R_WEIGHT1                      3766   150.12   40.20       15.25    534.49    0.268
R_WEIGHT2                      3703   148.51   68.39       54.21    2173.74   0.461
3.3.2 Results
Table 3.25 shows the result of applying the three sets of weights: ONS student, OECD
student and our own calculated weights. We also show results for unweighted data.
Table 3.25: KS3 and KS4 scores and Free School Meal receipt for responding pupils, different student weights, 2003

                          KS3_aps   KS4_mean   KS4_pts   FSM receipt %
Mean
  Unweighted              34.75     4.60       45.84     10.50
  ONS STU_WT              34.76     4.60       45.77     10.40
  OECD w_fstuwt           34.71     4.59       45.63     10.56
  R_WEIGHT1               34.18     4.44       43.77     11.44
  R_WEIGHT2               34.20     4.39       43.50     11.56
Standard deviation
  Unweighted              6.28      1.61       18.55     –
  ONS STU_WT              6.30      1.61       18.55     –
  OECD w_fstuwt           6.34      1.63       18.66     –
  R_WEIGHT1               6.57      1.73       19.80     –
  R_WEIGHT2               6.57      1.76       20.13     –
Observations 1            3,425     3,664      3,664     3,444
Observations 2            3,389     3,611      3,611     3,410
Observations 3            3,455     3,695      3,695     3,475

Note: ‘Observations 1’ refers to calculations for R_WEIGHT1. ‘Observations 2’ refers to calculations for R_WEIGHT2. ‘Observations 3’ refers to unweighted or weighted results using the ONS student weight STU_WT or the OECD student weight w_fstuwt.
The results using the OECD student weight are very similar to those using the ONS
weight and those with no weights at all. There are very small reductions in the means,
very small increases in the standard deviation, and a very small increase in the
percentage in receipt of Free School Meals. We reproduce below the means for all
sampled pupils in responding schools, group (iv), that were given earlier in Table 3.3.
Comparing these figures with those in Table 3.25, the conclusion seems to be that the
OECD weights remove only a very small part of the response bias. For example, the
mean for responding pupils using the OECD weight is 4.9% higher than the mean for
all sampled pupils. This is not surprising since the OECD weights do not incorporate
any direct adjustment for the way that the pupil response probability varies with
achievement level.
KS3_aps   KS4_mean   KS4_pts   FSM %
34.23     4.41       43.50     11.36
By contrast, our alternative weights, R_WEIGHT1 and R_WEIGHT2, do include an
adjustment of this type. Table 3.25 shows that their application does bring the mean
KS3 and KS4 scores for responding pupils very close to the means for all sampled
pupils. The same is true for the Free School Meals rate. And our weights push up the standard deviations to around their levels for all sampled pupils shown
in Figures 3.1 to 3.5 (group (iv) values). (In fact they push them slightly beyond.)15
3.4 Response bias in PISA scores
We now estimate the impact of response bias on PISA achievement scores in 2003 using three different approaches explained below: (a) use of z-scores, (b) regression adjustment and (c) the use of response weights. Our focus is on response bias at the pupil level. However, when we use the response weights we are also taking account of response bias at the school level (through the adjustment of the OECD school weight). We focus mainly on bias in the mean but include some results for the standard deviation.
The mean PISA scores in reading, maths and science for the final sample of
responding pupils in responding schools in England in 2003 are given in Table 3.26.
Note these are results obtained by applying the ONS student weights, and not the
OECD student weights, although the results obtained with the latter are very similar
(see Table 3.30 below).
Table 3.26: PISA means and standard deviations (ONS weights), 2003

        PISA reading   PISA maths   PISA science
Mean    507.2          507.7        520.1
SD      93.7           92.5         102.2

Note: mean scores are calculated with the average of the five PVs. The SD is the average of the SDs calculated separately for each PV. The calculations are based on the final PISA sample of respondents but are weighted with the ONS student weights.
How far might these scores have changed if there had not been any response bias at
the pupil level?
15 The appropriate comparison is arguably slightly different to the one we have described, since the OECD school weight that forms part of our own weight variables adjusts for non-response at the school level. However, the results as described are very unlikely to mislead.
3.4.1 Z-score method
One method of addressing the issue is to transform differences in means in KS3 or
KS4 scores between respondents and non-respondents into z-scores (i.e. SD units) and
translate these into PISA scores by multiplying by the values of the PISA SDs taken
from Table 3.26 (i.e. with ONS weights applied). The results of this procedure are
shown in Table 3.27.
Table 3.27: Calculation of bias in the PISA means: z-score method, 2003
7.
6.
5.
3.
2.
1.
Multiplied Multiplied
4.
Multiplied
SD
Mean
Mean
with SD
with SD
(Mrwith SD
responding sampled sampled
PISA
PISA
Ms)/SD
PISA read
pupils=Mr pupils=Ms pupils
science
maths
KS3_aps
34.75
34.23
6.43
0.081
7.59
7.49
8.28
KS4_mean
4.60
4.42
1.72
0.105
9.84
9.71
10.73
Note: the KS3 and GCSE mean scores are calculated on unweighted data, but weighting with ONS
weights does not make a big difference.
Cols. 1 and 2 give mean pupil scores for summary KS3 and KS4 variables among responding students and all sampled students in responding schools (i.e. respondents and non-respondents taken together). Col. 3 gives the SD of the sampled students. Col. 4 gives the difference in means between respondents and all students, standardised by the all-sampled-student SD. In other words, this is a z-score: responding students had a mean that was 0.081 (KS3_aps) and 0.105 (KS4_mean) SDs above the mean for all sampled students. Cols. 5–7 multiply these values by the PISA SDs for reading, maths and science respectively shown in Table 3.26, expressing the KS3 and KS4 differences as PISA point differences.
The mean difference between responding pupils and sampled pupils in the KS3 variable translates into a difference of about 7 or 8 PISA points. The difference in means for the KS4 variable translates into slightly larger ‘PISA differences’ of about 10 points.
Taking the adjustment based on the KS3 results, mean England PISA scores in 2003
allowing for student non-response can be estimated to be 500 instead of 507 for
reading, 500 instead of 507 for maths and 512 instead of 520 for science.
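The z-score calculation is simple enough to sketch directly. The inputs below are the rounded figures from Table 3.27, so the result (about 7.58) differs in the second decimal from the 7.59 reported, which was presumably computed from unrounded means:

```python
def zscore_bias(mean_resp, mean_sampled, sd_sampled, pisa_sd):
    """Translate the respondent vs all-sampled gap in a domestic score
    into PISA points: standardise the gap, then scale by the PISA SD."""
    z = (mean_resp - mean_sampled) / sd_sampled
    return z * pisa_sd

# KS3_aps row of Table 3.27 combined with the PISA reading SD of Table 3.26
bias_reading = zscore_bias(34.75, 34.23, 6.43, 93.7)
```
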
3.4.2 Regression method
The disadvantage of the z-score method is that it makes a strong assumption about the
relationship between PISA scores and domestic achievement scores, namely that they
are highly correlated. Imagine that there were in fact no association between the two.
In this case response bias in KS3 or KS4 scores would have no implication for PISA
scores. This motivates our second approach, which is to regress PISA scores on KS3
(or KS4) scores for respondents and use the estimated coefficient of the KS3 (or KS4)
variable as the PISA equivalent of a KS3 (or KS4) point. This equivalent can then be
applied to the difference in the means for responding and sampled pupils.
Regression results are presented in Table 3.28. (Chapter 5 describes these regressions
in more detail; a simple linear model seems a reasonable fit to the data.) The
domestic achievement scores turn out to explain between 60 and 70 percent of the
variation in PISA scores. The correlation between, for example, PISA maths score and
KS3_aps is 0.83; the correlation with KS4_mean is 0.76. Applying the KS3_aps
coefficients to the differences in means in KS3_aps of respondents and sampled
pupils (shown in Table 3.27), yields an estimate of bias in the PISA means of about 6
points, which rises to about 8 points when the KS4 regression is used instead (see
Table 3.29). These figures are somewhat smaller than those obtained with the z-score method.
Table 3.28: Regression of PISA scores on KS3 and KS4 measures, respondents, 2003

Regression of PISA scores on a KS3 measure
            read       maths      science
ks3_aps     11.027     11.510     12.265
            (0.184)    (0.181)    (0.196)
Constant    120.055    103.688    89.942
            (6.805)    (6.471)    (7.093)
Obs.        3435       3435       3435
R2          0.65       0.68       0.66

Regression of PISA scores on a KS4 measure
            read       maths      science
ks4_mean    42.543     42.049     45.545
            (0.887)    (0.930)    (0.999)
Constant    311.119    313.838    310.153
            (4.842)    (4.940)    (5.352)
Obs.        3672       3672       3672
R2          0.61       0.58       0.58

Note: data weighted with ONS student weight STU_WT. Standard errors in brackets. Clustering of pupils within schools is taken into account in estimation of standard errors.
Table 3.29: Calculation of bias in PISA means: regression method, 2003

            reading   maths   science
KS3_aps     5.84      6.10    6.50
KS4_mean    7.66      7.57    8.20
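The regression method reduces to multiplying the estimated coefficient by the respondent/sampled gap in the domestic score. A sketch, using the rounded figures from Tables 3.27 and 3.28 (the entries in Table 3.29 are slightly larger because they use unrounded means):

```python
def regression_bias(coef, mean_resp, mean_sampled):
    """PISA-point bias implied by a gap in a domestic score: the regression
    coefficient is the PISA equivalent of one domestic point."""
    return coef * (mean_resp - mean_sampled)

# ks3_aps coefficient for reading (11.027) applied to the KS3_aps gap
bias_reading_ks3 = regression_bias(11.027, 34.75, 34.23)
```
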
3.4.3 Use of response weights
The third method to estimate the extent of bias is to simply calculate mean scores with
and without use of response weights. The weights we use are those applied in Section
3.3 – the OECD student weights and our own weights based on the logistic regression
modelling of student response. We compare results for unweighted scores, results
with the ONS student weight (STU_WT), results with the OECD weights, and results
with our weights R_WEIGHT1 and R_WEIGHT2. The use of either the OECD
weights or our own weights also incorporates adjustment for non-response by schools.
The OECD weights adjust only for the level of pupil response within schools while
our weights allow in addition for the variation of pupil response with domestic
achievement as measured by GCSE scores.
Application of the OECD weights slightly reduces the means and slightly increases
the standard deviations. Application of our weights has a larger impact in both cases.
We can estimate response bias in mean scores by comparing the results using the
ONS weights with the results using our weights. Table 3.31 shows this calculation for
both the mean and the standard deviation, using R_WEIGHT1. The upward bias in
PISA scores in the sample of respondents is estimated as about 6 or 7 points in the
mean. The downward bias in the standard deviation is estimated at about 3 or 4 points.
Table 3.30: PISA mean scores for responding pupils, different student weights, 2003

                      PISA maths   PISA reading   PISA science   Obs.
Mean
  Unweighted          508.20       507.51         520.55         3766
  ONS STU_WT          507.68       507.15         520.07         3752
  OECD w_fstuwt       506.95       506.31         519.01         3766
  R_WEIGHT1           501.47       500.63         513.16         3721
  R_WEIGHT2           501.40       499.69         512.30         3664
Standard deviation
  Unweighted          92.72        93.74          102.42         3766
  ONS STU_WT          92.48        93.65          102.23         3752
  OECD w_fstuwt       93.14        94.40          103.10         3766
  R_WEIGHT1           95.36        97.23          105.26         3721
  R_WEIGHT2           95.32        97.27          105.38         3664

Note: the standard deviation is calculated as the mean of the five standard deviations of each plausible value. The mean value refers to the pupils’ mean across all five plausible values.
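The aggregation of plausible values described in the note can be sketched with toy data (the PVs below are invented for illustration):

```python
import statistics

# Five plausible values (PVs) per pupil, three pupils (toy data)
pvs = [
    [500, 510, 495, 505, 500],
    [430, 440, 435, 425, 430],
    [560, 555, 565, 570, 560],
]

# Mean: average each pupil's five PVs, then average over pupils
pupil_means = [statistics.mean(p) for p in pvs]
overall_mean = statistics.mean(pupil_means)

# SD: compute the SD separately for each of the five PVs, then average
per_pv_sds = [statistics.pstdev([p[k] for p in pvs]) for k in range(5)]
overall_sd = statistics.mean(per_pv_sds)
```
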
Table 3.31: Calculation of bias in PISA means and standard deviations: response weights method, 2003

                     reading   maths   science
Mean                 +6.5      +6.2    +6.9
Standard Deviation   -3.6      -2.9    -3.0

Note: the table shows differences between means and standard deviations in Table 3.30 calculated with the ONS student weight and R_WEIGHT1.
Table 3.32 summarises our results, showing the estimated upwards bias in mean PISA
scores resulting from pupil non-response.
Table 3.32: Summary of estimate of bias in PISA means: different methods, 2003

           z-score method       regression method    response weights
           KS3_aps   KS4_mean   KS3_aps   KS4_mean   R_WEIGHT1   R_WEIGHT2
Reading    7.6       9.8        5.8       7.7        6.5         7.5
Maths      7.5       9.7        6.1       7.6        6.2         6.3
Science    8.3       10.7       6.5       8.2        6.9         7.8

Note: estimates taken from Tables 3.27, 3.29 and 3.31.
The regression method and response weight method give similar results and both of
these methods are superior in principle to the z-score method. Based on these
methods, we conclude that mean PISA scores among respondents were upwardly
biased in 2003 by around 6 to 8 points due to pupil non-response.
3.4.4 The OECD decision to exclude the UK
We have not found a clear statement by the OECD of what is considered a sufficiently
large response bias in the estimate of the population mean or any other parameter to
justify the exclusion of a country from the international reports. Commenting on the
thresholds set for acceptable levels of response (e.g. 80 percent for pupils), the report
on PISA 2003 noted:
‘In the case of countries meeting these standards, it was likely that any bias
resulting from non-response would be negligible, i.e. smaller than the
sampling error’ (OECD 2004: 325).
If we measure ‘sampling error’ by one standard error, this statement suggests a
criterion for inclusion that the estimated bias in the mean (or any other parameter)
should be smaller than the estimated standard error of the mean.
There are no widely accepted norms for the relative sizes of bias and sampling error.
Both contribute to ‘total survey error’, which combines sampling and non-sampling
errors. Minimising total survey error of key estimates certainly is an accepted aim of
survey design (e.g. Biemer and Lyberg 2003). Total survey error in the estimate of a
parameter is conventionally measured by mean square error (MSE), which is defined
as:
mean square error = (bias)² + (standard error)²
Biases can arise for reasons other than non-response but we restrict attention to the
response bias that we have been able to estimate. The formula for MSE implies that as
bias rises above the standard error, the value of MSE will quickly become dominated
by the bias rather than by sampling error. Where the bias is less than the standard
error, MSE will be dominated by the effect of sampling variation.
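The point about dominance is easy to check numerically. Taking our estimated reading bias (6.5 points, from Table 3.32) and the OECD standard error for England (2.9 points):

```python
def mse(bias, standard_error):
    """Total survey error measure: MSE = bias^2 + (standard error)^2."""
    return bias ** 2 + standard_error ** 2

total = mse(6.5, 2.9)            # 42.25 + 8.41 = 50.66
bias_share = 6.5 ** 2 / total    # fraction of MSE attributable to bias
```

With the bias more than twice the standard error, over four fifths of the MSE is due to bias.
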
Another way of looking at the issue is to consider the 95 percent confidence interval
of the mean, equal to its estimated value plus or minus 1.96 standard errors. Imagine
the case where bias equals one standard error. This implies that the correct confidence
interval would be centred on a value that is shifted sideways by this amount.
However, it would still be the case that the confidence interval estimated with no
allowance for bias would include most of the values within the interval estimated
allowing for the bias. With the bias more than twice the standard error, this would no
longer be the case.
Our estimates of the bias in the mean of about 6 to 8 points are considerably larger
than the estimated standard error: the standard errors of the mean scores for England
in 2003 were estimated by the OECD as 2.9 (reading), 2.9 (maths) and 3.0 (science).16
In this sense the biases could be classified as large.
Our estimated biases in the standard deviation, of the order of 3 PISA score points,
also exceed the OECD estimates of the standard error for England in 2003, which are
1.6 in maths and reading and 1.7 in science.
Note that ‘bias’ in the formula for MSE above is an expected value over repeated surveying rather than a single estimate such as we have obtained. We have a single estimate of the bias, and one has to think of there being a standard error around this estimate. (If we estimate bias as the difference between two estimated means obtained by applying two different sets of weights to the data, we have to bear in mind that there is sampling error in both estimated means.) Being cautious, one might argue that unless one has a good estimate of this standard error, one cannot draw firm conclusions about the extent of the bias.
It is possible that this was the reasoning behind the OECD’s statement in the PISA 2003 report that, when faced with the evidence we had produced in our earlier unpublished analysis of response bias in 2003, ‘the PISA Consortium concluded that it was not possible to reliably assess the magnitude, or even the direction, of this non-response bias’ (OECD 2004: 328). However, such a position would be overly cautious.
We have conducted statistical tests of the significance of differences in achievement between pupils who respond and those who do not. We can also calculate a 99 percent confidence interval around the mean of, say, KS4_pts for group (v), the respondents, in Figure 3.1. This interval does not include the value for group (i), the population.17
Another (and perhaps more likely) possibility for the reasoning behind the OECD’s
statement lies in the fact that the work we did in 2004, on which the decision to
exclude the UK was taken, did not have the benefit of access to respondents’ PISA
scores. We could show the bias in mean or variance of KS3 and KS4 scores but we
could not estimate the bias in mean or variance of PISA scores (other than through the
z-score method). Taking a very cautious position, the PISA Consortium could have
worried that PISA scores and English domestic achievement scores were only very
weakly associated. We can now see that any such concern was unfounded. PISA
scores and KS3 or KS4 scores are strongly correlated.
We conclude therefore that it is possible to assess reliably both the direction and
magnitude of response bias in 2003. The PISA Consortium’s statement on the subject
in OECD (2004) is at best seen as a cautious assessment based on the evidence
available at the time. The evidence we are now able to produce allows much firmer
conclusions to be drawn.18
16 The estimated standard errors of means and standard deviations come from an Excel file of results for England in 2003 available from the OECD PISA website.
17 The interval is from 43.940 points to 47.372 points. The standard error is estimated with allowance for clustering of pupils within schools but not stratification, hence it is likely to be an overestimate.
18 At worst, the wording of the statement could appear illogical: a conclusion is made of potential response bias but the direction of the bias is said to be unknown.
As noted earlier, we estimate the bias in the estimated means and standard deviations in England to be larger than the estimated standard errors. Viewed in this way, the estimated bias is quite large. This is not uncommon in large surveys: the larger the survey sample, the smaller the standard error, and hence bias comes to dominate mean squared error. However, does the bias have a substantial impact on the picture shown by PISA of differences in learning achievement between countries? We calculated how many places England would shift in a ‘league table’ ranking countries by their mean scores if the means for reading, maths and science were adjusted downwards by the estimated bias of 6 to 8 points.19 The answer is about one place in each case. Viewed in this way, the effect of the bias in the mean could be classified as small.
The pupil response rate in England in 2003 of 77 percent was not far short of the 80 percent minimum threshold deemed satisfactory. The extent of the bias in England when the threshold was nearly attained raises the question of how large the bias may be in other countries with rates not far above the threshold. Australia, Austria, Canada, Ireland, Poland and the US all had pupil response rates of between 82 and 84 percent (OECD 2004: 327).
Finally, it is worth emphasising that there was a third option available in the case of
the UK besides the two of inclusion or exclusion of the results as they stood. The
OECD could have allowed the use of ‘post-stratification’ weights to adjust the data
for the response bias, which is a common practice in various types of survey. (Our
weights based on the logistic regression modelling described above are one example
of such a weighting system.) The policy of restricting the calculation of weights to use
of information employed for forming survey strata rules out this option.
19 This calculation ignores possible response biases in other countries’ mean scores.
4 Bias analysis – 2000 data
The structure of this chapter follows that of Chapter 3. We do not repeat all the explanations of each part of the analysis – we focus on describing the results.
Chapter 2 described how the national data set used for 2000 is not directly comparable
to that for 2003 since it includes pupils with special educational needs in normal
schools and excludes pupils with no GCSE entries. The former are excluded from the
2003 data and the latter are included. The 2000 data relate also to persons born in a 12
month period, rather than 14 months as in 2003. The results presented in this chapter
are not directly comparable to those for 2003 in Chapter 3.
4.1 Analysis at five different stages of sampling and response
Table 4.1 gives mean values for each of stage (i) to (v), from the population through
to the final PISA sample. Detailed results for each stage are presented in Appendix 2.
Figures 4.1 to 4.4 graph the changes for KS3_aps, KS4_mean and KS4_pts. Tables
4.2 and 4.3 show the changes in the distributions of scores grouped into five intervals.
As for 2003, differences between groups (ii) and (iii) do not give a clean estimate of
the effect of school response on account of our definition of group (ii) as including
pupils in all sampled schools irrespective of whether they are approached or not.
The pattern as one moves from (i) to (v) is fairly similar to that for 2003. The means of the continuous variables tend to increase, although note that they all fall between (i) and (ii). Mean ks3_aps in (v) is 2.4% higher than in (i) and mean ks4_pts is 8.6% higher. This compares with figures of 1.7% and 6.8% respectively for 2003. In terms of units of the group (i) standard deviations, mean ks3_aps rises by 0.12 units and mean ks4_pts by 0.18 units.
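These relative changes can be verified directly from the group (i) and group (v) means reported in Table 4.1:

```python
# Group (i) and group (v) means from Table 4.1
ks3_pop, ks3_resp = 32.960, 33.766   # ks3_aps: all pupils vs responding pupils
ks4_pop, ks4_resp = 41.096, 44.62    # ks4_pts: all pupils vs responding pupils

pct_ks3 = (ks3_resp / ks3_pop - 1) * 100   # percentage rise from (i) to (v)
pct_ks4 = (ks4_resp / ks4_pop - 1) * 100
```
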
The percentage of students at or above KS3 level 5 or with 5+ GCSEs A*-C is again notably higher in (v). Indeed the differences are somewhat larger than in 2003: for ks4_5ac the gap is 7.0 percentage points, compared with 5.2 points in 2003 (59 percent in group (v) against 52 percent in group (i)). Finally, in contrast to 2003, it seems that there are
some notable changes between stages (i) and (iv) rather than the changes being
concentrated between stages (iv) and (v) as a result of pupil response. The graphs
show the mean achievement scores rising as much through school response (and
possibly replacement) – stages (ii) to (iii) – as through pupil response.
As in 2003, the SDs fall at each successive stage for the three achievement variables
in Figures 4.1 to 4.3, the fall coming especially in the final stage of pupil response.
As a result of both an upward bias in the mean and a downward bias in the SD, pupils in the population below the bottom decile of KS4_mean are seriously under-represented in the final sample, as in 2003 (Table 4.3).
Following our analysis of the 2003 data, we focus in the rest of this chapter on
response, by schools and pupils, and do not investigate further the effect of sampling.
We merely note that some clear changes can be seen in Figures 4.1 to 4.3 between
stages (i) and (ii) and, especially, between stages (iii) and (iv) as the result of student
sampling.
Table 4.1: Mean values of achievement variables at five stages, 2000

             i             ii              iii              iv             v
             All pupils    All pupils in   All pupils in    All sampled    Responding
             in English    all sampled     all responding   pupils in      pupils
             schools       schools         schools          responding
                                                            schools
Variable     Mean*100      Mean*100        Mean*100         Mean*100       Mean*100
Male         50.347        50.156          49.512           49.093         49.903
ELevel5      68.123        68.104          70.371           71.662         73.729
MLevel5      63.995        63.455          66.155           66.725         69.075
SLevel5      59.151        58.702          61.910           63.020         65.187
ELevel5fg    69.346        69.305          71.657           72.887         74.854
MLevel5fg    65.593        65.027          67.619           68.441         70.731
SLevel5fg    60.609        60.055          63.391           64.514         66.55
ks4_5ac      52.100        52.374          54.595           56.442         59.186

             Mean          Mean            Mean             Mean           Mean
ks3Efg       5.463         5.452           5.529            5.558          5.604
ks3Mfg       5.554         5.524           5.611            5.641          5.705
ks3Sfg       5.297         5.276           5.359            5.387          5.436
ks3_aps      32.960        32.797          33.294           33.458         33.766
ks4_pts      41.096        41.156          42.424           43.239         44.62
ks4_ptc      36.017        36.018          36.950           37.626         38.790
ks4_mean     4.417         4.411           4.511            4.561          4.686
ks4_ac       4.884         4.892           5.073            5.219          5.452

Note: data behind the results in columns (ii) and (iii) are weighted by ONS school weight (sch_bwt) and for columns (iv) and (v) by ONS student weight (stu_wt).
Figure 4.1: Mean and standard deviation of average KS3 score (ks3_aps), 2000
[Figure: line chart of the KS3 average score (left axis) and its standard deviation (right axis) across pupil groups (i) to (v).]
Figure 4.2: Mean and standard deviation of GCSE point score per subject (ks4_mean), 2000
[Figure: line chart of the mean GCSE point score (left axis) and its standard deviation (right axis) across pupil groups (i) to (v).]
Figure 4.3: Mean and standard deviation of total GCSE point score (ks4_pts), 2000
[Figure: line chart of the mean total GCSE point score (left axis) and its standard deviation (right axis) across pupil groups (i) to (v).]
Table 4.2: Percent of pupils in Key Stage 3 (ks3_aps) percentile range of the population, 2000

                          i            ii           iii          iv            v
Percentile   Point        All pupils   All pupils   All pupils   All sampled   Responding
range        score 2000   in all       in all       in           pupils in     pupils
                          schools      sampled      responding   responding
                                       schools      schools      schools
<=10         <=25         15.12        15.43        13.56        12.23         10.36
>10-25       >25-29       18.19        18.31        17.31        18.01         17.52
>25-75       >29-37       42.87        43.69        44.20        44.69         46.15
>75-90       >37-41       15.19        14.68        15.78        15.99         16.77
>90          >41          8.63         7.89         9.15         9.07          9.20

Note: Results for (ii) and (iii) are based on data weighted with the ONS school weight (sch_bwt) and results for (iv) and (v) are for data weighted with the ONS pupil weight (stu_wt).
Table 4.3: Percent of pupils in GCSE points per subject (ks4_mean) percentile range of the population, 2000

                          i            ii           iii          iv            v
Percentile   Point        All pupils   All pupils   All pupils   All sampled   Responding
range        score 2000   in all       in all       in           pupils in     pupils
                          schools      sampled      responding   responding
                                       schools      schools      schools
<=10         <=2.11       10.00        10.11        8.91         7.81          5.18
>10-25       >2.11-3.24   14.98        14.86        14.09        13.79         13.29
>25-75       >3.24-5.67   50.26        50.53        51.13        51.71         53.88
>75-90       >5.67-6.62   14.72        14.57        14.88        15.32         16.26
>90          >6.62        10.04        9.92         10.99        11.39         11.39

Note: Results for (ii) and (iii) are based on data weighted with the ONS school weight (sch_bwt) and results for (iv) and (v) are for data weighted with the ONS pupil weight (stu_wt).
Figure 4.4: Kernel density plot of ks4_mean by different groups of 15 year olds in 2000
[Figure: kernel density estimates of ks4_mean (range 0 to 8) for the groups of pupils from the population of all 15 year olds through to responding pupils.]
Note: unweighted
4.2 Response bias at school and pupil levels
We follow the same pattern of analysis as in Chapter 3. The comments made there on the
estimation of standard errors in the presence of clustering and stratification apply again here.
4.2.1 Comparison of responding and non-responding schools
The unit of analysis is now the school and for each school we compute school values of all the
variables by taking their averages across the school’s 15 year old students. A total of 543
schools were sampled for PISA: 181 ‘initial’ schools that were all approached, 181 first
replacement schools that were approached in case an initial school did not respond, and 181
second replacement schools that were approached if both an initial school and its first
replacement refused to respond.
Of the 543 schools, a total of 306 were approached at some stage: all 181 initial schools, 79 of the first replacement schools and 46 of the second replacements. Table 4.4 shows how they responded.
Table 4.4: School response (numbers of schools), 2000

Participation status    Main     First         Second        Total
                        school   replacement   replacement
Did not participate     71       46            34            151
Full participation      102      32            11            145
Partial participation   8        1             1             10
Total                   181      79            46            306

Schools that participated but achieved a within-school response rate below 25 percent are treated as non-responding schools in the analysis.
As in the 2003 analysis, we investigate response bias at the school level in three different ways:
a) Comparison of responding with non-responding schools in the initial sample
(without taking first and second replacement into account). We compare the school
means of characteristics of pupils in 110 responding schools with the school means
in 71 non-responding schools.
b) Comparison of all schools that responded with all schools that did not respond,
irrespective of stage. We compare the school means of characteristics of pupils in
155 responding schools with the school means in 151 non-responding schools.
c) Comparison of all schools that responded with second replacement schools that did
not respond. We compare 155 responding schools with 34 non-responding schools.
For the 2000 data we cannot describe characteristics of pupils in schools with pupil response
rates below 25 percent since there is no separate identifier in the data set for such schools (we
assume these schools are coded as non-respondents).
a) All schools in the initial school sample
Table 4.5: School type by response (number of schools): approach (a), 2000

School                  Did not respond   Responded   Total
LEA maintained school   64                102         166
Independent school      7                 7           14
Total                   71                109         180

Note: for one responding school no information on school type is available.
Table 4.6: Differences in means: approach (a), 2000
Male
FSM
ELevel5fg
MLevel5fg
SLevel5fg
KS4_5ac
Mean*100
49.79
14.76
69.97
66.23
59.34
54.51
Nonresponding
school
Mean*100
44.34
17.30
67.82
62.58
58.55
57.10
KS3_aps
KS4_mean
Mean
32.82
4.54
Mean
32.62
4.66
Responding
school
Difference
p
Observations
-5.45
2.54
-2.15
-3.65
-0.79
2.59
0.31
0.45
0.56
0.33
0.86
0.63
180
181
176
177
178
180
-0.2
0.12
0.79
0.64
178
180
Note: weighted by ONS school weight, sch_bwt. p gives the significance level at which we can just reject the null
hypothesis that the values are the same between responding and non-responding schools (adjusted Wald-test).
As in 2003, we cannot reject the hypothesis that the mean values are the same in the two groups. And unlike in 2003, there are no significant differences at the 5% level in SDs between responding and non-responding schools for the achievement variables, although in several cases we can reject the null at the 10% level.
- 71 -
Table 4.7: Differences in variances: approach (a): 2000

             Responding   Non-responding   Test of   Responding &
             school       school           SD1=SD2   non-responding
             SD1          SD2              p         schools, SD
Male         19.41        22.42            0.19      20.65
FSM          14.26        19.09            0.01      16.32
ELevel5fg    16.08        17.16            0.55      16.50
MLevel5fg    16.00        17.51            0.41      16.64
SLevel5fg    17.34        21.18            0.07      18.95
KS4_5ac      19.44        23.41            0.09      21.03

KS3_aps      2.93         3.55             0.08      3.19
KS4_mean     0.89         1.05             0.13      0.95

Note: results are for unweighted data.
Figure 4.5: Distributions of KS3 and KS4 scores: approach (a), 2000
[Figure: two panels of kernel density estimates comparing responding and non-responding schools, one for the KS3 point score (ks3_aps) and one for the mean GCSE point score (ks4_mean).]
Note: unweighted data. Kernel density estimates.
b) All responding and non-responding schools (initial, first, second replacement)
Table 4.8: School type by response (number of schools): approach (b), 2000

School                  Not responding   Responding   Total
LEA maintained school   135              141          276
Independent school      15               12           27
Total                   150              153          303

Note: for one non-responding and two responding schools no information is available on school type.
Table 4.9: Differences in means: approach (b), 2000

             Responding   Non-responding
             school       school
             Mean*100     Mean*100         Difference   p      Observations
Male         48.29        41.96            -6.33        0.12   303
FSM          13.45        16.44            2.99         0.19   306
ELevel5fg    70.67        70.13            -0.54        0.86   295
MLevel5fg    67.58        64.68            -2.9         0.37   296
SLevel5fg    61.98        60.46            -1.52        0.68   297
KS4_5ac      57.93        56.83            -1.1         0.78   303

             Mean         Mean
KS3_aps      33.23        32.86            -0.37        0.58   297
KS4_mean     4.70         4.62             -0.08        0.71   303

Note: weighted by ONS school weight, sch_bwt.
Table 4.10: Differences in variances: approach (b), 2000

             Responding   Non-responding   Test of   Responding &
             school       school           SD1=SD2   non-responding
             SD1          SD2              p         schools, SD
Male         19.55        22.25            0.11      20.96
FSM          14.14        17.05            0.02      15.71
ELevel5fg    16.57        16.59            0.98      16.60
MLevel5fg    16.60        17.77            0.41      17.28
SLevel5fg    18.11        21.18            0.06      19.77
KS4_5ac      20.00        22.51            0.15      21.27

KS3_aps      3.27         3.37             0.70      3.33
KS4_mean     0.90         1.00             0.22      0.95

Note: results are unweighted.
The results show little change from approach (a), as regards both means and variances. There are no significant differences in means or standard deviations at the 5% level for the achievement variables.
c) All responding schools and non-responding second replacement schools
Table 4.11: School type by response (number of schools): approach (c), 2000

                 LEA maintained schools   Independent schools   Total
Non-responding   31                       3                     34
Responding       141                      12                    153
Total            172                      15                    187

Note: for two responding schools, school type information is missing.
Table 4.12: Differences in means: approach (c), 2000

            Responding school   Non-responding school
            Mean*100            Mean*100                Difference   p      Observations
Male        48.29               42.67                   -5.62        0.40   187
FSM         13.46               17.06                    3.60        0.29   189
ELevel5fg   70.67               69.77                   -0.90        0.86   181
MLevel5fg   67.58               64.19                   -3.39        0.57   182
SLevel5fg   61.98               59.19                   -2.79        0.69   182
KS4_5ac     57.94               53.57                   -4.37        0.52   187

            Mean                Mean
KS3_aps     33.23               32.90                   -0.33        0.80   182
KS4_mean     4.70                4.51                   -0.19        0.55   187

Note: weighted by ONS school weight, sch_bwt. p gives the significance level at which we can just reject the null hypothesis that the values are the same between responding and non-responding schools (adjusted Wald-test).
Table 4.13: Differences in variances: approach (c), 2000

            Responding   Non-responding   Test of     Responding &
            school       school           SD1=SD2     non-responding schools
            SD1          SD2              p           SD
Male        19.55        22.03            0.38        19.98
FSM         14.14        13.92            0.96        14.18
ELevel5fg   16.57        14.78            0.41        16.32
MLevel5fg   16.60        16.89            0.90        16.77
SLevel5fg   18.11        21.64            0.20        18.91
KS4_5ac     20.00        21.21            0.67        20.26

KS3_aps      3.27         3.24            0.95         3.28
KS4_mean     0.90         1.01            0.43         0.93

Note: results are for unweighted data.
With approach (c), we cannot reject the null hypothesis of no difference in mean values at even
the 10% level. The same applies to tests of the standard deviation. In contrast to 2003, the
differences in means between respondents and non-respondents in approach (c) seem not too
dissimilar to those in approach (a).
Finally, as in 2003, we use the pupil as the unit of analysis and test for differences in means
between pupils in responding and non-responding schools, using approach (c) – see Table 4.14.
Here we do find significant differences: pupils in responding schools have higher mean
achievement. For example, the percentage of pupils with 5 or more GCSEs at A*-C is five
percentage points higher in responding schools and mean KS3_aps for these pupils is five
percent higher. In general, the differences are more marked than in 2003 and, also in contrast to
2003, they are always in one direction: pupils in responding schools have higher mean
achievement.
Table 4.14: Differences in means: approach (c), pupil as unit of analysis, 2000

            Responding school   Non-responding school
            Mean*100            Mean*100                Difference   p      Observations
Male        49.51               47.48                    -2.03       0.01   33,008
ELevel5fg   71.66               64.67                    -6.99       0.00   31,253
MLevel5fg   67.62               59.17                    -8.45       0.00   31,253
SLevel5fg   63.39               53.39                   -10.00       0.00   31,217
KS4_5ac     54.60               49.58                    -5.02       0.00   33,008

            Mean                Mean
KS3_aps     33.29               31.68                    -1.61       0.00   31,051
KS4_mean     4.51                4.32                    -0.19       0.00   33,008
KS4_pts     42.42               39.94                    -2.48       0.00   33,008

Note: ONS school weights are applied to the data. Adjusted Wald-test. The clustering of students within schools is taken into account.
4.2.2 Comparison of responding and non-responding pupils
The analysis of response bias at the student level includes 5,164 sampled PISA pupils in 153
schools of which 4,120 responded (79.8%) and 1,044 did not (20.2%).
For all achievement variables there are significant differences at the 1% level or lower in both
means and standard deviations. Respondents display a higher mean score and a lower
dispersion in scores. Both differences contribute to the large gaps between
respondents and non-respondents in the percentages at or above KS3 level 5 or with 5+ GCSEs
A*-C, which are about 11 and 14 percentage points respectively (Table 4.15).
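The size of these gaps can be checked with a simple significance test. The report uses a χ2-test (and adjusted Wald tests allowing for weights and clustering); the sketch below is a plain two-proportion z-test on the unweighted counts from Table 4.15, which ignores weighting and clustering and so only approximates the reported p-values.

```python
from math import erf, sqrt

def two_prop_z(x1, n1, x2, n2):
    """Two-sample z-test for a difference in proportions (no weights, no clustering)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    z = (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal p-value
    return p1 - p2, z, p_value

# KS4_5ac (5+ GCSEs at A*-C): 59.19% of 3,976 responders vs 44.78% of 936 non-responders.
diff, z, p = two_prop_z(round(0.5919 * 3976), 3976, round(0.4478 * 936), 936)
# diff is about 0.14 (14 percentage points) and p is effectively zero.
```

Even without the survey weights, a gap of this size on these sample sizes is overwhelmingly significant, consistent with the p-values of 0.00 in Table 4.15.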
The higher mean and lower dispersion of respondents is particularly clear in Figure 4.6 which
shows the distributions of KS3_aps and KS4_mean scores.
Table 4.15: Differences in means and percentages, 2000

            Responding pupil    Non-responding pupil
            Obs.    Mean*100    Obs.    Mean*100       Difference   p      Observations
Male        3976    49.90       936     45.65           -4.25       0.02   4912
ELevel5fg   3681    74.85       878     64.64          -10.21       0.00   4559
MLevel5fg   3686    70.73       875     58.80          -11.93       0.00   4561
SLevel5fg   3682    66.55       878     55.98          -10.57       0.00   4560
KS4_5ac     3976    59.19       936     44.78          -14.41       0.00   4912

                    Mean                Mean
KS3_aps     3660    33.77       866     32.16           -1.61       0.00   4526
KS4_mean    3976     4.69       936      4.03           -0.66       0.00   4912

Note: the ONS student weights are applied and standard errors allow for clustering by PSUs (PISA schools). The χ2-test is used for the binary variables (weights and PSUs not taken into account) and the adjusted Wald test for the continuous variables.
Table 4.16: Differences in variances, 2000

            Responding pupils   Non-responding pupils   Responding & non-    Test of
                                                        responding pupils    SD1=SD2
            Mean     SD1        Mean     SD2            Mean     SD          p
KS3_aps     33.77    6.06       32.17    6.78           33.46    6.24        0.00
KS4_mean     4.68    1.56        4.03    1.97            4.56    1.66        0.00

Note: results are unweighted. Clustering of pupils within schools is not taken into account in the statistical test.
Figure 4.6: Distributions of KS3 average points and mean GCSE point score, 2000
[Two kernel density panels, KS3 average point score (ks3_aps) and mean GCSE point score (ks4_mean), each comparing non-responding and responding pupils.]
Note: data are unweighted. Kernel density estimates.
4.3  Using the OECD weights and our own response weights
We compare use of the OECD weights and weights we calculate based on results of logistic
regression modelling of the probability of pupil response. The OECD weights take into account
the level and pattern (i.e. association with achievement) of school response and the level of
pupil response. Our weights in addition take into account the variation of pupil response with
achievement level.
4.3.1 Calculation of our non-response weights
a) Modelling the response probability
Table 4.17 shows the results of logistic regression models, for all sampled pupils, of the
probability of pupil response. Models 1a and 1b specify response as a function of gender,
achievement score (a quadratic in KS4_pts), region and school size. Model 1a differs from
Model 1b only in the estimation of the standard errors: standard errors are adjusted to take into
account the clustering of pupils within schools in Model 1a, while unadjusted standard errors
are given in Model 1b. Model 2 omits the regional variable but includes a set of school dummy
variables (results not reported) and a dummy variable for private schools (this variable was
not significant in Models 1a and 1b). Model 2 is based on a smaller sample size since 69 pupils
in schools where all pupils responded and 5 pupils in schools where no pupil responded were
omitted.20 In contrast to the 2003 data, no information on free school meals, language spoken
at home or ethnicity is available for use in the modelling.
The coefficients of the school dummy variables in Model 2 are significant at the 5 percent level
for only 21 out of 155 schools, which is substantially less than in the analogous model for the
2003 data. The quadratic specification for KS4_pts works well but the turning point comes at a
lower score level than in 2003. The predicted probability of response varies somewhat less
with KS4_pts than in 2003 – compare Figure 4.7 with Figure 3.7. In contrast to 2003, school
size and gender are statistically significant.
20
Pupils in schools in which the pupil response rate was less than 25% are in principle omitted, since these schools are treated in PISA as not responding to the survey. However, some of these individuals have remained in the sample we use.
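The modelling step can be sketched as follows. This is not the authors' code: it is a minimal illustration on simulated data, using a hand-rolled Newton (IRLS) fit so that only NumPy is needed; all variable names and numbers are assumed for illustration only.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Fit a logistic regression by Newton's method (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted response probabilities
        w = p * (1.0 - p)                     # IRLS weights
        beta += np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
    return beta

# Simulated pupils: response probability rises with GCSE points and then
# flattens, mimicking the quadratic-in-KS4_pts specification of Model 1.
rng = np.random.default_rng(0)
n = 2000
male = rng.integers(0, 2, n)
ks4_pts = rng.normal(42, 17, n)
lin = -1.0 + 0.25 * male + 0.08 * ks4_pts - 0.07 * ks4_pts**2 / 100
responded = rng.random(n) < 1.0 / (1.0 + np.exp(-lin))

# Design matrix: constant, gender, KS4 points, KS4 points squared / 100.
X = np.column_stack([np.ones(n), male, ks4_pts, ks4_pts**2 / 100])
beta = fit_logit(X, responded.astype(float))

# The inverse predicted probability for each responder is the pupil
# non-response weight used in the next subsection.
p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
weight1 = 1.0 / p_hat[responded]
```

Cluster-adjusted standard errors (as in Model 1a) and the school dummies of Model 2 are omitted here; a full implementation would use a statistics package that supports clustered standard errors.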
Table 4.17: Logistic regression models of pupil response, 2000

                         (1a)          (1b)          (2)
Male                     0.278         0.278         0.267
                        (0.089)***    (0.076)***    (0.086)***
Ks4_pts                  0.087         0.087         0.096
                        (0.008)***    (0.007)***    (0.008)***
Ks4_pts squared/100     -0.082        -0.082        -0.089
                        (0.009)***    (0.008)***    (0.009)***
Missings                -1.303        -1.303        -2.119
                        (0.227)***    (0.142)***    (0.192)***
West Midlands            0.484         0.484
                        (0.157)***    (0.124)***
School size/100         -1.059        -1.059
                        (0.713)       (0.415)**
Private school                                      -1.423
                                                    (0.552)***
Constant                -0.439        -0.439        -0.480
                        (0.213)**     (0.154)***    (0.436)
Observations             5164          5164          5090
Pseudo R-squared                       0.06          0.14
log-likelihood                        -2439.06      -2214.27
F statistic                            32.71

Note: the variable school size gives the number of 15 year olds in schools (this number is estimated with the population data and is not equal to the variable 'mos', which gives the number of 15 year olds at the time the sampling frame was constructed). Model 1a takes the clustering of pupils within schools into account in the estimation of standard errors. Model 2 also includes 155 school dummies.
Figure 4.7: Predicted probability of pupil response by GCSE point score, 2000
[Line plot of the predicted probability of response (0 to 1) against the GCSE points score, Ks4pts (0 to 80).]
Note: Predicted probabilities based on Model 1. Other variables set to their mean values. The graph displays
predicted probabilities for values of the GCSE points scores between the 5th and 95th percentiles of the sample
values.
b) Weighting for non-response
We construct a student non-response weight by calculating the inverse of the predicted
probabilities for responding pupils. Summary statistics are given in Table 4.18. Weight1 is the
non-response weight based on Model 1 and weight2 is based on Model 2. The weights based
on Model 2 display much more variation due to the inclusion in that model of the set of school
dummies. In contrast to 2003, we then trim the highest values to a maximum equal to the mean
multiplied by the ratio of the OECD maximum student weight to its mean. This results in
maxima of 586 for weight1 and 577 for weight2 (affecting the values for 16 and 17
observations respectively).
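The trimming rule just described can be sketched as follows; the toy weights are invented, and the reference mean and maximum are the OECD student weight values from Table 4.19.

```python
import numpy as np

def trim_weights(w, ref_mean, ref_max):
    """Cap the largest weights at mean(w) * (ref_max / ref_mean)."""
    cap = w.mean() * (ref_max / ref_mean)
    return np.minimum(w, cap)

# Nine ordinary weights and one extreme value (illustrative only):
w = np.array([1.0] * 9 + [100.0])
trimmed = trim_weights(w, ref_mean=135.98, ref_max=588.56)
# The extreme weight is pulled down to roughly 47; the rest are unchanged.
```

Trimming like this bounds the influence of any single pupil at the cost of a small bias in the weighted estimates.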
Table 4.18: Summary statistics of non-response weights derived from logistic regression, 2000

Variable   Obs    Mean    Std. Dev.   Min     Max
weight1    4120   1.253   0.197       1.080    3.006
weight2    4051   1.257   0.379       1.012   10.448

Note: weight1 gives the inverse of the response probability derived from Model 1 (Table 4.17); weight2 refers to Model 2.
These constructed non-response weights have smaller means and lower standard deviations
than in 2003. This suggests that their use will lead to smaller changes in results than in 2003.
c) Construction of a combined weight
As in 2003 we combine the weights just described with the OECD school weight (‘wnrschbw’
in 2000) and allow also for the sample design. The weight for pupil j in school i is calculated as
follows, where Mi is the total number of 15 year old pupils in school i and mi is the number of
sampled pupils:
R_WEIGHT1ij = wnrschbwi * Mi/mi * weight1ij
R_WEIGHT2ij = wnrschbwi * Mi/mi * weight2ij
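In code, this combination is a one-line product; the function below mirrors the formulas just given, and the example numbers are purely illustrative.

```python
def combined_weight(wnrschbw, M, m, pupil_weight):
    """R_WEIGHT for pupil j in school i: OECD school weight (wnrschbw_i)
    times the within-school sampling factor M_i/m_i times the pupil
    non-response weight (weight1 or weight2)."""
    return wnrschbw * (M / m) * pupil_weight

# A school with 180 eligible 15 year olds, 30 of them sampled, an OECD
# school weight of 25, and a pupil non-response weight of 1.25:
w = combined_weight(25.0, 180, 30, 1.25)   # 25 * 6 * 1.25 = 187.5
```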
Table 4.19 compares summary statistics for the ONS student weight, the OECD student weight
and our two new weights (taking into account the trimming of the extreme values described
above).21
21
As noted in Chapter 2, five schools have values of the OECD school weight of 0 despite a positive student
weight (for a reason we do not understand) and this explains the minimum of 0 for our weights in Table 4.19.
Table 4.19: Summary statistics of different student weights, 2000

Variable                         Obs    Mean      Std. Dev.   Min      Max       CV
ONS student weight (stu_wt)      4120   0.02868   0.00136     0.0286   0.05      0.047
OECD student weight (w_fstuwt)   4120   135.98    48.614      33.12    588.56    0.358
R_WEIGHT1                        4120   135.46    77.79       0        1934.40   0.574
R_WEIGHT2                        4051   133.39    68.72       0        1167.03   0.515

Note: R_WEIGHT1 and R_WEIGHT2 are the product of the OECD school weight and other weight factors as described in Section 3.3.1. The OECD school weight has the value of zero for 90 responding pupils in 5 schools. For those students R_WEIGHT1 and R_WEIGHT2 are also equal to zero.
4.3.2 Results
Table 4.20 shows the result of applying the three sets of weights: ONS student, OECD student
and our own calculated weights. We also show results for unweighted data.
Table 4.20: KS3 and KS4 scores for responding pupils, different student weights, 2000

                                         KS3_aps   KS4_mean   KS4_pts
Mean        ONS stu_wt                   33.77     4.69       44.62
            OECD w_fstuwt                33.60     4.67       44.40
            R_WEIGHT1                    33.28     4.58       43.29
            R_WEIGHT2                    33.35     4.59       43.45
Standard    ONS stu_wt                    6.06     1.56       17.40
deviation   OECD w_fstuwt                 6.15     1.58       17.59
            R_WEIGHT1                     6.32     1.68       18.68
            R_WEIGHT2                     6.34     1.69       18.71
Obs         for ONS and OECD weights     3,660     3,976      3,976
            for R_WEIGHT1                3,573     3,887      3,887
            for R_WEIGHT2                3,510     3,820      3,820

Note: the first Obs row refers to calculations with the ONS and OECD weights, the second to R_WEIGHT1 and the third to R_WEIGHT2.
Mean achievement scores are somewhat lower with the OECD student weights than with the
ONS weights. The application of our own weights pushes the means down further, close to the
means for all sampled pupils in responding schools in 2000 shown in Table 4.1, the values of
which we reproduce below.22 The application of our weights also pushes up the standard
deviations to about the level for all sampled pupils shown in Figures 4.1 to 4.3.
22
This comparison glosses over the fact that our weights include an adjustment for school non-response via their
incorporation of the OECD school weights.
KS3_aps: 33.46    KS4_mean: 4.56    KS4_pts: 43.24

4.4  Response bias in PISA scores
We estimate the impact of response bias on PISA scores in 2000 using the three approaches
outlined in Chapter 3, Section 3.4: (a) z-scores, (b) regression adjustment and (c) response
weights.
The mean PISA scores in reading, maths and science for the final sample of responding pupils
in responding schools in England in 2000 are given in Table 4.21. These are results obtained
by applying the ONS student weights, and not the OECD student weights (results obtained
with the latter are very similar – see Table 4.24 below).
Table 4.21: PISA means and standard deviations (ONS weights), 2000

               Mean     SD
PISA reading   524.13   99.26
PISA maths     529.27   90.59
PISA science   534.27   96.57

Note: mean scores are calculated with the average of the five PVs. The standard deviations are the average of the standard deviations calculated separately for each PV. The calculations are based on the final PISA sample of respondents and are weighted with the ONS student weights.
As for 2003, we now try to assess how these scores might have changed if there had not been
any response bias to the survey at the pupil level.
4.4.1 Z-score method
The procedure for the adjustment is outlined in Chapter 3, Section 3.4: we take the differences in KS3 and KS4 means between responding and sampled students, standardise these by the KS3 and KS4 standard deviations, and then multiply by the PISA standard deviations in Table 4.21.
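As a sketch (not the authors' code), the adjustment is a single scaling step; the numbers below are the KS3_aps row of Table 4.22.

```python
def zscore_bias(mean_respondents, mean_sampled, sd_sampled, sd_pisa):
    """Scale the respondent-vs-sample gap in a domestic score into PISA points."""
    return (mean_respondents - mean_sampled) / sd_sampled * sd_pisa

# KS3_aps: respondents 33.77, all sampled pupils 33.46, SD 6.24;
# PISA reading SD 99.26 gives a bias of roughly 4.9 PISA points.
bias_reading = zscore_bias(33.77, 33.46, 6.24, 99.26)
```

(Table 4.22 reports 4.96 because the z-score is rounded to 0.050 before scaling.)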
Table 4.22: Calculation of bias in the PISA means: z-score method, 2000

           1. Mean        2. Mean       3. SD      4.            5. Multiplied   6. Multiplied   7. Multiplied
           responding     sampled       sampled    (Mr-Ms)/SD    by SD PISA      by SD PISA      by SD PISA
           pupils = Mr    pupils = Ms   pupils                   reading         maths           science
KS3_aps    33.77          33.46         6.24       0.050         4.96            4.53            4.83
KS4_mean    4.68           4.56         1.66       0.072         7.15            6.52            6.95

Note: the KS3 and GCSE mean scores are calculated on unweighted data, but weighting with ONS weights does not make a big difference.
The difference in mean KS3 points between responding pupils and sampled pupils translates
into a difference of the mean PISA scores of about 5 points. As in 2003, the difference in mean
KS4 points translates into rather larger differences in PISA scores, about 7 points.
Taking the adjustment based on the KS3 results, mean England PISA scores in 2000 allowing
for student non-response can be estimated to be 519 instead of 524 for reading, 525 instead of
529 for maths, and 529 instead of 534 for science.
4.4.2 Regression method
We regress PISA scores for 2000 respondents on their KS3 (or KS4) scores. We use the
coefficient of the KS3 (or KS4) as the PISA equivalent of a KS3 (or KS4) point, and then
apply this to the difference in the KS3 (or KS4) means for respondents and all sampled pupils.
The regressions are reported in Chapter 5. As in 2003, PISA scores and KS3 and KS4 scores
are strongly correlated. For example, PISA reading has a correlation of 0.83 with KS3_aps and
0.81 with KS4_mean. Regressions of PISA reading, maths and science on KS3_aps yield
slope coefficients of 12.64, 11.31 and 12.07 respectively (using weighted data). Applying these
values to the differences in means in KS3_aps of respondents and sampled pupils in Table 4.22
yields estimates of bias in mean PISA points of 3.9, 3.5 and 3.7 respectively for reading, maths
and science. Differences estimated by using GCSE scores are greater – see Table 4.23. As in
2003, these figures are smaller than the figures obtained with the z-score method.
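A sketch of the calculation (the slope and means are taken from the report; this is illustrative, not the authors' code):

```python
# PISA points per KS3_aps point, from the regression of PISA reading on KS3_aps:
slope_reading = 12.64
# Gap in mean KS3_aps between respondents and all sampled pupils (Table 4.22):
gap_ks3 = 33.77 - 33.46
bias_reading = slope_reading * gap_ks3   # about 3.9 PISA reading points
```

The same two-step calculation with the maths and science slopes (11.31 and 12.07) reproduces the other KS3_aps entries of Table 4.23.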
Table 4.23: Calculation of bias in PISA means: regression method, 2000

           Reading   Maths   Science
KS3_aps    3.92      3.51    3.74
KS4_mean   5.97      5.38    5.56
4.4.3 Use of response weights
The third method to estimate the extent of bias is to calculate mean scores with and without use
of the response weights applied in Section 3.3 – the OECD student weights and our own
weights based on the logistic regression modelling of student response.
Application of the OECD weights reduces the means by between 0.25 and 0.75 of a point and
increases the standard deviations by about 1 point. Application of our weights has a
substantially larger impact in both cases. Table 4.25 estimates response bias in mean scores by
comparing the results using the ONS weights with the results using our weight R_WEIGHT1.
The upward bias in PISA scores in the sample of respondents is estimated as about 6 in the
mean. The downward bias in the standard deviation is estimated at about 4 points.
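The mechanics of this comparison amount to computing weighted means and standard deviations under two weightings; a minimal sketch with invented numbers:

```python
import numpy as np

def weighted_mean_sd(x, w):
    """Weighted mean and weighted standard deviation."""
    m = np.average(x, weights=w)
    return m, np.sqrt(np.average((x - m) ** 2, weights=w))

# Four toy PISA scores; the response weighting up-weights lower achievers:
scores = np.array([480.0, 500.0, 520.0, 560.0])
w_base = np.ones(4)                         # stands in for the ONS weights
w_resp = np.array([1.6, 1.2, 1.0, 0.9])    # stands in for R_WEIGHT1
m1, s1 = weighted_mean_sd(scores, w_base)
m2, s2 = weighted_mean_sd(scores, w_resp)
bias_estimate = m1 - m2   # positive: the respondent sample overstates the mean
```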
Table 4.24: PISA scores for responding pupils, different student weights, 2000

                            PISA maths   PISA reading   PISA science   Obs
Mean        ONS stu_wt      529.27       524.13         534.27         2292/4120/2284
            OECD w_fstuwt   529.06       523.41         533.46         2292/4120/2284
            R_WEIGHT1       523.56       518.02         528.32         2240/4030/2234
            R_WEIGHT2       524.30       518.22         527.99         2203/3961/2197
Standard    ONS stu_wt       90.59        99.26          96.57         2292/4120/2284
deviation   OECD w_fstuwt    91.71       100.34          97.50         2292/4120/2284
            R_WEIGHT1        94.87       103.91         100.76         2240/4030/2234
            R_WEIGHT2        94.57       103.62         100.67         2203/3961/2197

Note: the standard deviation is calculated as the mean of the five standard deviations of each plausible value. The mean value refers to the pupils' mean across all five plausible values. The three numbers in the last column give the number of observations used for maths, reading and science respectively. In PISA 2000, reading scores are available for all pupils but maths and science scores for only half of them. This leads to different OECD weights depending on whether reading, maths or science scores are calculated. These different weights are applied in our analyses.
Table 4.25: Calculation of bias in PISA means and standard deviations: response weights method, 2000

                     Reading   Maths   Science
Mean                 +6.1      +5.7    +6.0
Standard deviation   -4.7      -4.3    -4.2

Note: the table shows differences between means and standard deviations in Table 4.24 calculated with the ONS student weight and R_WEIGHT1.
Table 4.26 summarises our results, showing the estimated upward bias in mean PISA scores
resulting from pupil non-response. In the case of the response weight method, the estimate also
takes into account the impact of non-response at the school level (since our weight
R_WEIGHT1 includes the OECD school weight). An estimate of about 5 or 6 points seems
appropriate (based on the regression method and the KS4 results or the response weights).
Table 4.26: Summary of estimates of bias in PISA means: different methods, 2000

           z-score method        regression method     response weight method
           Ks3_aps   Ks4_mean    Ks3_aps   Ks4_mean    R_WEIGHT1   R_WEIGHT2
Reading    5.0       7.3         3.9       6.0         6.1         5.9
Maths      4.6       6.6         3.5       5.4         5.7         5.0
Science    4.8       7.1         3.7       5.6         6.0         6.3

Note: estimates taken from Tables 4.22, 4.23 and 4.25.
4.4.4 Implications
How does the estimated bias in the PISA means compare with the estimates of their standard
errors? The published standard errors of mean PISA scores in England in 2000 were 3.0 for
reading, 2.9 for maths and 3.2 for science; and standard errors for the standard deviation were
1.7 for reading, 1.8 for maths and 2.3 for science (Gill et al, 2002: Tables 3.1, 4.1, 5.1, A3.1).
Our estimates of the biases are therefore about double these figures, as in 2003.
But, again as in 2003, the impact of the bias on England’s position in a ranking of OECD
countries by mean scores is small. Mean scores that were lower by 6 points would have moved
England by just one place in a ranking for reading, three places for science, and would have
left England’s place in the ranking unchanged for maths.
Our guess is that had the OECD been faced with the evidence on the relative sizes of the bias
and the standard errors at the time, the data for England for 2000 would have been judged in
the same way as the data for 2003: unreliable for estimates at the national level. However, it is
not clear that there would have been any need for DfES to have uncovered and then presented
that evidence. The pupil response rate for England in 2000 of 81 percent (see Table 1.1) was
just above the required minimum threshold of 80 percent. By this criterion, pupil response in
England in 2000 was at a level that was considered satisfactory.
This demonstrates the problem of relying on arbitrary threshold levels. (See also Sturgis et al
2006 on this subject.) It also raises the question of the extent of response bias in countries
with pupil response rates that were little higher than the rate in England. Australia, Canada,
Germany, Ireland, the Netherlands and the US all had rates of between 84 and 86 percent
(OECD 2001a: 235).
5  Changes over time
This chapter compares results for 2000 and 2003. We start in Section 5.1 with an overview of
changes in means and standard deviations of domestic achievement measures (KS3 and KS4)
for each of the five groups identified at the end of Chapter 1. Section 5.2 then summarises the
key results of the bias analyses presented in Chapters 3 and 4. Section 5.3 compares regression
models for 2000 and 2003 in which PISA scores for respondents are regressed on domestic
achievement scores (KS3 and KS4). Our aim is to see whether there has been a change in the relationship between the two.
In comparing results for 2000 and 2003 we try to allow for differences in the definitions of the
national databases for the two years. Chapter 2 described how the 2000 national data set excludes
pupils with no GCSE entries and includes pupils with special educational needs. The 2003 data
include pupils with no GCSE entries and in our analysis of the 2003 data we excluded pupils
with special educational needs in order to proxy permitted PISA exclusions. In order to
compare results for 2000 and 2003 more directly we therefore revised the 2003 national
database to exclude pupils with no GCSE entries and to include children with special
educational needs in normal schools. (Pupils in special schools continued to be excluded from
both the 2000 and 2003 datasets.) The results for 2003 in this chapter therefore are slightly
different from those in Chapter 3.
It is important to note that the 2000 and 2003 national databases differ even after this
adjustment. For example, the 2003 database is still based on a 14 month birth window
(although this seems not to be a big issue) and there may be other differences that we are
unaware of.
5.1  Summary of results for different stages of sampling and response
Figure 5.1 shows the mean and standard deviations for the three domestic achievement
variables on which we have focused, KS3_aps, KS4_mean, and KS4_pts, for each of the five
groups identified in Chapter 1: (i) the population of all 15 year old pupils, (ii) all pupils in
sampled schools, (iii) all pupils in responding schools, (iv) sampled pupils in responding
schools, and (v) responding pupils in responding schools.
The lines with the solid black symbols in each diagram show results for 2003. In the case of the
two ‘quantity’ measures, KS3_aps and KS4_mean, the population values for the mean scores
in 2003 [group (i)] are above those for 2000, reflecting a rise in average domestic achievement
between the two years. (The standard deviations are also higher in 2003, showing greater
dispersion of achievement in the later year.) The means for KS4_pts in the population are the same in the two years.
Figure 5.1: means and standard deviations of KS3 and KS4 variables, 2003 and 2000
[Six panels showing, for the pupil groups (i) to (v), the mean and standard deviation of KS3 score per subject (KS3_aps), GCSE points per subject (KS4_mean) and total GCSE points score (KS4_pts), with separate lines for 2003 and 2000.]
Note: estimates are taken from Table 4.1 for 2000 and Table A4.6 in Appendix 4 for 2003.
Looking at the effect of pupil response, which is shown by the change in values between
groups (iv) and (v), the graphs suggest that the impact was similar in the two years. The slope
of this segment of the lines is broadly the same for each measure of achievement and for both
the mean and the standard deviation.
The impact of school response is incorporated in the change in values between groups (ii) and
(iii), although we have noted that this change also reflects the impact of replacement given our
definition of group (ii). Here there is a marked difference between 2000 and 2003. In 2000, the
means rise as a result of school response, the amounts being more or less the same as for the
impact of pupil response. In 2003 the means fall slightly. The standard deviations fall in both
years.
The impact of sampling is in principle shown by comparing groups (i) and (ii) for school
sampling and (iii) and (iv) for pupil sampling. There is no clear pattern for school sampling.
But the mean appears to rise and the standard deviation to fall in both years as a result of the pupil
sampling. This could be a perfectly legitimate result of the random sampling process: by
chance, samples were drawn in each year in which mean achievement was higher and
dispersion lower. Or it could be evidence in favour of poor sampling, with schools wrongly
excluding weak pupils who should have been included. Or it may reflect the PISA exclusions
policy correctly applied by schools. Or it could be the result of our inability to correctly
identify the PISA target population in our definition of group (iii). (We have commented for
example on the issue of the width of the birth window in 2003.) Caution is needed here.
Tables 5.1 and 5.2 summarise the impact of the changes in mean and dispersion on the
percentage of persons in ranges of KS3 or KS4 population percentiles. In both years it is the
under-representation in the final sample, group (v), of pupils at the bottom of the KS4
distribution that is most obvious, even if one restricts attention only to the impact of pupil
response as shown by the change between groups (iv) and (v).
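Tabulations like Tables 5.1 and 5.2 can be produced by cutting each group's scores at the population percentile points; a sketch with simulated data (the cut points [10, 25, 75, 90] match the tables' ranges):

```python
import numpy as np

def pct_in_ranges(scores, cuts):
    """Percent of pupils falling in each band defined by the cut points."""
    counts = np.histogram(scores, bins=[-np.inf] + list(cuts) + [np.inf])[0]
    return 100 * counts / counts.sum()

rng = np.random.default_rng(1)
population = rng.normal(33, 7, 10_000)               # illustrative KS3-like scores
cuts = np.percentile(population, [10, 25, 75, 90])   # population percentile points
# For the population itself the shares are 10/15/50/15/10 by construction;
# for a selective subsample the bottom shares shrink, as in Tables 5.1 and 5.2.
shares = pct_in_ranges(population, cuts)
```

Applying `pct_in_ranges` to a subsample's scores, with `cuts` still taken from the population, reproduces the group (ii) to (v) columns of the tables.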
Table 5.1: Percent of pupils in Key Stage 3 (ks3_aps) percentile range of the population, 2003 and 2000

Percentile   Point    i            ii              iii             iv               v
range        score    All pupils   All pupils in   All pupils in   Sampled pupils   Responding
                      in England   all sampled     responding      in responding    pupils, PISA
                                   schools         schools         schools, PISA    weighted
                                                                   weighted
2003
<=10         <=25     12.20        11.90           11.67           10.87             9.54
>10-25       >25-29   14.90        14.75           14.83           14.79            12.97
>25-75       >29-39   53.01        52.67           53.88           54.57            56.34
>75-90       >39-43   13.33        13.56           13.48           13.70            14.57
>90          >43       6.56         7.12            6.13            6.06             6.58
2000
<=10         <=25     15.12        15.43           13.56           12.23            10.36
>10-25       >25-29   18.19        18.31           17.31           18.01            17.52
>25-75       >29-37   42.87        43.69           44.20           44.69            46.15
>75-90       >37-41   15.19        14.68           15.78           15.99            16.77
>90          >41       8.63         7.89            9.15            9.07             9.20
Table 5.2: Percent of pupils in average GCSE (ks4_mean) percentile range of the population, 2003 and 2000

Percentile   Point           i       ii      iii     iv      v
range        score
2003
<=10         <=2.0           11.14   10.91   10.75    8.91    6.35
>10-25       >2.0-3.2        13.59   13.53   14.02   14.83   13.37
>25-75       >3.2-5.7        50.27   49.92   51.12   51.47   53.61
>75-90       >5.7-6.6        15.06   15.28   14.92   15.53   16.88
>90          >6.6             9.95   10.37    9.19    9.26    9.79
2000
<=10         <=2.105         10.00   10.11    8.91    7.81    5.18
>10-25       >2.105-3.238    14.98   14.86   14.09   13.79   13.29
>25-75       >3.238-5.667    50.26   50.53   51.13   51.71   53.88
>75-90       >5.667-6.619    14.72   14.57   14.88   15.32   16.26
>90          >6.619          10.04    9.92   10.99   11.39   11.39

5.2  Response bias at school and pupil levels
Table 5.3 summarises the results of the analyses of school and pupil level response bias in the
second section of each of Chapters 3 and 4. In the case of 2000, the results are exactly as in
Chapter 4. In the case of 2003, the results are slightly different from those in Chapter 3 and are
based on the adjusted data for that year (results are given in full in Appendix 5). Significant
differences at the 5 percent level are printed in bold.
The table focuses on the means of the continuous score variables, the binary variables
measuring whether someone is above or below a threshold score, and on gender and FSM
receipt. In each case it is the differences in values for respondents and non-respondents that is
shown. A negative difference indicates that the ‘PISA value’ is higher. That is, a negative value
means that the mean is higher for schools or pupils that responded to PISA. We do not
summarise results for the variance of scores.
Three sets of results are shown.
In the case of response by schools, we show two sets of results: (i) those with the school as the
unit of analysis and looking at response in the initial sample of schools only (approach (a) in
sections 3.2.2 and 4.2.2), and (ii) those with the pupil as the unit of analysis and comparing
pupils in responding schools with pupils in second replacement schools that did not respond
(approach (c) in sections 3.2.2 and 4.2.2).
- 89 -
Table 5.3: Summary of differences in means due to school and pupil response, 2000 and 2003

1. Difference in school average values between schools responding in the initial sample and schools not responding

              2000                  2003
              mean*100    p         mean*100    p
Male           -5.45      0.31       -1.77      0.78
FSM             2.54      0.45        8.38      0.04
ELevel5fg      -2.15      0.56        0.30      0.95
MLevel5fg      -3.65      0.33       -0.82      0.86
SLevel5fg      -0.79      0.86        0.18      0.97
KS4_5ac         2.59      0.63        3.87      0.57
              mean                  mean
KS3_aps        -0.20      0.79        0.07      0.94
KS4_mean        0.12      0.64        0.00      0.99

2. Differences between pupils in responding schools (initial, first or second replacement) and pupils in schools not responding (second replacement)

              2000                  2003
              mean*100    p         mean*100    p
Male           -2.03      0.00        6.12      0.00
FSM                                   4.89      0.00
ELevel5fg      -6.99      0.00       -6.18      0.00
MLevel5fg      -8.45      0.00       -2.79      0.00
SLevel5fg     -10.00      0.00       -4.70      0.00
KS4_5ac        -5.02      0.00        1.03      0.24
              mean                  mean
KS3_aps        -1.61      0.00       -0.36      0.01
KS4_mean       -0.19      0.00        0.11      0.00

3. Differences between pupils responding and those not responding

              2000                  2003
              mean*100    p         mean*100    p
Male           -4.25      0.02        0.17      0.94
FSM                                   3.16      0.00
ELevel5fg     -10.21      0.00      -10.31      0.00
MLevel5fg     -11.93      0.00      -11.61      0.00
SLevel5fg     -10.57      0.00      -10.16      0.00
KS4_5ac       -14.41      0.00      -15.64      0.00
              mean                  mean
KS3_aps        -1.61      0.00       -1.64      0.00
KS4_mean       -0.66      0.00       -0.60      0.00

Source: estimates are taken from Tables 4.6, 4.14 and 4.15 for 2000 and Tables A5.1 to A5.3 of Appendix 5 for the adjusted 2003 data. Note: a negative value means that the mean is higher for schools or pupils that responded to PISA. Results in blocks 1 and 2 are based on data weighted with the ONS school weights; results in block 3 are based on data weighted with the ONS student weights.
The following similarities and differences between 2000 and 2003 can be seen:
• The mean characteristics of non-responding schools are almost always insignificantly different from those of responding schools (block 1 of the columns of results). Only the FSM variable in 2003 shows significant differences. (Note that when we used the larger samples of all responding and non-responding schools (irrespective of initial, first or second replacement status), some significant differences did begin to emerge in both years – at least at the 10% level.)
• Switching to the pupil as the unit of analysis rather than the school leads to significant differences for all but one measure (block 2): the percentage of pupils with 5 or more GCSEs at A*-C in 2003. The differences between responding and non-responding schools for the achievement variables are typically much smaller in 2003. The sign of the difference for the KS4_mean variable is not the same in the two years.
• The pattern and size of differences between responding and non-responding pupils is very similar in the two years for all the achievement measures (block 3). There are clear significant differences between the achievement of responding and non-responding students, e.g. a difference of at least 10 percentage points in the percentage of students above KS3 level 5 in all three subjects, and about 15 percentage points for the percentage with 5 or more GCSEs at A*-C (KS4_5ac).
5.3 Associations between PISA scores and KS3 and KS4 scores
This section considers whether the association between the PISA scores of respondents and their
scores on the domestic achievement measures has changed over time.
Table 5.4 gives the correlations between PISA scores and the continuous KS3 and KS4 measures for each year (weighted data).
Table 5.4: Correlations of PISA scores and KS3 and KS4 scores, 2000 and 2003

                      PISA 2000                      PISA 2003
            Maths   Reading   Science      Maths   Reading   Science
ks3Efg      0.656    0.726     0.678       0.644    0.692     0.653
ks3Mfg      0.823    0.763     0.772       0.847    0.773     0.804
ks3Sfg      0.808    0.806     0.826       0.811    0.781     0.807
ks3_aps     0.826    0.829     0.824       0.828    0.806     0.812
ks4_pts     0.781    0.797     0.777       0.739    0.748     0.737
ks4_ptc     0.793    0.808     0.789       0.770    0.784     0.770
ks4_mean    0.800    0.813     0.795       0.778    0.791     0.775
ks4_ac      0.744    0.766     0.749       0.715    0.720     0.715
The direction of change is summarised in Table 5.5.
Table 5.5: Summary of correlation coefficients

Changes in the correlation between the two years:

            PISA Maths   PISA Reading   PISA Science
ks3Efg          -             -              -
ks3Mfg          +             +              +
ks3Sfg          =             -              -
ks3_aps         =             -              -
ks4_pts         -             -              -
ks4_ptc         -             -              -
ks4_mean        -             -              -
ks4_ac          -             -              -

Note: '-' signifies that the correlation is smaller in 2003 than in 2000, '=' that it is similar (+/- 0.005), '+' that the correlation is greater in 2003.
Correlations typically fall between the two years but the changes are only very slight. The average
correlation between a PISA score and a KS3 or KS4 score is virtually unchanged: 0.78 in 2000 and
0.76 in 2003 (weighted data). We cannot explain the apparent fall in mean PISA achievement between
the two years by a change in correlation between PISA and national English achievement measures.
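The correlations reported here are weighted product-moment correlations. The calculation can be sketched as follows (a minimal illustration with made-up numbers, not the PISA microdata; the actual analysis used the OECD student weights):

```python
import math

def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y with weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw
    var_x = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / sw
    var_y = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / sw
    return cov / math.sqrt(var_x * var_y)

# Perfectly linear toy data: the correlation is 1 whatever the weights.
r = weighted_corr([1, 2, 3], [2, 4, 6], [1.0, 2.0, 1.0])
```

With equal weights the function reduces to the ordinary Pearson correlation.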
We now look at the associations in more detail using scattergrams and simple regression models.
Figures 5.2 and 5.3 show respondents’ PISA scores in reading, maths and science plotted against their
KS3 scores (Figure 5.2) and their GCSE scores (Figure 5.3). The regression lines reported in these
diagrams are for the unweighted data while Tables 5.6a to 5.6e provide results on weighted data
(using the OECD student weight). The diagrams suggest that a simple linear model relating PISA
scores to KS3 or KS4 scores is a reasonable choice of specification. Models using the weighted data
were used in Sections 3.4 and 4.4 as one of our methods for estimating the extent of response bias in
PISA scores.
Figures 5.4a and 5.4b compare the regression lines for 2000 and 2003. The lines are plotted so as to
cover the range from the 5th to the 95th percentile of PISA scores in the data.
The typical pattern is for the 2003 line – the dashed line – to be a simple downwards shift of the 2000
line. In other words, what typically changes is the constant term and not so much the slope coefficient.
The latter clearly does however change in the case of (i) PISA reading and KS4_mean, (ii) PISA
reading and KS3 reading fine grade and KS3 maths fine grade. In these cases, the slope is lower in
2003. The changes in the constant are often large. For example, the constant in the (unweighted)
regression of PISA maths on KS3_aps falls from 142 to 103.
One way of summarising the results is to predict PISA scores from the regressions for the two years
for a given value of KS3 or KS4 score. We take the mean KS3 and KS4 scores in the 2003 data and
calculate the predicted score on the basis of the 2000 regression line and again on the basis of the
2003 regression line. The results are shown in Table 5.7.
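The arithmetic is simply the fitted line evaluated at the 2003 mean. A sketch using the weighted coefficients for PISA reading on KS3_aps from Table 5.6a (small discrepancies with Table 5.7 are due to rounding):

```python
# Coefficients from Table 5.6a: PISA reading on KS3_aps, weighted data.
slope_2000, const_2000 = 12.643, 92.665
slope_2003, const_2003 = 11.072, 118.045
ks3_aps_mean_2003 = 34.763  # mean KS3 average score among 2003 respondents

pred_2000 = const_2000 + slope_2000 * ks3_aps_mean_2003  # about 532.2
pred_2003 = const_2003 + slope_2003 * ks3_aps_mean_2003  # about 502.9
gap = pred_2000 - pred_2003                              # about 29 points
```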
There are some very big differences. The predicted PISA scores given the mean 2003 KS3_aps score
differ by 26 to 33 points (depending on the PISA subject) between the two years. The predictions
from the regressions with KS4_mean differ by less, 11-19 points, but this is still a substantial
difference.
The conclusion from this analysis is as follows. While there has been not much change in the
correlations between PISA and KS3 or KS4 between the two years, there certainly has been a major
change in the relationship between the two. For any given KS3 or KS4 score, a person in 2000 would
on average have obtained a much higher PISA score. Of course, part of that story was obvious a priori
from simply considering the changes between 2000 and 2003 in mean KS3 and KS4 scores (generally
up) and in PISA scores (down). What the correlation and regression analysis does is to provide some
insight into the nature of the changes. The story of a change in the constant term is about the simplest
that could be found. One interpretation of it would be that for a given level of competence in KS3 or
KS4, students in 2003 were ‘less good’ at PISA than in 2000. The explanation of that could be that the
PISA test has changed in quality over time. But another is that a given level of KS3 or KS4 score
signals a different thing in the two years.
Figure 5.2: Regression of PISA scores on KS3 average score (KS3_aps) (unweighted)

[Six scatterplots of PISA reading, maths and science scores (100-800) against KS3 average scores (10-60), for 2000 (left) and 2003 (right), with fitted regression lines:]

2000: PISA read = 93.09 + 12.62*KS3 (R2 = 0.68)
      PISA maths = 142.14 + 11.34*KS3 (R2 = 0.68)
      PISA science = 121.84 + 12.06*KS3 (R2 = 0.68)
2003: PISA read = 119.3 + 11.06*KS3 (R2 = 0.65)
      PISA maths = 102.73 + 11.55*KS3 (R2 = 0.68)
      PISA science = 89.01 + 12.30*KS3 (R2 = 0.65)
Figure 5.3: Regression results of PISA scores on GCSE scores (KS4_mean) (unweighted)

[Six scatterplots of PISA reading, maths and science scores (100-800) against mean GCSE point scores (0-8), for 2000 (left) and 2003 (right), with fitted regression lines:]

2000: PISA read = 290.92 + 49.93*GCSE (R2 = 0.66)
      PISA maths = 320.04 + 44.77*GCSE (R2 = 0.64)
      PISA science = 314.20 + 47.07*GCSE (R2 = 0.64)
2003: PISA read = 304.23 + 43.92*GCSE (R2 = 0.62)
      PISA maths = 305.54 + 43.74*GCSE (R2 = 0.60)
      PISA science = 301.52 + 47.31*GCSE (R2 = 0.60)
Table 5.6a: Regression of PISA score on KS3 score (KS3_aps)

                         2000                             2003
              reading    maths    science      reading    maths    science
KS3_aps        12.643   11.313     12.078       11.072   11.541     12.322
              (0.141)  (0.171)    (0.184)      (0.139)  (0.133)    (0.151)
Constant       92.665  143.293    121.446      118.045  102.303     87.450
              (4.822)  (5.839)    (6.292)      (4.896)  (4.702)    (5.335)
Observations     3660     2038       2033         3444     3444       3444
R-squared        0.69     0.68       0.68         0.65     0.69       0.66

Table 5.6b: Regression of PISA score on GCSE score (KS4_mean)

                         2000                             2003
              reading    maths    science      reading    maths    science
ks4_mean       49.787   44.796     46.327       43.898   43.766     47.335
              (0.567)  (0.715)    (0.754)      (0.560)  (0.584)    (0.636)
Constant      291.221  319.611    317.072      303.679  304.791    300.479
              (2.797)  (3.537)    (3.727)      (2.737)  (2.853)    (3.107)
Observations     3976     2211       2207         3676     3676       3676
R-squared        0.66     0.64       0.63         0.63     0.60       0.60

Table 5.6c: Regression of PISA scores on KS3 English fine grade score

                         2000                             2003
              reading    maths    science      reading    maths    science
ks3Efg         63.383   51.033     56.414       56.024   52.940     58.372
              (0.989)  (1.299)    (1.356)      (0.991)  (1.067)    (1.151)
Constant      163.584  239.200    211.249      188.171  206.031    187.792
              (5.613)  (7.357)    (7.722)      (5.661)  (6.094)    (6.574)
Observations     3681     2045       2041         3469     3469       3469
R-squared        0.53     0.43       0.46         0.48     0.42       0.43

Table 5.6d: Regression of PISA scores on KS3 maths fine grade score

                         2000                             2003
              reading    maths    science      reading    maths    science
ks3Mfg         60.836   58.881     59.062       53.645   59.632     61.691
              (0.849)  (0.897)    (1.075)      (0.747)  (0.637)    (0.774)
Constant      172.172  188.490    192.538      185.647  150.820    150.930
              (4.921)  (5.203)    (6.220)      (4.507)  (3.845)    (4.673)
Observations     3686     2049       2048         3469     3469       3469
R-squared        0.58     0.68       0.60         0.60     0.72       0.65

Table 5.6e: Regression of PISA score on KS3 science fine grade

                         2000                             2003
              reading    maths    science      reading    maths    science
ks3Sfg         73.689   66.207     72.481       67.629   71.334     77.278
              (0.893)  (1.068)    (1.094)      (0.919)  (0.873)    (0.961)
Constant      118.259  164.526    135.145      121.616  101.274     80.049
              (4.920)  (5.888)    (6.016)      (5.254)  (4.990)    (5.494)
Observations     3682     2047       2041         3469     3469       3469
R-squared        0.65     0.65       0.68         0.61     0.66       0.65

Note: results are based on data weighted with the OECD student weights. Standard errors in parentheses.
Figure 5.4a: Predicted PISA achievement scores by KS3 and KS4 scores

[Four panels of predicted PISA reading and science scores (300-700) from the 2000 (solid) and 2003 (dashed) regression lines, plotted against KS3 average score (ks3_aps, 20-50) and mean GCSE point score (ks4_mean, 0-8).]

Note: the range of PISA scores predicted is limited at the lower level by the 5th percentile and at the upper level by the 95th percentile. The crossing point of the 2003 and 2000 lines for PISA reading is the mean GCSE point score of 2.12.
Figure 5.4b: Predicted PISA achievement scores by KS3 and KS4 scores

[Six panels of predicted PISA reading and science scores (300-700) from the 2000 and 2003 regression lines, plotted against English, Maths and Science Key Stage 3 fine grade levels (2-9).]

Note: Key Stage 3 levels refer to fine grades. The range of PISA scores predicted is limited at the lower level by the 5th percentile and at the upper level by the 95th percentile. The crossing point of the 2003 and 2000 lines for PISA reading is at English Key Stage 3 level 3.34.
Table 5.7: Predicted PISA scores in 2000 and 2003 at mean values of KS3 and KS4 scores in 2003

                                                 Predicted PISA score
Explanatory variable   Mean score             2000 regr.  2003 regr.
                       in 2003                   line        line     Difference
KS3 average score       34.763  PISA reading    532.2       503.0        29.2
(KS3_aps)                       PISA maths      536.6       503.5        33.1
                                PISA science    541.3       515.8        25.5
Mean GCSE points         4.629  PISA reading    521.7       506.9        14.8
(Ks4_mean)                      PISA maths      527.0       507.4        19.6
                                PISA science    531.5       519.6        11.9
English KS3 level        5.621  PISA reading    519.9       503.1        16.8
                                PISA maths      526.1       503.6        22.5
                                PISA science    528.4       515.9        12.5
Maths KS3 level          5.908  PISA reading    531.6       502.6        29.0
                                PISA maths      536.4       503.1        33.3
                                PISA science    541.5       515.4        26.1
Science KS3 level        5.639  PISA reading    533.8       503.0        30.8
                                PISA maths      537.9       503.5        34.4
                                PISA science    543.9       515.8        28.1

Note: the mean score in 2003 is calculated using those pupils who participated in PISA, weighted with the ONS student weight. Maths results cannot be compared directly between the two years because the content areas used for measuring maths ability differed between 2000 and 2003.
6 Conclusions and recommendations
Our main results are summarised in Chapter 5. In this chapter we focus on the conclusions we draw
from the analyses made in the report. We also make recommendations for any future analyses of
response bias, for the possible design of further rounds of the survey, and for the use of the existing
PISA data for England.
England has experienced a problem of achieving levels of response that are considered satisfactory by
the OECD, both among schools and among pupils. At first sight, the greater problem is at the school
level. England’s post-replacement response rate fell below levels defined as ‘acceptable’ in both 2000
and 2003. (In fact in both years the pre-replacement school response rate was in principle so low as to
be ‘not acceptable’, irrespective of replacement.) The pupil response rate in 2003 was close to the
level considered acceptable by OECD and in 2000 just exceeded this threshold. For this reason it is
response among schools that seems to have been the focus of DfES concerns following the 2003 PISA
round (leading for example to the commissioning of the analysis by Sturgis et al 2006).
However, if we look at the characteristics of responding schools, measured by mean achievement
levels of their pupils, compared to those of schools that do not respond, there does not appear to be a
statistically significant problem of response bias. This conclusion is based on taking the school as the
unit of analysis – on the grounds that it is schools that choose whether or not to respond. If we take the
pupils in the schools as the unit of analysis (and compare the mean across all pupils in responding
schools with the mean across all pupils in schools that do not respond) then significant differences
emerge, with pupils in responding schools having higher mean scores. This is especially true in 2000.
Indeed, it is not true in 2003 for the main KS4 summary measures we focused on.
The impact of response bias on results for England is larger and clearer at the stage when pupils
respond, despite pupil response rates being at or near the threshold deemed acceptable by the OECD.
In both 2000 and 2003, pupils that responded have KS3 and KS4 scores that display a higher mean
and lower dispersion than scores for non-respondents. We estimate that the impact of the bias was to
increase mean PISA scores for respondents by about 6 points in both years and to reduce the standard
deviation by about 3 or 4 points. The OECD student weights appear to have little impact in reducing
these biases. We consider both the direction and magnitude of these estimates as reasonably reliable.
A bias of about 6 points in the mean is around twice as large as the impact of sampling error, as
summarised in the estimated standard error of the mean. Setting aside the issue of the variability of
this bias in other possible samples, this estimate implies that bias would dominate sampling error in an
estimate of ‘total survey error’ in the mean. This finding is consistent with the criterion for exclusion
of results for any country that seems to be implied in the OECD’s first international report for the
2003 PISA round. We have noted, however, that the impact of the bias on the position of England (or
the UK) in a ranking of OECD countries by mean scores would have been very slight.
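The comparison rests on the standard decomposition of mean square error into squared bias plus variance. A back-of-envelope sketch with a bias of 6 points and a sampling standard error of 3 points (the rough magnitudes cited above; illustrative only):

```python
import math

bias = 6.0  # estimated response bias in the mean PISA score (points)
se = 3.0    # sampling standard error of the mean (points), about half the bias

mse = bias ** 2 + se ** 2     # total survey error as mean square error
rmse = math.sqrt(mse)         # about 6.7 points
bias_share = bias ** 2 / mse  # bias accounts for 80% of the MSE
```

A bias of this size therefore dominates the sampling component of total survey error, as stated in the text.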
Evidence of the extent of pupil response bias in 2000 was not available when the decision was taken
on whether to include results for the UK in the international report for the 2000 data. However, the
logic is that this evidence should not have been asked for, or provided, since the level of pupil
response in England in 2000 reached the 80 percent threshold deemed by the OECD as ‘acceptable’. It
is natural to wonder whether other countries with pupil response rates that cleared this threshold by
only a slightly larger margin in either 2000 or 2003 had levels of response bias in estimates of mean scores that would also have dominated estimates of total survey error. This illustrates clearly the
danger of imposing arbitrary thresholds for satisfactory response. (See also Sturgis et al 2006.)
The exclusion of the UK from the international report for the 2003 round of PISA will have given a
signal that the data for this year for England are of doubtful quality. The OECD do note that ‘the
results for the UK are, however, accurate for many within-country comparisons between subgroups
(e.g. males and females) and for relational analyses' (OECD, 2004: 328).[23] Nevertheless, the headline
event of the exclusion of the national level results from the international report has seriously
threatened confidence in the data.[24]
In our view the data do have considerable value, including at the national level. An appropriate return
can still be obtained on the public resources that were devoted to the collection of the data in England.
However, if the data are to be widely used for national level estimates, suitable weights need to be
provided that adjust for the probability of pupil response. The provision of such weights should
increase confidence in the use of the data, which are already publicly available on the OECD website.
The weights we have experimented with as part of our analysis, based on modelling the pupil response
probabilities, provide one way forward. These weights remove most of the apparent bias that is due to
pupil response, both in mean scores and in the dispersion of scores.
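The principle behind such weights can be illustrated with made-up numbers (this is not the response model estimated in the report, which is based on richer covariates): weighting each respondent by the inverse of the estimated response probability restores the composition of the original sample.

```python
# Hypothetical example: two achievement groups with different response rates.
groups = {
    # group: (pupils sampled, mean score, response probability)
    "high_achievers": (60, 550.0, 0.9),
    "low_achievers": (40, 450.0, 0.6),
}

n_sampled = sum(n for n, _, _ in groups.values())
sample_mean = sum(n * m for n, m, _ in groups.values()) / n_sampled  # 510.0

# Respondents only: achievement-related non-response biases the mean upwards.
resp = {g: n * p for g, (n, _, p) in groups.items()}
unweighted = sum(resp[g] * m for g, (_, m, _) in groups.items()) / sum(resp.values())

# Weight each respondent by the inverse of the response probability.
wtot = sum(resp[g] / p for g, (_, _, p) in groups.items())
weighted = sum(resp[g] / p * m for g, (_, m, p) in groups.items()) / wtot
```

Here the unweighted respondent mean overstates the sample mean, while the inverse-probability weighted mean recovers it.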
Our recommendations cover a range of issues, including research on topics we have only touched on
in this report.
1. Analysis of response in future waves of PISA should take place on data sets that are
prepared and documented at an early stage. (Addressed to DfES and the England PISA
contractor.)
The necessary files from the PISA sampling process (including weighting variables) should be
prepared and documented at the time the survey is conducted in order to avoid problems associated
with any staff turnover in the survey agency collecting the data.
2. Future analyses of response bias should investigate the impact of (i) school replacement, and
(ii) the use of weights that take into account different response levels within school strata.
(Addressed to DfES.)
The impact of school replacement needs to be investigated rather than assumed to be neutral. Some of
our comparisons of responding and non-responding schools showed that rather different results were
obtained depending on whether the focus was on the initial sample of schools or on all schools that
had been approached. The impact of the OECD school weights based on stratification variables on any
response bias at the school level needs to be investigated systematically.
[23] The rationale for this statement is presumably that mean square error in sub-samples will probably be dominated by sampling error rather than bias: while sampling error will be larger in sub-samples, bias could remain the same.
[24] Simon Briscoe, Economics Editor at The Financial Times, noted at the Royal Statistical Society 2005 Cathie Marsh Memorial Lecture on 'Public Confidence in Official Statistics' that the exclusion of the UK from the 2003 PISA report was among the 'Top 20' recent threats to that confidence. (We are grateful to James Brown for this observation.)
3. Consideration should be given to whether it is practical to stratify samples of pupils within
schools by a domestic measure of achievement (KS3 or KS4). (Addressed to DfES and the
PISA Consortium.)
If the main problem of bias arises at the stage of pupil response, then the stratification of pupil
sampling needs to be investigated. The importance of considering this possibility is underlined by the
OECD’s apparent commitment to allowing response adjustments only through weights based on
stratification variables (i.e. in line with response rates within strata), rather than permitting post-stratification weights of the type that are used in many surveys.
4. DfES should consider ways of raising response among pupils and not just among schools.
(Addressed to DfES.)
It is tempting to see the school response as the bigger problem, since the after-replacement response
rate of schools fell well below the level defined as acceptable by the OECD in both 2000 and 2003,
while the pupil response rate was above the minimum threshold in 2000 and close to it in 2003. But it
is the response biases at the pupil level that emerge most clearly in our study.
5. Criteria for exclusion of any country from the international reports for PISA need to be
made explicit by OECD and clearly justified. Evidence for a decision on exclusion needs to
be published. (Addressed to OECD and the PISA Consortium)
The criteria for excluding results for the UK in 2003 (in terms of contributions to total survey error)
were at best implicit. Such criteria should be explicit. And having stated the criteria for exclusion, the
evidence on which a decision is taken to exclude a country needs also to be given clearly. The
evidence in the case of the UK in 2003 was only hinted at in the published international report.
6. Response weights for responding pupils in England in both 2000 and 2003 that are based on
a statistical model of pupil response of the type we present in this report should be provided
for users of the data. (Addressed to DfES and to the OECD.)
We have shown that satisfactory weights to adjust for pupil non-response can be calculated. These
weights could be placed in files on the DfES website for users to download and merge with the
microdata for England that are available from the OECD website. The OECD should be encouraged to
publicise the existence of these files of weights on its website.
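Operationally, merging such a weight file with the OECD microdata is a simple join on the pupil identifier. A hedged sketch (the field names, identifiers and weight values below are invented for illustration):

```python
# Hypothetical records: OECD microdata rows and a separate weight file,
# joined on a pupil identifier.
microdata = [
    {"pupil_id": "001", "pv1read": 520.0},
    {"pupil_id": "002", "pv1read": 455.0},
]
resp_weights = {"001": 1.08, "002": 0.93}  # response weight per pupil

merged = [dict(row, resp_wt=resp_weights[row["pupil_id"]]) for row in microdata]

# Weighted mean reading score using the merged response weights:
wmean = sum(r["pv1read"] * r["resp_wt"] for r in merged) / sum(
    r["resp_wt"] for r in merged)
```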
7. The OECD should be engaged in discussion of whether adjustment for response bias using
post-stratification response weights could be used in the future to avoid excluding a country
from the international report. (Addressed via DfES to the OECD.)
The use of post-stratification weights is a common survey practice. OECD is presumably reluctant to
permit their use on the grounds that this would open PISA to all sorts of individual country practices
that would be hard to monitor. However, the use of such weights could be restricted to countries in
danger of being excluded from the international reports due to problems of response, and only
permitted when it could be shown that the weights are based on variables that are well correlated with
PISA achievement measures, as is the case with domestic English achievement scores.
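To make the idea concrete, a minimal sketch of post-stratification with hypothetical numbers (the stratum labels, counts and means are invented): each respondent is weighted by the ratio of the stratum's population share to its share among respondents.

```python
# Hypothetical strata defined by a KS4 achievement band; population shares
# would come from the national register, respondent data from the survey.
pop_share = {"ks4_high": 0.50, "ks4_low": 0.50}
respondents = {"ks4_high": (60, 530.0), "ks4_low": (40, 470.0)}  # (count, mean)

n = sum(c for c, _ in respondents.values())
# Post-stratification weight: population share / respondent share.
weight = {s: pop_share[s] / (c / n) for s, (c, _) in respondents.items()}

unweighted = sum(c * m for c, m in respondents.values()) / n
num = sum(weight[s] * c * m for s, (c, m) in respondents.items())
den = sum(weight[s] * c for s, (c, m) in respondents.items())
weighted = num / den  # the mean under the population composition
```

Over-represented strata get weights below one, under-represented strata weights above one, pulling the estimate back towards the population composition.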
References

Biemer P and Lyberg L (2003), Introduction to Survey Quality, Wiley: Hoboken NJ.

Brown G and Micklewright J (2004), 'Using International Surveys of Achievement and Literacy: A View from the Outside', UNESCO Institute for Statistics Working Paper 2, UNESCO: Montreal.

DfES (2005), 'PISA 2003: Sample design, response and weighting for the England sample', unpublished document.

Gill B, Dunn M and Goddard E (2002), Student Achievement in England, London: The Stationery Office.

Greene W (1993), Econometric Analysis, 2nd ed, Macmillan: New York.

OECD (2001a), Knowledge and Skills for Life – First Results from PISA 2000, OECD: Paris.

OECD (2001b), PISA 2000 Technical Report, OECD: Paris.

OECD (2004), Learning for Tomorrow's World: First Results from PISA 2003, OECD: Paris.

OECD (2005a), PISA 2003 Technical Report, OECD: Paris.

OECD (2005b), PISA 2003 Data Analysis Manual, OECD: Paris. Available at: http://www.pisa.oecd.org/dataoecd/53/22/35014883.pdf

Sturgis P, Smith P and Hughes G (2006), A Study of Suitable Methods for Raising Response Rates in School Surveys, report for DfES.
Appendix 1: Summary statistics for five groups of pupils in 2003
Table A1.1: All pupils in schools in England (i), 2003

Variable     N        Missing   Mean*100
Male         698031   6         50.017
FSM          644301   53736     13.778
ELevel5      643042   54995     69.674
MLevel5      647134   50903     70.537
SLevel5      645396   52641     69.825
ELevel5fg    642026   56011     71.079
MLevel5fg    645008   53029     72.026
SLevel5fg    642532   55505     71.400
ks4_5ac      698031   6         55.788

Variable     N        Missing   Mean     SD
ks3Efg       642026   56011     5.512    1.113
ks3Mfg       645008   53029     5.785    1.285
ks3Sfg       642532   55505     5.541    1.052
ks3_aps      637949   60088     34.162   6.618
ks4_pts      698030   7         42.854   21.093
ks4_ptc      698031   6         36.264   15.814
ks4_mean     698031   6         4.350    1.850
ks4_ac       698031   6         5.377    4.115

Note: FSM eligibility and KS3 scores are missing for about 8% of the population; these are typically children in private schools.
Table A1.2: All pupils in all sampled schools (ii), 2003

Variable     N        Missing   Mean*100
Male         117045   6         49.332
FSM          112892   4159      12.563
ELevel5      112013   5038      70.073
MLevel5      112466   4585      71.075
SLevel5      112250   4801      70.422
ELevel5fg    111803   5248      71.444
MLevel5fg    112106   4945      72.616
SLevel5fg    111807   5244      72.026
ks4_5ac      117045   6         55.912

Variable     N        Missing   Mean     SD
ks3Efg       111803   5248      5.531    1.124
ks3Mfg       112106   4945      5.810    1.284
ks3Sfg       111807   5244      5.564    1.052
ks3_aps      110823   6228      34.301   6.642
ks4_pts      117045   6         42.879   20.810
ks4_ptc      117045   6         36.451   15.778
ks4_mean     117045   6         4.373    1.850
ks4_ac       117045   6         5.370    4.080

Note: data are weighted with ONS school weight (sch_bwt).
Table A1.3: All pupils in responding schools (iii), 2003

Variable     N        Missing   Mean*100
Male         35446    2         47.395
FSM          34362    1086      11.861
ELevel5      34314    1134      70.538
MLevel5      34293    1155      71.135
SLevel5      34323    1125      70.805
ELevel5fg    34141    1307      71.958
MLevel5fg    34187    1261      72.715
SLevel5fg    34179    1269      72.487
ks4_5ac      35446    2         55.309

Variable     N        Missing   Mean     SD
ks3Efg       34141    1307      5.521    1.097
ks3Mfg       34187    1261      5.799    1.267
ks3Sfg       34179    1269      5.556    1.024
ks3_aps      33792    1656      34.214   6.467
ks4_pts      35446    2         42.627   20.609
ks4_ptc      35446    2         36.172   15.477
ks4_mean     35446    2         4.334    1.815
ks4_ac       35446    2         5.317    4.091

Note: data are weighted with ONS school weight (sch_bwt).
Table A1.4: All sampled pupils (iv), 2003

Variable     N        Missing   Mean*100
Male         5091     108       46.280
FSM          4816     383       11.357
ELevel5      4826     373       71.190
MLevel5      4825     374       71.681
SLevel5      4827     372       70.991
ELevel5fg    4809     390       72.308
MLevel5fg    4813     386       73.007
SLevel5fg    4812     387       72.340
ks4_5ac      5091     108       56.354

Variable     N        Missing   Mean     SD
ks3Efg       4809     390       5.534    1.094
ks3Mfg       4813     386       5.802    1.276
ks3Sfg       4812     387       5.555    1.027
ks3_aps      4772     427       34.232   6.447
ks4_pts      5091     108       43.496   19.745
ks4_ptc      5091     108       36.975   14.660
ks4_mean     5091     108       4.414    1.729
ks4_ac       5091     108       5.396    4.067

Note: data are weighted with ONS student weight. Unweighted and weighted results are very similar.
Table A1.5: All responding pupils (v), 2003

Variable     N        Missing   Mean*100
Male         3695     57        46.267
FSM          3475     277       10.399
ELevel5      3491     261       74.406
MLevel5      3487     265       75.280
SLevel5      3490     262       74.276
ELevel5fg    3481     271       75.495
MLevel5fg    3481     271       76.632
SLevel5fg    3481     271       75.487
ks4_5ac      3695     57        60.948

Variable     N        Missing   Mean     SD
ks3Efg       3481     271       5.619    1.072
ks3Mfg       3481     271       5.908    1.247
ks3Sfg       3481     271       5.636    1.001
ks3_aps      3455     297       34.757   6.298
ks4_pts      3695     57        45.769   18.545
ks4_ptc      3695     57        38.744   13.450
ks4_mean     3695     57        4.605    1.613
ks4_ac       3695     57        5.799    3.982

Note: data are weighted with ONS student weight. Unweighted and weighted results are very similar.
Appendix 2: Summary statistics for five groups of pupils in 2000
Table A2.1: All pupils in schools in England (i), 2000

Variable     N        Missing   Mean*100
Male         556544   4         50.347
ELevel5      511517   45031     68.123
MLevel5      512950   43598     63.995
SLevel5      511036   45512     59.151
ELevel5fg    509353   47195     69.346
MLevel5fg    511005   45543     65.593
SLevel5fg    508585   47963     60.609
ks4_5ac      556544   4         52.100

Variable     N        Missing   Mean     SD
ks3Efg       509353   47195     5.463    1.138
ks3Mfg       511005   45543     5.554    1.242
ks3Sfg       508585   47963     5.297    1.082
ks3_aps      508603   47945     32.960   6.540
ks4_pts      556544   4         41.096   19.042
ks4_ptc      556544   4         36.017   15.000
ks4_mean     556544   4         4.417    1.715
ks4_ac       556544   4         4.884    3.877

Note: no information on free school meal eligibility available for individuals in 2000.
Table A2.2: All pupils in all sampled schools (ii), 2000

Variable     N        Missing   Mean*100
Male         91321    4         50.156
ELevel5      86629    4696      68.104
MLevel5      86695    4630      63.455
SLevel5      86742    4583      58.702
ELevel5fg    86225    5100      69.305
MLevel5fg    86362    4963      65.027
SLevel5fg    86343    4982      60.055
ks4_5ac      91321    4         52.374

Variable     N        Missing   Mean     SD
ks3Efg       86225    5100      5.452    1.131
ks3Mfg       86362    4963      5.524    1.231
ks3Sfg       86343    4982      5.276    1.075
ks3_aps      85852    5473      32.797   6.460
ks4_pts      91321    4         41.156   19.006
ks4_ptc      91321    4         36.018   14.967
ks4_mean     91321    4         4.411    1.713
ks4_ac       91321    4         4.892    3.869

Note: data are weighted with ONS school weight (sch_bwt).
Table A2.3: All pupils in responding schools (iii), 2000

Variable     N        Missing   Mean*100
Male         27606    2         49.512
ELevel5      26248    1360      70.371
MLevel5      26248    1360      66.155
SLevel5      26260    1348      61.910
ELevel5fg    26175    1433      71.657
MLevel5fg    26173    1435      67.619
SLevel5fg    26168    1440      63.391
ks4_5ac      27606    2         54.595

Variable     N        Missing   Mean     SD
ks3Efg       26175    1433      5.529    1.132
ks3Mfg       26173    1435      5.611    1.214
ks3Sfg       26168    1440      5.359    1.062
ks3_aps      25993    1615      33.294   6.403
ks4_pts      27606    2         42.424   18.887
ks4_ptc      27606    2         36.950   14.724
ks4_mean     27606    2         4.511    1.691
ks4_ac       27606    2         5.073    3.854

Note: data are weighted with ONS school weight (sch_bwt).
Table A2.4: All sampled pupils (iv), 2000

Variable     N        Missing   Mean*100
Male         4912     252       49.093
ELevel5      4570     594       71.662
MLevel5      4572     592       66.725
SLevel5      4573     591       63.020
ELevel5fg    4559     605       72.887
MLevel5fg    4561     603       68.441
SLevel5fg    4560     604       64.514
ks4_5ac      4912     252       56.442

Variable     N        Missing   Mean     SD
ks3Efg       4559     605       5.558    1.089
ks3Mfg       4561     603       5.641    1.190
ks3Sfg       4560     604       5.387    1.037
ks3_aps      4526     638       33.458   6.239
ks4_pts      4912     252       43.239   18.529
ks4_ptc      4912     252       37.626   14.326
ks4_mean     4912     252       4.561    1.666
ks4_ac       4912     252       5.219    3.824

Note: data are weighted with ONS student weight (stu_wt).
Table A2.5: All responding pupils (v), 2000

Variable     N        Missing   Mean*100
Male         3976     144       49.903
ELevel5      3687     433       73.729
MLevel5      3689     431       69.075
SLevel5      3690     430       65.187
ELevel5fg    3681     439       74.854
MLevel5fg    3686     434       70.731
SLevel5fg    3682     438       66.550
ks4_5ac      3976     144       59.186

Variable     N        Missing   Mean     SD
ks3Efg       3681     439       5.604    1.063
ks3Mfg       3686     434       5.705    1.164
ks3Sfg       3682     438       5.436    1.014
ks3_aps      3660     460       33.766   6.062
ks4_pts      3976     144       44.620   17.403
ks4_ptc      3976     144       38.790   13.276
ks4_mean     3976     144       4.686    1.559
ks4_ac       3976     144       5.452    3.749

Note: data are weighted with ONS student weight (stu_wt).
Appendix 3: Summary statistics for private and public schools
The tables in this appendix show summary statistics separately for public sector and private schools in
2003 and 2000. KS3 scores are generally not available for pupils in private schools while GCSE
scores are available for the whole population.
Table A3.1: Number of pupils in private and public schools in 2003 and 2000

                               2003                    2000
                           Freq.    Percent        Freq.    Percent
Pupils in public schools   649,190   93.00         516,578   92.82
Pupils in private schools   48,841    7.00          39,966    7.18
Missing                          6    0.00               4    0.00
Total                      698,037  100.00         556,548  100.00
Table A3.2: Summary statistics in public schools for 2000 and 2003

                        2003                          2000
Variable     N        Missing   Mean*100     N        Missing   Mean*100
Male         649190   0         49.950      516578   0         50.301
FSM          644124   5066      13.777      n/a      n/a       n/a
ELevel5      633542   15648     69.494      504221   12357     67.907
MLevel5      635731   13459     70.232      503798   12780     63.644
SLevel5      636171   13019     69.632      504378   12200     58.923
ELevel5fg    632791   16399     70.854      502131   14447     69.115
MLevel5fg    633724   15466     71.709      501949   14629     65.231
SLevel5fg    633521   15669     71.184      502019   14559     60.366
ks4_5ac      649190   0         53.281      516578   0         49.330

                        2003                                2000
Variable     N        Missing   Mean     SD       N        Missing   Mean     SD
ks3Efg       632791   16399     5.504    1.110    502131   14447     5.454    1.134
ks3Mfg       633724   15466     5.771    1.282    501949   14629     5.540    1.237
ks3Sfg       633521   15669     5.533    1.050    502019   14559     5.290    1.080
ks3_aps      626215   22975     34.073   6.586    498819   17759     32.862   6.495
ks4_pts      649189   1         41.757   20.907   516578   0         39.922   18.643
ks4_ptc      649190   0         35.255   15.515   516578   0         35.013   14.648
ks4_mean     649190   0         4.204    1.801    516578   0         4.273    1.665
ks4_ac       649190   0         5.161    4.125    516578   0         4.634    3.844

Note: FSM was not available in the 2000 pupil-level data (see the end of this appendix).
Table A3.3: Summary statistics in private schools for 2000 and 2003

                        2003                         2000
Variable     N       Missing   Mean*100     N       Missing   Mean*100
Male         48841   0         50.906      39966   0         50.941
FSM          177     48664     18.079      n/a     n/a       n/a
ELevel5      9500    39341     81.695      7296    32670     83.004
MLevel5      11403   37438     87.547      9152    30814     83.315
SLevel5      9225    39616     83.154      6658    33308     76.449
ELevel5fg    9235    39606     86.486      7222    32744     85.364
MLevel5fg    11284   37557     89.817      9056    30910     85.645
SLevel5fg    9011    39830     86.583      6566    33400     79.165
ks4_5ac      48841   0         89.114      39966   0         87.900

                        2003                               2000
Variable     N       Missing   Mean     SD       N       Missing   Mean     SD
ks3Efg       9235    39606     6.115    1.129    7222    32744     6.085    1.195
ks3Mfg       11284   37557     6.579    1.197    9056    30910     6.368    1.235
ks3Sfg       9011    39830     6.098    1.037    6566    33400     5.857    1.124
ks3_aps      11734   37107     38.953   6.547    9784    30182     37.970   6.869
ks4_pts      48841   0         57.438   17.887   39966   0         56.260   17.583
ks4_ptc      48841   0         49.673   13.468   39966   0         48.997   13.356
ks4_mean     48841   0         6.290    1.327    39966   0         6.275    1.189
ks4_ac       48841   0         8.251    2.647    39966   0         8.105    2.664

Note: FSM was not available in the 2000 pupil-level data (see the end of this appendix).
In 2000, Free School Meal data were not available in the pupil data file but only in a separately provided
school file. This file included 3,172 public and 887 private schools and gave the percentage of pupils (of
all ages) eligible for free school meals in each school. On average, 17.6 percent of pupils in public sector
schools received Free School Meals. Information was missing for 43 private schools; in the other 844
private schools no pupil received this benefit.
Appendix 4: Summary statistics with adjusted data for 2003
This appendix gives the summary statistics for the 2003 national data base adjusted as far as possible
to the coverage of the 2000 data (see Chapters 2 and 5). The results are slightly different from those
reported in Chapter 3. These adjustments mean that pupils who were not entered for any GCSEs (or
equivalent) are excluded and statemented pupils are included.
Table A 4.1: All pupils in schools in England (i), 2003 adjusted

Variable     N        Missing   Mean*100
Male         697411   6         50.34
FSM          644583   52834     13.61
ELevel5      644389   53028     69.40
MLevel5      648477   48940     70.31
SLevel5      646769   50648     69.79
ELevel5fg    644222   53195     70.67
MLevel5fg    647232   50185     71.65
SLevel5fg    644800   52617     71.21
ks4_5ac      697411   6         56.07

Variable     N        Missing   Mean     SD
ks3Efg       644222   53195     5.50     1.13
ks3Mfg       647232   50185     5.77     1.30
ks3Sfg       644800   52617     5.53     1.06
ks3_aps      641070   56347     34.09    6.69
ks4_pts      697411   6         43.29    20.54
ks4_ptc      697411   6         36.67    15.24
ks4_mean     697411   6         4.41     1.77
ks4_ac       697411   6         5.41     4.10
Table A 4.2: All pupils in sampled schools (ii), 2003 adjusted

Variable     N        Missing   Mean*100
Male         116998   6         49.621
FSM          112953   4051      12.460
ELevel5      112261   4743      69.781
MLevel5      112719   4285      70.804
SLevel5      112507   4497      70.346
ELevel5fg    112190   4814      71.035
MLevel5fg    112503   4501      72.194
SLevel5fg    112183   4821      71.806
ks4_5ac      116998   6         56.170

Variable     N        Missing   Mean     SD
ks3Efg       112190   4814      5.515    1.140
ks3Mfg       112503   4501      5.794    1.300
ks3Sfg       112183   4821      5.556    1.061
ks3_aps      111362   5642      34.220   6.717
ks4_pts      116998   6         43.299   20.261
ks4_ptc      116998   6         36.841   15.205
ks4_mean     116998   6         4.427    1.768
ks4_ac       116998   6         5.399    4.064

Note: data weighted with ONS school weight (sch_bwt).
Table A 4.3: All pupils in responding schools (iii), 2003 adjusted

Variable     N       Missing   Mean*100
Male         35432   2         47.596
FSM          34384   1050      11.886
ELevel5      34388   1046      70.189
MLevel5      34364   1070      70.848
SLevel5      34398   1036      70.696
ELevel5fg    34249   1185      71.513
MLevel5fg    34297   1137      72.289
SLevel5fg    34280   1154      72.262
ks4_5ac      35432   2         55.567

Variable     N       Missing   Mean     SD
ks3Efg       34249   1185      5.505    1.114
ks3Mfg       34297   1137      5.782    1.285
ks3Sfg       34280   1154      5.548    1.033
ks3_aps      33936   1498      34.131   6.546
ks4_pts      35432   2         43.028   20.082
ks4_ptc      35432   2         36.547   14.921
ks4_mean     35432   2         4.387    1.734
ks4_ac       35432   2         5.346    4.075

Note: data weighted with ONS school weight (sch_bwt).
Table A 4.4: All sampled pupils (iv), 2003 adjusted

Variable     N      Missing   Mean*100
Male         5027   108       46.234
FSM          4753   382       11.259
ELevel5      4770   365       71.696
MLevel5      4769   366       72.191
SLevel5      4771   364       71.558
ELevel5fg    4755   380       72.76
MLevel5fg    4758   377       73.517
SLevel5fg    4758   377       72.853
ks4_5ac      5027   108       57.064

Variable     N      Missing   Mean     SD
ks3Efg       4755   380       5.548    1.086
ks3Mfg       4758   377       5.817    1.264
ks3Sfg       4758   377       5.568    1.017
ks3_aps      4719   416       34.319   6.389
ks4_pts      5027   108       44.044   19.252
ks4_ptc      5027   108       37.441   14.149
ks4_mean     5027   108       4.469    1.666
ks4_ac       5027   108       5.464    4.047

Note: data are weighted with ONS student weight (STU_WT).
Table A 4.5: All responding pupils (v), 2003 adjusted

Variable     N      Missing   Mean*100
Male         3676   57        46.187
FSM          3457   276       10.397
ELevel5      3478   255       74.482
MLevel5      3474   259       75.329
SLevel5      3477   256       74.413
ELevel5fg    3469   264       75.551
MLevel5fg    3469   264       76.663
SLevel5fg    3469   264       75.606
ks4_5ac      3676   57        61.266

Variable     N      Missing   Mean     SD
ks3Efg       3469   264       5.621    1.072
ks3Mfg       3469   264       5.908    1.245
ks3Sfg       3469   264       5.639    0.998
ks3_aps      3444   289       34.763   6.294
ks4_pts      3676   57        46.008   18.296
ks4_ptc      3676   57        38.946   13.19
ks4_mean     3676   57        4.629    1.583
ks4_ac       3676   57        5.829    3.971

Note: data are weighted with ONS student weight (STU_WT).
Table A 4.6: Summary

The following table summarises the results for the 2003 adjusted data (including statemented pupils,
excluding pupils with no GCSE entries) by giving mean values of all variables for each group of
pupils.

              (i)          (ii)         (iii)        (iv)         (v)
              All pupils   All pupils   All pupils   All sampled  Responding
              in English   in all       in           pupils in    pupils
              schools      sampled      responding   responding
                           PISA         PISA         schools
                           schools      schools
Variable      Mean*100     Mean*100     Mean*100     Mean*100     Mean*100
male          50.34        49.621       47.596       46.234       46.187
FSM           13.61        12.460       11.886       11.259       10.397
ELevel5       69.40        69.781       70.189       71.696       74.482
MLevel5       70.31        70.804       70.848       72.191       75.329
SLevel5       69.79        70.346       70.696       71.558       74.413
ELevel5fg     70.67        71.035       71.513       72.76        75.551
MLevel5fg     71.65        72.194       72.289       73.517       76.663
SLevel5fg     71.21        71.806       72.262       72.853       75.606
ks4_5ac       56.07        56.170       55.567       57.064       61.266

Variable      Mean         Mean         Mean         Mean         Mean
ks3Efg        5.50         5.515        5.505        5.548        5.621
ks3Mfg        5.77         5.794        5.782        5.817        5.908
ks3Sfg        5.53         5.556        5.548        5.568        5.639
ks3_aps       34.09        34.220       34.131       34.319       34.763
ks4_pts       43.29        43.299       43.028       44.044       46.008
ks4_ptc       36.67        36.841       36.547       37.441       38.946
ks4_mean      4.41         4.427        4.387        4.469        4.629
ks4_ac        5.41         5.399        5.346        5.464        5.829

Note: data in columns (ii) and (iii) are weighted with ONS school weight (sch_bwt) and data in columns (iv) and (v) with
ONS student weight (STU_WT).
Appendix 5: Bias analysis with adjusted data for 2003
Table A 5.1: Difference in means between responding and non-responding schools in the initial
sample: approach (a), 2003 adjusted

              Responding   Non-responding
              school       school
              Mean*100     Mean*100         Difference   p      Observations
Male          48.04        46.27            -1.77        0.78   185
FSM           11.42        19.80             8.38        0.04   175
ELevel5fg     72.40        72.70             0.30        0.95   182
MLevel5fg     72.90        72.08            -0.82        0.86   183
SLevel5fg     72.97        73.15             0.18        0.97   182
KS4_5ac       53.67        57.54             3.87        0.57   185

              Mean         Mean
KS3_aps       34.44        34.51             0.07        0.94   182
KS4_mean       4.56         4.56             0.00        0.99   185

Note: data are weighted with ONS school weight (sch_bwt). This table can be compared to Table 3.11 that shows results
for unadjusted 2003 data.
Table A 5.2: Differences in means for pupils in all responding schools and pupils in second
replacement non-responding schools: approach (c), 2003 adjusted

              Responding   Non-responding
              school       school
              Mean*100     Mean*100         Difference   p      Observations
Male          47.60        53.72             6.12        0.00   40,031
FSM           11.89        16.78             4.89        0.00   38,601
ELevel5fg     71.51        65.33            -6.18        0.00   38,417
MLevel5fg     72.29        69.50            -2.79        0.00   38,564
SLevel5fg     72.26        67.56            -4.70        0.00   38,452
KS4_5ac       55.57        56.60             1.03        0.24   40,031

              Mean         Mean
KS3_aps       34.13        33.77            -0.36        0.01   38,155
KS4_mean       4.39         4.50             0.11        0.00   40,031

Note: data are weighted with ONS school weight (sch_bwt). This table can be compared to Table 3.19 that shows results
for unadjusted 2003 data.
Table A 5.3: Differences in means between responding and non-responding pupils, 2003 adjusted

              Responding pupil      Non-responding pupil
              Obs.    Mean*100      Obs.    Mean*100      Difference   p      Observations
Male          3,676   46.19         1,351   46.36          0.17        0.94   5,027
FSM           3,457   10.40         1,296   13.56          3.16        0.00   4,753
ELevel5fg     3,469   75.55         1,286   65.24         -10.31       0.00   4,755
MLevel5fg     3,469   76.67         1,289   65.06         -11.61       0.00   4,758
SLevel5fg     3,469   75.61         1,289   65.45         -10.16       0.00   4,758
KS4_5ac       3,676   61.27         1,351   45.63         -15.64       0.00   5,027

              Obs.    Mean          Obs.    Mean
KS3_aps       3,444   34.76         1,275   33.12         -1.64        0.00   4,719
KS4_mean      3,676    4.63         1,351    4.03         -0.60        0.00   5,027

Note: data are weighted with ONS student weight (STU_WT). This table can be compared to Table 3.20 that shows results
for unadjusted 2003 data.
Appendix 6: Are peer group effects biased downwards in PISA data?
The two stage sample design of PISA – sampling of schools and then of pupils within schools –
means that the survey can be used to estimate ‘peer group’ effects on pupils’ achievement. In other
words, the survey can be used to estimate the impact on an individual’s learning of the composition of
the individual’s peer group at school, i.e. the pupils with whom the individual is at school.
Suppose we define an individual’s peer group as all other persons of the same age in that individual’s
school. PISA provides only a random sample of the peer group: a total of only 35 pupils are sampled
in each school, out of what in England is on average about 150 to 200 pupils aged 15 per school. As a
consequence, measures of peer group composition based on the PISA sample in each school are
subject to measurement error. Since the samples drawn within each school are simple random
samples, the form of this measurement error is close to the textbook case of normally distributed
measurement error in an explanatory variable. (The situation is not exactly the textbook case, since the
measurement error is highly correlated across individuals within the same sampled school, and any
response bias also needs to be considered.)
In a simple OLS regression model with one explanatory variable, the well-known result is that
measurement error in that variable biases its estimated coefficient towards zero (attenuation). In a
multiple regression model with measurement error restricted to just one explanatory variable, the
coefficient of that variable is again attenuated, and all other coefficients are also biased, in directions
that are in general unknown (e.g. Greene 1993: 280-4).
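The attenuation result can be illustrated with a minimal pure-Python simulation. All numbers here are invented for illustration and are not taken from the report: a true peer-group variable x drives the outcome, but the regression only observes a noisy version of it, and the estimated slope shrinks towards zero by the reliability ratio var(x)/(var(x)+var(u)).

```python
import random

random.seed(42)

# Errors-in-variables setup: y = a + b*x + e, but only x_obs = x + u is seen.
# With equal signal and noise variances the OLS slope roughly halves.
n = 50_000
beta = -200.0                 # true slope (illustrative, on the PISA score scale)
var_x, var_u = 0.01, 0.01     # signal and measurement-error variances

x = [random.gauss(0.11, var_x ** 0.5) for _ in range(n)]
y = [500.0 + beta * xi + random.gauss(0, 30) for xi in x]
x_obs = [xi + random.gauss(0, var_u ** 0.5) for xi in x]

def ols_slope(xs, ys):
    """Slope of a simple OLS regression of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var

print(round(ols_slope(x, y), 1))      # close to the true -200
print(round(ols_slope(x_obs, y), 1))  # attenuated towards zero, roughly -100
```

With a reliability ratio of 0.5 the slope is cut roughly in half, which is the same order of shrinkage as the peer-group coefficient shows between the two regression models reported later in this appendix.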
In general, therefore, researchers using PISA data to estimate peer group effects (e.g. OECD 2001:
chapter 7) will obtain estimates of those effects that are subject to measurement error bias since they
have information on only a random sample of each individual’s peer group. That measurement error
will also contaminate other parameter estimates. The size of these biases will be unknown.
In contrast, we have been able to merge PISA data in this report with national school population data.
We therefore have information on the complete peer group of 15 year olds in each school rather than
just the sample provided by PISA. Hence we are able to calculate summary measures of the peer
group based on this complete group. We can then compare regression results obtained with peer group
variables calculated with just the PISA sample of pupils in the same school with results obtained with
variables calculated with the complete peer group. A comparison of the results will show the extent of
the measurement error bias in peer group effects obtained from use of PISA sample data alone.
We focus on the socio-economic background of the peer group and measure this for each individual
with the proportion of all other 15 year old pupils who receive free school meals (FSM) in the
individual’s school. We calculate this proportion using (1) the population of all (other) 15 year olds in
each school, which is the true figure, and then (2) the (responding) PISA sample of (other) 15 year
olds in the school, which is an estimate. The analysis is restricted to pupils in state schools, since FSM
is not received by pupils in private schools.
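As a sketch of how the two measures can be computed, the following pure-Python snippet forms the leave-one-out proportion of peers on FSM within each school. The school identifiers and data layout are invented for illustration; they are not those of the report's files.

```python
from collections import defaultdict

# Toy data: (school_id, receives_fsm) for every 15 year old in each school.
# Identifiers and values are invented for the sketch.
population = [("A", 1), ("A", 0), ("A", 0), ("A", 1), ("B", 0), ("B", 0), ("B", 1)]

def peer_fsm_proportion(pupils):
    """Leave-one-out proportion of a pupil's school peers on FSM.

    Assumes every school has at least two pupils, so the divisor n - 1 > 0.
    """
    totals = defaultdict(lambda: [0, 0])      # school -> [n_pupils, n_fsm]
    for school, fsm in pupils:
        totals[school][0] += 1
        totals[school][1] += fsm
    out = []
    for school, fsm in pupils:
        n, n_fsm = totals[school]
        out.append((n_fsm - fsm) / (n - 1))   # exclude the pupil themself
    return out

# Measure (1): computed over the whole population of 15 year olds.
print(peer_fsm_proportion(population))
# Measure (2) applies the same function to the responding PISA sample only,
# giving a noisy estimate of (1) for each sampled pupil.
```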
Table A6.1 presents summary statistics for 3,318 responding pupils in the sample. (The sample size is
slightly smaller than the original PISA sample size for state sector pupils since not all individuals
could be matched to the England national data base – see Chapter 2.) The first variable is a simple
dummy variable for individual FSM receipt; its mean shows that 10 percent of pupils in the sample
receive FSM. The other two variables measure the proportion of 15 year old pupils in each individual's
peer group who receive FSM, one based on all 15 year old peers in the individual's school and the
other on the peers in the PISA sample for the school.
Table A6.1: Summary statistics of FSM variables used in the regression model

Variable                           Mean   SD     Min    Max    5th    25th   75th   95th
FSM receipt dummy                  0.10   0.30   0.00   1.00
Proportion of peer group with
FSM (all 15 yr olds in school)     0.11   0.09   0      0.41   0.02   0.05   0.16   0.32
Proportion of peer group with
FSM (PISA sample)                  0.10   0.10   0      0.47   0      0.04   0.15   0.29

Obs: 3,318 pupils

Note: the first variable shows the proportion of 3,318 PISA pupils with FSM; the second variable shows the actual
proportion of 15 year old peers with FSM; the third variable shows the proportion of peers in the PISA sample with FSM.
Table A6.2: Regression results: PISA maths score, 2003

                                            (1)         (2)
FSM in receipt                            -20.66      -27.58
                                           (5.48)      (5.95)
Female                                    -10.05      -10.73
                                           (3.84)      (4.12)
Mother has at least secondary education     5.05        9.18
                                           (3.98)      (4.57)
Mother has tertiary education              15.34       16.52
                                           (3.41)      (3.43)
Mother's education missing                -23.59      -21.72
                                           (6.30)      (6.74)
More than 100 books at home                43.71       48.15
                                           (3.29)      (3.68)
Proportion of peer group with FSM        -217.76
(all 15 year olds in the school)          (29.85)
Proportion of peer group with FSM                     -98.84
(15 year olds in the PISA sample)                     (23.66)
Constant                                  509.18      489.25
                                           (6.32)      (6.43)
Observations                                3318        3318
R-squared                                   0.20        0.17

Note: standard errors in parentheses. Clustering of pupils in schools is taken into account when
estimating standard errors. All variables other than the peer group FSM measures are dummy variables.
Figure A6.1: Histogram of differences between the proportion of peer pupils in receipt of Free
School Meals among all 15 year olds and the proportion in the PISA sample

[Histogram omitted: frequency (0 to 800) of the difference, which runs from about -.4 to .2.]

Note: 3,318 pupils in the state sector only are included.
Figure A6.1 graphs the difference between the proportion of peers in receipt of FSM among all 15
year olds and the proportion among the PISA sample. The histogram is overlaid with a normal
distribution with the same mean and variance as for the data. A positive difference indicates that the
proportion among all 15 year olds in the school is higher than in the sample of responding pupils. As
one would expect, there is a distribution of both positive and negative values. The mean is positive
and the distribution displays some non-normality. Further investigation showed that these features are
the result of non-response. If we take the difference between the measure in the population of peers and
that for all sampled pupils (rather than just the responding pupils), the histogram approximates the
normal much more closely and the mean of the distribution is very close to zero.
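The mechanism described above can be mimicked in a small pure-Python simulation; the response rates and school composition below are invented for the sketch. Under simple random sampling within a school the population-minus-sample difference averages about zero, but once FSM pupils respond less often, the responding sample understates FSM and the mean difference turns positive.

```python
import random

random.seed(1)

diffs_srs, diffs_resp = [], []
for _ in range(2000):
    # One school of 200 pupils, 15 percent on FSM (illustrative numbers)
    school = [1] * 30 + [0] * 170
    true_p = sum(school) / len(school)
    # PISA-style within-school simple random sample of 35 pupils
    sampled = random.sample(school, 35)
    diffs_srs.append(true_p - sum(sampled) / 35)
    # Now let FSM pupils respond less often (70%) than non-FSM pupils (90%)
    responded = [f for f in sampled if random.random() < (0.7 if f else 0.9)]
    diffs_resp.append(true_p - sum(responded) / len(responded))

print(round(sum(diffs_srs) / len(diffs_srs), 3))    # about 0 under random sampling
print(round(sum(diffs_resp) / len(diffs_resp), 3))  # positive once response differs by FSM
```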
We estimate a simple regression model of the sort typically used for estimating peer group effects.
Pupils’ PISA maths scores in 2003 are regressed on dummies for gender, mother’s educational
attainment, whether more than 100 books are reported in the home, and whether a pupil receives FSM.
In addition, we include the proportion of the peer group in receipt of FSM, defining the peer group as
other 15 year olds in the individual’s school. In Model 1 this is measured using all 15 year olds in the
school. In Model 2 the variable is based on the PISA sample for the individual’s school.
Individual receipt of FSM is estimated in Model 1 to reduce the PISA maths score by 20 points,
holding other factors constant. (Strictly speaking, it is just ‘associated with’ a reduction of this size,
since we are not asserting causality.) This is quite a big effect, roughly equal in absolute size to the
positive impact of the mother having had tertiary education rather than having failed to complete
secondary education (i.e. the sum of the coefficients on the two maternal education dummies).
The Model 1 results show that greater receipt of FSM in the peer group also has a negative impact on
learning achievement (and the estimated coefficient is well determined). The effect is a large one: a
rise of one standard deviation in the proportion of the peer group in receipt is associated with a
reduction in the maths score by about the same amount as individual receipt of FSM: 20 points.25
The results for Model 2 show that the estimated peer group effect halves when we measure social
background by FSM receipt among just the other 15 year olds in the PISA sample for each
individual’s school. The estimated coefficients for all other variables (except the education missing
dummy) rise, sometimes quite appreciably (e.g. for individual FSM receipt, mother’s secondary
education and books at home where the coefficients change by at least one standard error).
These results need further investigation. A range of models should be explored, using different
(including richer) specifications and other measures of peer group composition. But the prima facie
evidence is that measurement error bias in peer group effects is an issue to be taken seriously in
analysis of PISA data.
25. Note that we put aside issues of interpretation of this coefficient, i.e. whether it really does represent the impact of
peers or whether the peer group variable proxies other factors such as school selection or school resources.
Copies of this publication can be obtained from:
DfES Publications
P.O. Box 5050
Sherwood Park
Annesley
Nottingham
NG15 0DJ
Tel: 0845 60 222 60
Fax: 0845 60 333 60
Minicom: 0845 60 555 60
Online: www.dfespublications.gov.uk
© The University of Southampton 2006
Produced by the Department for Education and Skills
ISBN 1 84478 764 8
Ref No: RR771
www.dfes.gov.uk/research