RESEARCH

Response Bias in England in PISA 2000 and 2003

John Micklewright
Sylke V. Schnepf

Southampton Statistical Sciences Research Institute (S3RI)
University of Southampton

Research Report RR771

The views expressed in this report are the authors’ and do not necessarily reflect those of the Department for Education and Skills.

© The University of Southampton 2006
ISBN 1 84478 764 8

Contents

Executive summary
1  Introduction
2  Data
   2.1  Data sets used for the analysis
   2.2  Weights
   2.3  Summary
3  Bias analysis – 2003 data
   3.1  Analysis at five stages of sampling and response
   3.2  Response bias at school and pupil levels
   3.3  Using OECD weights and our own response weights
   3.4  Response bias in PISA scores
4  Bias analysis – 2000 data
   4.1  Analysis at five different stages of sampling and response
   4.2  Response bias at school and pupil levels
   4.3  Using the OECD weights and our own response weights
   4.4  Response bias in PISA scores
5  Changes over time
   5.1  Summary of results for different stages of sampling and response
   5.2  Response bias at school and pupil levels
   5.3  Associations between PISA scores and KS3 and KS4 scores
6  Conclusions and recommendations
References
Appendix 1: Summary statistics for five groups of pupils in 2003
Appendix 2: Summary statistics for five groups of pupils in 2000
Appendix 3: Summary statistics for private and public schools
Appendix 4: Summary statistics with adjusted data for 2003
Appendix 5: Bias analysis with adjusted data for 2003
Appendix 6: Are peer group effects biased downwards in PISA data?

Executive summary

The Programme for International Student Assessment (PISA) is an international survey of the learning achievement of 15 year olds. Each participating country (mainly OECD members) draws samples of schools and then samples of pupils from within these schools.
Response rates in England to PISA 2003 were well below the average for OECD countries: 64 percent for schools initially sampled and 77 percent for pupils, compared to an OECD average of 90 percent at both school and pupil levels. Results for the UK were not included in the international report prepared by OECD on the first results from PISA 2003 on account of concerns that the data for England suffered from response bias.

Our analysis investigates the pattern of response for England in 2003 in much more depth than was possible at the time of the decision taken by the OECD to exclude the results for the UK. We also investigate response in England to PISA in 2000, the first year of the survey, when response rates were at similar levels to those in 2003. In contrast to 2003, results for the UK were included in the international report for 2000 prepared by the OECD.

We consider the degree of bias in estimates of average scores, the dispersion of scores, and the percentage of the population with scores at or above a given threshold. The analysis exploits information on measures of learning achievement that are available for all schools and all pupils, both those that respond and those that do not: results from Key Stage 3 (KS3) tests and Key Stage 4 (KS4) exams (GCSEs or their equivalent). These and other data, such as Free School Meals receipt, are taken from the Pupil Level Annual Schools Census and the national pupil database. We show that KS3 and KS4 scores for respondents are strongly correlated with PISA scores. The average correlation coefficient is 0.76 in 2003.

For each year, 2003 and 2000, we first provide results for five groups:

i) the population of all 15 year old pupils in England schools
ii) all pupils in sampled schools
iii) all pupils in responding schools
iv) sampled pupils in responding schools
v) responding pupils

Comparison of these different groups requires that the data are appropriately weighted.
We focus on results using weights that only take into account the design of the survey. However, we provide some limited comparisons for responding pupils that use the weights developed by the OECD. These allow for the levels of response by schools and by pupils and also for the variation of response by schools with their average GCSE results. We also develop our own weights that go beyond those of the OECD by taking into account the variation of the response probability of individual pupils with their GCSE results.

The impact of pupil response is similar in the two years: in moving from group (iv) to group (v), mean achievement scores rise and the standard deviation falls. The combination of these effects results in under-representation of pupils with low achievement. The impact of school response is less clear and its assessment is complicated by the phenomenon of the replacement of schools that do not respond. In 2000 mean scores rise between group (ii) and group (iii) by a similar amount to between groups (iv) and (v), whereas in 2003 they fall slightly. In 2003, the percentage of pupils with 5 or more GCSEs at grades A*-C rises by 5.2 percentage points between groups (i) and (v), with the bulk of the increase coming as a result of pupil response. In 2000, the same figure rises by 7.1 percentage points, with the increases between groups (ii) and (iii) and between groups (iv) and (v) accounting for 2.2 and 2.7 percentage points respectively.

We test for statistically significant differences in average scores and the dispersion of scores between responding and non-responding schools and between responding and non-responding pupils. In both years, responding pupils have higher average scores and less dispersed scores than pupils who do not respond. The differences between these two groups of students are statistically significant, meaning that they are very unlikely to have arisen by chance. This is true both as regards the mean and the variance of scores.
By contrast, the differences in the means between responding and non-responding schools are typically insignificant in both years if we take the school as the unit of analysis. If we analyse the pupils within the schools, then significant differences in means are found in 2000 (with average scores higher in responding schools) but results for 2003 are mixed. On the other hand, the variance of scores appears higher in non-responding schools in 2003 but not in 2000, again taking the school as the unit of the analysis.

We then estimate the degree of response bias in summary statistics of PISA scores recorded for responding pupils. We focus mainly on trying to estimate bias that results from the pattern of response by pupils rather than the pattern of response by their schools, the latter having been established to be less important, at least in 2003. We use three methods: a z-score transformation of KS3 or KS4 scores (which takes no account of the actual relationship between these domestic achievement variables and PISA scores), a regression adjustment method (based on a regression of PISA scores on KS3 or KS4 scores of respondents), and the application of our own response weights. Our own weights are based on a statistical model that summarises the relationship between pupil response and domestic achievement variables. (These weights also take account of the differences in the response rates of schools by their average achievement levels.)

The three methods give reasonably similar results. Mean PISA scores in reading, maths and science are estimated to be biased upwards by about 6 points in both years. And the standard deviations of scores are estimated to be biased downwards by about 3 or 4 points, again in both years. These amounts are both about twice the size of the standard errors estimated by the OECD for the means and standard deviations.
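The logic of the regression-adjustment and response-weighting methods can be illustrated on invented data. Everything below (sample size, coefficients, the response model) is an assumption made for the sketch, not a number taken from the report: a domestic score stands in for KS4, it is correlated about 0.75 with the PISA score, and higher-achieving pupils are more likely to respond.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented population: a domestic score (think KS4) correlated ~0.75 with
# the PISA score, with higher-achieving pupils more likely to respond.
n = 100_000
ks4 = rng.normal(0.0, 1.0, n)
pisa = 500 + 100 * (0.75 * ks4 + np.sqrt(1 - 0.75**2) * rng.normal(0.0, 1.0, n))
p_true = 1 / (1 + np.exp(-(0.6 + 0.5 * ks4)))   # assumed response model
responded = rng.random(n) < p_true

naive = pisa[responded].mean()                  # respondents only

# Regression adjustment: fit PISA ~ KS4 on respondents, then predict for
# every sampled pupil, whose KS4 score is known whether or not they respond.
slope, intercept = np.polyfit(ks4[responded], pisa[responded], 1)
reg_adjusted = (intercept + slope * ks4).mean()

# Response weights: weight respondents by the reciprocal of a response
# probability -- here the true one, for brevity; the report estimates it
# with a statistical model of response on domestic achievement scores.
w = 1 / p_true[responded]
weight_adjusted = np.average(pisa[responded], weights=w)

print(f"population mean {pisa.mean():6.1f}")
print(f"respondent mean {naive:6.1f} (biased upwards)")
print(f"regression adj. {reg_adjusted:6.1f}")
print(f"weight adj.     {weight_adjusted:6.1f}")
```

Both adjusted estimates recover the full-sample mean on this toy example, while the unweighted respondent mean is pulled upwards, which is the direction of bias the report finds in the England data.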
In contrast to the OECD view stated in the international report for the 2003 PISA round – a view based on less extensive evidence – we believe the direction and magnitude of the biases to be reliably estimated.

The OECD reports unfortunately do not make explicit the criteria for inclusion for a country that has been asked to investigate the extent of response bias. However, the 2003 report appears to suggest that the estimated bias in key parameters should be less than the estimated standard error. On this criterion, the England data for 2003 fail the test. However, we estimate that the bias in mean scores would have shifted England’s position in a ranking of countries by only one place. The 2000 data also appear to fail a test based on a criterion of bias relative to standard error (although the impact on England’s position in a ranking by mean scores is again only slight). However, the response rate for pupils in England in 2000 did just clear the minimum required threshold. This illustrates the dangers of the OECD practice of setting arbitrary thresholds for satisfactory response rates. It also raises the question as to the likely response biases in other countries participating in PISA with pupil response rates that were little higher than those in England. The same question could be asked of the 2003 data for other countries.

We judge the data for England for both 2000 and 2003 to be valuable despite the biases. The return on the investment made in the collection of the data can be enhanced by the provision of suitable weights to adjust for the variation of pupil response with domestic achievement scores along the lines of those we have experimented with in this report.

Summary of recommendations:

1. Analysis of response in future waves of PISA should take place on data sets that are prepared and documented at an early stage.

2.
Future analyses of response bias should investigate the impact of (i) school replacement and (ii) the use of weights that take into account different response levels within school strata.

3. Consideration should be given to whether it is practical to stratify samples of pupils within schools by a domestic measure of achievement (KS3 or KS4).

4. DfES should consider ways of raising response among pupils and not just among schools. It is the response biases at the pupil level that emerge most clearly in our study.

5. Criteria for exclusion of any country from the international reports for PISA need to be made explicit by OECD and clearly justified. Evidence for a decision on exclusion needs to be published.

6. Response weights for responding pupils in England in both 2000 and 2003 that are based on a statistical model of pupil response of the type we present in this report should be provided for users of the data.

7. The OECD should be engaged in discussion of whether adjustment for response bias using post-stratification response weights could be used in the future to avoid excluding a country from the international report.

1 Introduction

The Programme for International Student Assessment (PISA) co-ordinated by the OECD is a cross-national survey of the learning achievement of 15 year olds. Together with other international surveys such as TIMSS and PIRLS, PISA makes it possible to compare children’s learning achievement across different countries. Results for the UK were not included in the international report on the first results from PISA 2003 (OECD 2004) on account of concerns that data for England suffered from response bias. (Scotland and Northern Ireland are surveyed separately in PISA but the data are combined with those for England to provide estimates for the UK in the international reports. There is no separate survey for Wales, which participated in the 2003 survey for England pro rata to its share of the population in the two countries.)
The present report investigates the pattern of response for England in 2003 in order to look more closely into possible biases than was possible when the decision to exclude results for the UK from OECD (2004) was taken. It also considers the possible response bias in England in 2000, the first round of PISA, when results for the UK were included in the international report on the survey (OECD 2001a).

1.1 The PISA survey and response in England

PISA has a two stage sampling design, with sampling of schools and then of pupils within schools. At the first stage, schools with 15 year olds are sampled with probability proportional to their size. In 2003 the sample for England comprised 570 schools: 190 ‘initial schools’, all of which were approached to take part in PISA, together with two potential replacements for each of these initial schools drawn from the same stratum. If an initial school declined to participate in the survey, its ‘first replacement’ school was approached. If this first replacement school did not respond, the second replacement school was asked to participate. The final school sample comprised all the responding schools.

At the second stage, a simple random sample of 35 pupils aged 15 was drawn from each responding school, after excluding certain small categories of pupils not eligible for inclusion in the survey. If a school had fewer than 35 pupils of this age, all of them were approached to take part in the survey. There was no replacement of non-responding pupils.

Table 1.1 shows the levels of response in England at both school and pupil levels in 2000 and 2003. In both years, response at both school and pupil levels fell well below the average for all OECD countries in the survey.
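The school replacement rule just described can be sketched as a small routine. This is a toy illustration with invented school names and a hypothetical `responds` predicate, not the survey's actual implementation:

```python
def final_school_sample(trios, responds):
    """Given (initial, first_replacement, second_replacement) trios drawn
    from the same stratum, approach each trio's schools in order and keep
    the first that agrees to take part; a trio may contribute no school.
    `responds` is a predicate: school -> bool."""
    final = []
    for trio in trios:
        for school in trio:
            if responds(school):
                final.append(school)
                break
    return final

# 190 initial schools each paired with two replacements gives the 570
# sampled schools mentioned above; here, a toy run with three trios.
trios = [("A1", "A2", "A3"), ("B1", "B2", "B3"), ("C1", "C2", "C3")]
sample = final_school_sample(trios, responds=lambda s: s in {"A2", "C1"})
# sample is ["A2", "C1"]: trio A's initial school declined and its first
# replacement agreed; trio B contributed nothing; trio C's initial agreed.
```

The sketch makes one point of the report concrete: the final school sample mixes initial and replacement schools, so school-level response has to be assessed both before and after replacement.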
Automatic inclusion of countries into the international report for PISA depends on the level of response achieved:

• A minimum response rate of 85 per cent of the schools initially selected; or, where the initial response rate of schools was between 65 and 85 per cent, an ‘acceptable’ school response rate after replacement, where this level varies depending on the initial response rate achieved. (The required rate rises by one percentage point for every two percentage points that the initial response rate falls below the 85 per cent threshold.) An initial school response rate smaller than 65 per cent is ‘not acceptable’.

• A minimum response rate of 80 per cent of students within participating schools.

If a country does not meet these requirements it is requested to give evidence that its sample of pupils is representative. This request was made of England in both 2000 and 2003.

In 2000, England did not meet the 85 percent response rate requirement at the school level but met (just) the 80 percent threshold for student response. After evidence was provided on the characteristics of responding schools, the UK was included in the 2000 international report. In 2003, England met neither the school nor the student response requirement. The evidence from the analysis of response bias that was provided by DfES to OECD in early 2004 (an analysis undertaken by us) was judged by the latter to indicate potential problems:

“The results of a bias analysis provided no evidence for any significant bias of school-level performance but did suggest that there was potential non-response bias at student levels. The PISA Consortium concluded that it was not possible to reliably assess the magnitude, or even the direction, of this non-response bias and to correct for this….The uncertainties surrounding the sample and its bias are such that scores for the United Kingdom cannot be reliably compared with those of other countries.
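The sliding scale for school response can be sketched as a small function. This is our reading, not the official OECD formula: we assume the after-replacement requirement rises by one percentage point for every two percentage points of initial shortfall below 85 per cent (so that it reaches 95 per cent at an initial rate of 65 per cent), with initial rates below 65 per cent ‘not acceptable’ regardless of replacement.

```python
def required_after_replacement(initial_rate):
    """Required school response rate (%) after replacement for automatic
    inclusion, as a function of the initial (before-replacement) rate.
    Returns None where no after-replacement rate is acceptable.
    A sketch of the rule as we read it, not the official OECD formula."""
    if initial_rate >= 85:
        return 85.0        # requirement already met without replacement
    if initial_rate < 65:
        return None        # 'not acceptable' regardless of replacement
    return 85.0 + (85.0 - initial_rate) / 2.0
```

On this reading, England's initial rates of 59 per cent in 2000 and 64 per cent in 2003 both fall below the 65 per cent floor, which is consistent with evidence on response bias being requested in each year.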
They can also not be compared with the performance scores for the United Kingdom from PISA 2000” (OECD 2004: 328).

The result was that the UK was excluded from the 2003 international report, although the microdata for England, as well as for Scotland and Northern Ireland, are made available on the OECD website.

Table 1.1: Response rates at school and student levels in 2000 and 2003 (%)

                                          PISA 2000               PISA 2003
                                      England  OECD avg       England  OECD avg
School response before replacement       59       86             64       90
School response after replacement        82       92             77       95
Student response                         81       90             77       90

Source: response rates are averaged across all OECD countries participating in PISA in 2000 and 2003. Response rates for OECD countries are given in OECD (2001a) and OECD (2004); response rates for England are given in Gill et al (2000) and DfES (2005).

1.2 Aims of the study

Our aims are to analyse the patterns of response to PISA in England in 2000 and 2003, both among schools and among pupils, and to estimate the extent of any resulting response biases. One key aim is to show that the magnitude and direction of biases in the 2003 data are possible to estimate reliably, contrary to the statement in OECD (2004) cited above.

The focus is on how response varies with measures of learning achievement that are available for all schools and all pupils, both those that respond and those that do not. England is unusual for having national tests and public exam records: scores from Key Stage 3 (KS3) tests and Key Stage 4 (KS4) exams (GCSEs or their equivalent). The KS3 and KS4 scores of respondents are strongly correlated with their PISA scores. We find correlation coefficients in the range 0.7 to 0.8. These measures therefore provide a very useful source of information with which to compare respondents and non-respondents and to construct weights that adjust for differential response by achievement levels.
The possibility of response bias in PISA data (or in data from any other such survey) is often discussed without specifying the population parameter for which there is concern that there might be a biased estimate. But ‘bias’ only has a meaning if we state what it is that might be biased. Implicitly it seems that many people are concerned with possible bias in the estimate of population mean achievement. Average achievement is clearly of interest and is the most common basis for ranking countries in surveys such as PISA, at least in popular debate. But we should also be interested in the degree of inequality in achievement within countries. We therefore consider both bias in estimates of the mean and bias in estimates of dispersion, the latter measured by the standard deviation of achievement scores. One might also be interested in bias in the estimate of the percentage of pupils at or above a given level of achievement, e.g. with 5 or more GCSEs (or equivalents) at grades A*-C, and we include some results on this basis.

We consider the impact of non-response on summary statistics of ‘domestic’ measures of achievement (KS3 and KS4). We also estimate the extent of response bias in summary statistics of PISA scores that can be calculated from the final sample of responding pupils.

Besides considering possible response biases, we were also asked to investigate how the characteristics of all sampled schools and all sampled pupils, whether or not they responded, differed from the populations from which they were drawn. These comparisons show the impact of sampling error on the estimates of population parameters, not of response bias. While these results are of interest, it should be noted that sampling error is a feature of any sample survey and that there are well established methods for taking its impact into account, notably through the calculation of confidence intervals.
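The three parameters just listed (mean, standard deviation, share at or above a threshold) can be estimated under any of the weighting schemes discussed in this report. A minimal helper, ours rather than PISA's (the published PISA statistics additionally involve plausible values and replication-based standard errors, omitted here), might look like:

```python
import numpy as np

def weighted_summary(scores, weights, threshold):
    """Weighted mean, standard deviation, and share of pupils scoring at
    or above `threshold` -- the three population parameters whose response
    bias is studied in this report. A sketch, not PISA's estimation code."""
    x = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    mean = np.average(x, weights=w)
    sd = np.sqrt(np.average((x - mean) ** 2, weights=w))
    share = np.average(x >= threshold, weights=w)
    return mean, sd, share

# Example with made-up scores and design weights: the third pupil carries
# twice the weight of the others.
m, s, p = weighted_summary([400, 500, 600], [1, 1, 2], threshold=500)
```

Changing the weights from design weights to response-adjusted weights in a call like this is exactly how the bias estimates later in the report shift the mean down and the standard deviation up.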
The analysis of response bias in England in 2003 that DfES reported to OECD and which led to the UK data being excluded from the international report was undertaken by us in early 2004. This work was done over a brief period of time and was not published. The analysis we have undertaken in 2005-6 that is presented here has been in much greater depth. One major advance over our earlier work is that we compare the domestic achievement scores of respondents with their PISA scores. Our work in 2004 took place before PISA scores for respondents were available.

Our terms of reference for the current report have also included analysis of response in 2000 and a comparison with 2003. In the case of the 2000 data we have been able to exploit information on domestic achievement scores of sampled pupils that was not available to the team undertaking the bias analysis at the time of the 2000 survey (which had access only to summary scores for schools). We are thus able to substantially extend that earlier analysis of the 2000 data.

1.3 Limitations of the study

The work we have carried out has several limitations. As a result, this report is neither the last word in the analysis of the pattern of response to PISA in 2000 or 2003 nor is it comprehensive in its analysis of actual or potential adjustments to the data for response bias.

First, we remained uncertain about aspects of the national databases with which we were provided, of the weights supplied with the PISA files, and of a number of other aspects of the data. (These issues are documented in Chapter 2.) These uncertainties mean that we have more confidence in our comparisons of sampled respondents with non-respondents (whether schools or pupils) than in the comparisons between either of these groups and the populations from which the samples were drawn.
Second, we have not considered how biases introduced by the pattern of response at the school level are affected by the strategy of replacement of non-responding schools by other schools. As noted by Sturgis et al (2006), this is an open issue, since the assumption that replacement schools are a perfect match for non-responding schools (because they come from the same achievement stratum) is untested.

Third, we have made only limited analysis of whether weights calculated by OECD help resolve problems of response bias at school or pupil level (a subject not included in our terms of reference). Schools were stratified by GCSE results in the school sampling frame. In calculating weights to apply to the data, OECD took into account the average school response rate within each stratum, i.e. within each of several bands of summary school GCSE scores. (Details are given in Chapter 2.) Suppose that low achievement schools (in GCSE terms) have response rates that on average are lower than those of high achievement schools. In the absence of any adjustment to the data, this would introduce an upward bias into an estimate of mean achievement based on the responding schools only. However, application of the OECD school weights should in principle compensate for this. The low achieving schools will receive a larger weight, thus bringing down the estimate of overall mean achievement.

At the pupil level, the weights calculated by OECD take into account the response rate of pupils in each school (details in Chapter 2). In other words, pupils in schools where a below average number of pupils respond receive a higher weight than pupils in other schools. However, in contrast to the school level, there is no stratification of sampling at the pupil level by domestic achievement scores within schools. Hence the adjustment for non-response within each school does not vary according to the achievement levels of the responding pupils.
Every pupil within the school receives the same weight.

Most of our results are based on use of simple design weights only, which account for the design of the survey (these weights are described in Chapter 2). However, we have included some limited comparisons of results using these design weights with results using the OECD weights (Chapters 3 and 4).

We also experiment with an approach to adjustment for non-response at the pupil level that goes beyond that of the OECD. We estimate logistic regression models of the probability that a sampled student responds to the survey as a function of individual and school characteristics, including domestic achievement scores (Key Stage 3 and Key Stage 4 scores). A new adjustment factor for the weights of respondents is then computed from the reciprocal of the predicted probabilities of response. We think that these weights perform well and their application provides an additional method of estimating the extent of response bias in PISA scores. Their use also allows response bias to be taken into account in other analyses of the data. However, the experimental nature of these weights represents another limitation of our study.

A final limitation is that we have not tried to comment in detail on the OECD’s criteria for inclusion or exclusion of countries from the published PISA reports. We do have something to say on the matter at the end of Chapter 3, where the reason for the decision to exclude the UK data from the 2003 report is deduced. (This seems to be based on a consideration of the components of ‘total survey error’.) We comment again briefly on the subject in Chapter 6, where we note that criteria for exclusion need to be spelt out clearly in future OECD reports.

1.4 Outline of the report

Our terms of reference asked us to compare five different groups of pupils.
i) the population of all 15 year old pupils in England schools
ii) all pupils in sampled schools
iii) all pupils in responding schools
iv) all sampled pupils in responding schools
v) responding pupils (the final sample of pupils in the OECD database)

Chapter 2 describes the data we use to do this, including the available weights. Chapters 3 and 4 provide results for KS3 and KS4 scores for the five groups listed above for PISA 2003 and PISA 2000 data respectively. Comparison of groups (ii) and (iii) and of groups (iv) and (v) provides implicitly a comparison of respondents and non-respondents. We also explicitly compare results for the samples of responding and non-responding schools and for the samples of responding and non-responding pupils. These two types of comparison are represented by the vertical and horizontal lines in Figure 1.1. Chapters 3 and 4 each conclude with our estimates of the extent of response bias in summary measures of PISA scores in the year concerned. We also comment on the decision taken to include or exclude the data for the UK in the international reports.

Chapter 5 compares results for the two years and investigates whether there was a change in the relationship between PISA scores and domestic English achievement scores between 2000 and 2003. Chapter 6 concludes and gives our recommendations. These relate to future analyses of response bias, the use of the existing data from 2000 and 2003, and the design of the survey in England in the future.

Figure 1.1: PISA survey design and groups of pupils analysed
[Flow diagram: all 15 year old pupils in England schools (i) → sampling variation → all pupils in sampled PISA schools (ii) → school response → all pupils in responding schools (iii), with a branch to all pupils in non-responding schools → sampling variation → sampled pupils (iv) → pupil response → responding pupils (v), with a branch to non-responding pupils.]

2 Data

We first describe the various data sets we use in the report and then discuss the weights that can be applied to the data.
2.1 Data sets used for the analysis

2.1.1 Different data sets received

Data on pupils in all English schools derive from two sources of national pupil data. The first is the Pupil Level Annual School Census (PLASC), a census of pupils in English schools conducted in January of each year since 2001. PLASC covers information such as pupils’ dates of birth, ethnicity, free school meal (FSM) eligibility, entry to GCSEs, statementing for special educational needs (SEN) etc, as well as identifying their school. However, the PLASC data do not contain any information on results of achievement tests and exams – KS3 and KS4. Information on test and exam results is contained in the national pupil database, the second source of national pupil data that we use. The main database used in our analysis, which we refer to as the ‘national database’, is a product of merging PLASC data with achievement data from the national pupil database.

Pupils eligible for PISA 2000 were all children born in calendar year 1984. Pupils eligible for PISA 2003 were all children born in calendar year 1987. However, in the case of 2003, the PISA testing window in England was extended by two months. In line with this, the national database with which we have worked for that year constitutes all pupils born in a 14 month window, January 1987 to February 1988. In this national database, 14 percent of pupils have a 1988 birthdate. However, just under 3 percent of pupils in the PISA sample were born in 1988.

The population level data and PISA data we have used were supplied to us on behalf of DfES by the Fischer Family Trust and the Office for National Statistics (ONS).

Table 2.1 Data used in the analysis

                                                   Level of analysis   2000   2003
All 15 year olds in England schools                Pupil                X      X
All sampled schools                                School               X      X
All sampled students                               Pupil                X      X
School average of pupils eligible for free
school meals                                       School               X

We have a number of data sets for the two years.
First, the national database for each year (supplied by the Fischer Family Trust) contains information on all 15 year old students (less exclusions discussed below). Since PLASC was first collected in 2001, data on FSM eligibility and on pupils with SEN are not in the 2000 file. Only pupils with at least one GCSE (or equivalent) entry (not necessarily a pass) are in the file. This means that we do not have a national pupil data set that is the same for the two years we want to analyse, 2000 and 2003. The national database includes unique student and school identifiers allowing the file to be merged with the other two data files (or three in 2000).

Second, the school file (supplied by ONS) contains information on all schools that were sampled for PISA. The file includes the same school identifier as in the national pupil database. It gives information on whether the school was an initial school in the PISA sampling procedure, a first replacement school, or a second replacement, whether the school was actually approached to participate, and, if approached, whether the school responded or not. The file also contains a variable indicating the level of within-school pupil response, since schools with pupil response rates of below 25% are treated as non-responding schools in PISA.

Third, the PISA student file (supplied by ONS) contains information on all sampled pupils in schools that responded to PISA. The file contains a unique student identifier, so that it can be merged with the national database in order to add information on pupils’ domestic achievement scores etc. The file also includes a variable indicating whether the student responded to PISA. Since we compare domestic achievement scores with the PISA scores of the responding pupils, we needed in addition to be able to merge the file of sampled PISA students supplied by ONS with the OECD data set for responding pupils available on the OECD website.
The 2000 data for all sampled students contained the appropriate unique PISA pupil identifier, so that it was possible to merge these data with the OECD data. In 2003, the file of sampled students provided by ONS did not contain this identifier. Instead, an additional file on responding pupils was provided that included their PISA scores and that contained the unique ONS pupil identifier.

As noted earlier, the 2000 national database lacked information on FSM, GCSE entries and statementing of pupils. However, we were supplied with an additional school file for 2000 giving the percentage of all pupils in the school eligible for free school meals (FSM). This meant that while we still lacked information for 2000 for individual pupils on FSM receipt, we at least had the information available at the school level. (Note that this information is based on all pupils in the schools concerned and not just the 15 year olds, while our school level analysis involving FSM for 2003 is based on the percentage of 15 year olds in the school in receipt of FSM.)

The process of obtaining, checking, cleaning and merging the data files took about three months, which was much longer than planned. In total we were sent over 20 different versions of one or other of the data files. Files were often inadequately documented, with insufficient information on the rows and/or the columns of data, i.e. information on which schools or pupils the file contained (the rows), and on the definitions of the variables in the data (the columns). Critical variables (e.g. on response) were sometimes missing, or a file turned out to be composed only of a certain type of student. We conclude that the problems of using administrative databases (PLASC and the national pupil database) as a research resource were underestimated by both ourselves and DfES.
Staff turnover at ONS meant that the people who had been responsible for the 2000 and 2003 PISA sampling had left the organisation, with obvious consequences for knowledge of the files relevant to our analysis.

2.1.2 Target and survey populations

The ideal national data set on 15 year olds in England schools for our analysis would include all those pupils and schools, and only those pupils and schools, in the PISA 'survey population', i.e. the target population less permitted exclusions. PISA allows the exclusion of certain schools and pupils, so that the survey population is smaller than the target population. Such an ideal is very difficult to achieve in practice.

First, there is the issue of whether the national database includes all those pupils and schools in the target population. In the case of the 2003 national database, this seems to be the case. However, the 2000 national database excluded the small minority of students who had not been entered for any GCSEs (or their equivalent).

Second, there is the issue of whether the national database includes only those in the survey population. The identification of pupils who met PISA exclusion criteria is not straightforward. Exclusions of pupils in PISA took place after sampling of pupils. That is, sampled pupils who met the criteria for exclusion were discarded. Our problem was to try to exclude pupils from the national database who, if sampled, would have been excluded from PISA.

Exclusion criteria

Countries were advised to keep the overall exclusion rate below 5% of their pupil population (OECD 2005a: 47). Pupils or entire schools could be excluded if the following criteria were met:1

• Exclusion of schools from the target population. Schools could be excluded if they were in particularly remote areas, if they had very few eligible students, or if it was not feasible to conduct the assessment for whatever reason.
In addition, an entire school could be excluded if it catered exclusively for students with special educational needs or for students who were non-native speakers of the language of assessment.

• Exclusion of pupils from the target population. Pupils within a sampled school could be excluded if they had limited proficiency in English, if they had special educational needs or learning difficulties, or if they were permanently functionally disabled such that they could not perform in the PISA test. However, students could not be excluded due to their poor academic performance alone, discipline problems or temporary physical disability.

1 We refer here to the criteria described in the 2003 report (OECD 2004: 320-322). Exclusion criteria described in the 2000 report (OECD 2001a: 232) seem to be similar but in some cases are formulated somewhat differently.

Table 2.2: Reported exclusions from PISA for the UK (2003) and England (2000)

                                        2003 (UK)                2000 (England)
                                      Obs      % of pop.      Obs       % of pop.
  Total population of 15 year olds  736,785      100         580,846      100
  School exclusions                  24,773      3.36         16,121      2.78
  Weighted pupil exclusions          15,062      2.04         15,309      2.64
  School and pupil exclusions        39,835      5.41         31,430      5.41
  Final survey population           696,950     94.59        549,416     94.59

Source: OECD (2001b: 136), OECD (2005a: 168). Figures for 2003 refer to the UK. Figures for 2000 refer to England only.
Note: The calculation of percentages in the two reports differs slightly. In the 2003 report the percentage of weighted pupil exclusions is estimated on the basis of the total population minus school exclusions. In 2000, percentages are estimated based on weighted numbers of included and excluded pupils. The population total refers to the number of 15 year olds enrolled in schools in grades 7 or above, which is referred to as the eligible population. The student exclusion took place after PISA sampling.
The weighted number of excluded students (15,062) gives the overall number of students in the nationally defined target population represented by the 270 students excluded from the sample. The school-level exclusions in the UK in 2003 accounted for 3.36 percent of the target population and the pupil exclusions for 2.04 percent.2 The overall exclusion rate of pupils was 5.4 percent in both 2003 and 2000.

Adaptation of the national database to the PISA survey population

Our aim was to mirror as far as possible these exclusions in our national database files. The school exclusion was done by omitting all schools from the analysis that were marked as special schools. In the case of the 2000 data, this was done by the Fischer Family Trust before we received the data.3 In the case of the 2003 data the exclusion was done by us.

Proxying the pupil level exclusions was harder to effect. The decision was made with DfES counterparts that we would exclude all statemented pupils. However, this could only be done (and was done by us with the data we received) for 2003, since the 2000 national database lacked the SEN status of pupils. Another possibility would have been also to exclude pupils who have no GCSE entries. Indeed, as noted, the 2000 national database already excluded such pupils. However, these pupils were retained for the 2003 analyses, except when we wished to put the 2003 data on the same basis as those for 2000 (see below).

2 Exclusion of pupils took place after pupils were sampled (hence at the second stage of sampling). Hence, the exclusion rate for the population derives from weighting these pupils in responding schools up in terms of how many pupils they represent in the population.
3 Other than 101 pupils in schools remaining in the database that were identified as special schools, which we excluded.
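The arithmetic behind the headline figures in Table 2.2 can be sketched as follows. This is an illustrative recalculation from the published numbers, not the OECD's own code; note that the reported pupil-level figure of 2.04 percent uses a different base (the population net of school exclusions, per the note to Table 2.2) and is therefore not recomputed here.

```python
# Recomputing exclusion percentages for the UK in 2003 (Table 2.2).
pop = 736_785             # total population of 15 year olds (UK, 2003)
school_excl = 24_773      # weighted school-level exclusions
pupil_excl = 15_062       # weighted pupil-level exclusions
sampled_excluded = 270    # excluded students actually drawn in the sample

def pct(part, whole):
    """Express an exclusion count as a percentage of a base population."""
    return 100.0 * part / whole

print(round(pct(school_excl, pop), 2))               # 3.36, as reported
print(round(pct(school_excl + pupil_excl, pop), 2))  # 5.41, as reported

# Each excluded sampled student represents, on average, this many pupils
# in the population (the 'weighting up' described in footnote 2):
print(round(pupil_excl / sampled_excluded, 1))       # about 55.8
```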
Table 2.3 shows the effect of our 2003 exclusions of schools and pupils, which led to the dropping of 2.41 percent (special schools criterion) and 2.31 percent (statemented criterion) of pupils respectively, giving an overall exclusion rate of 4.73 percent. This rate is almost as high as the actual rate of 5.40 percent in 2003 described above. The last row of the table shows how many pupils would have been excluded in 2003 if, in addition, pupils with no GCSE entries had been excluded.

Table 2.3: Exclusions of pupils from the national database4

                                           2003              2003 (Ch. 5           2000
                                                            trend analysis)
                                        Obs      %         Obs       %          Obs      %
  All 15 year olds in educational
    institutions                      732,550            732,550              556,645
  School exclusion (special
    schools)                           17,664   2.41      17,664    2.41          101   0.02
  Pupil exclusion (statemented)        16,951   2.31     (not excluded)       Not possible to
                                                                              identify and exclude
  Pupil exclusion (no GCSE entry)    (not excluded)       17,559    2.40      Not included in
                                                                              the data set
  School and pupil exclusion           34,615   4.73
  School and no-GCSE exclusion                            35,223    4.81
  Estimated final survey
    population                        697,935  95.27     697,327   95.19      556,544
  School exclusion, pupil
    exclusions (statemented,
    no GCSEs)                          50,206   6.85

To summarise, the 2003 and 2000 national databases have important differences. The 2000 data set on the one hand excludes pupils with no GCSE entries and on the other includes statemented pupils (who cannot be identified). These differences would make a direct comparison of 2003 with 2000 inconsistent. We therefore constructed another 2003 data set that was adjusted to the constraints of the 2000 data. For this data set, we excluded pupils with no GCSE entries from the 2003 data but included statemented pupils. (All pupils in special schools remained excluded.) This data set was used only when we directly compare 2000 and 2003 results in Chapter 5.

4 The data refer to the national data set as we received it, not to the cleaned national data set on which calculations are based.
However, only slight cleaning was undertaken, so results should change little. (The data merging important for cleaning would probably artificially increase the population size, due to observations that could not be matched, which would lead to an underestimation of exclusion rates.)

Finally, we have to recognise that 91 statemented pupils were included in the 2003 PISA sample. We retain these pupils in our analyses of 2003 data.5

2.1.3 Exclusion of sampled pupils from the analysis

Responding schools with a within-school pupil response rate of less than 25 percent were excluded from the PISA sample. Consequently, responding pupils in these schools were not included in our analyses of non-response at the pupil level. However, these pupils were included in our analyses up to that stage in the survey process (sampling and response of schools and sampling of students). In total, 105 pupils in 2003 were excluded from the pupil response analysis, which is about 2 percent of all sampled pupils. The 2000 pupil file supplied by ONS did not contain any pupils in schools with less than a 25 percent within-school response rate. Hence we do not know how many pupils were excluded in that year.

2.1.4 Merging and cleaning of data sets

The different data sets at the pupil and school level described in Section 2.1.1 were merged for the analysis. This merging also served to help check the data in so far as overlapping information was present in the different data sets. (For example, a pupil who sat the PISA test and is included in the OECD file should also be marked as a 'participating student' in the ONS file.) Over a process of about three months we obtained and checked files that included conflicting information and therefore needed replacement. At the end of the process, the 2000 data sets were free of any conflicting information. The 2003 data files, however, still contained some problems, so some moderate cleaning of the data was conducted.
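The kind of cross-file consistency check just described (a pupil in the OECD respondent file should be flagged as a participating student in the ONS file) can be sketched as below. The identifiers and flags are hypothetical stand-ins, not the actual ONS or OECD variable names.

```python
# Hypothetical miniature versions of the two files.
ons_responded = {  # student id -> response flag in the ONS sampled-pupil file
    101: True, 102: True, 103: False, 104: True,
}
oecd_ids = {101, 102, 104}  # student ids present in the OECD respondent file

# A pupil should appear in the OECD file exactly when the ONS file
# marks him or her as a respondent.
conflicts = [sid for sid, resp in ons_responded.items()
             if (sid in oecd_ids) != resp]

# OECD pupils with no counterpart in the ONS file at all.
unmatched = oecd_ids - ons_responded.keys()

print(conflicts)       # [] -> the overlapping information is coherent
print(len(unmatched))  # 0  -> every OECD pupil was found in the ONS file
```

Any pupil appearing in `conflicts` or `unmatched` is exactly the kind of discrepancy that triggered a request for a replacement file.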
2000

In 2000, 543 schools were sampled (including replacement schools), of which 155 schools both responded themselves and then had a pupil response rate of greater than or equal to 25 percent (and were therefore defined to be eligible schools). 5,406 pupils in these schools were sampled and 4,120 pupils responded.

In a first step, we merged the original OECD PISA data file (downloadable from the OECD website and containing pupils' achievement scores, OECD weights etc.) with the ONS data file on all sampled PISA pupils. We could successfully attribute PISA scores and weights from the OECD file for England to all 4,120 responding pupils in the ONS pupil file.

In a second step, we merged the resulting pupil file from the first step with the ONS school file giving information on all sampled PISA schools. All 155 responding schools in the pupil file could be merged successfully with the ONS school file. Overlapping information provided in the two files was coherent.

The resulting file from step two was then merged with the national data set (containing national achievement scores for all 15 year old pupils in England) provided by the Fischer Family Trust. As a result, 340 pupils sampled for PISA could not be merged because their student identifier was not included in the national database. In practical terms this means that it was impossible to obtain any information on domestic achievement scores (KS3 and KS4) for this group of PISA pupils.

5 There are two possible explanations why some statemented pupils who were not part of the PISA target population took part in the PISA assessment. Schools sent a list of all their 15 year old students to ONS, where the random sample of pupils was selected and sent back to the schools. ONS advised schools not to include those sampled pupils who were statemented. Some schools might not have followed this advice. Another explanation could be that pupils were statemented after they sat the PISA test.
As a consequence, these pupils had to be excluded from the subsequent analysis.

Another problem was found in relation to the school identifiers. In the national data set, the school identifiers refer to the school where the pupil took the KS4 public exams (GCSEs and GNVQs). This happened a few months after the PISA test was conducted. Some pupils might have changed schools before taking their KS4 exams. First, pupils could have moved to a PISA school after the PISA test was conducted. Those pupils cannot be identified, although their numbers should be low. Second, pupils sampled for PISA might have moved to another school for the KS4 exams. As far as we could identify students with conflicting information on school identifiers, we corrected their school identifier in the national file. 539 out of 543 sampled PISA schools could be identified and therefore successfully merged with the national pupil data set. The final school file for the school analysis therefore included 539 sampled PISA schools and 3,531 other schools. The proportion of all pupils with free school meal eligibility in schools (provided in an additional file from DfES in 2000) could be successfully attributed to all responding and non-responding PISA schools.

2003

In 2003, 570 PISA schools were sampled (including replacement schools). 159 schools (with 5,123 sampled pupils) responded and had a pupil response rate greater than or equal to 25 percent. In these schools, 3,766 pupils responded and were therefore included in the final OECD PISA pupil file.

In a first step, the ONS file containing sampled pupils was merged with the OECD file on PISA pupils for England (downloadable from the OECD website). 14 out of 3,766 responding pupils could not be merged on the basis of the student identifiers provided. For a small number of pupils, the data files provided conflicting information regarding pupils' response status and we changed some values as we saw fit.
In a second step, the pupil data from the first step were merged with the ONS data file on sampled PISA schools. One school identifier was wrongly coded in the file of sampled pupils. After cleaning, all 175 schools in the pupil file could be matched with the file for sampled PISA schools.

In a third step, we merged the file on sampled PISA pupils and schools with the national file by using the supplied student identifiers. As a result, 71 out of 3,766 responding pupils could not be matched with the national file and therefore had to be excluded from the analysis. The merged data set also revealed that 50 responding PISA pupils were either statemented or in special schools and should therefore in principle not have been part of the survey. However, given that these pupils had been included in the PISA sampling, we also included them in the analysis. The national file was missing information for 6 out of 570 schools given in the ONS school sample file. The final school file for the school analysis therefore included 564 sampled PISA schools and 3,637 other schools.

Final sample sizes and missing values for the entire data set are given in Table A1.1 of Appendix 1 for 2003 and Table A2.1 of Appendix 2 for 2000.

The purpose of the merging of data from PISA files and from national pupil files is to be able to compare characteristics of respondents and non-respondents, whether schools or pupils. Appendix 6 gives an example of a further use of the merged data, which is to investigate whether peer group effects estimated with PISA data are biased downwards.

2.1.5 Information on students

Table 2.4 lists key information we have on students from the national databases. There is a large array of measures of achievement provided by KS3 and KS4 (GCSE or equivalent) scores. In subsequent chapters we focus in particular on 'Ks3_aps', 'Ks4_pts', and 'Ks4_mean'.
These are respectively (i) the average points score across the three KS3 subjects – English, Maths and Science, (ii) the total points scored in GCSEs (or equivalent), and (iii) the average points per subject scored in those GCSEs (or equivalent) that were taken by the student. Other measures listed in the table include the popular summary measure 'Ks4_5ac', a 0/1 variable indicating whether the student has 5 or more GCSE passes at grades A* to C. (Students in general will have taken their GCSEs after the period when the PISA assessment was carried out – the database includes all GCSE results irrespective of when taken.)

Table 2.4: Information on students from the national databases

  Variable    Definition
  Male        = 1 if male (0 = female)
  FSM         = 1 if pupil is eligible for free school meals (0 otherwise)
  ELevel5     = 1 if pupil achieved English KS3 level 5 or above (simple level data)
  MLevel5     = 1 if pupil achieved Maths KS3 level 5 or above (simple level data)
  SLevel5     = 1 if pupil achieved Science KS3 level 5 or above (simple level data)
  ELevel5fg   = 1 if pupil achieved English KS3 level 5 or above (fine grade data)
  MLevel5fg   = 1 if pupil achieved Maths KS3 level 5 or above (fine grade data)
  SLevel5fg   = 1 if pupil achieved Science KS3 level 5 or above (fine grade data)
  Ks4_5ac     = 1 if pupil has 5 or more GCSE/GNVQ A*-C passes
  Ks3Efg      English Key Stage 3 level (fine grade data)
  Ks3Mfg      Maths Key Stage 3 level (fine grade data)
  Ks3Sfg      Science Key Stage 3 level (fine grade data)
  Ks3_aps     KS3 average point score over English, Maths and Science
  Ks4_pts     Total GCSE/GNVQ point score
  Ks4_ptc     Capped GCSE/GNVQ point score (best 8 subjects)
  Ks4_mean    Mean GCSE/GNVQ point score across all subjects where GCSE/GNVQ taken
  Ks4_ac      Number of GCSE/GNVQ A*-C passes

2.2 Weights

PISA samples schools with probability proportional to their size and then samples 35 pupils aged 15 with equal probability within each school.
All 15 year old pupils in the school are selected if a school has fewer than 35 pupils of this age. In principle, such a sample design is self-weighting. The higher probability of selection for schools with many pupils is balanced by the lower probability that any individual pupil within a larger school is among the 35 selected for that school. The lower probability of small schools being selected is balanced by the higher probability that any individual pupil within these schools is among those selected.

However, in practice the self-weighting is not perfect. School size may turn out to differ from that indicated in the school sampling frame when the sampling probabilities were determined. And there is the issue of the very small schools with fewer than 35 pupils. Hence in practice weights are needed at both school and pupil levels to take account of the survey's design. These are 'design weights'. In addition, weights may be constructed to allow for the levels of response by schools and by pupils, and also for the pattern of this response.

Our terms of reference asked us to compare five different groups of pupils at different stages of the sample design:

i) the population of all 15 year old pupils in England schools
ii) all pupils in sampled schools
iii) all pupils in the sample of responding schools
iv) all sampled pupils in responding schools
v) responding pupils (the final sample of pupils in the OECD database)

The issue of the appropriate weights to apply at each stage (ii)-(v) arises. The data sets we received from ONS contained various weights at the school and individual level. However, the files lacked sufficient documentation on what these weights are intended to be and how they were constructed. For example, we were unclear whether the school weights are intended to be pure design weights or whether they also take into account response levels within the various strata, most notably average levels of achievement within schools.
We therefore decided to investigate the weights supplied with the data and to compare them with suitable benchmarks. There are two sets of weights that we can use as benchmarks. First, the final OECD samples of schools and pupils in England in 2000 and 2003 (i.e. the data available from the OECD website) contain both school and pupil weights, the construction of which is described in the international reports for PISA published by the OECD. Second, given that we have population data available, we can construct what we call 'theoretical' design weights in the light of the PISA sample design.

2.2.1 Theoretical design weights in the PISA two stage design

2.2.1.1 School weights

Let N be the number of schools in the population and n the number of sampled schools. Let M_i be the total number of 15 year olds in school i and m_i the number of sampled pupils in the school (which should be 35, unless the school has fewer than this number of 15 year olds). The design weight for any school i should be the reciprocal of the school's selection probability, p_i. The latter is given by:

(1)   p_i = \frac{n M_i}{\sum_{i=1}^{N} M_i}

Hence, the school design weight, w_i, should be:

(2)   w_i = \frac{\sum_{i=1}^{N} M_i}{n M_i}

We call this weight w_i the 'theoretical school weight'. We do not apply it in any of the analyses that follow in subsequent chapters of this report but it serves as a benchmark for examining the weights supplied by ONS.

The other benchmark is the OECD school weights. These adjust for all other relevant factors besides sample design, notably the difference between actual school size and the size indicated in the frame and average levels of response within the strata.

Some of our analysis at stages (ii) and (iii) uses the individual pupils within schools as the unit of analysis. The question arises of which weight to apply. The correct weight is the product of the reciprocals of the school selection probability and the pupil selection probability.
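As a numerical illustration of equations (1) and (2), the theoretical school weight can be sketched as follows; the school sizes are invented for illustration, not taken from the sampling frame.

```python
# Sketch of equations (1)-(2): PPS selection probabilities and the
# 'theoretical school weight'. School sizes below are hypothetical.
def school_weight(M_i, sizes, n):
    """w_i = sum(M) / (n * M_i), the reciprocal of p_i = n * M_i / sum(M)."""
    return sum(sizes) / (n * M_i)

sizes = [200, 150, 35, 90, 25]  # 15 year olds per school (invented)
n = 2                           # number of schools to be sampled

weights = [school_weight(M, sizes, n) for M in sizes]
print([round(w, 2) for w in weights])  # [1.25, 1.67, 7.14, 2.78, 10.0]
# Larger schools were more likely to be drawn, so they receive smaller
# weights; the smallest school gets the largest weight.
```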
Since the analyses at stages (ii) and (iii) comprise all pupils in the selected schools, a pupil's probability of being included, given that their school is included, equals 1.0 in all cases. Hence, when conducting analyses at these stages, we can simply apply whatever school weight we use to each pupil.

2.2.1.2 Pupil weights

The probability of pupil j in school i being selected for the sample, p_ij, is the product of the probability of his/her school being selected, p_i, and the probability of the pupil being among the (in general) 35 pupils sampled in the pupil's school. The latter is equal (in general) to 35 divided by the total number of 15 year olds in the school:

(3)   p_{ij} = p_i \cdot \frac{m_i}{M_i} = \frac{n M_i}{\sum_{i=1}^{N} M_i} \cdot \frac{m_i}{M_i} = \frac{n m_i}{\sum_{i=1}^{N} M_i}

The pupil weight, w_ij, is then the reciprocal of p_ij:

(4)   w_{ij} = \frac{\sum_{i=1}^{N} M_i}{n m_i}

We call this weight the 'theoretical student weight'. Note that given the PISA sample design, the theoretical student weight, w_ij, is a constant for almost all pupils, since m_i equals 35 in almost all schools. This reflects the 'self-weighting' principle of the two stage design. However, in small schools with fewer than 35 pupils aged 15, w_ij will be bigger than in other schools, producing some variation in the values of the theoretical student weight. Given that small schools had a lower probability of being sampled, only a small number of pupils in the sample will have a weight different from that of the great majority of pupils in the sample, who are in schools with 35 or more 15 year olds. This means that applying the theoretical student weight will yield very similar results to not applying any weight at all. Again, as with the theoretical school weight, we do not use the theoretical student weight in any analyses in subsequent chapters but it does give us a comparator for the values of the pupil weights supplied by ONS.

2.2.2 ONS weights received with the data

Table 2.5 describes the ONS weights we received in the 2000 and 2003 databases.
Table 2.5: ONS school and pupil weights in 2000 and 2003

                       School weight                Pupil weight
  Name of variable     sch_bwt, sch_wt              STU_WT in 2003; stu_wt in 2000
  Availability         All schools sampled          All sampled pupils
                       for PISA

In both years two school weight variables, 'sch_wt' and 'sch_bwt', were included in the ONS files. These weights were present for all sampled schools, including all initial, first and second-replacement schools, irrespective of whether the school was actually approached. The ONS pupil weight ('STU_WT' in 2003 and 'stu_wt' in 2000) was present for all sampled pupils (whether responding or not responding) in responding schools. We were not provided with a description of these weights.

The school weight variable sch_wt was equal to 1.0 for almost all schools in both 2000 and 2003. This variable could not have been constructed as described in equation (2). The alternative was the other variable that was supplied with the data, sch_bwt. In both 2000 and 2003 we could reproduce its values for each school, giving us confidence that it is a suitable weighting variable.

2.2.2.1 School weight

Although we could reproduce the values of the ONS school weight that we received with the data, its calculation did not in fact follow exactly that of the theoretical weight described above in equation (2). Its calculation is shown in equation (5):

(5)   sch\_bwt_i = \frac{1}{M_i}

Values of M_i were provided in the database in a variable called 'mos', which gives the estimated number of 15 year old pupils in schools at the time of the construction of the sampling frame. Our calculation of the theoretical school weight is also based on this 'mos' variable. (5) differs from (2) only by a constant factor: the number of pupils in the population divided by the number of sampled schools. (We do not know why this constant factor was not included in the ONS calculation.)
As a consequence, the values of sch_bwt are very small and do not correspond to the inverse of a school's probability of being sampled. However, this difference by a constant factor only means that sch_bwt, based on (5), is perfectly correlated with the theoretical weight, based on (2). We therefore used sch_bwt in most of our school level analyses.

6 The school weight was provided in the school file including all sampled PISA schools. The pupil weight was provided in the student file containing all sampled PISA pupils.

2.2.2.2 Student weight

The ONS student weight is meant to be the product of the ONS school weight and the reciprocal of the within-school selection probability (the number of 15 year old pupils sampled divided by the number of 15 year old pupils attending the school):

(6)   sch\_bwt_i \cdot \frac{M_i}{m_i} = \frac{1}{m_i}

In general the same number of pupils was sampled in each school: 35. Hence, given (6), the student weight for each pupil in each school is in general the same. Using equation (6), we were able to reproduce all the values of the supplied ONS student weight for 2000, 'stu_wt'. However, we could not reproduce the values of the 2003 weight, 'STU_WT'. This leads to some uncertainty as to how the 2003 variable was calculated. For example, we do not know whether the student weight for 2003 takes the level of response within schools into account. Figure 2.1 and Table 2.6 shed light on the differences between a student weight calculated with (6) and the student weight variable actually provided for 2003, 'STU_WT'.
Figure 2.1: Scatter plots of the student weight 'STU_WT' provided by ONS for 2003 against a student weight estimated with equation (6). [Left panel: values up to the 95th percentile of the equation (6) estimates; right panel: values above the 95th percentile. Vertical axes: weight based on equation (6); horizontal axes: ONS student weight (STU_WT).]

The first scatter plot shows values up to the 95th percentile of a variable based on equation (6). This weight is a constant within this range of values, equal to 1/35. However, the student weight 'STU_WT' provided by ONS varies for these pupils. Hence it must take account of other things than those entering equation (6).

The second scatter plot is restricted to those pupils with values of the weight based on equation (6) that are above the 95th percentile. These are students in schools with fewer than 35 pupils aged 15.

Table 2.6: Summary statistics of 2003 student weight variables

  Variable                       Obs    Mean    Std. Dev.   Min     Max     5th     25th    75th    95th
  STU_WT                         5653   .0302   .0048       .0092   .0643   0.024   0.0285  0.0314  0.0355
  Values based on equation (6)   5653   .0290   .0029       .0286   .05     0.0286  0.0286  0.0286  0.0286

Faced with these findings, our first guess was that the values of STU_WT were perhaps the same as those of the OECD student weight, which takes non-response at both school and pupil levels into account (see below). However, these two weights, STU_WT and the OECD student weight, have a correlation of only 0.62.

In our student level analyses we have used the supplied 2000 weight, stu_wt, which we can replicate, and the supplied 2003 weight, STU_WT, which we cannot replicate. The fact that we could not replicate the 2003 weight causes us some disquiet.

2.2.3 OECD weights

Table 2.7 presents the school weights and pupil weights available in the OECD database (available for download from the OECD website).
Table 2.7: OECD school and pupil weights in 2000 and 2003

                       School weight                 Pupil weight
  Name of variable     scweight in 2003;             w_fstuwt
                       wnrschbw in 2000
  Availability         All schools responding        All responding students
                       in PISA

The calculation of the OECD weights differs from that of the theoretical design weights in equations (2) and (4). The details of the calculations are given in OECD (2005a: 107-117 and 2001b: 89-98). We focus here only on those additional factors included in the OECD weight that adjust for non-response.

School non-response adjustment

The OECD school weight adjusts for school non-response. We give only the gist of the calculation here (details are given in OECD 2005a: 111-113). Schools were grouped into LEA maintained, grant maintained and independent schools. (The situation described is that for 2003.) The LEA maintained schools were further stratified into six groups and grant maintained schools into three groups based on the proportion of students in the school gaining five or more GCSEs at grades A*-C. Independent schools were divided more simply into coeducational schools, boys only schools and girls only schools (DfES 2005). (These groups often corresponded to the strata that were used in sampling.)

For each of these groups, or 'weighting classes', the school non-response adjustment factor in the OECD weight is the number of pupils in (a) participating schools and (b) non-responding schools in the initial sample that were not replaced, divided by the number of pupils in the participating schools. Hence, if a school in the initial school sample could not be replaced by a first or second-replacement school, schools in the same weighting class as this school received a greater weight. The size of the adjustment depends on the number of pupils in the schools that could not be replaced. Given that these weighting classes were constructed on the basis of GCSE results, this school non-response adjustment is de facto an adjustment based on schools' average achievement levels.
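The school non-response adjustment factor just described can be sketched as follows for a single weighting class; the pupil counts are invented for illustration and do not come from the PISA files.

```python
# School non-response adjustment within one weighting class:
# factor = (pupils in participating schools
#           + pupils in unreplaced non-responding initial schools)
#          / pupils in participating schools.
pupils_in_participating = 4_200   # hypothetical count for the class
pupils_in_unreplaced = 600        # hypothetical count for the class

adjustment = (pupils_in_participating + pupils_in_unreplaced) / pupils_in_participating
print(round(adjustment, 4))  # 1.1429

# Every participating school in this weighting class has its design
# weight scaled up by this factor, so the class still represents the
# pupils in the schools that were lost without replacement.
```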
Student non-response adjustment

Achievement levels of students were not taken into account for the OECD student non-response adjustment. In general, the student response adjustment factor was the ratio of the number of students who were sampled to the number of students who responded. For example, in a school where 35 pupils were sampled but only 25 pupils responded, the OECD student weight would include a factor equal to 35/25. All pupils who responded in the school would receive the same weight. The final OECD student weights take into account the non-response adjustments at both the school level and the student level.

2.2.4 Comparison of ONS weights, OECD weights and theoretical design weights

How do the weights we have described compare with each other? The theoretical student weight is equal to 1.0 for all pupils in schools with 35 or more 15 year old pupils. The ONS student weight should be perfectly correlated with this theoretical student weight, and we have found that to be the case for 2000 but not 2003. The OECD weights take non-response at school and student level into account and we therefore expect them to show more variation than the other weights.

2.2.4.1 Weights in 2003

Table 2.8 shows values of the school weights for all schools sampled in 2003 (the original sample of 190 together with the first and second replacement schools) and Table 2.9 shows values for all responding schools. The correlations are shown in Table 2.10.[7]

Footnote 7: For the estimation of the theoretical student weight, the population size was set to a value of 697,935, which equals the final survey population minus exclusions (see Table 2.3). The number of sampled schools was set to 190.

Table 2.8: Summary statistics of weights for all sampled schools (initial, first replacement, second replacement), 2003

  Variable                       Obs   Mean    Std. Dev.  Min     Max      5th     25th    75th    95th
  Theoretical weight (wi)        560   25.83   17.45      6.09    104.95   12.49   16.15   28.04   61.24
  ONS school weight (sch_bwt)    560   .0070   .0048      .0017   .0286    0.0034  0.0044  0.0076  0.0167

Table 2.9: Summary statistics of weights for responding schools, 2003

  Variable                       Obs   Mean    Std. Dev.  Min     Max      5th     25th    75th    95th
  Theoretical weight (wi)        161   23.82   16.52      6.34    104.95   12.67   15.97   24.01   44.80
  ONS school weight (sch_bwt)    161   .0065   .0045      .0017   .0286    0.0034  0.0043  0.0065  0.0122
  OECD school weight (scweight)  161   23.72   17.13      5.61    125.66   12.34   15.10   25.30   43.73

Footnote 8: In the original OECD database, the number of schools is 159 and not 161. The higher school number derives from the merging process: some pupils changed schools after sitting the PISA test, which impacts upon the attribution of pupils to schools. However, results of using the OECD weight from the OECD database are very similar to those given here, with a mean of 23.83 and a standard deviation of 17.20.

Table 2.10: Correlation matrix, school weights, 2003

                                   ONS school weight  Theoretical school  OECD school weight
                                   (sch_bwt)          weight (wi)         (scweight)
  ONS school weight (sch_bwt)      1.000
  Theoretical school weight (wi)   1.000              1.000
  OECD school weight (scweight)    0.848              0.848               1.000

Note: the size of the sample used in the calculation of the correlation coefficients varies according to the number of valid values for the variables concerned.

The values of all three weights vary substantially due to differences in the sizes of the sampled schools. The values of the ONS weight are very small since it lacks the constant factor in equation (2), as discussed above. However, it is perfectly correlated with the theoretical school weight. The mean and standard deviation of the OECD school weight are similar to those of the theoretical school weight. The correlation between the two is relatively high, 0.85. The left hand scatter plot in Figure 2.2 shows the pattern of association behind this correlation, which is reduced by a few outliers.
One factor that leads to differences between the theoretical school weight and the OECD weight will be the adjustment for school non-response taken into account in the latter. The right hand scatter plot shows the association of the OECD school weight with school size. (The theoretical school weight and the ONS weight have an even closer inverse relationship with school size.)

Figure 2.2: ONS and OECD school weights and school size, 2003
[Two scatter plots: the ONS school weight (sch_bwt) against the OECD school weight (scweight), with r = 0.85, and the OECD school weight against school size (mos).]
Note: the unit of analysis is the school.

Summary statistics for the student weights in 2003 are given in Table 2.11 and correlations in Table 2.12.

Table 2.11: Summary statistics for student weights in the final student sample, 2003

  Variable                         Obs    Mean     Std. Dev.  Min     Max      5th     25th    75th    95th
  Theoretical pupil weight (wij)   3631   106.96   11.61      104.95  183.67   104.95  104.95  104.95  104.95
  ONS weight (STU_WT)              3617   .0302    .0052      .0092   .0643    0.0241  0.0285  0.0313  0.0347
  OECD weight (w_fstuwt)           3631   154.86   50.34      61.46   579.69   97.58   124.11  176.48  236.71

As discussed earlier, the theoretical student weight is a constant up to the 95th percentile. By contrast, both the ONS and OECD weights show considerable variation. As discussed above, we do not know why the ONS student weight varies in this way. However, for the OECD student weight we know that non-response at school and student level will help generate variation. Correlation coefficients between the three weights are relatively low: 0.58 between the ONS and theoretical student weights and 0.62 between the OECD and ONS student weights.
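The OECD student non-response adjustment described earlier is simple enough to sketch directly, using the report's 35/25 example; the school-level weight value below is hypothetical.

```python
# The within-school student adjustment, using the 35/25 example from the
# text; the school weight value here is hypothetical.
def student_nr_adjustment(n_sampled, n_responded):
    return n_sampled / n_responded

factor = student_nr_adjustment(35, 25)         # = 1.4
school_weight = 23.7                           # hypothetical school-level weight
final_student_weight = school_weight * factor  # same for every responder in the school
```

Because the factor depends only on counts of sampled and responding pupils, and not on who responded, it cannot correct for achievement-related patterns of pupil response within a school.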
Table 2.12: Correlation matrix, student weights, 2003

                                     ONS student weight  Theoretical student  OECD student weight
                                     (STU_WT)            weight (wij)         (w_fstuwt)
  ONS student weight (STU_WT)        1.000
  Theoretical student weight (wij)   0.578               1.000
  OECD student weight (w_fstuwt)     0.618               0.306                1.000

Note: the size of the sample used in the calculation of the correlation coefficients varies according to the number of valid values for the variables concerned.

The association between the ONS weight and the OECD weight is shown in Figure 2.3.

Figure 2.3: OECD student weight and ONS student weight, 2003
[Scatter plot of the ONS student weight (STU_WT) against the OECD student weight (w_fstuwt).]

2.2.4.2 Weights in 2000

Tables 2.13 and 2.14 present summary statistics for school weights in 2000 for all sampled schools and for all responding schools.

Table 2.13: Summary statistics of weights for all sampled schools (initial, first replacement, second replacement), 2000

  Variable                       Obs   Mean    Std. Dev.  Min     Max     5th     25th    75th    95th
  Theoretical weight (wi)        543   22.36   14.25      5.08    87.85   11.06   14.37   25.00   51.25
  ONS school weight (sch_bwt)    543   .0073   .0046      .0017   .0286   0.0036  0.0047  0.0081  0.0167

Table 2.14: Summary statistics of weights for responding schools, 2000

  Variable                        Obs   Mean    Std. Dev.  Min     Max      5th     25th    75th    95th
  Theoretical weight (wi)         153   20.66   11.55      5.08    87.85    10.60   13.85   25.00   38.44
  ONS school weight (sch_bwt)     153   .0067   .0037      .0017   .0286    0.0034  0.0045  0.0081  0.0125
  OECD school weight (wnrschbw)   154   27.35   42.70      0       523.17   8.84    15.83   29.08   51.15

Note: for 105 pupils in the national data set no pupil identifiers are available, so that they cannot be merged with the OECD data. Data for all pupils in one school are not available, leading to the smaller sample of schools (154 in the England data set). The merging of the databases is not perfect due to missing school identifiers.
The mean of the OECD school weight (wnrschbw) is 27.26 and the standard deviation 42.55 if the original OECD data file is used.

As for 2003, we see considerable variation in the values of all three weights.[9] Table 2.15 presents the correlation coefficients. The ONS school weight and the theoretical school weight are perfectly correlated. The OECD school weight has a correlation of 0.72 with both of the other weights, an association that is demonstrated graphically in Figure 2.4 and which is affected by one obvious outlier. Figure 2.4 also illustrates the inverse relationship of the OECD weights with school size.

Footnote 9: Oddly, five schools in the 2000 data have a value of 0 for the OECD weight, meaning that those schools are excluded from the analysis if school weights are applied. Pupils in these schools however have an OECD student weight greater than 0.

Table 2.15: Correlation matrix, school weights, 2000

                                 sch_bwt   wi       OECD school weight
  sch_bwt                        1.0000
  wi                             1.0000    1.0000
  OECD school weight             0.7201    0.7201   1.0000

Note: the size of the sample used in the calculation of the correlation coefficients varies according to the number of valid values for the variables concerned.

Figure 2.4: ONS and OECD school weights and school size, 2000
[Two scatter plots: the ONS school weight (sch_bwt) against the OECD school weight (wnrschbw), with r = 0.72, and the OECD school weight against school size (mos).]

Tables 2.16 and 2.17 give summary statistics of the student weights in 2000. The theoretical student weight is constant for most pupils. The ONS student weight is also almost a constant in 2000 and is perfectly correlated with the theoretical student weight. However, the OECD weight shows considerable variation. Figure 2.5 compares the ONS and OECD weights. The ONS student weight appears constant but for two data points, which represent two schools with 41 pupils in total.
The OECD student weight displays considerable variation.

Table 2.16: Summary statistics of student weights in final student sample, 2000

  Variable                   Obs    Mean     Std. Dev.  Min     Max      5th     25th    75th    95th
  ONS weight (stu_wt)        4015   .0287    .0012      .0286   .05      0.0286  0.0286  0.0286  0.0286
  Theoretical weight (wij)   4015   88.13    3.84       87.85   153.74   87.85   87.85   87.85   87.85
  OECD weight (w_fstuwt)     4120   135.98   48.61      33.14   588.57   54.58   115.91  155.56  206.89

Table 2.17: Correlation matrix, pupil weights, 2000

                              ONS weight  Theoretical weight  OECD weight
                              (stu_wt)    (wij)               (w_fstuwt)
  ONS weight (stu_wt)         1.0000
  Theoretical weight (wij)    1.0000      1.0000
  OECD weight (w_fstuwt)      0.5008      0.5008              1.0000

Figure 2.5: OECD student weight and ONS student weight, 2000
[Scatter plot of the ONS student weight (STU_WT) against the OECD student weight (w_fstuwt).]
Note: the unit of analysis is the pupil.

2.2.5 Summary of available weights and their properties

• The PISA sample design is approximately self-weighting as far as analysis at the student level is concerned. Weights that are the inverse of the product of the ex-ante school and pupil selection probabilities generally take the same value, with departures only for the small minority of pupils in schools attended by fewer than 35 15 year olds.

• In both 2000 and 2003 we could reproduce the ONS school weights which were supplied with the data. These weights are pure sampling weights that make no allowance for levels of school response.

• We could reproduce the ONS student weight in 2000. Bar a constant factor, the values of this weight are identical to the student design weight that we calculated for the survey in that year, which displays very little variation due to the approximate self-weighting nature of the survey design. However, we could not reproduce the ONS 2003 student weight, which displays more variation.

• The values of the OECD school and student weights differ considerably from the ONS weights.
One reason for this is that the OECD weights incorporate adjustment for non-response at the school level (an adjustment that varies with schools' GCSE results) and at the pupil level (where the adjustment does not vary with individual pupil achievement).

2.3 Summary

We use a variety of datasets that have come from different sources. The reader is reminded of the following features of these data:

• For both 2000 and 2003 we have national databases that attempt to mirror the PISA survey population – the target population less exclusions. The exclusions cannot be perfectly implemented but we presume a greater degree of success with the 2003 data. The 2000 dataset includes statemented pupils. It excludes all pupils with no GCSE (or equivalent) entries (who are in the PISA survey population).

• The 2003 national database is composed of pupils who had their 15th birthday during a 14 month period. The additional two months beyond the 12 months taken for the 2000 data was to allow for the extension of the PISA study window in England in 2003. While 14 percent of the national database had birthdays in the additional two months, this was the case for less than 3 percent of the PISA sample of responding pupils.

• The national databases for both 2000 and 2003 included information on both KS3 and KS4 scores. KS3 tests are not typically conducted in private schools, hence this information is missing for most private school pupils in both years (see Appendix 3).

We explored the school and student weights supplied with the files of PISA data at some length. These ‘ONS weights’ are available for all sampled schools and pupils, irrespective of response. The purpose and construction of these weights was unclear. Two school weight variables were provided with the data. We discarded one of these as being unsuitable. The other seemed to be a suitable design weight that we were able to reproduce.
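The self-weighting property noted above can be made concrete with a small sketch (our notation, not the report's), using the 2003 population of 697,935 and the 190 sampled schools reported earlier: for any school with at least 35 eligible pupils the product of the two selection probabilities is constant.

```python
# Why the design is approximately self-weighting: schools are drawn with
# probability proportional to size (MOS = measure of size), then 35 pupils
# are sampled within each school, so the product of the two probabilities
# is constant for any school with 35 or more eligible pupils.
import math

N = 697_935      # 2003 survey population used for the theoretical weights
n_schools = 190  # schools in the initial sample

def student_selection_prob(mos):
    p_school = n_schools * mos / N   # PPS school selection
    p_within = min(35, mos) / mos    # 35 pupils sampled (all, if fewer than 35)
    return p_school * p_within

# The product is the same for a school of 200 pupils and one of 800:
assert math.isclose(student_selection_prob(200), student_selection_prob(800))

design_weight = 1 / student_selection_prob(200)  # ~104.95, as in Table 2.11
```

The resulting design weight of about 104.95 matches the constant value of the theoretical pupil weight for 2003 reported in Table 2.11.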
We could also reproduce the ONS student weight for 2000, which also appears to be a pure design weight, but we could not reproduce the ONS student weight for 2003. Given the PISA sampling design, the ONS student weight in 2000 displays virtually no variation and hence the application of this weight in the student level analyses – stages (iv) and (v) in the list at the start of this chapter – has virtually no effect. The student weight provided for the 2003 database displays more variation. We apply this weight for the 2003 analyses but we also report unweighted results.

Our terms of reference did not include any comparison of the impact of the ONS weights and the OECD weights. The latter incorporate adjustment for non-response – an adjustment that in the case of schools varies with their GCSE exam results. Hence there is a good case for comparing the use of ONS and OECD weights when calculating summary statistics of achievement scores and assessing the extent of possible response bias in estimates of key population parameters. We include some limited comparisons of this type in the chapters that follow. We also compute student response adjustment factors that go beyond those that enter the OECD weight calculations, drawing on the results of logistic regression models of the probability of pupil response. These logistic regression models are described in Chapters 3 and 4 for the 2003 and 2000 data respectively. This gives us an alternative set of weights that includes a more detailed adjustment for pupil response.

3 Bias analysis – 2003 data

Our analysis for each year of data, 2000 and 2003, is in four parts. First, following our terms of reference, we summarise key domestic achievement variables (KS3 and KS4) at five stages of the PISA survey process, from the full population of all pupils in all schools through to the final sample of responding pupils in responding schools.
Second, we conduct a more detailed analysis of differences in domestic achievement scores between responding schools and non-responding schools, and between responding pupils and non-responding pupils. This includes tests for statistically significant differences between respondents and non-respondents.

Third, we investigate the impact of weights that adjust for response on results for the domestic achievement variables. In doing so we briefly examine the extent to which results using the OECD weights differ from results using the ONS weights, which do not adjust for non-response. This sheds light on whether the OECD weights help resolve any problem of response bias in the data. We also show results with a weighting variable that we construct ourselves, which goes further than the OECD weights in its adjustment for the pattern of response at the pupil level.

Finally, we estimate the extent of response bias in mean PISA scores in the final sample of responding pupils with three different methods, which include the application of our weights. In the light of these results, we then comment on the decision to exclude the UK from the international report on PISA 2003.

3.1 Analysis at five stages of sampling and response

Table 3.1 lists the five stages in the PISA survey process (see Chapter 1, Figure 1.1).

Table 3.1: Description of five stages of PISA

  Stage                                        Description
  i)   All 15 year olds in schools in England  All pupils in the population who were not
                                               excluded (see Chapter 2)
  ii)  All pupils in all sampled schools       All sampled schools, including initial schools,
                                               first and second replacement schools
  iii) All pupils in responding schools        All pupils in all responding schools.
                                               (Responding schools with fewer than 25% of
                                               pupils responding were excluded.)
  iv)  All sampled pupils                      All sampled pupils in responding schools covered
                                               at (iii). (Statemented pupils in the PISA sample
                                               are included.)
  v)   All responding pupils                   All responding pupils: pupils in the final PISA
                                               data set. (Statemented pupils in the PISA sample
                                               are included.)

The description of each stage summarises how we implemented empirically the concepts shown in the first column. We have commented on some of the issues in Chapter 2. But we have yet to mention a key choice concerning group (ii), ‘all pupils in all sampled schools’. PISA has a system of replacement of schools that do not respond, as described in Chapter 1. Hence there is the issue of whether ‘sampled schools’ should refer to just the schools in the initial sample (all of which were asked to participate in the survey), all schools that were sampled irrespective of whether they were approached or not (including replacement schools never approached because the initial school agreed to participate), or just the ‘last’ school in each case to be approached – an initial school if it participated, a first replacement school if it in turn participated after a refusal, or a second replacement school whether it participated or not.

We chose to define group (ii) as all schools that were sampled, irrespective of whether they were approached or not. In retrospect, this was probably a bad decision. It means that results for groups (ii) and (iii) will differ not only due to the impact of school response. They will also differ due to sampling variation and replacement, since group (iii) contains only schools that were actually approached (and responded) while group (ii) contains in addition schools never approached. As noted by Sturgis et al (2006), the issue of the impact of replacement on the results is not clear-cut and it would have been sensible to have tried to isolate it. We partly make up for this decision in the subsequent analysis of non-response, where we do compare directly the responding and the non-responding schools (with various treatments of replacement).
But as far as our comparison of groups (i) to (v) is concerned, it is important to realise that the difference in results between groups (ii) and (iii) does not give a clean estimate of the effect of non-response by schools, although this is probably the major influence on any differences. Table 3.2 lists the variables that we analyse.

Table 3.2: Variables relating to individual students used in the analysis

  Male       = 1 if male (0 = female)
  FSM        = 1 if pupil is eligible for free school meals (0 otherwise)
  ELevel5    = 1 if pupil achieved English KS3 level 5 or above (simple level data)
  MLevel5    = 1 if pupil achieved Maths KS3 level 5 or above (simple level data)
  SLevel5    = 1 if pupil achieved Science KS3 level 5 or above (simple level data)
  ELevel5fg  = 1 if pupil achieved English KS3 level 5 or above (fine grade data)
  MLevel5fg  = 1 if pupil achieved Maths KS3 level 5 or above (fine grade data)
  SLevel5fg  = 1 if pupil achieved Science KS3 level 5 or above (fine grade data)
  Ks4_5ac    = 1 if pupil has 5 or more GCSE/GNVQ A*-C passes
  Ks3Efg     English Key Stage 3 level (fine grade data)
  Ks3Mfg     Maths Key Stage 3 level (fine grade data)
  Ks3Sfg     Science Key Stage 3 level (fine grade data)
  Ks3_aps    KS3 average point score over English, Maths and Science
  Ks4_pts    Total GCSE/GNVQ point score
  Ks4_ptc    Capped GCSE/GNVQ point score (best 8 subjects)
  Ks4_mean   Mean GCSE/GNVQ point score across all subjects where GCSE/GNVQ taken
  Ks4_ac     Number of GCSE/GNVQ A*-C passes

We start with a summary of mean values of each variable for the five different groups in Table 3.3. (More details are given in Appendix 1, including the number of missing values and the standard deviations.) We remind the reader that KS3 variables are typically missing for private school pupils (see Appendix 3).
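To illustrate how the binary indicators in Table 3.2 relate to the underlying scores, a small hypothetical sketch (the function and field names below are ours, not those of the national databases):

```python
# Hypothetical illustration of how two Table 3.2 indicators derive from the
# underlying records (field names are ours, not the national database's).
def make_indicators(ks3_maths_fine_grade, n_gcse_ac_passes):
    return {
        "MLevel5fg": 1 if ks3_maths_fine_grade >= 5 else 0,  # KS3 Maths level 5+
        "ks4_5ac": 1 if n_gcse_ac_passes >= 5 else 0,        # 5+ GCSE/GNVQ A*-C
    }

pupil = make_indicators(ks3_maths_fine_grade=5.8, n_gcse_ac_passes=7)
# pupil -> {"MLevel5fg": 1, "ks4_5ac": 1}
```

The simple-level and fine-grade KS3 indicators differ only in the granularity of the underlying level data, not in the threshold applied.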
Figures 3.1 to 3.3 show the changes in both the means and the standard deviations between the five groups for three continuous achievement variables: one for KS3 (which averages across the three subjects for each student) and two for GCSE (a ‘quality’ variable, ks4_mean, and a ‘quantity’ variable, ks4_pts). A dashed line is shown between values for groups (iii) and (iv) as a reminder of a small change in the exclusion criteria for statemented pupils. We included 91 statemented and sampled pupils in all groups (i) to (v). In practice this means that we included all statemented pupils present in groups (iv) and (v) but only a very small fraction of all statemented pupils in groups (i) to (iii).

At stage (i) we analyse values across all pupils in the population and hence no weights are required. At stages (ii) and (iii) we apply the school weight provided by ONS (‘sch_bwt’). At stages (iv) and (v) we use the ONS student weight (‘STU_WT’). As noted in Chapter 2, we are unable to reproduce the ONS student weight for 2003.

Looking first at the bottom half of Table 3.3, the mean values of the continuous achievement variables in the final sample of responding students (v) are clearly all higher than the mean values for the population (i). For example, the mean of ks3_aps is 1.7% higher and that of ks4_pts is 6.8% higher. When comparing these figures one needs to bear in mind that the degree of dispersion in the two measures differs. Using the standard deviations of the population, group (i), as the units, the ks3_aps mean rises by 0.09 units between groups (i) and (v) and the ks4_pts mean by 0.14 units, i.e. by a fairly similar amount. The main change comes between columns (iv) and (v), i.e.
at the last stage of the sampling process: student response.[10]

Looking at the top half of the table, as far as FSM eligibility is concerned, the sample of pupils appears to get less and less representative of the population of all pupils in English schools the further we proceed through the stages of the PISA survey process. There is a fall of three percentage points in the FSM rate between stages (i) and (v). Note also the quite large differences in the percentage of students above given levels of achievement: level 5+ in KS3 subjects or with 5 or more GCSEs at A*-C (or equivalent) – typically a 4 to 5 percentage point difference between cols (i) and (v); 61 percent of the PISA sample of responding pupils have 5 or more GCSEs at A*-C compared to 56 percent of the population. Again, the big increase tends to come between cols (iv) and (v), but there are often smaller increases at earlier stages.

Figures 3.1 to 3.3 show clearly that with all three of the achievement variables concerned, there is:

• a clear rise in the mean between groups (iv) and (v) as a result of pupil response;

• a slight decline in the mean between (ii) and (iii), which could be due to school response or school replacement;

• less dispersion of scores among responding pupils (v) than in the population (i). The standard deviation falls to greater or smaller degrees as a result of school response, the sampling of pupils within schools, and the response by pupils.

Footnote 10: We noted in Chapter 2 that our definition of group (i), the population, included all persons born in the 14 months January 1987 to February 1988 rather than the 12 months of the 1987 calendar year. If we restrict attention just to those persons born in 1987, the achievement score means for group (i) in fact change very little. Most are very slightly lower, e.g. 55.70% of pupils have 5 or more GCSEs at A*-C, compared to the figure of 55.79% in Table 3.3, and the mean ks3_aps score is 34.128 compared to the figure in the table of 34.162.
Overall, and especially as a result of pupil response, the final sample of responding pupils has higher mean achievement and less dispersion of achievement than is found in the population.

Table 3.3: Mean values of achievement variables at five stages, 2003

               (i)         (ii)        (iii)       (iv)        (v)
               All pupils  All pupils  All pupils  Sampled     Responding
               in English  in all      in          pupils in   pupils
               schools     sampled     responding  responding
                           schools     schools     schools
  Mean ×100
  Male         50.017      49.332      47.395      46.28       46.267
  FSM          13.778      12.563      11.861      11.357      10.399
  ELevel5      69.674      70.073      70.538      71.190      74.406
  MLevel5      70.537      71.075      71.135      71.681      75.280
  SLevel5      69.825      70.422      70.805      70.991      74.276
  ELevel5fg    71.079      71.444      71.958      72.308      75.495
  MLevel5fg    72.026      72.616      72.715      73.007      76.632
  SLevel5fg    71.400      72.026      72.487      72.340      75.487
  ks4_5ac      55.788      55.912      55.309      56.354      60.948
  Mean
  ks3Efg       5.512       5.531       5.521       5.534       5.619
  ks3Mfg       5.785       5.810       5.799       5.802       5.908
  ks3Sfg       5.541       5.564       5.556       5.555       5.636
  ks3_aps      34.162      34.301      34.214      34.232      34.757
  ks4_pts      42.854      42.879      42.627      43.496      45.769
  ks4_ptc      36.264      36.451      36.172      36.975      38.744
  ks4_mean     4.350       4.373       4.334       4.414       4.605
  ks4_ac       5.377       5.370       5.317       5.396       5.799

Note: data behind the results in columns (ii) and (iii) are weighted by the ONS school weight (sch_bwt) and for columns (iv) and (v) by the ONS student weight (STU_WT).
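As an arithmetic check, the relative rises quoted above between groups (i) and (v) can be reproduced directly from the Table 3.3 means:

```python
# Reproducing the relative rises quoted in the text from the Table 3.3 means
# for groups (i) (all pupils) and (v) (responding pupils).
means = {                     # variable: (group (i) mean, group (v) mean)
    "ks3_aps": (34.162, 34.757),
    "ks4_pts": (42.854, 45.769),
}

def pct_rise(pop_mean, resp_mean):
    return 100 * (resp_mean - pop_mean) / pop_mean

rises = {var: round(pct_rise(p, r), 1) for var, (p, r) in means.items()}
# rises -> {"ks3_aps": 1.7, "ks4_pts": 6.8}
```

As noted in the text, comparing these percentage rises directly is misleading without also scaling by the population standard deviations, since the two measures have very different dispersion.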
Figure 3.1: Mean and standard deviation of average KS3 score (ks3_aps), 2003
[Line chart of the mean (left axis) and standard deviation (right axis) of ks3_aps across the five groups of pupils.]

Figure 3.2: Mean and standard deviation of GCSE point score per subject (ks4_mean), 2003
[Line chart of the mean (left axis) and standard deviation (right axis) of ks4_mean across the five groups of pupils.]

Figure 3.3: Mean and standard deviation of total GCSE point score (ks4_pts), 2003
[Line chart of the mean (left axis) and standard deviation (right axis) of ks4_pts across the five groups of pupils.]

What implications does this have for the percentages of pupils in each group that are above (or below) given thresholds of achievement, e.g. level 5 in KS3 or 5 or more GCSEs at A*-C? These percentages are affected by the changes in both the mean and the standard deviation (SD). Assume for the sake of argument that changes in the SD come about through both tails of the distribution shrinking in towards the mean. This will tend to reduce the proportion of pupils found below any given score level that is below the mean. Hence it will tend to increase the proportion of pupils above KS3 level 5, since this is a level of achievement that comes below the mean level. If the mean score is also increasing – as it clearly does between (iv) and (v) – this will tend to further push up the proportion of students above KS3 level 5. By contrast, at score levels above the mean, the effects of the higher mean and lower SD will tend to cancel each other out.
The distribution shifts to the right (higher mean), but if the top tail is pulled in (lower SD) then it is possible that the percentage of students above a given level could even be unchanged. Tables 3.4 and 3.5 shed more light on this. We take four threshold levels of score from across the distribution, chosen to divide the population into five slices: at or below the bottom decile, between the bottom decile and the lower quartile, between the lower and upper quartile, between the upper quartile and the top decile, and above the top decile. In the case of the variable ks4_mean, the population divides almost exactly in this way. But for the variable ks3_aps the population does not divide quite so neatly, due to lumpiness in the distribution.

The results are clearest for the GCSE measure, ks4_mean. In group (v), almost exactly the same percentage of pupils is found in the top group as in the population: 9.9% compared to 10.1%. The effects of higher mean and lower SD have cancelled out. But only 4.9% of group (v) are in the bottom tail of lowest achievement, compared to 10.0% in the population. Restricting attention to the change between stages (iv) and (v) as a result of pupil response, the percentage in the bottom range of GCSE scores falls by one third. The changes are smaller for the KS3 variable. For example, the percentage in the bottom range falls by about one seventh between stages (iv) and (v).
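The interplay of a higher mean and lower SD can be illustrated with a stylised normal approximation. The means below echo the ks4_mean figures in Table 3.3, but the SDs are invented for illustration and the real score distributions are not normal.

```python
# Stylised illustration of the mean/SD argument using normal distributions
# (the means echo the ks4_mean figures in Table 3.3; the SDs are invented
# for illustration -- the real score distributions are not normal).
from statistics import NormalDist

population = NormalDist(mu=4.35, sigma=1.85)   # group (i), illustrative SD
respondents = NormalDist(mu=4.60, sigma=1.55)  # group (v), illustrative SD

low, high = 1.769, 6.636  # bottom- and top-decile cut-offs used in Table 3.5

share_below_low = (population.cdf(low), respondents.cdf(low))
share_above_high = (1 - population.cdf(high), 1 - respondents.cdf(high))
# The share below the low threshold falls sharply, while the share above
# the high threshold barely moves: the mean and SD effects offset there.
```

This reproduces the qualitative pattern in Tables 3.4 and 3.5: the bottom tail shrinks markedly among respondents while the top tail is roughly unchanged.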
Table 3.4: Percent of pupils in KS3 (ks3_aps) percentile range of the population, 2003

  Percentile  Point score  (i)     (ii)    (iii)   (iv)    (v)
  range       2003
  <=10        <=25         11.65   11.33   11.09   11.40    9.59
  >10-25      >25-29       15.01   14.86   14.91   14.74   12.93
  >25-75      >29-39       53.36   53.02   54.29   54.25   56.33
  >75-90      >39-43       13.40   13.64   13.56   13.61   14.58
  >90         >43           6.59    7.15    6.15    6.00    6.56

Note: columns (i) to (v) are the five groups of pupils defined in Table 3.3. Data are weighted with the ONS school weight (sch_bwt) for (ii) and (iii), and with the ONS student weight (STU_WT) in (iv) and (v).

Table 3.5: Percent of pupils in GCSE points per subject (ks4_mean) percentile range of the population, 2003

  Percentile  Point score     (i)     (ii)    (iii)   (iv)    (v)
  range       2003
  <=10        <=1.769          9.99    9.74    9.51    7.57    4.92
  >10-25      >1.769-3.158    15.12   15.02   15.60   16.80   14.93
  >25-75      >3.158-5.684    49.81   49.51   50.67   51.11   53.56
  >75-90      >5.684-6.636    15.04   15.25   14.93   15.24   16.67
  >90         >6.636          10.05   10.48    9.30    9.27    9.92

Note: percentages are weighted with the ONS school weight (sch_bwt) for (ii) and (iii), and with the ONS pupil weight in (iv) and (v).

Another way of summarising what is happening at the final stage of response by pupils is to compare the figures for group (v) in these two tables with the figures for non-responding pupils. The latter are shown in Tables 3.6 and 3.7. For example, while 9.6% of respondents are in the lowest range on the basis of the KS3 variable, this is the case for 16.1% of non-respondents. The figures are 4.9% and 14.6% respectively for the GCSE variable.
Table 3.6: Percent of non-responding pupils in KS3 (ks3_aps) population percentile range, 2003

  Percentile range   %
  <=10               16.14
  >10-25             19.48
  >25-75             48.79
  >75-90             11.08
  >90                 4.51

Table 3.7: Percent of non-responding pupils in GCSE (ks4_mean) population percentile range, 2003

  Percentile range   %
  <=10               14.59
  >10-25             21.76
  >25-75             44.63
  >75-90             11.46
  >90                 7.56

Figure 3.4 shows the shape of the distribution of ks4_mean in each group (i) to (v). The distribution for responding students (v) is clearly seen to be different from that for the population (i). The combination of higher mean and lower SD can be seen to have contrasting effects at the two ends of the distribution: producing under-representation among respondents of pupils at or below low levels of ks4_mean (e.g. a value of 2), but with the distributions coinciding at higher levels (from about 6.5 upwards).

In the next section we restrict attention to the impact of response, whether by schools or by pupils. We do not investigate any further the impact of sampling. The issue of sampling is not without interest. We have seen that means and standard deviations do change between stages (i) and (ii) due to the sampling of schools, and between stages (iii) and (iv) due to the sampling of pupils. Of course, change is entirely to be expected – sampling error results in values of sample statistics that depart from the population values. However, the extent of the changes in means and standard deviations could be investigated further. For example, do population values lie within the 95 or 99 percent confidence intervals that can be estimated from the sample information? If not, then questions could be asked as to whether the sampling was done in an appropriate manner and whether, for example, exclusions of pupils from the PISA survey population were being appropriately carried out.
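The confidence-interval check suggested above is straightforward to sketch. The sample mean and standard error below are invented for illustration; 34.162 is the population mean of ks3_aps from Table 3.3.

```python
# Sketch of the suggested check: does the known population value lie inside
# the 95% interval implied by a sample mean and its standard error?
# (The sample mean and standard error here are invented; 34.162 is the
# population mean of ks3_aps from Table 3.3.)
def ci_contains(pop_value, sample_mean, std_error, z=1.96):
    return sample_mean - z * std_error <= pop_value <= sample_mean + z * std_error

inside = ci_contains(34.162, 34.30, 0.35)  # hypothetical sample estimate
```

In practice the standard error would itself need to reflect the clustered sample design rather than assume simple random sampling.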
Figure 3.4: Kernel density estimates of the ks4_mean distribution for all 5 groups of children in 2003

[Kernel density plot of KS4 mean score (0 to 9) for: all 15 year olds in the population; pupils in all sampled PISA schools; pupils in responding schools; sampled pupils in responding schools; and responding pupils.]

Note: pupil groups (ii) and (iii) weighted with the ONS school weight and groups (iv) and (v) with the ONS pupil weight.

3.2 Response bias at school and pupil levels

We now conduct a more formal analysis of possible biases due to response at the school and pupil levels for a selection of achievement variables. In terms of Figure 1.1 at the end of Chapter 1, we are now making the comparisons horizontally rather than vertically.

a) School response bias. We first compare responding and non-responding schools (the school as the unit of analysis) and then pupils in responding and non-responding schools (the pupil as the unit of analysis). We allow for the phenomenon of school replacement in various ways.

b) Student response bias. We compare responding and non-responding pupils.

We consider possible biases in both average achievement (as measured by the mean) and in the dispersion of achievement (as measured by the standard deviation). We also consider biases in the percentage of students above (or below) given score levels, which are affected by any biases in either average scores or the dispersion of scores.

The issue of the appropriate weights to apply to the data is less important than in the previous section. For example, large schools are over-represented in the sample of schools and this was important to adjust for when comparing pupils in sampled schools with pupils in all schools. But, conditional on having drawn such a sample, it is less important to apply weights when comparing responding with non-responding schools. Some of the results in this section are based on weighted data and some are not.
In most of the statistical tests involving the pupil as the unit of analysis we adjust the estimated standard errors for the clustering of pupils within schools, recognising that the PISA sample design is not a simple random sample. However, we do not allow for stratification, and for this reason our estimated standard errors are likely to be somewhat too large.[11] This means that we may sometimes fail to reject the null hypothesis of no difference between responding and non-responding schools or pupils when it is in fact false (Type 2 error). In some tests (especially of differences in variances between groups of pupils) we do not allow for clustering of pupils within schools and run the risk of rejecting the null hypothesis when it is true (Type 1 error) on account of the resulting underestimation of the standard errors.

3.2.1 Comparison of responding and non-responding schools

Arguably, the natural unit of analysis when analysing school response is the school itself. It is schools that decide whether or not to respond at this stage of the survey. For each school we therefore compute school values of all the variables by taking their averages across each school's 15 year old students. However, since PISA is ultimately a survey of pupils' learning achievement, one can make the argument for using the pupil as the unit of analysis, and we include some results on this basis. (Our analysis in the previous section of groups (ii) and (iii) was also based on the pupil as the unit of analysis.)

[11] We use the 'svy' command in Stata and declare the schools as clusters. See Brown and Micklewright (2004:25) for a comparison of standard errors of PISA scores (i) estimated by the OECD, (ii) allowing for clustering but not stratification with the Stata 'svy' commands, and (iii) assuming simple random sampling.

A total of 570 schools were sampled for PISA: 190 'initial' schools, 190 first replacement schools, and 190 second replacement schools.
The replacement schools may or may not have been actually approached, depending on the response of the initial school for which they were a replacement. Of the 570 schools, a total of 281 were approached at some stage: all 190 initial schools (187 effectively), 55 of the first replacement schools and 39 of the second replacements. Table 3.8 shows how they responded.

Table 3.8: School response (numbers of schools), 2003

                                        Initial   First        Second
                                        school    replacement  replacement  Total
Refused                                    54         38           27        119
Participated (>50% student rate)          118         14           12        144
Participated (25 to 50% student rate)      13          2            0         15
Participated (<25% student rate)            2          1            0          3
Total                                     187         55           39        281

Schools that participated but achieved a pupil response rate below 25% are treated as non-responding schools in the analysis. There are several possible ways to allow for the phenomenon of replacement.

a) Comparison of responding with non-responding schools in the initial sample (without taking first and second replacement into account). We can focus on the initial school sample of 187 schools. (This was the approach we were asked to follow in the bias analysis we conducted for DfES in 2004.) This means we ignore any resolution of response bias that may come from the replacement of a non-responding school by another school from within the same achievement stratum.

b) Comparison of all schools that responded with all schools that did not respond, irrespective of stage. Another approach is to compare all non-responding schools, whether initial, first or second replacement, with all responding schools, whether initial, first or second replacement. This analysis compares the 159 responding PISA schools (the 144 with student rates of >50% and the 15 with rates of 25-50%) with the 122 non-responding schools (the 119 refusals and the 3 with pupil response rates of <25%). The advantage of this approach is that we include in the analysis all schools that were asked to respond. It also boosts the sample size.
c) Comparison of all schools that responded with second replacement schools that did not respond. One might argue that it is the characteristics of non-responding schools that were not replaced that are of most importance. These are the non-responding second replacement schools. The final approach we adopt is therefore to compare the 159 responding schools (at whatever stage – initial, first or second replacement) with the 27 non-responding second replacement schools. (One problem with this analysis is that schools with low within-school student response which were not replaced cannot be identified. Hence, this analysis does not include schools with pupil response rates below 25%.)

Table 3.9 shows the characteristics of pupils in schools with low pupil response rates – pupils included in approaches (a) and (b) but not (c). Pupils in these schools have much lower average achievement: for example, only 36 percent have 5 or more GCSEs at grades A*-C compared with 56 percent in the population of all 15 year old pupils. Their schools were treated as not responding in the analyses that follow.

Table 3.9: Characteristics of pupils in schools with low pupil response rates (<25%), 2003

Variable     n     Missing   Mean*100
Male         624      0      50.439
FSM          621      3      25.427
Elevel5      611     13      49.002
MLevel5      610     14      53.479
Slevel5      612     12      48.561
Elevel5fg    604     20      50.093
MLevel5fg    604     20      55.562
Slevel5fg    600     24      50.572
ks4_5ac      624      0      36.074

Variable     n     Missing   Mean     SD
ks3Efg       604     20      4.903    1.188
ks3Mfg       604     20      5.201    1.382
ks3Sfg       600     24      4.970    1.181
ks3_aps      591     33      30.578   7.174
ks4_pts      624      0      31.430   21.189
ks4_ptc      624      0      28.012   17.605
ks4_mean     624      0      3.387    2.044
ks4_ac       624      0      3.524    3.830

Note: ONS school weights (sch_bwt) applied.

a) All schools in the initial school sample

Table 3.10 summarises the sample of schools used in this approach.
Table 3.10: School type by response (number of schools): approach (a), 2003

School           Non-responding   Responding   Total
LEA maintained        49             123        172
Independent            7               6         13
Total                 56             129        185

Note: for two responding schools no information on school type is available.

Differences in means and differences in standard deviations are shown in Tables 3.11 and 3.12 respectively. We find no significant differences in mean values between responding and non-responding schools, other than in the FSM eligibility rate (at the 5% level), where responding schools have a lower rate. Differences in dispersion are statistically significant, however: responding schools display lower variation in KS3 and KS4 results (NB this relates to variation in their school-average values of pupils' results). The differences in the variance of KS4_mean are clearly visible in the distributions shown in Figure 3.5.

Table 3.11: Differences in means: approach (a), 2003

             Responding    Non-responding
             school        school
             Mean*100      Mean*100         Difference    p      Observations
Male           47.79          45.96           -1.83      0.77        185
FSM            11.42          19.92            8.50      0.04        175
ELevel5fg      72.87          72.86           -0.01      0.99        182
MLevel5fg      73.44          72.38           -1.06      0.82        183
SLevel5fg      73.23          73.40            0.17      0.98        182
KS4_5ac        53.58          57.47            3.89      0.56        185

             Mean          Mean
KS3_aps        34.53          34.54            0.01      0.99        182
KS4_mean        4.52           4.53            0.01      0.97        185
Ks4_pts        41.10          42.63            1.53      0.66        185

Note: p gives the significance level at which we can just reject the null hypothesis that the values are the same for responding and non-responding schools (adjusted Wald test). Results are weighted by the ONS school weight sch_bwt.
Table 3.12: Differences in variances: approach (a), 2003

             Responding    Non-responding   Test of     Responding & non-
             school        school           SD1=SD2     responding schools
             SD1           SD2              p           SD
Male           16.32         25.66          0.00          19.55
FSM             8.94         16.51          0.00          11.72
ELevel5fg      15.14         20.67          0.01          16.94
MLevel5fg      13.75         19.65          0.00          15.71
SLevel5fg      14.85         21.89          0.00          17.19
KS4_5ac        18.87         26.57          0.00          21.43

KS3_aps         2.99          4.63          0.00           3.54
KS4_mean        0.91          1.37          0.00           1.07

Note: results are unweighted.

Figure 3.5: Distributions of KS3 and KS4 scores: approach (a), 2003

[Two kernel density plots comparing responding and non-responding schools: KS3 average point score (ks3_aps, 0 to 60) and mean GCSE point score (ks4_mean, 0 to 8).]

Note: unweighted results. Kernel density estimates.

b) All responding and non-responding schools (initial, first, second replacement)

Table 3.13 summarises the sample of schools used in this second approach.

Table 3.13: School type by response (number of schools): approach (b), 2003

School type            Non-responding   Responding   Total
LEA maintained school       106            148        254
Independent school           15              9         24
Total                       121            157        278

Note: for two non-responding and one responding school no information is available on school type.

Table 3.14: Differences in means: approach (b), 2003

             Responding    Non-responding
             school        school
             Mean*100      Mean*100         Difference    p      Observations
Male           48.73          50.24            1.51      0.73        278
FSM            11.58          20.32            8.74      0.00        258
ELevel5fg      72.15          64.92           -7.23      0.10        271
MLevel5fg      73.19          66.29           -6.90      0.09        272
SLevel5fg      73.52          65.97           -7.55      0.10        271
KS4_5ac        55.52          53.67           -1.85      0.72        278

             Mean          Mean
KS3_aps        34.54          33.38           -1.16      0.15        271
KS4_mean        4.55           4.31           -0.24      0.36        278
Ks4_pts        42.09          39.81           -2.28      0.42        278

Note: ONS school weights applied. Adjusted Wald test.
Table 3.15: Differences in variances: approach (b), 2003

             Responding    Non-responding   Test of     Responding & non-
             school        school           SD1=SD2     responding schools
             SD1           SD2              p           SD
Male           17.40         26.11          0.00          21.65
FSM             9.49         17.91          0.00          13.88
ELevel5fg      15.46         23.43          0.00          19.39
MLevel5fg      13.81         21.96          0.00          17.83
SLevel5fg      15.17         24.00          0.00          19.53
KS4_5ac        19.25         26.99          0.00          22.89

KS3_aps         3.08          4.63          0.00           3.83
KS4_mean        0.94          1.43          0.00           1.18

Note: results are unweighted.

The conclusions seem largely unchanged from approach (a), as regards both the mean and the variance. There is no significant difference in means (except for FSM) but there are significant differences in the variances. The larger sample size does, however, begin to push some of the differences in means towards statistical significance at the 10% level.

c) All responding schools and non-responding second replacement schools

Table 3.16 summarises the sample of schools used in this third approach.

Table 3.16: School type by response: approach (c), 2003

School type        Non-responding   Responding   Total
LEA maintained          22             148        170
Independent              4               9         13
Total                   26             157        183

Note: for one non-responding and two responding schools school type information is missing.

Table 3.17: Differences in means: approach (c), 2003

             Responding    Non-responding
             school        school
             Mean*100      Mean*100         Difference    p      Observations
Male           48.73          52.33            3.60      0.57        183
FSM            11.58          15.82            4.24      0.36        175
ELevel5fg      72.15          58.22          -13.93      0.12        179
MLevel5fg      73.19          59.17          -14.02      0.14        179
SLevel5fg      73.52          59.14          -14.38      0.15        179
KS4_5ac        55.52          48.91           -6.61      0.50        183

             Mean          Mean
KS3_aps        34.54          32.32           -2.22      0.16        179
KS4_mean        4.55           4.22           -0.33      0.47        183
Ks4_pts        42.09          37.01           -5.08      0.38        183

Note: ONS school weights applied. Adjusted Wald test.

Table 3.18: Differences in variances: approach (c), 2003

             Responding    Non-responding   Test of     Responding & non-
             school        school           SD1=SD2     responding schools
             SD1           SD2              p           SD
Male           17.40         25.33          0.02          18.76
FSM             9.49         19.31          0.00          11.35
ELevel5fg      15.46         23.43          0.00          16.85
MLevel5fg      13.81         22.89          0.00          15.41
SLevel5fg      15.17         24.49          0.00          16.77
KS4_5ac        19.25         26.56          0.00          20.36

KS3_aps         3.08          4.45          0.00           3.30
KS4_mean        0.94          1.35          0.00           1.01

Note: results are unweighted.

With the smaller sample size in approach (c) we cannot reject the hypothesis that mean values are the same for the populations from which the responding and non-responding schools are drawn. But we can once again reject the hypothesis that the SDs are the same. Note that the differences in means in Table 3.17 are much larger than in Table 3.11 with approach (a), although they are still insignificant. The key thing to note here is that the non-responding schools in approach (c) are all different schools from those not responding in approach (a): in approach (c), the non-responding schools are all second replacement schools, while in approach (a) they are all initial schools. The values of the means for non-responding schools in Table 3.17 are notably lower than those in Table 3.11.

In all three approaches, (a)-(c), we have so far used the school as the unit of analysis, on the grounds that we are investigating response by schools. However, one might argue that what matters are the characteristics of the pupils in the responding schools compared to those of the pupils in the non-responding schools. This would imply using the pupil as the unit of analysis. In Table 3.19 we therefore compare means for pupils in responding schools with those for pupils in the second replacement non-responding schools (approach (c)).
Table 3.19: Differences in means: approach (c), pupil as unit of analysis, 2003

             Responding    Non-responding
             school        school
             Mean*100      Mean*100         Difference    p      Observations
Male           47.39          53.68            6.29      0.00       40,095
FSM            11.86          16.93            5.07      0.00       38,622
ELevel5fg      71.96          65.27           -6.69      0.00       38,325
MLevel5fg      72.71          69.47           -3.24      0.00       38,475
SLevel5fg      72.49          67.30           -5.19      0.00       38,376
KS4_5ac        55.31          55.80            0.49      0.57       40,095

             Mean          Mean
KS3_aps        34.21          33.78           -0.43      0.00       38,018
KS4_mean        4.33           4.40            0.07      0.07       40,095
KS4_pts        42.63          42.74            0.11      0.78       40,095

Note: ONS school weights applied. Adjusted Wald test. The clustering of the students within schools is taken into account.

With the pupil as the unit of analysis there is a huge increase in sample size – from about 180 in the case of the school to around 40,000. Not surprisingly, significant differences are easier to find, e.g. the five percentage point difference in FSM receipt between pupils in responding and non-responding schools. Nevertheless, it is still the case that for all three GCSE variables we find no significant difference at the 5% level between pupils in responding and non-responding schools.

Note that the significant difference in the means for the continuous KS3 variable, ks3_aps, is the opposite of what one would expect from Figure 3.1. That graph showed group (iii), pupils in responding schools, to have a lower mean for this variable than group (ii), all pupils in all sampled schools. But Table 3.19 shows pupils in responding schools to have a higher mean than pupils in the non-responding schools. The explanation is linked to the point made above that the non-responding schools in approach (c) are different schools from those in approach (a). Group (ii), all pupils in all sampled schools, includes pupils in schools not responding to the initial sample and in schools not approached at all. Against that base, the pupils in responding schools have a lower mean score on ks3_aps (Figure 3.1).
But compared to the pupils in second replacement schools which did not respond, their mean is higher. All this underlines the need for a separate investigation of the phenomenon of replacement, which we have not undertaken.

3.2.2 Comparison of responding and non-responding pupils

The analysis of response bias at the student level includes 5,213 sampled pupils in 159 schools, of which 3,766 responded (72.2%) and 1,447 did not (27.8%).[12] The differences in means and in percentages in different categories are now clearly significant (with the exception of the difference in the percentage male).[13] The differences in dispersion are also significant, although the lack of adjustment for clustering when testing for differences in the standard deviations implies that the p value of 0.02 for ks3_aps should be viewed as a sizeable underestimate – it is plausible that there is in fact no significant difference at the 5% level. Respondents display a significantly higher mean score and a significantly lower dispersion in scores, at least for the KS4 variables. These differences both contribute to large differences between respondents and non-respondents in the percentages at or above KS3 level 5 or with 5+ GCSEs A*-C – around 11 and 17 percentage points respectively. The higher mean and lower dispersion of respondents is particularly clear in the graph of the distribution of KS4_mean scores – the right hand side of Figure 3.6.

Table 3.20: Differences in means and percentages (weighted), 2003

             Responding pupils      Non-responding pupils                       All pupils
             Obs.    Mean*100       Obs.    Mean*100     Difference    p       Obs.
Male         3695      46.27        1396      46.32         0.05      0.98     5091
FSM          3475      10.40        1341      13.84         3.44      0.00     4816
ELevel5fg    3481      75.49        1328      64.00       -11.49      0.00     4809
MLevel5fg    3481      76.63        1332      63.53       -13.10      0.00     4813
SLevel5fg    3481      75.49        1331      64.11       -11.38      0.00     4812
KS4_5ac      3695      60.95        1396      44.19       -16.76      0.00     5091

                     Mean                   Mean
KS3_aps      3455      34.76        1317      32.85        -1.91      0.00     4772
KS4_mean     3695       4.60        1396       3.91        -0.69      0.00     5091
KS4_pts      3695      45.77        1396      37.48        -8.29      0.00     5091

Note: ONS student weights (STU_WT) are applied to the data. We use the adjusted Wald test for the continuous variables (ks3_aps, ks4_mean) and allow for clustering by school. The χ2-tests for the binary variables are based on the unweighted data and with no allowance for clustering, but the means displayed are the weighted values.

[12] The sampled pupils are in 159 schools in the ONS file on all sampled pupils. But once they are merged with the national file using student identifiers they are in as many as 175 schools. We presume that the reason for this difference is that some of the pupils changed schools after they sat the PISA test, with the new school recorded in the national file. For the student analysis we use the school identifiers of the pupil file.

[13] Chapter 2 noted our concern with the ONS student weight for 2003 and so we also produced unweighted results. Differences between weighted and unweighted results are only slight.

Table 3.21: Differences in variances (unweighted), 2003

             Responding    Non-responding   Test of     Responding & non-
             pupils        pupils           SD1=SD2     responding pupils
             SD1           SD2              p           SD
KS3_aps        6.28          6.63           0.02          6.43
KS4_mean       1.61          1.92           0.00          1.72
KS4_pts       18.6          21.5            0.00         19.8

Note: results are for unweighted data. Clustering of pupils within schools is not taken into account and hence standard errors are understated.

Figure 3.6: Distributions of KS3 and KS4 scores for responding and non-responding pupils, 2003

[Two kernel density plots comparing responding and non-responding pupils: KS3 points score (ks3_aps, 0 to 60) and mean GCSE point score (ks4_mean, 0 to 6).]

Note: data weighted by the ONS student weight. Kernel density estimates.

3.3 Using OECD weights and our own response weights

In the two previous sections we applied the ONS weights.
These take the sample design into account but not the levels of response of either schools or students, or their patterns of response (e.g. the association with achievement levels). In this section we investigate the impact of two sets of weights that in addition allow for non-response. In both cases we are interested in seeing whether non-response weights remove any of the apparent response bias in the mean or variance of KS3 and KS4 scores.

The first set of weights are the OECD weights described in Chapter 2. These adjust for the levels of response by both schools and students and, in the case of the school weights, for the average achievement levels (in GCSE terms) of responding schools. The second set of weights are constructed by us and are intended to go further than the OECD weights in the adjustment for response at the pupil level. These weights take into account the relationship between the probability of pupil response and achievement levels.

3.3.1 Calculation of our response weights

A standard approach for adjusting for non-response bias when one has additional information on respondents and non-respondents is to estimate a statistical model of the response probability and to use this to construct response weights. Estimates of parameters of interest, e.g. mean PISA scores, standard deviations, proportions below given thresholds etc., can then be calculated with and without the use of these non-response weights, thus assessing the degree of bias in data to which only design weights have been applied.

a) Modelling the response probability

We estimate the probability that a sampled pupil responds to PISA with a logistic regression model. The dependent variable, Yi, takes the value 0 if a pupil did not respond and 1 if a pupil responded. The probability that Yi=1 in the logit model is specified as 1/[1+exp(-βXi)], where Xi is a vector of explanatory variables. The model is estimated by maximum likelihood. We use two variants of the specification of the model.
First, we estimate response as a function of gender, free school meal eligibility, achievement (a KS4 measure), private school attendance and region (Models 1a and 1b). Second, we exclude the regional variables but include a dummy variable for each school (Model 2). The inclusion of school dummies takes into account the possibility that individual response behaviour is determined in part by characteristics of the school that we are unable to measure. The only continuous variable in the models is the GCSE variable, ks4_pts, for which a quadratic is included to pick up non-linearities. Conditional on the inclusion of the ks4_pts variable, other measures of achievement, such as the KS3_aps measure or other GCSE variables, were much less important or were insignificant and were not included in the final specification.[14]

One disadvantage of including school dummies in the model is that we need to exclude pupils in schools where either no sampled pupil responded or all sampled pupils responded. In practical terms, this means that we exclude 77 sampled pupils in schools where all pupils responded and 49 sampled pupils in schools where no pupil responded. The number of sampled pupils available is therefore reduced from 5,213 to 5,087.

Table 3.22 shows estimates of β for both models. In Model 1a, the standard errors are adjusted to take into account the clustering of pupils within schools, while in Model 1b the standard errors are unadjusted. Note that we do not present the estimated coefficients for the school dummy variables in Model 2. These are significant at least at the 5% level in 43 out of 156 cases.

[14] We also tried including the school mean value of the KS4 measure (calculated over all 15 year olds and not just the sampled pupils) but it proved insignificant.
Table 3.22: Logistic regression models of pupil response, 2003

                                 Model 1a        Model 1b        Model 2
Male                             0.088           0.088           0.138
                                (0.075)         (0.066)         (0.077)*
FSM                              0.014           0.014          -0.074
                                (0.109)         (0.103)         (0.117)
Ks4_pts                          0.060           0.060           0.073
                                (0.006)***      (0.006)***      (0.007)***
Square of ks4_pts/100           -0.047          -0.047          -0.056
                                (0.007)***      (0.006)***      (0.008)***
Private school                   0.187           0.187           0.300
                                (0.365)         (0.160)         (0.687)
Missing value for at least
one variable                    -0.656          -0.656          -0.866
                                (0.321)**       (0.193)***      (0.313)***
North West England               0.392           0.392
                                (0.227)*        (0.116)***
East Midland                     0.412           0.412
                                (0.172)**       (0.092)***
East of England                  0.301           0.301
                                (0.214)         (0.092)***
Constant                        -0.721          -0.721          -1.292
                                (0.145)***      (0.124)***      (0.407)***
Observations                     5213            5213            5087
Pseudo R-squared                 0.04            0.04            0.16
Log-likelihood                  -2940.82        -2940.82        -2526.25
F statistic                                                      19.32

Note: estimates of standard errors in Model 1a take clustering within schools into account while estimates in Model 1b do not. Model 2 excludes the regional dummies but includes 156 school dummies (results for school dummies not presented) and allows for clustering of the pupils within schools. A missing dummy variable was included and set to 1 if values of any variable were missing (most of the missing values were missing for the same pupils). Data are unweighted.

Model 1 can explain only a small fraction of the variation in pupils' response (pseudo R2 = 0.04). Not surprisingly, given results shown earlier in the report, the quadratic in ks4_pts performs very well (with a t-value of 10 for the linear term, and almost 7 for the quadratic term). Of the other variables, only the regional dummies are significant. Hence, once we control for GCSE scores, the FSM and private school dummies are both insignificant, as is gender.

Results of Model 2 are very similar to those of Model 1. However, given the significance of about a quarter of the school dummy variables, this model is better at explaining the variation in response (pseudo R2 = 0.16).
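The logit specification of Table 3.22 can be illustrated with simulated data. The sketch below uses the report's quadratic-in-ks4_pts form with coefficients loosely based on Model 1 (the data are simulated, not the PISA pupil file), and fits the model by Newton-Raphson, the same maximum-likelihood fit that Stata's logit estimation performs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for the pupil file: response is a logistic
# function of a quadratic in the GCSE points score (ks4_pts)
n = 5000
ks4_pts = rng.normal(40, 18, n).clip(0, 80)
X = np.column_stack([np.ones(n), ks4_pts, ks4_pts ** 2 / 100])
beta_true = np.array([-0.72, 0.06, -0.047])   # loosely Model 1's values
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

# Maximum-likelihood estimation by Newton-Raphson (equivalently, IRLS)
beta = np.zeros(3)
for _ in range(25):
    mu = 1 / (1 + np.exp(-np.clip(X @ beta, -30, 30)))   # fitted probabilities
    grad = X.T @ (y - mu)                                # score vector
    hess = (X * (mu * (1 - mu))[:, None]).T @ X          # information matrix
    beta += np.linalg.solve(hess, grad)

print(beta.round(3))  # estimates should lie near beta_true
```

A cluster-robust covariance matrix, as in Model 1a, would additionally require summing score contributions within schools before forming the sandwich estimator.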
Figure 3.7 shows the variation in the predicted probability of response with the total GCSE points score, holding other variables constant at their mean values. We plot the predicted probability for the range of GCSE point scores that spans the 5th to the 95th percentiles in the sample. The turning point of the quadratic is close to the top of this range, suggesting that there is little if any real downturn in response at high levels of GCSE scores. (Note that while we showed earlier that the variance of scores was lower for respondents than for non-respondents, we did not investigate separately response at either end of the score distribution.) The predicted probability of response rises, ceteris paribus, from below 0.5 at the bottom of the GCSE score range to about 0.8 towards the top of the range.

Figure 3.7: Predicted probability of pupil response by GCSE point score, 2003

[Line plot of the predicted probability of response (0 to 0.9) against KS4_pts (0 to 80).]

Note: all other variables are set to their mean values. The graph displays predicted probabilities for GCSE points scores between the 5th and 95th percentiles of the sample.

b) Weighting for response

We use the results of the logistic regression models to calculate a response weight for each respondent. This weight is simply the inverse of the predicted probability of response. Hence a respondent with a low GCSE score has a higher weight than one with a high GCSE score. Summary statistics are given in Table 3.23. (There is clearly a case for trimming extreme values of the weights but we have not done this.) Weight1 is derived from Model 1 and weight2 from Model 2.

Table 3.23: Summary statistics of non-response weights derived from logistic regression, 2003

Variable    Obs     Mean       Std. Dev.   Min        Max
weight1     3766    1.383427   0.236702    1.158141    3.05661
weight2     3703    1.384990   0.595346    1.000000   19.99076

Note: weight1 gives the inverse of the response probability derived from Model 1 (Table 3.22); weight2 refers to Model 2.

c) Construction of a combined weight

Besides allowing for non-response, any weight needs to allow for the sample design. We also wish to incorporate the adjustment for the level and pattern of response at the school level that is incorporated in the OECD weights. This adjustment is through the OECD school weight 'scweight' described in Chapter 2. Hence our alternative to the OECD student weight for pupil j in school i is calculated as follows, where Mi is the total number of 15 year old pupils in school i and mi is the number of sampled pupils (see equation (3) in Chapter 2, section 2.2.1):

R_WEIGHT1ij = scweighti * (Mi/mi) * weight1ij
R_WEIGHT2ij = scweighti * (Mi/mi) * weight2ij

The weights R_WEIGHT1 and R_WEIGHT2 allow for the variation of the pupil response probability with pupil achievement levels, something not present in the OECD weight. This should be an important advance, since much of any response bias in achievement scores comes through pupil response rather than school response, as demonstrated earlier (e.g. Figure 3.3). R_WEIGHT1 has the disadvantage compared to the OECD student weight of not explicitly adjusting for the average level of pupil response within the school, while in the case of R_WEIGHT2 this is picked up by the school dummies in Model 2.

Table 3.24 compares summary statistics for the ONS student weight, the OECD student weight and our two new weights. The final column gives the coefficient of variation – the standard deviation divided by the mean. This shows that the OECD weight has about twice the variation of the ONS weight. R_WEIGHT1 has in fact somewhat less variation than the OECD weight, which we presume is due to it not taking explicit account of the average level of response within each school.
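Putting the pieces together, the combined weight described in this section is a simple product. A sketch with hypothetical values for four responding pupils in two schools (all numbers are invented; p_hat stands for the predicted response probability from the logit model):

```python
import numpy as np

# Hypothetical inputs: predicted response probabilities from the
# logit model, OECD school weights, and school sizes M (all 15 year
# olds in the school) and m (sampled pupils)
p_hat = np.array([0.45, 0.62, 0.78, 0.85])
scweight = np.array([2.1, 2.1, 1.7, 1.7])
M = np.array([180, 180, 210, 210])
m = np.array([30, 30, 30, 30])

# Response weight: the inverse predicted probability, so low-achieving
# respondents (low p_hat) count for more
weight1 = 1 / p_hat

# Combined weight, as in the report: R_WEIGHT1 = scweight * (M/m) * weight1
r_weight1 = scweight * (M / m) * weight1
print(np.round(r_weight1, 2))
```

Trimming extreme values of weight1, which the report notes it does not do, would simply cap 1/p_hat before the final product.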
This average level is taken into account in R_WEIGHT2, which displays the most variation of any of the weights (this may well be affected by outlier values that we have not trimmed).

Table 3.24: Summary statistics of different student weights, 2003

Variable                        Obs     Mean      Std. Dev.   Min       Max        CV
ONS student weight STU_WT       3752    0.0302    0.0053      0.0092    0.0643     0.175
OECD student weight w_fstuwt    3766    154.83    51.08       61.46     579.69     0.330
R_WEIGHT1                       3766    150.12    40.20       15.25     534.49     0.268
R_WEIGHT2                       3703    148.51    68.39       54.21     2173.74    0.461

3.3.2 Results

Table 3.25 shows the result of applying the three sets of weights: ONS student, OECD student and our own calculated weights. We also show results for unweighted data.

Table 3.25: KS3 and KS4 scores and Free School Meal receipt for responding pupils, different student weights, 2003

                              KS3_aps   KS4_mean   KS4_pts   FSM receipt %
Mean
  Unweighted                   34.75      4.60      45.84       10.50
  ONS STU_WT                   34.76      4.60      45.77       10.40
  OECD w_fstuwt                34.71      4.59      45.63       10.56
  R_WEIGHT1                    34.18      4.44      43.77       11.44
  R_WEIGHT2                    34.20      4.39      43.50       11.56
Standard deviation
  Unweighted                    6.28      1.61      18.55
  ONS STU_WT                    6.30      1.61      18.55
  OECD w_fstuwt                 6.34      1.63      18.66
  R_WEIGHT1                     6.57      1.73      19.80
  R_WEIGHT2                     6.57      1.76      20.13
Observations 1                 3,425     3,664     3,664       3,444
Observations 2                 3,389     3,611     3,611       3,410
Observations 3                 3,455     3,695     3,695       3,475

Note: 'Observations 1' refers to calculations with R_WEIGHT1. 'Observations 2' refers to calculations with R_WEIGHT2. 'Observations 3' refers to unweighted results or results weighted with the ONS student weight STU_WT or the OECD student weight w_fstuwt.

The results using the OECD student weight are very similar to those using the ONS weight and those with no weights at all. There are very small reductions in the means, very small increases in the standard deviations, and a very small increase in the percentage in receipt of Free School Meals. We reproduce below the means for all sampled pupils in responding schools, group (iv), that were given earlier in Table 3.3.
              KS3_aps   KS4_mean   KS4_pts   FSM %
Group (iv)     34.23      4.41      43.50    11.36

Comparing these figures with those in Table 3.25, the conclusion seems to be that the OECD weights remove only a very small part of the response bias. For example, the mean for responding pupils using the OECD weight is 4.9% higher than the mean for all sampled pupils. This is not surprising since the OECD weights do not incorporate any direct adjustment for the way that the pupil response probability varies with achievement level.

By contrast, our alternative weights, R_WEIGHT1 and R_WEIGHT2, do include an adjustment of this type. Table 3.25 shows that their application does bring the mean KS3 and KS4 scores for responding pupils very close to the means for all sampled pupils. The same is true for the Free School Meals rate. And the use of our weights pushes up the standard deviations to around their levels for all sampled pupils shown in Figures 3.1 to 3.5 (the group (iv) values). (In fact they push them slightly beyond.)[15]

3.4 Response bias in PISA scores

We now estimate the impact of response bias on PISA achievement scores in 2003 using three different approaches, explained below: (a) use of z-scores, (b) regression adjustment and (c) the use of response weights. Our focus is on response bias at the pupil level. However, when we use the response weights we are also taking account of response bias at the school level (through the adjustment in the OECD school weight). We focus mainly on bias in the mean but include some results for the standard deviation.

The mean PISA scores in reading, maths and science for the final sample of responding pupils in responding schools in England in 2003 are given in Table 3.26. Note these are results obtained by applying the ONS student weights, and not the OECD student weights, although the results obtained with the latter are very similar (see Table 3.30 below).
Table 3.26: PISA means and standard deviations (ONS weights), 2003

          PISA reading   PISA maths   PISA science
  Mean    507.2          507.7        520.1
  SD      93.7           92.5         102.2

Note: mean scores are calculated as the average of the five plausible values (PVs). The SD is the average of the SDs calculated separately for each PV. The calculations are based on the final PISA sample of respondents, weighted with the ONS student weights.

How far might these scores have changed if there had not been any response bias at the pupil level?

15 The appropriate comparison is arguably slightly different to the one we have described, since the OECD school weight that forms part of our own weight variables adjusts for non-response at the school level. However, the results as described are very unlikely to mislead.

3.4.1 Z-score method

One method of addressing the issue is to transform the differences in mean KS3 or KS4 scores between respondents and all sampled pupils into z-scores (i.e. SD units) and translate these into PISA points by multiplying by the values of the PISA SDs taken from Table 3.26 (i.e. with ONS weights applied). The results of this procedure are shown in Table 3.27.

Table 3.27: Calculation of bias in the PISA means: z-score method, 2003

              1. Mean       2. Mean       3. SD     4.           5. Multiplied   6. Multiplied   7. Multiplied
              responding    sampled       sampled   (Mr-Ms)/SD   by SD PISA      by SD PISA      by SD PISA
              pupils = Mr   pupils = Ms   pupils                 read            maths           science
  KS3_aps     34.75         34.23         6.43      0.081        7.59            7.49            8.28
  KS4_mean    4.60          4.42          1.72      0.105        9.84            9.71            10.73

Note: the KS3 and GCSE mean scores are calculated on unweighted data, but weighting with the ONS weights does not make a big difference.

Cols. 1 and 2 give mean pupil scores for the summary KS3 and KS4 variables among responding students and all sampled students in responding schools (i.e. respondents and non-respondents taken together). Col. 3 gives the SD for all sampled students.
Col. 4 gives the difference in means between respondents and all sampled students, standardised by the SD for all sampled students. In other words, this is a z-score: responding students had a mean that was 0.081 (KS3_aps) and 0.105 (KS4_mean) SDs above the mean for all sampled students. Cols. 5-7 multiply these values by the PISA SDs for reading, maths and science respectively, shown in Table 3.26, to express the KS3 and KS4 point differences as PISA point differences.

The mean difference between responding pupils and sampled pupils in the KS3 variable translates into a difference of about 7 or 8 points. The difference in means for the KS4 variable translates into slightly larger 'PISA differences' of about 10 points. Taking the adjustment based on the KS3 results, mean England PISA scores in 2003 allowing for student non-response can be estimated to be 500 instead of 507 for reading, 500 instead of 507 for maths and 512 instead of 520 for science.

3.4.2 Regression method

The disadvantage of the z-score method is that it makes a strong assumption about the relationship between PISA scores and domestic achievement scores, namely that a difference of one SD in the domestic score corresponds to a difference of one SD in the PISA score. Imagine that there were in fact no association between the two. In this case response bias in KS3 or KS4 scores would have no implication for PISA scores. This motivates our second approach, which is to regress PISA scores on KS3 (or KS4) scores for respondents and use the estimated coefficient of the KS3 (or KS4) variable as the PISA equivalent of a KS3 (or KS4) point. This equivalent can then be applied to the difference in the means for responding and sampled pupils. Regression results are presented in Table 3.28. (Chapter 5 describes these regressions in more detail; a simple linear model seems a reasonable fit to the data.) The domestic achievement scores turn out to explain between 60 and 70 percent of the variation in PISA scores.
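For concreteness, the z-score translation can be sketched in a few lines of code. The figures are those quoted in the text; this is illustrative arithmetic only, not the report's own calculations, which work from the microdata.

```python
# Sketch of the z-score translation in Table 3.27. Values are taken from
# the text; illustrative only.

PISA_SD = {"reading": 93.7, "maths": 92.5, "science": 102.2}  # Table 3.26

def zscore_bias(mean_respondents, mean_sampled, sd_sampled, pisa_sd):
    """Express a respondent-vs-all-sampled difference in a domestic score
    as PISA points: the z-score times each PISA standard deviation."""
    z = (mean_respondents - mean_sampled) / sd_sampled
    return {domain: round(z * sd, 1) for domain, sd in pisa_sd.items()}

# KS3_aps row of Table 3.27: Mr = 34.75, Ms = 34.23, SD = 6.43
bias_ks3 = zscore_bias(34.75, 34.23, 6.43, PISA_SD)
```

This reproduces the KS3_aps row of Table 3.27: roughly 7.6 points in reading, 7.5 in maths and 8.3 in science.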
The correlation between, for example, the PISA maths score and KS3_aps is 0.83; the correlation with KS4_mean is 0.76. Applying the KS3_aps coefficients to the difference in mean KS3_aps between respondents and all sampled pupils (shown in Table 3.27) yields an estimate of bias in the PISA means of about 6 points, which rises to about 8 points when the KS4 regression is used instead (see Table 3.29). These figures are somewhat smaller than those obtained with the z-score method.

Table 3.28: Regression of PISA scores on KS3 and KS4 measures, respondents, 2003

  Regression of PISA scores on a KS3 measure
              read              maths             science
  ks3_aps     11.027 (0.184)    11.510 (0.181)    12.265 (0.196)
  Constant    120.055 (6.805)   103.688 (6.471)   89.942 (7.093)
  Obs.        3435              3435              3435
  R2          0.65              0.68              0.66

  Regression of PISA scores on a KS4 measure
              read              maths             science
  ks4_mean    42.543 (0.887)    42.049 (0.930)    45.545 (0.999)
  Constant    311.119 (4.842)   313.838 (4.940)   310.153 (5.352)
  Obs.        3672              3672              3672
  R2          0.61              0.58              0.58

Note: data weighted with the ONS student weight STU_WT. Standard errors in brackets. Clustering of pupils within schools is taken into account in the estimation of standard errors.

Table 3.29: Calculation of bias in PISA means: regression method, 2003

            KS3_aps   KS4_mean
  reading   5.84      7.66
  maths     6.10      7.57
  science   6.50      8.20

3.4.3 Use of response weights

The third method of estimating the extent of bias is simply to calculate mean scores with and without the use of response weights. The weights we use are those applied in Section 3.3: the OECD student weights and our own weights based on the logistic regression modelling of student response. We compare results for unweighted scores, results with the ONS student weight (STU_WT), results with the OECD weights, and results with our weights R_WEIGHT1 and R_WEIGHT2. The use of either the OECD weights or our own weights also incorporates adjustment for non-response by schools.
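Comparing moments under alternative weighting schemes needs only weighted mean and SD helpers. A minimal sketch, with illustrative scores and weights rather than PISA data:

```python
import math

# Weighted mean and SD helpers of the kind needed to compare weighting
# schemes. The scores and weights below are illustrative only.

def wmean(x, w):
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

def wsd(x, w):
    m = wmean(x, w)
    return math.sqrt(sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x)) / sum(w))

scores = [400.0, 480.0, 520.0, 560.0]
uniform = [1.0, 1.0, 1.0, 1.0]
resp_w = [2.0, 1.0, 1.0, 1.0]   # a response weight up-weights low scorers
```

Up-weighting the under-represented low scorers pulls the mean down and pushes the SD up, the same qualitative pattern seen when moving from the ONS weights to the response weights.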
The OECD weights adjust only for the level of pupil response within schools, while our weights allow in addition for the variation of pupil response with domestic achievement as measured by GCSE scores. Application of the OECD weights slightly reduces the means and slightly increases the standard deviations (Table 3.30). Application of our weights has a larger impact in both cases. We can estimate response bias in mean scores by comparing the results using the ONS weights with the results using our weights. Table 3.31 shows this calculation for both the mean and the standard deviation, using R_WEIGHT1. The upward bias in PISA scores in the sample of respondents is estimated at about 6 or 7 points in the mean. The downward bias in the standard deviation is estimated at about 3 or 4 points.

Table 3.30: PISA mean scores for responding pupils, different student weights, 2003

                        PISA reading   PISA maths   PISA science   Obs.
  Mean
    Unweighted          507.51         508.20       520.55         3766
    ONS STU_WT          507.15         507.68       520.07         3752
    OECD w_fstuwt       506.31         506.95       519.01         3766
    R_WEIGHT1           500.63         501.47       513.16         3721
    R_WEIGHT2           499.69         501.40       512.30         3664
  Standard deviation
    Unweighted          93.74          92.72        102.42         3766
    ONS STU_WT          93.65          92.48        102.23         3752
    OECD w_fstuwt       94.40          93.14        103.10         3766
    R_WEIGHT1           97.23          95.36        105.26         3721
    R_WEIGHT2           97.27          95.32        105.38         3664

Note: the standard deviation is calculated as the mean of the five standard deviations of the plausible values. The mean refers to the pupils' mean across all five plausible values.

Table 3.31: Calculation of bias in PISA means and standard deviations: response weights method, 2003

                        reading   maths   science
  Mean                  +6.5      +6.2    +6.9
  Standard deviation    -3.6      -2.9    -3.0

Note: the table shows the differences between the means and standard deviations in Table 3.30 calculated with the ONS student weight and with R_WEIGHT1.

Table 3.32 summarises our results, showing the estimated upward bias in mean PISA scores resulting from pupil non-response.
Table 3.32: Summary of estimates of bias in PISA means: different methods, 2003

            z-score method        regression method     response weights
            KS3_aps   KS4_mean    KS3_aps   KS4_mean    R_WEIGHT1   R_WEIGHT2
  Reading   7.6       9.8         5.8       7.7         6.5         7.5
  Maths     7.5       9.7         6.1       7.6         6.2         6.3
  Science   8.3       10.7        6.5       8.2         6.9         7.8

Note: estimates taken from Tables 3.27, 3.29 and 3.31.

The regression method and the response weights method give similar results, and both are superior in principle to the z-score method. Based on these methods, we conclude that mean PISA scores among respondents were upwardly biased in 2003 by around 6 to 8 points as a result of pupil non-response.

3.4.4 The OECD decision to exclude the UK

We have not found a clear statement by the OECD of what is considered a sufficiently large response bias in the estimate of the population mean, or any other parameter, to justify the exclusion of a country from the international reports. Commenting on the thresholds set for acceptable levels of response (e.g. 80 percent for pupils), the report on PISA 2003 noted: 'In the case of countries meeting these standards, it was likely that any bias resulting from non-response would be negligible, i.e. smaller than the sampling error' (OECD 2004: 325). If we measure 'sampling error' by one standard error, this statement suggests a criterion for inclusion that the estimated bias in the mean (or any other parameter) should be smaller than the estimated standard error of the mean.

There are no widely accepted norms for the relative sizes of bias and sampling error. Both contribute to 'total survey error', which combines sampling and non-sampling errors. Minimising the total survey error of key estimates is certainly an accepted aim of survey design (e.g. Biemer and Lyberg 2003).
Total survey error in the estimate of a parameter is conventionally measured by the mean square error (MSE), which is defined as:

  mean square error = (bias)² + (standard error)²

Biases can arise for reasons other than non-response, but we restrict attention to the response bias that we have been able to estimate. The formula for MSE implies that as the bias rises above the standard error, the value of MSE quickly becomes dominated by the bias rather than by sampling error. Where the bias is less than the standard error, MSE is dominated by the effect of sampling variation.

Another way of looking at the issue is to consider the 95 percent confidence interval for the mean, equal to its estimated value plus or minus 1.96 standard errors. Imagine the case where the bias equals one standard error. The correct confidence interval would then be centred on a value that is shifted sideways by this amount. However, it would still be the case that the confidence interval estimated with no allowance for bias would include most of the values within the interval estimated allowing for the bias. With a bias of more than twice the standard error, this would no longer be the case.

Our estimates of the bias in the mean of about 6 to 8 points are considerably larger than the estimated standard error: the standard errors of the mean scores for England in 2003 were estimated by the OECD as 2.9 (reading), 2.9 (maths) and 3.0 (science).16 In this sense the biases could be classified as large. Our estimated biases in the standard deviation, of the order of 3 PISA score points, also exceed the OECD estimates of the standard errors of the standard deviations for England in 2003, which are 1.6 in maths and reading and 1.7 in science.

Note that 'bias' in the formula for MSE above is the expected value calculated from repeated surveying rather than a single estimate such as we have obtained. We have a single estimate of the bias, and one has to think of there being a standard error around this estimate.
(If we estimate the bias by the difference between two estimated means obtained by applying two different sets of weights to the data, we have to bear in mind that there is sampling error in both estimated means.) Being cautious, one might argue that unless one has a good estimate of this standard error, one cannot draw firm conclusions about the extent of the bias. It is possible that this was the reasoning behind the OECD's statement in the PISA 2003 report that, when faced with the evidence we had produced in our earlier unpublished analysis of response bias in 2003, 'the PISA Consortium concluded that it was not possible to reliably assess the magnitude, or even the direction, of this non-response bias' (OECD 2004: 328).

However, such a position would be overly cautious. We have conducted statistical tests for the significance of differences in achievement between pupils who respond and those who do not. And we can calculate a 99 percent confidence interval around the mean of, say, KS4_pts for group (v), the respondents, in Figure 3.1: this interval does not include the value for group (i), the population.17

Another (and perhaps more likely) explanation for the OECD's statement lies in the fact that the work we did in 2004, on which the decision to exclude the UK was based, did not have the benefit of access to respondents' PISA scores. We could show the bias in the mean or variance of KS3 and KS4 scores, but we could not estimate the bias in the mean or variance of PISA scores (other than through the z-score method). Taking a very cautious position, the PISA Consortium could have worried that PISA scores and English domestic achievement scores were only very weakly associated. We can now see that any such concern was unfounded: PISA scores and KS3 or KS4 scores are strongly correlated. We conclude therefore that it is possible to assess reliably both the direction and magnitude of response bias in 2003.
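The bias-versus-sampling-error comparison can be illustrated numerically. The bias estimates are those from Table 3.31 and the standard errors are the OECD figures quoted above; this is a sketch of the arithmetic only:

```python
# Numeric sketch of the bias-versus-sampling-error argument. Bias from
# Table 3.31; standard errors as quoted in the text. Illustrative only.

def mse(bias, se):
    # total survey error: mean square error = bias^2 + (standard error)^2
    return bias ** 2 + se ** 2

estimates = {"reading": (6.5, 2.9), "maths": (6.2, 2.9), "science": (6.9, 3.0)}

# share of the mean square error accounted for by the bias term
bias_share = {d: b ** 2 / mse(b, se) for d, (b, se) in estimates.items()}
```

In each domain the estimated bias exceeds twice the standard error, so the bias term accounts for over 80 percent of the MSE.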
The PISA Consortium's statement on the subject in OECD (2004) is at best seen as a cautious assessment based on the evidence available at the time. The evidence we are now able to produce allows much firmer conclusions to be drawn.18

16 The estimated standard errors of means and standard deviations come from an Excel file of results for England in 2003 available from the OECD PISA website.
17 The interval is from 43.940 points to 47.372 points. The standard error is estimated with allowance for clustering of pupils within schools but not for stratification, and hence is likely to be an overestimate.
18 At worst, the wording of the statement could appear illogical: a conclusion is drawn of potential response bias but the direction of the bias is said to be unknown.

As noted earlier, we estimate the bias in the estimated means and standard deviations in England to be larger than the estimated standard errors. Viewed in this way, the estimated bias is quite large. This is not uncommon in large surveys: the larger the survey sample, the smaller the standard error, and hence the more the bias comes to dominate mean square error. However, does the bias have a substantial impact on the picture shown by PISA of differences in learning achievement between countries? We calculated how many places England would be shifted in a 'league table' ranking countries by their mean scores if the means for reading, maths and science were adjusted downwards by the estimated bias of 6 to 8 points.19 The answer is about one place in each case. Viewed in this way, the effect of the bias in the mean could be classified as small.

The pupil response rate in England in 2003 of 77 percent was not far short of the minimum threshold deemed satisfactory of 80 percent. The extent of the bias in England when the threshold was nearly attained raises the issue of the extent of bias in other countries with rates not far above the threshold.
Australia, Austria, Canada, Ireland, Poland and the US all had pupil response rates of between 82 and 84 percent (OECD 2004: 327).

Finally, it is worth emphasising that there was a third option available in the case of the UK besides the two of including or excluding the results as they stood. The OECD could have allowed the use of 'post-stratification' weights to adjust the data for the response bias, a common practice in many types of survey. (Our weights based on the logistic regression modelling described above are one example of such a weighting system.) The policy of restricting the calculation of weights to the use of information employed in forming survey strata rules out this option.

19 This calculation ignores possible response biases in other countries' mean scores.

4 Bias analysis – 2000 data

The structure of this chapter follows that of Chapter 3. We do not repeat all the explanations of each part of the analysis; we focus on describing the results. Chapter 2 described how the national data set used for 2000 is not directly comparable to that for 2003, since it includes pupils with special educational needs in normal schools and excludes pupils with no GCSE entries; the former are excluded from the 2003 data and the latter are included. The 2000 data also relate to persons born in a 12 month period, rather than 14 months as in 2003. The results presented in this chapter are therefore not directly comparable to those for 2003 in Chapter 3.

4.1 Analysis at five different stages of sampling and response

Table 4.1 gives mean values for each of stages (i) to (v), from the population through to the final PISA sample. Detailed results for each stage are presented in Appendix 2. Figures 4.1 to 4.3 graph the changes for KS3_aps, KS4_mean and KS4_pts. Tables 4.2 and 4.3 show the changes in the distributions of scores grouped into five intervals.
As for 2003, differences between groups (ii) and (iii) do not give a clean estimate of the effect of school response, on account of our definition of group (ii) as including pupils in all sampled schools irrespective of whether they were approached or not.

The pattern as one moves from (i) to (v) is fairly similar to that for 2003. The means of the continuous variables tend to increase, although note that they all fall between (i) and (ii). Mean ks3_aps in (v) is 2.4% higher than in (i) and mean ks4_pts is 8.6% higher. This compares with figures of 1.7% and 6.8% respectively for 2003. In terms of units of the group (i) standard deviations, mean ks3_aps rises by 0.12 units and mean ks4_pts by 0.18 units. The percentage of students at or above KS3 level 5 or with 5+ GCSEs A*-C is again notably higher in (v). Indeed the differences are somewhat larger than in 2003: for ks4_5ac, for example, the difference is 7.0 percentage points (59 percent in group (v) against 52 percent in group (i)), compared to 5.2 points in 2003.

Finally, in contrast to 2003, there are some notable changes between stages (i) and (iv), rather than the changes being concentrated between stages (iv) and (v) as a result of pupil response. The graphs show the mean achievement scores rising as much through school response and possibly replacement (stages (ii) to (iii)) as through pupil response. As in 2003, the SDs of the three achievement variables in Figures 4.1 to 4.3 fall at each successive stage, the fall coming especially at the final stage of pupil response. As a result of both an upward bias in the mean and a downward bias in the SD, pupils below the bottom decile of KS4_mean in the population are seriously under-represented in the final sample, as in 2003 (Table 4.3).

Following our analysis of the 2003 data, we focus in the rest of this chapter on response, by schools and pupils, and do not investigate further the effect of sampling.
We merely note that some clear changes can be seen in Figures 4.1 to 4.3 between stages (i) and (ii) and, especially, between stages (iii) and (iv) as the result of student sampling.

Table 4.1: Mean values of achievement variables at five stages, 2000

               i All pupils,   ii All pupils,   iii All pupils,   iv Sampled pupils,   v Responding
               English         sampled          responding        responding           pupils
               schools         schools          schools           schools

  Mean*100
  Male         50.347          50.156           49.512            49.093               49.903
  ELevel5      68.123          68.104           70.371            71.662               73.729
  MLevel5      63.995          63.455           66.155            66.725               69.075
  SLevel5      59.151          58.702           61.910            63.020               65.187
  ELevel5fg    69.346          69.305           71.657            72.887               74.854
  MLevel5fg    65.593          65.027           67.619            68.441               70.731
  SLevel5fg    60.609          60.055           63.391            64.514               66.550
  ks4_5ac      52.100          52.374           54.595            56.442               59.186

  Mean
  ks3Efg       5.463           5.452            5.529             5.558                5.604
  ks3Mfg       5.554           5.524            5.611             5.641                5.705
  ks3Sfg       5.297           5.276            5.359             5.387                5.436
  ks3_aps      32.960          32.797           33.294            33.458               33.766
  ks4_pts      41.096          41.156           42.424            43.239               44.620
  ks4_ptc      36.017          36.018           36.950            37.626               38.790
  ks4_mean     4.417           4.411            4.511             4.561                4.686
  ks4_ac       4.884           4.892            5.073             5.219                5.452

Note: data behind the results in columns (ii) and (iii) are weighted by the ONS school weight (sch_bwt) and those for columns (iv) and (v) by the ONS student weight (stu_wt).
Figure 4.1: Mean and standard deviation of average KS3 score (ks3_aps), 2000
[Figure: mean (left axis) and standard deviation (right axis) of ks3_aps for pupil groups (i) to (v).]

Figure 4.2: Mean and standard deviation of GCSE point score per subject (ks4_mean), 2000
[Figure: mean and standard deviation of ks4_mean for pupil groups (i) to (v).]

Figure 4.3: Mean and standard deviation of total GCSE point score (ks4_pts), 2000
[Figure: mean and standard deviation of ks4_pts for pupil groups (i) to (v).]

Table 4.2: Percent of pupils in Key Stage 3 (ks3_aps) percentile ranges of the population, 2000

  Percentile range   Point score (2000)   (i)     (ii)    (iii)   (iv)    (v)
  <=10               <=25                 15.12   15.43   13.56   12.23   10.36
  >10-25             >25-29               18.19   18.31   17.31   18.01   17.52
  >25-75             >29-37               42.87   43.69   44.20   44.69   46.15
  >75-90             >37-41               15.19   14.68   15.78   15.99   16.77
  >90                >41                  8.63    7.89    9.15    9.07    9.20

Note: groups (i) to (v) are as defined in Table 4.1. Results for (ii) and (iii) are based on data weighted with the ONS school weight (sch_bwt) and results for (iv) and (v) on data weighted with the ONS pupil weight (stu_wt).
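A tabulation of this kind can be sketched as follows: fix cut points at the population percentiles and count the share of each group's scores falling in each band. The cut points are those quoted for ks3_aps in 2000; the pupil data below are illustrative, not the PISA microdata.

```python
from bisect import bisect_left

# Sketch of the percentile-band tabulation behind Table 4.2.
# Cut points at the population P10/P25/P75/P90; data are illustrative.

CUTS = [25.0, 29.0, 37.0, 41.0]   # the 2000 ks3_aps cut points
BANDS = ["<=10", ">10-25", ">25-75", ">75-90", ">90"]

def band_shares(scores, cuts=CUTS):
    counts = [0] * (len(cuts) + 1)
    for s in scores:
        counts[bisect_left(cuts, s)] += 1   # a score equal to a cut stays in the lower band
    return [round(100.0 * c / len(scores), 2) for c in counts]
```

A group drawn evenly from the five bands returns 20 percent in each; upward response bias shows up as mass shifting out of the bottom band, as between columns (i) and (v) of Table 4.2.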
Table 4.3: Percent of pupils in GCSE points per subject (ks4_mean) percentile ranges of the population, 2000

  Percentile range   Point score (2000)   (i)     (ii)    (iii)   (iv)    (v)
  <=10               <=2.11               10.00   10.11   8.91    7.81    5.18
  >10-25             >2.11-3.24           14.98   14.86   14.09   13.79   13.29
  >25-75             >3.24-5.67           50.26   50.53   51.13   51.71   53.88
  >75-90             >5.67-6.62           14.72   14.57   14.88   15.32   16.26
  >90                >6.62                10.04   9.92    10.99   11.39   11.39

Note: groups (i) to (v) are as defined in Table 4.1. Results for (ii) and (iii) are based on data weighted with the ONS school weight (sch_bwt) and results for (iv) and (v) on data weighted with the ONS pupil weight (stu_wt).

Figure 4.4: Kernel density plot of ks4_mean by different groups of 15 year olds in 2000
[Figure: kernel density estimates of ks4_mean for all 15 year olds, pupils in all sampled PISA schools, pupils in sampled schools, sampled pupils, and responding pupils. Unweighted data.]

4.2 Response bias at school and pupil levels

We follow the same pattern of analysis as in Chapter 3. The comments made there on the estimation of standard errors in the presence of clustering and stratification apply again here.

4.2.1 Comparison of responding and non-responding schools

The unit of analysis is now the school, and for each school we compute school-level values of all the variables by taking their averages across the school's 15 year old students. A total of 543 schools were sampled for PISA: 181 'initial' schools that were all approached, 181 first replacement schools that were approached if an initial school did not respond, and 181 second replacement schools that were approached if both an initial school and its first replacement refused to respond. Of the 543 schools, a total of 306 were approached at some stage: all 181 initial schools, 79 of the first replacement schools and 46 of the second replacements. Table 4.4 shows how they responded.
Table 4.4: School response (numbers of schools), 2000

  Participation status    Initial   First         Second        Total
                          school    replacement   replacement
  Did not participate     71        46            34            151
  Full participation      102       32            11            145
  Partial participation   8         1             1             10
  Total                   181       79            46            306

Note: schools that participated but achieved a within-school response rate below 25 percent are treated as non-responding schools in the analysis.

As in the 2003 analysis, we investigate response bias at the school level in three different ways:

a) Comparison of responding with non-responding schools in the initial sample (without taking first and second replacements into account). We compare the school means of characteristics of pupils in 110 responding schools with the school means in 71 non-responding schools.

b) Comparison of all schools that responded with all schools that did not respond, irrespective of stage. We compare the school means of characteristics of pupils in 155 responding schools with the school means in 151 non-responding schools.

c) Comparison of all schools that responded with second replacement schools that did not respond. We compare 155 responding schools with 34 non-responding schools.

For the 2000 data we cannot describe the characteristics of pupils in schools with pupil response rates below 25 percent, since there is no separate identifier in the data set for such schools (we assume these schools are coded as non-respondents).

a) All schools in the initial school sample

Table 4.5: School type by response (number of schools): approach (a), 2000

  School type             Did not respond   Responded   Total
  LEA maintained school   64                102         166
  Independent school      7                 7           14
  Total                   71                109         180

Note: for one responding school no information on school type is available.
Table 4.6: Differences in means: approach (a), 2000

             Responding   Non-responding   Difference   p      Observations
             school       school
             Mean*100     Mean*100
  Male       49.79        44.34            -5.45        0.31   180
  FSM        14.76        17.30            2.54         0.45   181
  ELevel5fg  69.97        67.82            -2.15        0.56   176
  MLevel5fg  66.23        62.58            -3.65        0.33   177
  SLevel5fg  59.34        58.55            -0.79        0.86   178
  KS4_5ac    54.51        57.10            2.59         0.63   180

             Mean         Mean
  KS3_aps    32.82        32.62            -0.20        0.79   178
  KS4_mean   4.54         4.66             0.12         0.64   180

Note: weighted by the ONS school weight, sch_bwt. p gives the significance level at which we can just reject the null hypothesis that the values are the same for responding and non-responding schools (adjusted Wald test).

As in 2003, we cannot reject the hypothesis that the mean values are the same in the two groups. And unlike in 2003 there are no significant differences at the 5% level in SDs between responding and non-responding schools for the achievement variables, although in several cases we can reject the null at the 10% level.

Table 4.7: Differences in variances: approach (a), 2000

             Responding   Non-responding   Test of     Responding & non-
             school SD1   school SD2       SD1=SD2 p   responding schools SD
  Male       19.41        22.42            0.19        20.65
  FSM        14.26        19.09            0.01        16.32
  ELevel5fg  16.08        17.16            0.55        16.50
  MLevel5fg  16.00        17.51            0.41        16.64
  SLevel5fg  17.34        21.18            0.07        18.95
  KS4_5ac    19.44        23.41            0.09        21.03

  KS3_aps    2.93         3.55             0.08        3.19
  KS4_mean   0.89         1.05             0.13        0.95

Note: results are for unweighted data.

Figure 4.5: Distributions of KS3 and KS4 scores: approach (a), 2000
[Figure: kernel density estimates of the KS3 point score (ks3_aps) and mean GCSE point score (ks4_mean) for responding and non-responding schools. Unweighted data.]
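Kernel density comparisons of this kind rest on a standard Gaussian kernel density estimator. A minimal sketch, with Silverman's rule-of-thumb bandwidth and illustrative data rather than the PISA school means:

```python
import math

# Gaussian kernel density estimate of the kind plotted in Figure 4.5.
# Silverman's rule-of-thumb bandwidth; the data below are illustrative.

def kde(data, grid, bandwidth=None):
    n = len(data)
    if bandwidth is None:
        mean = sum(data) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
        bandwidth = 1.06 * sd * n ** -0.2   # Silverman's rule of thumb
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2)
                       for x in data) for g in grid]

grid = [i / 10.0 for i in range(0, 101)]          # 0.0 to 10.0
density = kde([4.0, 4.4, 4.8, 5.2, 5.6], grid)    # e.g. ks4_mean values
```

Evaluating one density per group on a common grid and overlaying the curves reproduces plots like Figures 4.5 and 4.6.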
b) All responding and non-responding schools (initial, first and second replacement)

Table 4.8: School type by response (number of schools): approach (b), 2000

  School type             Not responding   Responding   Total
  LEA maintained school   135              141          276
  Independent school      15               12           27
  Total                   150              153          303

Note: for one non-responding and two responding schools no information is available on school type.

Table 4.9: Differences in means: approach (b), 2000

             Responding   Non-responding   Difference   p      Observations
             school       school
             Mean*100     Mean*100
  Male       48.29        41.96            -6.33        0.12   303
  FSM        13.45        16.44            2.99         0.19   306
  ELevel5fg  70.67        70.13            -0.54        0.86   295
  MLevel5fg  67.58        64.68            -2.90        0.37   296
  SLevel5fg  61.98        60.46            -1.52        0.68   297
  KS4_5ac    57.93        56.83            -1.10        0.78   303

             Mean         Mean
  KS3_aps    33.23        32.86            -0.37        0.58   297
  KS4_mean   4.70         4.62             -0.08        0.71   303

Note: weighted by the ONS school weight, sch_bwt.

Table 4.10: Differences in variances: approach (b), 2000

             Responding   Non-responding   Test of     Responding & non-
             school SD1   school SD2       SD1=SD2 p   responding schools SD
  Male       19.55        22.25            0.11        20.96
  FSM        14.14        17.05            0.02        15.71
  ELevel5fg  16.57        16.59            0.98        16.60
  MLevel5fg  16.60        17.77            0.41        17.28
  SLevel5fg  18.11        21.18            0.06        19.77
  KS4_5ac    20.00        22.51            0.15        21.27

  KS3_aps    3.27         3.37             0.70        3.33
  KS4_mean   0.90         1.00             0.22        0.95

Note: results are for unweighted data.

The results show little change from approach (a), as regards both means and variances. There are no significant differences in means or standard deviations at the 5% level for the achievement variables.

c) All responding schools and non-responding second replacement schools

Table 4.11: School type by response (number of schools): approach (c), 2000

  School type             Non-responding   Responding   Total
  LEA maintained school   31               141          172
  Independent school      3                12           15
  Total                   34               153          187

Note: for two responding schools school type information is missing.
Table 4.12: Differences in means: approach (c), 2000

             Responding   Non-responding   Difference   p      Observations
             school       school
             Mean*100     Mean*100
  Male       48.29        42.67            -5.62        0.40   187
  FSM        13.46        17.06            3.60         0.29   189
  ELevel5fg  70.67        69.77            -0.90        0.86   181
  MLevel5fg  67.58        64.19            -3.39        0.57   182
  SLevel5fg  61.98        59.19            -2.79        0.69   182
  KS4_5ac    57.94        53.57            -4.37        0.52   187

             Mean         Mean
  KS3_aps    33.23        32.90            -0.33        0.80   182
  KS4_mean   4.70         4.51             -0.19        0.55   187

Note: weighted by the ONS school weight, sch_bwt. p gives the significance level at which we can just reject the null hypothesis that the values are the same for responding and non-responding schools (adjusted Wald test).

Table 4.13: Differences in variances: approach (c), 2000

             Responding   Non-responding   Test of     Responding & non-
             school SD1   school SD2       SD1=SD2 p   responding schools SD
  Male       19.55        22.03            0.38        19.98
  FSM        14.14        13.92            0.96        14.18
  ELevel5fg  16.57        14.78            0.41        16.32
  MLevel5fg  16.60        16.89            0.90        16.77
  SLevel5fg  18.11        21.64            0.20        18.91
  KS4_5ac    20.00        21.21            0.67        20.26

  KS3_aps    3.27         3.24             0.95        3.28
  KS4_mean   0.90         1.01             0.43        0.93

Note: results are for unweighted data.

With approach (c), we cannot reject the null hypothesis of no difference in mean values at even the 10% level. The same applies to tests of the standard deviations. In contrast to 2003, the differences in means between respondents and non-respondents under approach (c) seem not too dissimilar to those under approach (a).

Finally, as in 2003, we use the pupil as the unit of analysis and test for differences in means between pupils in responding and non-responding schools, using approach (c) (Table 4.14). Here we do find significant differences: pupils in responding schools have higher mean achievement. For example, the percentage of pupils with 5 or more GCSEs at A*-C is five percentage points higher in responding schools, and mean KS3_aps for these pupils is five percent higher.
In general, the differences are more marked than in 2003 and, also in contrast to 2003, they are always in one direction: pupils in responding schools have higher mean achievement.

Table 4.14: Differences in means: approach (c), pupil as unit of analysis, 2000

             Responding   Non-responding   Difference   p      Observations
             school       school
             Mean*100     Mean*100
  Male       49.51        47.48            -2.03        0.01   33,008
  ELevel5fg  71.66        64.67            -6.99        0.00   31,253
  MLevel5fg  67.62        59.17            -8.45        0.00   31,253
  SLevel5fg  63.39        53.39            -10.00       0.00   31,217
  KS4_5ac    54.60        49.58            -5.02        0.00   33,008

             Mean         Mean
  KS3_aps    33.29        31.68            -1.61        0.00   31,051
  KS4_mean   4.51         4.32             -0.19        0.00   33,008
  KS4_pts    42.42        39.94            -2.48        0.00   33,008

Note: the ONS school weights are applied to the data. Adjusted Wald test. The clustering of students within schools is taken into account.

4.2.2 Comparison of responding and non-responding pupils

The analysis of response bias at the student level includes 5,164 sampled PISA pupils in 153 schools, of which 4,120 responded (79.8%) and 1,044 did not (20.2%). For all achievement variables there are significant differences at the 1% level or lower in both means and standard deviations. Respondents display a higher mean score and a lower dispersion of scores. These differences both contribute to the large differences between respondents and non-respondents in the percentages at or above KS3 level 5 or with 5+ GCSEs A*-C, which are about 11 and 14 percentage points respectively (Table 4.15). The higher mean and lower dispersion of respondents are particularly clear in Figure 4.6, which shows the distributions of KS3_aps and KS4_mean scores.

Table 4.15: Differences in means and percentages, 2000

             Responding pupils    Non-responding pupils   Difference   p      Observations
             Obs.    Mean*100     Obs.    Mean*100
  Male       3976    49.90        936     45.65           -4.25        0.02   4912
  ELevel5fg  3681    74.85        878     64.64           -10.21       0.00   4559
  MLevel5fg  3686    70.73        875     58.80           -11.93       0.00   4561
  SLevel5fg  3682    66.55        878     55.98           -10.57       0.00   4560
  KS4_5ac    3976    59.19        936     44.78           -14.41       0.00   4912

                     Mean                 Mean
  KS3_aps    3660    33.77        866     32.16           -1.61        0.00   4526
  KS4_mean   3976    4.69         936     4.03            -0.66        0.00   4912

Note: the ONS student weights are applied and standard errors allow for clustering by PSU (PISA school). The χ2-test is used for the binary variables (no weights or PSU adjustment) and the adjusted Wald test for the continuous variables.

Table 4.16: Differences in variances, 2000

             Responding pupils   Non-responding pupils   Responding & non-      Test of
             Mean     SD1        Mean     SD2            responding pupils      SD1=SD2 p
                                                         Mean     SD
  KS3_aps    33.77    6.06       32.17    6.78           33.46    6.24          0.00
  KS4_mean   4.68     1.56       4.03     1.97           4.56     1.66          0.00

Note: results are unweighted. Clustering of pupils within schools is not taken into account in the statistical test.

Figure 4.6: Distributions of KS3 average points and mean GCSE point score, 2000
[Figure: kernel density estimates of the KS3 average point score (ks3_aps) and mean GCSE point score (ks4_mean) for responding and non-responding pupils. Unweighted data.]

4.3 Using the OECD weights and our own response weights

We compare use of the OECD weights and of weights we calculate from the results of logistic regression modelling of the probability of pupil response. The OECD weights take into account the level and pattern (i.e. association with achievement) of school response and the level of pupil response. Our weights in addition take into account the variation of pupil response with achievement level.

4.3.1 Calculation of our non-response weights

a) Modelling the response probability

Table 4.17 shows the results of logistic regression models, for all sampled pupils, of the probability of pupil response. Models 1a and 1b specify response as a function of gender, achievement score (a quadratic in KS4_pts), region and school size.
Model 1a differs from Model 1b only in the estimation of the standard errors: standard errors are adjusted to take into account the clustering of pupils within schools in Model 1a, while unadjusted standard errors are given in Model 1b. Model 2 omits the regional variable but includes a set of dummy variables for individual schools (results not reported) and a dummy variable for private schools (this variable was not significant in Models 1a and 1b). Model 2 is based on a smaller sample size since 69 pupils in schools where all pupils responded and 5 pupils in schools where no pupil responded were omitted.20 In contrast to the 2003 data, no information on free school meals, language spoken at home or ethnicity is available for use in the modelling. The coefficients of the school dummy variables in Model 2 are significant at the 5 percent level for only 21 out of 155 schools, which is substantially fewer than in the analogous model for the 2003 data. The quadratic specification for KS4_pts works well but the turning point comes at a lower score level than in 2003. The predicted probability of response varies somewhat less with KS4_pts than in 2003 – compare Figure 4.7 with Figure 3.7. In contrast to 2003, school size and gender are statistically significant.

20 Pupils in schools in which the pupil response rate was less than 25% are in principle omitted, since these schools are treated in PISA as not responding to the survey. However, some of these individuals have remained in the sample we use.
Table 4.17: Logistic regression models of pupil response, 2000

                        (1a)                (1b)                (2)
Male                     0.278 (0.089)***    0.278 (0.076)***    0.267 (0.086)***
Ks4pts                   0.087 (0.008)***    0.087 (0.007)***    0.096 (0.008)***
ks4_pts squared/100     -0.082 (0.009)***   -0.082 (0.008)***   -0.089 (0.009)***
Missings                -1.303 (0.227)***   -1.303 (0.142)***   -2.119 (0.192)***
West Midlands            0.484 (0.157)***    0.484 (0.124)***
School size/100         -1.059 (0.713)      -1.059 (0.415)**
Private school                                                  -1.423 (0.552)***
Constant                -0.439 (0.213)**    -0.439 (0.154)***   -0.480 (0.436)
Observations             5164                5164                5090
Pseudo R-squared                             0.06                0.14
log-lklhd                                   -2439.06            -2214.27
F statistic              32.71

Note: the variable school size gives the number of 15 year olds in schools (this number is estimated with the population data and is not equal to the variable ‘mos’ that gives the number of 15 year olds at the time of the construction of the sampling frame). Model 1a takes clustering of pupils in schools into account for the estimation of standard errors. Model 2 also includes 155 school dummies.

Figure 4.7: Predicted probability of pupil response by GCSE point score (Ks4pts), 2000
[Predicted probability plotted against Ks4pts over the range 0 to 80.]
Note: predicted probabilities based on Model 1. Other variables set to their mean values. The graph displays predicted probabilities for values of the GCSE points scores between the 5th and 95th percentiles of the sample values.

b) Weighting for non-response

We construct a student non-response weight by calculating the inverse of the predicted probabilities for responding pupils. Summary statistics are given in Table 4.18. Weight1 is the non-response weight based on Model 1 and weight2 is based on Model 2. The weights based on Model 2 display much more variation due to the inclusion in that model of the set of school dummies.
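The mapping from the Model 1 coefficients in Table 4.17 to a predicted response probability, and from there to the non-response weight weight1, can be sketched as follows (this is not the authors' code, and the example pupil in the test is hypothetical):

```python
# Sketch (not the authors' code) of how the Model 1 coefficients in Table 4.17
# map to a predicted response probability and to weight1, its inverse.
import math

# Coefficients copied from column (1a) of Table 4.17.
COEF = {
    "const": -0.439, "male": 0.278, "ks4pts": 0.087,
    "ks4pts_sq_100": -0.082, "missings": -1.303,
    "west_midlands": 0.484, "school_size_100": -1.059,
}

def response_prob(male, ks4pts, missings, west_midlands, school_size):
    """Predicted probability of pupil response under the logit model."""
    xb = (COEF["const"]
          + COEF["male"] * male
          + COEF["ks4pts"] * ks4pts
          + COEF["ks4pts_sq_100"] * ks4pts ** 2 / 100
          + COEF["missings"] * missings
          + COEF["west_midlands"] * west_midlands
          + COEF["school_size_100"] * school_size / 100)
    return 1.0 / (1.0 + math.exp(-xb))

def nonresponse_weight(*pupil):
    """weight1: inverse of the predicted response probability (respondents only)."""
    return 1.0 / response_prob(*pupil)
```

The quadratic in Ks4pts implies a turning point at 0.087 / (2 x 0.00082), roughly 53 GCSE points, so the predicted probability rises with achievement over most of the observed score range, consistent with Figure 4.7.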
In contrast to 2003, we then trim the highest values to a maximum equal to the mean multiplied by the ratio of the OECD maximum student weight to its mean. This results in maxima of 586 for weight1 and 577 for weight2 (affecting the values for 16 and 17 observations respectively).

Table 4.18: Summary statistics of non-response weights derived from logistic regression, 2000

Variable    Obs    Mean    Std. Dev.   Min     Max
weight1     4120   1.253   0.197       1.080    3.006
weight2     4051   1.257   0.379       1.012   10.448

Note: weight1 gives the inverse of the response probability derived from Model 1 (Table 4.17); weight2 refers to Model 2.

These constructed non-response weights have smaller means and lower standard deviations than in 2003. This suggests that their use will lead to smaller changes in results than in 2003.

c) Construction of a combined weight

As in 2003, we combine the weights just described with the OECD school weight (‘wnrschbw’ in 2000) and allow also for the sample design. The weight for pupil j in school i is calculated as follows, where Mi is the total number of 15 year old pupils in school i and mi is the number of sampled pupils:

R_WEIGHT1ij = wnrschbwi * Mi/mi * weight1ij
R_WEIGHT2ij = wnrschbwi * Mi/mi * weight2ij

Table 4.19 compares summary statistics for the ONS student weight, the OECD student weight and our two new weights (taking into account the trimming of the extreme values described above).21

21 As noted in Chapter 2, five schools have values of the OECD school weight of 0 despite a positive student weight (for a reason we do not understand) and this explains the minimum of 0 for our weights in Table 4.19.

Table 4.19: Summary statistics of different student weights, 2000

Variable                        Obs    Mean     Std. Dev.   Min     Max       CV
ONS student weight stu_wt       4120   .02868   .00136      .0286   .05       0.047
OECD student weight w_fstuwt    4120   135.98   48.614      33.12   588.56    0.358
R_WEIGHT1                       4120   135.46   77.79       0       1934.40   0.574
R_WEIGHT2                       4051   133.39   68.72       0       1167.03   0.515

Note: R_WEIGHT1 and R_WEIGHT2 are the product of the OECD school weight and other weight factors as described in Section 3.3.1. The OECD school weight has the value of zero for 90 responding pupils in 5 schools. For those students R_WEIGHT1 and R_WEIGHT2 are also equal to zero.

4.3.2 Results

Table 4.20 shows the result of applying the three sets of weights: ONS student, OECD student and our own calculated weights. We also show results for unweighted data.

Table 4.20: KS3 and KS4 scores for responding pupils, different student weights, 2000

             Mean                              Standard deviation               Observations
             ONS     OECD    R_W1    R_W2      ONS     OECD    R_W1    R_W2    ONS & OECD   R_W1    R_W2
KS3_aps      33.77   33.60   33.28   33.35     6.06    6.15    6.32    6.34    3,660        3,573   3,510
KS4_mean      4.69    4.67    4.58    4.59     1.56    1.58    1.68    1.69    3,976        3,887   3,820
KS4_pts      44.62   44.40   43.29   43.45     17.40   17.59   18.68   18.71   3,976        3,887   3,820

Note: ONS = ONS student weight stu_wt; OECD = OECD student weight w_fstuwt; R_W1 = R_WEIGHT1; R_W2 = R_WEIGHT2. The first observations column refers to calculations with the ONS and OECD weights, the second to R_WEIGHT1 and the third to R_WEIGHT2.

Mean achievement scores are somewhat lower with the OECD student weights than with the ONS weights. The application of our own weights pushes the means down further, close to the means for all sampled pupils in responding schools in 2000 shown in Table 4.1, the values of which we reproduce below.22 The application of our weights also pushes up the standard deviations to about the level for all sampled pupils shown in Figures 4.1 to 4.3.

22 This comparison glosses over the fact that our weights include an adjustment for school non-response via their incorporation of the OECD school weights.
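The combined-weight formula and the trimming rule described above can be sketched as follows; every number in the example is made up for illustration:

```python
# Illustrative sketch of the combined weight
# R_WEIGHT1ij = wnrschbw_i * (M_i / m_i) * weight1_ij
# and of the trimming rule (cap at mean * OECD max / OECD mean).
# All numbers below are invented for illustration.

def combined_weight(wnrschbw, M, m, weight1):
    """OECD school weight x within-school inflation factor x pupil NR weight."""
    return wnrschbw * (M / m) * weight1

def trim(weights, oecd_max, oecd_mean):
    """Cap extreme weights at mean(weights) * (OECD max student weight / its mean)."""
    cap = (sum(weights) / len(weights)) * (oecd_max / oecd_mean)
    return [min(w, cap) for w in weights]

# A hypothetical pupil: school weight 30, 180 eligible pupils, 35 sampled,
# pupil non-response weight 1.25.
w = combined_weight(30.0, 180, 35, 1.25)
```

The cap depends on the mean of the weights being trimmed, so only a small number of extreme values are affected, as with the 16 and 17 observations reported above.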
             KS3_aps   KS4_mean   KS4_pts
             33.46     4.56       43.24

4.4 Response bias in PISA scores

We estimate the impact of response bias on PISA scores in 2000 using the three approaches outlined in Chapter 3, Section 3.4: (a) z-scores, (b) regression adjustment and (c) response weights. The mean PISA scores in reading, maths and science for the final sample of responding pupils in responding schools in England in 2000 are given in Table 4.21. These are results obtained by applying the ONS student weights, and not the OECD student weights (results obtained with the latter are very similar – see Table 4.24 below).

Table 4.21: PISA means and standard deviations (ONS weights), 2000

             PISA reading        PISA maths          PISA science
             Mean      SD        Mean      SD        Mean      SD
             524.13    99.26     529.27    90.59     534.27    96.57

Note: mean scores are calculated with the average of the five PVs. The standard deviations are the average of the standard deviations calculated separately for each PV. The calculations are based on the final PISA sample of respondents but are weighted with the ONS student weights.

As for 2003, we now try to assess how these scores might have changed if there had not been any response bias to the survey at the pupil level.

4.4.1 Z-score method

The procedure for the adjustment is outlined in Chapter 3, Section 3.4: we take the differences in KS3 and KS4 means between responding and sampled students, standardise these by the KS3 and KS4 standard deviations, and then multiply by the PISA standard deviations in Table 4.21.

Table 4.22: Calculation of bias in the PISA means: z-score method, 2000

             1. Mean      2. Mean     3. SD     4.           5. Multiplied   6. Multiplied   7. Multiplied
             responding   sampled     sampled   (Mr-Ms)/SD   with SD         with SD         with SD
             pupils=Mr    pupils=Ms   pupils                 PISA read       PISA maths      PISA science
KS3_aps      33.77        33.46       6.24      0.050        4.96            4.53            4.83
KS4_mean      4.68         4.56       1.66      0.072        7.15            6.52            6.95

Note: the KS3 and GCSE mean scores are calculated on unweighted data, but weighting with ONS weights does not make a big difference.
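The arithmetic of Table 4.22 can be reproduced directly: standardise the gap between responding and sampled pupils by the sampled pupils' SD, then rescale by the PISA SDs of Table 4.21. A minimal sketch (small discrepancies with the printed table reflect its rounding of the z-score to 0.050):

```python
# The z-score arithmetic of Table 4.22: standardise the respondents-vs-sampled
# gap by the sampled pupils' SD, then rescale by the PISA SDs of Table 4.21.
# (Small discrepancies with the printed table reflect its rounding of the
# z-score to 0.050.)

def zscore_bias(mean_resp, mean_sampled, sd_sampled, sd_pisa):
    """Estimated upward bias in a PISA mean from pupil non-response."""
    return (mean_resp - mean_sampled) / sd_sampled * sd_pisa

PISA_SD = {"reading": 99.26, "maths": 90.59, "science": 96.57}

# KS3_aps: responding mean 33.77, sampled mean 33.46, sampled SD 6.24.
bias_ks3 = {subj: zscore_bias(33.77, 33.46, 6.24, sd)
            for subj, sd in PISA_SD.items()}
```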
The difference in mean KS3 points between responding pupils and sampled pupils translates into a difference in the mean PISA scores of about 5 points. As in 2003, the difference in mean KS4 points translates into rather larger differences in PISA scores, about 7 points. Taking the adjustment based on the KS3 results, mean England PISA scores in 2000 allowing for student non-response can be estimated to be 519 instead of 524 for reading, 525 instead of 529 for maths, and 529 instead of 534 for science.

4.4.2 Regression method

We regress PISA scores for 2000 respondents on their KS3 (or KS4) scores. We use the coefficient on the KS3 (or KS4) score as the PISA equivalent of a KS3 (or KS4) point, and then apply this to the difference in the KS3 (or KS4) means for respondents and all sampled pupils. The regressions are reported in Chapter 5. As in 2003, PISA scores and KS3 and KS4 scores are strongly correlated. For example, PISA reading has a correlation of 0.83 with KS3_aps and 0.81 with KS4_mean. Regressions of PISA reading, maths and science on KS3_aps yield slope coefficients of 12.64, 11.31 and 12.07 respectively (using weighted data). Applying these values to the differences in means in KS3_aps of respondents and sampled pupils in Table 4.22 yields estimates of bias in mean PISA points of 3.9, 3.5 and 3.7 respectively for reading, maths and science. Differences estimated by using GCSE scores are greater – see Table 4.23. As in 2003, these figures are smaller than the figures obtained with the z-score method.

Table 4.23: Calculation of bias in PISA means: regression method, 2000

             Reading   Maths   Science
Ks3_aps      3.92      3.51    3.74
KS4_mean     5.97      5.38    5.56

4.4.3 Use of response weights

The third method to estimate the extent of bias is to calculate mean scores with and without use of the response weights applied in Section 4.3 – the OECD student weights and our own weights based on the logistic regression modelling of student response.
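The regression-method figures for the Ks3_aps row of Table 4.23 follow directly from the slopes quoted above, as this short sketch shows:

```python
# The regression-method arithmetic behind the Ks3_aps row of Table 4.23:
# slope of PISA score on KS3_aps (weighted data) times the gap in mean
# KS3_aps between respondents and all sampled pupils.

SLOPES_KS3 = {"reading": 12.64, "maths": 11.31, "science": 12.07}
GAP_KS3 = 33.77 - 33.46  # respondents minus sampled, from Table 4.22

bias = {subj: slope * GAP_KS3 for subj, slope in SLOPES_KS3.items()}
```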
Application of the OECD weights reduces the means by between 0.25 and 0.75 of a point and increases the standard deviations by about 1 point. Application of our weights has a substantially larger impact in both cases. Table 4.25 estimates response bias in mean scores by comparing the results using the ONS weights with the results using our weight R_WEIGHT1. The upward bias in PISA scores in the sample of respondents is estimated at about 6 points in the mean. The downward bias in the standard deviation is estimated at about 4 points.

Table 4.24: PISA scores for responding pupils, different student weights, 2000

                     PISA maths   PISA reading   PISA science   Obs (maths/reading/science)
Mean
ONS STU_WT           529.27       524.13         534.27         2292/4120/2284
OECD w_fstuwt        529.06       523.41         533.46         2292/4120/2284
R_WEIGHT1            523.56       518.02         528.32         2240/4030/2234
R_WEIGHT2            524.30       518.22         527.99         2203/3961/2197

Standard deviation
ONS STU_WT           90.59        99.26          96.57          2292/4120/2284
OECD w_fstuwt        91.71        100.34         97.50          2292/4120/2284
R_WEIGHT1            94.87        103.91         100.76         2240/4030/2234
R_WEIGHT2            94.57        103.62         100.67         2203/3961/2197

Note: the standard deviation is calculated as the mean of the five standard deviations of each plausible value. The mean value refers to the pupils’ mean across all five plausible values. The three numbers in the last column give the number of observations used for maths, reading and science respectively. In PISA 2000, reading scores are available for all pupils but maths and science scores for only half of them. This leads to different OECD weights depending on whether reading, maths or science scores are calculated. These different weights are applied in our analyses.

Table 4.25: Calculation of bias in PISA means and standard deviations: response weights method, 2000

                      Reading   Maths   Science
Mean                  +6.1      +5.7    +6.0
Standard deviation    -4.7      -4.3    -4.2

Note: the table shows differences between the means and standard deviations in Table 4.24 calculated with the ONS student weight and with R_WEIGHT1.
Table 4.26 summarises our results, showing the estimated upward bias in mean PISA scores resulting from pupil non-response. In the case of the response weight method, the estimate also takes into account the impact of non-response at the school level (since our weight R_WEIGHT1 includes the OECD school weight). An estimate of about 5 or 6 points seems appropriate (based on the regression method with the KS4 results, or on the response weights).

Table 4.26: Summary of estimates of bias in PISA means: different methods, 2000

                               Reading   Maths   Science
z-score method: Ks3_aps        5.0       4.6     4.8
z-score method: Ks4_mean       7.3       6.6     7.1
regression method: Ks3_aps     3.9       3.5     3.7
regression method: Ks4_mean    6.0       5.4     5.6
response weight: R_WEIGHT1     6.1       5.7     6.0
response weight: R_WEIGHT2     5.9       5.0     6.3

Note: estimates taken from Tables 4.22, 4.23 and 4.25.

4.4.4 Implications

How does the estimated bias in the PISA means compare with the estimates of their standard errors? The published standard errors of mean PISA scores in England in 2000 were 3.0 for reading, 2.9 for maths and 3.2 for science; and standard errors for the standard deviation were 1.7 for reading, 1.8 for maths and 2.3 for science (Gill et al, 2002: Tables 3.1, 4.1, 5.1, A3.1). Our estimates of the biases are therefore about double these figures, as in 2003. But, again as in 2003, the impact of the bias on England’s position in a ranking of OECD countries by mean scores is small. Mean scores that were lower by 6 points would have moved England by just one place in a ranking for reading and three places for science, and would have left England’s place in the ranking unchanged for maths. Our guess is that, had the OECD been faced with the evidence on the relative sizes of the bias and the standard errors at the time, the data for England for 2000 would have been judged in the same way as the data for 2003: unreliable for estimates at the national level. However, it is not clear that there would have been any need for DfES to have uncovered and then presented that evidence.
The pupil response rate for England in 2000 of 81 percent (see Table 1.1) was just above the required minimum threshold of 80 percent. By this criterion, pupil response in England in 2000 was at a level that was considered satisfactory. This demonstrates the problem of relying on arbitrary threshold levels. (See also Sturgis et al 2006 on this subject.) It also raises the question of the extent of response bias in countries with pupil response rates that were little higher than the rate in England. Australia, Canada, Germany, Ireland, the Netherlands and the US all had rates of between 84 and 86 percent (OECD 2001a: 235).

5 Changes over time

This chapter compares results for 2000 and 2003. We start in Section 5.1 with an overview of changes in means and standard deviations of domestic achievement measures (KS3 and KS4) for each of the five groups identified at the end of Chapter 1. Section 5.2 then summarises the key results of the bias analyses presented in Chapters 3 and 4. Section 5.3 compares regression models for 2000 and 2003 in which PISA scores for respondents are regressed on domestic achievement scores (KS3 and KS4). Our aim is to see if there has been a change in the relationship between the two.

In comparing results for 2000 and 2003, we try to allow for differences in the definitions of the national databases for the two years. Chapter 2 described how the 2000 national data set excludes pupils with no GCSE entries and includes pupils with special educational needs. The 2003 data include pupils with no GCSE entries, and in our analysis of the 2003 data we excluded pupils with special educational needs in order to proxy permitted PISA exclusions. In order to compare results for 2000 and 2003 more directly, we therefore revised the 2003 national database to exclude pupils with no GCSE entries and to include children with special educational needs in normal schools.
(Pupils in special schools continued to be excluded from both the 2000 and 2003 datasets.) The results for 2003 in this chapter are therefore slightly different from those in Chapter 3. It is important to note that the 2000 and 2003 national databases differ even after this adjustment. For example, the 2003 database is still based on a 14 month birth window (although this seems not to be a big issue) and there may be other differences that we are unaware of.

5.1 Summary of results for different stages of sampling and response

Figure 5.1 shows the means and standard deviations for the three domestic achievement variables on which we have focused, KS3_aps, KS4_mean and KS4_pts, for each of the five groups identified in Chapter 1: (i) the population of all 15 year old pupils, (ii) all pupils in sampled schools, (iii) all pupils in responding schools, (iv) sampled pupils in responding schools, and (v) responding pupils in responding schools. The lines with the solid black symbols in each diagram show results for 2003. In the case of the two ‘quality’ measures, KS3_aps and KS4_mean, the population values for the mean scores in 2003 [group (i)] are above those for 2000, reflecting a rise in average domestic achievement between the two years. (The standard deviations are also higher in 2003, showing greater dispersion of achievement in the later year.) The means for KS4_pts in the population are the same in the two years.
Figure 5.1: Means and standard deviations of KS3 and KS4 variables by pupil group, 2003 and 2000
[Six panels: the mean and the standard deviation of KS3 score per subject (KS3_aps), GCSE points per subject (KS4_mean) and total GCSE points score (KS4_pts), each plotted for pupil groups (i) to (v) in 2003 and in 2000.]
Note: estimates are taken from Table 4.1 for 2000 and Table A4.6 in Appendix 4 for 2003.

Looking at the effect of pupil response, which is shown by the change in values between groups (iv) and (v), the graphs suggest that the impact was similar in the two years. The slope of this segment of the lines is broadly the same for each measure of achievement and for both the mean and the standard deviation. The impact of school response is incorporated in the change in values between groups (ii) and (iii), although we have noted that this change also reflects the impact of replacement given our definition of group (ii). Here there is a marked difference between 2000 and 2003. In 2000, the means rise as a result of school response, the amounts being more or less the same as for the impact of pupil response. In 2003 the means fall slightly. The standard deviations fall in both years. The impact of sampling is in principle shown by comparing groups (i) and (ii) for school sampling and (iii) and (iv) for pupil sampling.
There is no clear pattern for school sampling. But the mean appears to rise and the standard deviation fall in both years as a result of the pupil sampling. This could be a perfectly legitimate result of the random sampling process: by chance, samples were drawn in each year in which mean achievement was higher and dispersion lower. Or it could be evidence in favour of poor sampling, with schools wrongly excluding weak pupils who should have been included. Or it may reflect the PISA exclusions policy correctly applied by schools. Or it could be the result of our inability to correctly identify the PISA target population in our definition of group (iii). (We have commented for example on the issue of the width of the birth window in 2003.) Caution is needed here. Tables 5.1 and 5.2 summarise the impact of the changes in mean and dispersion on the percentage of persons in ranges of KS3 or KS4 population percentiles. In both years it is the under-representation in the final sample, group (v), of pupils at the bottom of the KS4 distribution that is most obvious, even if one restricts attention only to the impact of pupil response as shown by the change between groups (iv) and (v). 
Table 5.1: Percent of pupils in Key Stage 3 (ks3_aps) percentile range of the population, 2003 and 2000

Percentile   Point     (i) All      (ii) All pupils    (iii) All pupils    (iv) All sampled    (v) Responding
range        score     pupils in    in all sampled     in responding       pupils in           pupils
                       England      PISA schools       PISA schools        responding
                                    (weighted)         (weighted)          schools
2003
<=10         <=25      12.20        11.90              11.67               10.87               9.54
>10-25       >25-29    14.90        14.75              14.83               14.79               12.97
>25-75       >29-39    53.01        52.67              53.88               54.57               56.34
>75-90       >39-43    13.33        13.56              13.48               13.70               14.57
>90          >43        6.56         7.12               6.13                6.06                6.58
2000
<=10         <=25      15.12        15.43              13.56               12.23               10.36
>10-25       >25-29    18.19        18.31              17.31               18.01               17.52
>25-75       >29-37    42.87        43.69              44.20               44.69               46.15
>75-90       >37-41    15.19        14.68              15.78               15.99               16.77
>90          >41        8.63         7.89               9.15                9.07                9.20

Table 5.2: Percent of pupils in average GCSE (ks4_mean) percentile range of the population, 2003 and 2000

Percentile   Point            (i)      (ii)     (iii)    (iv)     (v)
range        score
2003
<=10         <=2.0            11.14    10.91    10.75     8.91     6.35
>10-25       >2.0-3.2         13.59    13.53    14.02    14.83    13.37
>25-75       >3.2-5.7         50.27    49.92    51.12    51.47    53.61
>75-90       >5.7-6.6         15.06    15.28    14.92    15.53    16.88
>90          >6.6              9.95    10.37     9.19     9.26     9.79
2000
<=10         <=2.105          10.00    10.11     8.91     7.81     5.18
>10-25       >2.105-3.238     14.98    14.86    14.09    13.79    13.29
>25-75       >3.238-5.667     50.26    50.53    51.13    51.71    53.88
>75-90       >5.667-6.619     14.72    14.57    14.88    15.32    16.26
>90          >6.619           10.04     9.92    10.99    11.39    11.39

Note: groups (i) to (v) are as defined for Table 5.1.

5.2 Response bias at school and pupil levels

Table 5.3 summarises the results of the analyses of school and pupil level response bias in the second section of each of Chapters 3 and 4. In the case of 2000, the results are exactly as in Chapter 4. In the case of 2003, the results are slightly different from those in Chapter 3 and are based on the adjusted data for that year (results are given in full in Appendix 5). Significant differences at the 5 percent level are printed in bold.
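Tables 5.1 and 5.2 classify pupils by fixed population cut-points. A sketch of that tabulation, using the 2003 KS3_aps cut-points from Table 5.1 and an invented set of pupil scores:

```python
# Sketch of the tabulation behind Tables 5.1 and 5.2: fix cut-points at the
# population's 10th/25th/75th/90th percentiles, then count the share of a
# pupil group falling in each band. The cut-points are the 2003 KS3_aps
# values from Table 5.1; the pupil scores are hypothetical.

def share_in_bands(scores, cuts):
    """Percent of scores in (-inf, c1], (c1, c2], ..., (c4, inf)."""
    counts = [0] * (len(cuts) + 1)
    for s in scores:
        band = sum(s > c for c in cuts)  # number of cut-points s exceeds
        counts[band] += 1
    return [100.0 * c / len(scores) for c in counts]

cuts_2003 = [25, 29, 39, 43]
group_v = [24, 28, 31, 35, 38, 40, 44, 33, 30, 36]  # hypothetical respondents
shares = share_in_bands(group_v, cuts_2003)
```

Under-representation at the bottom of the distribution shows up as a first-band share below the population's 10 percent, as in column (v) of both tables.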
The table focuses on the means of the continuous score variables, the binary variables measuring whether someone is above or below a threshold score, and on gender and FSM receipt. In each case it is the difference in values for respondents and non-respondents that is shown. A negative difference indicates that the ‘PISA value’ is higher. That is, a negative value means that the mean is higher for schools or pupils that responded to PISA. We do not summarise results for the variance of scores. Three sets of results are shown. In the case of response by schools, we show two sets of results: (i) those with the school as the unit of analysis, looking at response in the initial sample of schools only (approach (a) in Sections 3.2.2 and 4.2.2), and (ii) those with the pupil as the unit of analysis, comparing pupils in responding schools with pupils in second replacement schools that did not respond (approach (c) in Sections 3.2.2 and 4.2.2).

Table 5.3: Summary of differences in means due to school and pupil response, 2000 and 2003

1. Difference in school average values between schools responding in the initial sample and schools not responding

             2003                 2000
             mean*100    p        mean*100    p
Male         -5.45       0.31     -1.77       0.78
FSM           8.38       0.04      2.54       0.45
ELevel5fg    -2.15       0.56      0.30       0.95
MLevel5fg    -3.65       0.33     -0.82       0.86
SLevel5fg    -0.79       0.86      0.18       0.97
KS4_5ac       2.59       0.63      3.87       0.57
             Mean                 Mean
KS3_aps      -0.20       0.79      0.07       0.94
KS4_mean      0.12       0.64      0.00       0.99

2. Differences between pupils in responding schools (initial, first or second replacement) and pupils in schools not responding (second replacement)

             2000                 2003
             mean*100    p        mean*100    p
Male         -2.03       0.00      6.12       0.00
FSM                                4.89       0.00
ELevel5fg    -6.99       0.00     -6.18       0.00
MLevel5fg    -8.45       0.00     -2.79       0.00
SLevel5fg   -10.00       0.00     -4.70       0.00
KS4_5ac      -5.02       0.00      1.03       0.24
             Mean                 Mean
KS3_aps      -1.61       0.00     -0.36       0.01
KS4_mean     -0.19       0.00      0.11       0.00

3. Differences between pupils responding and those not responding

             2000                 2003
             mean*100    p        mean*100    p
Male         -4.25       0.02      0.17       0.94
FSM                                3.16       0.00
ELevel5fg   -10.21       0.00    -10.31       0.00
MLevel5fg   -11.93       0.00    -11.61       0.00
SLevel5fg   -10.57       0.00    -10.16       0.00
KS4_5ac     -14.41       0.00    -15.64       0.00
             Mean                 Mean
KS3_aps      -1.61       0.00     -1.64       0.00
KS4_mean     -0.66       0.00     -0.60       0.00

Source: estimates are taken from Tables 4.6, 4.14 and 4.15 for 2000 and Tables A5.1 to A5.3 of Appendix 5 for the adjusted 2003 data. Note: a negative value means that the mean is higher for schools or pupils that responded to PISA. Results in blocks 1 and 2 are based on data weighted with the ONS school weights; results in block 3 are based on data weighted with the ONS student weights.

The following similarities and differences between 2000 and 2003 can be seen:

• The mean characteristics of non-responding schools are almost always insignificantly different from those of responding schools (block 1). Only the FSM variable in 2003 shows significant differences. (Note that when we used the larger samples of all responding and non-responding schools, irrespective of initial, first or second replacement status, some significant differences did begin to emerge in both years – at least at the 10% level.)

• Switching to the pupil as the unit of analysis rather than the school leads to significant differences for all but one measure (block 2): the percentage of pupils with 5 or more GCSEs at A*-C in 2003. The differences between responding and non-responding schools for the achievement variables are typically much smaller in 2003. The sign of the difference for the KS4_mean variable is not the same in the two years.

• The pattern and size of differences between responding and non-responding pupils is very similar in the two years for all the achievement measures (block 3). There are clear significant differences between the achievement of responding and non-responding students, e.g.
a difference of at least 10 percentage points in the percentage of students above KS3 level 5 in all three subjects, and about 15 points for the percentage with 5 or more GCSEs at A*-C.

5.3 Associations between PISA scores and KS3 and KS4 scores

This section considers whether the association between the PISA scores of respondents and their scores on the domestic achievement measures has changed over time. Table 5.4 gives the correlations between PISA scores and the continuous KS3 and KS4 measures for each year (weighted data).

Table 5.4: Correlations of PISA scores and KS3 and KS4 scores, 2000 and 2003

             PISA 2000                       PISA 2003
             Maths    Reading   Science     Maths    Reading   Science
ks3Efg       0.656    0.726     0.678       0.644    0.692     0.653
ks3Mfg       0.823    0.763     0.772       0.847    0.773     0.804
ks3Sfg       0.808    0.806     0.826       0.811    0.781     0.807
ks3_aps      0.826    0.829     0.824       0.828    0.806     0.812
ks4_pts      0.781    0.797     0.777       0.739    0.748     0.737
ks4_ptc      0.793    0.808     0.789       0.770    0.784     0.770
ks4_mean     0.800    0.813     0.795       0.778    0.791     0.775
ks4_ac       0.744    0.766     0.749       0.715    0.720     0.715

The direction of change is summarised in Table 5.5.

Table 5.5: Summary of changes in the correlation coefficients between the two years

             PISA Maths   PISA Reading   PISA Science
ks3Efg       -            -              -
ks3Mfg       +            +              +
ks3Sfg       =            -              -
ks3_aps      =            -              -
ks4_pts      -            -              -
ks4_ptc      -            -              -
ks4_mean     -            -              -
ks4_ac       -            -              -

Note: ‘-’ signifies that the correlation is smaller in 2003 than in 2000, ‘=’ that it is similar (+/- 0.005), ‘+’ that the correlation is greater in 2003.

Correlations typically fall between the two years but the changes are only very slight. The average correlation between a PISA score and a KS3 or KS4 score is virtually unchanged: 0.78 in 2000 and 0.76 in 2003 (weighted data). We cannot explain the apparent fall in mean PISA achievement between the two years by a change in correlation between PISA and national English achievement measures. We now look at the associations in more detail using scattergrams and simple regression models.
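The coding rule of Table 5.5 (a change counts as '+' or '-' only when it exceeds 0.005 in absolute size) can be sketched as:

```python
# The coding rule of Table 5.5: '+' or '-' only when the 2003 correlation
# differs from the 2000 one by more than 0.005, otherwise '='.

def change_symbol(r2000, r2003, tol=0.005):
    if r2003 - r2000 > tol:
        return "+"
    if r2000 - r2003 > tol:
        return "-"
    return "="

# Entries from Table 5.4 (PISA maths column):
sym_ks3Mfg = change_symbol(0.823, 0.847)    # rose by more than 0.005
sym_ks3_aps = change_symbol(0.826, 0.828)   # within +/- 0.005
sym_ks4_pts = change_symbol(0.781, 0.739)   # fell by more than 0.005
```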
Figures 5.2 and 5.3 show respondents’ PISA scores in reading, maths and science plotted against their KS3 scores (Figure 5.2) and their GCSE scores (Figure 5.3). The regression lines reported in these diagrams are for the unweighted data, while Tables 5.6a to 5.6e provide results for weighted data (using the OECD student weight). The diagrams suggest that a simple linear model relating PISA scores to KS3 or KS4 scores is a reasonable choice of specification. Models using the weighted data were used in Sections 3.4 and 4.4 as one of our methods for estimating the extent of response bias in PISA scores. Figures 5.4a and 5.4b compare the regression lines for 2000 and 2003. The lines are plotted so as to cover the range from the 5th to the 95th percentile of PISA scores in the data. The typical pattern is for the 2003 line – the dashed line – to be a simple downward shift of the 2000 line. In other words, what typically changes is the constant term and not so much the slope coefficient. The slope does, however, clearly change in the case of (i) PISA reading and KS4_mean, and (ii) PISA reading and KS3 reading fine grade and KS3 maths fine grade. In these cases, the slope is lower in 2003. The changes in the constant are often large. For example, the constant in the (unweighted) regression of PISA maths on KS3_aps falls from 142 to 103. One way of summarising the results is to predict PISA scores from the regressions for the two years for a given value of KS3 or KS4 score. We take the mean KS3 and KS4 scores in the 2003 data and calculate the predicted score on the basis of the 2000 regression line and again on the basis of the 2003 regression line. The results are shown in Table 5.7. There are some very big differences. The predicted PISA scores given the mean 2003 KS3_aps score differ by 26 to 33 points (depending on the PISA subject) between the two years.
The predictions from the regressions with KS4_mean differ by less, 11-19 points, but this is still a substantial difference.

The conclusion from this analysis is as follows. While there has not been much change in the correlations between PISA and KS3 or KS4 scores between the two years, there certainly has been a major change in the relationship between the two. For any given KS3 or KS4 score, a pupil in 2000 would on average have obtained a much higher PISA score. Of course, part of that story was obvious a priori from simply considering the changes between 2000 and 2003 in mean KS3 and KS4 scores (generally up) and in PISA scores (down). What the correlation and regression analysis does is to provide some insight into the nature of the changes. The story of a change in the constant term is about the simplest that could be found. One interpretation would be that, for a given level of competence in KS3 or KS4, students in 2003 were 'less good' at PISA than in 2000. One explanation of that could be that the PISA test has changed in character over time; another is that a given level of KS3 or KS4 score signals something different in the two years.
Figure 5.2: Regression of PISA scores on KS3 average score (KS3_aps) (unweighted)

[Six scatterplots of PISA reading, maths and science scores against KS3 average scores, for 2000 and 2003, each with a fitted line:]

2000: PISA read    = 93.09 + 12.62*KS3   (R2 = 0.68)
      PISA maths   = 142.14 + 11.34*KS3  (R2 = 0.68)
      PISA science = 121.84 + 12.06*KS3  (R2 = 0.68)
2003: PISA read    = 119.3 + 11.06*KS3   (R2 = 0.65)
      PISA maths   = 102.73 + 11.55*KS3  (R2 = 0.68)
      PISA science = 89.01 + 12.30*KS3   (R2 = 0.65)

Figure 5.3: Regression of PISA scores on GCSE scores (KS4_mean) (unweighted)

[Six scatterplots of PISA reading, maths and science scores against mean GCSE point scores, for 2000 and 2003, each with a fitted line:]

2000: PISA read    = 290.92 + 49.93*GCSE  (R2 = 0.66)
      PISA maths   = 320.04 + 44.77*GCSE  (R2 = 0.64)
      PISA science = 314.20 + 47.07*GCSE  (R2 = 0.64)
2003: PISA read    = 304.23 + 43.92*GCSE  (R2 = 0.62)
      PISA maths   = 305.54 + 43.74*GCSE  (R2 = 0.60)
      PISA science = 301.52 + 47.31*GCSE  (R2 = 0.60)

Table 5.6a: Regression of PISA score on KS3 score (KS3_aps)

                        2000                               2003
             Reading    Maths      Science     Reading    Maths      Science
ks3_aps      12.643     11.313     12.078      11.072     11.541     12.322
             (0.141)    (0.171)    (0.184)     (0.139)    (0.133)    (0.151)
Constant     92.665     143.293    121.446     118.045    102.303    87.450
             (4.822)    (5.839)    (6.292)     (4.896)    (4.702)    (5.335)
Observations 3660       2038       2033        3444       3444       3444
R-squared    0.69       0.68       0.68        0.65       0.69       0.66

Table 5.6b: Regression of PISA score on GCSE score (KS4_mean)

                        2000                               2003
             Reading    Maths      Science     Reading    Maths      Science
ks4_mean     49.787     44.796     46.327      43.898     43.766     47.335
             (0.567)    (0.715)    (0.754)     (0.560)    (0.584)    (0.636)
Constant     291.221    319.611    317.072     303.679    304.791    300.479
             (2.797)    (3.537)    (3.727)     (2.737)    (2.853)    (3.107)
Observations 3976       2211       2207        3676       3676       3676
R-squared    0.66       0.64       0.63        0.63       0.60       0.60

Table 5.6c: Regression of PISA scores on KS3 English fine grade score

                        2000                               2003
             Reading    Maths      Science     Reading    Maths      Science
ks3Efg       63.383     51.033     56.414      56.024     52.940     58.372
             (0.989)    (1.299)    (1.356)     (0.991)    (1.067)    (1.151)
Constant     163.584    239.200    211.249     188.171    206.031    187.792
             (5.613)    (7.357)    (7.722)     (5.661)    (6.094)    (6.574)
Observations 3681       2045       2041        3469       3469       3469
R-squared    0.53       0.43       0.46        0.48       0.42       0.43

Table 5.6d: Regression of PISA scores on KS3 maths fine grade score

                        2000                               2003
             Reading    Maths      Science     Reading    Maths      Science
ks3Mfg       60.836     58.881     59.062      53.645     59.632     61.691
             (0.849)    (0.897)    (1.075)     (0.747)    (0.637)    (0.774)
Constant     172.172    188.490    192.538     185.647    150.820    150.930
             (4.921)    (5.203)    (6.220)     (4.507)    (3.845)    (4.673)
Observations 3686       2049       2048        3469       3469       3469
R-squared    0.58       0.68       0.60        0.60       0.72       0.65

Table 5.6e: Regression of PISA score on KS3 science fine grade

                        2000                               2003
             Reading    Maths      Science     Reading    Maths      Science
ks3Sfg       73.689     66.207     72.481      67.629     71.334     77.278
             (0.893)    (1.068)    (1.094)     (0.919)    (0.873)    (0.961)
Constant     118.259    164.526    135.145     121.616    101.274    80.049
             (4.920)    (5.888)    (6.016)     (5.254)    (4.990)    (5.494)
Observations 3682       2047       2041        3469       3469       3469
R-squared    0.65       0.65       0.68        0.61       0.66       0.65

Note: results are based on data weighted with the OECD student weights; standard errors in parentheses.

Figure 5.4a: Predicted PISA achievement scores by KS3 and KS4 scores

[Four panels: predicted PISA reading and PISA science scores plotted against the KS3 average score (ks3_aps) and the mean GCSE point score (ks4_mean), with the 2000 and 2003 regression lines in each panel.]

Note: the range of PISA scores predicted is limited at the lower end by the 5th percentile and at the upper end by the 95th percentile. The crossing point of the 2003 and 2000 lines for PISA reading is at the mean GCSE point score of 2.12.

Figure 5.4b: Predicted PISA achievement scores by KS3 and KS4 scores

[Six panels: predicted PISA reading and PISA science scores plotted against the English, maths and science Key Stage 3 levels, with the 2000 and 2003 regression lines in each panel.]

Note: Key Stage 3 levels refer to fine grades. The range of PISA scores predicted is limited at the lower end by the 5th percentile and at the upper end by the 95th percentile. The crossing point of the 2003 and 2000 lines for PISA reading is at English Key Stage 3 level 3.34.
Table 5.7: Predicted PISA scores in 2000 and 2003 at mean values of KS3 and KS4 scores in 2003

Explanatory variable                         Predicted PISA score
(mean score in 2003)          Subject        2000 line   2003 line   Difference
KS3 average score             PISA reading   532.2       503.0       29.2
(ks3_aps: 34.763)             PISA maths     536.6       503.5       33.1
                              PISA science   541.3       515.8       25.5
Mean GCSE points              PISA reading   521.7       506.9       14.8
(ks4_mean: 4.629)             PISA maths     527.0       507.4       19.6
                              PISA science   531.5       519.6       11.9
English KS3 level (5.621)     PISA reading   519.9       503.1       16.8
                              PISA maths     526.1       503.6       22.5
                              PISA science   528.4       515.9       12.5
Maths KS3 level (5.908)       PISA reading   531.6       502.6       29.0
                              PISA maths     536.4       503.1       33.3
                              PISA science   541.5       515.4       26.1
Science KS3 level (5.639)     PISA reading   533.8       503.0       30.8
                              PISA maths     537.9       503.5       34.4
                              PISA science   543.9       515.8       28.1

Note: the mean score in 2003 is calculated using those pupils who participated in PISA, weighted with the ONS student weight. Maths results cannot be compared directly between the two years because the content areas used for measuring maths ability differed between 2000 and 2003.

6 Conclusions and recommendations

Our main results are summarised in Chapter 5. In this chapter we focus on the conclusions we draw from the analyses made in the report. We also make recommendations for any future analyses of response bias, for the possible design of further rounds of the survey, and for the use of the existing PISA data for England.

England has experienced a problem of achieving levels of response that are considered satisfactory by the OECD, both among schools and among pupils. At first sight, the greater problem is at the school level. England's post-replacement school response rate fell below levels defined as 'acceptable' in both 2000 and 2003. (In fact, in both years the pre-replacement school response rate was so low as to be, in principle, 'not acceptable', irrespective of replacement.)
The pupil response rate in 2003 was close to the level considered acceptable by the OECD and in 2000 just exceeded this threshold. For this reason it is response among schools that seems to have been the focus of DfES concerns following the 2003 PISA round (leading, for example, to the commissioning of the analysis by Sturgis et al, 2006). However, if we compare the characteristics of responding schools, measured by the mean achievement levels of their pupils, with those of schools that did not respond, there does not appear to be a statistically significant problem of response bias. This conclusion is based on taking the school as the unit of analysis, on the grounds that it is schools that choose whether or not to respond. If we take the pupils in the schools as the unit of analysis (and compare the mean across all pupils in responding schools with the mean across all pupils in schools that do not respond) then significant differences emerge, with pupils in responding schools having higher mean scores. This is especially true in 2000. Indeed, it is not true in 2003 for the main KS4 summary measures we focused on.

The impact of response bias on results for England is larger and clearer at the stage when pupils respond, despite pupil response rates being at or near the threshold deemed acceptable by the OECD. In both 2000 and 2003, pupils who responded have KS3 and KS4 scores with a higher mean and lower dispersion than the scores of non-respondents. We estimate that the impact of the bias was to increase mean PISA scores for respondents by about 6 points in both years and to reduce the standard deviation by about 3 or 4 points. The OECD student weights appear to have little impact in reducing these biases. We consider both the direction and magnitude of these estimates to be reasonably reliable. A bias of about 6 points in the mean is around twice as large as the impact of sampling error, as summarised in the estimated standard error of the mean.
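This comparison can be made concrete in mean-square-error terms. Taking the bias estimate of about 6 points, and a standard error of the mean of about 3 points (the value implied by 'twice as large', assumed here for illustration):

```python
import math

bias = 6.0  # estimated response bias in the mean PISA score (points)
se = 3.0    # assumed standard error of the mean implied by the text

# 'Total survey error' in mean-square-error form: squared bias plus sampling variance
rmse = math.sqrt(bias**2 + se**2)         # about 6.7 points overall
share_bias = bias**2 / (bias**2 + se**2)  # bias contributes 80% of the MSE
```

On these figures the bias term accounts for four fifths of the total mean square error, which is the sense in which it dominates sampling error.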
Setting aside the issue of the variability of this bias in other possible samples, this estimate implies that bias would dominate sampling error in an estimate of 'total survey error' in the mean. This finding is consistent with the criterion for exclusion of results for any country that seems to be implied in the OECD's first international report for the 2003 PISA round. We have noted, however, that the impact of the bias on the position of England (or the UK) in a ranking of OECD countries by mean scores would have been very slight.

Evidence on the extent of pupil response bias in 2000 was not available when the decision was taken on whether to include results for the UK in the international report for the 2000 data. However, the logic is that this evidence should not have been asked for, or provided, since the level of pupil response in England in 2000 reached the 80 percent threshold deemed by the OECD to be 'acceptable'. It is natural to wonder whether other countries with pupil response rates that cleared this threshold by only a slightly larger margin in either 2000 or 2003 had levels of response bias in estimates of mean scores that would also have dominated estimates of total survey error. This illustrates clearly the danger of imposing arbitrary thresholds for satisfactory response. (See also Sturgis et al, 2006.)

The exclusion of the UK from the international report for the 2003 round of PISA will have given a signal that the data for this year for England are of doubtful quality. The OECD do note that 'the results for the UK are, however, accurate for many within-country comparisons between subgroups (e.g. males and females) and for relational analyses' (OECD, 2004: 328).23 Nevertheless, the headline event of the exclusion of the national-level results from the international report has seriously threatened confidence in the data.24 In our view the data do have considerable value, including at the national level.
An appropriate return can still be obtained on the public resources that were devoted to the collection of the data in England. However, if the data are to be widely used for national-level estimates, suitable weights need to be provided that adjust for the probability of pupil response. The provision of such weights should increase confidence in the use of the data, which are already publicly available on the OECD website. The weights we have experimented with as part of our analysis, based on modelling the pupil response probabilities, provide one way forward. These weights remove most of the apparent bias that is due to pupil response, both in mean scores and in the dispersion of scores.

Our recommendations cover a range of issues, including research on topics we have only touched on in this report.

1. Analysis of response in future waves of PISA should take place on data sets that are prepared and documented at an early stage. (Addressed to DfES and the England PISA contractor.) The necessary files from the PISA sampling process (including weighting variables) should be prepared and documented at the time the survey is conducted, in order to avoid problems associated with any staff turnover in the survey agency collecting the data.

2. Future analyses of response bias should investigate the impact of (i) school replacement, and (ii) the use of weights that take into account different response levels within school strata. (Addressed to DfES.) The impact of school replacement needs to be investigated rather than assumed to be neutral. Some of our comparisons of responding and non-responding schools showed that rather different results were obtained depending on whether the focus was on the initial sample of schools or on all schools that had been approached. The impact of the OECD school weights based on stratification variables on any response bias at the school level needs to be investigated systematically.
23 The rationale for this statement is presumably that mean square error in sub-samples will probably be dominated by sampling error rather than bias: while sampling error will be larger in sub-samples, bias could remain the same.

24 Simon Briscoe, Economics Editor at The Financial Times, noted at the Royal Statistical Society 2005 Cathie Marsh Memorial Lecture on 'Public Confidence in Official Statistics' that the exclusion of the UK from the 2003 PISA report was among the 'Top 20' recent threats to that confidence. (We are grateful to James Brown for this observation.)

3. Consideration should be given to whether it is practical to stratify samples of pupils within schools by a domestic measure of achievement (KS3 or KS4). (Addressed to DfES and the PISA Consortium.) If the main problem of bias arises at the stage of pupil response, then the stratification of pupil sampling needs to be investigated. The importance of considering this possibility is underlined by the OECD's apparent commitment to allowing response adjustments only through weights based on stratification variables (i.e. in line with response rates within strata), rather than permitting post-stratification weights of the type that are used in many surveys.

4. DfES should consider ways of raising response among pupils and not just among schools. (Addressed to DfES.) It is tempting to see school response as the bigger problem, since the after-replacement response rate of schools fell well below the level defined as acceptable by the OECD in both 2000 and 2003, while the pupil response rate was above the minimum threshold in 2000 and close to it in 2003. But it is the response biases at the pupil level that emerge most clearly in our study.

5. Criteria for exclusion of any country from the international reports for PISA need to be made explicit by the OECD and clearly justified. Evidence for a decision on exclusion needs to be published.
(Addressed to the OECD and the PISA Consortium.) The criteria for excluding results for the UK in 2003 (in terms of contributions to total survey error) were at best implicit. Such criteria should be explicit. And having stated the criteria for exclusion, the evidence on which a decision to exclude a country is taken also needs to be given clearly. The evidence in the case of the UK in 2003 was only hinted at in the published international report.

6. Response weights for responding pupils in England in both 2000 and 2003, based on a statistical model of pupil response of the type we present in this report, should be provided for users of the data. (Addressed to DfES and the OECD.) We have shown that satisfactory weights to adjust for pupil non-response can be calculated. These weights could be placed in files on the DfES website for users to download and merge with the microdata for England that are available from the OECD website. The OECD should be encouraged to publicise the existence of these files of weights on its website.

7. The OECD should be engaged in discussion of whether adjustment for response bias using post-stratification response weights could be used in the future to avoid excluding a country from the international report. (Addressed via DfES to the OECD.) The use of post-stratification weights is a common survey practice. The OECD is presumably reluctant to permit their use on the grounds that this would open PISA to all sorts of individual country practices that would be hard to monitor. However, the use of such weights could be restricted to countries in danger of being excluded from the international reports due to problems of response, and only permitted when it could be shown that the weights are based on variables that are well correlated with PISA achievement measures, as is the case with the domestic English achievement scores.
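The general shape of a response-weight adjustment of the kind recommended above can be sketched as follows. This is a minimal illustration on synthetic data (a logistic model of response on a single KS3-like score, inverted into non-response weights); it is not the specification actually estimated in the report:

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Logistic regression via Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-np.clip(X @ beta, -35, 35)))
        H = X.T @ (X * (p * (1 - p))[:, None])     # Hessian of the log-likelihood
        beta += np.linalg.solve(H, X.T @ (y - p))  # Newton step
    return beta

# Synthetic data in which the probability of response rises with achievement,
# mimicking the pattern found among pupils in the report
rng = np.random.default_rng(1)
ks3 = rng.normal(34, 6.5, 2000)
responded = rng.uniform(size=2000) < 1 / (1 + np.exp(-(-3.0 + 0.11 * ks3)))

X = np.column_stack([np.ones_like(ks3), ks3])
p_hat = 1 / (1 + np.exp(-X @ fit_logit(X, responded.astype(float))))

# Respondents are up-weighted by the inverse of their estimated response probability
mean_resp = ks3[responded].mean()                                # biased upwards
mean_wtd = np.average(ks3[responded], weights=1 / p_hat[responded])
mean_all = ks3.mean()                                            # target quantity
```

The weighted respondent mean moves back towards the full-sample mean, illustrating how such weights can remove most of the apparent bias due to pupil non-response.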
References

Biemer, P and Lyberg, L (2003), Introduction to Survey Quality, Wiley: Hoboken, NJ.

Brown, G and Micklewright, J (2004), 'Using International Surveys of Achievement and Literacy: A View from the Outside', UNESCO Institute for Statistics Working Paper 2, UNESCO: Montreal.

DfES (2005), 'PISA 2003: Sample design, response and weighting for the England sample', unpublished document.

Gill, B, Dunn, M and Goddard, E (2002), Student Achievement in England, London: The Stationery Office.

Greene, W (1993), Econometric Analysis, 2nd ed, Macmillan: New York.

OECD (2001a), Knowledge and Skills for Life – First Results from PISA 2000, OECD: Paris.

OECD (2001b), PISA 2000 Technical Report, OECD: Paris.

OECD (2004), Learning for Tomorrow's World – First Results from PISA 2003, OECD: Paris.

OECD (2005a), PISA 2003 Technical Report, OECD: Paris.

OECD (2005b), PISA 2003 Data Analysis Manual, OECD: Paris. Available at: http://www.pisa.oecd.org/dataoecd/53/22/35014883.pdf

Sturgis, P, Smith, P and Hughes, G (2006), A Study of Suitable Methods for Raising Response Rates in School Surveys, report for DfES.

Appendix 1: Summary statistics for five groups of pupils in 2003

Table A1.1: All pupils in schools in England (i), 2003

Variable    N        Missing   Mean*100
Male        698031   6         50.017
FSM         644301   53736     13.778
ELevel5     643042   54995     69.674
MLevel5     647134   50903     70.537
SLevel5     645396   52641     69.825
ELevel5fg   642026   56011     71.079
MLevel5fg   645008   53029     72.026
SLevel5fg   642532   55505     71.400
ks4_5ac     698031   6         55.788

Variable    N        Missing   Mean     SD
ks3Efg      642026   56011     5.512    1.113
ks3Mfg      645008   53029     5.785    1.285
ks3Sfg      642532   55505     5.541    1.052
ks3_aps     637949   60088     34.162   6.618
ks4_pts     698030   7         42.854   21.093
ks4_ptc     698031   6         36.264   15.814
ks4_mean    698031   6         4.350    1.850
ks4_ac      698031   6         5.377    4.115

Note: FSM eligibility and KS3 scores are missing for about 8 percent of the population; these are typically children in private schools.
Table A1.2: All pupils in all sampled schools (ii), 2003

Variable    N        Missing   Mean*100
Male        117045   6         49.332
FSM         112892   4159      12.563
ELevel5     112013   5038      70.073
MLevel5     112466   4585      71.075
SLevel5     112250   4801      70.422
ELevel5fg   111803   5248      71.444
MLevel5fg   112106   4945      72.616
SLevel5fg   111807   5244      72.026
ks4_5ac     117045   6         55.912

Variable    N        Missing   Mean     SD
ks3Efg      111803   5248      5.531    1.124
ks3Mfg      112106   4945      5.810    1.284
ks3Sfg      111807   5244      5.564    1.052
ks3_aps     110823   6228      34.301   6.642
ks4_pts     117045   6         42.879   20.810
ks4_ptc     117045   6         36.451   15.778
ks4_mean    117045   6         4.373    1.850
ks4_ac      117045   6         5.370    4.080

Note: data are weighted with the ONS school weight (sch_bwt).

Table A1.3: All pupils in responding schools (iii), 2003

Variable    N        Missing   Mean*100
Male        35446    2         47.395
FSM         34362    1086      11.861
ELevel5     34314    1134      70.538
MLevel5     34293    1155      71.135
SLevel5     34323    1125      70.805
ELevel5fg   34141    1307      71.958
MLevel5fg   34187    1261      72.715
SLevel5fg   34179    1269      72.487
ks4_5ac     35446    2         55.309

Variable    N        Missing   Mean     SD
ks3Efg      34141    1307      5.521    1.097
ks3Mfg      34187    1261      5.799    1.267
ks3Sfg      34179    1269      5.556    1.024
ks3_aps     33792    1656      34.214   6.467
ks4_pts     35446    2         42.627   20.609
ks4_ptc     35446    2         36.172   15.477
ks4_mean    35446    2         4.334    1.815
ks4_ac      35446    2         5.317    4.091

Note: data are weighted with the ONS school weight (sch_bwt).

Table A1.4: All sampled pupils (iv), 2003

Variable    N        Missing   Mean*100
Male        5091     108       46.280
FSM         4816     383       11.357
ELevel5     4826     373       71.190
MLevel5     4825     374       71.681
SLevel5     4827     372       70.991
ELevel5fg   4809     390       72.308
MLevel5fg   4813     386       73.007
SLevel5fg   4812     387       72.340
ks4_5ac     5091     108       56.354

Variable    N        Missing   Mean     SD
ks3Efg      4809     390       5.534    1.094
ks3Mfg      4813     386       5.802    1.276
ks3Sfg      4812     387       5.555    1.027
ks3_aps     4772     427       34.232   6.447
ks4_pts     5091     108       43.496   19.745
ks4_ptc     5091     108       36.975   14.660
ks4_mean    5091     108       4.414    1.729
ks4_ac      5091     108       5.396    4.067

Note: data are weighted with the ONS student weight. Unweighted and weighted results are very similar.
Table A1.5: All responding pupils (v), 2003

Variable    N        Missing   Mean*100
Male        3695     57        46.267
FSM         3475     277       10.399
ELevel5     3491     261       74.406
MLevel5     3487     265       75.280
SLevel5     3490     262       74.276
ELevel5fg   3481     271       75.495
MLevel5fg   3481     271       76.632
SLevel5fg   3481     271       75.487
ks4_5ac     3695     57        60.948

Variable    N        Missing   Mean     SD
ks3Efg      3481     271       5.619    1.072
ks3Mfg      3481     271       5.908    1.247
ks3Sfg      3481     271       5.636    1.001
ks3_aps     3455     297       34.757   6.298
ks4_pts     3695     57        45.769   18.545
ks4_ptc     3695     57        38.744   13.450
ks4_mean    3695     57        4.605    1.613
ks4_ac      3695     57        5.799    3.982

Note: data are weighted with the ONS student weight. Unweighted and weighted results are very similar.

Appendix 2: Summary statistics for five groups of pupils in 2000

Table A2.1: All pupils in schools in England (i), 2000

Variable    N        Missing   Mean*100
Male        556544   4         50.347
ELevel5     511517   45031     68.123
MLevel5     512950   43598     63.995
SLevel5     511036   45512     59.151
ELevel5fg   509353   47195     69.346
MLevel5fg   511005   45543     65.593
SLevel5fg   508585   47963     60.609
ks4_5ac     556544   4         52.100

Variable    N        Missing   Mean     SD
ks3Efg      509353   47195     5.463    1.138
ks3Mfg      511005   45543     5.554    1.242
ks3Sfg      508585   47963     5.297    1.082
ks3_aps     508603   47945     32.960   6.540
ks4_pts     556544   4         41.096   19.042
ks4_ptc     556544   4         36.017   15.000
ks4_mean    556544   4         4.417    1.715
ks4_ac      556544   4         4.884    3.877

Note: no information on free school meal eligibility is available for individuals in 2000.
Table A2.2: All pupils in all sampled schools (ii), 2000

Variable    N        Missing   Mean*100
Male        91321    4         50.156
ELevel5     86629    4696      68.104
MLevel5     86695    4630      63.455
SLevel5     86742    4583      58.702
ELevel5fg   86225    5100      69.305
MLevel5fg   86362    4963      65.027
SLevel5fg   86343    4982      60.055
ks4_5ac     91321    4         52.374

Variable    N        Missing   Mean     SD
ks3Efg      86225    5100      5.452    1.131
ks3Mfg      86362    4963      5.524    1.231
ks3Sfg      86343    4982      5.276    1.075
ks3_aps     85852    5473      32.797   6.460
ks4_pts     91321    4         41.156   19.006
ks4_ptc     91321    4         36.018   14.967
ks4_mean    91321    4         4.411    1.713
ks4_ac      91321    4         4.892    3.869

Note: data are weighted with the ONS school weight (sch_bwt).

Table A2.3: All pupils in responding schools (iii), 2000

Variable    N        Missing   Mean*100
Male        27606    2         49.512
ELevel5     26248    1360      70.371
MLevel5     26248    1360      66.155
SLevel5     26260    1348      61.910
ELevel5fg   26175    1433      71.657
MLevel5fg   26173    1435      67.619
SLevel5fg   26168    1440      63.391
ks4_5ac     27606    2         54.595

Variable    N        Missing   Mean     SD
ks3Efg      26175    1433      5.529    1.132
ks3Mfg      26173    1435      5.611    1.214
ks3Sfg      26168    1440      5.359    1.062
ks3_aps     25993    1615      33.294   6.403
ks4_pts     27606    2         42.424   18.887
ks4_ptc     27606    2         36.950   14.724
ks4_mean    27606    2         4.511    1.691
ks4_ac      27606    2         5.073    3.854

Note: data are weighted with the ONS school weight (sch_bwt).

Table A2.4: All sampled pupils (iv), 2000

Variable    N        Missing   Mean*100
Male        4912     252       49.093
ELevel5     4570     594       71.662
MLevel5     4572     592       66.725
SLevel5     4573     591       63.020
ELevel5fg   4559     605       72.887
MLevel5fg   4561     603       68.441
SLevel5fg   4560     604       64.514
ks4_5ac     4912     252       56.442

Variable    N        Missing   Mean     SD
ks3Efg      4559     605       5.558    1.089
ks3Mfg      4561     603       5.641    1.190
ks3Sfg      4560     604       5.387    1.037
ks3_aps     4526     638       33.458   6.239
ks4_pts     4912     252       43.239   18.529
ks4_ptc     4912     252       37.626   14.326
ks4_mean    4912     252       4.561    1.666
ks4_ac      4912     252       5.219    3.824

Note: data are weighted with the ONS student weight (stu_wt).

Table A2.5: All responding pupils (v), 2000

Variable    N        Missing   Mean*100
Male        3976     144       49.903
ELevel5     3687     433       73.729
MLevel5     3689     431       69.075
SLevel5     3690     430       65.187
ELevel5fg   3681     439       74.854
MLevel5fg   3686     434       70.731
SLevel5fg   3682     438       66.550
ks4_5ac     3976     144       59.186

Variable    N        Missing   Mean     SD
ks3Efg      3681     439       5.604    1.063
ks3Mfg      3686     434       5.705    1.164
ks3Sfg      3682     438       5.436    1.014
ks3_aps     3660     460       33.766   6.062
ks4_pts     3976     144       44.620   17.403
ks4_ptc     3976     144       38.790   13.276
ks4_mean    3976     144       4.686    1.559
ks4_ac      3976     144       5.452    3.749

Note: data are weighted with the ONS student weight (stu_wt).

Appendix 3: Summary statistics for private and public schools

The tables in this appendix show summary statistics separately for public sector and private schools in 2003 and 2000. KS3 scores are generally not available for pupils in private schools, while GCSE scores are available for the whole population.

Table A3.1: Number of pupils in private and public schools in 2003 and 2000

                            2003                   2000
                            Freq.      Percent     Freq.      Percent
Pupils in public schools    649,190    93.00       516,578    92.82
Pupils in private schools   48,841     7.00        39,966     7.18
Missing                     6          0.00        4          0.00
Total                       698,037    100.00      556,548    100.00

Table A3.2: Summary statistics in public schools for 2000 and 2003

                       2003                               2000
Variable    N        Missing   Mean     SD      N        Missing   Mean     SD
Male        649190   0         49.950           516578   0         50.301
FSM         644124   5066      13.777
ELevel5     633542   15648     69.494           504221   12357     67.907
MLevel5     635731   13459     70.232           503798   12780     63.644
SLevel5     636171   13019     69.632           504378   12200     58.923
ELevel5fg   632791   16399     70.854           502131   14447     69.115
MLevel5fg   633724   15466     71.709           501949   14629     65.231
SLevel5fg   633521   15669     71.184           502019   14559     60.366
ks4_5ac     649190   0         53.281           516578   0         49.330
ks3Efg      632791   16399     5.504    1.110   502131   14447     5.454    1.134
ks3Mfg      633724   15466     5.771    1.282   501949   14629     5.540    1.237
ks3Sfg      633521   15669     5.533    1.050   502019   14559     5.290    1.080
ks3_aps     626215   22975     34.073   6.586   498819   17759     32.862   6.495
ks4_pts     649189   1         41.757   20.907  516578   0         39.922   18.643
ks4_ptc     649190   0         35.255   15.515  516578   0         35.013   14.648
ks4_mean    649190   0         4.204    1.801   516578   0         4.273    1.665
ks4_ac      649190   0         5.161    4.125   516578   0         4.634    3.844

Table A3.3: Summary statistics in private schools for 2000 and 2003

                       2003                               2000
Variable    N        Missing   Mean     SD      N        Missing   Mean     SD
Male        48841    0         50.906           39966    0         50.941
FSM         177      48664     18.079
ELevel5     9500     39341     81.695           7296     32670     83.004
MLevel5     11403    37438     87.547           9152     30814     83.315
SLevel5     9225     39616     83.154           6658     33308     76.449
ELevel5fg   9235     39606     86.486           7222     32744     85.364
MLevel5fg   11284    37557     89.817           9056     30910     85.645
SLevel5fg   9011     39830     86.583           6566     33400     79.165
ks4_5ac     48841    0         89.114           39966    0         87.900
ks3Efg      9235     39606     6.115    1.129   7222     32744     6.085    1.195
ks3Mfg      11284    37557     6.579    1.197   9056     30910     6.368    1.235
ks3Sfg      9011     39830     6.098    1.037   6566     33400     5.857    1.124
ks3_aps     11734    37107     38.953   6.547   9784     30182     37.970   6.869
ks4_pts     48841    0         57.438   17.887  39966    0         56.260   17.583
ks4_ptc     48841    0         49.673   13.468  39966    0         48.997   13.356
ks4_mean    48841    0         6.290    1.327   39966    0         6.275    1.189
ks4_ac      48841    0         8.251    2.647   39966    0         8.105    2.664

Note: no individual-level FSM information is available for 2000; means of binary variables are multiplied by 100.

In 2000, Free School Meal data were not available in the pupil data file but in a separately provided school file. This file included 3,172 public and 887 private schools and contained information on the percentage of pupils (of all ages) eligible for free school meals in each school. On average, 17.6 percent of pupils in public sector schools received Free School Meals. Information was missing for 43 private schools; in the other 844 private schools no pupil was receiving this benefit.

Appendix 4: Summary statistics with adjusted data for 2003

This appendix gives the summary statistics for the 2003 national database adjusted as far as possible to the coverage of the 2000 data (see Chapters 2 and 5). The results are slightly different from those reported in Chapter 3. These adjustments mean that pupils who were not entered for any GCSEs (or equivalent) are excluded and statemented pupils are included.
Table A4.1: All pupils in schools in England (i), 2003 adjusted

Variable    N        Missing   Mean*100
Male        697411   6         50.34
FSM         644583   52834     13.61
ELevel5     644389   53028     69.40
MLevel5     648477   48940     70.31
SLevel5     646769   50648     69.79
ELevel5fg   644222   53195     70.67
MLevel5fg   647232   50185     71.65
SLevel5fg   644800   52617     71.21
ks4_5ac     697411   6         56.07

Variable    N        Missing   Mean     SD
ks3Efg      644222   53195     5.50     1.13
ks3Mfg      647232   50185     5.77     1.30
ks3Sfg      644800   52617     5.53     1.06
ks3_aps     641070   56347     34.09    6.69
ks4_pts     697411   6         43.29    20.54
ks4_ptc     697411   6         36.67    15.24
ks4_mean    697411   6         4.41     1.77
ks4_ac      697411   6         5.41     4.10

Table A4.2: All pupils in sampled schools (ii), 2003 adjusted

Variable    N        Missing   Mean*100
Male        116998   6         49.621
FSM         112953   4051      12.460
ELevel5     112261   4743      69.781
MLevel5     112719   4285      70.804
SLevel5     112507   4497      70.346
ELevel5fg   112190   4814      71.035
MLevel5fg   112503   4501      72.194
SLevel5fg   112183   4821      71.806
ks4_5ac     116998   6         56.170

Variable    N        Missing   Mean     SD
ks3Efg      112190   4814      5.515    1.140
ks3Mfg      112503   4501      5.794    1.300
ks3Sfg      112183   4821      5.556    1.061
ks3_aps     111362   5642      34.220   6.717
ks4_pts     116998   6         43.299   20.261
ks4_ptc     116998   6         36.841   15.205
ks4_mean    116998   6         4.427    1.768
ks4_ac      116998   6         5.399    4.064

Note: data are weighted with the ONS school weight (sch_bwt).

Table A4.3: All pupils in responding schools (iii), 2003 adjusted

Variable    N        Missing   Mean*100
Male        35432    2         47.596
FSM         34384    1050      11.886
ELevel5     34388    1046      70.189
MLevel5     34364    1070      70.848
SLevel5     34398    1036      70.696
ELevel5fg   34249    1185      71.513
MLevel5fg   34297    1137      72.289
SLevel5fg   34280    1154      72.262
ks4_5ac     35432    2         55.567

Variable    N        Missing   Mean     SD
ks3Efg      34249    1185      5.505    1.114
ks3Mfg      34297    1137      5.782    1.285
ks3Sfg      34280    1154      5.548    1.033
ks3_aps     33936    1498      34.131   6.546
ks4_pts     35432    2         43.028   20.082
ks4_ptc     35432    2         36.547   14.921
ks4_mean    35432    2         4.387    1.734
ks4_ac      35432    2         5.346    4.075

Note: data are weighted with the ONS school weight (sch_bwt).

Table A4.4: All sampled pupils (iv), 2003 adjusted

Variable    N        Missing   Mean*100
Male        5027     108       46.234
FSM         4753     382       11.259
ELevel5     4770     365       71.696
MLevel5     4769     366       72.191
SLevel5     4771     364       71.558
ELevel5fg   4755     380       72.760
MLevel5fg   4758     377       73.517
SLevel5fg   4758     377       72.853
ks4_5ac     5027     108       57.064

Variable    N        Missing   Mean     SD
ks3Efg      4755     380       5.548    1.086
ks3Mfg      4758     377       5.817    1.264
ks3Sfg      4758     377       5.568    1.017
ks3_aps     4719     416       34.319   6.389
ks4_pts     5027     108       44.044   19.252
ks4_ptc     5027     108       37.441   14.149
ks4_mean    5027     108       4.469    1.666
ks4_ac      5027     108       5.464    4.047

Note: data are weighted with the ONS student weight (stu_wt).

Table A4.5: All responding pupils (v), 2003 adjusted

Variable    N        Missing   Mean*100
Male        3676     57        46.187
FSM         3457     276       10.397
ELevel5     3478     255       74.482
MLevel5     3474     259       75.329
SLevel5     3477     256       74.413
ELevel5fg   3469     264       75.551
MLevel5fg   3469     264       76.663
SLevel5fg   3469     264       75.606
ks4_5ac     3676     57        61.266

Variable    N        Missing   Mean     SD
ks3Efg      3469     264       5.621    1.072
ks3Mfg      3469     264       5.908    1.245
ks3Sfg      3469     264       5.639    0.998
ks3_aps     3444     289       34.763   6.294
ks4_pts     3676     57        46.008   18.296
ks4_ptc     3676     57        38.946   13.190
ks4_mean    3676     57        4.629    1.583
ks4_ac      3676     57        5.829    3.971

Note: data are weighted with the ONS student weight (stu_wt).

Table A4.6: Summary

The following table summarises the results for the 2003 adjusted data (including statemented pupils, excluding pupils with no GCSE entries) by giving mean values of all variables for each group of pupils.
Variable     (i)     (ii)     (iii)    (iv)     (v)      (Mean*100)
male         50.34   49.621   47.596   46.234   46.187
FSM          13.61   12.460   11.886   11.259   10.397
ELevel5      69.40   69.781   70.189   71.696   74.482
MLevel5      70.31   70.804   70.848   72.191   75.329
SLevel5      69.79   70.346   70.696   71.558   74.413
ELevel5fg    70.67   71.035   71.513   72.76    75.551
MLevel5fg    71.65   72.194   72.289   73.517   76.663
SLevel5fg    71.21   71.806   72.262   72.853   75.606
ks4_5ac      56.07   56.170   55.567   57.064   61.266

Variable     (i)     (ii)     (iii)    (iv)     (v)      (Mean)
ks3Efg       5.50    5.515    5.505    5.548    5.621
ks3Mfg       5.77    5.794    5.782    5.817    5.908
ks3Sfg       5.53    5.556    5.548    5.568    5.639
ks3_aps      34.09   34.220   34.131   34.319   34.763
ks4_pts      43.29   43.299   43.028   44.044   46.008
ks4_ptc      36.67   36.841   36.547   37.441   38.946
ks4_mean     4.41    4.427    4.387    4.469    4.629
ks4_ac       5.41    5.399    5.346    5.464    5.829

Columns: (i) all pupils in English schools; (ii) all pupils in all sampled PISA schools; (iii) all pupils in responding PISA schools; (iv) all sampled pupils in responding schools; (v) responding pupils.

Note: data in columns (ii) and (iii) are weighted with ONS school weight (sch_bwt) and data in columns (iv) and (v) with ONS student weight (STU_WT).

Appendix 5: Bias analysis with adjusted data for 2003

Table A 5.1: Difference in means between responding and non-responding schools in the initial sample: approach (a), 2003 adjusted

             Responding   Non-responding
             Mean*100     Mean*100         Difference   p      Observations
Male         48.04        46.27            -1.77        0.78   185
FSM          11.42        19.80             8.38        0.04   175
ELevel5fg    72.40        72.70             0.30        0.95   182
MLevel5fg    72.90        72.08            -0.82        0.86   183
SLevel5fg    72.97        73.15             0.18        0.97   182
KS4_5ac      53.67        57.54             3.87        0.57   185

             Mean         Mean
KS3_aps      34.44        34.51             0.07        0.94   182
KS4_mean     4.56         4.56              0.00        0.99   185

Note: data are weighted with ONS school weight (sch_bwt). This table can be compared to Table 3.11 that shows results for unadjusted 2003 data.
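The difference-in-means comparisons in these tables can be reproduced in outline with a weighted two-sample test. The sketch below is a simplified illustration, not the report's actual procedure: the data and unit weights are invented, the variance of each weighted mean is approximated via the Kish effective sample size, and the p-value uses a normal approximation rather than a t-distribution.

```python
import math

def weighted_mean_var(values, weights):
    """Weighted mean and an approximate variance of that weighted mean."""
    w = sum(weights)
    mean = sum(v * wt for v, wt in zip(values, weights)) / w
    # weighted variance of the observations
    var = sum(wt * (v - mean) ** 2 for v, wt in zip(values, weights)) / w
    # Kish effective sample size approximates the precision of the weighted mean
    n_eff = w ** 2 / sum(wt ** 2 for wt in weights)
    return mean, var / n_eff

def diff_in_means_z(resp, w_resp, nonresp, w_nonresp):
    """Difference (non-responding minus responding) and two-sided p-value."""
    m1, v1 = weighted_mean_var(resp, w_resp)
    m2, v2 = weighted_mean_var(nonresp, w_nonresp)
    diff = m2 - m1
    z = diff / math.sqrt(v1 + v2)
    # normal CDF via the error function; two-sided p-value
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return diff, 2 * (1 - phi)

# Invented FSM indicators (1 = receives free school meals) with unit weights
resp = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
nonresp = [1, 0, 1, 0, 1, 0, 0, 1]
diff, p = diff_in_means_z(resp, [1.0] * len(resp),
                          nonresp, [1.0] * len(nonresp))
```

With survey data the real calculation would use the design weights (sch_bwt or STU_WT) in place of the unit weights above.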
Table A 5.2: Differences in means for pupils in all responding schools and pupils in second replacement non-responding schools: approach (c), 2003 adjusted

             Responding   Non-responding
             Mean*100     Mean*100         Difference   p      Observations
Male         47.60        53.72             6.12        0.00   40,031
FSM          11.89        16.78             4.89        0.00   38,601
ELevel5fg    71.51        65.33            -6.18        0.00   38,417
MLevel5fg    72.29        69.50            -2.79        0.00   38,564
SLevel5fg    72.26        67.56            -4.70        0.00   38,452
KS4_5ac      55.57        56.60             1.03        0.24   40,031

             Mean         Mean
KS3_aps      34.13        33.77            -0.36        0.01   38,155
KS4_mean     4.39         4.50              0.11        0.00   40,031

Note: data are weighted with ONS school weight (sch_bwt). This table can be compared to Table 3.19 that shows results for unadjusted 2003 data.

Table A 5.3: Differences in means between responding and non-responding pupils, 2003 adjusted

             Responding pupils    Non-responding pupils
             Obs.    Mean*100     Obs.    Mean*100       Difference   p      Observations
Male         3,676   46.19        1,351   46.36           0.17        0.94   5,027
FSM          3,457   10.40        1,296   13.56           3.16        0.00   4,753
ELevel5fg    3,469   75.55        1,286   65.24          -10.31       0.00   4,755
MLevel5fg    3,469   76.67        1,289   65.06          -11.61       0.00   4,758
SLevel5fg    3,469   75.61        1,289   65.45          -10.16       0.00   4,758
KS4_5ac      3,676   61.27        1,351   45.63          -15.64       0.00   5,027

             Obs.    Mean         Obs.    Mean
KS3_aps      3,444   34.76        1,275   33.12          -1.64        0.00   4,719
KS4_mean     3,676   4.63         1,351   4.03           -0.60        0.00   5,027

Note: data are weighted with ONS student weight (STU_WT). This table can be compared to Table 3.20 that shows results for unadjusted 2003 data.

Appendix 6: Are peer group effects biased downwards in PISA data?

The two-stage sample design of PISA – sampling of schools and then of pupils within schools – means that the survey can be used to estimate ‘peer group’ effects on pupils’ achievement. In other words, the survey can be used to estimate the impact on an individual’s learning of the composition of the individual’s peer group at school, i.e. the pupils with whom the individual is at school.
Suppose we define an individual’s peer group as all other persons of the same age in that individual’s school. PISA provides only a random sample of the peer group: just 35 pupils are sampled in each school, out of what in England is on average about 150 to 200 pupils aged 15 per school. As a consequence, measures of peer group composition based on the PISA sample in each school are subject to measurement error. Since the samples drawn within each school are simple random samples, the form of this measurement error is close to the textbook case of normally distributed measurement error in an explanatory variable. (The situation is not precisely this, since the measurement error is highly correlated across individuals within the same sampled school, and any response bias also needs to be considered.) In a simple OLS regression model with one explanatory variable, the well-known result is that measurement error in that variable biases its estimated coefficient downwards. In a multiple regression model with measurement error restricted to just one explanatory variable, the coefficient of that variable is again biased downwards and all other coefficients are also biased, but in unknown directions (e.g. Greene 1993: 280-4). In general, therefore, researchers using PISA data to estimate peer group effects (e.g. OECD 2001: chapter 7) will obtain estimates of those effects that are subject to measurement error bias, since they have information on only a random sample of each individual’s peer group. That measurement error will also contaminate other parameter estimates, and the size of these biases will be unknown. In contrast, we have been able to merge the PISA data in this report with national school population data. We therefore have information on the complete peer group of 15 year olds in each school rather than just the sample provided by PISA.
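The attenuation mechanism described above is easy to demonstrate by simulation. The sketch below uses entirely invented numbers (a hypothetical peer-group effect of -200 score points, schools of 180 pupils, 35-pupil subsamples): for each simulated school it builds the peer-group FSM proportion from the full cohort and from a random subsample, then compares the two OLS slopes. The slope estimated from the subsampled measure shrinks towards zero, as the errors-in-variables result predicts.

```python
import random

random.seed(1)

def ols_slope(x, y):
    """Simple OLS slope: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

TRUE_SLOPE = -200.0        # invented peer-group effect on the score
N_SCHOOLS = 2000
COHORT, SAMPLE = 180, 35   # cohort size vs PISA-style subsample

x_true, x_sampled, y = [], [], []
for _ in range(N_SCHOOLS):
    p = random.uniform(0.0, 0.3)                        # school's true FSM rate
    cohort = [1 if random.random() < p else 0 for _ in range(COHORT)]
    prop_all = sum(cohort) / COHORT                     # full peer group measure
    subsample = random.sample(cohort, SAMPLE)           # subsampled peer group
    prop_sample = sum(subsample) / SAMPLE               # error-ridden measure
    x_true.append(prop_all)
    x_sampled.append(prop_sample)
    # outcome depends on the TRUE peer composition plus noise
    y.append(500 + TRUE_SLOPE * prop_all + random.gauss(0, 30))

slope_true = ols_slope(x_true, y)      # should be near TRUE_SLOPE
slope_noisy = ols_slope(x_sampled, y)  # attenuated towards zero
```

The ratio of the two slopes approximates the classical reliability ratio var(x) / (var(x) + var(error)), which here is well below one.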
Hence we are able to calculate summary measures of the peer group based on this complete group. We can then compare regression results obtained with peer group variables calculated from just the PISA sample of pupils in the same school against results obtained with variables calculated from the complete peer group. A comparison of the results will show the extent of the measurement error bias in peer group effects obtained from use of PISA sample data alone. We focus on the socio-economic background of the peer group, measuring it for each individual as the proportion of all other 15 year old pupils in the individual’s school who receive free school meals (FSM). We calculate this proportion using (1) the population of all (other) 15 year olds in each school, which is the true figure, and then (2) the (responding) PISA sample of (other) 15 year olds in the school, which is an estimate. The analysis is restricted to pupils in state schools, since FSM is not received by pupils in private schools. Table A6.1 presents summary statistics for the 3,318 responding pupils in the sample. (The sample size is slightly smaller than the original PISA sample size for state sector pupils since not all individuals could be matched to the England national database – see Chapter 2.) The first variable is a simple dummy variable for individual FSM receipt and its mean shows the average FSM eligibility in the sample – 10 percent. The other two variables measure the proportion of 15 year old pupils in the peer group of each individual who receive FSM, one measure based on all 15 year old peers in the individual’s school and the other on the peers in the PISA sample for the school.
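The leave-one-out construction described here – for each pupil, the share of the other pupils in the same school who receive FSM – can be sketched as follows. The data and the (school, FSM flag) input format are purely illustrative, not those of the national database.

```python
def peer_fsm_proportions(pupils):
    """For each pupil, the proportion of the OTHER pupils in the same
    school receiving FSM (leave-one-out proportion)."""
    # pupils: list of (school_id, fsm_flag) with fsm_flag 0 or 1
    totals = {}  # school_id -> (n_pupils, n_fsm)
    for school, fsm in pupils:
        n, f = totals.get(school, (0, 0))
        totals[school] = (n + 1, f + fsm)
    out = []
    for school, fsm in pupils:
        n, f = totals[school]
        # exclude the pupil him/herself from numerator and denominator
        out.append((f - fsm) / (n - 1) if n > 1 else None)
    return out

# Hypothetical example: school A has 3 pupils (one with FSM), school B has 2
pupils = [("A", 1), ("A", 0), ("A", 0), ("B", 0), ("B", 1)]
props = peer_fsm_proportions(pupils)
# first pupil's peers: 0 of 2 with FSM; second pupil's peers: 1 of 2
```

The same function applied to the PISA subsample of each school, rather than the full cohort, gives the error-ridden version of the measure used in Model 2 below.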
Table A6.1: Summary statistics of FSM variables used in the regression model

Variable                              Mean   SD     Min    5th    25th   75th   95th   Max
FSM receipt dummy                     0.10   0.30   0.00   -      -      -      -      1.00
Proportion of peer group with FSM
  (all 15 yr olds in school)          0.11   0.09   0      0.02   0.05   0.16   0.32   0.41
Proportion of peer group with FSM
  (PISA sample)                       0.10   0.10   0      0      0.04   0.15   0.29   0.47

Obs: 3,318 pupils

Note: the first variable shows the proportion of 3,318 PISA pupils with FSM; the second variable shows the actual proportion of 15 year old peers with FSM; the third variable shows the proportion of peers in the PISA sample with FSM.

Table A6.2: Regression results: PISA maths score, 2003

                                              (1)        (2)
FSM in receipt                               -20.66     -27.58
                                              (5.48)     (5.95)
Female                                       -10.05     -10.73
                                              (3.84)     (4.12)
Mother has at least secondary education        5.05       9.18
                                              (3.98)     (4.57)
Mother has tertiary education                 15.34      16.52
                                              (3.41)     (3.43)
Mother's education missing                   -23.59     -21.72
                                              (6.30)     (6.74)
More than 100 books at home                   43.71      48.15
                                              (3.29)     (3.68)
Proportion of peer group with FSM           -217.76
  (all 15 year olds in the school)           (29.85)
Proportion of peer group with FSM                       -98.84
  (15 year olds in the PISA sample)                     (23.66)
Constant                                     509.18     489.25
                                              (6.32)     (6.43)
Observations                                  3318       3318
R-squared                                     0.20       0.17

Note: standard errors in parentheses. Clustering of pupils in schools is taken into account when estimating standard errors. All variables other than the peer group FSM measures are dummy variables.

[Figure A6.1: Histogram of differences between the proportion of peer pupils in receipt of Free School Meals among all 15 year olds and the proportion in the PISA sample. Axes: Difference (-0.4 to 0.2) against Frequency (0 to 800). Note: 3,318 pupils in the state sector only are included.]

Figure A6.1 graphs the difference between the proportion of peers in receipt of FSM among all 15 year olds and the proportion among the PISA sample. The histogram is overlaid with a normal distribution with the same mean and variance as the data.
A positive difference indicates that the proportion among all 15 year olds in the school is higher than in the sample of responding pupils. As one would expect, there is a distribution of both positive and negative values. The mean is positive and the distribution displays some non-normality. Further investigation showed that these features are the result of non-random pupil response. If we take the difference between the measure in the population of peers and that for all sampled pupils (rather than just the responding pupils), the histogram approximates the normal much more closely and the mean of the distribution is very close to zero.

We estimate a simple regression model of the sort typically used for estimating peer group effects. Pupils’ PISA maths scores in 2003 are regressed on dummies for gender, mother’s educational attainment, whether more than 100 books are reported in the home, and whether a pupil receives FSM. In addition, we include the proportion of the peer group in receipt of FSM, defining the peer group as other 15 year olds in the individual’s school. In Model 1 this is measured using all 15 year olds in the school. In Model 2 the variable is based on the PISA sample for the individual’s school.

Individual receipt of FSM is estimated in Model 1 to reduce the PISA maths score by 20 points, holding other factors constant. (Strictly speaking, it is just ‘associated with’ a reduction of this size, since we are not asserting causality.) This is quite a big effect, roughly equal in absolute size to the positive impact of the mother having had tertiary education rather than having failed to complete secondary education (i.e. the sum of the coefficients on the two maternal education dummies). The Model 1 results show that greater receipt of FSM in the peer group also has a negative impact on learning achievement (and the estimated coefficient is well determined).
The effect is a large one: a rise of one standard deviation in the proportion of the peer group in receipt is associated with a reduction in the maths score of about the same size as individual receipt of FSM: 20 points.25 The results for Model 2 show that the estimated peer group effect halves when we measure social background by FSM receipt among just the other 15 year olds in the PISA sample for each individual’s school. The estimated coefficients for all other variables (except the education missing dummy) rise, sometimes quite appreciably (e.g. for individual FSM receipt, mother’s secondary education and books at home, where the coefficients change by at least one standard error).

These results need further investigation. A range of models needs to be experimented with, using different (including richer) specifications and exploring other measures of peer group composition. But the prima facie evidence is that measurement error bias in peer group effects is an issue to be taken seriously in analysis of PISA data.

25 Note that we put aside issues of interpretation of this coefficient, i.e. whether it really does represent the impact of peers or whether the peer group variable proxies other factors such as school selection or school resources.