No. 49 July 2001

Contents
The Internet as a mode of data collection in government social surveys: issues and investigations (John Flatley)
Using the Omnibus Survey to pilot the Arts Council of England's survey on arts attendance, participation and attitudes (Tania Corbin)
Developing a weighting and grossing system for the General Household Survey (Jeremy Barton)
The General Household Survey as a sampling frame: the example of the Survey of Poverty and Social Exclusion (Joanne Maher)
Evaluation of survey data quality using matched Census-survey records (Amanda White, Stephanie Freeth & Jean Martin)
Forthcoming conferences, seminars and courses
The National Statistics Methodology Series
Recent Social Survey Division Publications

The Social Survey Methodology Bulletin is produced primarily to inform staff in the Office for National Statistics (ONS) and the Government Statistical Service (GSS) about the work on social survey methodology carried out by the Social Survey Division (SSD) of the ONS. It is produced by the Social Survey Methodology Unit within SSD, and staff in the Division are encouraged to write short articles about methodological projects or issues of general interest. Articles in the bulletin are not professionally refereed, as this would considerably increase the time and effort needed to produce the bulletin: they are working papers and should be viewed as such. The bulletin is published twice a year, in January and July, and is free to staff within the ONS and the GSS. It is made available to others with an interest in survey methodology at a small cost to cover production and postage.

The Office for National Statistics works in partnership with others in the Government Statistical Service to provide Parliament, government and the wider community with the statistical information, analysis and advice needed to improve decision-making, stimulate research and inform debate. It also registers key life events. It aims to provide an authoritative and impartial picture of society and a window on the work and performance of government, allowing the impact of government policies and actions to be assessed.

Edited by: Lucy Haselden
Prepared by: Sarah Binfoh

The Internet as a mode of data collection in government social surveys: issues and investigations
John Flatley

1. Introduction

For those in the statistics business, web technologies offer the potential to speed up the process of collecting data, to reduce the costs associated with collection and processing, and to make data available more widely and more quickly. It is not surprising, therefore, that national statistical institutes (NSIs), such as the Office for National Statistics (ONS), are racing to embrace the Internet to enhance the service they deliver to government, other customers and citizens. This is the business imperative. In addition, governments have added political impetus. For example, in the United Kingdom (UK) central government has set a target for all public services to be available on-line by 2005. Similar, but more ambitious, targets have been set in Australia (2001) and Canada (2004). As the number of people with Internet access grows and people become accustomed to using it, there is likely to be increasing demand from the providers and users of official statistics to use the web for such purposes. Social Survey Division (SSD) of the ONS is examining the feasibility of using the web for the surveys that it undertakes.
Nationally representative surveys of the private household population, carried out on behalf of ONS or other government departments, form the core business of SSD. Such surveys employ random probability sample designs and are voluntary. Surveys undertaken by SSD tend to be carried out by trained interviewers using computer assisted interviewing (CAI). Computer assisted personal interviewing (CAPI) is often the single mode of data collection, though there are examples of surveys that combine CAPI with another mode. For example, the Labour Force Survey (LFS), which is a panel survey, uses CAPI for the first interview and offers a computer assisted telephone interviewing (CATI) option for subsequent interviews (there are five interviews in total). On both the Expenditure and Food Survey (EFS) and the National Travel Survey (NTS), following an initial CAPI interview, respondents are asked to self-complete a paper diary. SSD is also commissioned to sample other populations, such as school pupils, for which paper and pencil interviewing continues to have a role, though interest in using the web for such surveys may increase.

The characteristics of government social surveys of the general population do not readily lend themselves to web data collection. Social surveys of the private household population tend to have the following requirements:
• random probability sampling
• long and complex questionnaires, e.g. interviews of 1 hour or more are typical
• high response rates, e.g. 80% plus are a common requirement

These features contrast with common practice on web surveys and raise questions about the potential use of the Internet for general population surveys. ONS, like other NSIs, has begun to consider the issues that are raised by the possible collection of data via the web on social surveys of the general population. This paper draws on initial work in which, following a literature review and discussions with others already working in the field1, we have identified a number of key issues. We start by considering the issue of population coverage and sampling and present the latest government statistics on Internet penetration.

2. Population coverage and sampling

2.1. Use of the Internet

The National Statistics Omnibus Survey began carrying questions on the Internet on a regular basis in July 2000; they have since been asked once every three months. The Omnibus Survey has a random probability sample design such that it yields results which are nationally representative of all adults in Great Britain. According to the January 2001 Omnibus Survey, an estimated 23 million adults in Britain have used the Internet at some time, either at their own home or elsewhere, such as at work or another person's home (ONS, March 2001). This is equivalent to half (51%) of the adult population in Great Britain and compares with a figure of 45% in July 2000 (ONS, September 2000). Eight in ten adults who had ever used the Internet by the time of the latest survey (January 2001) had accessed it in the month prior to the interview. This represents some 40% of Britain's adult population. The Omnibus Survey shows that a higher proportion of men (57%) had used the Internet than women (45%). This gender gap is consistent across age groups. As expected, the proportion using the Internet declines with age: for example, 15% of those aged 65-74 had used the Internet, and just 6% of those aged 75 and over.
Overall, two-thirds of all adults who had ever accessed the Internet were aged under 45. Between July 2000 and January 2001 the largest increase in Internet usage was recorded for the youngest age group (up from 69% to 85%). This contrasted most strongly with those aged 75 years and over, among whom there was no growth recorded (6% at both times).

2.2. Home Internet access

Another source, the Family Expenditure Survey (FES), can provide nationally representative estimates of the number of households with home Internet access. Figures for the FES refer to the whole of the UK. A question on home Internet access has been included on the FES since April 1998. Results on Internet access are produced on a quarterly basis and the latest figures relate to the final quarter of the year 2000 (October-December). Up to March 2000 this question was asked only of households who had already said that they had a home computer. Because new methods of accessing the Internet had begun to be taken up, the levels of Internet access recorded up to the first quarter of 2000 may therefore be understated. From April 2000 all households have been asked about home access to the Internet and, if they have it, the means of access.

In the first quarter in which such questions were included (April-June 1998), 9% of UK households were estimated to have home Internet access. This is equivalent to some 2.2 million households. Since the first quarter of 1999, the survey has recorded a steady increase in the proportion of households with access to the Internet at home. The latest estimates, for October-December 2000, are that 8.6 million households now have home access. This is equivalent to one in three UK households (35%) and is a four-fold increase on the comparable figure in 1998. Analysis of the latest quarter shows that the vast majority of those with home Internet access (98%) use a home computer to access the Internet. Take-up of new technologies, such as WAP phones and digital televisions, has yet to make a large impact. Over time the growth in the use of these new technologies may be the driving force behind increases in home Internet access. With the planned withdrawal of analogue transmitters by the end of this decade it is possible that nearly all UK households will have access to digital televisions and, possibly, as a by-product, home Internet access. This may have implications for the web survey designer. Currently such forms of web access do not have the same functionality as computers and few of the existing web surveys would run on such a platform. Another implication relates to possible differences in the profile of users. As yet there is little research on these issues.

2.3. The digital divide: unequal access to the Internet

The FES demonstrates that the proportion of households with home access to the Internet varies by a range of social and demographic factors. For example, FES data show that there is a strong relationship between household income and home Internet access. Over the financial year 1999-2000, the percentage of households with access to the net at home was lowest amongst households in the four lowest deciles (ranging between 3% and 6%). In both the fifth and sixth decile groups 15% of households had home Internet access. Amongst the remaining deciles access increased steadily with income, for example rising from 22% among those in the seventh decile to 48% for those in the highest decile (ONS, July 2000).
On the basis of two years' data it is difficult to comment on the extent to which there is evidence of a narrowing of the gap between lower and higher income groups. In the last two years there was growth in home Internet access across all income decile groups and the pattern of unequal access was broadly similar.

Analysis of the January 2001 Omnibus Survey has shown a clear relationship between usage and social class. Here social class is based on the occupation, or previous occupation, of the household reference person2. For example, the proportion of adults who had used the Internet was higher than average (51%) among those living in households headed by a person in Social Class I (professionals and managers) or Social Class II (intermediate non-manual): 78% and 65% respectively. This contrasted with those whose household head was in one of the manual social classes, where the proportions were as follows: 37% for those in Social Class III manual, 33% for those in Social Class IV partly skilled manual and 27% for those in Social Class V unskilled manual (ONS, March 2001).

Analysis of the 1999-2000 FES has shown that the households with the highest levels of access in that year were those comprising couples with children (31% of couples with one child and 35% of couples with two or more children). This compared with an average figure of 19% for the whole of the UK at that time. A lower proportion of lone-adult households had Internet access (7% for those with one child and 11% for those with two or more children). It is interesting to note the general pattern that, irrespective of the number of adults, rates of access were higher for households with two or more children compared with those containing one child. As expected, the lowest rates of access were found for households comprising people who were retired and living on their own or as couples (1% and 5% respectively) (ONS, July 2000).

While access to the Internet remains at a relatively low level, there is no prospect of the web replacing other modes of interviewing on high quality general population surveys, which are required to be nationally representative. Even if access were to reach a near universal level, as is the case with telephones (according to the General Household Survey, 96% of households have one), there is another challenge: the issue of sampling.

2.4. Sampling issues

Random probability sampling requires that each member of the population of interest has a known (non-zero) chance of being selected for the survey. In the UK there is no central register of the population to act as a sampling frame, as there is in some Scandinavian countries. For the population of adults living in private households, however, there are proxies such as the Postcode Address File (PAF) and the Electoral Register (ER). In SSD, most of the general population survey samples are drawn from the PAF. Most of our surveys are carried out by our field force of trained interviewers using computer assisted interviewing (CAI) with laptops. Selected addresses are visited by interviewers to establish whether or not the address listed contains a household that is eligible for the survey and, if so, to attempt to persuade potential respondents to participate in the survey. Surveys that use face-to-face interviewing tend to find that around 11-12% of addresses listed in the PAF are ineligible.
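To make the requirement of a known, non-zero selection probability concrete, the sketch below draws an equal-probability sample from a hypothetical address frame and derives the corresponding design weights. It is a minimal illustration only: the frame size, sample size and address labels are invented, and a real PAF-based design would also involve stratification and clustering.

```python
import random

# Hypothetical address frame standing in for a PAF extract (invented size).
frame = [f"address_{i}" for i in range(250_000)]
n = 2_000  # illustrative sample size

# Simple random sample without replacement: every address on the frame
# has the same known inclusion probability, n / N.
sample = random.sample(frame, n)
p = n / len(frame)

# The design weight is the inverse of the selection probability: each
# sampled address stands for 1/p addresses on the frame.
design_weight = 1 / p
print(f"inclusion probability = {p:.4f}, design weight = {design_weight:.0f}")
```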
Some surveys over- or under-sample particular groups, but because the chance of selection is known, weighting strategies can be devised to adjust for the known survey bias that would otherwise result.

There is currently no single listing of people with access to the Internet. Email lists exist for membership associations, such as the Association for Survey Computing (ASC) or the Social Research Association (SRA), and for employees of businesses and other organisations. If these are reasonably comprehensive and up-to-date, such lists could be used to invite all, or a sample of, members to respond to a web survey. No such list of the general population exists here in the UK; nor is one likely in the foreseeable future. Even if a list were to become available for sampling purposes, we would have difficulty in ensuring that each sampled individual had a known chance of being selected, because individuals may possess more than one email address at any one time. Some people with web access may not use email, even if they were required to set up an email account by an Internet Service Provider (ISP). Further, as people move from one ISP to another there is potential for email addresses once used to become dormant. We know little about the general public's use of email addresses and we do not have good answers to questions such as:
• What is the average number of email addresses that a household or individual maintains?
• What is the average life of an email address?
• What is the rate at which households or individuals acquire new email addresses?

In addition, because of the great variety in the nomenclature of email addresses, there is little prospect of being able to generate email addresses at random in the manner that is possible, though not without problems, for random digit dialling.

3. Design and implementation issues

3.1. Screens and scrolling

There are two main approaches to questionnaire construction that have conventionally been used on web surveys. With a screen-based approach, usually one question is displayed on each screen and each time a respondent answers a particular question the screen refreshes to display the next question. Scrolling refers to the situation in which the whole of the questionnaire is available at one time, the respondent moving up and down using a scroll bar to the right of the screen. The choice of which method to use is often related to the way in which respondents are invited to complete the survey. Screen-by-screen presentations are often used with on-line surveys, where there is the need for continuous contact between the respondent's computer and a web server. Off-line surveys often use a plain Hypertext Mark-up Language (HTML) form which the respondent is able to download from a website or receive as an email attachment. The pros and cons of on-line and off-line approaches are discussed later; here we are concerned simply with design issues.

It has been argued that scrolling is more akin to the way in which people use computers and surf the web and for that reason should be preferred (Dillman, 2000). It is true that experienced web users are likely to find a screen-by-screen presentation more cumbersome. They may also be frustrated if they feel their freedom to navigate is constrained. This may be compounded if the time delay between individual screens refreshing is perceived by respondents to be excessive.
Another advantage of scrolling is that respondents can more easily form a perception of the content and the length of the questionnaire before they begin their task. This makes scrolling more like conventional paper self-completion questionnaires, where the respondent can easily flick through the document before answering and can form a judgement about how long it will take to complete. For these reasons, Dillman argues that scrolling should be preferred whenever web surveys are being used to replace paper self-completion methods. However, if web surveys are being used in a mixed mode environment, alongside CAPI or CATI, then the case for using a screen-by-screen construction is stronger: one could argue that it is more akin to the way in which questions are 'served' and answered in these modes. Off-the-shelf software is increasing the choices available to questionnaire developers and greater flexibility in questionnaire construction is becoming possible.

3.2. On-line versus off-line interviewing

An on-line survey is one in which the respondent accesses the questionnaire instrument on a website and completes it, via their web browser, while connected to the Internet. Respondents can be directed to the URL either through paper instructions, for example sent by regular mail or left by an interviewer, or in an email, possibly with a hyperlink. On-line surveys need constant contact between the respondent's computer and the web server upon which the instrument is located, so one needs to consider whether or not this requirement is likely to be acceptable to respondents. For Intranet surveys, for example of employees of a business or other organisation, this is likely to be perfectly acceptable. One proviso is that the connection speed between the respondent's computer and the web server must be sufficient to ensure that there are no unacceptable delays in the processing of individual questions.

However, on-line methods raise a number of issues for general population surveys. The first of these concerns the length of time that a respondent is required to be on-line to complete the survey. This is likely to be of particular concern to those who pay by the minute for their time on-line. One could offer to compensate respondents for the cost of their time on-line, but more research is needed to find out whether or not this would be perceived as an adequate incentive by all respondents. As outlined above, on-line surveys can be problematic if the time taken for the response to one question to be transmitted to the server, and for the server to respond, is deemed excessive. For general population surveys, where few households currently have high speed Internet access, this is a key concern. This suggests that on-line surveys are only feasible for general population surveys where there are few questions being asked and the duration of the interview is short.

In an off-line survey, respondents may initially go on-line to download a file containing the survey instrument to their local computer, complete the survey off-line, and then go back on-line to transmit the completed questionnaire to the host. An alternative is for respondents to be sent a file, either by regular mail on CD or floppy disk or as an attachment to an email. Respondents need to install the file on their computer before answering the questionnaire. Once the questionnaire is completed, respondents can go on-line to return it.
There are other concerns, such as security or computer functionality, that may be relevant to a decision about on-line or off-line surveys. Respondents may also need greater competence in the use of the Internet to complete all the required tasks than is the case for on-line surveys. Off-line surveys are preferred when the survey is longer and more complex. However, off-line surveys can introduce other concerns. For example, the time that it takes for the questionnaire instrument to download to the respondent's computer may be perceived to be excessive. In addition, respondents may be unwilling to download files to their own hard drive, worried that they may contain viruses or otherwise damage their computer. A possible alternative to downloading is to send respondents a file via regular mail, for example on CD or floppy disk, or as an email attachment. However, as with the download method, this still requires some degree of trust in the integrity of the files being sent. It adds to the burden on respondents and for some may be enough to dissuade them from participating. Off-line surveys are likely to work best when the files that need to be installed on a respondent's computer are relatively small. This means that there may be less opportunity to build complex instruments, for example with extensive interactive edits and multi-media, if as a result respondents are required to download large computer files.

3.3. Plain HTML versus advanced programming

The simplest approach to designing web questionnaires is to provide a plain HTML form that respondents can access via their normal browser. However, plain HTML is limited. It is necessary to supplement HTML with the use of advanced programming languages, such as Java, to allow the use of interactive edits and routing and the addition of features such as pop-up help and progress bars. One could argue that the possibility of including such features represents a radical step forward compared with what is possible with conventional paper self-completion forms. However, the use of advanced programming languages can be problematic, since different browsers may be inconsistent in the way in which they interpret Java and JavaScript. Further, the addition of such features necessarily results in the file size of the instrument growing. This impacts on download times and increases the risk of non-response. It is worth noting that usability tests performed by the US Census Bureau suggested that respondents rarely made use of on-line help. This was true irrespective of how it was presented, for example via buttons, hyperlinks or icons (Kanarek and Sedivi, 1999).

3.4. Security

Security is a major issue for government surveys. Respondents to official surveys need to be assured that the data that they give us will be kept secure. At the same time, we need to devise systems that protect the integrity of our own data from hackers (Ramos, Sedivi and Sweet, 1998; Clayton and Werking, 1998). The US Census Bureau has developed a three-level security system including the use of encryption, authentication and a firewall. Encryption provides strong security of data on the Internet while it passes between the respondent and the Bureau. Their encryption solution requires respondents to have version 4.0 or higher of either Netscape Communicator or Microsoft's Internet Explorer and enables strong 128-bit encryption.
Once received at the Census Bureau, the data are protected by a firewall that ensures outsiders cannot gain access to them (Kanarek and Sedivi, 1999). There is a tension between the strength of security and survey response. The experience of the US Census Bureau has been that the more stringent the security, the lower the response. In part, this appears to be related to technical issues, such as respondents not having the required browser version or encryption level. It is also possible that requiring the use of usernames and passwords to access a web survey may be problematic for some respondents (Sedivi, Nichols and Kanarek, 2000).

3.5. Data quality

One of the attractions of web-based data collection, compared with paper self-completion, is the improvement in data quality that is offered by the use of CAI methods. If the technical issues noted above can be overcome, web CASI opens up the possibility for more complex questionnaires to be self-administered than is possible with paper-based approaches. Complex routing can be incorporated in a way that is not feasible in a paper document. Computations and edits can be carried out at the time of the interview and inconsistent or improbable responses checked with the respondent. This is analogous to the move from pencil and paper methods to CAPI and CATI.

An important aspect of data quality is the reduction of measurement error, which arises when inaccurate answers are recorded for respondents. There are different sources of measurement error, such as poor question wording and differences arising from the mode in which the interview is conducted. It is possible that there will be mode effects arising from the switch from other modes to web CASI. For example, compared with CAPI or CATI, there is a difference in the way in which information is presented to respondents: a move from aural to visual presentation. The speed with which respondents read and respond to questions cannot easily be controlled. Clearly more research is needed in this area to examine such questions.

3.6. Survey response

Currently, web surveys have tended to achieve lower response than mail surveys. The US Census Bureau has found no evidence from its trials, largely among businesses, that offering a web option in addition to other modes leads to an overall increase in response rate (Sedivi, Nichols and Kanarek, 2000). Part of this is likely to be accounted for by technical problems, discussed earlier. Another possible explanation offered relates to respondents' concerns about privacy and confidentiality (Couper, 2000).

Response rates to self-completion surveys tend to be lower than to interviewer-administered surveys and there is every reason to expect that this will be the case for web surveys too. CAPI and CATI achieve higher response than mail surveys because of interviewer involvement in the initial recruitment of respondents and in sustaining respondent involvement. Interviewers can be good at persuading respondents to participate by explaining the value of the survey and of the respondents' participation. During the interview itself, interviewers can develop a rapport with respondents that encourages them to continue. For conventional surveys, the proportion of respondents who refuse to complete an interview, once started, is low. This contrasts with web surveys.
Opinion on the optimal length of web surveys is divided, but anecdotal evidence suggests that 15-20 minutes is as much as one can expect. This compares with CAPI/CATI surveys that can range from 30-40 minutes up to one hour or more.

One interesting feature of the analysis of the profile of the on-line population, presented earlier, is the over-representation of younger adults. This is one sub-group which tends to be under-represented in random probability sample surveys of the general population. Web CASI might be a way to increase response in this sub-group. However, there is evidence to suggest that it is not a good idea, in mixed mode surveys, to offer respondents a choice of response mode. For example, the US Census Bureau tested whether or not response was boosted by offering both mail and telephone options. Overall 5% of the sample responded by phone, but the overall level of response (mail and phone combined) was the same as when only a mail option was offered (Dillman, Clark and West, 1995).

Questionnaire designers need to be aware that the use of more advanced techniques may impact on respondents' ability to respond. The more complex the questionnaire instrument is, and the more elaborate the programming embedded within it, the greater will be the size of the file that respondents need to download to run the questionnaire. This will increase the time it takes to download the questionnaire and is very likely to affect response.

3.7. Design principles

It has been argued that the same visual design principles that apply to the design of paper questionnaires apply to web surveys (Dillman, 2000). Respondents have the same requirement for information that is presented clearly and efficiently. Layout and design should aim for respondents to read every word of each question, in a prescribed order. Dillman has developed a set of 14 design principles for web surveys. These are:
1. Introduce the web questionnaire with a welcome screen that is motivational, emphasises the ease of responding, and instructs respondents about how to proceed to the next page
2. Provide a PIN number for limiting access only to people in the sample
3. Choose for the first question an item that is likely to be interesting to most respondents, easily answered, and fully visible on the welcome screen of the questionnaire
4. Present each question in a conventional format similar to that normally used on paper self-administered questionnaires
5. Restrain the use of colour so that figure/ground consistency and readability are maintained, navigational flow is unimpeded, and measurement properties of questions are maintained
6. Avoid differences in the visual appearance of questions that result from different screen configurations, operating systems, browsers, partial screen displays, and wrap-around text
7. Provide specific instructions on how to take each necessary computer action for responding to the questionnaire, and give other necessary instructions at the point where they are needed
8. Use drop-down boxes sparingly, consider the mode implications, and identify each with a "click here" instruction
9. Do not require respondents to provide an answer to each question before being allowed to answer any subsequent ones
10. Provide skip directions in a way that encourages marking of answers and being able to click to the next applicable question
11. Construct web questionnaires so they scroll from question to question unless order effects are a major concern, or when telephone and web survey results are being combined
12. When the number of answer choices exceeds the number that can be displayed in a single column on one screen, consider double-banking with an appropriate grouping device to link them together
13. Use graphical symbols or words that convey a sense of where the respondent is in the completion process, but avoid those that require a significant increase in computer resources
14. Exercise restraint in the use of question structures that have known measurement problems on paper questionnaires, such as check-all-that-apply and open-ended questions

Another view is that the web is a fundamentally different medium from paper and that much more research is required before we can determine optimal designs (Couper, 2000). Couper suggests that design issues interact with the type of web survey being conducted and the population which is being targeted. For example, an optimal design for a survey of teenagers may be quite different from that required for a survey of elderly people.

The variety of computer hardware and software that respondents may use to access the survey also needs to be kept in mind. A range of factors, such as operating system, browser type and version and Internet connection speed, may affect the functionality of the survey instrument. Most of these are outside our control. The result may be that some respondents are not able to access the questionnaire at all, while others may find it slow to respond or find that the questionnaire does not display as intended.

4. Conclusion

This paper has been concerned with the use of the web for government social surveys of the general population. Such surveys are required to yield nationally representative estimates. The fact that the proportion of households with Internet access remains low, and that the on-line population differs in important respects from the general population, presents a major challenge. Issues of coverage and sampling mean that it is not possible to achieve good population coverage with stand-alone web surveys. This suggests that in the immediate future the most likely application of the web in official social surveys is as part of a mixed mode data collection strategy alongside others, such as mail, CAPI or CATI.

It is worth noting that, to date, it is with surveys of businesses that NSIs have made most progress in using the Internet as a mode of data collection. Rates of Internet access among businesses are relatively high (compared with households) and an increasing number are using the net to conduct their business. Most of the surveys that have offered a web option are relatively short (compared with social surveys) and self-completed. Here the web offers the potential to reduce costs, speed processing and improve data quality (e.g. with the use of interactive edits). A number of the issues raised in this paper are common to both surveys of businesses and social surveys of the general population. However, the challenges are greater for social surveys. For general population surveys that have self-completion elements there is growing interest in reporting via the web.
Offering a web option for parts of the survey that are interviewer-administered raises even more challenging questions, such as the effect on overall response rates and comparability with data collected by other modes. NSIs are starting to grapple with the issues raised in this paper and the ONS has established a project (the web CASI project) within SSD to investigate these issues and take this work forward. Some of the points raised in this paper will be explored in a trial study in which former respondents to one of ONS's surveys, who reported having home Internet access at the time they were interviewed, will be approached to participate. The results of our work will be shared within the social survey research community to contribute to the development of our knowledge in this area.

This paper was first presented at the ASC conference 'The Challenge of the Internet' in May 2001.

References

Couper, M.P. (2000) 'Web Surveys: A Review of Issues and Approaches', Public Opinion Quarterly, Volume 64, American Association for Public Opinion Research.

Clayton, R.L. and Werking, G.S. (1998) 'Business surveys of the future: the World Wide Web as a data collection methodology', in Couper, M.P., Baker, R.P., Bethlehem, J., Clark, C.Z.F., Martin, J., Nicholls II, W.L. and O'Reilly, J.M. (eds) Computer Assisted Survey Information Collection, John Wiley and Sons.

Dillman, D., Clark, J.R. and West, K.K. (1995) 'Influence of an invitation to answer by telephone on response rates to census questionnaires', Public Opinion Quarterly, Volume 58, American Association for Public Opinion Research.

Dillman, D. (2000) Mail and Internet Surveys, John Wiley.

Kanarek, H. and Sedivi, B. (1999) 'Internet Data Collection at the U.S. Census Bureau', paper to the Federal Committee on Statistical Methodology conference.

Office for National Statistics (July 2000) First Release: Internet Access, 1st Quarter 2000.

Office for National Statistics (September 2000) First Release: Internet Access.

Office for National Statistics (March 2001) First Release: Internet Access.

Ramos, M., Sedivi, B. and Sweet, E.M. (1998) 'Computerised Self-Administered Questionnaires', in Couper, M.P., Baker, R.P., Bethlehem, J., Clark, C.Z.F., Martin, J., Nicholls II, W.L. and O'Reilly, J.M. (eds) Computer Assisted Survey Information Collection, John Wiley and Sons.

Sedivi, B., Nichols, E. and Kanarek, H. (2000) 'Web-based collection of economic data at the U.S. Census Bureau', paper presented to the ICES II Conference.

Notes

1 This paper draws on the work already undertaken by other NSIs, especially in the United States and Canada, and the author is grateful to those who have so willingly shared their experience. In particular, I am grateful to Howard Kanarek and Barbara Sedivi Gaul of the U.S. Bureau of the Census and other participants of the Web Surveys session at the recent FEDCASIC workshop in Washington DC for their helpful insights.

2 The person in the respondent's household in whose name the accommodation is owned or rented, or, if there are two or more such people, the one with the highest personal income.

Using the Omnibus Survey to pilot the Arts Council of England's survey on arts attendance, participation and attitudes
Tania Corbin

1. Introduction

In order to test questions for inclusion in a survey, a pilot study is often carried out.
Although this provides useful information about how an interview might work at the main stage, it is limited in its use for in-depth question testing. Piloting will give an overall impression of whether a set of questions 'work' but will not give insight into how questions are understood. One way of testing how questions are understood is by use of the cognitive process model. This paper describes how the cognitive process model was used to test questions for a module on the National Statistics (NS) Omnibus Survey1 for the Arts Council of England in November 2000.

2. The Cognitive Process Model

Reliable survey results depend on respondents providing accurate and relevant responses to survey questions. However, this is not as straightforward as often thought, because when asked a question respondents go through a three-stage cognitive process before giving a response (Willis, 1994). The first of these stages is comprehension of the question: what do respondents think the question is asking, and how are they interpreting specific words and phrases in it? The next stage is the retrieval of relevant information from memory: what types of information do respondents need to recall to be able to answer the question, and what strategy do they employ to recall it? For example, do they count events by recalling each one individually, or do they use an estimation strategy? The final stage in the process involves respondents making decisions about their answer. Having understood what they think the question is asking and retrieved the relevant information, respondents have to decide how to present their response. Are they willing to devote enough mental effort to answering the question accurately and are they truthful in their response? Do they amend their response in order to present themselves in a better light?

3. Background to the Arts Council Omnibus Pilot

The aim of the Arts Council module on the Omnibus was to pilot a set of questions to assess the frequency of attendance at, and participation in, arts and cultural events and activities, and to measure attitudes towards the role, value and public subsidy of the arts. The results of the pilot will inform a large-scale survey to be carried out by ONS during 2001. The only national data currently available on arts attendance are those provided by BMRB's Target Group Index (TGI), which covers only eight types of arts activity2. The Arts Council is interested in collecting data on the complete range of arts and crafts which it funds, as well as on some non-funded activities. A wide-ranging survey was carried out in 1991 (Arts Council of Great Britain, 1991), but this needed updating to reflect the changing role of the arts in England and to provide a picture of people's current engagement with the arts. In addition, the Arts Council is required to report on attendance at arts events as part of its Funding Agreement with the Department for Culture, Media and Sport (DCMS).

The Omnibus Survey pilot data were analysed by the Question Testing Unit3, who were able to draw on their experience of cognitive interviewing and analysis. This article describes the advantages of using this combination of methods and the value that was gained for the Arts Council. The pilot module was designed to allow the Arts Council to assess the questions by asking both the interviewers and respondents how well they worked.
The data came from three different sources:
• questions in the administrative section of the questionnaire asking interviewers to record how respondents had dealt with the showcards used in the module and how they had reacted to the module as a whole4
• 'cognitive questions' placed within the body of the questionnaire aimed at gaining information about the thought processes respondents went through to arrive at their answer5
• general comments on the module by the interviewers highlighting elements of the questionnaire they had found problematic and suggesting ways in which they felt it could be improved.

Because the pilot module included cognitive questions it provided not only structural data about the questionnaire and survey process but also information about:
• how respondents interpreted questions,
• whether or not respondents were able to recall the information required to answer questions, and
• whether or not respondents found questions sensitive or embarrassing.

4. Using the cognitive process model on the Arts Council Pilot

Problems with survey questions can often be located in one or more of the steps in the cognitive process discussed above. The cognitive questions in the Arts Council pilot module revealed no major instances of stage 1 problems, apart from a few respondents who thought that the phrase 'public funding', when applied to the arts, meant that it was 'financed by means of ticket sales'.

The Arts Council module included questions about respondents' frequency of participation in and attendance at arts and cultural events that relied on respondents finding a successful strategy for recalling this information. Respondents were usually able to provide an answer to these questions with little difficulty and the pre-coded responses appeared to work well. However, a later question6 asked respondents how easy or difficult they had found it to answer these questions, and responses here varied. It was common for respondents who rarely attended arts and cultural events to report that recall of frequencies had been fairly easy: "Very easy, I never go to these sorts of things", while those who were very interested or active in the arts had found it more difficult. In other words, the question worked at the second stage in the cognitive process for some, but not all, respondents. As the majority of respondents did provide an answer to the frequency questions, it is likely that some of the answers were based on estimates rather than the recall of individual events. Findings such as these provide a context for the substantive data and could be used to amend questions to make them work better for a larger number of respondents.

Questions which may seem innocuous can nevertheless be sensitive. For respondents who were less interested in the arts, questions on participation and attendance at cultural events caused feelings of guilt and embarrassment, and in one case were the cause of an argument between the respondent and her spouse. There was evidence that people who had not taken part in any of the listed activities attempted to show themselves in a better light by trying to fit the things they had done into the available pre-codes.
For example, a number of respondents said that they had not read anything on a list of novels but then qualified this by saying that they had read other books, autobiographies or texts which could be classed as novels.7 It seemed important to these respondents to be able to say that they had participated in some artistic activities. Respondents who reported doing nothing related to the arts or culture often expressed concern at being thought of as "thick" or "uncultured".

5. Lessons learned from the pilot

Incorporating cognitive questioning in an NS Omnibus module has proved to be an effective and efficient strategy for piloting a questionnaire. Not only have our customers at the Arts Council been provided with information about the survey process and some substantive results, but also with insights into how respondents understood some of the questions. The latter have been particularly useful in identifying questions requiring amendment and in suggesting the form the amendments should take. As a result of the pilot, activities and events such as visiting a library, museum or gallery, a stately home, castle or palace, or well-known gardens have been added to the survey with the aim of making it more inclusive and less off-putting for respondents. The questionnaire will also ask respondents whether they have 'read for pleasure' in the last 12 months. Those who say they have will then be asked for details of the kind of books they have read. More generally, the language used on showcards to describe events and activities has been simplified and the next survey will be accompanied by more detailed interviewer instructions.

6. References

Arts Council of Great Britain (1991) Extracts from the 1991 RSGB Omnibus Survey. Available on-line only at www.artscouncil.org.uk/publications/pdfs/rsgb.pdf

Willis, G.B. (1994) Cognitive Interviewing and Questionnaire Design: A Training Manual, March 1994.

Corbin, T. and Ursachi, K. (2001) Report on the Arts Council Pilot Survey carried out in November 2000, National Statistics Omnibus Survey, January 2001.

Notes

1 The Omnibus is a multi-purpose survey developed by ONS which allows customers to receive results quickly while retaining the hallmarks of high quality: a random probability sample, high response rates and advice on questionnaire design. It is now the only Omnibus survey available that uses a stratified random probability sample. See the website at www.statistics.gov.uk.

2 These are: plays, opera, ballet, contemporary dance, jazz, classical music, art galleries/exhibitions and 'any performance in a theatre', plus film.

3 The Question Testing Unit use a variety of qualitative methods, including cognitive and in-depth interviews, focus groups, expert reviews and forms design, to improve survey and questionnaire design and quality.

4 Interviewer questions:
a "Could you tell me about any problems you had using the showcards, e.g. were some of them too long?"
b "Finally, could you comment on whether the respondent found this module interesting. Were there any parts they seemed to particularly enjoy or not enjoy?"

5 Respondent questions:
a "Earlier we asked you about what arts and cultural events you had been to. Was there anything else that you thought of as arts or culture which should have been mentioned but wasn't?"
b
"Thinking now about the questions which asked you how often you had been to these events and how often you participated in arts and cultural activities. Was it easy or difficult to remember the number of times you had been to or participated in these events?"
c "Thinking now about the question which asked you about your attitude towards the public funding of the arts in this country as a whole. Can you tell me what types of arts/cultural events you were thinking about when you answered this question?"
d "And may I just check, what is your understanding of public funding? What I mean is, where do you think the money comes from?"

6 "Thinking now about the questions which asked you how often you had been to these events and how often you participated in arts and cultural activities. Was it easy or difficult to remember the number of times you had been to or participated in these events?"

7 The question about reading referred to in the text was: I would like you to look at this card and tell me which, if any, of these things you have done in the last 12 months (Please include things like community events and festivals but exclude anything you have done as part of your job, at school or 6th form college).
a Buy a novel, other work of fiction, play or book of poetry for yourself
b Read a novel, other work of fiction, play or book of poetry
c Write any stories or plays
d Write any poetry
e None of these

Developing a weighting and grossing system for the General Household Survey
Jeremy Barton

1. Introduction

This is a report of methodological work carried out in response to the recommendations of the General Household Survey (GHS) review conducted in 1997. The study investigated the effect of incorporating two types of weighting procedure on the General Household Survey to adjust for nonresponse bias and sampling bias. It follows on from a previous report by Elliot (1999), which investigated the use of a single type of weighting procedure (calibration weighting). In the analyses presented here, we investigated pre-weighting the sample using simple nonresponse weights (sample-based weighting) prior to computing calibration weights (population-based weighting). The nonresponse weight groups were created using GHS data matched with 1991 Census records. Two sets of weights were produced: one based on calibration weighting alone and one produced by pre-weighting with Census-linked nonresponse weights prior to calibration. This article details the methodology used to form the nonresponse groups and shows the probable effect the sets of weights will have on nonresponse biases.

2. Background

This paper deals with weighting methods, the methodology generally applied to compensate for missing data due to total nonresponse (all survey information for an individual or household is completely absent), and not item nonresponse (information for particular survey questions is missing); the latter tends to be dealt with using imputation methods. Weighting for nonresponse involves applying a weight to each of the respondents so that they also represent a certain number of nonrespondents that are similar to them in terms of survey characteristics. The size of the weights will depend on the level of under-representation of respondents in predetermined groups known as 'weighting classes'. The weighting classes tend to be based on characteristics that are known to be correlated with response. Two forms of weighting procedure are covered by these analyses.
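Both procedures rest on the same mechanics: every respondent in a weighting class receives a common weight that restores the class to its proper share of the total. The sketch below illustrates this class-based adjustment in minimal form; the class labels, population totals and sample counts are invented for illustration only.

```python
# Hypothetical weighting classes with known population totals and the
# achieved responding sample count in each class (all figures invented).
population = {"male 16-44": 10_000_000, "female 16-44": 10_200_000,
              "male 45+": 9_500_000, "female 45+": 11_300_000}
respondents = {"male 16-44": 2_100, "female 16-44": 2_600,
               "male 45+": 2_400, "female 45+": 3_100}

# Each respondent in a class gets the same weight: the class population
# total divided by the responding count, so the weighted sample
# reproduces the population distribution across the classes.
weights = {c: population[c] / respondents[c] for c in population}
for c, w in weights.items():
    print(f"{c}: weight = {w:,.0f}")
```

The two procedures differ in where the class definitions and the control totals come from, as the following paragraphs describe.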
Firstly, sample-based weighting procedures use information known about the non-respondents to calculate the weights. Herein lies the main problem of the method: by definition, nonrespondents will have little useful information recorded about them in the survey that can be used for this purpose. Population weighting adjustment, the second methodology covered, uses auxiliary information to measure the extent to which the distribution of the responding sample over several variables differs from the distribution of the known population over the same variables (and time period). This method is sometimes known as post-stratification. The known population distributions are usually obtained from a reliable source such as the Census mid-year estimates or, as in these analyses, the population projections (adjusted by the latest known population estimates) used to gross up the Labour Force Survey (LFS). Accurate and up-to-date population data tend to be available only for a small number of variables such as age, sex and region; these are the ones used for these analyses.

Not all SSD surveys presently weight to compensate for non-response, but two that do are the LFS and the Expenditure and Food Survey (EFS). The LFS uses a population-based weighting procedure, grossing up to population estimates within local authorities, age and sex groups, and regions. The LFS can use local authorities in this procedure because it has a relatively large sample size (approximately 60,000 households per quarter). It uses the calibration method to calculate the weights. Calibration is explained more fully in section 5.

The EFS uses a sample-based weighting procedure combined with a population-based procedure. The sample-based weights are calculated using the results of the Census-linked nonresponse study (Foster, 1994). The population-based weights then adjust the sample-based weights so that the sample distribution matches the population distribution in predefined weighting classes based on region, age-group and sex. The calibration method is also used to produce these weights. Variations of both the LFS method and the EFS method, as applied to the GHS, are investigated in this paper.

2.1. Problems with the use of weighted data

The main aim of nonresponse weighting is to reduce nonresponse bias and hence to increase the value of the survey data. There are some disadvantages, however. Some weights can be time-consuming to calculate and some statistical procedures can be more difficult to apply and interpret when the data are weighted. Some consideration must also be given to the presentation and interpretation of time series and trend data. Weighted data cannot be meaningfully compared with unweighted data from previous years: any change in the estimates may be due solely to the introduction of weighting. It is not uncommon to present weighted and unweighted data together in the same report.

3. Data sources

Two sources of data were used in this study: one for the sample-based procedure and one for the population-based procedure.

3.1. Sample-based data sources

We used information from the 1991 Census linked (at address level) to 1991 GHS survey data. Using Census information restricts us to using survey data from around the time the Census was carried out, i.e.
April 1991, since we cannot assume that household information correct in 1991 will remain correct in 1998 (or, perhaps more importantly, that the same household is still living at the same address). However, we can compute weights based on 1991 data and use these to compensate for nonresponse on surveys in years after this. This makes the assumption that relative nonresponse in the weighting classes has remained the same between 1991 and the year the survey was carried out. Using Census data has the advantages that it provides a greater number of variables from which to create weighting classes and that many of these variables will be good predictors of nonresponse.

We also considered using information from the postcode address file (PAF) sampling frame, current to the year of the survey (1998), to form weighting classes, since information for both respondents and nonrespondents is present. This has the benefit of providing up-to-date information, but with the drawback that only a small number of variables would be available to produce weighting classes (region and area type: metropolitan/non-metropolitan and ACORN1). Another drawback is that the PAF will only contain information at the postcode unit or postcode sector2 level and not at the sampling unit (i.e. address) level. Our analyses (not presented in this report) concluded that weights based on this method were no different from those based solely on calibration methods. However, further investigation of PAF-based nonresponse weighting should be considered.

3.2. Population-based data

The population-based source of data we used was the LFS control totals taken from current population projections. These are projections based on mid-year estimates, and have been adjusted to exclude residents of institutions (from Census estimates), since these people are outside the scope of the GHS.

4. Deriving weighting groups using Census-linked data

4.1. The Census-linked dataset

A GHS dataset from January to June 1991 was linked to Census information at address level from the same year. This provided information on the Census characteristics of respondents and non-respondents to the GHS. Foster (1994) first used these data in a study on Census-linked nonresponse3. In order to be able to make recommendations for the current GHS we had to make sure that any variables we used to calculate the nonresponse weights were comparable to variables on the current (1998) survey. This meant that some categories needed redefining on some variables; for example, Standard Region had since become Government Office Region (GOR), with slightly different boundaries. Variables used in the Census-linked analysis of response were restricted to those which had GHS equivalents (in 1998) and which showed fairly close agreement with the GHS equivalent in 1991 (see Foster, 1994). Table 1 shows the variables.

Table 1: Variables used in Census-linked analysis of nonresponse
Government Office Region (11 categories)
Number of cars
Sex of HOH
Marital status of HOH
Household type 1
Household type 2
Household type 3
Pensioner household
Lone parent household
Type of building
Tenure
Central heating
Age of HOH
No. of persons in hh
No. of adults in hh
No. of children in hh
No. of over-60s in hh
• No. of dependent children in hh
• Age of youngest dependent child
• Social class of HOH
• Economic status of HOH
• SEG of HOH

Table 2: Weighting classes formed in the CHAID analysis (Level 1 to Level 4 splits; twelve weighting classes)
Region: North East; Merseyside; Yorks & Humbs; W Midlands; South East; Scotland
No. of cars: 0 or 1
No. of dependent children: 0 or 1
Household type 1: 1 adult 16-59 / youngest 5-15 / 3+ adults, no child → weight class 1
Household type 1: 2 adults 16-59 / youngest 0-4 / 2 adults, 1 or 2 60+ / 1 adult only 60+
Region: North West; E Midlands; Eastern; South West
Pensioner HH: no pensioner in HH
Region: London
Weight classes 2, 3
No. of dependent children: 2 or more
No. of cars: 2 or more
Pensioner HH: pensioner in HH
Weight class 4
SEG (grouped): skilled manual; partly-skilled manual; unskilled manual & others; not employed in last 10 years
SEG (grouped): professional; manager/employer; intermediate/junior
No. of adults: 1
No. of adults: 2 or more
Weight classes 5, 6, 7
Social class: I, II, IV, not employed in last 10 years
Social class: IIInm, IIIm, V, other
Weight classes 8, 9, 10
Type of building: detached; semi-detached; terraced; converted flat/other
Type of building: purpose-built flat
Weight classes 11, 12
Region: Wales

4.2. Identifying the weighting classes

The software package Answer Tree was used to produce groups that differed in terms of response rates based on their Census characteristics. The CHAID4 analysis which Answer Tree implements uses chi-squared statistics to identify optimal splits or groupings of independent variables in terms of predicting the outcome of a dependent variable – in our case, response. The resulting groups are mutually exclusive and cover all the cases in the sample. The analysis was set so that the smallest category contained at least 100 cases (2.7% of total sample size) in order to prevent the formation of many very small groups.

Table 2 shows the exact way the variables were used to produce the twelve weighting classes. For example, weight class 1 was formed of households in the North East, Merseyside, Yorkshire and Humberside, the West Midlands, the South East, and Scotland; that had one or no car; one or no dependent children; and that also contained either 1 adult aged 16-59, or a youngest child aged 5-15, or 3 or more adults but no children.

4.3. Calculating the nonresponse weights

The CHAID nonresponse weight Wi for weighting class i is calculated as:

Wi = (Ri + NRi) / Ri

where Ri is the number of respondents in weight class i and NRi is the number of nonrespondents in weight class i.

For the nonresponse weights to be most effective the following should hold true:
• response should vary as much as possible between the weighting classes;
• the survey variables with likely nonresponse bias need to be correlated with the classes;
• the distributions of the survey variables should differ as little as possible between respondents and non-respondents within each class.

CHAID ensures that the first rule holds. Clearly the second rule will hold for those variables used to produce the weighting classes, and for those obviously related to them. For those variables not obviously correlated (e.g. tenure, possession of consumer durables), chi-square tests were used to check that there was variation between the weighting classes. All variables that were analysed appeared to be highly correlated with weighting class. The final rule is more difficult to assess since we do not have survey information on non-respondents, but by defining classes using variables known to be correlated with survey variables we can minimise this variation.
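To make the weight calculation concrete, the following minimal Python sketch computes Wi = (Ri + NRi) / Ri for a few weighting classes; the respondent and nonrespondent counts are invented for illustration and are not the GHS figures.

    # Illustrative CHAID nonresponse weights: Wi = (Ri + NRi) / Ri.
    # The counts below are hypothetical, not taken from the GHS study.
    respondents = {1: 850, 2: 420, 3: 610}     # Ri: responding households per class
    nonrespondents = {1: 150, 2: 180, 3: 90}   # NRi: nonresponding households per class

    weights = {i: (respondents[i] + nonrespondents[i]) / respondents[i]
               for i in respondents}
    for i, w in sorted(weights.items()):
        print(f"weight class {i}: Wi = {w:.3f}")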
5. Production of calibration weights in CALMAR

Population-based weighting schemes address potential sources of bias (e.g. sample non-coverage) additional to those addressed by sample-based weighting schemes, and this is why the two types of procedure are sometimes used in conjunction. The population-based weights we produced were computed using a calibration procedure called CALMAR (Sautory, 1993), which is a SAS-based macro. The procedure, also known as raking ratio or rim weighting, is as follows. The sample is divided into weighting classes similar to those produced for nonresponse weighting, the difference being that the weighting classes have known population totals. The population figures used are the LFS control totals used to gross the LFS. The calibration process produces weights that, when applied, will produce a distribution in the sample weighting classes that matches the distribution in the equivalent population weighting classes to within a preset level of precision.

The process is not straightforward because two sets of variables are used: age-groups within sex, and region. Normally, using two variables would produce a two-dimensional cross-classification table in which each cell is a different combination of age/sex and region. However, because the GHS sample is relatively small, the likelihood of small or null cells (cells with no observations) becomes high. Such instances would produce either very large weights or, in the case of null cells, no weights at all. Calibration methods avoid this by adjusting the sample distribution to the population distribution for each variable in turn, repeating the process until the marginal distributions of the sample (the distributions for each variable) match the population distributions. The weighting classes used were those recommended for the GHS by Elliot (1999)5 and comprised 12 age/sex categories and 5 regions, as follows (Table 3):

Table 3: Weighting classes used for CALMAR analysis
Age/sex (12 categories): 0-4; 5-14; 15-24 male; 15-24 female; 25-44 male; 25-44 female; 45-64 male; 45-64 female; 65-74 male; 65-74 female; 75+ male; 75+ female
Region (5 categories): London; Scotland; Wales; Other metropolitan; Other non-metropolitan

Since we are using calibration weighting in conjunction with nonresponse weights, the sample (the 1998 responding GHS dataset) first needs to be weighted by the nonresponse weights calculated above. These weights are then scaled by n/N, where n is the responding sample size (persons) and N is the total population size we are grossing to. The calibration procedure effectively re-adjusts these weights. This means that we do not need to combine the nonresponse weights with the calibration weights by multiplication – the weight produced by CALMAR is the final weight to apply to the GHS dataset. The CALMAR weight is often known as a grossing weight because it can weight the sample up to the population.
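CALMAR itself is a SAS macro, but the iterative margin-adjustment idea described above can be illustrated with a short Python sketch; all counts and population totals below are hypothetical, and CALMAR's actual calibration algorithms are richer than this.

    import numpy as np

    # Illustrative raking: adjust cell weights to two known margins in turn,
    # repeating until both marginal distributions match the population.
    sample = np.array([[120.0,  80.0],     # rows: age/sex groups
                       [200.0, 100.0]])    # cols: regions (pre-weighted counts)
    row_targets = np.array([5000.0, 7000.0])   # population totals by age/sex
    col_targets = np.array([8000.0, 4000.0])   # population totals by region

    w = sample.copy()
    for _ in range(100):
        w *= (row_targets / w.sum(axis=1))[:, None]   # match age/sex margin
        w *= (col_targets / w.sum(axis=0))[None, :]   # match region margin
        if np.allclose(w.sum(axis=1), row_targets):
            break

    print(w / sample)   # cell-level grossing weights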
Two sets of weights were produced by CALMAR: weight a was produced without pre-weighting by the nonresponse weight, and weight b was produced by first pre-weighting by the Census-linked nonresponse weight. Both weights were approximately normally distributed, as described in Table 4.

Table 4: Distribution descriptives of weights a and b
Weight | Mean | Standard deviation | Range | No. of grossed-up households
a | 2.79 | 0.50 | 0.91 – 7.49 | 24,110,000
b | 2.83 | 0.60 | 0.36 – 6.92 | 24,440,000

Note that the means of the weights are different: this is because we are calibrating households to individual population estimates. Note also that for this analysis the population totals used in the calibration were scaled down by a factor of 1,000; the figures given for 'No. of grossed-up households' are those which would have arisen if the weights had not been scaled.

6. Results

The 1998 GHS responding sample was weighted by the two weights (a and b) in turn, and the weighted distributions of key variables were compared with each other and with the unweighted distributions. Absolute percentage point differences between the weighted and unweighted distributions were computed and are shown in tables A1 to A3 in Appendix A.

The distributions of variables weighted by weights a and b differ noticeably from the unweighted distributions, so clearly both the nonresponse-only weighting and the nonresponse + CALMAR weighting are having an effect. Additionally, it appears that in most instances the weighting is moving the estimates in the same direction for both sets of weights. Exceptions to this are:
• percentage of one-adult-only households;
• rents from local authority;
• rents from housing association;
• car ownership (any level);
• satellite TV ownership;
• percentage of HoHs in the professional SEG group;
• percentage of HoHs in the employer and manager SEG group;
• percentage of HoHs in the partly/unskilled manual SEG group;
• percentage of adults who were divorced.

All the other estimates move in the same direction (or are unaffected) when weighted by either weight a or b, although the relative size of the change differs between the two. The absolute difference between the weighted and unweighted estimate is, in most cases, small for both weights. However, for some estimates the absolute differences are quite marked: Table 5 lists estimates where the absolute difference was greater than 1%. The differences between estimates weighted by weight b and the unweighted estimates tended to be higher than those for weight a. This is what we would want if we believed that the differences represent real biases inherent in the unweighted data. However, we cannot be absolutely sure that either weight is moving the estimate in the required direction, or is of the correct magnitude to remove bias, and therefore cannot be sure whether the weighting is reducing or increasing the bias.

We do have some indication, for some of the variables, of the likely direction of the bias, assuming that nonresponse biases on the GHS have remained fairly consistent over time. This indicator is drawn from the original Census-linked nonresponse work carried out by Foster (1994) and reproduced in Elliot's (1999) report on calibration methods. The Census correction factor is the ratio of the set sample estimates to the responding sample estimates. This ratio is also used to compute the Census-linked CHAID weights within each weighting class. We would therefore expect the ratio of the estimates weighted by the Census-linked CHAID weight b to be similar to the Census correction factor, especially for those variables that were used to form the weighting classes. This seems to be the case for most of the variables for which there was a Census correction factor in the Foster report. This does not prove that weight b (nonresponse + CALMAR) is the correct weight to use, since it assumes that the relative biases have remained constant since 1991. However, if we felt that the biases found in the 1991 dataset still hold true, then we would expect weighting by b to reduce the biases for these variables.
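The check described in the last paragraph can be expressed in a few lines of Python; the figures below are invented purely to show the comparison being made, and are not taken from the Foster report.

    # Hypothetical comparison of the Census correction factor (set sample
    # estimate / responding sample estimate) with the weighted/unweighted ratio.
    set_sample_est = 28.4     # % estimate over the full set sample (invented)
    responding_est = 27.1     # same estimate over responding households (invented)
    correction_factor = set_sample_est / responding_est

    weighted_est, unweighted_est = 28.2, 27.1   # invented 1998 GHS-style figures
    ratio_b = weighted_est / unweighted_est
    print(f"correction factor {correction_factor:.2f} vs weight-b ratio {ratio_b:.2f}")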
Table 5: Estimates with absolute differences between weighted and unweighted of more than 1%

Absolute difference, weight a – unweighted:
• 2 adults in hh
• no children in hh
• youngest child aged 0-4
• 3 or more adults, no children in hh
• owns property outright
• marital status – single
• marital status – married & living with spouse
• % men aged 18-24 economically active
• % women aged 18-24 economically active

Absolute difference, weight b – unweighted:
• 1 person in hh
• 2 people in hh
• 1 adult in hh
• 2 adults in hh
• no children in hh
• youngest child aged 0-4
• 1 adult & no children in hh
• 2 adults & no children in hh
• 2 cars in hh
• owns property outright
• owns drier
• marital status – single
• marital status – married & living with spouse
• % men aged 18-24 economically active
• % men current cigarette smokers

An effect of sample-based weights is that, unlike population-based weights, they increase the variances of the survey estimates. The size of the increase is proportional to the variance of the weights themselves. The variance in weights can be exacerbated by having many weighting classes with small numbers of observations within them. This has not been the case in this study, and we would therefore expect the variance increase to be small: simple estimates6 put the increase in variance in the region of 2%. This should be offset, for the majority of variables, by the decrease in variance due to population weighting reported by Elliot (1999).

7. Recommendations

We have recommended the use of a dual weighting scheme. The benefits of population (calibration) weighting schemes have been reported in detail by Elliot (1999), and estimates produced by a dual scheme and by population weights only are similar for most GHS variables. There was some concern that the relative response within the nonresponse weighting classes may have changed since 1991, and that this may lead to the application of incorrect weights. Indeed, even from survey to survey the relative response within certain groups of the population will vary, so the use of weights not based on the actual sample to which they are applied is always likely to lead to a certain amount of imprecision. However, we take the view that the weights are likely to be reducing bias, at least to some degree, rather than increasing it, and that this can only be beneficial.

References

Elliot, D. (1999) Report of the Task Force on Weighting and Estimation. GSS Methodology Series.
Elliot, D. (1991) Weighting for Non-response: A Survey Researcher's Guide. New Methodology Series 17. OPCS.
Foster, K. (1996) A comparison of census characteristics of respondents and non-respondents to the 1991 FES. Survey Methodology Bulletin, 38, 9-17.
Foster, K. (1994) The General Household Survey: report of the 1991 census-linked study of survey non-respondents.
OPCS (unpublished).
Foster, K. (1994) Weighting the FES to compensate for non-response. Part 1: An investigation into census-based weighting schemes. OPCS (unpublished).
Kish, L. (1965) Survey Sampling. Wiley.
Sautory, O. (1993) La macro CALMAR – redressement d'un échantillon par calage sur marges. Série des documents de travail de la Direction des Statistiques Démographiques et Sociales, No. F9310. INSEE, Paris.

Notes

1. ACORN is a geodemographic classification of postcode areas, used extensively in market and social research, produced from 1991 Census information and updated annually. ACORN represents a 'half-way house' between area-level classifications, such as region and area type, and household-level classifications, such as building type and SEG of HoH.
2. A postcode unit is the area specified by the 7-character postcode, e.g. "SW1V 2QQ". A postcode sector is the area specified by the first 5 characters of the postcode unit, e.g. "SW1V 2"; sectors therefore cover a much larger area than units. For sampling, small sectors are usually grouped with larger sectors to provide a minimum number of addresses within them.
3. It should be noted that approximately 4% of the households were unmatched. The majority of households were matched by address information; however, it is likely that in a further small proportion of these, the households at the matched address were not the same in the Census as in the GHS. For example, of those households that were matched and were responding (on the GHS), 10% contained a different number of persons on the GHS than on the Census form. See Foster (1994) for more information.
4. CHAID is an acronym that stands for Chi-squared Automatic Interaction Detection.
5. The LFS uses a slightly differently defined region variable. Also, the LFS and FES use age-groups 5-15, 16-24 and so on.
6. Estimates of the increase in variance (L) are based on the following formula from Kish (1965) for the ratio of the variance of a mean in a randomly weighted simple random sample (SRS) to that in an unweighted SRS:

L = (Σh nh)(Σh nh kh²) / (Σh nh kh)²

where nh is the sample size in weighting class h and kh is the weight for class h.
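As a small numerical illustration of note 6, the Python sketch below evaluates Kish's ratio L for hypothetical class sizes and weights; none of the figures come from the GHS analysis.

    import numpy as np

    # Kish (1965): L = (sum_h n_h) * (sum_h n_h * k_h**2) / (sum_h n_h * k_h)**2
    n = np.array([850, 420, 610])      # n_h: sample size in weighting class h (invented)
    k = np.array([1.18, 1.43, 1.15])   # k_h: weight for class h (invented)

    L = n.sum() * (n * k ** 2).sum() / (n * k).sum() ** 2
    print(f"variance inflation L = {L:.3f} (about {100 * (L - 1):.1f}% increase)")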
Appendix A

Table A1: Household level variable distributions (% of all households). Columns: unweighted; weighted by weight a (CALMAR only); weighted by weight b (nonresponse + CALMAR); absolute difference (%), weighted – unweighted, for a and b; ratio of weighted to unweighted for a and b; Census correction factor.

Household size
1 person:           28.6 | 29.0 | 30.5 |  0.4 |  1.9 | 1.01 | 1.07 | 1.03
2 persons:          35.8 | 35.2 | 34.5 | -0.6 | -1.3 | 0.98 | 0.96 | 1.00
3 persons:          15.3 | 15.6 | 15.6 |  0.3 |  0.2 | 1.02 | 1.01 | 1.01
4 persons:          14.0 | 14.0 | 13.6 |  0.0 | -0.4 | 1.00 | 0.97 | 0.97
5 or more persons:   6.3 |  6.1 |  5.9 | -0.2 | -0.4 | 0.97 | 0.94 | 0.98

Number of adults *
1 adult:            34.3 | 33.8 | 35.7 | -0.5 |  1.4 | 0.99 | 1.04 | 1.02
2 adults:           51.9 | 50.8 | 49.2 | -1.1 | -2.7 | 0.98 | 0.95 | 0.98
3 adults:           10.2 | 10.9 | 10.7 |  0.7 |  0.5 | 1.07 | 1.05 | 1.04
4 or more adults:    3.7 |  4.5 |  4.4 |  0.8 |  0.7 | 1.22 | 1.19 | 1.01

Number of children *
No children:        71.1 | 72.4 | 72.6 |  1.3 |  1.5 | 1.02 | 1.02 | 1.01
1 child:            12.3 | 12.4 | 12.6 |  0.1 |  0.3 | 1.01 | 1.02 | 0.99
2 children:         11.5 | 10.8 | 10.5 | -0.7 | -1.0 | 0.94 | 0.91 | 0.95
3 or more children:  5.1 |  4.4 |  4.3 | -0.7 | -0.8 | 0.86 | 0.84 | 0.92

Age of youngest child *
No children:         71.1 | 72.4 | 72.6 |  1.3 |  1.5 | 1.02 | 1.02 | 1.01
Youngest aged 0-4:   13.0 | 11.7 | 11.5 | -1.3 | -1.5 | 0.90 | 0.88 | 0.95
Youngest aged 5-9:    8.2 |  8.0 |  8.0 | -0.2 | -0.2 | 0.98 | 0.98 | 0.98
Youngest aged 10-15:  7.7 |  7.9 |  7.8 |  0.2 |  0.1 | 1.03 | 1.01 | 0.98

Household type (Census) *
1 adult only:                     28.6 | 29.0 | 30.5 |  0.4 |  1.9 | 1.01 | 1.07 | 1.03
1 adult with child(ren):           5.7 |  4.8 |  5.2 | -0.9 | -0.5 | 0.84 | 0.91 | 1.00
2 adults only:                    33.1 | 32.8 | 31.7 | -0.3 | -1.4 | 0.99 | 0.96 | 0.99
2 adults with 1 or 2 children:    15.1 | 14.8 | 14.5 | -0.3 | -0.6 | 0.98 | 0.96 | 0.96
2 adults with 3+ children:         3.7 |  3.2 |  3.1 | -0.5 | -0.6 | 0.86 | 0.84 | 0.92
3 or more adults only:             9.4 | 10.6 | 10.4 |  1.2 |  1.0 | 1.13 | 1.11 | 1.04
3 or more adults with child(ren):  4.4 |  4.8 |  4.6 |  0.4 |  0.2 | 1.09 | 1.05 | 1.02

Car ownership *
No car:             27.6 | 27.2 | 28.2 | -0.4 |  0.6 | 0.99 | 1.02 | 1.03
1 car:              44.0 | 43.7 | 44.6 | -0.3 |  0.6 | 0.99 | 1.01 | 1.00
2 cars:             22.8 | 23.1 | 21.7 |  0.3 | -1.1 | 1.01 | 0.95 | 0.95
3 or more cars:      5.6 |  6.0 |  5.5 |  0.4 | -0.1 | 1.07 | 0.98 | 0.98

* variable used to form weighting classes

Table A1 (continued): Household level variable distributions (% of all households; columns as above; no Census correction factors for consumer durables)

Tenure (harmonised)
Owns outright:      28.0 | 26.9 | 26.3 | -1.1 | -1.7 | 0.96 | 0.94 | 0.99
Buying on mortgage: 41.2 | 42.1 | 41.5 |  0.9 |  0.3 | 1.02 | 1.01 | 1.02
Rents from LA:      16.4 | 15.9 | 16.5 | -0.5 |  0.1 | 0.97 | 1.01 | 1.00
Rents from HA:       5.2 |  5.1 |  5.3 | -0.1 |  0.1 | 0.98 | 1.02 | 1.05
Rents privately:     9.3 | 10.0 | 10.3 |  0.7 |  1.0 | 1.08 | 1.11 | 0.99

Ownership of consumer durables
Satellite TV:       29.2 | 29.3 | 29.0 |  0.1 | -0.2 | 1.00 | 0.99
VCR:                85.0 | 84.9 | 84.6 | -0.1 | -0.4 | 1.00 | 1.00
Washing machine:    91.9 | 91.5 | 91.1 | -0.4 | -0.8 | 1.00 | 0.99
Drier:              52.5 | 52.0 | 51.4 | -0.5 | -1.1 | 0.99 | 0.98
Dishwasher:         23.8 | 23.8 | 22.8 |  0.0 | -1.0 | 1.00 | 0.96
Microwave oven:     78.5 | 78.2 | 77.9 | -0.3 | -0.6 | 1.00 | 0.99
CD player:          68.4 | 69.4 | 69.2 |  1.0 |  0.8 | 1.01 | 1.01
Home computer:      33.6 | 34.6 | 34.0 |  1.0 |  0.4 | 1.03 | 1.01

Central heating:    90.3 | 90.3 | 90.0 |  0.0 | -0.3 | 1.00 | 1.00 | 0.99

* variable used to form weighting classes
Table A2: Head of household level variable distributions (% of all heads of household). Columns: unweighted; weighted by weight a (CALMAR only); weighted by weight b (nonresponse + CALMAR); ratio of weighted to unweighted for a and b; Census correction factor.

SEG of head of household *
Professional:             7.4 |  7.6 |  7.3 | 1.03 | 0.99 | 0.96
Employer or manager:     19.9 | 20.0 | 19.4 | 1.01 | 0.97 | 0.99
Intermediate non-manual: 12.4 | 12.4 | 12.4 | 1.00 | 1.00 | 0.98
Junior non-manual:       12.4 | 12.2 | 12.3 | 0.98 | 0.99 | 1.00
Skilled manual:          26.4 | 26.7 | 27.0 | 1.01 | 1.02 | 1.01
Partly/unskilled manual: 21.4 | 21.1 | 21.6 | 0.99 | 1.01 | 1.05

Address one year ago
Lived elsewhere:         10.5 | 11.2 | 11.5 | 1.07 | 1.10 | 1.03

* variable used to form weighting classes

Table A3: Individual level variable distributions (% of all individuals unless stated). Columns: unweighted; weighted by weight a (CALMAR only); weighted by weight b (nonresponse + CALMAR); absolute difference (%), weighted – unweighted, for a and b; ratio of weighted to unweighted for a and b.

Marital status (% adults)
Single:                        24.7 | 27.2 | 27.7 |  2.5 |  3.0 | 1.10 | 1.12
Married & living with spouse:  56.4 | 54.6 | 53.5 | -1.8 | -2.9 | 0.97 | 0.95
Married and separated:          2.7 |  2.5 |  2.7 | -0.2 |  0.0 | 0.93 | 1.00
Divorced:                       7.6 |  7.4 |  7.7 | -0.2 |  0.1 | 0.97 | 1.01
Widowed:                        8.6 |  8.3 |  8.3 | -0.3 | -0.3 | 0.97 | 0.97

Ethnic group
White:                         93.1 | 93.1 | 93.1 |  0.0 |  0.0 | 1.00 | 1.00
Black:                          2.0 |  2.0 |  2.1 |  0.0 |  0.1 | 1.00 | 1.05
Indian, Pakistani, Bangladeshi: 3.3 |  3.1 |  3.0 | -0.2 | -0.3 | 0.94 | 0.91

Education level (% aged 16-69 not in full-time education)
Degree or equivalent:          13.2 | 13.6 | 13.4 |  0.4 |  0.2 | 1.03 | 1.02
Higher qualification:          11.0 | 10.8 | 10.7 | -0.2 | -0.3 | 0.98 | 0.97
GCE or below:                  75.8 | 75.5 | 75.9 | -0.3 |  0.1 | 1.00 | 1.00

Economic activity – men
16-17:                         51.0 | 50.9 | 50.9 | -0.1 | -0.1 | 1.00 | 1.00
18-24:                         70.8 | 69.0 | 68.1 | -1.8 | -2.7 | 0.97 | 0.96
65 and over:                    7.9 |  7.9 |  7.8 |  0.0 | -0.1 | 1.00 | 0.99

Economic activity – women
16-17:                         50.7 | 51.5 | 51.7 |  0.8 |  1.0 | 1.02 | 1.02
18-24:                         61.8 | 63.4 | 62.7 |  1.6 |  0.9 | 1.03 | 1.01
65 and over:                    4.0 |  3.8 |  3.8 | -0.2 | -0.2 | 0.95 | 0.95

Limiting longstanding illness
Male:                          18.7 | 18.2 | 18.5 | -0.5 | -0.2 | 0.97 | 0.99
Female:                        21.1 | 20.8 | 21.0 | -0.3 | -0.1 | 0.99 | 1.00

Consulted NHS GP in 14 days before interview
Male:                          11.8 | 11.5 | 11.5 | -0.3 | -0.3 | 0.97 | 0.97
Female:                        16.6 | 16.7 | 16.7 |  0.1 |  0.1 | 1.01 | 1.01

Current cigarette smokers (% adults)
Men:                           28.2 | 29.1 | 29.6 |  0.9 |  1.4 | 1.03 | 1.05
Women:                         26.1 | 26.0 | 26.5 | -0.1 |  0.4 | 1.00 | 1.02

Weekly alcohol consumption (% aged 18 or over)
Men, more than 21 units:       26.8 | 27.5 | 27.7 |  0.7 |  0.9 | 1.03 | 1.03
Women, more than 14 units:     14.6 | 14.8 | 14.7 |  0.2 |  0.1 | 1.01 | 1.01

The General Household Survey as a sampling frame: the example of the Survey of Poverty and Social Exclusion1

Joanne Maher

1. Introduction

Using an existing survey to identify a sample of people or households with certain characteristics is a common and relatively inexpensive way of carrying out surveys of special populations when an alternative sampling frame is not readily available. The use of such follow-up surveys raises some interesting methodological issues, which are explored in this article in connection with a survey of Poverty and Social Exclusion (PSE) carried out as a follow-up to the 1998/99 General Household Survey2 (GHS). This article describes the background to the PSE and the methodological issues raised, and the considerations made, in conducting the survey as a follow-up to the GHS; these principles also apply to other follow-up surveys. Because aspects of this work have not been published elsewhere, the article gives a fuller report of the project for researchers who are interested in the PSE3 itself as well as in the general methodological issues raised.
A follow-up survey is designed to re-interview former survey respondents who are chosen on characteristics identified from responses given at an earlier interview. As such, it enjoys the benefits (and inherits the difficulties) of the 'parent' survey, as well as presenting a number of methodological issues peculiar to this type of survey. The issues that need to be addressed include sample design, questionnaire design, respondent burden and non-response bias (including opportunities for weighting). These issues are foremost in many survey designs but take on another dimension in the context of the follow-up survey. The aims of the follow-up survey are constrained by the parent survey's design and content, but are also enhanced by the rich source of information on which the follow-up survey can be based.

The follow-up survey for the PSE was drawn from respondents to the 1998/9 GHS, who were interviewed in detail about their circumstances and views on a range of issues associated with poverty and social exclusion. By using the GHS it was possible to:
• draw a sample with known characteristics;
• use GHS data for cross-topic analysis in conjunction with PSE data;
• reduce the length of the PSE questionnaire by not repeating questions previously asked in the GHS interview;
• identify characteristics of PSE non-responders in the GHS dataset.

2. The Sample

Findings from the PSE preparatory research called for a reliable source of income data on which to base an income measure of poverty. The PSE needed to interview enough people in low income households to allow analysis of this group. If a readily available and useable list of such households had existed, it could have been used as a sampling frame, or a large-scale sift could have been carried out to identify such households. However, no such reliable frame exists, and a sift would have been expensive, particularly as income data are difficult to collect reliably in this way. A follow-up survey was therefore deemed the most efficient way of identifying the sample, and the GHS was the survey that collected all the information necessary for a follow-up. As the authors of Poverty and Social Exclusion in Britain4 note, "the PSE survey makes major use of income data from the GHS but measures poverty in terms of both deprivation and income level: whether people lack items that the majority of the population perceive to be necessities, and whether they have incomes too low to afford them."

The PSE benefited from using the GHS as a sampling frame by being able to select a sample with known characteristics. The sample design had to meet three main objectives:
1. sufficient cases were required for the analysis of key variables by sub-groups;
2. sufficient cases were required for separate analysis of households and individuals in Scotland;
3. sufficient cases of low income households and respondents were required to examine their characteristics.

The fact that the GHS dataset contains income details, household composition and regional information allowed the sample to be selected on these key variables. Identifying individuals involved a three-stage process5:
1. a number of areas were selected from all those used for the 1998/9 GHS;
2. a number of households were selected from each of the areas;
3. one individual was chosen from each sampled household.

3. Sampling from edited data
3.1. Calculating equivalised income

The sample gave a greater probability of selection to people in lower income groups. An equivalised income measure was developed by Jonathan Bradshaw of York University, in conjunction with ONS. The McClements equivalence scale, which is used as the standard by ONS, was felt not to be appropriate for the PSE, as it does not assign sufficient weight to children, particularly young children. The scale used for the PSE was designed to take account of this. Each member of the household was assigned a value, as shown in Table 1:

Table 1: Equivalised income scale
Type of household member | Equivalence value
Head of household | 0.70
Partner | 0.30
Each additional adult (anyone over 16) | 0.45
Add for first child | 0.35
Add for each additional child | 0.30
If head of household is a lone parent, add | 0.10

The values for each household member were added together to give the total equivalence value for that household. This number was then divided into the gross income for that household. For example, the equivalence value for a lone-parent household with two children is 0.7 + 0.35 + 0.3 + 0.1 = 1.45; if the household's gross income is £10,000, its equivalised income is £6,897 (= £10,000/1.45).

The income, household composition and relationship information collected on the GHS allowed the equivalised income to be created for the purpose of drawing the sample. However, it was not only in the sample design that information collected during the GHS interview was put to use in the design of the PSE: GHS data were also used to enhance PSE fieldwork procedures, questionnaire design and weighting for non-response.

3.2. Selecting areas

Because the GHS is a nationally representative sample of private households in Britain, any probability sub-sample from the GHS will also be nationally representative. For the smaller sample size required for this survey, it would have been most economical to sub-sample the GHS areas and select all GHS respondents in those areas. However, because of the requirement to oversample lower income households, a larger sample of areas was required. In fact, 70% of GHS areas in England and Wales were selected (360 areas from a total of 518). All of the 54 Scottish areas were sampled to provide sufficient cases for separate analysis of the Scottish data. To allow for variation in income within the areas, the list of PSUs was sorted on area and equivalised income quintile group before any selections were made.

3.3. Selecting households

A sample of 2,846 households was taken from the selected areas. The PSE required sufficient cases in lower income households to allow analysis of those found to be in poverty by an income-based measure. However, for the purposes of testing theories of poverty and social exclusion, a sample of all income groups was required. For this reason households were selected in different proportions according to household income. The equivalised income of the household was grouped into quintiles, which were then sampled in the following proportions:

Table 2: Proportions sampled in each quintile group
Quintile group | Proportion sampled
Bottom quintile (lowest income) | 40%
Fourth quintile | 30%
Third quintile | 10%
Second quintile | 10%
Top quintile (highest income) | 10%
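The equivalence scale in Table 1 and the sampling proportions in Table 2 can be made concrete with a short Python sketch; the helper function and the household below are illustrative only.

    # Illustrative equivalised income (Table 1) and quintile sampling
    # fractions (Table 2); the helper function is hypothetical.
    def equivalence_value(has_partner, other_adults, children, lone_parent=False):
        value = 0.70                       # head of household
        if has_partner:
            value += 0.30                  # partner
        value += 0.45 * other_adults       # each additional adult (over 16)
        if children >= 1:
            value += 0.35 + 0.30 * (children - 1)   # first and further children
        if lone_parent:
            value += 0.10                  # lone-parent addition
        return value

    # Worked example from the text: lone parent with two children.
    ev = equivalence_value(False, 0, 2, lone_parent=True)   # 1.45
    print(round(10000 / ev))   # equivalised income of a £10,000 household: 6897

    # Table 2 sampling fractions by quintile (5 = bottom/lowest income, 1 = top,
    # following the labelling used in Appendix 2).
    sampling_fraction = {5: 0.40, 4: 0.30, 3: 0.10, 2: 0.10, 1: 0.10}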
3.4. Selecting individuals

One adult aged 16 or over was selected at random from each sampled household using a Kish grid. This was done in preference to interviewing all eligible adults to reduce interview length and costs. It was also believed that, when asking opinion questions (on which the PSE was heavily based), household members may influence each other's answers; interviewing only one person per household was intended to reduce this risk.

3.5. Partial and proxy interviews

Only those who had given a full interview in the 1998/9 GHS were eligible for selection. As the intention was to analyse the PSE in combination with GHS data, proxy and partial interviews were excluded from the sample because of their limited usefulness for analysis. By removing proxies and partials from the sample, item non-response in the GHS data was reduced for PSE cases when analysing combined PSE and GHS data.

4. Contacting the respondent

An advantage of using a survey such as the GHS as a means of drawing a follow-up sample is that a more personalised approach could be adopted when re-contacting the GHS respondent than could normally be used when contacting an unknown household at a sampled address for the first time. Advance letters were addressed to the selected respondent, except when a name was not given during the GHS interview, in which case the letter was addressed to 'the resident'. Where contact telephone numbers were available, interviewers made initial contact with the respondent by telephone. Once an appointment was made with the respondent, the interviews were conducted face-to-face. In the event of a broken appointment, interviewers were instructed to make a maximum of two visits to an address before recording a non-contact, unless they were already in the area and could make an extra call without driving out of their way.

5. Questionnaire design

Choosing a survey design based on a follow-up of the GHS meant that detailed information was already available on those topics covered by the GHS, so questions asked during the GHS interview did not need to be repeated in the follow-up survey. The 1998/9 GHS questionnaire held 68 questions on income which, if included again in the PSE, would have taken interview time that could be used for other questions. By following up the GHS respondents, the PSE was able to concentrate on operationalising concepts of poverty and social exclusion within the survey instrument and to reduce the length of the PSE questionnaire.

As the follow-up interviews6 took place between 6 and 18 months after the original GHS interview, a small number of follow-up questions were included in the PSE questionnaire to record changes to household composition, tenure, employment and income. Where appropriate, this was carried out using the technique of dependent interviewing: "Dependent interviewing can be used to remind the respondent of previously reported information or to probe for inconsistent responses between data provided in the current interview and data previously provided" (Mathiowetz and McGonagle, 2000). Questions were included in the PSE questionnaire which asked whether there had been any changes to the respondent's individual and household income since the GHS interview. The amount by which the income had changed, and the reasons for the change, were also recorded. It was possible to feed the date of the GHS interview into the question to aid the respondent's recall.

The PSE questionnaire included questions relating to housing conditions. The GHS housing tenure question was fed forward to the PSE questionnaire to check for changes in tenure over time.
Questions were repeated in the PSE for the purposes of verification, but did not have to be asked again unless the situation had changed7. The PSE dependent interviewing was designed to remind the respondent of the answer they gave during the GHS interview. This approach was adopted to aid the respondent's recall and reduce the length of the interview. Such proactive interviewing is adopted to reduce respondent burden; it is not always considered suitable for follow-up interviewing, however, because it allows the respondent to agree too readily with his or her previous response. It was believed that the tenure question would not be disadvantaged by this approach. Little research exists into the effects of dependent interviewing and what constitutes best practice (Mathiowetz and McGonagle, 2000). The effects of rewording questions were therefore carefully considered, and care was taken to change the question wording as little as possible. This approach seemed to have worked well, and interviewers did not report any difficulties with feed-forward questions.

Dependent interviewing is not always an appropriate approach. It is not appropriate where a respondent is asked an opinion-based question, since their response may legitimately change over time. For example, the GHS health questions were repeated in the PSE questionnaire because experience from other surveys, such as the Health Education Monitoring Survey (HEMS), showed that respondents gave different answers on their health status over time. Of particular interest in HEMS were those who changed their reports of having a limiting longstanding illness. Work by ONS's Question Testing Unit has shown that the question asking whether the respondent has a longstanding illness that limits their activities presents a number of difficulties to the respondent, which may explain why there is variation in the answers given over time. These health questions were repeated in the PSE interview as future analysis of health and social exclusion was anticipated.

It would have been possible to route respondents to the PSE questions on health and wellbeing, and on the use of health and social services, according to their answers to the GHS health questions. However, it was feared that routing in this way would increase the risk of asking respondents inappropriate questions, or of under-reporting health problems in the PSE questionnaire, where the respondent's health, or their opinion of it, had changed since the GHS interview. Comparison of the GHS and PSE responses to the longstanding illness questions showed that some respondents' opinion about their health had changed between the two surveys; the responses can be seen in Table 4 below. Although the distributions of longstanding illness for PSE and GHS respondents were broadly similar (see Table 3 below), there was some variation in the answers given by individuals between surveys (see Table 4). For example, of those who reported a limiting longstanding illness in the GHS, 14% reported that they did not have a longstanding illness in the PSE. These changes could reflect real changes in health, but could also be attributed to context effects, or to the fact that answers to questions of this nature are often affected by how a person feels on the day they are interviewed. This illustrates a common finding: such questions should not be treated as factual.
Table 3: The distribution of longstanding illnesses for PSE and GHS respondents
                                     GHS (%)   PSE (%)
Limiting longstanding illness          30.2      32.0
Non-limiting longstanding illness      15.7      14.0
No longstanding illness                54.0      54.0
Total                                 100.0     100.0
Base (numbers)                         1534      1534

Table 4: GHS responses to limiting longstanding illness by PSE responses (cell percentages of all respondents)
                        GHS:        GHS:            GHS: No
                        Limiting    Non-limiting    longstanding    Total (PSE)
PSE: Limiting             22.9          4.0             5.1            32.0
PSE: Non-limiting          3.1          6.5             4.5            14.1
PSE: No longstanding       4.2          5.3            44.3            53.8
PSE: Don't know            0            0               0.1             0.1
Total (GHS)               30.2         15.7            54.0           100
Base                       464          541             829           1534

6. Interview length and respondent burden

Respondent burden was an issue for the content of the PSE. In particular, the interview length was of greatest concern, both for respondent burden and for its ultimate effect on the response rate. "Long interviews, or unpredictable lengths, can interact with interviewer behaviour, and can limit the number of interviews which can be completed in any one day. Where fieldwork periods are restricted, or days in an area limited to control costs, response may be affected." (Martin and Matheson, 1999)

A great deal of work was done to keep the questions on the PSE to a minimum to reduce respondent burden. The average length of interview was 60 minutes; with older respondents, or those who had literacy problems, it took about 90 minutes. Questions requiring a lot of thought, or those involving difficult concepts, such as assessments of absolute and overall poverty, added to the length of the interview for some respondents. Three types of data collection were adopted in the PSE: face-to-face interviews, a self-completion module and a card-sorting exercise. The mixed-mode interview was adopted to allow a change of pace in an attempt to keep the respondent's interest, but for some this may have become burdensome.

7. Response rate

The length of the questionnaire affected the response rate. ONS interviewers are required to give an assessment of how long the interview is likely to take when making an appointment, to ensure that respondents set aside sufficient time. Some sampled individuals refused to take part on hearing that the interview was likely to last for an hour. The relatively short field period (a month) also meant that interviewers did not have sufficient time to call back on many households to attempt refusal conversion. Of the 2,431 eligible individuals, 1,534 (63%) were interviewed, the vast majority completing a full interview. One of the drawbacks of using follow-up surveys to identify special populations is that they suffer from two waves of non-response: that of the original survey and that of the follow-up. As such, this response rate was disappointing, and may reflect some of the factors outlined above. However, the availability of information about non-responders means that it is possible to compensate for non-response by weighting.

8. Weighting the data

As noted earlier, the PSE interviewed one person per household, over-sampled households in Scotland, and over-sampled households in the lowest quintile groups of equivalised income. Several weights were calculated to allow for the probability of selection and also to compensate for non-response.
The final weight is made up of four elements: a weight for country, a weight for income quintiles, a weight for the probability of selection of individuals from households, and a weight for non-response. Details of the weighting for probability of selection are presented in Appendix 1.

A benefit of using the GHS as a parent survey for the PSE was that it was possible to compensate for non-response by weighting. To weight for non-response at this stage, response rates were analysed to see how they varied according to categories defined by 1998 GHS data. Variables that were compatible with the PSE were selected from the GHS data to investigate the effect of non-response in the PSE. The variables examined were as follows:
• sex
• age
• marital status
• number of adults in the household
• Government Office Region
• tenure
• type of accommodation
• number of vehicles
• household type
• limiting longstanding illness
• equivalised income (quintiles)
• economic status (ILO definition)
• age of head of household
• SEG of adults
• SEG of head of household
• highest educational qualification
• consumer durables (TV, satellite/cable receiver, video recorder, freezer, washing machine, tumble drier, dishwasher, microwave, telephone, CD player, computer).

CHAID (Chi-squared Automatic Interaction Detector) Answer Tree was used to split the sample into the groups which best explain the variation in (weighted) response. It is the compatibility of the subjects covered in the GHS with those in the PSE that allowed a suitable set of variables to be selected. Within these groups, weights are formed from the reciprocals of the (weighted) response rates and multiplied by the weight for the probability of selection of individuals from households to create the final weight. The variables which Answer Tree selected (from the list above) as having an effect on the response rate were:
• age of head of household
• Socio-Economic Group of head of household
• Government Office Region
• equivalised income
• consumer durables (satellite/cable receiver, CD player, microwave, washing machine, tumble drier)
• limiting longstanding illness
• household type

Appendix 2 shows the eighteen weighting classes that were produced from the output, with details of the variables selected by Answer Tree in column 2. The percentages, response rates and weights for non-response are also shown.
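A hedged Python sketch of the final-weight construction described above follows; the selection weight and the class lookup are illustrative, though the class 1 response rate of 32% (weight 3.13) matches Appendix 2.

    # Final PSE weight = selection weight x reciprocal of the (weighted)
    # response rate of the respondent's CHAID weighting class.
    response_rate = {1: 0.32, 2: 0.48, 3: 0.58}   # class 1 as in Appendix 2

    def final_weight(selection_weight, chaid_class):
        return selection_weight / response_rate[chaid_class]

    print(final_weight(selection_weight=1.44, chaid_class=1))   # 1.44 * 3.125 = 4.5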
9. Conclusion

The use of the GHS as a sampling frame for the PSE highlights some of the common issues inherent in any follow-up survey. The GHS provided a high quality data source for the PSE in a variety of ways. It provided a sampling frame which could represent particular groups in the proportions required for analysis where no other sampling frame was readily or economically available, and the challenge of using equivalised income as the basis for selection could be met by the GHS income data. The compatibility of the GHS and PSE topics made the GHS an appropriate source of data, not only as a sampling frame but as a rich data source for further analysis. Use of the GHS data allowed the PSE interview to concentrate on operationalising different concepts of poverty and social exclusion rather than taking up interviewing time repeating many of the questions carried on the GHS. The consideration was not only one of interview length: repeating questions unnecessarily would also have reduced respondent tolerance of the survey. Interviewers reported that PSE respondents were concerned about the length of the interview and its impact on their limited leisure time.

With any follow-up survey there will be an element of 'double non-response', that is, the non-response inherited from the parent survey plus the additional non-response from the follow-up. However, the parent survey can provide detailed information about non-responders to the follow-up, and this can feed into a weighting scheme. The usefulness of a survey as a sampling frame will always depend on the population group needed for the follow-up survey, but in the case of the PSE the GHS proved to be an efficient way of identifying the refined sample.

References

Mathiowetz, N. and McGonagle, K. (2000) An assessment of the current state of dependent interviewing in household surveys. Journal of Official Statistics, p. 401.
Martin, J. and Matheson, J. (1999) Response to declining response rates on government surveys. Survey Methodology Bulletin, 45.

Notes

1. The author would like to thank the following people for the work they contributed to the design and implementation of the methods adopted in the PSE survey: Charles Lound, Dave Elliot, Angie Osborn, Alan Francis, Olu Alaka, Karen Irving and Michaela Pink.
2. The General Household Survey (GHS) has collected information on a variety of topics over the last 30 years. The survey is primarily designed to provide information for policy use by its sponsoring government departments. Its usefulness lies in the potential for cross-topic analysis, and this potential has been exploited by many GHS data users. The multi-purpose design and relatively large sample size of the GHS have also benefited those wanting to conduct research into a particular subsample of the population (for example, those aged 65 and over). In this way the GHS has been used not only as a data source for analysis but also as a sampling frame for follow-up surveys.
3. The PSE was carried out by Social Survey Division of the Office for National Statistics on behalf of a consortium of the Universities of York, Bristol and Loughborough and the London School of Economics, funded by the Joseph Rowntree Foundation. The PSE aimed to measure poverty and social exclusion in Britain today, focusing on deprivation indicators based on a societal definition of the necessities of life. The team at the four universities conducted preparatory research, defining concepts and testing ways to operationalise them, including question testing work, before commissioning ONS to advise on the survey design and to conduct the fieldwork. The consortium of academics had conducted research into poverty, in conjunction with MORI, in the Breadline Britain Surveys of 1983 and 1990; the PSE was designed to update these surveys. In addition to the follow-up to the GHS described in this paper, the ONS Omnibus Survey in June 1999 also asked a representative sample of the population of Great Britain their views on what constitutes the necessities of life in present-day Britain.
4. From Poverty and Social Exclusion in Britain, D. Gordon et al., 2000, p. 10.
5. The sample was drawn from edited GHS data. This was necessary because of the complexity of the data used to select the sample. The use of edited, rather than raw, data has implications for the timescale of such projects.
6. The 1998/9 GHS and the PSE were conducted as computer-assisted personal interviews (CAPI) using a Blaise program. Answers recorded during the GHS interview were fed forward into the PSE CAPI interview for the interviewer to verify with the respondent and update where circumstances had changed. Feeding information from one CAPI questionnaire to another helped maintain the smooth flow of the interview.
7. For example, in the PSE the tenure question was asked in this way: "Last time we spoke you said that you occupied your accommodation in the following way: Buying it with the help of a mortgage [fed-forward answer from GHS interview]. Is this still correct? 1. Yes 2. No". If the respondent answered 'No', the GHS tenure question was asked again in full.

Appendix 1: Weighting for the probability of selection

Since the GHS is an equal-probability sample of households and individuals, it was only necessary to build weights to account for each stage of the subsampling from the GHS sample.

Weighting for countries. 360 areas out of 518 were selected in England and Wales, and all PSUs in Scotland. Therefore the initial weight, Wt1, has a value of 518/360 in England and Wales and a value of 1 in Scotland.

Weighting by quintile group. Households in different income quintile groups (as defined by the data collected in the 1998 GHS) were sampled at different rates. For each quintile group a second weight, Wt2, was created as the product of Wt1 and the inverse of the relative sampling rate. For the lowest quintile group: Wt2 = Wt1 × 1. For the fourth quintile group: Wt2 = Wt1 × 4/3. For the top three quintile groups, and for those who refused the GHS income section: Wt2 = Wt1 × 4.

Weighting by household size. The weight needed here depends on the type of analysis to be carried out. Three weights have been created, each of which is a product of Wt2 and a factor reflecting household size and composition.
1. Analysing sampled adults. Here the chance of an adult being included, given that the household is sampled, is one divided by the number of adults in the household, so the weighting factor is the number of adults in the household: Wt3a = Wt2 × number of adults in household.
2. Analysing families. Here the chance of a family unit being included, given that the household is sampled, is equal to the number of adults in the family divided by the number of adults in the household: Wt3b = Wt2 × number of adults in household / number of adults in family.
3. Analysing adults in a sampled family. Here each adult can be included in the file either by being sampled themselves or because their spouse is sampled. Given that the household is sampled, for single adults the probability is one divided by the number of adults in the household, and for adults in couples the probability is two divided by the number of adults in the household. If the respondent has a partner: Wt3c = Wt2 × number of adults in household / 2. If the respondent does not have a partner: Wt3c = Wt2 × number of adults in household.
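The Appendix 1 weights can be sketched in a few lines of Python; the function and its arguments are illustrative, but the weight definitions follow the appendix.

    # Illustrative computation of the Appendix 1 selection weights.
    def selection_weights(country, quintile, adults_in_hh, adults_in_family, has_partner):
        wt1 = 518 / 360 if country in ("England", "Wales") else 1.0
        # Inverse relative sampling rates: bottom quintile x1, fourth x4/3,
        # top three quintiles (and GHS income refusals) x4.
        factor = {"bottom": 1.0, "fourth": 4 / 3}.get(quintile, 4.0)
        wt2 = wt1 * factor
        wt3a = wt2 * adults_in_hh                         # analysing sampled adults
        wt3b = wt2 * adults_in_hh / adults_in_family      # analysing families
        wt3c = wt2 * adults_in_hh / (2 if has_partner else 1)  # adults in sampled families
        return wt3a, wt3b, wt3c

    print(selection_weights("Scotland", "bottom", adults_in_hh=2, adults_in_family=2, has_partner=True))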
Appendix 2: Weighting for unit non-response

The eighteen weighting classes were defined first by the age of the head of household and then by further splits selected by Answer Tree (Socio-Economic Group of head of household, Government Office Region, ownership of consumer durables, household type, equivalised income quintile and limiting longstanding illness). For each class the figures below give the response rate (%), the weight for non-response, and the weighted base.

Age 17-28 (split by SEG of head of household):
Class 1 – professional; employer-manager; skilled manual; semi-skilled manual; <missing>: 32% | 3.13 | 986
Class 2 – intermediate non-manual; junior non-manual; unskilled manual: 48% | 2.09 | 384

Age 29-40 (split by Government Office Region – NE, Merseyside, E Midlands, SW, Wales and Scotland versus NW, Yorkshire & Humberside, W Midlands, Eastern, London and SE – and then by satellite/cable TV, CD player, household type and tumble drier):
Class 3: 58% | 1.72 | 559
Class 4: 65% | 1.54 | 807
Class 5: 27% | 3.75 | 499
Class 6: 61% | 1.64 | 910
Class 7: 36% | 2.78 | 389
Class 8: 29% | 3.42 | 253

Age 41-63 (split by video recorder ownership):
Class 9 – have video recorder: 60% | 1.68 | 6019
Class 10 – do not have video recorder: 52% | 1.92 | 330

Age 64-70 (split by Government Office Region):
Class 11 – NE; E Midlands; W Midlands; SE; SW; Wales: 79% | 1.27 | 825
Class 12 – NW; Merseyside; Yorkshire & Humberside; Eastern; London; Scotland: 56% | 1.78 | 699

Age 71-100 (split by equivalised income quintile and then by microwave, limiting longstanding illness and washing machine):
Class 13 – quintiles 1 (top), 2 and 3: 77% | 1.30 | 649
Class 14: 70% | 1.44 | 306
Class 15: 53% | 1.88 | 203
Class 16: 51% | 1.96 | 275
Class 17: 47% | 2.15 | 617
Class 18: 38% | 2.65 | 204

All classes: 56% response rate; total weighted base 14,914.

Evaluation of survey data quality using matched Census–survey records

Amanda White, Stephanie Freeth and Jean Martin

1. Introduction

In England and Wales the Office for National Statistics (ONS) is responsible for conducting the decennial Census of Population. Following the censuses of 1971, 1981 and 1991, investigations into survey nonresponse on major government household surveys were carried out by the Social Survey Division (SSD) of ONS, which conducts many of the major government household surveys. Addresses sampled for surveys taking place around the time of the censuses were linked with individual census records for the same addresses. Since all the Census variables are available for both responding and nonresponding addresses, this provides a very powerful means of investigating the characteristics of nonrespondents, measuring nonresponse bias and evaluating methods of adjusting for the bias. For the studies carried out following the 1971 and 1981 censuses, the actual matched datasets were not available to the methodologists investigating nonresponse, only specified aggregate tables, which limited the analysis to descriptive comparisons of the characteristics of respondents and nonrespondents, and to measurement of bias in terms of census characteristics.
However, in 1991 matched micro records were made available under strict confidentiality arrangements. This allowed much more detailed statistical modelling to be undertaken, both to investigate the interrelationships between the different variables related to nonresponse bias and to assess different methods of re-weighting data to compensate for the bias. Five surveys were included in the 1991 study, allowing comparison of results for surveys with very different designs. Key results are presented in Foster (1998).

A more ambitious study is planned in connection with the 2001 Census. Some 8-10 surveys are likely to be included, with widely varying survey designs and response rates; the response rates are expected to range from just under 60% to over 80%, based on current levels. More importantly, our knowledge of the factors contributing to survey nonresponse has advanced considerably in the past ten years, suggesting enhancements to the basic study design, while more powerful and sophisticated statistical modelling techniques are now available for the analysis. This paper outlines our plans for the study following the 2001 Census and presents the results of a pilot of some aspects of the study.

2. Design of the 2001 Census-linked Study of Survey Nonresponse

The 2001 study builds on the previous census-linked studies of survey nonresponse carried out by ONS and on the work of Groves and Couper (1998). The key feature of the 2001 study is the collection and use of a significant amount of auxiliary data to supplement the information available from the census. Advances in research on non-response have led to a better understanding of the probable causal mechanisms that determine likelihood of response. Groves and Couper (1998) have put forward conceptual models of the factors determining likelihood of contact and likelihood of co-operation given contact. They list four broad categories of influence:
• area characteristics
• household characteristics
• survey design features
• interviewer characteristics

Each of these combines with the others to affect the likelihood of contact and the interaction between the household and the interviewer, and hence the likelihood of co-operation given contact. Their work emphasises that it is not demographic characteristics per se which determine non-response; rather, people with certain characteristics are likely to lead lifestyles, or hold attitudes, which determine how easy they are to find at home or to persuade to take part in a survey. Interviewers have a huge influence on response outcomes: the attitudes and strategies they bring to their work, and their detailed behaviour at the household, have been shown to be major determinants of response outcome. The 2001 census-linked study of survey non-response plans to measure as many factors as possible which are likely to influence response outcomes, in order to explore how area, household, interviewer and survey design characteristics interact to impact on non-response.

3. The data to be collected

The data to be used in the study will be drawn from census records and from the data of survey interviews carried out around the time of the 2001 Census (which took place on 29 April). The census data for households sampled for the participating surveys will be extracted and matched with the corresponding survey data and other information relevant to the analysis.
The study will collect the following information:
• information about the areas sampled for the surveys
• information about households
• survey design features
• information about the interviewers
• information about interviewer behaviour and the outcome of visits to each sampled address

Figure 1: Design of the matched dataset for the 1991 study (diagram: census variables linked, by matching at household level, to survey responses for respondents and to response outcome information for nonrespondents).

3.1. Information about the area

Much of the information available about the areas sampled for the surveys will come from the census itself, but some other information may also be available. We plan to use the following area-level information:
• population density
• whether urban or rural
• crime rate
• unemployment rate
• proportion of owner occupiers
• proportion of multi-occupied units

We will also ask interviewers to record information about the area that might be predictive of ease of contact or of gaining co-operation. In addition, we will investigate various area classifications that are currently available. Area-level information can be linked to samples for any general population survey we carry out and is therefore available for routine nonresponse adjustment (and for sample stratification and analysis). This study will allow us to assess how well it compensates for non-response bias and how it may best be used.

3.2. Information about households

The study's database will have all the variables included in the census. We also aim to ask interviewers to record their observations about characteristics of the selected addresses/households which might be predictive of ease of contact (e.g. presence of an entry phone or other barriers to access) or of gaining co-operation.

Figure 2: The design for the 2001 study (diagram: as for 1991, but the census variables and response outcomes for respondents and nonrespondents are supplemented in the matching by area information, interviewer characteristics, interviewer observations, and records of the doorstep interaction and interviewer behaviour).

3.3. Survey design features

The surveys that are likely to be included in the study are all carried out by face-to-face interview but vary with respect to the following design features:
• whether information is required about all the individuals in the household or about a selected adult
• whether all adults are to be interviewed in person or whether proxy information is allowed
• the response rules which determine the response rate
• the average length of interview
• the length of the field period
• the survey topics covered
• whether data collected other than by interview (e.g. diaries) are included

We will record survey design information on the database for use in the analysis.

3.4. Interviewer characteristics

We plan to carry out a survey of interviewers, repeating a 1998 survey (Martin and Beerten, 1999) which ONS carried out as part of an international project (Hox, 1999). Based on the earlier results we expect to collect the following data:
• socio-demographic characteristics
• length and nature of survey experience
• performance grade (based on response and other factors over a year)
• confidence in ability to gain response
• knowledge of techniques which encourage response
• reports of calling strategies
• reports of strategies used to persuade people to respond
3.5. Interviewer behaviour and outcome of calls

As in the earlier studies, we will record information about the visits interviewers make to each sampled address and the outcome of each call. We will also ask interviewers to record other potentially useful information about their own behaviour and their interaction with respondents. We expect to have the following information:
• time of day and day of week of each call
• outcome of each call
• number of calls to make initial contact
• number of calls to complete the interview after making contact
• reasons given for not granting an interview
• questions and comments of the respondent

4. Sample size

The sample size for each participating survey will be similar to that of the 1991 study: around 5,000 addresses. We plan to carry out many analyses separately for each survey, and a sample of 5,000 addresses will yield a large enough number of non-responding households for survey-level analyses. As in previous years, responding households will not be subsampled, because subsampling would have adverse effects on the statistical models to be developed. Addresses in Scotland and Wales will be included for surveys that cover those countries.

5. Timing of the study

Ideally we want to match addresses contacted as near to census night (29 April 2001) as possible. In previous studies this was achieved by including addresses selected for survey interviews in the months on either side of census night. In 2001, we propose to match census data with data for addresses selected for interview from April onwards, and to start collecting the data outlined in section 3 in May. Matching addresses selected for interview before April is not feasible because of data compatibility problems linked to the adoption in April of the National Statistics Socio-economic Classification (NS-SEC) and the modification of a number of classificatory questions used on government surveys in Britain.

6. Analysis plan

Organisations sponsoring the surveys included in the study will pay for their participation and will commission outputs for their particular survey. As with the 1991 study, they will be offered a range of analyses from which to select. In the 1991 study all survey sponsors wanted basic descriptive analysis of the characteristics of nonrespondents and of the extent of nonresponse bias, but only one commissioned a detailed investigation of alternative methods of nonresponse adjustment.

Apart from survey-specific analyses, we plan to use this rich dataset to examine influences on nonresponse across the surveys. This follows the approach we used for the recent analysis of the effect of interviewer characteristics on nonresponse (Martin and Beerten, 1999). For that study we did not have any information at household level, whereas for this one we will have a great deal. In planning these analyses we will take account of the hierarchical structure of the dataset, using multilevel modelling to distinguish effects at the different levels. Essentially we have three levels of information (Figure 3):
a) interviewer: characteristics and attitudes
b) assignment: survey design features, area characteristics
c) household: interviewer observations, interviewer behaviour, household characteristics

Figure 3: Data structure (diagram: response outcomes nested within households, households nested within survey/area assignments, and assignments nested within interviewers).

Our plan is to explore two aspects of non-response separately: whether the interviewer made contact or not; and, for each household where contact was made, whether the household co-operated or not. Multilevel modelling is the most suitable approach for investigating the determinants of response. With so many variables potentially available, we need clear hypotheses about the likely relationships between them in order to decide in which order to enter variables and to interpret the results; a simplified sketch of such a model is given below.
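As an illustration of the kind of model intended, the sketch below fits a two-level model to a contact indicator, with households nested within interviewers. It is a deliberately simplified, hypothetical example: the variable names and data are invented, a linear mixed model stands in for the binary multilevel models that would be used in practice, and the assignment level is omitted.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical household-level file: one row per sampled address, with the
# interviewer who worked it. In the real study households would also be
# nested within survey/area assignments; that level is omitted here.
df = pd.DataFrame({
    "contact":     [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1],
    "urban":       [1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0],
    "entryphone":  [0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0],  # interviewer observation
    "interviewer": ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"],
})

# Two-level model: contact regressed on an area and a household covariate,
# with a random intercept for interviewer, so that variation in contact
# rates between interviewers is separated from household-level effects.
model = smf.mixedlm("contact ~ urban + entryphone", data=df,
                    groups=df["interviewer"])
result = model.fit()
print(result.summary())

# Co-operation given contact would be modelled in the same way, restricted
# to the households where contact was made.
contacted = df[df["contact"] == 1]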
7. Output options

Reports giving the results relating to a particular survey will be prepared for the organisation sponsoring each participating survey. Other methodological papers and reports comparing the results across surveys, and drawing more general conclusions about the nature of non-response, will also be produced. The analysis options we will offer include:
a) Descriptive comparisons of the characteristics of non-respondents: comparisons, in terms of census variables, of responding, non-contacted and refusing households.
b) Measurement of non-response bias in terms of census characteristics: summaries of the over- or under-representation of different subgroups in terms of census characteristics.
c) Comparison of the above with results from previous years (where available) to identify changes over time.
d) Logistic regression modelling to determine the relative influence of different census variables on response outcomes.
e) Multilevel modelling to determine the relative effects of area, household and, where available or appropriate, survey design and interviewer-level variables on response outcomes.
f) Development of weighting schemes based on the above analyses (one such approach is sketched below). Ideally these will incorporate information that could be collected routinely by interviewers, as well as the information used in current non-response adjustment.
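To illustrate how options (d) and (f) can connect, the sketch below fits a logistic regression of response on census characteristics and converts the predicted response propensities into non-response weights. The variables and data are invented, and inverse-propensity weighting is only one of several schemes such a study might consider:

import numpy as np
import statsmodels.api as sm

# Invented census covariates for 200 sampled addresses. "responded" is the
# survey outcome; the census variables are known for all addresses.
rng = np.random.default_rng(1)
owner = rng.integers(0, 2, 200)           # owner-occupier indicator
multi = rng.integers(0, 2, 200)           # multi-occupied unit indicator
logit = 1.0 + 0.8 * owner - 1.2 * multi   # illustrative response propensity
responded = rng.random(200) < 1 / (1 + np.exp(-logit))

# Logistic regression of response on census characteristics (option d).
X = sm.add_constant(np.column_stack([owner, multi]))
fit = sm.Logit(responded.astype(int), X).fit(disp=False)

# Weighting scheme (option f): responding households are weighted by the
# inverse of their predicted response probability, so that groups with low
# predicted response are grossed up.
propensity = fit.predict(X)
weights = np.where(responded, 1.0 / propensity, 0.0)
print("mean weight among respondents:", weights[responded].mean())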
8. Development and implementation of the study

Development work for the study started last year. We have had discussions with colleagues in ONS Census Division about the arrangements required to match records. We have also carried out a great deal of work to design the procedures for collecting the auxiliary data required for the analysis. These were piloted in January and February 2001. The main purposes of the pilot were to:
• check that the procedures operate satisfactorily when used simultaneously on a number of large continuous surveys
• assess the time interviewers need to collect the information, to ensure that the module can be fitted into interviewers' assignments
• obtain suggestions for improving the design, to minimise the burden on interviewers and to help develop training material for the main stage

Nine interviewers and one interviewer supervisor working on a number of government household surveys carried out by ONS were involved in the pilot. The pilot interviewers provided feedback and advice on question wording and layout, and on practical matters such as how to handle the material in the field. The key findings of the pilot were:
• interviewers can obtain the information required from observation or from their introductory conversation with the informant
• the information must be recorded as soon as practicable, because the details can easily be forgotten; interviewers working on the pilot recommended recording the information immediately after the contact with the household, or on the same day if immediate recording is not possible
• the recording procedure has to be designed to encourage interviewers to record the information immediately after the contact; this means a compact and user-friendly paper interviewer observation form has to be developed, because it is not always safe or convenient for interviewers to use their laptop computers in their cars

We incorporated the findings of the pilot in redesigning the procedures and circulated the revised interviewer observation form to three of the original pilot interviewers, eleven other interviewers and an interviewer supervisor. This second round of consultation produced some suggested simplifications to the layout of the form, which were incorporated into the final design (Appendix A).

The collection of auxiliary information has begun, and most of the surveys will have completed the data collection by the end of September 2001. The census data will be extracted for matching in 2002, after Census Division has completed processing the census data. We expect to begin analysis in 2003 and to produce the first survey-specific outputs later that year.

References

Barton, J. (1999) Effective calling strategies for interviewers on household surveys. Survey Methodology Bulletin, 44. London: Office for National Statistics.

Campanelli, P., Sturgis, P. and Purdon, S. (1997) Can you hear me knocking? An investigation into the impact of interviewers on survey response rates. London: Social and Community Planning Research.

Foster, K. (1998) Evaluating nonresponse on household surveys. GSS Methodology Series No. 8. London: Government Statistical Service.

Groves, R. M. and Couper, M. P. (1998) Nonresponse in household interview surveys. New York: Wiley.

Hox, J. J. (1999) The influence of interviewer attitudes and behaviour on household survey nonresponse: an international comparison. Invited paper presented at the International Conference on Survey Nonresponse, 28-31 October 1999, Portland, Oregon.

Lynn, P., Clarke, P., Martin, J. and Sturgis, P. (1999) The effects of extended interviewer efforts on nonresponse bias. Invited paper presented at the International Conference on Survey Nonresponse, 28-31 October 1999, Portland, Oregon.

Martin, J. and Beerten, R. (1999) The effect of interviewer characteristics on survey response rates. Paper presented at the International Conference on Survey Nonresponse, 28-31 October 1999, Portland, Oregon.

Morton-Williams, J. (1993) Interviewer Approaches. Cambridge: Dartmouth.

Appendix A: The Interviewer Observation Form

The Interviewer Observation Form is a double-sided A5 booklet. Only the key pages are reproduced here; the pages for recording calls 02 to 10 are broadly the same as those for recording call 01.
Interviewer Observation Form

Survey / Area / Address / Household / Month
Is this a reissue? Yes 1; No 2
Interviewer name / Interviewer number

Please complete this form for every telephone or visiting call made at the selected address. This includes the contact at which an interview is obtained. The questions refer to the introductory conversation with the member of the household: the conversation that took place between the time you introduced yourself and the time you either started the interview or ended the contact.

Office for National Statistics, 1 Drummond Gate, London SW1V 2QQ.

Call 01

Q1. Date of call (ddmmyyyy)

Q2. Time of call (24h: hh.mm)

Q3. Outcome of this call. Code ALL that apply.
1 HQ refusal
2 Ineligible: not built/demolished/derelict
3 Ineligible: vacant/empty
4 Ineligible: non-residential
5 Ineligible: residential but no resident household
6 Ineligible: communal establishment/institution
7 Unable to establish eligibility
8 Address temporarily inaccessible due to weather or other causes
9 No answer on the phone or got answerphone
10 No face-to-face contact with anyone at address
11 No contact with anyone at the address but spoke to a neighbour
12 Contact made with sampled household but not with a responsible resident
13 Contact made with responsible resident but not with selected respondent
14 Refusal at introduction/before interview: by household member
15 Refusal at introduction/before interview: by proxy
16 Refusal at introduction/before interview: don't know if household member or proxy
17 Refusal at introduction/before interview: by selected respondent
18 Refusal during interview
19 No interview due to language, age, infirmity, disability etc.
20 Appointment made
21 Interviewer withdrew to try again later
22 Appointment broken
23 Placement interview/checking/reminder call completed (diary surveys only)
24 Partial household interview completed
25 Household interview completed but non-contact with one or more elements
26 Household interview completed but refusal or incomplete interview by one or more elements
27 Full co-operation/interview
28 Other outcomes
If Q3 = 1 to 8, go to Q11. If non-contact (Q3 = 9 to 13), go to Q4. Otherwise go to Q5.

Q4. Did you leave a card or message at the address / on the phone / on the answerphone? Code ALL that apply.
1 Card/message left at address
2 Message left on the phone
3 Message left on the answerphone
4 No card/message left
5 Don't know
If non-contact with anyone at address (Q3 = 9 to 11), go to Q10. Otherwise go to Q5.

Q5. How did you first contact the household at this call? Code ALL that apply.
1 Spoke through entryphone/intercom
2 Spoke through closed door/window/letter box/door on a chain
3 Spoke through open door/window
4 Spoke to informant who was outside the household unit
5 Went/was invited inside the household unit and spoke to informant
6 Spoke over the telephone

Q6 to Q8 refer to the MAIN PERSON you made contact with.

Q6. Was the main person you talked to:
1 a man/boy?
2 a woman/girl?
3 Don't know, not sure

Q7. What is the approximate age of the main person?
1 Less than 16
2 16-34
3 35-59
4 60 and over

Q8. Did the main person make any of the following comments during your introductory conversation? Code ALL that apply. 99 = main person did not comment.
Positive/neutral comments:
1 Received/remembered advance letter
2 Expecting someone to call
3 Make an appointment and come back
4 I'll think about it
5 Survey topic is important, or other positive comments about the survey topic
6 Enjoy doing surveys
7 Other positive/neutral comments
Negative comments:
8 Not interested/can't be bothered
9 I'm too busy/bad time/just going out/about to go away
10 I don't know anything
11 We are not typical
12 Not capable: too sick/old/infirm
13 Waste of time
14 Waste of money
15 Government knows everything
16 Don't trust study is confidential
17 Invasion of privacy/too many personal questions
18 Don't trust surveys
19 Never do surveys/I hate forms
20 Already participated in surveys
21 Negative comments about the survey topic
22 Other negative comments

Q9. Did the main person ask any of the following questions during your introductory conversation? Code ALL that apply. Codes 1-14 cover the following; 99 = main person did not ask questions.
What is the purpose of the survey / what's it all about?
Who is paying for this / who is the sponsor?
What will happen to the information / how will the results be used?
Why/how was I chosen?
How long will the interview take?
Who's going to see my answers? / Can I be identified? / Is it confidential?
Is this compulsory?
Can I get more information?
Can I get a copy of the results?
What's in it for me? / Do I get an incentive?
How much is the incentive?
When/how will I get paid?
Other questions

Q10. Diary surveys only (non-diary surveys go to Q11). Was this call a:
1 call to make an appointment?
2 placement call?
3 checking call?
4 reminder call?
5 collection call?

Q11. When did you fill in the details about this call?
1 Immediately after the call
2 On the same day
3 On a different day from the call
Go to Accommodation Section (p. 22).

Accommodation

A1. What type of accommodation is it?
House or bungalow:
1 Detached
2 Semi-detached
3 Terrace/end of terrace
Flat or maisonette:
4 In a purpose-built block
5 Part of a converted house / some other kind of building
6 Room or rooms
7 Caravan, mobile home or houseboat
8 Some other kind of accommodation
9 Don't know / not applicable / unable to code
If flat, maisonette or rooms (A1 = 4 to 6), go to A2. Otherwise go to A4.

A2. Is the household's accommodation self-contained?
1 Yes, all rooms are behind a door that only this household can use
2 No
3 Don't know

A3. What is the floor level of this household's accommodation?
1 Basement or semi-basement
2 Ground floor (street level)
3 1st floor (floor above street level)
4 Second floor
5 Third floor
6 Fourth floor
7 Fifth to ninth floor
8 Tenth floor or higher
9 Don't know

A4. Please complete for all addresses, including ineligibles. Are there any physical barriers to entry to the house/flat/building? Code ALL that apply.
1 Locked common entrance
2 Locked gates
3 Security staff or other gatekeeper
4 Entry phone access
5 None
6 Don't know

A5. Which of the following are visible at the sampled address? Code ALL that apply.
1 Burglar alarm
2 Security gate over front door
3 Bars/grills on any windows
4 Other security device(s), e.g. CCTV
5 Security staff or security lodge on estate or block
6 None of these
7 Don't know

Neighbourhood

The term "area" in the following questions refers to the area that you can see from the address.

N1.* Is the sampled house/flat/building in a better or worse condition than the others in the area?
1 Better
2 Worse
3 About the same
4 Unable to code

N2.* Are the houses/blocks in this area in a good or bad state of repair?
1 Mainly very good
2 Mainly good
3 Mainly fair
4 Mainly bad
5 Mainly very bad
6 Unable to code

N3.* How many boarded-up or uninhabitable buildings are there in this area?
1 None
2 One or two
3 A few
4 Several or many
5 Unable to code

N4.* Are most of the buildings in the area residential or commercial/non-residential?
1 All residential
2 Mainly residential with some commercial or non-residential
3 Mainly commercial or other non-residential
4 Unable to code

N5.* Is the house/flat part of a council or Housing Association housing estate?
1 Yes, part of a large council estate
2 Yes, part of a council block
3 No
4 Unable to code / not applicable

N6.* How safe would you feel walking alone in this area after dark?
1 Very safe
2 Fairly safe
3 A bit unsafe
4 Very unsafe

N7. Is this your final call to this household?
1 Yes: go to Household Section (p. 24)
2 No: you are now finished until the next call

Household information

H1. Interviewer: code final outcome.
1 Full or partial interview / co-operation
2 Refusal
3 No interview due to language, age, infirmity etc. (codes 1 to 3: go to H2)
4 Non-contact
5 Ineligible
6 HQ refusal (codes 4 to 6: go to H8)

H2. Please enter household details. Use "DK" for don't know and "Ref" for refused.
Number of adults (aged 16 or over); number of children (less than 16).
For each adult (Adult 1 to Adult 7): sex (M = male, F = female, DK, Ref) and age band (1 = 16-34, 2 = 35-59, 3 = 60+, DK, Ref).

H3. Are there any children aged 5 or under?
1 Yes, definitely
2 Possibly
3 No
4 Don't know
5 Refused

H4. Is any adult in paid work?
1 Yes
2 No
3 Don't know / refused

H5. How long has the household lived at this address?
Months OR years; 99 = don't know / refused

H6. Do you know or think the occupants are: Code from observation. Code ALL that apply.
1 White
2 Mixed
3 Asian (Indian, Pakistani, Bangladeshi, other)
4 Black (Caribbean, African, other)
5 Chinese and other ethnic group
8 Don't know
National Travel Survey: go to H7. Other surveys: go to H8.

H7. Does the household at present own, or have available for use, any cars or vans? Include any company cars or vans if available for private use.
1 Yes
2 No
3 Don't know / refused

H8. Please check that you have completed the Accommodation and Neighbourhood Sections (pp. 22-23) and enter the number of calls made to this address.

You have finished at this address. Thank you for completing the questionnaire. Please key the information into your laptop at home and return the questionnaires to the office when you have completed this quota.
It is primarily aimed at those with relatively little experience of research in this field. The day will provide an overview of telephone research methods, their benefits and drawbacks, and an opportunity to listen to and debate with experienced practitioners. Topics covered will include: a comparison of telephone and face-to-face methods; sampling issues, including sample sources and coverage; the practicalities of conducting telephone research, from recruiting and training interviewers to maximising response rates; and researching sensitive topics by telephone.

All events will be held at the London Voluntary Resource Centre, 356 Holloway Road, London. Courses cost £65 for members, £110 for non-members (includes membership of the SRA for one year) and £16 for students/unwaged.

For further information on the contents of the courses contact:
Joanne Maher
Office for National Statistics
1 Drummond Gate
London SW1V 2QQ
Email: [email protected]
Or visit the SRA's website: http://www.the-sra.org.uk

2. Centre for Applied Social Surveys

CASS is currently developing a programme of courses for the next academic year. Full details will be available in September from:
Jane Schofield
Department of Social Statistics
University of Southampton
Southampton SO17 1BJ
Tel: 02380 593048 Fax: 02380 593846

3. Royal Statistical Society

2-5 September 2002: the University of Plymouth will host the 2002 RSS conference. For more information or to book a place contact Celia Macintyre, e-mail [email protected].

The Society has accepted an invitation to host an Ordinary Meeting, organised by the Research Section, at the IMS Annual Meeting in Banff, Canada, 27 July - 1 August 2002, subject to the availability of a suitable paper. Papers for reading at Ordinary Meetings organised by the Research Section may be submitted at any time although, to be considered for the Banff meeting, submission by 31 July 2001 is likely to be necessary. Further information about Research Section read papers is available at http://www.maths.soton.ac.uk/staff/JJForster/RS/rsobj.html

4. The Cathie Marsh Centre for Census and Survey Research

SPSS for social scientists (1 day): 27 September 2001; 22 January 2002; 20 March 2002
Surveys and Sampling (1 day): 3 October 2001; 14 January 2002
Questionnaire Design (1 day): 10 October 2001; 18 January 2002
Exploratory Data Analysis (1 day): 24 October 2001; 30 January 2002
Basic Data Analysis (1 day): 28 November 2001; 6 February 2002
STATA for SARS (1 day): 7 December 2001
Introduction to STATA (1 day): 27 February 2002
Analysing Hierarchical Surveys (1 day): 23 November 2001
Analysing Hierarchical Data (1 day): 23 November 2001
Multiple Regression (2 days): 15-16 January 2002; 4-5 June 2002
Logistic Regression (1 day): 13 February 2002; 3 July 2002
Conceptualising Longitudinal Data Analysis (1 day): 4 March 2002
Introducing Longitudinal Data Analysis (1 day): 5 March 2002
Factor Analysis (1 day): 13 March 2002
Cluster Analysis (1 day): 10 April 2002
Multilevel Modelling (1 day): 8 May 2002
Design and Analysis of Complex Surveys (3 days): 23-25 January 2002
Longitudinal Data Analysis (3 days): 25-27 March 2002
Demographic Forecasting with POPGROUP: 31 October - 1 November 2001; 8-9 January 2002

For further information on the above courses or to book a place contact Kate Thomas, tel: 0161 275 4736, or visit http://les1.man.ac.uk/ccsr/setvices.htm
5. Miscellaneous Courses and Conferences

12-14 September 2001: 7th International Blaise Users Conference, Washington, USA. http://www.blaiseusers.org

16-19 October 2001: Q2001 Symposium, Achieving Data Quality in a Statistical Agency: A Methodological Perspective, Quebec, Canada. http://www.statcan.ca/english/conferences/symposim2001/index.htm

16 November 2001: What have we learned from SARS? A one-day conference to showcase some of the most important and innovative research findings based on the 1991 SARS. MANDEC centre, University of Manchester. http://www.les1.man.ac.uk/ccsr/cmu/conferenceprograme.html

18 February - 8 March 2002: Scaling and Cluster Analysis, 31st Spring Seminar at the Zentralarchiv, Köln, Germany. http://www.gesis.org/en/index.htm

The National Statistics Methodology Series

This new series, aimed at disseminating National Statistics methodology quickly and easily, comprises monographs of substantial methodological interest produced by members of the Government Statistical Service.

Currently available
1. Software to weight and gross survey data, Dave Elliot
2. Report of the task force on seasonal adjustment
3. Report of the task force on imputation
4. Report of the task force on disclosure
5. GDP: Output Methodological Guide, Peter Sharp
6. Interpolating annual data to monthly or quarterly data, Michael Baxter
7. Sample design options for an integrated household survey, Dave Elliot and Jeremy Barton
8. Evaluating non-response on household surveys, Kate Foster
9. Reducing statistical burdens in business, Andrew Machin
10. Statistics on Trade in Goods, David Ruffles
11. The 1997 UK Pilot of the Eurostat Time Use Survey, Patrick Sturgis and Peter Lynn
12. Monthly statistics of Public Sector Finances: a methodological guide, Jeff Golland, David Savage, Tim Pike and Stephen Knight
13. A review of Sample Attrition and Representativeness in Three Longitudinal Surveys, Gad Nathan
14. Measuring and Improving Data Quality, Vera Ruddock
15. Gross Domestic Product: Output Approach, Peter Sharp
16. Report of the Task Force on Weighting and Estimation, Dave Elliot
17. Methodological Issues in the Production and Analysis of Longitudinal Data from the Labour Force Survey, P S Clarke and P F Tate (Winter 1999)
18. Comparisons of income data between the Family Expenditure Survey and the Family Resources Survey, Margaret Frosztega and the Households Below Average Income team (February 2000)
19. European Community Household Panel: Robustness Assessment Report for United Kingdom Income Data, Waves 1 and 2 (March 2000)
20. Producer Price Indices: Principles and Procedures, Ian Richardson (March 2000)
21. Variance Estimation for Labour Force Survey Estimates of Level and Change, D J Holmes and C J Skinner
22. Longitudinal Data for Policy Analysis, Michael White, Joan Payne and Jane Lakey
23. GSS Report of Survey Activity 1997, Survey Control Unit
24. Obtaining information about drinking through surveys of the general population, Eileen Goddard
25. Methods for automatic record matching and linkage and their use in National Statistics, Leicester Gill
26. Designing surveys using variances from Census data, Sharon Bruce, Charles Lound and Dave Elliot

Forthcoming
GSS Report of Survey Activity 1998
GSS Report of Survey Activity 1999
Evaluation criteria for statistical editing and imputation, Ray Chambers

Copies are available free of charge to GSS members, otherwise £5, from National Statistics Direct (tel: 01633 812078).
If you would like to submit a monograph, please send an e-mail for details of house style and procedures to [email protected].

Recent Social Survey Division Publications

1. Family Resources Survey, Annual Technical Report 1999/2000, Mark Rowland and Mark McConaghy. ISBN 1 85774 422 5
2. Contraception and Sexual Health, 1999, Fiona Dawe and Howard Meltzer. ISBN 1 85774 444 6
3. Drinking: Adults' Behaviour and Knowledge in 2000, Deborah Lader and Howard Meltzer. ISBN 1 85774 443 8
4. Psychiatric Morbidity Among Women Prisoners in England and Wales, Maureen O'Brien, Linda Mortimer, Nicola Singleton and Howard Meltzer. ISBN 1 85774 442 X
5. Overseas Travel and Tourism: Business Monitor MQ6, Data for Q3 2000. ISBN 0 11 538063 9
6. Overseas Travel and Tourism: Business Monitor MQ6, Data for Q4 2000. ISBN 0 11 538064 7

Copies of the above publications can be obtained from National Statistics Direct, tel: 01633 812078, fax: 01633 2762. In addition, a selection of recently released ONS publications, including some of those listed above, is available to download from www.statistics.gov.uk/OnlineProducts/default.asp.

7. Assessing people's perceptions of their neighbourhood and community involvement. Part 1: A guide to questions for use in the measurement of social capital based on the General Household Survey module, Melissa Coulthard, Alison Walker and Antony Morgan. This guide is available, free of charge, from: Marston Book Services, PO Box 269, Abingdon, Oxon OX14 4YN. Tel: 01235 465565 Fax: 01235 465556. Or from http://www.hda-online.org.uk/downloads/pdfs/peoplesperceptions.pdf
Subscriptions and Enquiries

The Survey Methodology Bulletin is published twice yearly, in January and July, price £5 in the UK and Eire or £6 elsewhere. Copies of the current and previous editions are available from:

Ann McIntosh
Survey Methodology Bulletin Orders
Social Survey Division
Office for National Statistics
D1/15, 1 Drummond Gate
London SW1V 2QQ
[email protected]
Telephone: (020) 7533 5500
www.statistics.gov.uk

Social Survey Division
Office for National Statistics
1 Drummond Gate
London SW1V 2QQ
www.statistics.gov.uk

ISBN 1 85774 447 0
ISSN 0263-158X
Price £5