The application of alternative modes of data collection on UK Government social surveys
Report on a workshop held at the Office for National Statistics, 8 – 9 December 2009

Peter Betts
Office for National Statistics
March 2010

Contents
1 Introduction
1.1 The Quality Improvement Fund Project
1.2 Workshop objectives
1.3 Workshop participants
1.4 Workshop agenda
2 Context setting
3 Literature review and consultation
3.1 GSS survey taking traditions and drivers for change to survey designs
3.2 Available mode choices, including mixed-mode designs, and their advantages and disadvantages
3.3 Sampling, response and estimation
3.4 Mode effects on measurement error and approaches to mixed-mode questionnaire design
3.5 Survey transitions
4 Discussion of key issues and information sharing
4.1 Reflections on the literature review
4.2 Presentation on recent work at Statistics Netherlands
4.3 Gaps in the evidence base
5 Introduction to the case studies
5.1 Labour Force Survey
5.2 British Crime Survey
6 Survey specific challenges and plans
7 Scope for alternative/mixed mode designs for GSS surveys
7.1 LFS ideas
7.2 BCS ideas
7.3 General ideas
7.4 Other GSS surveys
7.4.1 Welsh Assembly Government
7.4.2 Department for Transport
7.4.3 Scottish Government
7.4.4 Northern Ireland Statistics and Research Agency
7.4.5 Communities and Local Government
8 Summary of discussions and emerging themes
8.1 LFS
8.2 BCS
8.3 Emerging themes
9 References

1 Introduction

This report is an account of a workshop held at the Office for National Statistics (ONS) on December 8th and 9th 2009, on the application of alternative data collection modes on UK government social surveys.

1.1 The Quality Improvement Fund Project

The workshop was held as part of a project under the UK Statistics Authority’s Quality Improvement Fund (QIF), which is designed to help the Government Statistical Service (GSS) address matters relating to statistical quality. The project aim was to consider whether it is feasible to implement alternative data collection designs by making, for example, greater use of telephone, internet or postal modes. The project was led jointly by ONS and the Home Office. It arose from considerations regarding the general decline in social survey response rates and the high cost of conducting social surveys, particularly by face-to-face interviewing, the predominant mode used on GSS social surveys. Work was focused on two main areas, modes of data collection and sampling design and estimation.

The workshop complemented and informed the completion of two other strands of the project underway at the time: a review of literature on the subjects of alternative/mixed mode data collection and consultation with National Statistical Institutes (NSIs) and other research organisations. The overall objectives of the three strands were to review the current state of play with regard to alternative or mixed modes of data collection and consider their application on GSS social surveys.

This workshop report is intended to be a straightforward description of the proceedings. It should be read in conjunction with the two other project outputs.
More information on the project background and the findings from the literature review and consultation exercise is presented (with full references) in Betts and Lound (2010b). The third output is a report for the GSS consolidating and reflecting on the findings from the review/consultation and the workshop to provide a research-based view of the applicability of new modes of data collection in GSS social surveys, including the likely feasibility and benefits (Betts and Lound, 2010a).

1.2 Workshop objectives

The workshop objectives were:
• To share information about the likely success of alternative or mixed household survey modes on UK government social surveys, drawing on other countries' experiences and academic research.
• To achieve some consensus on the likely scope for adopting this way forward in UK household surveys or what we need to study more.

1.3 Workshop participants

The workshop included academic/international expert researchers, and survey managers, researchers and methodologists from across the GSS, including the devolved governments in Wales, Scotland and Northern Ireland.
Academic / international researchers:
Dirkjan Beukenhorst: Statistics Netherlands
Mick Couper: Survey Research Centre, University of Michigan
Debbie Griffin: US Census Bureau
Peter Lynn: Institute for Social and Economic Research, University of Essex

Government Statistical Service:
Dennis Roberts: ONS; Director, Surveys and Administrative Sources
Martin Brand: ONS; Deputy Director, Survey Methodology Division
Charles Lound: ONS; Methodology Consultancy Service
Peter Betts: ONS; Data Collection Methodology - Census and Social Surveys
Sue Dalton: ONS; Data Collection Methodology - Census and Social Surveys
Jacqui Jones: ONS; Methodology Directorate
Roeland Beerten: ONS; Social Surveys Division
Ed Dunn: ONS; Social Surveys Division
Dean Fletcher: ONS; Social Surveys Division
Steve Woodland: ONS; Social Data Collection – Field Support and Control
John Flatley: Home Office; British Crime Survey
Debbie Moon: Home Office; British Crime Survey
Cath Roberts: Welsh Assembly Government; Welsh Health Survey
Kevin Sweeney: Head of Central Survey Unit, Northern Ireland Statistics and Research Agency
Alex Stannard: Scottish Government; Survey Harmonisation Coordinator
Suzanne Cooper: Communities and Local Government; Citizenship Survey
Abby Sneade: Department for Transport; National Travel Survey

Thanks are given to the following members of Social Survey Division who took notes during the workshop: Tom Anderson, Kathryn Ashton, Colin Hewat and Clare Watson.

1.4 Workshop agenda

The workshop was divided into two substantive parts. Following an introduction and context setting, the first day focused on the current ‘state of play’ in mode-related research and application internationally. It included a presentation of the literature review and consultation exercise, as a work-in-progress, followed by discussion of the issues raised, and identification of issues and other research evidence not yet covered. The second day considered the application of alternative modes on UK government social surveys.
Survey specific challenges and plans from around the GSS were discussed, and practical ideas for the design of surveys were put forward. The Labour Force Survey (conducted by ONS) and the Home Office’s British Crime Survey were used as case studies, and issues from other GSS departments were also considered. The emerging themes arising from the workshop were identified.

2 Context setting

Martin Brand began the workshop by covering the project background and workshop objectives, as described above. The context in which the project arose was then set out by Dennis Roberts. The drivers of the move to consider alternative modes for GSS social surveys were:
• the rising cost of traditional data collection methods: additional visits by interviewers to sampled addresses are required to make contact and gain cooperation, and interviews are longer and more complex;
• reducing budgets: for example, in ONS five per cent efficiency savings are required each year over five years; and
• the continuing need to meet the requirements of survey sponsors, data users and policy makers in terms of quality of measurement, timeliness and relevance.

The basic survey model, particularly in ONS, had been followed for decades. Samples of addresses are drawn from the Postcode Address File, and interviews are primarily conducted face-to-face, with some use of telephone interviewing for follow-up interviews, but little use made of the internet. This raised questions as to what was the right way forward and what groundwork was needed.

Other participants affirmed that similar drivers applied outside government and in other countries.
• The American Community Survey and other US surveys face falling response rates and/or difficulty obtaining a large proportion of response by cheaper mail or telephone modes, with fewer landline telephone numbers and increased mobile phone use. More diverse living situations and language needs also increase costs.
Annual data is only released eight months after fieldwork ends.
• At Statistics Netherlands thirty per cent efficiency savings are required, which will be achieved mostly by reducing face-to-face interviews.
• The research community cannot continue with past models but does not know how to move forward. There are potentially many solutions; modes may be a partial one. A diverse approach to data collection is needed, but that undermines standardisation which underpins efficient collection.
• Longitudinal or panel surveys offer efficiencies due to the lower cost of follow-up interviews and the possibility of mixed mode designs (collecting household and contact information at the first wave).

3 Literature review and consultation

Peter Betts and Charles Lound presented the findings of their literature review and consultation exercise, which had been circulated to participants in advance as a work-in-progress. Further detail on the findings and evidence, and all the references, which are not provided in this account of the presentation, can be found in the literature review and consultation report (Betts and Lound, 2010b).

The presentation did not include some sections of the report which are included in the final version. The final version includes additional content, on survey management and logistical issues, ‘future proofing’ against social and technological trends, and details of work being conducted in other NSIs and research organisations. It also incorporates additional references suggested by workshop participants.

3.1 GSS survey taking traditions and drivers for change to survey designs

• Numerous continuous and ad hoc surveys are conducted across the GSS. Many are cross-sectional but increasingly longitudinal designs are used (including panels and follow-ups). Surveys are conducted in-house by ONS and contracted out by other departments to survey research organisations.
• Data collection modes: GSS surveys are primarily face-to-face; some telephone; paper exceptions; some mixing of modes; little internet.
• Coverage and sampling: Most surveys are of the general population (UK, GB, lower geography); utilising probability samples (most use the Postcode Address File as the frame); all household members are interviewed or a selection within.
• There is some need for comparative measures including: harmonised questions and outputs; Eurostat/ILO requirements; Equalities Data Review; and over time within longitudinal exercises.
• The key drivers of change - as previously stated - are twofold.
  o Response rates are in decline: people are harder to contact or less willing to participate. This is not consistent across the GSS – some departments are maintaining rates better than others. Measures to mitigate: interviewer training, incentives, new modes.
  o The high cost of face-to-face interviewing in a climate of efficiency pressures.
• The literature review and consultation exercise was conducted by searching journals, research web sites and conference proceedings; referring to text books; and contacting NSIs and other organisations.

3.2 Available mode choices, including mixed-mode designs, and their advantages and disadvantages

• A widening range of mode choices is available. The basic distinction is between interviewer- and self-administration; but within each are proliferating computerised/automated methods, including computer-assisted personal interviewing (CAPI), computer-assisted telephone interviewing (CATI), computer-assisted web interviewing (CAWI), computer-assisted self interviewing (CASI), audio-CASI, interactive voice response (IVR) and telephone data entry (TDE). There are more portable computing devices available. Use of paper methods also continues.
• Factors determining mode choices include access to the survey population (coverage, sampling frame), costs, time, questionnaire (length, complexity, topic, burden), and the non-statutory nature of surveys in the UK. Hence in the GSS interviewers have largely been used due to their value in making contact with respondents and gaining response.
• Sources of survey error include coverage, nonresponse, measurement and processing.
• The apparent benefits of alternative or mixed mode designs are cheaper, faster surveys with better coverage, less bias and higher response.
• The potential problems of alternative or mixed mode designs relate to inferential and comparability challenges: the need for probability samples; incomplete coverage and lack of sampling frames for internet and telephone; and measurement differences by modes.
• Thus trade-offs are needed between costs and quality.
• Mixed-mode design options include the following.
  o Mixing data collection only.
  o Mixing communication with sample members (e.g. prenotification and reminder letters, emails, postcards) and data collection. These are known as ‘mixed-mode systems’. Numerous systems are possible, with different mode mixes being made within or across i) different samples, and/or ii) different time periods and/or iii) the different survey stages: precontact phase; main data collection phase; follow-up phase.
  o Mixed mode data collection can be concurrent, or sequential (e.g. from cheapest to most expensive).
• The main decision to be made is whether to:
  o mix communication with a single data collection mode: a “win-win” situation as it results in better coverage and/or response rates but with no mode effects (differences in the responses collected in different modes of collection); or
  o mix data collection modes: positive with regard to greater coverage, higher response rates and lower costs; but negative with regard to mode effects and confounds between the effects of mode, selection bias, nonresponse biases and time.

3.3 Sampling, response and estimation

• The key challenge for internet and telephone data collection relates to nonobservation and representation: coverage bias, lack of sampling frames and selection effects.
• Internet coverage: There is a ‘digital divide’ – different characteristics of those with internet access (70% of UK households in 2009) and those without. They relate to education, affluence, sex, age, ethnicity, economic activity, health, household size and region. A question remains as to whether bias will reduce as coverage increases.
• Telephone coverage: 7% of UK households are without a fixed landline; mobile-only households are biased towards younger people; and people are increasingly unlikely to include their number in directories.
• Probability sampling must be used in official statistics to avoid self-selection bias.
  o There is no suitable sampling frame for internet users or telephone numbers (landline or mobile).
  o But there is scope for using internet and telephone modes within a probability sampling framework and mixed-mode survey system (e.g. a postal advance letter including a web link).
  o Random Digit Dialling and Dual-frame sampling can be used in the UK, but with implications for response rates and measurement error.
• Interviewers have important roles beyond asking questions and recording answers, including gaining co-operation; establishing eligibility and sub-sampling within the household.
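
As a rough illustration of the coverage concern raised above for internet and telephone modes: when only part of the population can be reached in a mode, the bias of an estimate based on the covered group alone equals the uncovered share multiplied by the difference between the covered and uncovered groups. The sketch below applies this standard identity in Python. The 70% internet access figure is taken from the text; the function name and the two prevalence values are hypothetical and chosen only for illustration.

```python
def coverage_bias(coverage_rate, mean_covered, mean_not_covered):
    """Bias of a mean estimated from the covered group only,
    relative to the mean of the whole population."""
    uncovered_share = 1.0 - coverage_rate
    return uncovered_share * (mean_covered - mean_not_covered)


# 70% of UK households had internet access in 2009 (figure from the text).
# The two prevalences below are hypothetical illustration values.
internet_coverage = 0.70
prevalence_online = 0.60    # assumed prevalence among households with access
prevalence_offline = 0.40   # assumed prevalence among households without

bias = coverage_bias(internet_coverage, prevalence_online, prevalence_offline)

# Cross-check against the direct definition: covered mean minus population mean.
population_mean = (internet_coverage * prevalence_online
                   + (1.0 - internet_coverage) * prevalence_offline)
direct_bias = prevalence_online - population_mean

print(f"coverage bias: {bias:.3f}; direct calculation: {direct_bias:.3f}")
```

Under these assumed values a web-only estimate would overstate the population figure by about six percentage points. Whether the bias shrinks as coverage rises depends on how the gap between the two groups changes, which is the open question noted above.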
• Response and nonresponse bias: Nonresponse error relates to response rate and difference between responders and non-responders. Both of these factors vary by mode. Biases affect reliability of estimates and generalisability of findings. Web surveys alone ‘do not offer an antidote’ to the decline in response rates.
• Many studies explore response rates and nonresponse bias in single modes and in mixed-mode survey systems.
  o Single mode comparisons show lower response rates by internet than mail/telephone and telephone rates to be lower than face-to-face.
  o Mixed-mode design doesn’t increase overall response.
  o In concurrent designs (offering respondents choice of response mode) other modes tend to be chosen in preference to the internet.
  o Sequential mixed mode designs usually progress from cheap to expensive modes, following-up nonresponders in a new mode. They do not increase the overall response rate but can obtain a large proportion of response by cheaper modes.
  o Some differences in respondent characteristics have been observed between face-to-face, telephone, internet and mail responders (with mixed evidence for comparisons between face to face and telephone).
• Miscellaneous findings relating to response rates and nonresponse bias included the following.
  o Mixed mode data collection is more difficult when including all household members.
  o The proportion of response by internet mode is highest when it is the only option offered first in a sequential design.
  o Telephone reminders to respond in a self-completion mode are not effective.
  o Leaving interviewers to ‘mop-up’ people who are harder to contact or persuade may have an impact on their motivation.
  o Pre-notification is better by post not email, being harder to ignore.
  o Web follow-up surveys to interviewer surveys result in significant accumulated sample loss.
  o Interview length is asserted to affect nonresponse, though topic interest or salience may be positively correlated with length. Premature termination may be more likely in telephone and internet modes. The typical length of face-to-face surveys may be a barrier to moving them to other modes, but more needs to be known about sensitivity to length in different modes.
  o Respondent mode preferences: The literature is inconsistent about preferences for interviewer- or self-administration.
• Weighting and statistical adjustment: ways to adjust for biases – undercoverage, nonresponse, systematic mode effects – will be required. Possible approaches are design-based (traditional random sampling) and model-based (opt-in/volunteer samples where inclusion probabilities are modelled).
  o Coverage and non-response bias can be addressed by controlling on demographic variables; implemented in a GREG-type estimator. This needs population distributions.
  o Alternative approaches include propensity score variables – sociodemographic, behavioural or attitudinal (these have been described as webographic information); this reduces bias; but also reduces the effective sample size.
• A break in a time series can be estimated by embedded experiment, parallel run or time series methods to evaluate change. The estimated break can be used to backcast or forecast results.

3.4 Mode effects on measurement error and approaches to mixed-mode questionnaire design

• Mode effects have been defined as “The effect that using a specific mode has on the responses that are obtained in that mode” (International Handbook of Survey Methodology).
• The interactions between features of modes, questions and respondents are complex but involve three main factors:
  o media related factors: social norms, and the ‘locus of control’ over the survey environment (which lies with interviewer or respondent);
  o the information transmission or communication channel: how the cognitive stimulus is presented and received – visual/aural; nonverbal cues, paralanguage (verbal emphasis; visual presentation); and
  o interviewer effects: how the presence or absence of an interviewer affects privacy, motivation to take part, legitimacy of the survey and assistance that can be provided.
• There are further psychological effects co-varying with mode:
  o the respondent’s comfort and the cognitive effort required or possible;
  o the cognitive response process (comprehension; retrieval of information; judgement as to how it maps onto the question format, and provision - or potentially editing or withholding - of a response); and
  o interactions of mode with question type, task difficulty, respondent ability and respondent motivation.
• These can all affect the amount and nature of measurement error.
• The first main type of mode effect is ‘satisficing’, where a respondent’s response processing is not thorough (weak satisficing - responses given will be merely satisfactory) or where stages of the process are skipped (strong satisficing – responses may be arbitrary). Satisficing relates to:
  o the task difficulty or cognitive burden, which varies by topic and question type, complexity or difficulty;
  o respondent ability; and
  o respondent motivation.
• The different types of satisficing are as follows.
  o Acquiescence: tendency to agree with assertions – less effort than to disagree.
  o Provision of no opinion and don’t know responses.
  o Non-differentiation: rating a series of items to same scale point.
  o Random selection of response category.
  o Selection of the middle/neutral point of a scale.
  o Extremeness: choosing extreme points on attitude scales.
  o Response order effects: the effect of the order in which response options are presented. ‘Primacy’ is the tendency to select items near the start of a list when it is presented visually. ‘Recency’ is the tendency to select items from the end of a list when it is read aloud only. Response order effects are more likely with long lists, or when respondents are older, have lower education or have comprehension problems. Unordered lists are more susceptible than ordinal.
• Research predicts satisficing to be greater:
  o in self-administration than interviewer administered mode;
  o in telephone than face-to-face; and
  o in internet than postal mode.
• The evidence from the literature reviewed broadly supported the theory but was not overwhelming and was sometimes inconsistent.
• The second main type of mode effect is social desirability bias, where the true answer is not reported by the respondent in order to portray him/herself favourably or to meet perceived researcher expectation.
  o It is associated with questions on sensitive behaviours, attitudes and health and the actual or perceived privacy or anonymity of the survey.
  o It is greater in interviewer-administration than self-administration and appears to be greater by telephone than face-to-face.
  o There is debate relating to: whether it is a characteristic of question or a personality trait; what the desired values/norms are; and whether it is subject to conscious control or relates to shortcutting, recall error or self-deception.
• Evidence from the literature supported the hypothesis: social desirability bias was reported to be higher in interviewer- than self-administered modes; and in telephone than face-to-face. But there were exceptions. Survey topics in which it was found included health status, behaviour and problems; mistrust in government; alcohol consumption and drug use; employment and television watching.
• Other measurement differences are found which it is not possible to attribute to satisficing, social desirability bias or some other factor. They include item nonresponse being higher in self- than interviewer-administered modes, and higher in paper than computerised self-administration.
• A number of confounding factors in mode studies have been identified.
  o Confounding mode with sampling and selection. There are two types of mode study: comparisons of unique survey systems (with respect to their coverage, sample frame and design, nonresponse bias and question design) – here it is not possible to generalise or predict to other settings; and experimental designs which isolate mode from selection and nonresponse biases, controlling for survey characteristics and using random assignment of cases to mode.
  o Confounding mode with questionnaire construction, including the following.
    Question format effects: the tendency to construct questions and use design conventions differently in different modes, which can change the stimulus. For example, when asking a series of items using forced-choice (yes, no) or ‘check all that apply’. The latter tends to be used in visual presentation, but forced-choice has been found to result in better data in all modes.
    Context and order effects: the location or order in which questions are presented can cause measurement differences. They can relate to mode in that in paper questionnaires it is possible for the respondent to see previous and subsequent questions, whereas in interviewer-administered modes the order is controlled. In internet surveys, either scenario is possible depending on how the instrument is programmed.
  o Recall/memory effects, relating to the elapsed time between behaviour of interest and being asked about it varying by mode; or because use of recall aids differed between modes.
  o Interviewer standardisation: different practices may exist between face-to-face and telephone interviewers, such as their freedom to repeat or alter question wording or explain meaning.
  o Visual layout: additional meaning is given to self-administered questions by paralanguage (e.g. layout, symbols, graphics, fonts, scale labels) which needs to be considered when translating questions across modes.
• Assessing the magnitude, direction and impact of mode effects was identified in the literature as a challenge. Some techniques to deal with it were proposed.
  o Tests need to control for respondent characteristics; if differential nonresponse results in sample composition differing by mode, weighting/correction factors are needed.
  o ‘Empirically based adjustment’ can be informed by embedded experiments in longitudinal surveys.
  o Regression analysis can eliminate factors including sample characteristics; if mode remains a significant factor it can be regarded as a mode effect.
  o The magnitude of differences needs to be reported.
  o Significant differences between estimates may not matter; what is important is the effect on substantive interpretation, for example, of the relationship between variables.
  o Assessing which mode obtains ‘better’ quality data can be done by internal and external validation and with prior knowledge of the expected direction.
• In conclusion, there exists a ‘dichotomy’ between interviewer modes and self-completion, but no mode is superior all round. Internet mode appears to be more like mail than telephone but comparisons are unclear. More controlled experiments are needed.
• The implications of the findings are that optimum design has to weigh up the advantages/disadvantages of mixing modes:
  o lower costs, better coverage and higher response rates; against
  o measurement errors compounded with differential coverage/nonresponse error.
• This raises the question of whether instruments which are insensitive to mode can be developed.
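
The regression check described among the techniques above can be sketched as follows. This is a minimal illustration on simulated data, not a method taken from the studies reviewed: the variable names, the single covariate (age), the simulated mode effect of 0.5 and the use of ordinary least squares via NumPy are all assumptions made for the example.

```python
import numpy as np


def mode_effect_ols(y, mode, covariates):
    """Fit y ~ intercept + mode + covariates by ordinary least squares.

    Returns the coefficient on the mode indicator and its t-statistic.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), mode, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)   # classical OLS covariance
    se = np.sqrt(np.diag(cov_beta))
    return beta[1], beta[1] / se[1]


# Simulated data (hypothetical): web respondents are younger on average,
# so sample composition differs by mode as well as the mode itself.
rng = np.random.default_rng(0)
n = 2000
mode = rng.integers(0, 2, size=n)               # 1 = web, 0 = face-to-face
age = rng.normal(50 - 10 * mode, 12, size=n)    # composition differs by mode
y = 1.0 + 0.02 * age + 0.5 * mode + rng.normal(0, 1, size=n)

# Unadjusted difference mixes the mode effect with the age difference.
raw_diff = y[mode == 1].mean() - y[mode == 0].mean()

# Adjusted estimate: what remains attributable to mode after controlling age.
coef, t_stat = mode_effect_ols(y, mode, age)

print(f"unadjusted difference between modes: {raw_diff:.2f}")
print(f"mode coefficient after controlling for age: {coef:.2f} (t = {t_stat:.1f})")
```

In this simulation the unadjusted difference understates the built-in mode effect because the web sample is younger; the adjusted coefficient recovers it. As noted above, statistical significance alone is not the whole story: the size of the difference and its effect on substantive interpretation still need to be reported.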
• There are various approaches to designing question(naires) for mixed mode data collection. Widely referenced is Don Dillman’s ‘unimode construction’ approach, with its nine principles:
  o Make all response options the same across modes and incorporate them into the stem of the survey question.
  o Avoid inadvertently changing the basic question structure across modes in ways that change the stimulus.
  o Reduce the number of categories to achieve mode similarity.
  o Use the same descriptive labels for response categories instead of depending upon people’s vision to convey the nature of a scale concept.
  o If several items must be ranked, precede the ranking question with a rating question.
  o Develop equivalent instructions for skip patterns that are determined by answers to several widely separated items.
  o Avoid question structures that unfold.
  o Reverse the order in which categories are listed in half the questionnaires.
  o Evaluate interviewer instructions carefully for unintended response effects and consider their use for other modes.
• Recently, however, these principles have been replaced by discussion of three different approaches, in recognition of there being no single solution and that there are different ways of combining modes and reasons for doing so.
  o Unified mode construction: questions are written and presented the same (or nearly) to ensure common mental stimulus.
  o Mode-specific construction: structure, wording, presentation of questions are modified to suit particular mode capabilities, to achieve the same stimulus.
  o Mode-enhancement construction: when one mode is primary, others auxiliary (and equivalence is less important), mode-specific features are used to improve quality in the primary mode.
• A good example of the practical application of mixed-mode design principles was provided by the US Census Bureau.
  o They follow a principle of ‘universal presentation’ (similar to mode-specific construction): “All respondents should be presented with the same question and response categories, regardless of mode. That is, the meaning and intent of the question and response options must be consistent. The goal is that instruments collect equivalent information regardless of mode. By equivalent, we mean that the same respondent would give the same substantive answer to a question regardless of the modes of administration.”
  o Thirty guidelines have been written relating to various design considerations.
  o Some issues the Bureau have had to address include: that the principle can’t be applied mechanically, only followed as a whole, in a realistic, flexible and context-dependent way; that application will be long term; and that further testing and ongoing modification are needed.
• In summary there are different approaches to mixed mode question design, with varying terms and definitions. Decisions may depend on whether:
  o there is one main data collection mode with auxiliary modes (in which case use mode-enhancement construction), or
  o each mode is equally important (in which case use generalised/mode-specific construction).
• There remain uncertainties: further research is needed into what constitutes ‘equivalent stimulus’ in different modes; it may be an ‘illusory goal’; a pragmatic approach may be needed, designing on a question by question basis, adjusted for the survey context and needs.

3.5 Survey transitions

• General advice and actual experience relating to making mode transitions was found in the literature.
  o Clear communication with users from the planning stage on the key variables and the consequences of discontinuity is necessary.
  o Guidelines for a smooth transition include: set up a mechanism to estimate change for continuous time series; document development; pilot new methods prior to estimating change; estimate the effects; and implement and document the change.
  o Gradual and well-managed implementation of new mixed mode design is planned at Statistics Netherlands. The need for unbroken time series in official statistics is being addressed by embedded experiments and pilots to estimate required adjustments. A ‘research on the run’ approach will adjust project plans in light of experiences. Mode effects will be reduced by piecemeal engineering.
• Experimental design is covered in the literature, such as embedding experiments and eight characteristics for studies on satisficing and social desirability effects:
  o One group should be interviewed by one mode, a different group by an alternative mode (to avoid order and practice effects).
  o Respondents in both modes should be representative samples of the same population.
  o Respondents should be interviewed in the assigned mode (with no reassignment because refusing in one mode would confound comparison).
  o Respondents should be interviewed individually in both surveys (not in groups, e.g. the whole household).
  o Respondents should not choose mode: self-selection could lead other factors to be confounded with mode.
  o Respondents should not have been interviewed previously about similar issues (practice effects might distort comparison).
  o The questions to gauge satisficing/social desirability should be asked identically and in identical contexts (number, content and sequence of prior questions).
  o Comparisons across modes should be subjected to tests of statistical significance.
• Issues to be researched are identified or proposed in several papers reviewed. They include the following.
  o The need to update knowledge with comparative studies and meta-analysis of new modes.
  o More research is needed on optimal mixes in mixed contact strategies to combat nonresponse, preferably including other indicators such as bias reduction and costs.
  o Very little is known about optimal questionnaire design for mixed mode data collection (unimode or universal).
    Empirical research is needed to estimate what constitutes the same stimulus across different modes.
  o Research is needed on adjustment/calibration strategies for mode mixes.
  o The choice of mode mix should be explicit in the methodological section of articles/reports.
  o More mixed mode experiments are needed which include web.
  o Investigate whether training interviewers (especially telephone) to pause between the question and response options would reduce the magnitude of the mode effects.
  o It is important to investigate social desirability: are researchers’ assumptions consistent with respondent views? And is social desirability a cause of the mode effects found in telephone and face-to-face estimates from the LFS (e.g. lower unemployment rate by telephone)?
  o It is not always possible to use unimode design (e.g. there are no show cards in CATI, so it is advisable not to have lengthy response option lists). When is it preferable to deviate from unimode, and how should questions be designed (e.g. if introducing a self-completion mode, should the five-point scale used in PAPI or the two-step question used in CATI be used)?
  o Different cognitive burden in different modes implies the need to develop optimal design for modes of interest so burden is comparable (e.g. rating scales in self-completion presented as a list of questions or as a grid).
  o Differential Item Functioning: the same question functions differently for different groups (e.g. minority, gender).

That concluded the presentation. Comments by and discussion among participants during the presentation included the following.
• Response rates across the GSS appeared to have held up better than those on ONS surveys.
• Attempting to cover mobile phone-only households is very expensive.
• Moving sequentially from cheaper to more expensive modes seems to be best.
• There is evidence that providing a choice of modes concurrently decreases response rates.
  Respondents tend to wait for the alternative mode to be sent, with no evidence that the ‘threat’ of being contacted again prompts response.
• There is a ‘paradox of choice’: providing respondents with choice of response mode complicates matters.
• Paper mode is easier for respondents than internet, where security requirements such as the need to log in to the survey website, register and enter a password set up barriers to participation and are prone to keying error.

4 Discussion of key issues and information sharing

The next session reflected on the literature review presentation and raised further questions. Experiences were shared by participants and gaps in the evidence base were identified.

4.1 Reflections on the literature review

The following topics were discussed.
• Why not use mail surveys in UK Official Statistics?
  o Barriers were thought to include the length and complexity of instruments.
  o Questionnaires could be broken down: modularisation (and matrix sampling) and a short form/long form approach could be considered.
  o Technical issues of validity/consistency checks and rotated data not being possible were identified.
  o Extended fieldwork/processing had implications for the timeliness of data delivery (for example, required monthly on LFS).
  o The American Community Survey (mandatory) includes census-type demographic questions (household and person level) in a sequential mixed-mode design, moving from least to most expensive: mail ($10 per form), computer-assisted telephone interviewing (CATI) ($18 per interview) and computer-assisted personal interviewing (CAPI) ($140 per interview). CAPI accounts for 90% of costs. Data on mail returned forms are reviewed with follow up by telephone.
  o The example was given of the unsuccessful German LFS use of mail. They tried to capture person level data but usually one person responded for everyone. There were problems with item non-response, failure to follow skip patterns and error.
    They are now looking at multiple questionnaire forms for each household, but at how much reduction in cost? They have different requirements around timeliness of data (quarterly outputs).
  o The suggestion was made to conduct a proportion of the sample by CAPI, with a larger proportion by mail. This would address timeliness by producing initial estimates, with final estimates once all fieldwork is complete.
  o The Welsh Health Survey uses paper but interviewers make the initial contact to place the questionnaire and return to pick it up. Good recruitment and response rates are achieved.
  o The Place Survey (commissioned separately by circa 350 Local Authorities) uses mail, leading to variable response rates (40% to 80%). A web survey supplement is being considered.
    - It was thought that the variation in response across the Place Survey raised questions of inference.
  o The England and Wales Youth Cohort Study includes a choice between mail and web: only 2% used web. The paper questionnaire is around 12 pages of A4; the estimated response rate is 50% (and declining).
• Scepticism was expressed that very long complex surveys can be easily changed into shorter self-completion instruments. However, mode is more than just the data collection medium: it also refers to the sample and respondent communication. So we should talk about systems, not just data collection.
• Address based sample frames do lend themselves well to mixed mode approaches.
• A suggested approach for web surveys was to recruit panels with the instrument split into 20 minute chunks and asked every 6 months.
• It was emphasised that compromise is needed: it is not just a decision about which mode(s) to use but more strategic. What do we give up? The example was given of the Health and Retirement survey, lasting 3 hours, including physical measurements. Who would do this online? We need to separate the technical feasibility of online surveys from the strategic decision to move to this mode.
  Are we trying to get too much from respondents? There are more modes coming, e.g. mobile computing: we need to look 15-20 years ahead.
• Why not vary the length of questionnaires by mode on a single survey? Collect core variables across modes, those requiring the most precision in the estimates. Those requiring less precision could be asked in a single mode, or of a subsample.
  o One-off modules and rotating modules for subsamples are likely, using matrix sampling. In the context of matrix sampling the example was given of the US Consumer Expenditure Survey.
• Addresses are a stable way of drawing and contacting a survey sample: they aren’t going to go away or develop like some forms of technology-based contact.
• What is the primary driver, cost or response (quality)?
  o Both. Trade-offs will vary and depend on the survey and which dimensions of quality are more important.
  o There was a suggestion to identify the design issues that address cost and quality; some will only address one of the drivers.
• Why not just go ahead and combine modes?
  o The mode effects literature can predict to some extent what will happen when modes are changed (and the magnitude). It is very important to consider using embedded random selection of mode for some of the sample to attempt to measure the extent of the mode effect. Control for mode and selection effects, for example by making respondents with a preference for one mode complete by another mode; you can then create adjustment factors.
  o Mode effects are often confused with self-selection of mode. Using random selection is a very useful tool for measuring the mode effect.
  o A US crime survey found that CAPI and CATI modes can produce very different estimates for some questions and quite close estimates for others.
  o A model to split the LFS sample was suggested: a quarter by CAPI and three quarters by mail.
  o It was suggested that primary sampling unit-based selection could be used for the most expensive mode of collection and another selection method for other modes.
  o A general consensus holds that there is not as much difference in data between CAPI and CATI (interviewer administered modes) as there is between interviewer administered modes and self administered modes. For example, a survey on aging recruited a panel of respondents aged 50+, with those aged 70+ surveyed face-to-face and 50-70 year olds surveyed by CATI. There was a group aged 68-72 that were randomly assigned to mode; very little mode effect was found in the data.
• What about conducting a CAPI interview via Skype (or a similar mode of interaction)?
• It will be important to understand the transition when switching mode of collection. An issue can be in educating users about the impact of a change in the design, and not underestimating how much time this would take; it should be planned for.
  o It was thought that users are less worried about mode effects than researchers or methodologists. Datasets deposited at the UK Data Archive typically did not contain a flag for mode of collection. Eurostat only specifies common outputs and does not require common modes.
  o Changes might cause problems in time series.
• Probability samples are important because they allow control of mode effects.
• Is it possible to cut back the Integrated Household Survey core? Could a similar approach be taken with the LFS, defining a core set of variables to ask across modes? Headline figures are often based on a small subset of questions. Core data can be collected by cheaper modes.
  o The difficulty with this is that most of the LFS data tends to be used, so it is difficult to cut questions.
• The question of compulsion to participate in surveys was raised. To compensate a survey would be shorter but with a more frequent and bigger cut.
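As an illustration of the embedded random selection of mode discussed in this section, the following is a minimal sketch in Python, not a method reported at the workshop. It assumes a simple random sample and a yes/no survey item, and it ignores clustering, weighting and differential non-response, all of which a real design would need to handle. The function names, the seed and the counts in the usage lines are hypothetical.

```python
import math
import random


def assign_modes(case_ids, modes=("CAPI", "CATI"), seed=2009):
    """Randomly allocate each sampled case to one mode (no respondent choice)."""
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    # Cycle through the modes over the shuffled list so groups are near-equal in size.
    return {case: modes[i % len(modes)] for i, case in enumerate(shuffled)}


def mode_effect(yes_a, n_a, yes_b, n_b):
    """Difference in proportions between two modes, with a pooled two-sided z-test.

    Assumes simple random sampling within each mode group (an assumption,
    not a property of any survey described in the report).
    """
    p_a, p_b = yes_a / n_a, yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value under the normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, z, p_value


# Hypothetical usage: 10 cases allocated, then invented counts compared.
allocation = assign_modes(range(10))
diff, z, p_value = mode_effect(yes_a=60, n_a=1000, yes_b=45, n_b=1000)
print(allocation)
print(f"difference={diff:.3f}, z={z:.2f}, p={p_value:.3f}")
```

Because allocation is random rather than chosen by the respondent, a difference between the two groups is not confounded with self-selection of mode, which is the point made in the discussion above.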
4.2 Presentation on recent work at Statistics Netherlands

At this point Dirkjan Beukenhorst gave a presentation on ‘The new design of the LFS and research on the fringe’.
• The current Dutch LFS is similar to the UK LFS: a quarterly rotating panel design with 5 waves. The first wave is by CAPI; all later waves are by CATI. Dependent interviewing is used and proxy response is allowed.
• There is a problem of panel attrition caused mainly by people not providing their telephone number, which causes bias.
• The quality of proxy response is debatable, especially for more subjective questions like willingness to work.
• CAPI collection at the first wave is expensive but it is not possible to just switch to CATI (even though they can match a telephone number for 70% of addresses in the Netherlands).
• A two-phase redesign is being introduced:
  o Phase 1 (2010): a new questionnaire adapted to CATI and web modes will be used. CATI will be used at Wave 1 when a telephone number is known. The Wave 1 questionnaire has been shortened, with some items pushed to the follow up waves. A ‘basic’ questionnaire will measure core demographic variables like employment status on all surveys.
  o Phase 2 (2011): the first Wave is to be collected by the web as much as possible, then by CATI (if a number is known) then by CAPI. For each case data collection in follow-up waves will be in the same mode as at Wave 1 (except there will be no CAPI).
  o There is increased risk of panel attrition.
• The move from the cheapest mode to the most expensive will be used on all surveys.
• Sample sizes will be reduced and time series modelling used to produce precise estimates. Population register information will be used in the models to try to increase precision for small domains.
• The old and new systems will run in parallel for 6 months in 2010 to try to measure breaks in the series and allow for adjustments. There will be no other experiments.
Dirkjan’s presentation then digressed into observations on the “sociology of innovation” in NSIs:
• Three roles were identified: management, subject matter specialist and methodologist.
  o Management is responsible for budgets, production and efficiency.
  o The subject matter specialist is concerned about time series, work routine and tradition.
  o The methodologist is interested in quality, innovation, estimation problems and data collection problems.
• The typical innovation process:
  o Phase 1: management wants efficiency and rapid change; the subject matter specialist is against change and wants to slow down change; and the methodologist likes the challenge and wants to conduct scientific experiments. Initiative lies with methodologists and ‘outsiders’.
  o Phase 2: management intensifies the pressure; the subject matter department gives in; the methodologist devises experiments. Initiative lies with the subject matter department and the methodologist. In the end there is no time for experiments; instead, parallel running is conducted.

His talk then continued with a look at other relevant research.
• Health Interview Survey experiment: after their first interview, respondents were asked to participate in another survey, preferably by web, or by CATI if not. 92% agreed, 77% of whom responded. This was equivalent to 44% of the original sample. This was not very efficient and was biased owing to non-response bias in the original HIS.
• LISS panel: a web panel, as representative of the population as possible, was used to compare results of the ‘basic’ questionnaire with existing measures of employment. One subsample of the panel was randomly allocated to CATI, the other to web. Both were compared to the basic questionnaire executed in CAPI.
  o Response rates were high on the web and CATI panel (80%); CAPI was lower (65%).
  o The estimates of the employment rate under CAPI were 6-7% different to those collected on the panel.
    Some of the differences appeared to be due to question design.
  o There was also a big difference between some of the web and CATI estimates (% in part-time employment). Given that these subsamples have the same original bias, this is disturbing.
• Lots of questions remain unanswered, particularly what will happen post-Wave 1.

Participants then discussed the presentation and further issues.
• On the LISS panel, internet access was given to those who didn’t have it, in the form of a SimPc with internet access. Broadband was installed. This is expensive: the panel costs around $3 million a year.
  o The National Centre for Social Research is providing mobile phones for respondents to report a record-keeping diary periodically.
• The recruitment stage (coverage and gaining cooperation) is the biggest problem.
• An internet option instead of CATI is a possibility for LFS Waves 2 to 5.
  o The University of Michigan is currently experimenting with this sort of model.
• Mode doesn’t really influence the estimates much once the effect of weighting has been taken into account.
• It was questioned whether the Dutch LFS model had addressed timeliness concerns.
  o This is addressed by combining cases from different modes from different fieldwork months.
  o There is an issue of what the reference date means if you are combining cases from different fieldwork stages of a sequential design.
• For the Dutch LFS the reference period is the date of the interview. As long as the subsamples from the various modes represent a monthly sample then this is not a problem. One issue would be any commissioned research, e.g. boosts, that altered the balance between the modes.

4.3 Gaps in the evidence base

The question was posed: can we identify any gaps in the literature review?
• The timeliness issue. When is the optimum time to switch modes in sequential designs with web as first mode?
  o Statistics Netherlands tend to have 1 reminder (maximum 2) for the web survey and then switch.
    The fieldwork period in that mode is short as web responders tend to respond quite quickly.
• With mail the need to process the data must be taken into account.
• The issue of interviewing everyone in a household. On Understanding Society (UK Longitudinal Household Study) they have been investigating different switching points between CATI and CAPI:
  o a) when one person in the household asks for CAPI
  o b) when you can get no further interviews via CATI
  o There was no difference in response but they don’t know about the effect on costs yet.
• A gap in the literature is with regard to cost profiles and optimal strategies.
• With clustered designs there is often a complicated management decision to be made about when to switch modes, e.g. what do you do if you only have 1 address left in the quota and the interviewer is not local?
  o Has any work been done in Australia about this? ABS now does very little face-to-face interviewing.
  o The same applies in Canada: more use is made of CATI with CAPI follow up.
  o Could modes be targeted to, for example, geographical areas? This is done in Finland. CAPI is only used around Helsinki; everything else is done by CATI.
  o A survey of welfare recipients is conducted mostly by CATI with a smaller localised face-to-face component. Interviewers had mobile phones and screened respondents, provided them with mobiles and conducted the interview by CATI. This was thought to be a good approach for sampling that requires screening.
• Costings need to cover the whole statistical value chain, including case management, editing and weighting.
• We need to be considering communication with respondents, an ‘Achilles heel’. An ONS project is being conducted. What is the best mode or method for value in terms of achieving response?
  o There is an under-emphasis on paper communications. Simplicity can be best.
    In the US ‘get well soon cards’ were tried during a swine flu outbreak; anecdotally it seemed to result in interviews at a later date (although this is not experimental evidence).
• It was thought that we need to acknowledge respondent preference for mode: respondent-centred surveying.
  o Statistics Finland recently radically reviewed their survey materials from the perspective of respondent experience.
  o Reference was made to a follow up survey in Sweden that asked about the mode choice. It resulted in words and phrases to incorporate into materials.
  o Caution against taking action on what people say in focus groups was advised: often they will favour what looks best, but in practice plain leaflets etc. get the best response.
• The issue of use of administrative data instead of collecting it in survey format was raised: the integration of survey and admin data.
  o Admin data tends to only be used to validate survey data but there is an argument for using it more in the survey process or as a replacement for survey data where it is of sufficient quality.
  o It was felt that things have to change in this respect. There are issues of data access but we have to integrate data from different sources. Trade-offs need to be faced up to in government, as do legislative and ethical issues.
  o Admin data about income and benefits is held centrally in the Netherlands which means they don’t have to ask about it on their surveys.
  o There is no one size fits all approach. Item non-response is often similar across modes once aggregated but not for individual questions. Admin data could fill a gap there.

5 Introduction to the case studies

Two surveys had been selected as case studies on which the workshop would focus, the Labour Force Survey (LFS) and the British Crime Survey (BCS). Presentations were given to describe the survey designs and provide other contextual information for participants to bear in mind when discussing the potential for alternative/mixed-mode designs.
5.1 Labour Force Survey

Presented by Dean Fletcher (Social Surveys Division, ONS).
• The LFS is a panel survey of the employment circumstances of the UK population (private households). It has been conducted since 1973 (annual survey since 1984; calendar quarter basis since 2006).
• Cost: £14 million per annum.
• It includes 500 questions and 200 derived variables on: job industries and occupations; health; transport to work; education and training; benefits; hours worked; earnings; household characteristics; employment patterns; and sickness.
• Sampling frame: Postcode Address File (PAF). 80,000 households are sampled per quarter (systematic random stratified sample).
• 50,000 households (120,000 individuals) take part each quarter. Each case is weighted by age, sex and geography; each interview equates to 500 people.
• Wave structure: LFS: 5 waves, each 3 months apart. There are also boost samples in England, Scotland and Wales (Local Labour Force Survey, LLFS) to allow analysis at local authority level (4 annual waves). The Annual Population Survey (APS) combines LFS waves 1 and 5 and all LLFS interviews in a year: 160,000 unique households (360,000 unique individuals).
• Response rate: There is a downward trend, from 73.5% in 1998 to 56% in 2009. Non-response is mainly from refusals to the interviewer. There is also attrition in Waves 2-5.
• Non-response: Item non-response is experienced at earnings questions; there is under-reporting of education qualifications; proxy response bias exists; but there is no bias in employment data.
• Accuracy and imputation: Sampling errors are calculated; imputation methods are used (roll-forward and nearest neighbour); roll-forward biases growth estimates; some discontinuities have been experienced, e.g. when a new Standard Industrial Classification was introduced or questions were changed.
• Data collection: Wave 1 is face-to-face (CAPI); Waves 2-5 are (predominantly) telephone (CATI), using dependent interviewing.
  Proxy interviews account for a third of data.
• Average interview length: 28 minutes per person.
• Sensitive questions include sexual identity; income; and health. Show cards are used only for sexual identity.
• Computerised features: Look-up tables are used for industry and occupation coding; and there is a coding frame for town/city. Hard and soft errors are programmed.
• Outputs include various person-level and household-level data files on a quarterly or annual basis.
• Data users: Many government departments (e.g. HM Treasury, devolved administrations), Bank of England, Eurostat; NGOs, academics, the wider research community, etc.
• Uses of the data: Monitoring; evaluating policies; designing policies; answering public/media questions; academic research.
• Dissemination of results: ONS publications (e.g. Labour market statistics; Economic and Labour Market Review); other government department publications (e.g. workplace sickness, health and safety, qualifications); local government and European publications. Media coverage is frequent.

5.2 British Crime Survey

Presented by John Flatley, Home Office.
• The survey started in 1982 (cross sectional; continuous since 2001/2); it is now restricted to England and Wales (separate surveys in Scotland and Northern Ireland). Mainly funded by the Home Office.
• Purpose: To provide estimates of level and trends in crime experienced by the population resident in households. Informs policy development; monitors public opinion; monitors performance (e.g. of the police).
• Minimum annual achieved sample: 1,000 interviews in each of the 43 police force areas. 50,000 interviews are achieved annually: 46,000 households/adults (one adult selected in household); plus 4,000 children aged 10-15 (in households taking part in main survey). Sample frame: Postcode Address File. Multi-stage, partially clustered sample design, varying by area population density (for cost efficiency). Low crime areas are oversampled.
• Units of observation: Household crimes; adults experiencing personal crimes; and children experiencing personal crimes.
• Interview mode: Face-to-face (CAPI) with an element of computer-assisted self-interviewing (CASI, Audio-CASI).
• Interview length: Adult average: 50 minutes (ranging from 20-25 minutes if no experience of crime to 1.5-2 hours if experience); child average: 20 minutes.
• Response rates: A relatively high level has been maintained: 75% adult. 70% of children respond (in participating households; so the true rate is 53%); unproductives are mainly parental refusals on behalf of children.
• Questionnaire construction: There are several whole-sample modules; plus other randomly allocated modules; plus self-completion modules (restricted to those aged under 60).
• Question types/topics:
  o Factual (e.g. demographics, experience of crime)
  o Opinions (e.g. views about the police, sentencing policy)
  o Behaviours (e.g. drinking, use of illicit drugs)
  o Questions are generally closed; for (later) coding of incidents to criminal offences, open-ended descriptions of the criminal incidents experienced are collected (with probing by the interviewer)
  o Crime reference period is the last 12 months; a life events calendar is used to aid recall
• Question sensitivity: Topics collected by self-completion are sexual assault, domestic abuse and illicit drugs. Item non-response is not an issue face-to-face; refusal to self-completion modules is 5% plus some item non-response.
• Proxy data is limited to socio-demographic information about other household members and household crimes.
• Non-response bias: The response rate is lower in London/inner cities. There is under-representation of adults aged 16-34 and 85+.
• Mode effects: There is a higher prevalence of crime when measured by self-completion, e.g. experience of domestic violence is five times higher from self-completion than from interviewer-administered questions.
• Online validation/edit checks are programmed.
  Assignment of criminal offence codes is a large, costly exercise. No imputation is done.
• Costs: Total £5 million per year (no demands for savings). The vast majority (90%) comprises field interview costs. Unit cost £100 per achieved case.
• Field periods/delivery timetable: Each quarter provides a nationally representative sample. The quarterly sample is issued monthly, front loaded in the first 2 months. Rolling 12 month datasets are reported quarterly. Data is delivered 6 weeks after the end of fieldwork. In-house calibration and weighting is done. Main findings are published 6 weeks later.
• Important considerations: There is high political and media interest. A culture of resistance to design change and discontinuity in trends exists. Consultation with users is required before significant changes. There is no legislative underpinning of the survey.
• User/interest groups: Government; police performance management; criminologists; the general public.

Some comments were made by participants on the BCS presentation.
• The drop in response is higher than shown as these are un-weighted response rates and don’t take into account a previous change in the design.
• It was questioned whether, when the recall period was changed from the previous calendar year to the previous rolling 12 months, there was evidence of any effect on responses when the two were run in parallel.
• It was questioned whether the front loading of fieldwork within the quarter caused issues around the distribution of reference dates.
• Also, disappointment was noted that BCS still uses age, sex and region to weight: should this be re-visited?
• In reply John commented that BCS is currently working with contractors on the weighting issue. The response rate is currently based on 1 adult. The un-weighted response rates showed a large drop in 2000. The response rate is stable mainly due to the Home Office logo on the advance letter (‘brand recognition’) and because people like talking about crime.
  A book of stamps is used as a token incentive. The survey is top priority for the contractor. Non-contact is approximately 5%. Reissues are targeted on heavily incentivised trained interviewers.
• The US Bureau of Justice Statistics is looking to redesign its crime survey. Projects have been commissioned to look at, for example, the recall period and optimal time period.

6 Survey specific challenges and plans

Next, Ed Dunn gave a presentation on the latest thinking within ONS’s Social Survey Division about issues relating to mode.
• SSD currently operate a number of mixed mode designs: face-to-face at first interview with telephone interviews in following waves (from the centralised telephone unit); face-to-face at first attempt to gain response with reissue of unsuccessful cases to telephone; face-to-face at first with telephone follow-up; telephone with face-to-face reissue; face-to-face with paper diary.
• Paper and pencil interviewing (PAPI) is used on the International Passenger Survey.
• Current mode related developments centre upon increasing telephone interviewing on LFS (and other surveys); some LFS initiatives will commence in April 2010.
• Introducing web collection is a long term aspiration. A small scale pilot was conducted in 2008.
• A number of internet data collection proposals have been developed:
  o introducing an internet mode (and a web sampling scheme) to the Labour Force Survey;
  o introducing an internet mode to, or offering an internet-only, Opinions (Omnibus) Survey; and
  o other ideas including: internet Keep-In-Touch Exercises; an online diary for the expenditure survey; online modules for surveys; and internet follow-up surveys.
• Drivers of change on the LFS: the cost of face-to-face LFS and the effort that is needed to maintain response rates.
• Barriers to change: data continuity, data quality concerns, software/hardware challenges, case management issues and design challenges.
• Possible options include:
  o offering web completion in Waves 2-5 as an alternative to telephone; and
  o running the survey with the normal LFS sample plus an additional internet sample.
• The second option was briefly outlined:
  o Phase 1: run the survey with the normal LFS sample plus an additional internet sample;
  o Phase 2: reduce the normal LFS sample by replacing it with the internet sample where they overlap;
  o Phase 3: reduce the normal LFS sample by replacing it with the internet sample where they do not overlap, based on the assumption that the larger internet sample means that, even with a smaller non-internet sample, overall precision is unchanged.

Dean Fletcher elaborated on the LFS ideas.
• The major issue is that due to the large sample size it seems disproportionate to attempt all interviews face-to-face.
• Amending the LFS would be done in phases:
  o In phase 1 the LFS would be protected, with an additional sample drawn for web surveys. 4-6 weeks before the reference period a sample of 250,000 people would be asked to register on-line whether they would complete by web. In registering, they would give basic demographic and contact details.
  o 50,000 people might be expected to register on-line (based on approximately 20% response on the ONS Opinions Survey internet pilot).
  o Emails/text messages could be sent to these responders as reminders.
  o A random subsample of the remaining 200,000 would be taken, who would be contacted in the normal LFS way.
  o Those who registered via the web could use the web throughout the 5 waves.
  o Flaws in the design would be that ineligibles and people who would not take part would be pushed into the non-registered group for the sub-sample (selection bias). The two methods might also introduce mode effects.
  o The web sample in Wave 1 could be used to test for mode effects. If none were found, then in future phases the web sample could be used to reduce the face-to-face sample.
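The arithmetic behind the phase 1 proposal above can be sketched as follows, to show how sensitive the numbers are to the registration rate. The 250,000 invitations and the 20% rate come from the proposal as reported; the follow-up fraction for the non-registered group was not stated, so `followup_fraction` and the alternative rates below are purely illustrative assumptions.

```python
def registration_plan(invited=250_000, registration_rate=0.20, followup_fraction=0.25):
    """Split an invited sample into web registrants and a follow-up subsample.

    followup_fraction is a hypothetical value: the proposal only says that
    'a random subsample' of non-registrants would be contacted in the normal way.
    """
    registered = round(invited * registration_rate)
    not_registered = invited - registered
    followup = round(not_registered * followup_fraction)
    return {
        "registered_web": registered,
        "not_registered": not_registered,
        "followup_subsample": followup,
    }


# Illustrative sensitivity check: the reported 20% rate and two lower, invented rates.
for rate in (0.20, 0.15, 0.10):
    print(rate, registration_plan(registration_rate=rate))
```

With the reported figures the function reproduces the 50,000 registrants and 200,000 non-registrants given in the proposal; lower rates shrink the web sample and enlarge the group that still has to be approached in the normal LFS way.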
The other participants were invited to comment on the ideas.
• It was thought that the assumptions of the proposal need to be tested as they seemed optimistic.
• Concern was expressed that the web survey might be a move away from the fundamental principle of stratified sampling with implications for inference. This raised the question of how to weight for nonresponse.
• It was thought that 20% registering may be optimistic.
• The Census’s assumptions about 25% of response being by internet have been undermined by low rates in the latest Census Rehearsal (although this might be related to the Rehearsal areas).
o The LFS proposal is based on the internet pilot results, not Census.
• It was questioned how much cost-saving will result. Internet take-up would need to be high in order for it to be cost-effective.
• It was asked if internet respondents would share certain characteristics.
o Dean agreed this would need to be built into the design.
• The idea was considered interesting, but risky, and would be observed with interest should it be pursued.
• It was thought that asking people to register and then respond later may drive some people away.
• Further concern was expressed about statistical precision and weighting factors (a study by Jelke Bethlehem was mentioned). There may be a reduction in the resulting effective sample size. Is there a correlation between internet access and employment? This could impact on employment estimates and would need to be reflected in the weighting.
o Dean suggested that it might be possible to ask a short registration questionnaire for web respondents that would allow weighting to the registering population.
• Further support for the idea was expressed within ONS. Charles Lound was asked to analyse the proposal within the QIF project.
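
The concern about weighting reducing the effective sample size can be illustrated with Kish's approximation, n_eff = (sum of weights)^2 / (sum of squared weights). This is a standard textbook formula brought in here for illustration; the report does not say which measure participants had in mind, and the weights below are invented.

```python
# Sketch of Kish's approximate effective sample size under unequal weights.
# The formula is standard; the example weights are made up for illustration
# and do not come from the LFS or the workshop.

def kish_effective_sample_size(weights):
    """Return (sum w)^2 / sum(w^2) for a collection of positive weights."""
    weights = list(weights)
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty collection of positive numbers")
    total = sum(weights)
    total_squares = sum(w * w for w in weights)
    return total * total / total_squares

# Equal weights: no loss, the effective size equals the number of cases.
equal = kish_effective_sample_size([1.0] * 1000)

# Hypothetical case: 200 of 1,000 respondents need a weight of 3 to
# compensate for under-representation.
unequal = kish_effective_sample_size([1.0] * 800 + [3.0] * 200)

print(round(equal), round(unequal))
```

The more variable the weights needed to correct a web sample, the smaller the effective sample becomes relative to the number of responses actually collected.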

7 Scope for alternative/mixed mode designs for GSS surveys

Following the introduction to the case studies and presentation of the SSD ideas, participants ‘brainstormed’ ideas for design changes on the LFS, on BCS and on surveys conducted by other represented departments.

7.1 LFS ideas

Suggestions for the LFS included the following.
• Conduct large scale web recruitment.
• Shorten the LFS questionnaire. Collect the core set of employment variables on the web and administer a longer questionnaire for a smaller sample size face-to-face.
• Conduct a sub-sample of those registering for a web survey by face-to-face or CATI, to measure mode effects/bias.
• The need for Standard Industrial Classification (SIC) and Standard Occupational Classification (SOC) was questioned.
o Verbatim questions (e.g. for occupation and industry coding) can lead to coder variance.
• Administer a sub-sample of the initial sample face-to-face, the rest by web.
• Use mail - the cheap option.
• Implement a concurrent probability sample - 8000 to mail, 2000 face-to-face.
• Use Telephone Data Entry on a shorter questionnaire.
• More integration and better use of IHS core data.
• It may be less risky to focus on Waves 2-5 and offer the web options then.
• Collect information face-to-face at Wave 1 and then by web at Waves 2-5 as data on the respondents will already be held.
• ‘Reverse CATI’: Advance letters could include a freephone number which allows the respondent to provide a telephone number for the Telephone Unit to ring them.
• Provide incentives to respondents to participate by telephone. But respondents tend not to instigate a reaction – a pilot carried out on English Housing Survey experienced low take-up.
• Look at web surveys for producing longitudinal estimates.
• ‘Level and change’: conduct periodic large-scale measurement to benchmark level with interim short term surveys focused on measuring change.
• Take a ‘statistical efficiency view’: oversample those who are likely to change their circumstances between waves and concentrate fieldwork resources on them.
• An internet option may pick up only people who are willing to respond anyway – saving on those who would usually be cheaper to recruit anyway, but still a saving.
• Internet collection allows you to capture those who live abroad as still UK resident if away from the UK for less than a year.
• People who don’t register on-line could complete by mail. Take a sub-sample of those remaining for telephone and face-to-face.
• In studies on sequential design it is rare that it has led to higher response.
o Too many reminders may put people off.
o The cases left for face-to-face ‘clean up’ are the hardest for interviewers to contact or persuade.
• Use paradata to model, for example, response propensity for areas or high levels of broadband access – different styles of approach may work in different areas. Reserve face-to-face for hard to enumerate areas?
• A higher response rate may not necessarily lower bias.
• Interviewers can collect brief doorstep and observational data on non-respondents’ characteristics (increasingly done in the US).
• It is possible to take more risks with response in low response areas (e.g. London).
• There are many advantages starting with face-to-face in Wave 1.
• The shift from interviewer to self-administered might raise issues of gaining response from all household members (partial-unit data) and increase in proxy response.
• More use could be made of admin data instead of collecting survey data.
• While sampling and estimation issues are critical the potential for mode effects should be remembered, particularly as interviewers are able to provide help and to motivate. Long lists of responses, such as on educational qualification questions, may cause effects. However the LFS is fast-paced – self-administered modes may allow respondents more time to consider answers.
• The web could focus on things which are less likely to change – this raises issues of fed-forward data.
• Marketing and PR of ONS: the general public is largely unaware of the ONS.
• Instead of focusing on response rates, focus on bias.

7.2 BCS ideas

Ideas and comments put forward relating to the BCS included the following.
• CATI interviews have been attempted on the BCS but were not successful.
• A Scottish Crime Survey parallel CATI and CAPI experiment supported this. The telephone mode doubled estimates of crimes involving personal victimisation. The theory was that people who have not been a victim of crime do not take part over the phone.
• Consider a panel design.
• Train responders to record crime between waves of a panel survey leading to a better recall of events.
• Use variable recall periods for different types of crime (regarding seriousness or salience).
o The key BCS outputs are measures of ‘change’ and ‘level’ so variable recall periods could make that complicated.
• Drop some of the modules to shorten interview length.
• Change the mode of child interviews.
o But there would be measurement issues with the definition of crime.
• Experiment with splitting the questionnaire into modules and having follow-on surveys on-line.
• Split subsamples to go further.
• Subsample crimes at different rates; focus on those less well recorded (similar to subsampling of health conditions on a US survey).
• Assign a proportion of the sample to experiments.
• For the existing CASI modules, test whether web or mail would affect the prevalence rates.
• Review the importance of different objectives in the sample design.
• Look at scope for linkage to police crime records.
• Use known police information for adjustments/weighting.
• Ask about crimes which are comparable to police records periodically – but this would be politically sensitive and hard to separate.

7.3 General ideas

• Consider making surveys compulsory.
o ACS testing of mandatory, voluntary and equivocal approaches – similar overall response but at higher cost.
o Raises issues of ‘reasonable burden’ and whether to prosecute.
• Data linkage – use administrative data more effectively.
o Link to driving licence, identity card etc. data.
o Use to enhance sampling frames, to adjust for nonresponse bias and for data validation.
o The ONS Beyond 2011 project is looking at creation of a ‘population spine’.
• Use existing data sources like Acorn to enhance the sampling frame.
• Link to Census data for imputation for item non-response.
• Shorten interview lengths, although care is needed about the inferences made as there is not really any evidence to infer that increased survey length decreases response rates.
• Issues of inclusiveness and diversity (e.g. language, disability):
o Access to modes.
o Cognitive equivalence.
o Coordination across languages.

7.4 Other GSS surveys

Representatives from the other government departments were invited to discuss their surveys and the implications of the workshop discussions.

7.4.1 Welsh Assembly Government

Cath Roberts outlined the situation for WAG which contracts out two surveys, the Welsh Health Survey and the National Survey for Wales (formerly ‘Living in Wales’).
• The Health Survey originated at a time when cost pressures were such that a long CAPI interview was not an option.
o A self completion paper questionnaire is used. An interviewer makes contact, completes the household enumeration, delivers the questionnaire and collects it.
o Good response rates are achieved, around 70-75% (80% individual completion).
o No proxy interviews are allowed.
o Constraints due to the mode are that they can’t cover many topics, and can’t probe, or check responses or use complex filters.
• The National Survey is CAPI with paper questionnaires on some topics. It has only been in the field for a short period so it was too early for results.
• Cath observed that it is not easy to move from CAPI to a paper questionnaire as length must be reduced.
In response to a question about whether household non-contacts were left with questionnaires, Cath said no, the interviewer must make contact with the household and talk them through it. There was potential to look into this area.

7.4.2 Department for Transport

Abby Sneade talked about the National Travel Survey.
• They have been looking into replacing the diary with a GPS device that records movements to collect data. A pilot of 130 people was conducted, respondents both completing a diary and being given a GPS device.
o Results showed data often missing from one or other, and data not always matching up.
o Using a pedometer was also a consideration (especially for short walk data).
o Some respondents complained that the device was bulky.
o They are looking at improving the GPS device.
o The GPS data may have commercial value.
o GPS calculates mode of travel by analysing speed and distance.
o The current survey offers a £5 incentive.
o The placement interview could potentially be conducted online.
Other participants discussed ideas for the NTS.
• Web-enabled mobile phones were discussed as an alternative to GPS, although they may be less accurate.
• There could be some verification of data at the collection call, going through the GPS results with the respondent.
• The GPS could be used to help the respondent fill out a diary.
• The need to be careful of respondent burden was identified.
• It was suggested that data could be uploaded to the web; respondents could confirm the data and obtain information about their travel (as an incentive for participating).
• Online travel diaries could remove the need for checking calls.
• Reference was made to a health survey in the US which provided respondents with details of the calories they had burnt as an incentive to take part.
• Respondents carrying a GPS may have concerns over government ‘spying’.
Abby commented that there were ethical barriers with the GPS method, i.e. the ‘big brother effect’.
• There are also possible issues with using mobile phones, such as data security or damage to respondents’ telephones.
• The DfT’s main driver for change was response rates, but respondent burden and reduced checking/coding were also advantages. It was too early to say if the initiative would reduce the cost.
• The Living Costs and Food Survey may be able to think about alternatives to its paper diary component. Reference was made to studies by the Institute for Fiscal Studies about using scanners to record expenditure data.
• Digital light-pens are to be trialled on the IPS.
• Developments in mobile internet access were mentioned at a general level.

7.4.3 Scottish Government

Alex Stannard reviewed the situation for the Scottish Government’s surveys.
• The Crime and Justice survey has a clustered sample of 15,000 households.
• The Health Survey has an unclustered sample of 8,000 households. The questionnaire is self-completion. A nurse visits to take measurements.
• The House Conditions Survey includes an interview element and a surveyor visit.
• The General Practitioner Access survey is postal; the response rate is 50%.
• The main driver of change is to bring costs down by 15-20%.
• They do not want to alter sample sizes or data collection modes so are looking at cutting length instead.
• All surveys are contracted out. There is a limited pool of contractors, so they would like to see more competition to drive down costs.
• They are bringing design elements in house to save money, for example, sampling, questionnaire design and report writing. They are also reviewing their procurement processes.
• They are looking at unclustering samples as they are not convinced that clustering saves much money in Scotland.
It was suggested that the health survey may be able to cut out the nurse visit element by training their interviewers to carry out basic health checks.

7.4.4 Northern Ireland Statistics and Research Agency

Kevin Sweeney gave a brief summary of surveys at NISRA.
• CAPI is the main mode used.
• Most work is commissioned by other departments so costs are their concern; NISRA has not faced pressure.
• Funding for the Northern Ireland boost of the Living Costs and Food Survey has been cut.
• NISRA is generally trying to improve its methods and results, particularly looking at development and incentives for interviewers.
• Their disability survey is mixed mode (CATI/CAPI).
o They use a land and property register and are able to match 35% of telephone numbers.
o An advance letter is sent, asking the householders to check the matched number or provide an alternative number.
o 8-10% of those approached have replied to this letter.
o Initial contact is by telephone, then face-to-face.
• On the Housing Survey central data on property size and rateable value is available.
• IPS has started in Belfast, using pen tablets.
• The Northern Ireland Tourist Board Survey has been piloted in CAPI using tablet PCs.
• They have been urging departments to consider reducing the complexity of questionnaires, modularisation and rotation as a way forward.
Encouragement was felt within ONS about the 8-10% replying to supply a telephone number as this could still save a lot of money on the LFS.

7.4.5 Communities and Local Government

Suzanne Cooper raised some points relating to CLG.
• The Citizenship Survey has conducted various boosts (ethnic minority; Muslim; Welsh).
• The survey (primary mode – CAPI) is looking at the feasibility of introducing a paper questionnaire for young persons.
• Although additional funding has been provided, the additional complexity of boosts/different designs has put the survey under pressure.
• The contractor pool is too small.
• (Note provided after the workshop: as a way of taking forward the above concerns CLG are to conduct a strategic review of the Citizenship Survey, looking at its current and future aims and assessing its objectives, what it delivers and how it operates.)

8 Summary of discussions and emerging themes

There then followed a review of the comments made during the brainstorming on the two case studies.

8.1 LFS

• We should be making it easy to take part for those who are willing.
• A free-phone number with advance letter/postcard for respondents to provide telephone numbers seems a good idea.
• There were no strong objections to the proposal Ed Dunn and Dean Fletcher had put forward with regard to using the internet, although more detail would be required to assess it fully.
• Mode effects are not considered to be a major issue.
• The questionnaire would need to be shorter to be on the internet (or paper).
• Modularise the LFS.
• More evidence should be collated on the areas we are sampling and used to tailor our approach to areas.
• We shouldn’t send too many reminders as this can cause a refusal.

8.2 BCS

• The core questionnaire requires administration by an interviewer.
• Some modules could be put on the internet (or paper).
• A panel approach could show change over time.
• A possible approach would be to use an interviewer to collect basic information and leave a self-completion paper questionnaire.
• A lot more testing and feasibility work is done in the US; the UK needs to do more (as distinct from embedded tests which are usually done here).
• We need scientific evidence to be confident about making changes.
• Possibly use administrative data to match records.

8.3 Emerging themes

The main themes emerging from the workshop were then summarised by Martin Brand.
• Shorten/reorganise questionnaires:
o Short/long form versions;
o Use of a core set of questions;
o Modular structure to get detail.
• Maximise use of cheaper modes like telephone.
• Testing:
o Needs funding and willingness to do it on an ongoing basis.
• Future proofing:
o Changes in society;
o Changes in technology;
o Landline/mobile issue.
• Systems capability:
o Case management systems are a big issue.
• Organisational inertia is a concern.
• Breaks in time series are a concern: continuity is important.
• There are no ‘pain-free’ solutions.
• Different considerations may apply to Great Britain, the UK and regionally.
Participants then provided their final reflections to conclude the workshop.
• CAPI is widely accepted as the best mode but is it affordable?
• Is there now an opportunity to return to the concept of a modular Integrated Household Survey?
o surveys/modules with different mode-suitabilities;
o exploiting cheaper modes;
o reducing risk.
• Good management systems are needed to allow the switching between modes if a sequential approach is taken.
• Consider survey systems not just data collection: internet and mail are cheap ways to make contact.
• Respondent fatigue is a risk if much larger samples are contacted by using cheaper modes.
• Consider the potential of using different modes in different areas due to their history, availability and appropriateness.
• Sequential modes raise the issue of timeliness.
o The Dutch approach to combining data to get around the issue of the time lag when using web collection appears good.
• There is not much evidence on mode effects.
• Mode effects may cause errors but there will always be such errors present that are largely accepted, like interviewer error.
• More testing is needed: every bit of data collection is an opportunity to learn more about the process and about the effects of making changes.
• Back-ups and tests are needed if we move to web data collection; if this impacts employment/unemployment figures then we could be in trouble.
• Transitioning:
o is not an easy task;
o transparency is needed;
o make sure users are aware of changes;
o explain the reasons for changes; and
o make risks and implications clear.
That concluded the workshop.

9 References

Betts P and Lound C (2010a), ‘The Application of alternative modes of data collection in UK Government social surveys: A report for the Government Statistical Service’, Office for National Statistics. Available at: http://www.ons.gov.uk/about/who-we-are/our-services/data-collection-methodology/reports-and-publications/alternative-modes-of-data-collection/index.html

Betts P and Lound C (2010b), ‘The Application of alternative modes of data collection in UK Government social surveys: Review of literature and consultation with National Statistical Institutes’, Office for National Statistics. Available at: http://www.ons.gov.uk/about/who-we-are/our-services/data-collection-methodology/reports-and-publications/alternative-modes-of-data-collection/index.html