Report on a workshop held at ONS

The application of alternative modes of data
collection on UK Government social surveys
Report on a workshop held at the Office for National
Statistics, 8 – 9 December 2009
Peter Betts
Office for National Statistics
March 2010
Contents
1 Introduction
    1.1 The Quality Improvement Fund Project
    1.2 Workshop objectives
    1.3 Workshop participants
    1.4 Workshop agenda
2 Context setting
3 Literature review and consultation
    3.1 GSS survey taking traditions and drivers for change to survey designs
    3.2 Available mode choices, including mixed-mode designs, and their advantages and disadvantages
    3.3 Sampling, response and estimation
    3.4 Mode effects on measurement error and approaches to mixed-mode questionnaire design
    3.5 Survey transitions
4 Discussion of key issues and information sharing
    4.1 Reflections on the literature review
    4.2 Presentation on recent work at Statistics Netherlands
    4.3 Gaps in the evidence base
5 Introduction to the case studies
    5.1 Labour Force Survey
    5.2 British Crime Survey
6 Survey specific challenges and plans
7 Scope for alternative/mixed mode designs for GSS surveys
    7.1 LFS ideas
    7.2 BCS ideas
    7.3 General ideas
    7.4 Other GSS surveys
        7.4.1 Welsh Assembly Government
        7.4.2 Department for Transport
        7.4.3 Scottish Government
        7.4.4 Northern Ireland Statistics and Research Agency
        7.4.5 Communities and Local Government
8 Summary of discussions and emerging themes
    8.1 LFS
    8.2 BCS
    8.3 Emerging themes
9 References
1 Introduction
This report is an account of a workshop held at the Office for National Statistics
(ONS) on December 8th and 9th 2009, on the application of alternative data collection
modes on UK government social surveys.
1.1 The Quality Improvement Fund Project
The workshop was held as part of a project under the UK Statistics Authority’s
Quality Improvement Fund (QIF), which is designed to help the Government
Statistical Service (GSS) address matters relating to statistical quality. The project aim
was to consider whether it is feasible to implement alternative data collection designs
by making, for example, greater use of telephone, internet or postal modes. The
project was led jointly by ONS and the Home Office. It arose from considerations
regarding the general decline in social survey response rates and the high cost of
conducting social surveys, particularly by face-to-face interviewing, the predominant
mode used on GSS social surveys. Work was focused on two main areas: modes of
data collection, and sampling design and estimation.
The workshop complemented and informed the completion of two other strands of the
project underway at the time: a review of literature on the subjects of
alternative/mixed mode data collection and consultation with National Statistical
Institutes (NSIs) and other research organisations. The overall objectives of the three
strands were to review the current state of play with regard to alternative or mixed
modes of data collection and consider their application on GSS social surveys.
This workshop report is intended to be a straightforward description of the
proceedings. It should be read in conjunction with the two other project outputs. More
information on the project background and the findings from the literature review and
consultation exercise are presented (with full references) in Betts and Lound (2010b).
The third output is a report for the GSS consolidating and reflecting on the findings
from the review/consultation and the workshop to provide a research-based view of
the applicability of new modes of data collection in GSS social surveys including the
likely feasibility and benefits (Betts and Lound, 2010a).
1.2 Workshop objectives
The workshop objectives were:
• To share information about the likely success of alternative or mixed household
survey modes on UK government social surveys, drawing on other countries'
experiences and academic research.
• To achieve some consensus on the likely scope for adopting this way forward in
UK household surveys or what we need to study more.
1.3 Workshop participants
The workshop included academic/international expert researchers, and survey
managers, researchers and methodologists from across the GSS, including the
devolved governments in Wales, Scotland and Northern Ireland.
Academic / international researchers:
Dirkjan Beukenhorst: Statistics Netherlands
Mick Couper: Survey Research Centre, University of Michigan
Debbie Griffin: US Census Bureau
Peter Lynn: Institute for Social and Economic Research, University of Essex
Government Statistical Service:
Dennis Roberts: ONS; Director, Surveys and Administrative Sources
Martin Brand: ONS; Deputy Director, Survey Methodology Division
Charles Lound: ONS; Methodology Consultancy Service
Peter Betts: ONS; Data Collection Methodology - Census and Social Surveys
Sue Dalton: ONS; Data Collection Methodology - Census and Social Surveys
Jacqui Jones: ONS; Methodology Directorate
Roeland Beerten: ONS; Social Surveys Division
Ed Dunn: ONS; Social Surveys Division
Dean Fletcher: ONS; Social Surveys Division
Steve Woodland: ONS; Social Data Collection – Field Support and Control
John Flatley: Home Office; British Crime Survey
Debbie Moon: Home Office; British Crime Survey
Cath Roberts: Welsh Assembly Government; Welsh Health Survey
Kevin Sweeney: Head of Central Survey Unit, Northern Ireland Statistics and Research Agency
Alex Stannard: Scottish Government; Survey Harmonisation Coordinator
Suzanne Cooper: Communities and Local Government; Citizenship Survey
Abby Sneade: Department for Transport; National Travel Survey
Thanks are given to the following members of Social Survey Division who took notes
during the workshop: Tom Anderson, Kathryn Ashton, Colin Hewat and Clare
Watson.
1.4 Workshop agenda
The workshop was divided into two substantive parts. Following an introduction and
context setting, the first day focused on the current ‘state of play’ in mode-related
research and application internationally. It included a presentation of the literature
review and consultation exercise, as a work-in-progress, followed by discussion of the
issues raised, and identification of issues and other research evidence not yet covered.
The second day considered the application of alternative modes on UK government
social surveys. Survey specific challenges and plans from around the GSS were
discussed, and practical ideas for the design of surveys were put forward. The Labour
Force Survey (conducted by ONS) and the Home Office’s British Crime Survey were
used as case studies, and issues from other GSS departments were also considered.
The emerging themes arising from the workshop were identified.
2 Context setting
Martin Brand began the workshop by covering the project background and workshop
objectives, as described above.
The context in which the project arose was then set out by Dennis Roberts. The
drivers of the move to consider alternative modes for GSS social surveys were:
• the rising cost of traditional data collection methods: additional visits by
interviewers to sampled addresses are required to make contact and gain
cooperation, and interviews are longer and more complex;
• reducing budgets: for example, in ONS five per cent efficiency savings are
required each year over five years; and
• the continuing need to meet the requirements of survey sponsors, data users and
policy makers in terms of quality of measurement, timeliness and relevance.
The basic survey model, particularly in ONS, had been followed for decades.
Samples of addresses are drawn from the Postcode Address File; and interviews are
primarily conducted face-to-face, with some use of telephone interviewing for
follow-up interviews, but little use made of the internet. This raised questions as to
what is the right way to go forward and what groundwork was needed.
Other participants affirmed that similar drivers applied outside government and in
other countries.
• The American Community Survey and other US surveys face falling response
rates and/or difficulty obtaining a large proportion of response by cheaper mail or
telephone modes, with fewer landline telephone numbers and increased mobile
phone use. More diverse living situations and language needs also increase costs.
Annual data is only released eight months after fieldwork ends.
• At Statistics Netherlands thirty per cent efficiency savings are required, which will
be achieved mostly by reducing face-to-face interviews.
• The research community cannot continue with past models but does not know how
to move forward. There are potentially many solutions; modes may be a partial
one. A diverse approach to data collection is needed, but that undermines
standardisation which underpins efficient collection.
• Longitudinal or panel surveys offer efficiencies due to the lower cost of follow-up
interviews and the possibility of mixed mode designs (collecting household and
contact information at the first wave).
3 Literature review and consultation
Peter Betts and Charles Lound presented the findings of their literature review and
consultation exercise, which had been circulated to participants in advance as a work-in-progress.
Further detail on the findings and evidence, and all the references, which are not
provided in this account of the presentation, can be found in the literature review and
consultation report (Betts and Lound, 2010b).
The presentation did not include some sections of the report which are included in the
final version. The final version includes additional content, on survey management
and logistical issues, ‘future proofing’ against social and technological trends, and
details of work being conducted in other NSIs and research organisations. It also
incorporates additional references suggested by workshop participants.
3.1 GSS survey taking traditions and drivers for change to
survey designs
• Numerous continuous and ad hoc surveys are conducted across the GSS. Many are
cross-sectional but increasingly longitudinal designs are used (including panels and
follow-ups). Surveys are conducted in-house by ONS and contracted out by other
departments to survey research organisations.
• Data collection modes: GSS surveys are primarily face-to-face; some telephone;
paper exceptions; some mixing of modes; little internet.
• Coverage and sampling: Most surveys are of the general population (UK, GB,
lower geography); utilising probability samples (most use the Postcode Address
File as the frame); all household members are interviewed or a selection within.
• There is some need for comparative measures including: harmonised questions and
outputs; Eurostat/ILO requirements; Equalities Data Review; and over time within
longitudinal exercises.
• The key drivers of change - as previously stated - are twofold.
o Response rates are in decline: people harder to contact or less willing
to participate. Not consistent across GSS – some departments
maintaining rates better than others. Measures to mitigate: interviewer
training, incentives, new modes.
o The high cost of face-to-face interviewing in a climate of efficiency
pressures.
• The literature review and consultation exercise was conducted by searching
journals, research web sites, conference proceedings; reference to text books;
contacting NSIs and other organisations.
3.2 Available mode choices, including mixed-mode designs,
and their advantages and disadvantages
• A widening range of mode choices is available. The basic distinction is between
interviewer- and self-administration; but within each are proliferating
computerised/automated methods: including computer-assisted personal
interviewing (CAPI), computer-assisted telephone interviewing (CATI),
computer-assisted web interviewing (CAWI), computer-assisted self interviewing (CASI),
audio-CASI, interactive voice recognition (IVR), telephone data entry (TDE).
There are more portable computing devices available. Use of paper methods also
continues.
• Factors determining mode choices include access to the survey population
(coverage, sampling frame), costs, time, questionnaire (length, complexity, topic,
burden), and the non-statutory nature of surveys in the UK. Hence in the GSS
interviewers have largely been used due to their value in making contact with
respondents and gaining response.
• Sources of survey error include coverage, nonresponse, measurement and
processing.
• The apparent benefits of alternative or mixed mode designs are cheaper, faster
surveys with better coverage, less bias and higher response.
• The potential problems of alternative or mixed mode designs relate to inferential
and comparability challenges: the need for probability samples; incomplete
coverage and lack of sampling frames for internet and telephone; and measurement
differences by modes.
• Thus trade-offs are needed between costs and quality.
• Mixed-mode design options include the following.
o Mixing data collection only.
o Mixing communication with sample members (e.g. prenotification and
reminder letters, emails, postcards) and data collection. These are
known as ‘mixed-mode systems’. Numerous systems are possible, with
different mode mixes being made within or across i) different samples,
and/or ii) different time periods and/or iii) the different survey stages:
▪ precontact phase;
▪ main data collection phase;
▪ follow-up phase.
o Mixed mode data collection can be concurrent, or sequential (e.g. from
cheapest to most expensive).
• The main decision to be made is whether to:
o mix communication with a single data collection mode:
▪ a “win-win” situation as it results in better coverage and/or
response rates but with no mode effects (differences in the
responses collected in different modes of collection); or
o mix data collection modes:
▪ positive with regard to greater coverage, higher response rates
and lower costs; but
▪ negative with regard to mode effects and confounds between
the effects of mode, selection bias, nonresponse biases and
time.
3.3 Sampling, response and estimation
• The key challenge for internet and telephone data collection relates to
nonobservation and representation: coverage bias, lack of sampling frames and
selection effects.
• Internet coverage: There is a ‘digital divide’ – different characteristics of those
with internet access (70% of UK households in 2009) and those without. They
relate to education, affluence, sex, age, ethnicity, economic activity, health,
household size and region. A question remains as to whether bias will reduce as
coverage increases.
• Telephone coverage: 7% of UK households are without a fixed landline;
mobile-only households are biased towards younger people; and people are
increasingly unlikely to include their number in directories.
• Probability sampling must be used in official statistics to avoid self-selection bias.
o There is no suitable sampling frame for internet users or telephone
numbers (landline or mobile).
o But there is scope for using internet and telephone modes within a
probability sampling framework and mixed-mode survey system (e.g. a
postal advance letter including a web link).
o Random Digit Dialling and Dual-frame sampling can be used in the
UK, but with implications for response rates and measurement error.
• Interviewers have important roles beyond asking questions and recording answers,
including gaining co-operation; establishing eligibility and sub-sampling within
the household.
• Response and nonresponse bias: Nonresponse error relates to response rate and
difference between responders and non-responders. Both of these factors vary by
mode. Biases affect reliability of estimates and generalisability of findings. Web
surveys alone ‘do not offer an antidote’ to the decline in response rates.
• Many studies explore response rates and nonresponse bias in single modes and in
mixed-mode survey systems.
o Single mode comparisons show lower response rates by internet than
mail/telephone and telephone rates to be lower than face-to-face.
o Mixed-mode design doesn’t increase overall response.
o In concurrent designs (offering respondents choice of response mode)
other modes tend to be chosen in preference to the internet.
o Sequential mixed mode designs usually progress from cheap to
expensive modes, following-up nonresponders in a new mode. They do
not increase the overall response rate but can obtain a large proportion
of response by cheaper modes.
o Some differences in respondent characteristics have been observed
between face-to-face, telephone, internet and mail responders (with
mixed evidence for comparisons between face to face and telephone).
• Miscellaneous findings relating to response rates and nonresponse bias included
the following.
o Mixed mode data collection is more difficult when including all
household members.
o The proportion of response by internet mode is highest when it is the
only option offered first in a sequential design.
o Telephone reminders to respond in a self-completion mode are not
effective.
o Leaving interviewers to ‘mop-up’ people who are harder to contact or
persuade may have an impact on their motivation.
o Pre-notification is better by post not email, being harder to ignore.
o Web follow-up surveys to interviewer surveys result in significant
accumulated sample loss.
o Interview length is asserted to affect nonresponse, though topic interest
or salience may be positively correlated with length. Premature
termination may be more likely in telephone and internet modes. The
typical length of face-to-face surveys may be a barrier to moving them
to other modes, but more needs to be known about sensitivity to length
in different modes.
o Respondent mode preferences: The literature is inconsistent about
preferences for interviewer- or self-administration.
• Weighting and statistical adjustment: ways to adjust for biases – undercoverage,
nonresponse, systematic mode effects – will be required. Possible approaches are
design-based (traditional random sampling) and model-based (opt-in/volunteer
samples where inclusion probabilities are modelled).
o Coverage and non-response bias can be addressed by controlling on
demographic variables; implemented in a GREG-type estimator. This
needs population distributions.
o Alternative approaches include propensity score variables –
socio-demographic, behavioural or attitudinal (these have been described as
webographic information); this reduces bias; but also reduces the
effective sample size.
• A break in a time series can be estimated by embedded experiment, parallel run or
time series methods to evaluate change. The estimated break can be used to
backcast or forecast results.
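The weighting point in this section can be illustrated with a small sketch. It is illustrative only: the age groups, population shares and respondent counts are invented, and post-stratification is used here as the simplest special case of a GREG-type calibration estimator, not as the method of any particular GSS survey. The sketch also shows why unequal weights reduce the effective sample size, using Kish's approximation n_eff = (sum of weights)^2 / (sum of squared weights).

```python
from collections import Counter


def poststratify(sample_groups, population_shares):
    """Return one weight per respondent so that the weighted group shares
    match the known population shares (post-stratification)."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return [population_shares[g] * n / counts[g] for g in sample_groups]


def kish_effective_n(weights):
    """Kish's approximate effective sample size for unequal weights."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Hypothetical web respondents: younger people are over-represented.
sample = ["16-34"] * 60 + ["35-64"] * 30 + ["65+"] * 10
# Hypothetical population distribution (in practice taken from census
# or population estimates).
population = {"16-34": 0.30, "35-64": 0.50, "65+": 0.20}

weights = poststratify(sample, population)
n_eff = kish_effective_n(weights)
print(f"nominal n = {len(sample)}, effective n = {n_eff:.1f}")
```

With these invented figures the 100 respondents carry weights of 0.5, about 1.67 and 2.0 across the three age groups, and the effective sample size falls to roughly 72. The adjustment removes the age imbalance but at a cost in precision.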
3.4 Mode effects on measurement error and approaches to
mixed-mode questionnaire design
• Mode effects have been defined as “The effect that using a specific mode has
on the responses that are obtained in that mode” (International Handbook of
Survey Methodology).
• The interactions between features of modes, questions and respondents are
complex but involve three main factors:
o media related factors: social norms, and the ‘locus of control’ over the
survey environment (which lies with interviewer or respondent);
o the information transmission or communication channel: how the
cognitive stimulus is presented and received – visual/aural; nonverbal
cues, paralanguage (verbal emphasis; visual presentation); and
o interviewer effects: how the presence or absence of an interviewer
affects privacy, motivation to take part, legitimacy of the survey and
assistance that can be provided.
• There are further psychological effects co-varying with mode:
o the respondent’s comfort and the cognitive effort required or possible;
o the cognitive response process (comprehension; retrieval of
information; judgement as to how it maps onto the question format,
and provision - or potentially editing or withholding - of a response);
and
o interactions of mode with question type, task difficulty, respondent
ability and respondent motivation.
• These can all affect the amount and nature of measurement error.
• The first main type of mode effect is ‘satisficing’, where a respondent’s response
processing is not thorough (weak satisficing - responses given will be merely
satisfactory) or where stages of the process are skipped (strong satisficing –
responses may be arbitrary). Satisficing relates to:
o the task difficulty or cognitive burden, which varies by topic and
question type, complexity or difficulty;
o respondent ability; and
o respondent motivation.
• The different types of satisficing are as follows.
o Acquiescence: tendency to agree with assertions – less effort than to
disagree.
o Provision of no opinion and don’t know responses.
o Non-differentiation: rating a series of items at the same scale point.
o Random selection of response category.
o Selection of the middle/neutral point of a scale.
o Extremeness: choosing extreme points on attitude scales.
o Response order effects: the effect of the order in which response
options are presented. ‘Primacy’ is the tendency to select items near
the start of a list when it is presented visually. ‘Recency’ is the tendency
to select items from the end when the list is only read aloud. These effects
are more likely with long lists, or when respondents have lower
education, comprehension problems or are older. Unordered lists are more
susceptible than ordinal ones.
• Research predicts satisficing to be greater:
o in self-administration than interviewer administered mode;
o in telephone than face-to-face; and
o in internet than postal mode.
• The evidence from the literature reviewed broadly supported the theory but was
not overwhelming and sometimes inconsistent.
• The second main type of mode effect is social desirability bias, where the true
answer is not reported by the respondent in order to portray him/herself
favourably or to meet perceived researcher expectation.
o It is associated with questions on sensitive behaviours, attitudes and
health and the actual or perceived privacy or anonymity of the survey.
o It is greater in interviewer-administration than self-administration and
appears to be greater by telephone than face-to-face.
o There is debate relating to:
▪ whether it is a characteristic of question or a personality trait;
▪ what the desired values/norms are; and
▪ whether it is subject to conscious control or relates to
shortcutting, recall error or self-deception.
• Evidence from the literature supported the hypothesis: social desirability bias was
reported to be higher in interviewer- than self-administered modes; and in
telephone than face-to-face. But there were exceptions. Survey topics in which it
was found included health status, behaviour and problems; mistrust in
government; alcohol consumption and drug use; employment and television
watching.
• Other measurement differences are found which it is not possible to attribute to
satisficing, social desirability bias or some other factor. They include item
nonresponse being higher in self- than interviewer-administered modes, and
higher in paper than computerised self-administration.
• A number of confounding factors in mode studies have been identified.
o Confounding mode with sampling and selection. There are two types
of mode study:
▪ comparisons of unique survey systems (with respect to their
coverage, sample frame and design, nonresponse bias and
question design) – here it is not possible to generalise or predict
to other settings; and
▪ experimental designs which isolate mode from selection and
nonresponse biases, controlling for survey characteristics and
using random assignment of cases to mode.
o Confounding mode with questionnaire construction, including the
following.
▪ Question format effects: the tendency to construct questions
and use design conventions differently in different modes,
which can change the stimulus. An example is asking a
series of items using forced-choice (yes, no) or ‘check all that
apply’ formats. The latter tends to be used in visual presentation,
but forced-choice has been found to result in better data in all
modes.
▪ Context and order effects: the location or order in which
questions are presented can cause measurement differences.
They can relate to mode in that in paper questionnaires it is
possible for the respondent to see previous and subsequent
questions, whereas in interviewer-administered modes the order
is controlled. In internet surveys, either scenario is possible
depending on how the instrument is programmed.
o Recall/memory effects, relating to the elapsed time between behaviour
of interest and being asked about it varying by mode; or because use of
recall aids differed between modes.
o Interviewer standardisation: different practices may exist between
face-to-face and telephone interviewers, such as their freedom to repeat
or alter question wording or explain meaning.
o Visual layout: additional meaning is given to self-administered
questions by paralanguage (e.g. layout, symbols, graphics, fonts, scale
labels) which needs to be considered when translating questions across
modes.
• Assessing the magnitude, direction and impact of mode effects was identified in
the literature as a challenge. Some techniques to deal with them were proposed.
o Tests need to control for respondent characteristics; if differential
nonresponse results in sample composition differing by mode,
weighting/correction factors are needed.
o ‘Empirically based adjustment’ can be informed by embedded experiments
in longitudinal surveys.
o Regression analysis can eliminate factors including sample characteristics;
if mode remains a significant factor it can be regarded as a mode effect.
o The magnitude of differences needs to be reported.
o Significant differences between estimates may not matter; what is
important is the effect on substantive interpretation, for example, of the
relationship between variables.
o Assessing which mode obtains ‘better’ quality data can be done by internal
and external validation and with prior knowledge of the expected
direction.
• In conclusion, there exists a ‘dichotomy’ between interviewer modes and
self-completion, but no mode is superior all round. Internet mode appears to be more
like mail than telephone but comparisons are unclear. More controlled
experiments are needed.
• The implications of the findings are that optimum design has to weigh up the
advantages/disadvantages of mixing modes:
o lower costs, better coverage and higher response rates; against
o measurement errors compounded with differential
coverage/nonresponse error.
• This raises the question of whether instruments which are insensitive to mode can
be developed.
• There are various approaches to designing question(naires) for mixed mode data
collection. Widely referenced is Don Dillman’s ‘unimode construction’ approach,
with its nine principles:
o Make all response options the same across modes and incorporate them
into the stem of the survey question.
o Avoid inadvertently changing the basic question structure across
modes in ways that change the stimulus.
o Reduce the number of categories to achieve mode similarity.
o Use the same descriptive labels for response categories instead of
depending upon people’s vision to convey the nature of a scale
concept.
o If several items must be ranked, precede the ranking question with a
rating question.
o Develop equivalent instructions for skip patterns that are determined
by answers to several widely separated items.
o Avoid question structures that unfold.
o Reverse the order in which categories are listed in half the
questionnaires.
o Evaluate interviewer instructions carefully for unintended response
effects and consider their use for other modes.
• Recently, however, these principles have been replaced by discussion of three
different approaches, in recognition of there being no single solution and that
there are different ways of combining modes and reasons for doing so.
o Unified mode construction: questions are written and presented the
same (or nearly) to ensure common mental stimulus.
o Mode-specific construction: structure, wording, presentation of
questions are modified to suit particular mode capabilities, to achieve
the same stimulus.
o Mode-enhancement construction: when one mode is primary, others
auxiliary (and equivalence is less important), mode-specific features
are used to improve quality in the primary mode.
• A good example of the practical application of mixed-mode design principles was
provided by the US Census Bureau.
o They follow a principle of ‘universal presentation’ (similar to
mode-specific construction):
▪ “All respondents should be presented with the same question
and response categories, regardless of mode. That is, the
meaning and intent of the question and response options must
be consistent. The goal is that instruments collect equivalent
information regardless of mode. By equivalent, we mean that
the same respondent would give the same substantive answer to
a question regardless of the modes of administration.”
o Thirty guidelines have been written relating to various design
considerations.
o Some issues the Bureau have had to address include:
▪ that the principle can’t be applied mechanically, only followed
as a whole, in a realistic, flexible and context-dependent way;
▪ that application will be long term; and
▪ that further testing and ongoing modification are needed.
• In summary there are different approaches to mixed mode question design, with
varying terms and definitions. Decisions may depend on whether:
o there is one main data collection mode with auxiliary modes (in which
case use mode-enhancement construction), or
o each mode is equally important (in which case use generalised/mode-specific
construction).
There remain uncertainties: further research is needed into what constitutes
‘equivalent stimulus’ in different modes; it may be an ‘illusory goal’; a pragmatic
approach may be needed, designing on a question by question basis, adjusted for
the survey context and needs.
3.5 Survey transitions
• General advice and actual experience relating to making mode transitions was
found in the literature.
o Clear communication with users from the planning stage on the key
variables and the consequences of discontinuity is necessary.
o Guidelines for a smooth transition include:
ƒ set up a mechanism to estimate change for continuous time series;
ƒ document development;
ƒ pilot new methods prior to estimating change;
ƒ estimate the effects; and
ƒ implement and document the change.
o Gradual and well-managed implementation of new mixed mode design is
planned at Statistics Netherlands. The need for unbroken time series in
official statistics is being addressed by embedded experiments and pilots to
estimate required adjustments. A ‘research on the run’ approach will adjust
project plans in light of experiences. Mode effects will be reduced by
piecemeal engineering.
• Experimental design is covered in the literature, such as embedding experiments
and eight characteristics for studies on satisficing and social desirability effects:
o One group should be interviewed by one mode, a different group by an
alternative mode (to avoid order and practice effects).
o Respondents in both modes should be representative samples of the same
population.
o Respondents should be interviewed in the assigned mode (with no
reassignment because refusing in one mode would confound comparison).
o Respondents should be interviewed individually in both surveys (not in
groups e.g. all household).
o Respondents should not choose mode - self-selection could lead other
factors to be confounded with mode.
o Respondents should not have been interviewed previously about similar
issues (practice effects might distort comparison).
o The questions to gauge satisficing/social desirability should be asked
identically and in identical contexts (number, content and sequence of
prior questions).
o Comparisons across modes should be subjected to tests of statistical
significance.
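Two of the characteristics above, random assignment to mode and testing
differences for statistical significance, can be illustrated with a short
sketch. It assumes simple random samples and a pooled two-proportion z-test; the
function names, the seed and the example counts are hypothetical, and a real
survey with a clustered or weighted design would need design-based variance
estimation instead.

```python
import math
import random


def assign_modes(respondent_ids, modes=("CAPI", "CATI"), seed=2009):
    """Randomly allocate respondents to modes in (near) equal numbers.

    Respondents do not choose their mode, which avoids confounding
    self-selection with the mode effect.
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    return {rid: modes[i % len(modes)] for i, rid in enumerate(shuffled)}


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (difference, z, two-sided p-value) for two independent proportions.

    Uses the pooled standard error, which is appropriate under the null
    hypothesis that both modes measure the same underlying proportion.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; proportions are all 0 or all 1")
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_a - p_b, z, p_value


# Hypothetical example: 120 of 1,000 report an item by one mode, 90 of 1,000 by the other.
difference, z, p_value = two_proportion_z_test(120, 1000, 90, 1000)
```

With these invented counts the difference would be judged significant at the 5%
level, but the point of the sketch is the procedure rather than the figures.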
• Issues to be researched are identified or proposed in several papers reviewed.
They include the following.
o The need to update knowledge with comparative studies and meta-analysis
of new modes.
o More research is needed on optimal mixes in mixed contact strategies to
combat nonresponse, preferably including other indicators such as bias
reduction and costs.
o Very little is known about optimal questionnaire design for mixed mode
data collection (unimode or universal). Empirical research is needed to
estimate what constitutes the same stimulus across different modes.
o Research is needed on adjustment/calibration strategies for mode mixes.
o The choice of mode mix should be explicit in the methodological section
of articles/reports.
o More mixed mode experiments are needed which include web.
o Investigate whether training interviewers (especially telephone) to pause
between the question and response options would reduce magnitude of the
mode effects.
o It is important to investigate social desirability: are researchers’
assumptions consistent with respondent views? And is social desirability
a cause of the mode effects found in telephone and face-to-face
estimates from the LFS (e.g. lower unemployment rate by telephone)?
o It is not always possible to use unimode design (e.g. no show cards in
CATI), so it is advisable not to have lengthy response option lists. When is
it preferable to deviate from unimode, and how should questions be designed
(e.g. if introducing a self-completion mode, should the five-point scale used
in PAPI or the two-step question used in CATI be adopted)?
o Different cognitive burden in different modes implies the need to develop
optimal design for modes of interest so burden is comparable (e.g. rating
scales in self-completion presented as list of questions or as a grid.)
o Differential Item Functioning: the same question functions differently for
different groups (e.g. minority, gender).
That concluded the presentation.
Comments by and discussion among participants during the presentation included the
following.
• Response rates across the GSS appeared to have held up better than ONS surveys.
• Attempting to cover mobile phone-only households is very expensive.
• Moving sequentially from cheaper to more expensive modes seems to be best.
• There is evidence that providing a choice of modes concurrently decreases
response rates. Respondents tend to wait for the alternative mode to be sent, with
no evidence that the ‘threat’ of being contacted again prompts response.
• There is a ‘paradox of choice’: providing respondents with choice of response
mode complicates matters.
• Paper mode is easier for respondents than internet, where security requirements
such as the need to log in to the survey website, register and enter a password set
up barriers to participation and are prone to keying error.
4 Discussion of key issues and information sharing
The next session reflected on the literature review presentation and raised further
questions. Experiences were shared by participants and gaps in the evidence base
were identified.
4.1 Reflections on the literature review
The following topics were discussed.
• Why not use mail surveys in UK Official Statistics?
o Barriers were thought to include the length and complexity of instruments.
o Questionnaires could be broken down: modularisation (and matrix
sampling) and a short form/long form approach could be considered.
o Technical issues of validity/consistency checks and rotated data not being
possible were identified.
o Extended fieldwork/processing had implications for the timeliness of data
delivery (for example, required monthly on LFS).
o The American Community Survey (mandatory) includes census-type
demographic questions (household and person level) in a sequential
mixed-mode design, moving from least to most expensive: mail ($10 per
form), computer-assisted telephone interviewing (CATI) ($18 per
interview) and computer-assisted personal interviewing (CAPI) ($140 per
interview). CAPI accounts for 90% of costs. Data on mail returned forms
are reviewed with follow up by telephone.
o The example was given of the unsuccessful German LFS use of mail. They
tried to capture person level data but usually one person responded for
everyone. There were problems with item non-response, failure to follow
skip patterns and error. They are now looking at multiple questionnaire
forms for each household – but at how much reduction in cost? They have
different requirements around timeliness of data (quarterly outputs).
o The suggestion was made to conduct a proportion of the sample by CAPI,
with a larger proportion by mail. This would address timeliness by
producing initial estimates, with final estimates once all fieldwork
complete.
o The Welsh Health Survey uses paper but interviewers make the initial
contact to place the questionnaire and return to pick it up. Good
recruitment and response rates are achieved.
o The Place Survey (commissioned separately by circa 350 Local
Authorities) uses mail, leading to variable response rates (40% to 80%). A
web survey supplement is being considered.
- It was thought that the variation in response across the Place
Survey raised questions of inference.
o The England and Wales Youth Cohort Study includes a choice between
mail and web: only 2% used web. The paper questionnaire is around 12
pages of A4; the estimated response rate 50% (declining).
• Scepticism was expressed that very long complex surveys can be easily changed
into shorter self-completion instruments.
• However, mode is more than just the data collection medium – it also refers to
the sample and respondent communication. So we should talk about systems, not
just data collection. Address based sample frames do lend themselves well to
mixed mode approaches. A suggested approach for web surveys was to recruit
panels with the instrument split into 20 minute chunks and asked every 6 months.
• It was emphasised that compromise is needed: it is not just a decision about
which mode(s) to use but more strategic. What do we give up? The example was
given of the Health and Retirement survey, lasting 3 hours, including physical
measurements. Who would do this online? We need to separate the technical
feasibility of online surveys from the strategic decision to move to this mode. Are
we trying to get too much from respondents?
• There are more modes coming, e.g. mobile computing: we need to look 15-20
years ahead.
• Why not vary the length of questionnaires by mode on a single survey? Collect
core variables across modes, those requiring the most precision in the estimates.
Those requiring less precision could be asked in a single mode, or to a
subsample.
o One-off modules and rotating modules for subsamples are likely; using
matrix sampling. In the context of matrix sampling the example was given
of the US Consumer Expenditure Survey.
• Addresses are a stable way of drawing and contacting a survey sample: they
aren’t going to go away or develop like some forms of technology-based contact.
• What is the primary driver, cost or response (quality)?
o Both. Trade-offs will vary and depend on the survey and which
dimensions of quality are more important.
o There was a suggestion to identify the design issues that address cost and
quality; some will only address one of the drivers.
• Why not just go ahead and combine modes?
o The mode effects literature can predict to some extent what will happen
when modes are changed (and the magnitude). It is very important to consider
using embedded random selection of mode for some of the sample to
attempt to measure the extent of the mode effect. Control for mode and
selection effects, for example by making respondents with a preference for
one mode complete by another mode – you can then create adjustment
factors.
o Mode effects are often confused with self-selection of mode. Using
random selection is a very useful tool for measuring the mode effect.
o A US crime survey found that CAPI and CATI modes can produce very
different estimates for some questions, quite close for other questions.
o A model to split the LFS sample was suggested: a quarter by CAPI and
three quarters by mail.
o It was suggested that primary sampling unit-based selection could be used
for the most expensive mode of collection and another selection method
for other modes.
o A general consensus holds that there is not as much difference in data
between CAPI and CATI (interviewer administered modes) as there is
between interviewer administered modes and self administered modes.
For example, a survey on aging recruited a panel of respondents aged 50+,
with those aged 70+ surveyed face-to-face and 50-70 year olds surveyed
by CATI. A group aged 68-72 was randomly assigned to mode, and very
little mode effect was found in the data.
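The suggestion above of creating adjustment factors from randomly assigned cases
could take many forms. The following is a minimal sketch only, assuming a simple
ratio adjustment (an illustrative choice, not a method described at the
workshop); the function names and all figures are hypothetical.

```python
def mode_adjustment_factor(reference_estimate, new_mode_estimate):
    """Ratio that rescales a new-mode estimate to the reference-mode level.

    Both inputs are assumed to come from subsamples that were randomly
    assigned to mode, so that their difference reflects the mode effect
    rather than respondents' self-selection into a mode.
    """
    if new_mode_estimate <= 0:
        raise ValueError("new-mode estimate must be positive")
    return reference_estimate / new_mode_estimate


def adjust_estimate(new_mode_estimate, factor):
    """Apply a previously derived adjustment factor to a new-mode estimate."""
    return new_mode_estimate * factor


# Hypothetical embedded experiment: the same rate measured as 6.0% in the
# reference mode and 5.0% in the new mode.
factor = mode_adjustment_factor(0.060, 0.050)

# Hypothetical later estimate of 5.5% collected in the new mode only.
adjusted = adjust_estimate(0.055, factor)
```

A ratio is only one option; an additive adjustment or a model-based approach
might suit some estimates better, and any factor carries its own sampling error.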
• What about conducting a CAPI interview via Skype (or a similar mode of
interaction)?
• It will be important to understand the transition when switching mode of
collection. An issue can be in educating users about the impact of a change in the
design, and not underestimating how much time this would take – it should be
planned for.
o It was thought that users are less worried about mode effects than
researchers or methodologists. Datasets deposited at the UK data archive
typically did not contain a flag for mode of collection. Eurostat only
specifies common outputs and does not require common modes.
o Changes might cause problems in time series.
• Probability samples are important because they allow control of mode effects.
• Is it possible to cut back the Integrated Household Survey core? To take a similar
approach with LFS and define a core set of variables to ask across modes?
Headline figures are often based on a small subset of questions. Core data can be
collected by cheaper modes.
o The difficulty with this is that most of the LFS data tends to be used – it is
difficult to cut questions.
• The question of compulsion to participate in surveys was raised. To compensate, a
survey would be shorter but with a more frequent and bigger cut.
4.2 Presentation on recent work at Statistics Netherlands
At this point Dirkjan Beukenhorst gave a presentation on ‘The new design of the LFS
and research on the fringe’.
• The current Dutch LFS is similar to the UK: a quarterly rotating panel design
with 5 waves. First wave is by CAPI, all later waves are CATI. Dependent
interviewing is used and proxy response is allowed.
• There is a problem of panel attrition caused mainly by people not providing their
telephone number, which causes bias.
• The quality of proxy response is debatable especially for more subjective
questions like willingness to work.
• CAPI collection at the first wave is expensive but it is not possible to just switch
to CATI (even though they can match a telephone number for 70% of addresses in
the Netherlands).
• A two-phase redesign is being introduced:
o Phase 1 (2010): a new questionnaire adapted to CATI and web modes will
be used. CATI will be used at Wave 1 when a telephone number is
known. The Wave 1 questionnaire has been shortened; with some items
pushed to the follow up waves. A ‘basic’ questionnaire will measure core
demographic variables like employment status on all surveys.
o Phase 2 (2011): the first Wave is to be collected by the web as much as
possible, then by CATI (if a number is known) then by CAPI. For each
case data collection in follow-up waves will be in the same mode as at Wave
1 (except there will be no CAPI).
o There is increased risk of panel attrition.
• The move from the cheapest mode to most expensive will be used on all surveys.
• Sample sizes will be reduced and time series modelling used to produce precise
estimates. Population register information will be used in the models to try to
increase precision for small domains.
• The old and new systems will run in parallel for 6 months in 2010 to try to
measure breaks in the series and allow for adjustments. There will be no other
experiments.
Dirkjan’s presentation then digressed into observations on the “sociology of
innovation” in NSIs:
• Three roles were identified: management, subject matter specialist and
methodologist.
o Management is responsible for budgets, production and efficiency.
o The subject matter specialist is concerned about time series, work routine
and tradition.
o The methodologist is interested in quality, innovation, estimation
problems and data collection problems.
• The typical innovation process:
o Phase 1: management wants efficiency and rapid change; the subject
matter specialist is against change and wants to slow down change; and
the methodologist likes the challenge and wants to conduct scientific
experiments. Initiative lies with methodologists and ‘outsiders’.
o Phase 2: management intensifies the pressure; the subject matter
department gives in; the methodologist devises experiments. Initiative lies
with the subject matter department and the methodologist. In the end there
is no time for experiments, instead parallel-running is conducted.
His talk then continued with a look at other relevant research.
• Health Interview Survey experiment: after their first interview, respondents were
asked to participate in another survey preferably by web, or CATI if not. 92%
agreed, 77% of whom responded. This was equivalent to 44% of the original
sample. This was not very efficient and was biased due to non-response bias in the
original HIS survey.
• LISS panel: a web panel, as representative of the population as possible, was used
to compare results of the ‘basic’ questionnaire with existing measures of
employment. One subsample of the panel was randomly allocated to CATI, the
other to web. Both were compared to the basic questionnaire executed in CAPI.
o Response rates were high on the web and CATI panel (80%), CAPI
was lower (65%).
o The estimates of the employment rate under CAPI were 6-7%
different to those collected on the panel. Some of the differences
appeared to be due to question design.
o There was also a big difference between some of the web and CATI
estimates (% in part-time employment). Given that these subsamples
have the same original bias, this is disturbing.
• Lots of questions remain unanswered, particularly what will happen post-Wave 1.
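The Health Interview Survey figures quoted above can be checked with a short
calculation. The sketch assumes that the 44% is expressed as a share of the
originally sampled HIS cases; on that reading the figures imply an original HIS
response rate of roughly 62%, which is an inference from the arithmetic and was
not stated in the presentation. The function name is hypothetical.

```python
def cumulative_rate(*stage_rates):
    """Multiply stage-by-stage response rates into one overall rate."""
    rate = 1.0
    for stage_rate in stage_rates:
        if not 0.0 <= stage_rate <= 1.0:
            raise ValueError("each rate must lie between 0 and 1")
        rate *= stage_rate
    return rate


# 92% of HIS respondents agreed to the follow-up; 77% of those responded.
follow_up_rate = cumulative_rate(0.92, 0.77)  # share of HIS respondents

# If follow-up respondents were 44% of the original sample, the implied
# response rate to the original HIS interview is 0.44 / follow_up_rate.
implied_his_response = 0.44 / follow_up_rate
```

The same multiplication shows why each additional stage in a recruitment chain
erodes the share of the original sample that remains.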
Participants then discussed the presentation and further issues.
• On the LISS panel internet access was given to those who didn’t have it, in the
form of a SimPc with internet access. Broadband was installed. This is expensive
– the panel costs around $3 million a year.
o National Centre for Social Research is providing mobile phones for
respondents to report record-keeping diary entries periodically.
• The recruitment stage – coverage and gaining cooperation - is the biggest
problem.
• Internet instead of CATI option is a possibility for LFS waves 2 to 5.
o University of Michigan are currently experimenting on this sort of
model.
• Mode doesn’t really influence the estimates much once the effect of weighting has
been taken into account.
• It was questioned whether the Dutch LFS model had addressed timeliness
concerns.
o This is addressed by combining cases from different modes from
different fieldwork months.
o There is an issue of what the reference date means if you are
combining cases from different fieldwork stages of a sequential
design.
ƒ For the Dutch LFS the reference period is the date of the
interview. As long as the subsamples from the various modes
represent a monthly sample then this is not a problem.
• One issue would be any commissioned research, e.g. boosts, that altered the
balance between the modes.
4.3 Gaps in the evidence base
The question was posed: can we identify any gaps in the literature review?
• The timeliness issue. When is the optimum time to switch modes in sequential
designs with web as first mode?
o Statistics Netherlands tend to have 1 reminder (maximum 2) for the
web survey and then switch. The fieldwork period in that mode is
short as web responders tend to respond quite quickly.
• With mail the need to process the data must be taken into account.
• The issue of interviewing everyone in a household. On Understanding Society
(UK Longitudinal Household Study) they have been investigating different
switching points between CATI and CAPI:
o a) when one person in the household asks for CAPI
o b) when you can get no further interviews via CATI
o There was no difference in response but they don’t know about effect
on costs yet.
• A gap in the literature is with regard to cost profiles and optimal strategies.
• With clustered designs there is often a complicated management decision to be
made about when to switch modes, e.g. what do you do if you only have 1 address
left in the quota and the interviewer is not local?
o Has any work been done in Australia about this?
ƒ ABS now does very little face to face interviewing.
o The same applies in Canada – more use is made of CATI with CAPI
follow up.
o Could modes be targeted to, for example, geographical areas?
ƒ This is done in Finland. CAPI is only used around Helsinki;
everything else is done by CATI.
o A survey of welfare recipients is conducted mostly by CATI with a
smaller localised face-to-face component. Interviewers had mobile
phones and screened respondents, provided them with mobiles and
conducted the interview by CATI. This was thought to be a good
approach for sampling that requires screening.
• Costings need to cover the whole statistical value chain, including case
management, editing and weighting.
• We need to be considering communication with respondents – an ‘Achilles heel’.
An ONS project is being conducted. What is the best mode or method for value in
terms of achieving response?
o There is an under-emphasis on paper communications. Simplicity can
be best. In the US ‘get well soon cards’ were tried during a swine flu
outbreak – anecdotally it seemed to result in interviews at a later date
(although this is not experimental evidence).
• It was thought that we need to acknowledge respondent preference for mode –
respondent-centred surveying.
o Statistics Finland recently radically reviewed their survey materials
from the perspective of respondent experience.
o Reference was made to a follow up survey in Sweden that asked about
the mode choice. It resulted in words and phrases to incorporate into
materials.
o Caution against taking action on what people say in focus groups was
advised – often they will favour what looks best, but in practice plain
leaflets etc get the best response.
• The issue of use of administrative data instead of collecting it in survey format
was raised: the integration of survey and admin data.
o Admin data tends to only be used to validate survey data but there is
an argument for using it more in the survey process or as a
replacement for survey data where it is of sufficient quality.
o It was felt that things have to change in this respect. There are issues
of data access but we have to integrate data from different sources.
Trade-offs need to be faced up to in government, as do legislative and
ethical issues.
o Admin data about income and benefits is held centrally in the
Netherlands which means they don’t have to ask about it on their
surveys.
o There is no one size fits all approach. Item non-response is often
similar across modes once aggregated but not for individual questions.
Admin data could fill a gap there.
5 Introduction to the case studies
Two surveys had been selected as case studies on which the workshop would focus,
the Labour Force Survey (LFS) and the British Crime Survey (BCS). Presentations
were given to describe the survey designs and provide other contextual information
for participants to bear in mind when discussing the potential for
alternative/mixed-mode designs.
5.1 Labour Force Survey
Presented by Dean Fletcher (Social Surveys Division, ONS).
• LFS is a panel survey of the employment circumstances of the UK population
(private households). It has been conducted since 1973 (annual survey since 1984;
calendar quarter basis since 2006).
• Cost: £14 million per annum.
• It includes 500 questions and 200 derived variables on: job industries and
occupations; health; transport to work; education and training; benefits; hours
worked; earnings; household characteristics; employment patterns; and sickness.
• Sampling frame: Postcode Address File (PAF). 80,000 households sampled per
quarter (systematic random stratified sample).
• 50,000 households (120,000 individuals) take part each quarter. Each case
weighted by age, sex, geography; each interview equates to 500 people.
• Wave structure: LFS: 5 waves, each 3 months apart. Also boost samples in
England, Scotland and Wales (Local Labour Force Survey, LLFS) to allow
analysis at local authority level – 4 annual waves. Annual Population Survey
(APS) combines LFS waves 1 and 5 and all LLFS interviews in a year – 160,000
unique households (360,000 unique individuals).
• Response rate: There is a downward trend, from 73.5% in 1998 to 56% in 2009.
Non-response is mainly from refusals to the interviewer. There is also attrition in
Waves 2 - 5.
• Non-response: Item non-response is experienced at earnings questions; there is
under-reporting of education qualifications; proxy response bias exists; but there is
no bias in employment data.
• Accuracy and imputation: Sampling errors are calculated; imputation methods are
used - roll-forward and nearest neighbour; roll-forward biases growth estimates;
some discontinuities have been experienced, e.g. when new Standard Industrial
Classification introduced or changes to questions.
• Data collection: Wave 1 is face-to-face (CAPI); Waves 2-5 are (predominantly)
telephone (CATI), using dependent interviewing. Proxy interviews account for a
third of data. Average interview length: 28 minutes per person.
• Sensitive questions include sexual identity; income; and health. Show cards are
used only for sexual identity.
• Computerised features: Look-up tables are used for industry and occupation
coding; and there is a coding frame for town/city. Hard and soft errors are
programmed.
• Outputs include various person-level and household-level data files on a quarterly
or annual basis.
• Data users: Many government departments (e.g. HM Treasury, devolved
administrations), Bank of England, Eurostat; NGOs, academics, the wider research
community, etc.
• Uses of the data: Monitoring; evaluating policies; designing policies; answering
public/media questions; academic research.
• Dissemination of results: ONS publications (e.g. Labour market statistics;
Economic and Labour Market Review); other government department publications
(e.g. workplace sickness, health and safety, qualifications); local government and
European publications.
• Media coverage is frequent.
5.2 British Crime Survey
Presented by John Flatley, Home Office.
• The survey started in 1982 (cross sectional; continuous since 2001/2); it is now
restricted to England and Wales (separate surveys in Scotland and Northern
Ireland). Mainly funded by the Home Office.
• Purpose: To provide estimates of level and trends in crime experienced by the
population resident in households. Informs policy development; monitors public
opinion; monitors performance (e.g. of the police).
• Minimum annual achieved sample: 1,000 interviews in each of the 43 police force
areas. 50,000 interviews are achieved annually: 46,000 households/adults (one
adult selected in household); plus 4,000 children aged 10-15 (in households taking
part in main survey). Sample frame: Postcode Address File. Multi-stage, partially
clustered sample design, varying by area population density (for cost efficiency).
Low crime areas are oversampled.
• Units of observation: Household crimes; adults experiencing personal crimes; and
children experiencing personal crimes.
• Interview mode: Face-to-face (CAPI) with an element of computer-assisted
self-interviewing (CASI, Audio-CASI).
• Interview length: Adult average: 50 minutes (ranging from 20-25 minutes if no
experience of crime to 1.5 - 2 hours if experience); child average: 20 minutes.
• Response rates: A relatively high level has been maintained – 75% adult. 70% of
children respond (in participating households; so the true rate is 53%);
unproductives are mainly parental refusals on behalf of children.
• Questionnaire construction: There are several whole-sample modules; plus other
randomly allocated modules; plus self-completion modules (restricted to those
aged under 60).
• Question types/topics:
o Factual (e.g. demographics, experience of crime)
o Opinions (e.g. views about the police, sentencing policy)
o Behaviours (e.g. drinking, use of illicit drugs)
o Questions are generally closed; the exception is open-ended descriptions of
criminal incidents experienced (with probing by interviewer), used for
(later) coding of incidents to criminal offences
o Crime reference period is the last 12 months; life events calendar to aid
recall
• Question sensitivity: Topics collected by self-completion are on sexual assault,
domestic abuse and illicit drugs.
• Item non-response is not an issue face-to-face; refusal to self-completion modules
is 5% plus some item non-response.
• Proxy data is limited to socio-demographic information about other household
members and household crimes.
• Non-response bias: The response rate is lower in London/inner cities. There is
under-representation of adults aged 16-34 and 85+.
• Mode effects: There is a higher prevalence of crime when measured by
self-completion, e.g. experience of domestic violence is five times higher from
self-completion than from interviewer-administered.
• Online validation/edit checks are programmed. Assignment of criminal offence
codes is a large, costly exercise. No imputation is done.
• Costs: Total £5 million per year (no demands for savings). The vast majority
(90%) comprises field interview costs. Unit cost £100 per achieved case.
• Field periods/delivery timetable: Each quarter provides a nationally representative
sample. The quarterly sample is issued monthly, front loaded in first 2 months.
Rolling 12 month datasets are reported quarterly. Data is delivered 6 weeks after
end of fieldwork. In-house calibration and weighting is done. Main findings are
published 6 weeks later.
• Important considerations: There is high political and media interest. A culture of
resistance to design change and discontinuity in trends exists. Consultation with
users is required before significant changes. There is no legislative underpinning of
the survey.
• User/interest groups: Government; police performance management;
criminologists; the general public.
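Two of the figures in the BCS summary above can be reproduced from the others.
The sketch assumes that the 53% 'true' child response rate is the product of the
75% adult rate and the 70% child rate within participating households, and that
the unit cost is the total annual cost divided by the 50,000 interviews achieved;
both readings fit the figures but are not spelled out in the presentation. The
function names are hypothetical.

```python
def overall_child_response(adult_rate, child_rate_within_households):
    """Child response as a share of all sampled households with eligible children."""
    return adult_rate * child_rate_within_households


def unit_cost(total_annual_cost, achieved_cases):
    """Average cost per achieved interview."""
    return total_annual_cost / achieved_cases


# 75% adult response, 70% child response in participating households.
child_rate = overall_child_response(0.75, 0.70)  # about 0.525, quoted as 53%

# £5 million per year spread over 50,000 achieved interviews.
cost_per_case = unit_cost(5_000_000, 50_000)  # £100 per achieved case

# 90% of the total is field interview costs.
field_costs = 0.90 * 5_000_000  # £4.5 million
```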
Some comments were made by participants on the BCS presentation.
• The drop in response is higher than shown as these are un-weighted response rates
and don’t take into account a previous change in the design.
• It was questioned if, when changing the recall period from the previous calendar
year to the previous rolling 12 months, there was evidence of any effect on
responses when they were run in parallel?
• It was questioned if the front loading of fieldwork within the quarter caused issues
around the distribution of reference dates.
• Also, disappointment was noted that BCS still uses age, sex and region to weight –
should this be re-visited?
• In reply John commented that BCS is currently working with contractors on the
weighting issue. The response rate is currently based on 1 adult. The un-weighted
response rates showed a large drop in 2000. The response rate is stable mainly due
to the Home Office logo on the advance letter (‘brand recognition’) and because
people like talking about crime. A book of stamps is used as a token incentive. The
survey is top priority for the contractor. Non-contact is approximately 5%.
Reissues are targeted on heavily incentivised trained interviewers.
• The US Bureau of Justice Statistics is looking to redesign its crime survey. Projects
have been commissioned to look at, for example, the recall period and optimal time
period.
6 Survey specific challenges and plans
Next, Ed Dunn gave a presentation on the latest thinking within ONS’s Social Survey
Division about issues relating to mode.
• SSD currently operate a number of mixed mode designs: face-to-face at first
interview with telephone interviews in following waves (from the centralised
telephone unit); face-to-face at first attempt to gain response with reissue of
unsuccessful cases to telephone; face-to-face at first with telephone follow-up;
telephone with face-to-face reissue; face-to-face with paper diary.
• Paper and pencil interviewing (PAPI) is used on the International Passenger
Survey.
• Current mode related developments centre upon increasing telephone interviewing
on LFS (and other surveys); some LFS initiatives will commence in April 2010.
• Introducing web collection is a long term aspiration. A small scale pilot was
conducted in 2008.
• A number of internet data collection proposals have been developed:
o introducing an internet mode (and a web sampling scheme) to the
Labour Force Survey;
o introducing an internet mode to, or offering an internet-only, Opinions
(Omnibus) Survey; and
o other ideas including:
ƒ internet Keep-In-Touch Exercises;
ƒ an online diary for the expenditure survey;
ƒ online modules for surveys; and
ƒ internet follow-up surveys.
Drivers of change on the LFS: the cost of face-to-face LFS and the effort that is
needed to maintain response rates.
Barriers to change: data continuity, data quality concerns, software/hardware
challenges, case management issues and design challenges.
Possible options include:
o offering web completion in waves 2 – 5 as alternative to telephone; and
o running the survey with the normal LFS sample plus an additional
internet sample.
The second option was briefly outlined:
o Phase 1: run the survey with the same normal LFS sample plus an
additional internet sample;
o Phase 2: reduce the normal LFS sample by replacing it with the
internet sample where they overlap;
o Phase 3: reduce the normal LFS sample by replacing it with the
internet sample where they do not overlap, based on the assumption
that the larger internet sample means that, even with a smaller noninternet sample, overall precision is unchanged.
Dean Fletcher elaborated on the LFS ideas.
• The major issue is that due to the large sample size it seems disproportionate to
attempt all interviews face-to-face.
• Amending the LFS would be done in phases:
o In phase 1 the LFS would be protected, with an additional sample
drawn for web surveys. 4-6 weeks before the reference period a sample
of 250,000 people would be asked to register on-line to indicate whether they
would complete by web. In registering, they would give basic
demographic and contact details.
o 50,000 people might be expected to register on-line (based on
approximately 20% response on the ONS Opinions Survey internet
pilot).
o Emails/text messages could be sent to these responders as reminders.
o A random subsample of the remaining 200,000 would be taken, who
would be contacted in the normal LFS way.
o Those who registered via the web could use the web throughout the 5
waves.
o Flaws in the design would be that ineligibles and people who would
not take part would be pushed into the non-registered group for the
sub-sample (selection bias). The two methods might also introduce
mode effects.
o The web sample in Wave 1 could be used to test for mode effects. If
none were found, then in future phases the web sample could be used
to reduce the face-to-face sample.
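The arithmetic behind the phase 1 figures can be sketched as follows. This is an illustration only: the function name and the lower registration rates are assumptions added here; only the initial sample of 250,000 and the approximate 20% registration rate come from the presentation.

```python
def phase1_counts(initial_sample: int, registration_rate: float) -> dict:
    """Expected split of the initial sample into web registrants and the rest."""
    registered = round(initial_sample * registration_rate)
    return {
        "registered": registered,
        "not_registered": initial_sample - registered,
    }


# 0.20 is the rate quoted from the Opinions Survey internet pilot;
# 0.10 and 0.05 are hypothetical lower rates for comparison.
for rate in (0.20, 0.10, 0.05):
    counts = phase1_counts(250_000, rate)
    print(
        f"{rate:.0%} registering: {counts['registered']:,} web registrants, "
        f"{counts['not_registered']:,} left for subsampling"
    )
```

Running the sketch with lower rates shows how quickly the web sample shrinks if registration falls short of the pilot figure.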
The other participants were invited to comment on the ideas.
• It was thought that the assumptions of the proposal need to be tested as they
seemed optimistic.
• Concern was expressed that the web survey might be a move away from the
fundamental principle of stratified sampling with implications for inference. This
raised the question of how to weight for nonresponse.
• It was thought that 20% registering may be optimistic.
• The Census’s assumptions about 25% of response being by internet have been
undermined by low rates in the latest Census Rehearsal (although this might be
related to the Rehearsal areas).
o The LFS proposal is based on the internet pilot results, not Census.
• It was questioned how much cost-saving will result. Internet take-up would need to
be high in order for it to be cost-effective.
• It was asked if internet respondents would share certain characteristics.
o Dean agreed this would need to be built into the design.
• The idea was considered interesting, but risky, and would be observed with interest
should it be pursued.
• It was thought that asking people to register and then respond later may drive some
people away.
• Further concern was expressed about statistical precision and weighting factors (a
study by Jelke Bethlehem was mentioned). There may be a reduction in the
resulting effective sample size. Is there a correlation between internet access and
employment? This could impact on employment estimates and would need to be
reflected in the weighting.
o Dean suggested that it might be possible to ask a short registration
questionnaire for web respondents that would allow weighting to the
registering population.
• Further support for the idea was expressed within ONS. Charles Lound was asked
to analyse the proposal within the QIF project.
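The concern about effective sample size can be illustrated with Kish's approximation, n_eff = (sum of weights)^2 / (sum of squared weights). The workshop notes do not say which measure participants had in mind, so this is one common choice rather than the method discussed, and the weights below are hypothetical.

```python
def kish_effective_sample_size(weights):
    """Kish approximation: n_eff = (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


# Hypothetical weights: 1,000 cases in each scenario.
# Equal weights give no loss of precision from weighting.
equal = [1.0] * 1000
# 800 web registrants with weight 1 and 200 subsampled non-registrants
# who each stand in for 5 people (an assumed subsampling fraction).
unequal = [1.0] * 800 + [5.0] * 200

print(kish_effective_sample_size(equal))
print(kish_effective_sample_size(unequal))
```

The more the weights vary between the registered and subsampled groups, the further the effective sample size falls below the number of achieved interviews.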
7 Scope for alternative/mixed mode designs for GSS
surveys
Following the introduction to the case studies and presentation of the SSD ideas,
participants ‘brainstormed’ ideas for design changes on the LFS, on BCS and on
surveys conducted by other represented departments.
7.1 LFS ideas
Suggestions for the LFS included the following.
• Conduct large scale web recruitment.
• Shorten the LFS questionnaire: collect the core set of employment variables on
the web and administer a longer questionnaire face-to-face to a smaller sample.
• Conduct a sub-sample of those registering for a web survey by face-to-face or
CATI, to measure mode effects/bias.
• The need for Standard Industry Classification (SIC) and Standard Occupation
Classification (SOC) was questioned.
o Verbatim questions (e.g. for occupation and industry coding) can lead to
coder variance.
• Administer a sub-sample of the initial sample face-to-face, the rest by web.
• Use mail - the cheap option.
• Implement a concurrent probability sample - 8000 to mail, 2000 face-to-face.
• Use Telephone Data Entry on a shorter questionnaire.
• More integration and better use of IHS core data.
• It may be less risky to focus on Waves 2-5 and offer the web options then. Collect
information face-to-face at Wave 1 and then by web at Waves 2-5 as data on the
respondents will already be held.
• ‘Reverse CATI’: Advance letters could include a freephone number which allows
the respondent to provide a telephone number for the Telephone Unit to ring them.
• Provide incentives to respondents to participate by telephone. But respondents
tend not to instigate a reaction – a pilot carried out on English Housing Survey
experienced low take-up.
• Look at web surveys for producing longitudinal estimates.
• ‘Level and change’: conduct periodic large-scale measurement to benchmark level
with interim short term surveys focused on measuring change.
• Take a ‘statistical efficiency view’: oversample those who are likely to change
their circumstances between waves and concentrate fieldwork resources on them.
• An internet option may pick up only people who are willing to respond anyway –
saving on those who would usually be cheaper to recruit anyway, but still a
saving.
• Internet collection allows you to capture those who live abroad as still UK
resident if away from the UK for less than a year.
• People who don’t register on-line could complete by mail. Take a sub-sample of
those remaining for telephone and face-to-face.
• Studies of sequential designs have rarely found that they lead to higher response.
o Too many reminders may put people off.
o The cases left for face-to-face ‘clean up’ are the hardest for interviewers to
contact or persuade.
• Use paradata to model, for example, response propensity for areas or high levels
of broadband access – different styles of approach may work in different areas.
Reserve face-to-face for hard to enumerate areas?
• A higher response rate may not necessarily lower bias.
• Interviewers can collect brief doorstep and observational data on non-respondents’
characteristics (increasingly done in the US).
• It is possible to take more risks with response in low response areas (e.g. London).
• There are many advantages starting with face-to-face in Wave 1.
• The shift from interviewer to self-administered might raise issues of gaining
response from all household members (partial-unit data) and increase in proxy
response.
• More use could be made of admin data instead of collecting survey data.
• While sampling and estimation issues are critical, the potential for mode effects
should be remembered, particularly as interviewers are able to provide help and to
motivate. Long lists of responses, such as on educational qualification questions,
may cause effects.
• However the LFS is fast-paced – self-administered modes may allow respondents
more time to consider answers.
• The web could focus on things which are less likely to change – this raises issues
of fed-forward data.
• Marketing and PR of ONS: the general public is largely unaware of the ONS.
• Instead of focusing on response rates, focus on bias.
7.2 BCS ideas
Ideas and comments put forward relating to the BCS included the following.
• CATI interviews have been attempted on the BCS but were not successful.
• A Scottish Crime Survey parallel CATI and CAPI experiment supported this. The
telephone mode doubled estimates of crimes involving personal victimisation. The
theory was that people who have not been a victim of crime do not take part over
the phone.
• Consider a panel design.
• Train responders to record crime between waves of a panel survey leading to a
better recall of events.
• Use variable recall periods for different types of crime (regarding seriousness or
salience).
o The key BCS outputs are measures of ‘change’ and ‘level’ so variable
recall periods could make that complicated.
• Drop some of the modules to shorten interview length.
• Change the mode of child interviews.
o But there would be measurement issues with the definition of crime.
• Experiment with splitting the questionnaire into modules and having follow-on
surveys on-line.
• Split subsamples to go further.
• Subsample crimes at different rates; focus on those less well recorded (similar to
subsampling of health conditions on a US survey).
• Assign a proportion of the sample to experiments.
• For the existing CASI modules, test whether web or mail would affect the
prevalence rates.
• Review the importance of different objectives in the sample design.
• Look at scope for linkage to police crime records.
• Use known police information for adjustments/weighting.
• Ask about crimes which are comparable to police records periodically – but this
would be politically sensitive and hard to separate.
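The theory offered above for the Scottish CATI result, that non-victims are less likely to take part by telephone, can be illustrated with a small calculation. All of the rates below are hypothetical, chosen only to show how differential response can inflate an estimate; none comes from the Scottish experiment.

```python
def observed_prevalence(true_prevalence, response_victims, response_non_victims):
    """Share of victims among respondents when the two groups respond at different rates."""
    victims = true_prevalence * response_victims
    non_victims = (1 - true_prevalence) * response_non_victims
    return victims / (victims + non_victims)


# Hypothetical true victimisation prevalence of 10%.
# Equal response rates reproduce the true prevalence.
same = observed_prevalence(0.10, 0.60, 0.60)
# If non-victims respond at under half the rate of victims (assumed figures),
# the observed prevalence roughly doubles.
skewed = observed_prevalence(0.10, 0.60, 0.27)

print(f"equal response: {same:.3f}")
print(f"non-victims under-respond: {skewed:.3f}")
```

The sketch ignores any measurement effect of the telephone mode itself, which could also contribute to a difference between CATI and CAPI estimates.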
7.3 General ideas
• Consider making surveys compulsory.
o ACS testing of mandatory, voluntary and equivocal approaches – similar
overall response but at higher cost.
o Raises issues of ‘reasonable burden’ and whether to prosecute.
• Data linkage – use administrative data more effectively.
o Link to driving licence, identity card and similar data.
o Use to enhance sampling frames, to adjust for nonresponse bias and for
data validation.
o The ONS Beyond 2011 project is looking at creation of a ‘population
spine’.
• Use existing data sources like Acorn to enhance the sampling frame.
• Link to Census data for imputation for item non-response.
• Shorten interview lengths, although care is needed about the inferences made as
there is not really any evidence to infer that increased survey length decreases
response rates.
• Issues of inclusiveness and diversity (e.g. language, disability):
o Access to modes.
o Cognitive equivalence.
o Coordination across languages.
7.4 Other GSS surveys
Representatives from the other government departments were invited to discuss their
surveys and the implications of the workshop discussions.
7.4.1 Welsh Assembly Government
Cath Roberts outlined the situation for WAG which contracts out two surveys, the
Welsh Health Survey and the National Survey for Wales (formerly ‘Living in
Wales’).
• The Health Survey originated at a time when cost pressures were such that a long
CAPI interview was not an option.
o A self completion paper questionnaire is used. An interviewer makes
contact, completes the household enumeration, delivers the
questionnaire and collects it.
o Good response rates are achieved, around 70-75% (80% individual
completion).
o No proxy interviews are allowed.
o Constraints due to the mode are that they can’t cover many topics, and
can’t probe, or check responses or use complex filters.
• The National Survey is CAPI with paper questionnaires on some topics. It has only
been in the field for a short period so it was too early for results.
• Cath observed that it is not easy to move from CAPI to a paper questionnaire as
length must be reduced.
In response to a question about whether household non-contacts were left with
questionnaires, Cath said no, the interviewer must make contact with the household
and talk them through it. There was potential to look into this area.
7.4.2 Department for Transport
Abby Sneade talked about the National Travel Survey.
• They have been looking into replacing the diary with a GPS device that records
movements to collect data. A pilot of 130 people was conducted, respondents both
completing a diary and being given a GPS device.
o Results showed data often missing from one or other, and data not
always matching up.
o Using a pedometer was also a consideration (especially for short walk
data).
o Some respondents complained that the device was bulky.
o They are looking at improving the GPS device.
o The GPS data may have commercial value.
o GPS calculates mode of travel by analysing speed and distance.
o The current survey offers a £5 incentive.
o The placement interview could potentially be conducted online.
Other participants discussed ideas for the NTS.
• Web-enabled mobile phones were discussed as an alternative to GPS, although
they may be less accurate.
• There could be some verification of data at the collection call, going through the
GPS results with the respondent.
• The GPS could be used to help the respondent fill out a diary.
• The need to be careful of respondent burden was identified.
• It was suggested that data could be uploaded to the web; respondents could
confirm the data and obtain information about their travel (as an incentive for
participating).
• Online travel diaries could remove the need for checking calls.
• Reference was made to a health survey in the US which provided respondents with
details of the calories they had burnt as an incentive to take part.
• Respondents carrying a GPS may have concerns over government ‘spying’. Abby
commented that there were ethical barriers with the GPS method, i.e. the ‘big
brother effect’.
• Using mobile phones also raises possible issues of data security and of damage to
respondents’ telephones.
• The DfT’s main driver for change was response rates, but respondent burden and
reduced checking/coding were also advantages. It was too early to say if the
initiative would reduce the cost.
• The Living Costs and Food Survey may be able to think about alternatives to its
paper diary component. Reference was made to studies by the Institute for Fiscal
Studies about using scanners to record expenditure data.
• Digital light-pens are to be trialled on the IPS.
• Developments in mobile internet access were mentioned at a general level.
7.4.3 Scottish Government
Alex Stannard reviewed the situation for the Scottish Government’s surveys.
• The Crime and Justice survey has a clustered sample of 15,000 households.
• The Health Survey has an unclustered sample of 8,000 households. The
questionnaire is self-completion. A nurse visits to take measurements.
• The House Conditions Survey includes an interview element and a surveyor
visit.
• The General Practitioner Access survey is postal; the response rate is 50%.
• The main driver of change is to bring costs down by 15-20%.
• They do not want to alter sample sizes or data collection modes so are looking
at cutting length instead.
• All surveys are contracted out. There is a limited pool of contractors, so they
would like to see more competition to drive down costs.
• They are bringing design elements in house to save money, for example,
sampling, questionnaire design and report writing. They are also reviewing
their procurement processes.
• They are looking at unclustering samples as they are not convinced that
clustering saves much money in Scotland.
It was suggested that the health survey may be able to cut out the nurse visit element
by training their interviewers to carry out basic health checks.
7.4.4 Northern Ireland Statistics and Research Agency
Kevin Sweeney gave a brief summary of surveys at NISRA.
• CAPI is the main mode used.
• Most work is commissioned by other departments so costs are their concern;
NISRA has not faced pressure.
• Funding for the Northern Ireland boost of the Living Costs and Food Survey
has been cut.
• NISRA is generally trying to improve its methods and results, particularly
looking at development and incentives for interviewers.
• Their disability survey is mixed mode (CATI/CAPI).
o They use a land and property register and are able to match 35% of
telephone numbers.
o An advance letter is sent, asking the householders to check the
matched number or provide an alternative number.
o 8-10% of those approached have replied to this letter.
o Initial contact is by telephone, then face-to-face.
• On the Housing Survey central data on property size and rateable value is
available.
• IPS has started in Belfast, using pen tablets.
• The Northern Ireland Tourist Board Survey has been piloted in CAPI using
tablet PCs.
• They have been urging departments to consider reducing the complexity of
questionnaires, modularisation and rotation as a way forward.
Encouragement was felt within ONS about the 8-10% replying to supply a telephone
number as this could still save a lot of money on the LFS.
7.4.5 Communities and Local Government
Suzanne Cooper raised some points relating to CLG.
• The Citizenship Survey has conducted various boosts (ethnic minority; Muslim;
Welsh).
• The survey (primary mode – CAPI) is looking at the feasibility of introducing a
paper questionnaire for young persons.
• Although additional funding has been provided, the additional complexity of
boosts/different designs has put the survey under pressure.
• The contractor pool is too small.
• (Note provided after the workshop: as a way of taking forward the
above concerns CLG are to conduct a strategic review of the Citizenship Survey,
looking at its current and future aims and assessing its objectives, what it delivers
and how it operates.)
8 Summary of discussions and emerging themes
There then followed a review of the comments made during the brainstorming on the
two case studies.
8.1 LFS
• We should be making it easy to take part for those who are willing.
• A free-phone number with advance letter/postcard for respondents to provide
telephone numbers seems a good idea.
• There were no strong objections to the proposal Ed Dunn and Dean Fletcher had
put forward with regard to using the internet, although more detail would be
required to assess it fully.
• Mode effects are not considered to be a major issue.
• The questionnaire would need to be shorter to be on the internet (or paper).
• Modularise the LFS.
• More evidence should be collated on the areas we are sampling and used to tailor
our approach to areas.
• We shouldn’t send too many reminders as this can cause a refusal.
8.2 BCS
• The core questionnaire requires administration by an interviewer.
• Some modules could be put on the internet (or paper).
• A panel approach could show change over time.
• A possible approach would be to use an interviewer to collect basic information
and leave a self-completion paper questionnaire.
• A lot more testing and feasibility work is done in the US; the UK needs to do more
(as distinct from embedded tests which are usually done here).
• We need scientific evidence to be confident about making changes.
• Possibly use administrative data to match records.
8.3 Emerging themes
The main themes emerging from the workshop were then summarised by Martin
Brand.
• Shorten/reorganise questionnaires:
o Short/long form versions;
o Use of a core set of questions;
o Modular structure to get detail.
• Maximise use of cheaper modes like telephone.
• Testing:
o Needs funding and willingness to do on an ongoing basis.
• Future proofing:
o Changes in society;
o Changes in technology;
o Landline/mobile issue.
• Systems capability:
o Case management systems are a big issue.
• Organisational inertia is a concern.
• Breaks in time series are a concern: continuity is important.
• There are no ‘pain-free’ solutions.
• Different considerations may apply to Great Britain, the UK and regionally.
Participants then provided their final reflections to conclude the workshop.
• CAPI is widely accepted as the best mode but is it affordable?
• Is there now an opportunity to return to the concept of a modular Integrated
Household Survey?
o surveys/modules with different mode-suitabilities;
o exploiting cheaper modes;
o reducing risk.
• Good management systems are needed to allow the switching between modes if a
sequential approach is taken.
• Consider survey systems not just data collection: internet and mail are cheap ways
to make contact.
• Respondent fatigue is a risk if much larger samples are contacted by using cheaper
modes.
• Consider the potential of using different modes in different areas due to their
history, availability and appropriateness.
• Sequential modes raise the issue of timeliness.
o The Dutch approach to combining data to get around the issue of the
time lag when using web collection appears good.
• There is not much evidence on mode effects.
• Mode effects may cause errors but there will always be such errors present that are
largely accepted, like interviewer error.
• More testing is needed: every bit of data collection is an opportunity to learn more
about the process and about the effects of making changes.
• Back-ups and tests are needed if we move to web data collection; if this impacts
employment/unemployment figures then we could be in trouble.
• Transitioning:
o is not an easy task;
o transparency is needed;
o make sure users are aware of changes;
o explain the reasons for changes; and
o make risks and implications clear.
That concluded the workshop.
9 References
Betts P and Lound C (2010a), ‘The Application of alternative modes of data collection in UK
Government social surveys: A report for the Government Statistical Service’, Office for National
Statistics. Available at: http://www.ons.gov.uk/about/who-we-are/our-services/data-collection-methodology/reports-and-publications/alternative-modes-of-data-collection/index.html
Betts P and Lound C (2010b), ‘The Application of alternative modes of data collection in UK
Government social surveys: Review of literature and consultation with National Statistical Institutes’,
Office for National Statistics. Available at: http://www.ons.gov.uk/about/who-we-are/our-services/data-collection-methodology/reports-and-publications/alternative-modes-of-data-collection/index.html