Social Survey Methodology Bulletin

No. 49
July 2001

Contents

The Internet as a mode of data collection in government social surveys: issues and investigations (John Flatley)

Using the Omnibus Survey to pilot the Arts Council of England’s survey on arts attendance, participation and attitudes (Tania Corbin)

Developing a weighting and grossing system for the General Household Survey (Jeremy Barton)

The General Household Survey as a sampling frame: the example of the Survey of Poverty and Social Exclusion (Joanne Maher)

Evaluation of survey data quality using matched Census-survey records (Amanda White, Stephanie Freeth & Jean Martin)

Forthcoming conferences, seminars and courses

The National Statistics Methodology Series

Recent Social Survey Division Publications
The Social Survey Methodology Bulletin is produced primarily to inform staff in the Office for
National Statistics (ONS) and the Government Statistical Service (GSS) about the work on
social survey methodology carried out by Social Survey Division (SSD) of the ONS. It is
produced by the Social Survey Methodology Unit within SSD, and staff in the Division are
encouraged to write short articles about methodological projects or issues of general interest.
Articles in the bulletin are not professionally refereed, as this would considerably increase the
time and effort needed to produce the bulletin: they are working papers and should be viewed as such.
The bulletin is published twice a year, in January and July, and is free to staff within the ONS
and the GSS. It is made available to others with an interest in survey methodology at a small
cost to cover production and postage.
The Office for National Statistics works in partnership with others in the Government
Statistical Service to provide Parliament, government and the wider community with the
statistical information, analysis and advice needed to improve decision-making, stimulate
research and inform debate. It also registers key life events. It aims to provide an authoritative
and impartial picture of society and a window on the work and performance of government,
allowing the impact of government policies and actions to be assessed.
Edited by: Lucy Haselden
Prepared by: Sarah Binfoh
The Internet as a mode of data collection in government
social surveys: issues and investigations
John Flatley
1. Introduction
For those in the statistics business, web technologies offer the potential to speed the process
of collecting data, to reduce the costs associated with collection and processing, and to make
data available more widely and more quickly. It is not surprising, therefore, that national
statistical institutes (NSIs), such as the Office for National Statistics (ONS), are racing to
embrace the Internet to enhance the service they deliver to government, other customers and
to citizens. This is the business imperative. In addition, governments have added political
impetus. For example, here in the United Kingdom (UK) central government has set a target
for all public services to be available on-line by 2005. Similar, but more ambitious, targets
have been set in Australia (2001) and Canada (2004). As the number of people with Internet
access grows and they become accustomed to using it, there is likely to be increasing demand
from the providers and users of official statistics to use the web for such purposes.
Social Survey Division (SSD) of the ONS is examining the feasibility of using the web for
the surveys that it undertakes. Nationally representative surveys of the private household
population, carried out on behalf of ONS or other government departments, form the core
business of SSD. Such surveys employ random probability sample designs and are
voluntary. Surveys undertaken by SSD tend to be carried out by trained interviewers using
computer assisted interviewing (CAI). Computer assisted personal interviewing (CAPI) is
often the single mode of data collection. There are examples of surveys that combine CAPI
with another mode. For example, the Labour Force Survey (LFS), which is a panel survey,
uses CAPI for first interview and offers a computer assisted telephone interviewing (CATI)
option for subsequent interviews (there are five interviews in total). On both the Expenditure
and Food Survey (EFS) and National Travel Survey (NTS), following an initial CAPI
interview, respondents are asked to self-complete a paper diary. SSD is also commissioned
to sample other populations, such as school pupils, for which paper and pencil interviewing
continues to have a role, though interest in using the web for such surveys may increase.
The characteristics of government social surveys of the general population do not readily
lend themselves to web data collection. Social surveys of the private household population
tend to have the following requirements:
• random probability sampling
• long and complex questionnaires, e.g. interviews of 1 hour or more are typical
• high response rates, e.g. 80% plus are a common requirement
These features contrast with common practice on web surveys and raise questions about the
potential use of the Internet for general population surveys. ONS, like other NSIs, has begun
to consider the issues that are raised by the possible collection of data via the web on social
surveys of the general population. This paper draws on initial work in which, following a
literature review and discussions with others already working in the field1, we have
identified a number of key issues. We start by considering the issue of population coverage
and sampling and present the latest government statistics on Internet penetration.
2. Population coverage and sampling
2.1. Use of Internet
The National Statistics Omnibus Survey began to include questions on Internet use on a
regular basis in July 2000; they have since been asked once every three months. The Omnibus
Survey has a random probability sample design such that it yields results which are
nationally representative of all adults in Great Britain.
According to the January 2001 Omnibus Survey, an estimated 23 million adults in Britain
have used the Internet at some time either at their own home or elsewhere, such as at work
or another person’s home (ONS, March 2001). This is equivalent to half (51%) of the adult
population in Great Britain and compares with a figure of 45% in July 2000 (ONS,
September 2000).
Eight in ten adults who had ever used the Internet by the time of the latest survey (January
2001) had accessed it in the month prior to the interview. This represents some 40% of
Britain’s adult population. The Omnibus Survey shows that a higher proportion of men
(57%) had used the Internet than women (45%). This gender gap is consistent across age groups.
As expected, the proportion using the Internet declines with age. For example, among those
aged 65-74, 15% had used the Internet, falling to just 6% of those aged 75 and over.
Overall, two-thirds of all adults who had ever accessed the Internet were aged under 45.
Between July 2000 and January 2001 the largest increase in Internet usage was
recorded for the youngest age group (up from 69% to 85%). This contrasted most strongly
with those aged 75 years and over, among whom there was no growth recorded (6% at both
times).
2.2. Home Internet access
Another source, the Family Expenditure Survey (FES), can provide nationally representative
estimates of the number of households with home Internet access. Figures for the FES refer
to the whole of the UK. A question on home Internet access has been included on the FES
since April 1998. Results on Internet access are produced on a quarterly basis and the latest
figures relate to the final quarter of 2000 (October-December). Up to March 2000 this
question was asked only of households that had already said that they had a home computer.
As new methods of accessing the Internet have begun to be taken up, the levels of Internet
access recorded up to the first quarter of 2000 may be understated. From April 2000 all
households have been asked about home access to the Internet and, if they have it, the means of access.
In the first quarter in which such questions were included (April-June 1998), 9% of UK
households were estimated to have home Internet access. This is equivalent to some 2.2
million households. Since the first quarter of 1999, the survey has recorded a steady increase
in the proportion of households with access to the Internet at home. The latest estimates, for
October-December 2000, are that 8.6 million households now have home access. This is
equivalent to one in three UK households (35%) and is a four-fold increase on the
comparable figure in 1998.
Analysis of the latest quarter shows that the vast majority of those with home Internet access
(98%) use a home computer to access the Internet. Take-up of new technologies, such as
WAP phones and digital televisions has yet to make a large impact. Over time the growth in
the use of these new technologies may be the driving force behind increases in home
Internet access. With the planned withdrawal of analogue transmitters by the end of this
decade it is possible that nearly all UK households will have access to digital televisions,
and possibly, as a by-product, home Internet access. This may have implications for the web
survey designer. Currently such forms of web access do not have the same functionality as
computers and few of the existing web surveys would run on such a platform. Another
implication relates to possible differences in the profile of users. As yet there is little
research on these issues.
2.3. The digital divide: unequal access to the Internet
The FES demonstrates that the proportion of households with home access to the Internet
varies by a range of social and demographic factors. For example, FES data show that there
is a strong relationship between household income and home Internet access. Over the
financial year 1999-2000, the percentage of households with access to the net at home was
lowest amongst households in the four lowest deciles (ranging between 3% and 6%). In both
the fifth and sixth decile groups 15% of households had home Internet access. Amongst the
remaining deciles access increased steadily with income, for example rising from 22%
among those in the seventh decile to 48% for those in the highest decile (ONS, July 2000).
On the basis of two years' data it is difficult to comment on the extent to which there is
evidence of the narrowing of the gap between lower and higher income groups. In the last
two years there was growth in home Internet access across all income decile groups and the
pattern of unequal access was broadly similar.
Analysis of the January 2001 Omnibus Survey has shown a clear relationship between usage
and social class. Here social class is based on the occupation, or previous occupation, of the
household reference person2. For example, the proportion of adults who had used the
Internet was higher than average (51%) among those living in households headed by a
person in Social Class I (professionals and managers) or Social Class II (intermediate
non-manual): 78% and 65% respectively. This contrasted with those whose household head was
in one of the manual social classes, where the proportions were as follows: 37% for those in
Social Class III manual, 33% for those in Social Class IV partly skilled manual and 27% for
those in Social Class V unskilled manual (ONS, March 2001).
Analysis of the 1999-2000 FES has shown that the households with the highest levels of
access were those comprising couples with children (31% of couples with one child and
35% of couples with two or more children). This compared with an average figure of 19%
for the whole of the UK at that time. A lower proportion of lone adult households had
Internet access (7% for those with one child and 11% for those with two or more children).
It is interesting to note the general pattern that, irrespective of the number of adults, rates of
access were higher for households with two or more children compared with those
containing one child. As expected, the lowest rates of access were found for households
comprising people who were retired and living on their own or as couples (1% and 5%
respectively) (ONS, July 2000).
While access to the Internet remains at a relatively low level, there is no prospect of the web
replacing other modes of interviewing on high quality general population surveys, which
require nationally representative estimates. Even if access were to reach a near universal level,
as is the case with telephones (according to the General Household Survey, 96% of
households have one), there is another challenge: the issue of sampling.
2.4. Sampling issues
Random probability sampling requires that each member of the population of interest has a
known (non-zero) chance of being selected for the survey. In the UK there is no central
register of the population to act as a sampling frame, as there is in some Scandinavian
countries. For the population of adults living in private households, however, there are proxies
such as the Postcode Address File (PAF) and Electoral Register (ER). In SSD, most of the
general population survey samples are drawn from the PAF. Most of our surveys are carried
out by our field force of trained interviewers using computer assisted interviewing (CAI)
with laptops. Selected addresses are visited by interviewers to establish whether or not the
address listed contains a household that is eligible for the survey and, if so, to attempt to
persuade potential respondents to participate in the survey. Surveys that use face-to-face
interviewing tend to find that around 11-12% of addresses listed in the PAF are ineligible.
Some surveys over- or under-sample particular groups but, because the chance of selection
is known, weighting strategies can be devised to adjust for the bias that would otherwise result.
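
In other words, each responding case carries a design weight equal to the inverse of its selection probability. The following minimal sketch illustrates the idea; the groups and selection probabilities are invented for illustration and are not taken from any ONS survey.

    # Design weighting: each responding case is weighted by the inverse of its
    # known selection probability, so deliberately over-sampled groups are
    # scaled back to their population share. Figures below are hypothetical.
    respondents = [
        {"id": 1, "group": "over-sampled minority", "p_select": 0.02},
        {"id": 2, "group": "general population",    "p_select": 0.01},
        {"id": 3, "group": "general population",    "p_select": 0.01},
    ]

    for r in respondents:
        # a case selected with probability p stands for 1/p population units
        r["design_weight"] = 1.0 / r["p_select"]
        print(r["id"], r["group"], r["design_weight"])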
Currently there is no single listing of people with access to the Internet. Email
lists exist for membership associations, such as the Association for Survey Computing
(ASC) or the Social Research Association (SRA), and for employees of businesses and other
organisations. If these are reasonably comprehensive and up-to-date such lists could be used
to invite all, or a sample of, members to respond to a web survey. No such list of the general
population exists here in the UK; nor is one likely in the foreseeable future.
Even if a list were to become available for sampling purposes, we would have difficulty in
ensuring that each sampled individual had a known chance of being selected because
individuals may possess more than one email address at any one time. Some people with
web access may not use email, even if they were required to set up an email account by an
Internet Service Provider (ISP). Further, as people move from one ISP to another there is
potential for email addresses once used to become dormant. We know little about the
general public’s use of email addresses and we don’t have good answers to questions such
as:
•
What is the average number of email addresses that a household or individual
maintains?
•
What is the average life of an email address?
•
What is the rate at which households or individuals acquire new email addresses?
In addition, because of the great variety in the format of email addresses, there is little
prospect of being able to generate email addresses at random in the manner that is possible,
though not without problems, for random digit dialling.
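
By way of contrast, the sketch below shows why random generation is workable for telephone numbers but not for email addresses; the numbering format is invented for illustration and does not reflect any real dialling plan.

    # Random digit dialling works because telephone numbers form a fixed-length
    # numeric space that can be enumerated and sampled uniformly. The area codes
    # here are invented. No analogous generator exists for email addresses: the
    # local part is free-form text of arbitrary length, so the space cannot be
    # enumerated and almost no generated string would match a real address.
    import random

    AREA_CODES = ["0120", "0145", "0190"]  # hypothetical

    def random_telephone_number():
        return random.choice(AREA_CODES) + "".join(
            random.choice("0123456789") for _ in range(7))

    print([random_telephone_number() for _ in range(5)])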
3. Design and implementation issues
3.1. Screens and scrolling
There are two main approaches to questionnaire construction that have conventionally been
used on web surveys. With a screen-based approach usually one question is displayed on
each screen and each time a respondent answers a particular question the screen refreshes to
display the next question. Scrolling refers to the situation when the whole of the
questionnaire is available at one time by allowing the respondent to scroll up and down,
using a scroll bar to the right of the screen. The choice of which method to use is often
related to the way in which respondents are invited to complete the survey. Screen by screen
presentations are often used with on-line surveys where there is the need for continuous
contact between the respondent’s computer and a web server. Off-line surveys often use a
plain Hypertext Mark-up Language (HTML) form which the respondent is able to download
from a website or receive as an email attachment. The pros and cons of on-line and off-line
approaches are discussed later. Here we’re concerned simply with design issues.
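
To make the screen-based approach concrete, the sketch below serves one question per page, with each submitted answer triggering a round trip to the server that returns the next question. It is a minimal illustration using the present-day Python standard library; the questions, port and markup are invented and do not represent any ONS instrument.

    # Screen-by-screen presentation: the server holds the questionnaire and each
    # POST of an answer refreshes the screen with the next question. A real
    # instrument would also store the answers and manage respondent sessions.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import parse_qs

    QUESTIONS = ["Do you have access to the Internet at home?",
                 "Have you used the Internet in the last month?"]

    FORM = ('<html><body><form method="POST" action="/q/{n}">'
            '<p>{text}</p><input name="answer">'
            '<input type="submit" value="Next"></form></body></html>')

    class ScreenByScreen(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_page(0)                      # first question

        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            answer = parse_qs(self.rfile.read(length).decode())
            # ... store `answer` against the respondent here ...
            self.send_page(int(self.path.rsplit("/", 1)[-1]))

        def send_page(self, n):
            done = n >= len(QUESTIONS)
            body = ("<html><body>Thank you.</body></html>" if done
                    else FORM.format(n=n + 1, text=QUESTIONS[n]))
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(body.encode())

    HTTPServer(("localhost", 8080), ScreenByScreen).serve_forever()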
It has been argued that scrolling is more akin to the way in which people use computers and
surf the web and for that reason should be preferred (Dillman, 2000). It is true that
experienced web users are likely to find a screen by screen presentation more cumbersome.
They may also be frustrated if they feel their freedom to navigate is constrained. This may be
compounded if the time delay between individual screens refreshing is perceived by
respondents to be excessive.
Another advantage of scrolling is that respondents can more easily form a perception of the
content and the length of the questionnaire before they begin their task. This makes scrolling
more like conventional paper self-completion questionnaires where the respondent can
easily flick through the document before answering and can form a judgement about how
long it will take them to complete.
For these reasons, Dillman argues that scrolling should be preferred whenever web surveys
are being used to replace paper self-completion methods. However, if web surveys are being
used in a mixed mode environment, alongside CAPI or CATI, then the case for using a
screen by screen construction is stronger. One could argue that it is more akin to the way in
which questions are ‘served’ and answered on these modes.
Off-the-shelf software is increasing the choices available to questionnaire developers and
greater flexibility in questionnaire construction is becoming possible.
3.2. On-line versus off-line interviewing
An on-line survey is one in which the respondent accesses the questionnaire instrument on a
website and completes it, via their web browser, while connected to the Internet.
Respondents can be directed to the URL either through paper instructions, for example sent
by regular mail or left by an interviewer, or in an email, possibly with a hyperlink.
On-line surveys need constant contact between the respondent’s computer and the web
server upon which the instrument is located. Thus one needs to consider whether or not this
requirement is likely to be acceptable to respondents. For Intranet surveys, for example of
employees of a business or other organisation, this is likely to be perfectly acceptable. One
proviso is that the connection speed between the respondent’s computer and web server
must be sufficient to ensure that there are not unacceptable delays in the processing of
individual questions.
However, on-line methods raise a number of issues for general population surveys. The first
of these concerns the length of time that a respondent is required to be on-line to complete
the survey. This is likely to be of particular concern to those who pay by the minute for their
time on-line. One could offer to compensate respondents for the cost of their time on-line
but we need more research to find out whether or not this will be perceived as an adequate
incentive for all respondents.
As outlined above, on-line surveys can be problematic if the time taken for the answer to
one question to be transmitted to the server, and for the server to respond, is deemed
excessive. For general population surveys, where few households currently have high speed
Internet access, this is a key concern. This suggests that on-line surveys are only feasible for
general population surveys where there are few questions being asked and the duration of
the interview is short.
In an off-line survey, respondents may initially go on-line to download a file, containing the
survey instrument, to their local computer, completing the survey off-line before going back
on-line to transmit the completed questionnaire back to the host. An alternative is for
respondents to be sent a file, either by regular mail on CD or floppy disk or as an attachment
to an email. Respondents need to install the file on their computer before answering the
questionnaire. Once the questionnaire is completed, respondents can go on-line to return the
completed questionnaire. There are other concerns, such as security or computer
functionality, that may be relevant to a decision about on-line or off-line surveys.
Respondents’ competence in the use of the Internet may need to be higher to complete all
the required tasks than is the case for on-line surveys.
Off-line surveys are preferred when the survey is long and complex. However, off-line
surveys can introduce other concerns. For example, the time that it takes for the
questionnaire instrument to download to the respondent’s computer may be perceived to be
excessive. In addition, respondents may be unwilling to download files to their own hard
drive, worried that they may contain viruses or otherwise damage their computer. A possible
alternative to downloading is to send respondents a file via regular mail, for example on CD
or floppy disk, or as an email attachment. However, as with the download method this still
requires some degree of trust in the integrity of the files being sent. It adds to the burden on
respondents and for some may be enough to dissuade them from participating. Off-line surveys
are likely to work best when the files that need to be installed on a respondent’s computer
are relatively small. This means that there may be less opportunity to build complex
instruments, for example with extensive interactive edits and multi-media, if as a result
respondents are required to download large computer files.
3.3. Plain HTML versus advanced programming
The simplest approach to designing web questionnaires is to provide a plain HTML form
that respondents can access via their normal browser. However, plain HTML is limited. It is
necessary to supplement HTML with the use of advanced programming languages, such as
Java, to allow the use of interactive edits and routing and the addition of features such as
pop-up help and progress bars.
One could argue that the possibility of including such features represents a radical step
forward compared with what is possible with conventional paper self-completion forms.
However, the use of advanced programming languages can be problematic since different
browsers may be inconsistent in the way in which they interpret Java and JavaScript.
Further, the addition of such features necessarily results in the file size of the instrument
growing. This impacts on download times and increases the risk of non-response.
It is worth noting that usability tests performed by the US Census Bureau suggested that
respondents rarely made use of on-line help. This was true irrespective of how it was
presented, for example via buttons, hyperlinks or icons (Kanarek and Sedivi, 1999).
3.4. Security
Security is a major issue for government surveys. Respondents to official surveys need to be
assured that the data that they give us will be kept secure. At the same time, we need to
devise systems that protect the integrity of our own data from hackers (Ramos, Sedivi and
Sweet, 1998; Clayton and Werking, 1998).
The US Census Bureau has developed a three-level security system comprising encryption,
authentication and a firewall. Encryption provides strong security for data while they pass
over the Internet between the respondent and the Bureau. The Bureau's encryption solution
requires respondents to have version 4.0 or higher of either Netscape Communicator or
Microsoft's Internet Explorer and enables strong 128-bit encryption. Once received at the
Census Bureau, the data are protected by a firewall that prevents access from outside
(Kanarek and Sedivi, 1999).
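
By way of illustration, transport encryption of this kind can be arranged by serving the instrument over an encrypted (TLS/SSL) connection. The sketch below uses the present-day Python standard library and is not the Census Bureau's implementation; the certificate and key file names are placeholders.

    # Serving a survey instrument over TLS so that responses are encrypted in
    # transit. The certificate/key files are placeholders; a real deployment
    # would use a certificate issued by a trusted authority.
    import ssl
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile="server.pem", keyfile="server.key")

    httpd = HTTPServer(("localhost", 8443), SimpleHTTPRequestHandler)
    httpd.socket = context.wrap_socket(httpd.socket, server_side=True)
    httpd.serve_forever()  # respondents connect via https://localhost:8443/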
There is a tension between the strength of security and survey response. The experience of
the US Census Bureau has been that the more stringent the security, the lower the
response. In part, this appears to be related to technical issues, such as respondents not
having the required browser version or encryption level. It is also possible that requiring the
use of usernames and passwords to access a web survey may be problematic for some
respondents (Sedivi, Nichols and Kanarek, 2000).
3.5. Data quality
One of the attractions of web based data collection, compared with paper self-completions,
is the improvement in data quality that is offered with the use of CAI methods. If the
technical issues noted above can be overcome, WEB CASI opens up the possibility for more
complex questionnaires to be self-administered than is possible with paper based
approaches. Complex routing can be incorporated in a way that is not feasible in a paper
document. Computations and edits can be carried out at the time of the interview and
inconsistent or improbable responses checked with the respondent. This is analogous to the
move from pencil and paper methods to CAPI and CATI.
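
An interactive edit of this kind is simply a rule evaluated as soon as an answer is entered, with failures queried back to the respondent. The sketch below shows range and consistency checks of this general sort; the rules and thresholds are invented examples, not those of any ONS survey.

    # Interactive edits: range and consistency checks applied at entry time,
    # so that improbable or inconsistent answers can be queried immediately.
    def edit_checks(age, hours_worked_last_week):
        """Return the edit queries to put back to the respondent."""
        queries = []
        if not 0 <= age <= 120:
            queries.append("Age outside plausible range - please re-enter.")
        if hours_worked_last_week > 100:
            queries.append("Over 100 hours worked last week - please confirm.")
        if age < 16 and hours_worked_last_week > 0:
            queries.append("Hours worked reported for a child - check both answers.")
        return queries

    print(edit_checks(age=14, hours_worked_last_week=40))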
An important aspect of data quality is the reduction of measurement error. This arises when
inaccurate answers are recorded for respondents. There are different sources of measurement
error, such as poor question wording and differences arising from the mode in which the
interview is conducted. It is possible that there will be mode effects arising from the switch
from other modes to web CASI. For example, compared with CAPI or CATI, there is a
difference between the way in which information is presented to respondents – a move from
aural to visual presentation. The speed with which respondents read and respond to
questions can’t easily be controlled. Clearly more research is needed in this area to examine
such questions.
3.6. Survey response
To date, web surveys have tended to achieve lower response than mail surveys. The US
Census Bureau has found no evidence from its trials, largely among businesses, that offering
a web option in addition to other modes leads to an overall increase in response rate (Sedivi,
Nichols and Kanarek, 2000). Part of this is likely to be accounted for by technical problems,
discussed earlier. Another possible explanation offered relates to respondents’ concerns
about privacy and confidentiality (Couper, 2000).
Response rates to self-completion surveys tend to be lower than to interviewer administered
surveys and there is every reason to expect that this will be the case for web surveys too.
CAPI and CATI achieve higher response than mail surveys because of interviewer
involvement in the initial recruitment of respondents and in sustaining respondent
involvement. Interviewers can be good at persuading respondents to participate by
explaining the value of the survey and the respondents’ participation. During the interview
itself, interviewers can develop a rapport with respondents that encourages them to continue.
For conventional surveys, the proportion of respondents who refuse to complete an
interview, once started, is low. This contrasts with web surveys. Opinion on the optimal
length of web surveys is divided but anecdotal evidence suggests that 15-20 minutes is as
much as one can expect. This compares with CAPI/CATI surveys that can range from 30-40
minutes up to one hour or more.
One interesting feature about the analysis of the profile of the on-line population, presented
earlier, is the over-representation of younger adults. This is one sub-group which tends to be
under-represented in random probability sample surveys of the general population. WEB
CASI might be a way to increase response in this sub-group. However, there is evidence to
suggest that it is not a good idea, in mixed mode surveys, to offer respondents a choice of
response mode. For example, the US Census Bureau tested whether or not response was
boosted by offering both mail and telephone options. Overall 5% of the sample responded
by phone but the overall level of response (mail and phone combined) was the same as when
only a mail option was offered (Dillman, Clark and West, 1995).
Questionnaire designers need to be aware that the use of more advanced techniques may
impact on respondents’ ability to respond. The more complex the questionnaire instrument
is, and the more that elaborate programming is embedded within it, the greater will be the
size of the resulting file that respondents will need to download to run the questionnaire.
This will impact on the time it takes to download the questionnaire and is very likely to
impact on response.
3.7. Design principles
It has been argued that the same visual design principles that apply to the design of paper
questionnaires apply for web surveys (Dillman, 2000). Respondents have the same
requirement for information that is presented clearly and efficiently. Layout and design
should aim for respondents to read every word of each question and in a prescribed order.
Dillman has developed a set of 14 design principles for web surveys. These are:
1. Introduce the Web questionnaire with a welcome screen that is motivational, emphasises
the ease of responding, and instructs respondents about how to proceed to the next page
2. Provide a PIN number for limiting access only to people in the sample
3. Choose for the first question an item that is likely to be interesting to most respondents,
easily answered, and fully visible on the welcome screen of the questionnaire
4. Present each question in a conventional format similar to that normally used on paper self-administered questionnaires
5. Restrain the use of colour so that figure/ground consistency and readability are
maintained, navigational flow is unimpeded, and measurement properties of questions are
maintained
6. Avoid differences in the visual appearance of questions that result from different screen
configurations, operating systems, browsers, partial screen displays, and wrap-around text
7. Provide specific instructions on how to take each necessary computer action for
responding to the questionnaire, and give other necessary instructions at the point where
they are needed
8. Use drop-down boxes sparingly, consider the mode implications, and identify each with a
“click-here” instruction
9. Do not require respondents to provide an answer to each question before being allowed to
answer any subsequent ones
10. Provide skip directions in a way that encourages marking of answers and being able to
click to the next applicable question
11. Construct web questionnaires so they scroll from question to question unless order
effects are a major concern, or when telephone and Web survey results are being combined
12. When the number of answer choices exceeds the number that can be displayed in a
single column on one screen, consider double-banking with an appropriate grouping device
to link them together
13. Use graphical symbols or words that convey a sense of where the respondent is in the
completion process, but avoid those that require significant increase in computer resources
14. Exercise restraint in the use of question structures that have known measurement
problems on paper questionnaires, such as check-all-that-apply and open-ended questions
Another view is that the web is a fundamentally different medium from paper and that much
more research is required before we can determine optimal designs (Couper, 2000). Couper
suggests that design issues interact with the type of web survey being conducted and the
population which is being targeted. For example, an optimal design for a survey of teenagers
may be quite different from that which is required for a survey of elderly people.
The variety of computer hardware and software that respondents may use to access the
survey also needs to be kept in mind. A range of factors, such as operating system, browser
type and version, and Internet connection speed, may affect the functionality of the survey
instrument. Most of these are outside our control. The result may be that some respondents
cannot access the questionnaire at all, while others find it slow to respond or that the
questionnaire does not display as intended.
4. Conclusion
This paper has been concerned with the use of the web for government social surveys of the
general population. Such surveys are required to yield nationally representative estimates.
The fact that the proportion of households with Internet access remains low, and that the
on-line population differs in important respects from the general population, presents a major
challenge. Issues of population coverage and sampling mean that it is not possible to achieve
good population coverage with stand-alone web surveys. This suggests that in the immediate
future the most likely application of the web in official social surveys is as part of a mixed
mode data collection strategy alongside others, such as mail, CAPI or CATI.
It is worth noting that, to date, it is with surveys of businesses that NSIs have made most
progress with using the Internet as a mode of data collection. Businesses' rates of Internet
access are relatively high compared with households', and an increasing number use the net
to conduct their business. Most of the surveys that have offered a web option are relatively
short (compared with social surveys) and self-completed. Here the web offers the potential
to reduce costs, speed processing and improve data quality (e.g. with use of interactive
edits).
A number of the issues raised in this paper are common to both surveys of businesses and
social surveys of the general population. However, the challenges are greater for social
surveys. For general population surveys that have self-completion elements there is growing
interest in reporting via the web. Offering a web option for parts of the survey that are
interviewer-administered raises even more challenging questions, such as the effect on
overall response rates and comparability with data collected by other modes.
NSIs are starting to grapple with the issues raised in this paper and the ONS has established
a project (the WEB CASI project) within SSD to investigate these issues and take this work
forward. Some of the points raised in this paper will be explored in a trial study during
which former respondents to one of ONS’ surveys, who reported having home Internet
access at the time at which they were interviewed, will be approached to participate in a trial
study. The results of our work will be shared within the social survey research community to
contribute to the development of our knowledge in this area.
This paper was first presented at the ASC conference 'The Challenge of the Internet' in May 2001.
References
Couper, M.P. (2000) Web Surveys: A Review of Issues and Approaches, Public Opinion
Quarterly, Volume 64, American Association for Public Opinion Research.

Clayton, R.L. and Werking, G.S. (1998) Business Surveys of the Future: the World Wide Web
as a Data Collection Methodology, in Couper, M.P., Baker, R.P., Bethlehem, J., Clark, C.Z.F.,
Martin, J., Nicholls II, W.L. and O’Reilly, J.M. (eds) Computer Assisted Survey Information
Collection, John Wiley and Sons.

Dillman, D., Clark, J.R. and West, K.K. (1995) Influence of an Invitation to Answer by
Telephone on Response Rates to Census Questionnaires, Public Opinion Quarterly, Volume 58,
American Association for Public Opinion Research.

Dillman, D. (2000) Mail and Internet Surveys, John Wiley.

Kanarek, H. and Sedivi, B. (1999) Internet Data Collection at the U.S. Census Bureau, paper
to the Federal Committee on Statistical Methodology conference.

Office for National Statistics (July 2000) First Release: Internet Access, 1st Quarter 2000.

Office for National Statistics (September 2000) First Release: Internet Access.

Office for National Statistics (March 2001) First Release: Internet Access.

Ramos, M., Sedivi, B. and Sweet, E.M. (1998) Computerised Self-Administered
Questionnaires, in Couper, M.P., Baker, R.P., Bethlehem, J., Clark, C.Z.F., Martin, J.,
Nicholls II, W.L. and O’Reilly, J.M. (eds) Computer Assisted Survey Information Collection,
John Wiley and Sons.

Sedivi, B., Nichols, E. and Kanarek, H. (2000) Web-based Collection of Economic Data at the
U.S. Census Bureau, paper presented to the ICES II Conference.
Notes

1. This paper draws on the work already undertaken by other NSIs, especially in the
United States and Canada, and the author is grateful to those who have so willingly
shared their experience. In particular, I am grateful to Howard Kanarek and
Barbara Sedivi Gaul of the U.S. Bureau of the Census and other participants of the
Web Surveys session at the recent FEDCASIC workshop in Washington DC for
their helpful insights.

2. The person in the respondent’s household in whose name the accommodation is
owned or rented, or, if there are two or more such people, the one with the highest
personal income.
Using the Omnibus Survey to pilot the Arts Council of
England’s survey on arts attendance, participation and
attitudes
Tania Corbin
1. Introduction
In order to test questions for inclusion in a survey, a pilot study is often carried out.
Although this provides useful information about how an interview might work at a main
stage, it is limited in its use for in-depth question testing.
Piloting will give an overall impression of whether a set of questions ‘work’ but will not
give insight into how questions are understood. One way of testing how questions are
understood is by use of the cognitive process model.
This paper describes how the cognitive process model was used to test questions for a
module on the National Statistics (NS) Omnibus Survey1 for the Arts Council of England in
November 2000.
2. The Cognitive Process Model
Reliable survey results depend on respondents providing accurate and relevant responses to
survey questions. However, this is not as straightforward as often thought because, when
asked a question, respondents go through a three-stage cognitive process before giving a
response (Willis, 1994).
The first of these stages is comprehension of the question; what do respondents think the
question is asking and how are they interpreting specific words and phrases in the question?
The next stage is the retrieval of relevant information from memory; what types of
information do respondents need to recall to be able to answer the question and what
strategy do they employ to recall it? For example, do they count events by recalling each
one individually, or do they use an estimation strategy?
The final stage in the process involves respondents making decisions about their answer.
Having understood what they think the question is asking and retrieved the relevant
information, respondents have to decide how to present their response. Are they willing to
devote enough mental effort to answering the question accurately and are they truthful in
their response? Do they amend their response in order to present themselves in a better
light?
3. Background to the Arts Council Omnibus Pilot
The aim of the Arts Council module on the Omnibus was to pilot a set of questions to assess
the frequency of attendance at, and participation in, arts and cultural events and activities,
and to measure attitudes towards the role, value and public subsidy of the arts. The results
of the pilot will inform a large-scale survey to be carried out by ONS during 2001.
The only national data currently available on arts attendance are those provided by BMRB’s
Target Group Index (TGI), which covers only eight types of arts activities2. The Arts
Council is interested in collecting data on the complete range of arts and crafts which it
funds, as well as on some non-funded activities. A wide-ranging survey was carried out in
1991 (Arts Council of Great Britain, 1991), but this needed updating to reflect the changing
role of the arts in England and to provide a picture of people’s current engagement with the
arts. In addition, the Arts Council is required to report on attendance at arts events as part of
its Funding Agreement with the Department for Culture, Media and Sport (DCMS).
The Omnibus Survey pilot data were analysed by the Question Testing Unit3, who were able
to draw on their experience of cognitive interviewing and analysis. This article describes the
advantages of using this combination of methods and the value that was gained for the Arts
Council.
The pilot module was designed to allow the Arts Council to assess the questions by asking
both the interviewers and respondents how well they worked. The data came from three
different sources:
• questions in the administrative section of the questionnaire asking interviewers to record
how respondents had dealt with the showcards used in the module and how they had
reacted to the module as a whole4
• ‘cognitive questions’ placed within the body of the questionnaire aimed at gaining
information about the thought processes respondents went through to arrive at their
answer5
• general comments on the module by the interviewers highlighting elements of the
questionnaire they had found problematic and suggesting ways in which they felt it
could be improved.
Because the pilot module included cognitive questions it provided not only structural data
about the questionnaire and survey process but also information about:
• how respondents interpreted questions,
• whether or not respondents were able to recall the information required to answer
questions, and
• whether or not respondents found questions sensitive or embarrassing.
4. Using the cognitive process model on the Arts Council Pilot
Problems with survey questions can often be located in one or more of the steps in the
cognitive process discussed above. The cognitive questions in the Arts Council pilot module
revealed no major instances of stage 1 problems, apart from a few respondents who thought
that the phrase ‘public funding’, when applied to the arts, meant that it was ‘financed by
means of ticket sales’.
The Arts Council module included questions about respondents’ frequency of participation
in and attendance at arts and cultural events that relied on respondents finding a successful
strategy for recalling this information. Respondents were usually able to provide an answer
to these questions with little difficulty and the pre-coded responses appeared to work well.
However, a later question6 asked respondents how easy or difficult they had found it to
answer these questions and responses here varied. It was common for respondents who
rarely attended arts and cultural events to report that recall of frequencies had been fairly
easy:
“Very easy, I never go to these sorts of things”,
while those who were very interested or active in the arts had found it more difficult. In
other words the question worked at the second stage in the cognitive process for some, but
not all, respondents. As the majority of respondents did provide an answer to the frequency
question it is likely that some of the answers were based on estimates rather than the recall
of individual events. Findings such as these provide a context for the substantive data and
could be used to amend questions to make them work better for a larger number of
respondents.
Questions which may seem innocuous can nevertheless be sensitive. For respondents who
were less interested in the arts, questions on participation and attendance at cultural events
caused feelings of guilt and embarrassment and in one case were the cause of an argument
between the respondent and her spouse. There was evidence that people who had not taken
part in any of the listed activities attempted to show themselves in a better light by trying to
fit the things they had done into the available pre-codes. For example, a number of
respondents said that they had not read anything on a list of novels but then qualified this by
saying that they had read other books, autobiographies or texts which could be classed as
novels.7 It seemed important to these respondents to be able to say that they had participated
in some artistic activities. Respondents who reported doing nothing related to the arts or
culture often expressed concern at being thought of as “thick” or “uncultured”.
5. Lessons learned from the pilot
Incorporating cognitive questioning in an NS Omnibus module has proved to be an effective
and an efficient strategy for piloting a questionnaire. Our customers at the Arts Council have
been provided not only with information about the survey process and some substantive
results, but also with insights into how respondents understood some of the questions. The latter
have been particularly useful in identifying questions requiring amendment and in
suggesting the form the amendments should take.
As a result of the pilot, activities and events such as visiting the library, museum or
gallery, stately home, castle or palace, or well-known gardens have been added to the survey
with the aim of making it more inclusive and less off-putting for respondents. The
questionnaire will also ask respondents whether they have ‘read for pleasure’ in the last 12
months. Those who say they have will then be asked for details of the kind of books they
have read. More generally, the language used on showcards to describe events and activities
has been simplified and the next survey will be accompanied by more detailed interviewer
instructions.
6. References

Arts Council of Great Britain (1991) Extracts from the 1991 RSGB Omnibus Survey.
Available on-line only at www.artscouncil.org.uk/publications/pdfs/rsgb.pdf

Willis, G.B. (1994) Cognitive Interviewing and Questionnaire Design: A Training Manual.

Corbin, T. and Ursachi, K. (2001) Report on the Arts Council Pilot Survey carried out in
November 2000, National Statistics Omnibus Survey.
Notes

1. The Omnibus is a multi-purpose survey developed by ONS which allows customers
to receive results quickly while retaining the hallmarks of high quality - a random
probability sample, high response rates and advice on questionnaire design. It is
now the only Omnibus survey available that uses a stratified random probability
sample. See the website at www.statistics.gov.uk.

2. These are: plays, opera, ballet, contemporary dance, jazz, classical music, art
galleries/exhibitions and ‘any performance in a theatre’, plus film.

3. The Question Testing Unit use a variety of qualitative methods, including cognitive
and in-depth interviews, focus groups, expert reviews and forms design, to improve
survey and questionnaire design and quality.

4. Interviewer questions:
a. “Could you tell me about any problems you had using the showcards, e.g. were some of
them too long?”
b. “Finally, could you comment on whether the respondent found this module interesting.
Were there any parts they seemed to particularly enjoy or not enjoy?”

5. Respondent questions:
a. “Earlier we asked you about what arts and cultural events you had been to. Was there
anything else that you thought of as arts or culture which should have been mentioned but
wasn’t?”
b. “Thinking now about the questions which asked you how often you had been to these
events and how often you participated in arts and cultural activities. Was it easy or difficult to
remember the number of times you had been to or participated in these events?”
c. “Thinking now about the question which asked you about your attitude towards the public
funding of the arts in this country as a whole. Can you tell me what types of arts/cultural
events you were thinking about when you answered this question?”
d. “And may I just check, what is your understanding of public funding? What I mean is,
where do you think the money comes from?”

6. “Thinking now about the questions which asked you how often you had been to
these events and how often you participated in arts and cultural activities. Was it
easy or difficult to remember the number of times you had been to or participated in
these events?”

7. The question about reading referred to in the text was:
“I would like you to look at this card and tell me which, if any, of these things you have done in
the last 12 months (Please include things like community events and festivals but exclude
anything you have done as part of your job, at school or 6th form college).”
a. Buy a novel, other work of fiction, play or book of poetry for yourself
b. Read a novel, other work of fiction, play or book of poetry
c. Write any stories or plays
d. Write any poetry
e. None of these
Developing a weighting and grossing system for the General
Household Survey
Jeremy Barton
1. Introduction
This is a report of methodological work carried out in response to the recommendations of
the General Household Survey (GHS) review conducted in 1997.
The study investigated the effect of incorporating two types of weighting procedure, to
adjust for nonresponse bias and sampling bias, into the General Household Survey. It follows
on from a previous report by Elliot (1999) which investigated the use of a single type of
weighting procedure (calibration weighting). In the analyses presented here, we investigated
pre-weighting the sample using simple nonresponse weights (sample-based weighting) prior
to computing calibration weights (population-based weighting). The nonresponse weight
groups were created by using GHS data matched with (1991) Census records. Two sets of
weights were produced: one based on calibration weighting alone and one produced by
pre-weighting with Census-linked nonresponse weights prior to calibration. This article will
detail the methodology used to form the nonresponse groups and show the probable effect
the sets of weights will have on nonresponse biases.
2. Background
This paper deals with weighting methods, the methodology generally applied to compensate
for missing data due to total nonresponse (all survey information for an individual or
household is completely absent) and not item nonresponse (information for particular survey
questions is missing). The latter tends to be dealt with using imputation methods. Weighting
for nonresponse involves applying a weight to each of the respondents so that they also
represent a certain number of nonrespondents that are similar to them in terms of survey
characteristics. The size of the weights will depend on the level of under-representation of
respondents in predetermined groups known as ‘weighting classes’. The weighting classes
tend to be based on characteristics that are known to be correlated with response.
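
As a minimal illustration of the idea, the weight for each respondent can be taken as the inverse of the response rate in his or her weighting class, so that respondents also stand for the nonrespondents in that class. The classes and figures below are invented.

    # Sample-based nonresponse weighting: within each weighting class,
    # respondents are weighted by the inverse of the class response rate.
    classes = {
        # class label: (responding households, eligible households issued)
        "no car, rented accommodation": (60, 100),   # 60% response
        "car owner, owner-occupier":    (90, 100),   # 90% response
    }

    for label, (responded, issued) in classes.items():
        weight = issued / responded          # inverse of the response rate
        print(label, round(weight, 2))       # 1.67 and 1.11 respectively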
Two forms of weighting procedure will be covered by these analyses. Firstly, sample-based
weighting procedures use information known about the nonrespondents to calculate the
weights. Herein lies the main problem with the method: by definition, nonrespondents will
have little useful information recorded about them in the survey that can be used for this purpose.
Population weighting adjustment, the second methodology covered, uses auxiliary
information to measure the extent to which the distribution of the responding sample over
several variables differs from the distribution of the known population over the same
variables (and time period). This method is sometimes known as post-stratification. The
known population distributions are usually obtained from a reliable source such as the
Census mid-year estimates or, as in these analyses, the population projections (adjusted by
the latest known population estimates) used to gross up the Labour Force Survey (LFS).
Accurate and up-to-date population data tend to be available only for a small number of
variables such as age, sex and region – these are the ones used for these analyses.
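
In its simplest, cell-based form the population weight for each age/sex/region cell is the known population total divided by the achieved sample count in that cell. Calibration generalises this to match several sets of marginal totals simultaneously; the sketch below shows only the simple post-stratification case, with invented figures.

    # Population-based weighting (post-stratification): force the weighted
    # sample to match known population totals within each weighting cell.
    population_totals = {("male", "16-44"): 11_000_000,
                         ("female", "16-44"): 10_800_000}
    sample_counts = {("male", "16-44"): 900,
                     ("female", "16-44"): 1_100}

    for cell, n in sample_counts.items():
        weight = population_totals[cell] / n
        print(cell, round(weight))  # each case represents roughly this many people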
Not all SSD surveys presently weight to compensate for nonresponse, but two that do are
the LFS and the Expenditure and Food Survey (EFS). The LFS uses a population-based
weighting procedure, grossing up to population estimates within local authorities, age and
sex groups, and regions. The LFS can use local authorities in this procedure because it has a
relatively large sample size (approximately 60,000 households per quarter). It uses the
calibration method to calculate the weights. Calibration is explained more fully in section 5.
The EFS uses a sample-based weighting procedure combined with a population-based
procedure. The sample-based weights are calculated using the results of the Census-linked
nonresponse study (Foster, 1994). The population-based weights then adjust the
sample-based weights so that the sample distribution matches the population distribution in
pre-defined weighting classes based on region, age group and sex. The calibration method is
also used to produce these weights. Variations of both the LFS method and the EFS method,
as applied to the GHS, are investigated in this paper.
2.1. Problems with the use of weighted data
The main aim of nonresponse weighting is to reduce nonresponse bias and hence to increase
the value of the survey data. There are some disadvantages however. Some weights can be
time-consuming to calculate and some statistical procedures can be more difficult to apply
and interpret when the data are weighted. Some consideration must also be given to the
presentation and interpretation of time series and trend data. Weighted data cannot be
meaningfully compared to unweighted data from previous years – any change in the
estimates may be solely due to the introduction of weighting. It is not uncommon to present
weighted and unweighted data together in the same report.
3. Data sources
There are two sources of data that were used in this study, one for the sample-based
procedure and one for the population-based procedure.
3.1. Sample-based data sources
We used information from the 1991 Census linked (at address level) to 1991 GHS survey
data. Using Census information restricts us to using survey data from around the time the
Census was carried out, i.e. April 1991, since we cannot assume that household information
correct in 1991 will remain correct in 1998 (or perhaps more importantly that the same
household is still living at the same address). However, we can compute weights based on
1991 data and use these to compensate for nonresponse on surveys in years after this. This
makes the assumption that relative nonresponse in the weighting classes has remained the
same between 1991 and the year the survey was carried out. Using Census data has the
advantages of providing a greater number of variables to create weighting classes, and that
many of these variables will be good predictors of nonresponse.
We also considered using information from the Postcode Address File (PAF) sampling frame,
current to the year of the survey (1998), to form weighting classes, since information for
both respondents and nonrespondents is present. This has the benefit of providing up-to-date
information, but with the drawback that only a small number of variables will be available to
produce weighting classes (region and area type: metropolitan/non-metropolitan and
ACORN1). Another drawback is that the PAF will only contain information at the postcode
unit or postcode sector2 level and not at the sampling unit (i.e. address) level. Our analyses
(not presented in this report) concluded that weights based on this method were no different
to those based solely on calibration methods. However, further investigation of PAF-based
nonresponse weighting should be considered.
3.2. Population-based data
The population-based source of data we used was the LFS control totals taken from current
population projections. These are projections based on mid-year estimates, and have been
adjusted to exclude residents of institutions (from Census estimates), since these people are
outside the scope of the GHS.
4. Deriving weighting groups using Census-linked data
4.1. The Census-linked dataset
A GHS dataset from January to June 1991 was linked to Census information at address level
from the same year. This provided information on the Census characteristics of respondents
and non-respondents to the GHS. Foster (1994) first used this data in a study on Census-linked nonresponse3.
In order to be able to make recommendations for the current GHS we had to make sure that
any variables we used to calculate the nonresponse weights were comparable to variables on
the current (1998) survey. This meant that some categories needed redefining on some
variables, for example, Standard Region had since become Government Office Region
(GOR) with slightly different boundaries.
Variables used in the Census-linked analysis of response were restricted to those which had
GHS equivalents (in 1998) and which showed fairly close agreement with the GHS
equivalent in 1991 (see Foster (1994)). Table 1 shows the variables.
Table 1: Variables used in Census-linked analysis of nonresponse
Variable
Basic Government Office Region (11 categories)
Number of cars
Sex of HOH
Marital status of HOH
Household type 1
Household type 2
Household type 3
Pensioner household
Lone Parent household
Type of building
Tenure
Central Heating
Age of HOH
No. of persons in hh
No. of adults in hh
No. of children in hh
No. of >60’s in hh
No. of dependent children in hh
Age of youngest dependent child
Social class of HOH
Economic status of HOH
SEG of HOH
Table 2: Weighting classes formed in the CHAID analysis

Region (level 1 split): North East; Merseyside; Yorks & Humbs; W Midlands; South East; Scotland
  No. of cars: 0 or 1 (level 2)
    No. of dependent children: 0 or 1 (level 3)
      Household type 1: 1 adult 16-59; youngest child 5-15; 3+ adults, no child (level 4) → weight class 1
      Household type 1: 2 adults 16-59; youngest child 0-4; 2 adults, 1 or 2 aged 60+; 1 adult only, aged 60+ (level 4) → weight class 2
    No. of dependent children: 2 or more (level 3) → weight class 3
  No. of cars: 2 or more (level 2) → weight class 4
Region (level 1 split): North West; E Midlands; Eastern; South West
  Pensioner HH: no pensioner in HH (level 2)
    SEG (grouped): skilled manual; partly-skilled manual; unskilled manual & others; not employed in last 10 years (level 3)
      No. of adults: 1 (level 4) → weight class 5
      No. of adults: 2 or more (level 4) → weight class 6
    SEG (grouped): professional; manager/employer; intermediate/junior (level 3) → weight class 7
  Pensioner HH: pensioner in HH (level 2) → weight class 8
Region (level 1 split): London
  Social Class: I; II; IV; not employed in last 10 years (level 2) → weight class 9
  Social Class: IIInm; IIIm; V; other (level 2)
    Type of building: detached; semi-detached; terraced; converted flat/other (level 3) → weight class 10
    Type of building: purpose built flat (level 3) → weight class 11
Region (level 1 split): Wales → weight class 12
4.2. Identifying the weighting classes
The software package Answer Tree was used to produce groups that differed in terms of
response rates based on their census characteristics. The CHAID4 analysis, which Answer
Tree implements, uses chi-squared statistics to identify optimal splits or groupings of independent
variables in terms of predicting the outcome of a dependent variable, in our case response.
The resulting groups are mutually exclusive and cover all the cases in the sample.
The analysis was set so that the smallest category was at least 100 (2.7% of total sample
size) in order to prevent forming many very small groups. Table 2 shows the exact way the
variables were used to produce the twelve weighting classes. For example, weight class 1
was formed of households in the North East, Merseyside, Yorkshire and Humberside, the
West Midlands, the South East, and Scotland; that had one or no car; one or no dependent
children; and that also contained either 1 adult aged 16-59, or a youngest child aged 5-15, or
3 or more adults but no children.
4.3. Calculating the nonresponse weights
The CHAID nonresponse weight Wi for weighting class i is calculated as:

Wi = (Ri + NRi) / Ri

where Ri is the number of respondents and NRi the number of nonrespondents in weight class i.
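To make the calculation concrete, the short Python sketch below computes Wi for a few weighting classes. The respondent and nonrespondent counts are invented for illustration; they are not figures from this study.

    # Sketch of the CHAID nonresponse weight Wi = (Ri + NRi) / Ri.
    # The counts below are hypothetical, not the figures from the study.
    respondents = {1: 310, 2: 280, 3: 120}    # Ri for weighting class i
    nonrespondents = {1: 40, 2: 70, 3: 45}    # NRi for weighting class i

    weights = {
        i: (respondents[i] + nonrespondents[i]) / respondents[i]
        for i in respondents
    }

    for i, w in sorted(weights.items()):
        print(f"class {i}: weight = {w:.3f}")

Each responding household in class i then carries weight Wi, so the weighted respondents stand in for the whole set sample in that class.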
For the nonresponse weights to be most effective the following should hold true:
• response should vary as much as possible between the weighting classes;
• the survey variables with likely nonresponse bias need to be correlated to the
classes;
• the distributions of the survey variables should differ as little as possible between
respondents and non-respondents within each class.
CHAID ensures that the first rule is true. Clearly the second rule will hold true for those
variables used to produce the weighting classes, and for those obviously related to them. For
those variables not obviously correlated, e.g. tenure or possession of consumer durables, chi-square tests were used to check that there was variation between the weighting classes. All
variables that were analysed appeared to be highly correlated to weighting class. The final
rule is more difficult to assess since we do not have survey information on non-respondents,
but by defining classes using variables known to be correlated to survey variables we can
minimise this variation.
5. Production of calibration weights in CALMAR
Population-based weighting schemes address potential sources of bias additional to those addressed by sample-based schemes (e.g. sample non-coverage), and this is why the two types of
procedures are sometimes used in conjunction. The population-based weights we produced
were computed using a calibration procedure called CALMAR (Sautory, 1993), which is a
SAS-based macro. The procedure, also known as raking ratio or rim weighting, is as
follows. The sample needs to be divided into weighting classes similar to those produced for
nonresponse weighting. The difference will be that the weighting classes will have known
population totals. The population figures used are the LFS control totals used to gross the
LFS. The calibration process produces weights that, when applied, will produce a
19
SMB 49 7/01
Jeremy Barton
Developing a weighting and grossing system for the GHS
distribution in the sample weighting classes that matches the distribution in the equivalent
population weighting classes to within a preset level of precision.
The process is not straightforward because two sets of variables are used: age-groups within
sex; and region. Normally using two variables would produce a 2-dimensional cross-classification table where each cell in the table would be a different combination of age/sex
and region. However, because the GHS sample is relatively small, the likelihood of small or
null cells (cells with no observations) becomes high. Such instances would produce either
very large weights, or in the case of null cells, would not be able to produce any weights.
Calibration methods avoid this by adjusting the sample distribution to the population
distribution for each variable in turn, repeating the process until the marginal
distributions of the sample (the distributions for each variable) match those of the
population distributions.
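The iterative adjustment can be illustrated with generic raking (iterative proportional fitting). The Python sketch below is not the CALMAR macro itself; the cell counts and marginal totals are invented for illustration, and it simply shows how alternately matching each margin converges.

    import numpy as np

    # Generic raking (iterative proportional fitting) sketch; not CALMAR itself.
    # Cell counts and marginal totals are invented for illustration.
    sample = np.array([[52.0, 48.0],        # rows: two age/sex groups
                       [60.0, 40.0]])       # columns: two regions
    row_targets = np.array([120.0, 80.0])   # population totals for age/sex groups
    col_targets = np.array([110.0, 90.0])   # population totals for regions

    cells = sample.copy()
    for _ in range(100):
        cells *= (row_targets / cells.sum(axis=1))[:, None]   # match row margins
        cells *= (col_targets / cells.sum(axis=0))[None, :]   # match column margins
        if (abs(cells.sum(axis=1) - row_targets).max() < 1e-8 and
                abs(cells.sum(axis=0) - col_targets).max() < 1e-8):
            break

    print(cells / sample)   # per-cell calibration weights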
The weighting classes used were those that were recommended for the GHS by Elliot
(1999)5 and comprised 12 age/sex categories and 5 regions as follows (Table 3):

Table 3: Weighting classes used for CALMAR analysis

Age/sex: 0-4; 5-14; 15-24 Male; 15-24 Female; 25-44 Male; 25-44 Female; 45-64 Male; 45-64 Female; 65-74 Male; 65-74 Female; 75+ Male; 75+ Female
Region: London; Scotland; Wales; Other metropolitan; Other non-metropolitan
Since we are using calibration weighting in conjunction with nonresponse weights, the
sample (the 1998 responding GHS dataset) first needs to be weighted by the nonresponse
weights calculated above. These weights are then scaled by n/N, where n is the responding
sample size (persons) and N is the total population size we are grossing to. The calibration
procedure effectively re-adjusts these weights. This means that we do not need to combine
the nonresponse weights with the calibration weights by multiplication – the weight
produced by CALMAR is the final weight to apply to the GHS dataset. The CALMAR
weight is often known as a grossing weight because it can weight the sample up to the
population.
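As an illustration of this step, the short Python sketch below scales a set of nonresponse weights by n/N before they are passed to the calibration; all figures are hypothetical.

    # Pre-weighting step: scale the nonresponse weights by n/N before calibration.
    # The weights and the sizes n and N are hypothetical.
    nr_weights = [1.13, 1.25, 1.38]   # nonresponse weights for three sample persons
    n = 16_000                        # responding sample size (persons)
    N = 56_000_000                    # population total being grossed to

    start_weights = [w * n / N for w in nr_weights]   # input weights for CALMAR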
Two sets of weights were produced by CALMAR: weight a was produced without pre-weighting by the nonresponse weight; and weight b was produced by first pre-weighting
by the Census-linked nonresponse weight. Both weights were approximately normally
distributed as described in Table 4.
Table 4: Distribution descriptives of weights a and b

Weight   Mean   Standard deviation   Range         No. of grossed-up households
a        2.79   0.50                 0.91 – 7.49   24,110,000
b        2.83   0.60                 0.36 – 6.92   24,440,000
Note that the means of the weights are different. This is because we are calibrating
households to individual population estimates. Note also that for this analysis the population
totals used in the calibration were scaled down by a factor of 1000. The figures given for
‘No. of grossed-up households’ are those which would have arisen if the weights had not
been scaled.
6. Results
The 1998 GHS responding sample was weighted by the two weights (a & b) in turn and the
weighted distributions of key variables were compared with each other and to the
unweighted distributions.
Absolute percentage point differences between the weighted distributions and the
unweighted distributions were computed and are shown in tables A1 to A3 in Appendix A.
The distributions of variables weighted by weights a and b differ noticeably from the
unweighted distributions, so clearly both the nonresponse-only weighting and the
nonresponse + CALMAR weighting are having an effect. Additionally, it appears that in
most instances the weighting is moving the estimates in the same direction for both sets of
weights. Exceptions to this are:
• percentage of one-adult-only households;
• rents from local authority;
• rents from housing association;
• car ownership (any level);
• satellite TV ownership;
• percentage of HoHs in the Professional SEG group;
• percentage of HoHs in the employer and manager SEG group;
• percentage of HoHs in the partly/unskilled manual SEG group;
• percentage of adults who were divorced.
All the other estimates move in the same direction (or are unchanged) when weighted
by either weight a or b, although the relative size of the change differs between the two. The
absolute difference of the weighted to unweighted estimate is, in most cases, small for both
weights. However for some estimates the absolute differences between weighted and
unweighted estimates are quite marked. Table 5 lists estimates where the absolute difference
was greater than 1%.
The differences between estimates weighted by weight b and the unweighted estimates
tended to be higher than those for estimates weighted by weight a. This is what we would
want if we believed that the differences represent real biases inherent in the unweighted data.
However we cannot be absolutely sure that either weight is moving the estimate in the
direction required or is of the correct magnitude to remove bias, and therefore cannot be sure
that the weighting is reducing/eliminating the bias or increasing it.
We do have some indication for some of the variables of the likely direction of the bias,
assuming that nonresponse biases on the GHS had remained fairly consistent over time. This
indicator is drawn from the original Census-linked nonresponse work carried out by Foster
(1994) and also reproduced in Elliot's (1999) report on calibration methods. The Census
correction factor is the ratio of the set sample estimates to the responding sample estimates.
This ratio is also used to compute the Census-linked CHAID weights within each weighting
class. We would therefore expect the ratio of the estimates weighted by the Census-linked
CHAID weight b to be similar to the Census correction factor, especially for those variables
that were used to form the weighting classes. This seems to be the case for most of the
variables where there was a Census correction factor from the Foster report. This doesn’t
prove that weight b (nonresponse + CALMAR) is the correct weight to use since this
assumes that the relative biases have remained constant since 1991. However, if we felt that
the biases that were found in the 1991 dataset still hold true, then we would expect that
weighting by b would reduce the biases for these variables.
Table 5: Estimates with absolute differences between weighted and unweighted
of more than 1%

Weight a – unweighted:
• 2 adults in hh
• No children in hh
• Youngest child aged 0-4
• 3 or more adults, no children in hh
• Owns property outright
• Marital status – single
• Marital status – married & living with spouse
• % men aged 18-24 economically active
• % women aged 18-24 economically active

Weight b – unweighted:
• 1 person in hh
• 2 people in hh
• 1 adult in hh
• 2 adults in hh
• No children in hh
• Youngest child aged 0-4
• 1 adult & no children in hh
• 2 adults & no children in hh
• 2 cars in hh
• Owns property outright
• Owns drier
• Marital status – single
• Marital status – married & living with spouse
• % men aged 18-24 economically active
• % men current cigarette smokers
An effect of sample-based weights is that, unlike population-based weights, they increase
the variances of the survey estimates. The size of the increase is proportional to the variance
of the weights themselves. The variance in weights can be exacerbated by having many
weighting classes with small numbers of observations within them. This has not been the
case in this study and we would therefore expect the variance increase to be small. Simple
estimates6 put the increase in variance in the region of 2%. This should be offset
by the decrease in variance due to population weighting, as reported by Elliot (1999), for the
majority of variables.
7. Recommendations
We have recommended the use of a dual weighting scheme. The benefits of population
(calibration) weighting schemes have been reported in detail by Elliot (1999). Estimates
produced by a dual scheme and by population weights only are similar for most GHS
variables. There was some concern that the relative response within the nonresponse
weighting classes may have changed since 1991, and that this may lead to application of
incorrect weights. Indeed, even from survey to survey the relative response within certain groups
of the population will vary, so the use of weights not based on the actual sample to
which they are applied is always likely to lead to a certain amount of imprecision.
However, we take the view that the weights are likely to be reducing bias, at least to some
degree, rather than increasing it, and that this can only be beneficial.
References
Elliot, D (1999) Report of the Task Force on Weighting and Estimation. GSS Methodology Series.
Elliot, D (1991) Weighting for non-response: a survey researcher's guide. New Methodology Series 17. OPCS.
Foster, K (1996) A comparison of census characteristics of respondents and non-respondents to the 1991 FES. Survey Methodology Bulletin, 38, 9-17.
Foster, K (1994) The General Household Survey report of the 1991 census-linked study of survey non-respondents. OPCS (unpublished).
Foster, K (1994) Weighting the FES to compensate for non-response, Part 1: an investigation into census-based weighting schemes. OPCS (unpublished).
Kish, L (1965) Survey Sampling. Wiley.
Sautory, O (1993) La macro CALMAR – redressement d'un échantillon par calage sur marges. Série des documents de travail de la Direction des Statistiques Démographiques et Sociales, No. F9310. INSEE, Paris.
Notes
1. ACORN is a geodemographic classification of postcode areas used extensively in market and social research, produced from (1991) Census information and updated annually. ACORN represents a 'half-way house' between area-level classifications such as region and area type, and household-level classifications such as building type and SEG of HoH.
2. A postcode unit is the area specified by the 7 character postcode, e.g. "SW1V 2QQ". A postcode sector is the area specified by the first 5 characters of the postcode unit, e.g. "SW1V 2". Therefore sectors cover a much larger area than units. For sampling, small sectors are usually grouped with larger sectors to provide a minimum number of addresses within them.
3. It should be noted that approximately 4% of the households were unmatched. The majority of households were matched by address information. However it is likely that in a further small proportion of these, the households at the matched address were not the same in the Census as in the GHS. For example, of those households that were matched and were responding (on the GHS), 10% contained a different number of persons in the household on the GHS than on the Census form. See Foster (1994) for more information.
4. CHAID is an acronym that stands for Chi-squared Automatic Interaction Detection.
5. The LFS uses a slightly differently defined region variable. Also, the LFS and FES use age-groups 5-15, 16-24 and so on.
6. Estimates of the increase in variance (L) are based on the following formula from Kish (1965) for the ratio of the variance of a mean in a randomly weighted simple random sample (SRS) to that in an unweighted SRS:

L = (Σh nh)(Σh nh kh²) / (Σh nh kh)²

where nh is the sample size in weighting class h and kh is the weight for class h.
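As an illustration, the short Python sketch below evaluates this ratio for hypothetical class sizes and weights.

    # Kish's ratio L = (sum_h nh)(sum_h nh*kh^2) / (sum_h nh*kh)^2,
    # evaluated for hypothetical class sizes nh and weights kh.
    n_h = [310, 280, 120, 95]          # sample size in each weighting class
    k_h = [1.13, 1.25, 1.38, 1.52]     # weight for each class

    num = sum(n_h) * sum(n * k * k for n, k in zip(n_h, k_h))
    den = sum(n * k for n, k in zip(n_h, k_h)) ** 2
    print(f"L = {num / den:.4f}")      # about 1.01, i.e. roughly a 1% increase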
Appendix A
Table A1: Household level variable distributions

% of all households. Weight a = CALMAR only; weight b = nonresponse + CALMAR. 'Diff a/b' = absolute difference (percentage points), weighted – unweighted; 'Ratio a/b' = ratio of weighted to unweighted; final column = Census correction factor.

                                   Unwtd   Wt a   Wt b   Diff a  Diff b  Ratio a  Ratio b  Census corr.
Household size
  1 person                          28.6   29.0   30.5    0.4     1.9     1.01     1.07     1.03
  2 persons                         35.8   35.2   34.5   -0.6    -1.3     0.98     0.96     1.00
  3 persons                         15.3   15.6   15.6    0.3     0.2     1.02     1.01     1.01
  4 persons                         14.0   14.0   13.6    0.0    -0.4     1.00     0.97     0.97
  5 or more persons                  6.3    6.1    5.9   -0.2    -0.4     0.97     0.94     0.98
Number of adults *
  1 adult                           34.3   33.8   35.7   -0.5     1.4     0.99     1.04     1.02
  2 adults                          51.9   50.8   49.2   -1.1    -2.7     0.98     0.95     0.98
  3 adults                          10.2   10.9   10.7    0.7     0.5     1.07     1.05     1.04
  4 or more adults                   3.7    4.5    4.4    0.8     0.7     1.22     1.19     1.01
Number of children *
  No children                       71.1   72.4   72.6    1.3     1.5     1.02     1.02     1.01
  1 child                           12.3   12.4   12.6    0.1     0.3     1.01     1.02     0.99
  2 children                        11.5   10.8   10.5   -0.7    -1.0     0.94     0.91     0.95
  3 or more children                 5.1    4.4    4.3   -0.7    -0.8     0.86     0.84     0.92
Age of youngest child *
  No children                       71.1   72.4   72.6    1.3     1.5     1.02     1.02     1.01
  Youngest aged 0-4                 13.0   11.7   11.5   -1.3    -1.5     0.90     0.88     0.95
  Youngest aged 5-9                  8.2    8.0    8.0   -0.2    -0.2     0.98     0.98     0.98
  Youngest aged 10-15                7.7    7.9    7.8    0.2     0.1     1.03     1.01     0.98
Household type (Census) *
  1 adult only                      28.6   29.0   30.5    0.4     1.9     1.01     1.07     1.03
  1 adult with child(ren)            5.7    4.8    5.2   -0.9    -0.5     0.84     0.91     1.00
  2 adults only                     33.1   32.8   31.7   -0.3    -1.4     0.99     0.96     0.99
  2 adults with 1,2 children        15.1   14.8   14.5   -0.3    -0.6     0.98     0.96     0.96
  2 adults with 3+ children          3.7    3.2    3.1   -0.5    -0.6     0.86     0.84     0.92
  3 or more adults only              9.4   10.6   10.4    1.2     1.0     1.13     1.11     1.04
  3 or more adults with child(ren)   4.4    4.8    4.6    0.4     0.2     1.09     1.05     1.02
Car ownership *
  No car                            27.6   27.2   28.2   -0.4     0.6     0.99     1.02     1.03
  1 car                             44.0   43.7   44.6   -0.3     0.6     0.99     1.01     1.00
  2 cars                            22.8   23.1   21.7    0.3    -1.1     1.01     0.95     0.95
  3 or more cars                     5.6    6.0    5.5    0.4    -0.1     1.07     0.98     0.98

* variable used to form weighting classes
Table A1 (continued): Household level variable distributions

% of all households. Columns as in Table A1 (.. = no Census correction factor available).

                                   Unwtd   Wt a   Wt b   Diff a  Diff b  Ratio a  Ratio b  Census corr.
Tenure (harmonised)
  Owns outright                     28.0   26.9   26.3   -1.1    -1.7     0.96     0.94     0.99
  Buying on mortgage                41.2   42.1   41.5    0.9     0.3     1.02     1.01     0.99
  Rents from LA                     16.4   15.9   16.5   -0.5     0.1     0.97     1.01     1.02
  Rents from HA                      5.2    5.1    5.3   -0.1     0.1     0.98     1.02     1.00
  Rents privately                    9.3   10.0   10.3    0.7     1.0     1.08     1.11     1.05
Ownership of consumer durables
  Satellite TV                      29.2   29.3   29.0    0.1    -0.2     1.00     0.99      ..
  VCR                               85.0   84.9   84.6   -0.1    -0.4     1.00     1.00      ..
  Washing machine                   91.9   91.5   91.1   -0.4    -0.8     1.00     0.99      ..
  Drier                             52.5   52.0   51.4   -0.5    -1.1     0.99     0.98      ..
  Dishwasher                        23.8   23.8   22.8    0.0    -1.0     1.00     0.96      ..
  Microwave oven                    78.5   78.2   77.9   -0.3    -0.6     1.00     0.99      ..
  CD player                         68.4   69.4   69.2    1.0     0.8     1.01     1.01      ..
  Home computer                     33.6   34.6   34.0    1.0     0.4     1.03     1.01      ..
Central heating                     90.3   90.3   90.0    0.0    -0.3     1.00     1.00     0.99
Table A2: Head of household level variable distributions

% of all heads of household. Columns as in Table A1.

                                   Unwtd   Wt a   Wt b   Diff a  Diff b  Ratio a  Ratio b  Census corr.
SEG of Head of Household *
  Professional                       7.4    7.6    7.3    0.2    -0.1     1.03     0.99     0.96
  Employer or manager               19.9   20.0   19.4    0.1    -0.5     1.01     0.97     0.99
  Intermediate non-manual           12.4   12.4   12.4    0.0     0.0     1.00     1.00     0.98
  Junior non-manual                 12.4   12.2   12.3   -0.2    -0.1     0.98     0.99     1.00
  Skilled manual                    26.4   26.7   27.0    0.3     0.6     1.01     1.02     1.01
  Partly/unskilled manual           21.4   21.1   21.6   -0.3     0.2     0.99     1.01     1.05
Address one year ago
  Lived elsewhere                   10.5   11.2   11.5    0.7     1.0     1.07     1.10     1.03

* variable used to form weighting classes
Table A3: Individual level variable distributions

% of all individuals (unless stated). Weight a = CALMAR only; weight b = nonresponse + CALMAR; other columns as in Table A1.

                                   Unwtd   Wt a   Wt b   Diff a  Diff b  Ratio a  Ratio b
Marital status (% adults)
  Single                            24.7   27.2   27.7    2.5     3.0     1.10     1.12
  Married & living with spouse      56.4   54.6   53.5   -1.8    -2.9     0.97     0.95
  Married and separated              2.7    2.5    2.7   -0.2     0.0     0.93     1.00
  Divorced                           7.6    7.4    7.7   -0.2     0.1     0.97     1.01
  Widowed                            8.6    8.3    8.3   -0.3    -0.3     0.97     0.97
Ethnic group
  White                             93.1   93.1   93.1    0.0     0.0     1.00     1.00
  Black                              2.0    2.0    2.1    0.0     0.1     1.00     1.05
  Indian, Pakistani, Bangladeshi     3.3    3.1    3.0   -0.2    -0.3     0.94     0.91
Education level (% 16-69 not in full-time education)
  Degree or equivalent              13.2   13.6   13.4    0.4     0.2     1.03     1.02
  Higher qualification              11.0   10.8   10.7   -0.2    -0.3     0.98     0.97
  GCE or below                      75.8   75.5   75.9   -0.3     0.1     1.00     1.00
Economic activity – men
  16-17                             51.0   50.9   50.9   -0.1    -0.1     1.00     1.00
  18-24                             70.8   69.0   68.1   -1.8    -2.7     0.97     0.96
  65 and over                        7.9    7.9    7.8    0.0    -0.1     1.00     0.99
Economic activity – women
  16-17                             50.7   51.5   51.7    0.8     1.0     1.02     1.02
  18-24                             61.8   63.4   62.7    1.6     0.9     1.03     1.01
  65 and over                        4.0    3.8    3.8   -0.2    -0.2     0.95     0.95
Limiting longstanding illness
  Male                              18.7   18.2   18.5   -0.5    -0.2     0.97     0.99
  Female                            21.1   20.8   21.0   -0.3    -0.1     0.99     1.00
Consulted NHS GP in 14 days before interview
  Male                              11.8   11.5   11.5   -0.3    -0.3     0.97     0.97
  Female                            16.6   16.7   16.7    0.1     0.1     1.01     1.01
Current cigarette smokers (% adults)
  Men                               28.2   29.1   29.6    0.9     1.4     1.03     1.05
  Women                             26.1   26.0   26.5   -0.1     0.4     1.00     1.02
Weekly alcohol consumption (% aged 18 or over)
  Men > 21 units                    26.8   27.5   27.7    0.7     0.9     1.03     1.03
  Women > 14 units                  14.6   14.8   14.7    0.2     0.1     1.01     1.01
The General Household Survey as a sampling frame: the
example of the Survey of Poverty and Social Exclusion1
Joanne Maher
1. Introduction
The practice of using an existing survey to identify a sample of people or households with
certain characteristics is commonly used as a relatively inexpensive method of carrying out
special population surveys when an alternative sampling frame is not readily available. The
use of these follow-up surveys raises some interesting methodological issues which are
explored in this article in connection with a survey of Poverty and Social Exclusion (PSE)
which was carried out as a follow-up to the 1998/99 General Household Survey2 (GHS).
This article will describe the background to the PSE, the methodological issues raised and
considerations made by conducting the survey as a follow-up to the GHS, principles which
can also apply to other follow-up surveys. Because aspects of this work have not been
published elsewhere, this article gives a fuller report of the project for researchers who are
interested in the PSE3 itself as well as in the general methodological issues raised.
A follow-up survey is designed to re-interview former survey respondents who are chosen
on characteristics identified from responses given at an earlier interview. As such it enjoys
the benefits (and inherits the difficulties) of the ‘parent’ survey as well as presenting a
number of methodological issues peculiar to this type of survey. The issues that need to be
addressed include sample design, questionnaire design, respondent burden and non-response
bias (including opportunities for weighting). These issues are foremost in many survey
designs but take on another dimension in the context of the follow-up survey. The aims of
the follow-up survey are constrained by the parent survey’s design and content but are also
enhanced by the rich source of information on which the follow-up survey can be based.
The follow-up survey for the PSE was drawn from respondents to the 1998/9 GHS, who
were interviewed in detail about their circumstances and views on a range of issues
associated with poverty and social exclusion. By using the GHS it was possible to:
• Draw a sample with known characteristics;
• Use GHS data for cross-topic analysis in conjunction with PSE data;
• Reduce the length of the PSE questionnaire by not repeating questions previously asked in the GHS interview;
• Identify characteristics of PSE non-responders in the GHS dataset.
2. The Sample
Findings from the PSE preparatory research called for a reliable source of income data on
which to base an income measure of poverty. The PSE needed to interview enough
people in low income households to allow analysis of this group. If a readily available and
useable list of such households had been available, then this could have been used as a
sampling frame or a large scale sift could have been carried out to identify such households.
However, no such reliable frame exists and a sift would have been expensive, particularly as
income data is difficult to collect reliably in this way. A follow-up survey was, therefore,
deemed the most efficient way of identifying the sample and the GHS was the survey that
collected all the information necessary for a follow-up. As the authors of Poverty and Social
Exclusion in Britain 4 note, “the PSE survey makes major use of income data from the GHS
but measures poverty in terms of both deprivation and income level: whether people lack
items that the majority of the population perceive to be necessities, and whether they have
incomes too low to afford them.”
The PSE benefited from using the GHS as a sampling-frame by being able to select a sample
with known characteristics. The sample design had to meet three main objectives:
1. Sufficient cases were required for the analysis of key variables by sub-groups.
2. Sufficient cases were required for separate analysis of households and individuals in
Scotland.
3. Sufficient cases of low income households and respondents were required to examine
their characteristics.
The fact that the GHS dataset contains income details, household composition and regional
information allowed the sample to be selected on these key variables.
Identifying individuals involved a three-stage process5:
1. a number of areas were selected from all those used for the 1998/9 GHS;
2. a number of households were selected from each of the areas;
3. one individual was chosen from each sampled household.
3. Sampling from edited data
3.1. Calculating equivalised income
The sample gave a greater probability of selection to people in lower income groups. An
equivalised income measure was developed by Jonathan Bradshaw of York University, in
conjunction with ONS. The McClements equivalence scale, which is used as the standard
by ONS, was felt not to be appropriate for the PSE, as it does not assign sufficient weight to
children, particularly young children. The scale used for the PSE was designed to take
account of this. Each member of the household was assigned a value, shown in Table 1:
Table 1 Equivalised income scale

Type of household member                       Equivalence value
Head of household                              0.70
Partner                                        0.30
Each additional adult (anyone over 16)         0.45
Add for first child                            0.35
Add for each additional child                  0.30
If head of household is a lone parent, add     0.10
The values for each household member were added together to give the total equivalence
value for that household. This number was then divided into the gross income for that
household. For example, the equivalence value for a lone-parent household with two
children is 0.7 + 0.35 + 0.3 + 0.1 = 1.45. If the household’s gross income is £10,000, its
equivalised income is £6,897 (=£10,000/1.45).
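The calculation is easily scripted. The Python function below applies the scale in Table 1 to a household described by simple counts; the function and its arguments are illustrative rather than the actual PSE processing code.

    # Sketch of the PSE equivalised income calculation using the Table 1 scale.
    # The function signature is hypothetical.
    def equivalence_value(n_adults, n_children, couple, lone_parent):
        value = 0.70                              # head of household
        extra_adults = n_adults - 1
        if couple:
            value += 0.30                         # partner
            extra_adults -= 1
        value += 0.45 * extra_adults              # each additional adult
        if n_children >= 1:
            value += 0.35                         # first child
            value += 0.30 * (n_children - 1)      # each additional child
        if lone_parent:
            value += 0.10                         # lone-parent addition
        return value

    # Worked example from the text: a lone parent with two children.
    ev = equivalence_value(n_adults=1, n_children=2, couple=False, lone_parent=True)
    print(round(ev, 2), round(10_000 / ev))       # 1.45 and 6897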
The income, household composition and relationship information collected on the GHS
allowed the equivalised income to be created for the purpose of drawing the sample.
However, it was not only in the sample design that information collected during the GHS
interview was put to use in the design of the PSE. GHS data was also used to enhance PSE
fieldwork procedures, questionnaire design and weighting for non-response.
3.2. Selecting areas
Because the GHS is a nationally representative sample of private households in Britain, any
probability sub-sample from the GHS will also be nationally representative. For the smaller
sample size required for this survey, it would have been most economical to sub-sample the
GHS areas and select all GHS respondents in those areas. However because of the
requirement to oversample lower income households, a larger sample of areas was required.
In fact, 70% of GHS areas in England and Wales were selected (360 areas from a total of
518). All of the 54 Scottish areas were sampled to provide sufficient cases for separate
analysis of the Scottish data.
To allow for variation in income within the areas the list of PSUs was sorted on area and
equivalised income quintile groups before any selections were made.
3.3. Selecting Households
A sample of 2846 households was taken from the selected areas. PSE required sufficient
cases in lower income households to allow analysis of those found to be in poverty by an
income based measure. However, for the purposes of testing theories of poverty and social
exclusion, a sample of all income groups was required. For this reason households were
selected in different proportions according to household income. The equivalised income of
the household was grouped into quintiles which were then sampled in the following
proportions:
Table 2 Proportions sampled in each quintile group

Quintile group                      Proportion sampled
Bottom quintile (lowest income)     40%
Fourth quintile                     30%
Third quintile                      10%
Second quintile                     10%
Top quintile (highest income)       10%
3.4. Selecting individuals
One adult aged 16 or over was selected at random from each sampled household using a
Kish grid. This was done in preference to interviewing all eligible adults to reduce interview
length and costs. It was also believed that when asking opinion questions (on which the PSE
was heavily based), household members may influence each other's answers; it was hoped
that interviewing only one person per household would reduce this risk.
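For illustration only, the random selection that a Kish grid operationalises can be sketched as below; a real Kish grid is a pre-assigned look-up table carried on the interviewer's schedule, and the households shown are invented.

    import random

    # Selecting one adult aged 16 or over per sampled household. A Kish grid
    # achieves the same equal-probability selection with a pre-printed table.
    random.seed(1)
    households = [["Ann", "Bob"], ["Cara"], ["Dev", "Eve", "Fay"]]

    selected = [random.choice(adults) for adults in households]
    print(selected)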
3.5. Partial and proxy interviews
Only those who had given a full interview in the 1998/9 GHS were eligible for selection. As
the intention was to analyse the PSE in combination with GHS data, proxy and partial
interviews were excluded from the sample because of their limited usefulness for analysis.
Removing proxies and partials from the sample also reduced item non-response in the GHS
data for PSE cases when analysing combined PSE and GHS data.
4. Contacting the respondent
An advantage of using a survey such as the GHS as a means of drawing a follow-up sample
is that a more personalised approach could be adopted when re-contacting the GHS
respondent than could normally be used when contacting an unknown household at a
sampled address for the first time. Advance letters were addressed to the selected
respondent, except when a name was not given during the GHS interview, when the letter
was addressed to ‘the resident’.
Where contact telephone numbers were available, interviewers made initial contact with the
respondent by telephone. Once an appointment was made with the respondent the
interviews were conducted face-to-face. In the event of a broken appointment, interviewers
were instructed to make a maximum of two visits to an address before recording a non-contact, unless they were already in the area and could make an extra call without driving
out of their way.
5. Questionnaire Design
Choosing a survey design based on a follow-up of the GHS meant that detailed information
was already available on those topics covered by the GHS. Questions asked during the GHS
interview need not be repeated again in the follow-up survey. The 1998/9 GHS
questionnaire held 68 questions on income which, if included again in the PSE would have
taken some time from the PSE interview that could be used for other questions. By
following up the GHS respondents the PSE was able to concentrate on operationalising
concepts of poverty and social exclusion within the survey instrument and was able to
reduce the length of the PSE questionnaire.
As the follow-up interviews6 took place between 6 and 18 months after the original GHS
interview, a small number of follow-up questions were included in the PSE questionnaire to
record changes to the household composition, tenure, employment and income. Where
appropriate this was carried out by the technique of dependent interviewing. “Dependent
interviewing can be used to remind the respondent of previously reported information or to
probe for inconsistent responses between data provided in the current interview and data
previously provided” (Mathiowetz & McGonagle, 2000).
Questions were included in the PSE questionnaire which asked whether there had been any
changes to the respondent’s individual and household income since the GHS interview. The
amount by which the income had changed and reasons for the change in income were also
recorded. It was possible to feed the date of the GHS interview into the question to aid the
respondent’s recall.
The PSE questionnaire included questions relating to housing conditions. The GHS housing
tenure question was fed-forward to the PSE questionnaire to check for changes in tenure
over time. Questions were repeated in the PSE for the purposes of verification, but did not
have to be asked again unless the situation had changed7.
The PSE dependent interviewing was designed to remind the respondent of the answer they
gave during the GHS interview. This approach was adopted to aid the respondent’s recall
and reduce the length of the interview. Such proactive interviewing is adopted to reduce
respondent burden. However, it is not always considered suitable for follow-up interviewing
because it allows the respondent to agree too readily with his/her previous response; it was
believed that the tenure question would not be disadvantaged by this approach.
Little research into the effects of dependent interviewing and what constitutes best practice
exists (Mathiowetz and McGonagle, 2000). The effects of rewording questions were
therefore carefully considered and care was taken to change the question wording as little as
possible. This approach seemed to have worked well and interviewers did not report any
difficulties with feed-forward questions.
Dependent interviewing is not always an appropriate approach. It is not appropriate where a
respondent is asked an opinion-based question and they may change their response over
time. For example, the GHS health questions were repeated in the PSE questionnaire as
experience from other surveys, such as the Health Education Monitoring Survey (HEMS),
showed that respondents gave different answers on their health status over time. Of
particular interest in HEMS were those who changed their reports of having a limiting longstanding illness. Work by ONS’s Question Testing Unit has shown that the question asking
whether the respondent has a longstanding illness that limits their activities presents a
number of difficulties to the respondent, which may explain why there is variation in the
answers given over time. These health questions were repeated in the PSE interview as
future analysis of health and social exclusion was anticipated.
It would have been possible to route respondents to the PSE questions on health and well-being and use of health and social services by their answers to the GHS health questions.
However, it was feared that routing in this way would increase the risk of asking
respondents inappropriate questions or under-reporting health problems in the PSE
questionnaire, where the respondent’s health might have changed or their opinion on their
health changed since the GHS interview. Comparison of the GHS and PSE responses to the
limiting and long-standing health questions showed that some respondents’ opinion about
their health had changed between the two surveys. The responses can be seen in Table 4
below.
Although the distribution of longstanding illness for PSE and GHS respondents were
broadly similar (see Table 3 below) there was some variation in the answers given by
individuals between surveys (see Table 4). For example, of those who reported a limiting
long-standing illness in the GHS 14% reported that they did not have a longstanding illness
in the PSE (see Table 4 below). These changes could reflect real changes in health but could
also be attributed to context effects or to the fact that answers to questions of this nature are
often affected by how a person feels on the day they are interviewed and this illustrates a
common finding, that such questions should not be treated as factual.
Table 3 The distribution of longstanding illnesses for PSE and GHS respondents

                                      GHS (%)   PSE (%)
Limiting longstanding illness          30.2      32.0
Non-limiting longstanding illness      15.7      14.0
No longstanding illness                54.0      54.0
Total                                 100.0     100.0
Base (numbers)                         1534      1534
Table 4 GHS responses to limiting longstanding illness by PSE responses

                                          PSE response (%)
GHS response                   Limiting   Non-limiting   No long-    Don't   Total   Base
                               LI         LI             standing    know    (%)
                                                         illness
Limiting longstanding illness    22.9         3.1           4.2       0       30.2    464
Non-limiting longstanding
illness                           4.0         6.5           5.3       0       15.7    241
No longstanding illness           5.1         4.5          44.3       0.1     54.0    829
Total %                          32.0        14.1          53.8       0.1     100    1534
6. Interview length and Respondent Burden
Respondent burden was an issue for the content of the PSE. In particular, interview length
was of greatest concern, both for respondent burden and for its ultimate effect on the
response rate. "Long interviews, or unpredictable lengths, can interact with interviewer
behaviour, and can limit the number of interviews which can be completed in any one day.
Where fieldwork periods are restricted, or days in an area limited to control costs, response
may be affected." (Martin and Matheson, 1999). A great deal of work was done to keep the
questions on the PSE to a minimum to reduce respondent burden.
The average length of interview was 60 minutes. With older respondents or those who had
literacy problems, it took about 90 minutes. Questions requiring a lot of thought or those
involving difficult concepts, such as assessments of absolute and overall poverty, added to
the length of interview for some respondents. Three types of data collection were adopted in
PSE: face-to-face interviews, a self-completion module and a card-sorting exercise. The
mixed mode interview was adopted to allow a change of pace of the interview in an attempt
to keep the respondent’s interest, but for some this may have become burdensome.
7. Response rate
The length of the questionnaire affected the response rate. ONS interviewers are required to
give an assessment of how long the interview is likely to take when making an appointment,
to ensure that respondents set aside sufficient time. Some sampled individuals refused to
take part on hearing that the interview was likely to last for an hour. The relatively short
field period (a month), meant interviewers also did not have sufficient time to call back on
many households to attempt refusal conversion.
Of the 2,431 eligible individuals, 1,534 (63%) were interviewed, the vast majority
completing a full interview. One of the drawbacks of using follow-up surveys as a means to
identify special populations is that it suffers from two waves of non-response; that of the
original survey and that of the follow-up. As such this response rate was disappointing, and
may reflect some of the factors outlined above. However, the availability of information
about non-responders means that it is possible to compensate for non-response by weighting.
8. Weighting the data
As noted earlier, the PSE interviewed one person per household, over-sampled households
in Scotland, and over-sampled households in the lowest quintile groups of equivalised
income. Several weights were calculated to allow for the probability of selection and also to
compensate for non-response. The final weight is made up of four elements: a weight for
country, a weight for income quintiles, a weight for the probability of selection of
individuals from households, and a weight for non-response. Details of the weighting for
probability of selection are presented in Appendix 1.
A benefit of using the GHS as a parent survey for PSE was that it was possible to
compensate for non-response by weighting. To weight for non-response at this stage,
response rates were analysed to see how they varied according to categories defined by 1998
GHS data. Variables that were compatible with the PSE were selected from the GHS data to
investigate the effect of non-response in PSE.
The variables examined were as follows:
• Sex
• Age
• Marital status
• Number of adults in the household
• Government Office Region
• Tenure
• Type of accommodation
• Number of vehicles
• Household type
• Limiting longstanding illness
• Equivalised income (quintiles)
• Economic status (ILO definition)
• Age of head of household
• SEG of adults
• SEG of head of household
• Highest educational qualification
• Consumer durables (TV, satellite/cable receiver, video recorder, freezer, washing machine, tumble drier, dishwasher, microwave, telephone, CD player, computer).
CHAID (Chi-squared Automatic Interaction Detector) Answer Tree was used to partition the
sample into the groups which best explain the variation in (weighted) response. It
is the compatibility of the subjects covered in the GHS with those in the PSE that allowed a
suitable set of variables to be selected. Within these groups, weights are formed as the
reciprocals of the (weighted) response rates and multiplied by the weight for the probability
of selection of individuals from households to create the final weight. The variables which
Answer Tree selected (from the list above) as having an effect on the response rate were:
• Age of head of household
• Socio-economic group of head of household
• Government Office Region
• Equivalised income
• Consumer durables (satellite/cable receiver, CD player, microwave, washing machine, tumble drier)
• Limiting longstanding illness
• Household type
Appendix 2 shows the eighteen weighting classes that were produced from the output, with
details of the variables selected by Answer Tree in column 2. The response rates, weights
for non-response and weighted bases are also shown.
9. Conclusion
The use of the GHS as a sampling frame for the PSE highlights some of the common issues
inherent in any follow-up survey. The GHS provided a high quality data source for the PSE
in a variety of ways. The GHS provided a sampling frame which could represent particular
groups in the proportions required for analysis where no other sampling frame was readily or
economically available. The challenge of the sample design in using equivalised income as
the basis for selection could be met by the GHS income data.
The compatibility of the GHS and PSE topics made the GHS an appropriate source of data
not only as a sampling frame but as a rich data source for further analysis. Use of the GHS
data allowed the PSE interview to concentrate on operationalising different concepts of
poverty and social exclusion rather than taking up interviewing time asking many of the
questions carried on the GHS again. The consideration was not only for interview length but
also because repeating questions unnecessarily would have reduced respondent tolerance of
the survey. Interviewers reported that PSE respondents were concerned about the length of
the interview and the impact of this on their limited leisure time.
With any follow-up survey, there will be an element of ‘double non-response’, that is the
non-response inherited from the parent survey plus the additional non-response from the
follow-up. However, the parent survey can provide us with detailed information about non-responders to the follow-up and this can feed into a weighting scheme.
The usefulness of a survey as a sampling frame will always depend on the group of the
population that is needed for the follow-up survey but, in the case of the PSE, the GHS has
proved to be an efficient way of identifying the refined sample.
References
Mathiowetz, N and McGonagle, K (2000) An assessment of the current state of dependent interviewing in household surveys. Journal of Official Statistics, p.401.
Martin, J and Matheson, J (1999) Response to declining response rates on government surveys. Survey Methodology Bulletin, Issue 45.
Notes
1. The author would like to thank the following people for the work they contributed to the design and implementation of the methods adopted in the PSE survey: Charles Lound, Dave Elliot, Angie Osborn, Alan Francis, Olu Alaka, Karen Irving and Michaela Pink.
2. The General Household Survey (GHS) has collected information on a variety of topics over the last 30 years. The survey is primarily designed to provide information for policy use by its sponsoring government departments. Its usefulness lies in the potential for cross-topic analysis, and this potential has been exploited by many GHS data users. The multi-purpose design and relatively large sample size of the GHS has also benefited those wanting to conduct research into a particular sub-sample of the population (for example those aged 65 and over). In this way the GHS has been used not only as a data source for analysis but has also been employed as a sampling frame for follow-up surveys.
3. The PSE was carried out by Social Survey Division of the Office for National Statistics on behalf of a consortium of the Universities of York, Bristol, Loughborough and the London School of Economics who were funded by the Joseph Rowntree Foundation. The PSE aimed to measure poverty and social exclusion in Britain today, focussing on deprivation indicators based on a societal definition of the necessities of life. The team at the four universities conducted preparatory research, defining concepts and testing ways to operationalise them, including question testing work, before commissioning ONS to advise on the survey design and to conduct the fieldwork. The consortium of academics had conducted research into poverty in conjunction with MORI in the Breadline Britain Surveys in 1983 and 1990. PSE was designed to update these surveys. In addition to the follow-up to the GHS described in this paper, the ONS Omnibus Survey also asked a representative sample of the population of Great Britain their views on what constitutes the necessities of life in present-day Britain in June 1999.
4. From Poverty and Social Exclusion in Britain, D. Gordon et al., 2000, p.10.
5. The sample was drawn from edited GHS data. This was necessary due to the complexity of the data that was used to select the sample. The use of edited, rather than raw, data has implications for the timescale of such projects.
6. The 1998/9 GHS and PSE were conducted as computer-assisted personal interviews (CAPI) using a Blaise program. Answers recorded during the GHS interview were fed forward into the PSE CAPI interview for the interviewer to verify with the respondent and update where circumstances had changed. Feeding information from one CAPI questionnaire to another helped maintain the smooth flow of the interview.
7. For example, in PSE the tenure question was asked in this way:
   "Last time we spoke you said that you occupied your accommodation in the following way.
   Buying it with the help of a mortgage. (fed-forward answer from GHS interview)
   Is this still correct?"
   1. Yes
   2. No
   If the respondent answered 'no' the GHS tenure question was asked again in full.
Appendix 1 Weighting for the probability of selection
Since the GHS is an equal-probability sample of households and individuals, it was only
necessary to build weights to account for each stage of the subsampling from the GHS
sample.
Weighting for countries
360 areas out of 518 were selected in England and Wales and all PSUs in Scotland.
Therefore the initial weight, Wt1, has a value of 518/360 in England and Wales and a value
of 1 in Scotland.
Weighting by quintile group
Households in different income quintile groups (as defined by the data collected in the 1998
GHS) were sampled at different rates. For each quintile group the sampling rate was used to
create a second weight, Wt2, from Wt1:
For the lowest quintile group: Wt2 = Wt1 * 1.
For the fourth quintile group: Wt2 = Wt1 * 4/3.
For the top three quintile groups and those who refused the GHS income section: Wt2 = Wt1 * 4.

Weighting by household size
The weight needed here depends on the type of analysis to be carried out. Three weights
have been created, each of which is a product of Wt2 and household size and composition.
1. Analysing sampled adults. Here the chance of being included in the survey, given that
the household is sampled, is 1 divided by the number of adults in the household. So the
weighting factor should be the number of adults in the household:
Wt3a = Wt2 * number of adults in household
2. Analysing families. Here the chance of a family unit being included, given that the
household is sampled, is equal to the number of adults in the family divided by the number
of adults in the household.
Wt3b = Wt2 * number of adults in household / number of adults in family
3. Analysing adults in a sampled family. Here each adult can be included in the file either by
being sampled themselves, or because their spouse is sampled. Therefore, given that the
household is sampled, for single adults the probability is one divided by the number of
adults in the household, and for adults in couples the probability is two divided by the
number of adults in the household.
If the respondent has a partner, Wt3c = Wt2 * number of adults in household / 2.
If the respondent does not have a partner, Wt3c = Wt2 * number of adults in household.
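Pulling these stages together, the Python sketch below computes Wt1, Wt2 and the three Wt3 variants for a single case. The function and record layout are illustrative rather than the actual production code; 'quintile' takes the values 'bottom', 'fourth' or 'top3'.

    # Sketch of the PSE selection weights; function and record layout hypothetical.
    def selection_weights(country, quintile, adults_in_hh, adults_in_family, has_partner):
        # Wt1: 360 of the 518 areas selected in England and Wales; all in Scotland.
        wt1 = 518 / 360 if country in ("England", "Wales") else 1.0
        # Wt2: allows for the differential sampling of income quintile groups.
        wt2 = wt1 * {"bottom": 1.0, "fourth": 4 / 3, "top3": 4.0}[quintile]
        wt3a = wt2 * adults_in_hh                              # sampled adults
        wt3b = wt2 * adults_in_hh / adults_in_family           # families
        wt3c = wt2 * adults_in_hh / (2 if has_partner else 1)  # adults in families
        return {"Wt1": wt1, "Wt2": wt2, "Wt3a": wt3a, "Wt3b": wt3b, "Wt3c": wt3c}

    # A Scottish respondent in the bottom quintile, one of two adults in a couple:
    print(selection_weights("Scotland", "bottom", 2, 2, True))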
Appendix 2 Weighting for unit non-response

Weighting   Description                                                  Response   Weight for     Weighted
class                                                                    rate (%)   non-response   base

Age of head of household 17-28
  1         SEG of head of household: professional; employer-manager;      32          3.13           986
            skilled manual; semi-skilled manual; <missing>
  2         SEG of head of household: intermediate non-manual;             48          2.09           384
            junior non-manual; unskilled manual

Age of head of household 29-40
  3         Government Office Region NE; Merseyside; E Midlands; SW;       58          1.72           559
            Wales; Scotland; have satellite/cable TV
  4         Government Office Region NE; Merseyside; E Midlands; SW;       65          1.54           807
            Wales; Scotland; do not have satellite/cable TV
  5         Government Office Region NW; Yorkshire & Humberside;           27          3.75           499
            W Midlands; Eastern; London; SE; have CD player;
            household type: 1 person; 2+ unrelated adults; married
            couple, no children; 2+ families; cohabiting couple,
            no children
  6         Government Office Region NW; Yorkshire & Humberside;           61          1.64           910
            W Midlands; Eastern; London; SE; have CD player;
            household type: married couple, dependent children;
            lone parent; lone parent, independent children;
            same-sex cohabiting; cohabiting couple, dependent
            children; have tumble drier
  7         As class 6 but do not have tumble drier                        36          2.78           389
  8         Government Office Region NW; Yorkshire & Humberside;           29          3.42           253
            W Midlands; Eastern; London; SE; do not have CD player

Age of head of household 41-63
  9         Have video recorder                                            60          1.68          6019
  10        Do not have video recorder                                     52          1.92           330

Age of head of household 64-70
  11        Government Office Region NE; E Midlands; W Midlands; SE;       79          1.27           825
            SW; Wales
  12        Government Office Region NW; Merseyside; Yorkshire &           56          1.78           699
            Humberside; Eastern; London; Scotland

Age of head of household 71-100
  13        Equivalised income quintiles 1 (top), 2 and 3                  77          1.30           649
  14        Quintile 4; have microwave; non-limiting longstanding          70          1.44           306
            illness or no longstanding illness
  15        Quintile 4; have microwave; limiting longstanding illness      53          1.88           203
  16        Quintile 4; do not have microwave                              51          1.96           275
  17        Quintile 5 (bottom) or <missing>; have washing machine         47          2.15           617
  18        Quintile 5 (bottom) or <missing>; do not have washing          38          2.65           204
            machine

All                                                                        56                        14914
Evaluation of survey data quality using matched Census-survey records
Amanda White, Stephanie Freeth and Jean Martin
1. Introduction
In England and Wales the Office for National Statistics (ONS) is responsible for conducting
the decennial Census of Population. Following the censuses in 1971, 1981 and 1991
investigations into survey nonresponse on major government household surveys were
carried out by the Social Survey Division (SSD) of ONS, which conducts many of these
surveys. Addresses sampled for surveys taking place around
the time of the censuses were linked with individual census records for the same addresses.
Since all the Census variables are available for both responding and nonresponding
addresses, this provides a very powerful means of investigating the characteristics of
nonrespondents, measuring nonresponse bias and evaluating methods of adjusting for the
bias.
For the studies carried out following the 1971 and 1981 censuses, the actual matched
datasets were not available to the methodologists investigating nonresponse, only specified
aggregate tables, which limited the analysis to descriptive comparisons of the characteristics
of respondents and nonrespondents, and measurement of bias in terms of census
characteristics. However, in 1991 matched micro records were made available under strict
confidentiality arrangements which allowed much more detailed statistical modelling to be
undertaken to investigate the interrelationship between different variables which relate to
nonresponse bias and to assess different methods of re-weighting data to compensate for the
bias. Five surveys were included in the study in 1991, allowing comparison of results for
surveys with very different designs. Key results are presented in Foster (1998).
A more ambitious study is planned in connection with the 2001 Census. Some 8-10 surveys
are likely to be included, with widely varying survey designs and response rates. The response
rates are expected to range from just under 60% to over 80%, based on current levels. More
importantly, our knowledge of factors contributing to survey nonresponse has advanced
considerably in the past ten years suggesting enhancements to the basic study design, while
more powerful and sophisticated statistical modelling techniques are now available for the
analysis.
This paper outlines our plans for the study following the 2001 Census and presents the
results of the pilot for some aspects of the study.
2. Design of the 2001 Census-linked Study of Survey Nonresponse
The 2001 study builds on the previous census-linked studies of survey nonresponse carried
out by ONS and by Groves and Couper (1998). The key feature of the 2001 study is the
collection and use of a significant amount of auxiliary data to supplement the information
available from the census. Advances in research on non-response have led to a better
understanding of the probable causal mechanisms that determine likelihood of response.
Groves and Couper (1998) have put forward conceptual models of the factors determining
likelihood of contact and likelihood of co-operation given contact. They list four broad
categories of influence:
• area characteristics
• household characteristics
• survey design features
• interviewer characteristics
Each of these combines with the others to affect the likelihood of contact and the interaction
between the household and the interviewer, and hence the likelihood of co-operation given
contact. Their work emphasises that it is not demographic characteristics per se which
determine non-response; rather, people with certain characteristics are likely to lead
lifestyles or hold attitudes which determine how easy they are to find at home or to persuade
to take part in a survey. Interviewers have a huge influence on response outcomes: the
attitudes and strategies they bring to their work and their detailed behaviour at the household
have been shown to be major determinants of response outcome. The 2001 census-linked
study of survey non-response plans to measure as many factors as possible which are likely
to influence response outcomes, in order to explore how area, household, interviewer and
survey design characteristics interact to affect non-response.
3. The data to be collected
The data to be used in the study will be drawn from census records and from survey
interviews carried out around the time of the 2001 Census (which took place on 29 April).
The census data for households sampled for the participating surveys will be extracted and
matched with the corresponding survey data and other information relevant to the analysis.
The study will collect the following information:
• Information about the areas sampled for the surveys
• Information about households
• Survey design features
• Information about the interviewers
• Information about interviewer behaviour and the outcome of visits to each sampled address.
Figure 1 Design of matched dataset for 1991 study
[Diagram: survey responses and response outcome information, for respondents and
nonrespondents, matched by household to census variables.]
3.1. Information about the area
Much of the information available about the areas sampled for the surveys in the study will
come from the census itself but some other information may also be available. We plan to
use the following area level information:
• population density
• whether urban or rural
• crime rate
• unemployment rate
• proportion of owner occupiers
• proportion of multi-occupied units
We will also ask interviewers to record information about the area that might be predictive
of ease of contact or gaining co-operation. In addition, we will investigate various area
classifications that are currently available. Area level information can be linked to samples
for any general population survey we carry out and is therefore available for routine nonresponse adjustment (and for sample stratification and analysis). This study will allow us to
assess how well it compensates for non-response bias and how it may best be used.
3.2. Information about households
The study’s database will have all the variables included in the census. We also aim to ask
interviewers to record their observations about characteristics of the selected
addresses/households which might be predictive of ease of contact (e.g. presence of an entry
phone or other barriers to access) or of gaining co-operation.
Figure 2 The design for the 2001 study
[Diagram: survey responses, response outcomes and nonrespondents matched with census
variables, area information, interviewer characteristics, interviewer observations, and
doorstep interaction and interviewer behaviour.]
3.3. Survey design features
The surveys that are likely to be included in the study are all carried out by face-to-face
interview but vary with respect to the following design features:
• whether information is required about all the individuals in the household or about a selected adult
• whether all adults are to be interviewed in person or whether proxy information is allowed
• the response rules which determine the response rate
• the average length of interview
• the length of the field period
• the survey topics covered
• whether data collected other than by interview (e.g. diaries) are included
We will record survey design information on the database for use in the analysis.
3.4. Interviewer characteristics
We plan to carry out a survey of interviewers, repeating a survey which ONS carried out in
1998 (Martin and Beerten, 1999) as part of an international project (Hox, 1999).
Based on the earlier results we expect to collect the following data:
• socio-demographic characteristics
• length and nature of survey experience
• performance grade (based on response and other factors over a year)
• confidence in ability to gain response
• knowledge of techniques which encourage response
• reports of calling strategies
• reports of strategies used to persuade people to respond
3.5. Interviewer behaviour and outcome of calls
As with the earlier studies, we will record the information about the visits interviewers make
to each sampled address and the outcome of each call. We will also ask interviewers to
record other potentially useful information about their own behaviour and the interaction
with respondents. We expect to have the following information:
• time of day and day of week of each call
• outcome of each call
• number of calls to make initial contact
• number of calls to complete the interview after making contact
• reasons given for not granting an interview
• questions and comments from the respondent
4. Sample size
The sample size for each participating survey will be similar to the 1991 study - around
5,000 addresses. We plan to carry out a lot of analyses separately for each survey and a
sample of 5,000 addresses will yield a large enough number of non-responding households
for survey level analyses. As in previous years, responding households will not be subsampled because this will have adverse effects on the statistical models to be developed.
Addresses in Scotland and Wales will be included for surveys that cover these countries.
5. Timing of the study
Ideally we want to match addresses which will be contacted as near to census night
(29 April 2001) as possible. In previous years this was achieved by including addresses
selected for survey interviews in the months on either side of census night. In 2001 we
propose to match census data with data for addresses selected for interview from April
onwards, and to start collecting the data outlined in section 3 in May. Matching addresses
selected for interview before April is not feasible because of data compatibility problems
linked to the adoption in April of the National Statistics Socio-economic Classification
(NS-SEC) and the modification of a number of classificatory questions used on government
surveys in Britain.
6. Analysis plan
Organisations sponsoring the surveys included in the study will be paying for their
participation and will commission outputs for their particular survey. As with the 1991
study they will be offered a range of analyses from which to select. On the 1991 study all
survey sponsors wanted basic descriptive analysis of the characteristics of nonrespondents
and extent of nonresponse bias but only one commissioned detailed investigation of
alternative methods of nonresponse adjustment.
Apart from survey specific analyses we plan to use this rich dataset to examine influences on
nonresponse across the surveys. This follows the approach we used for the analysis of the
effect of interviewer characteristics on nonresponse carried out recently (Martin and Beerten,
1999). For that study we did not have any information at household level whereas for this
we will have a great deal. In planning these analyses we will take account of the hierarchical
structure of the dataset, using multilevel modelling to distinguish effects at the different
levels. Essentially we have three levels of information (Figure 3):
a) interviewer – characteristics and attitudes
b) assignment – survey design features, area characteristics
c) household – interviewer observations, interviewer behaviour, household characteristics
Figure 3 Data structure
[Diagram: a three-level hierarchy in which response outcomes are nested within households,
households within survey/area assignments, and assignments within interviewers.]
Our plan is to explore two aspects of non-response separately: whether the interviewer made
contact or not; and, for each household where contact was made, whether the household
co-operated or not. Multilevel modelling is most suitable for investigating the determinants
of response. With so many variables potentially available, we need clear hypotheses about
the likely relationships between them in order to decide in which order to enter variables
and how to interpret the results.
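The article specifies no software, but the two-stage approach can be made concrete with a minimal sketch in Python. The file and variable names below (matched_records.csv, contacted, cooperated, urban, entryphone, hhld_size) are hypothetical, and plain logistic regressions stand in for the multilevel models that the full analysis would use:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical matched dataset: one row per sampled household, with
# contacted (0/1), cooperated (0/1, defined where contacted == 1) and
# census/auxiliary predictors such as urban, entryphone and hhld_size.
df = pd.read_csv("matched_records.csv")

# Stage 1: likelihood of contact, modelled over all sampled households.
contact = smf.logit("contacted ~ urban + entryphone + hhld_size", data=df).fit()

# Stage 2: likelihood of co-operation, modelled only where contact was made.
coop = smf.logit("cooperated ~ urban + hhld_size",
                 data=df[df["contacted"] == 1]).fit()

print(contact.summary())
print(coop.summary())

Fitting the two outcomes separately mirrors the distinction drawn above between failure to contact and refusal given contact.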
7. Output options
Reports giving the results relating to a particular survey will be prepared for the
organisations sponsoring each participating survey. Other methodological papers and
reports comparing the results across surveys and drawing more general conclusions about
the nature of non-response will also be produced.
The analysis options we will offer include:
a) Descriptive comparisons of characteristics of non-respondents:
comparisons of characteristics in terms of census variables of responding, non-contacted and
refusing households.
b) Measurement of non-response bias in terms of census characteristics:
summaries of over- or under-representation of different subgroups in terms of census
characteristics (a minimal sketch follows this list).
c) Comparison of the above with results from previous years (if available) to identify
changes over time.
d) Logistic regression modelling to determine relative influence of different census variables
on response outcomes.
e) Multilevel modelling to determine the relative effects of area, household and, if available
or appropriate, survey design and interviewer level variables on response outcomes.
f) Development of weighting schemes based on the above analyses. Ideally these will
incorporate information that could be collected routinely by interviewers as well as the
information used in current non-response adjustment.
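Option (b) above reduces to comparing the census distribution of responding households with that of the full matched sample. A minimal sketch, using the same hypothetical dataset as in section 6, with a hypothetical census variable tenure and response indicator responded:

import pandas as pd

# Hypothetical matched dataset (see the sketch in section 6).
df = pd.read_csv("matched_records.csv")

# Distribution of a census characteristic over all sampled households
# versus over responders only.
full_dist = df["tenure"].value_counts(normalize=True)
resp_dist = df.loc[df["responded"] == 1, "tenure"].value_counts(normalize=True)

# Positive values: subgroup over-represented among respondents;
# negative values: under-represented (a source of nonresponse bias).
print(resp_dist.sub(full_dist, fill_value=0).sort_values())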
8. Development and implementation of the study
Development work for the study started last year. We have had discussions with colleagues
in ONS Census Division about the arrangements required to match records. We have also
carried out a considerable amount of work to design the procedures for collecting the
auxiliary data required for the analysis. These were piloted in January and February 2001.
The main purposes of the pilot were to:
• check that the procedures operate satisfactorily when used simultaneously on a number of large continuous surveys.
• assess the time interviewers need to collect the information, to ensure that the module can be fitted into interviewers’ assignments.
• obtain suggestions for improving the design, to minimise burden on interviewers and to help develop training material for the main stage.
Nine interviewers and one interviewer supervisor working on a number of government
household surveys carried out by ONS were involved in the pilot. The pilot interviewers
provided feedback and advice on question wording and layout, and practical matters such as
how to handle the material in the field. The key findings of the pilot were:
• interviewers can obtain the information required from observation or from their introductory conversation with the informant.
• the information required must be recorded as soon as practicable because the details can easily be forgotten. Interviewers working on the pilot recommended recording the information immediately after the contact with the household, or on the same day if immediate recording is not possible.
• the procedure for recording the information has to be designed to encourage interviewers to record the information immediately after the contact. This means that a compact and user-friendly paper interviewer observation form has to be developed, because it is not always safe or convenient for interviewers to use their laptop computer in their car.
We incorporated the findings of the pilot in re-designing the procedures, and circulated the
revised interviewer observation form to three of the original pilot interviewers, eleven other
interviewers and an interviewer supervisor. This second round of consultation produced
some suggested simplifications to the layout of the form, which were incorporated into the
final design (Appendix A).
The collection of auxiliary information has begun and most of the surveys will have
completed the data collection by the end of September 2001. The census data will be
extracted for matching in 2002 after Census Division has completed processing the census
data. We expect to begin analysis in 2003 and to produce the first survey specific outputs
later that year.
References
Barton, J. (1999) Effective calling strategies for interviewers on household surveys. Survey
Methodology Bulletin, 44. London: Office for National Statistics.
Campanelli, P., Sturgis, P. and Purdon, S. (1997) Can you hear me knocking? An
investigation into the impact of interviewers on survey response rates. London: Social and
Community Planning Research.
Foster, K. (1998) Evaluating nonresponse on household surveys. GSS Methodology Series
No. 8. London: Government Statistical Service.
Groves, R.M. and Couper, M. P. (1998) Nonresponse in household interview surveys. New
York: Wiley.
Hox, J. J. (1999) The Influence of Interviewer Attitudes and Behaviour on Household
Survey Nonresponse: an International Comparison. Invited paper presented at the
International Conference on Survey Nonresponse, October 28-31 1999, Portland, Oregon.
Lynn, P., Clarke, P., Martin, J. and Sturgis, P. (1999) The Effects of Extended Interviewer
Efforts on Nonresponse Bias. Invited paper presented at the International Conference on
Survey Nonresponse, October 28-31 1999, Portland, Oregon.
Martin, J. and Beerten, R. (1999) The Effect of Interviewer Characteristics on Survey
Response Rates. Paper presented at the International Conference on Survey Nonresponse,
October 28-31 1999, Portland, Oregon.
Morton-Williams, J. (1993) Interviewer Approaches. Cambridge: Dartmouth.
Appendix A
The Interviewer Observation Form
The Interviewer Observation Form is a double-sided A5 booklet. We have only reproduced
the key pages here; the pages for recording calls 02 to 10 are broadly the same as the ones
for recording call 01.
Interviewer Observation Form

Survey:
Area:
Address:
Household:
Month:
Is this a Reissue?   Yes 1   No 2
Interviewer Name:
Interviewer Number:

Please complete this form for every telephone or visiting call made at the selected address.
This includes the contact when an interview is obtained.
The questions refer to the introductory conversation with the member of the household.
This is the conversation that took place between the time you introduced yourself and the
time you either started the interview, or ended the contact.
Office for National Statistics, 1 Drummond Gate, London SW1V 2QQ.

Call 01

Q1. Date of call (ddmmyyyy)
Q2. Time of call (24h: hh.mm)

Q3. Outcome of this call
Code ALL that apply
HQ refusal 1
Ineligible: not built / demolished / derelict 2
Ineligible: vacant/empty 3
Ineligible: non-residential 4
Ineligible: residential but no resident hhld 5
Ineligible: communal estab / institution 6
Unable to establish eligibility 7
Address temporarily inaccessible due to weather or other causes 8
No answer on the phone or got answerphone 9
No face-to-face contact with anyone at address 10
No contact with anyone at the address but spoke to a neighbour 11
Contact made with sampled household but not with a responsible resident 12
Contact made with responsible resident but not with selected respondent 13
Refusal at intro/bef int - by hhld member 14
Refusal at intro/bef int - by proxy 15
Refusal at intro/bef int - DK hhld or proxy 16
Refusal at intro/bef int - by selected resp 17
Refusal during interview 18
No interview due to language, age, infirmity, disability etc. 19
Appointment made 20
Interviewer withdrew to try again later 21
Appointment broken 22
Placement interview / checking / reminder call completed (diary surveys only) 23
Partial household interview completed 24
Hhld interview completed but non-contact with one or more elements 25
Hhld interview completed but refusal or incomplete interview by one or more elements 26
Full co-operation / interview 27
Other outcomes 28
If Q3 = 1 to 8, → Q11. If non-contact (Q3 = 9 to 13), → Q4. Otherwise, → Q5.

Q4. Did you leave a card or message at the address / on the phone / on the answerphone?
Card/message left at address 1
Message left on the phone 2
Message left on the answerphone 3
No card/message left 4
Don’t know 5
If non-contact with anyone at address (Q3 = 9 to 11), → Q10. Otherwise, → Q5.

Q5. How did you first contact the household at this call?
Code ALL that apply
Spoke through entryphone/ intercom 1
Spoke through closed door/ window/ letter box/ door on a chain 2
Spoke through open door/ window 3
Spoke to informant who was outside the household unit 4
Went / invited inside the household unit and spoke to informant 5
Spoke over the telephone 6

Q6 to Q8 refer to the MAIN PERSON you made contact with.

Q6. Was the main person you talked to:
a man/boy? 1
a woman/girl? 2
Don’t know, not sure 3

Q7. What is the approximate age of the main person?
Less than 16 1
16-34 2
35-59 3
60 and over 4

Q8. Did the main person make any of the following comments during your introductory
conversation?
Code ALL that apply
Main person did not comment 99
Positive/neutral comments:
Received / remembered advance letter 1
Expecting someone to call 2
Make an appointment and come back 3
I’ll think about it 4
Survey topic is important / other positive comments about the survey topic 5
Enjoy doing surveys 6
Other positive / neutral comments 7
Negative comments:
Not interested / can’t be bothered 8
I’m too busy / Bad time / Just going out / About to go away 9
I don’t know anything 10
We are not typical 11
Not capable / too sick / old / infirm 12
Waste of time 13
Waste of money 14
Government knows everything 15
Don’t trust study is confidential 16
Invasion of privacy / too many personal questions 17
Don’t trust surveys 18
Never do surveys / I hate forms 19
Already participated in surveys 20
Negative comments about the survey topic 21
Other negative comments 22

Q9. Did the main person ask any of the following questions during your introductory
conversation?
Code ALL that apply
Main person did not ask questions 99
What is the purpose of the survey / What’s it all about? 1
Who is paying for this / who is the sponsor? 2
What will happen to the information / How will the results be used? 3
Why / how was I chosen? 4
How long will the interview take? 5
Who’s going to see my answers? 6
Can I be identified? 7
Is it confidential? 8
Is this compulsory? 9
Can I get more information? 10
Can I get a copy of the results? 11
What’s in it for me? 12
Do I get an incentive? How much is the incentive? When / how will I get paid? 13
Other questions 14

Q10. Diary surveys only. Non-diary surveys → Q11.
Was this call a:
call to make an appointment? 1
placement call? 2
checking call? 3
reminder call? 4
collection call? 5

Q11. When did you fill in the details about this call?
Immediately after the call 1
On the same day 2
On a different day from the call 3
Go to Accommodation Section (p.22)

Accommodation

Please complete for all addresses including ineligibles.

A1. What type of accommodation is it?
House or bungalow:
Detached 1
Semi-detached 2
Terrace/end of terrace 3
Flat or maisonette:
In a purpose built block 4
Part of a converted house / some other kind of building 5
Room or rooms 6
Caravan, mobile home or houseboat 7
Some other kind of accommodation 8
Don’t know / not applicable / unable to code 9
If flat, maisonette or rooms (A1 = 4 to 6), → A2. Otherwise, → A4.

A2. Is the household’s accommodation self-contained?
Yes, all rooms are behind a door that only this household can use 1
No 2
Don’t know 3

A3. What is the floor level of this household’s accommodation?
Basement or semi-basement 1
Ground floor (street level) 2
1st floor (floor above street level) 3
Second floor 4
Third floor 5
Fourth floor 6
Fifth to ninth floor 7
Tenth floor or higher 8
Don’t know 9

A4. Are there any physical barriers to entry to the house/flat/building?
Code ALL that apply
Locked common entrance 1
Locked gates 2
Security staff or other gatekeeper 3
Entry phone access 4
None 5
Don’t know 6

A5. Which of the following are visible at the sampled address?
Code ALL that apply
Burglar alarm 1
Security gate over front door 2
Bars/grills on any windows 3
Other security device(s) e.g. CCTV 4
Security staff or security lodge on estate or block 5
None of these 6
Don’t know 7

Neighbourhood

The term “area” in the following questions refers to the area that you can see from the
address.

N1.* Is the sampled house/flat/building in a better or worse condition than the others in the
area?
Better 1
Worse 2
About the same 3
Unable to code 4

N2.* Are the houses/blocks in this area in a good or bad state of repair?
Mainly very good 1
Mainly good 2
Mainly fair 3
Mainly bad 4
Mainly very bad 5
Unable to code 6

N3.* How many boarded-up or uninhabitable buildings are there in this area?
None 1
One or two 2
A few 3
Several or many 4
Unable to code 5

N4.* Are most of the buildings in the area residential or commercial / non-residential?
All residential 1
Mainly residential with some commercial or non-residential 2
Mainly commercial or other non-residential 3
Unable to code 4

N5.* Is the house/flat part of a council or Housing Association housing estate?
Yes, part of a large council estate 1
Yes, part of a council block 2
No 3
Unable to code / not applicable 4

N6.* How safe would you feel walking alone in this area after dark?
Very safe 1
Fairly safe 2
A bit unsafe 3
Very unsafe 4

N7. Is this your final call to this household?
Yes 1 → Go to Household Section (p.24)
No 2 → You are now finished until the next call

Household information

H1. Interviewer code final outcome:
Full or partial interview / co-operation 1
Refusal 2
No interview due to language, age, infirmity etc 3
Non-contact 4
→ H2
Ineligible 5
HQ refusal 6
→ H8

H2. Please enter household details. Use “DK” for don’t know and “Ref” for refused.
No. of adults (aged 16 or over) →
No. of children (less than 16) →
For each of Adults 1 to 7, record:
Sex: M - Male; F - Female; DK - Don’t know; Ref - Refused
Age band: 1 - 16-34; 2 - 35-59; 3 - 60+; DK - Don’t know; Ref - Refused

H3. Are there any children aged 5 or under?
Yes, definitely 1
Possibly 2
No 3
Don’t know 4
Refused 5

H4. Is any adult in paid work?
Yes 1
No 2
Don’t know / Refused 3

H5. How long has the household lived at this address?
Months → OR Years →
Don’t know / refused 99

H6. Do you know or think the occupants are:
Code from observation. Code ALL that apply
White 1
Mixed 2
Asian (Indian, Pakistani, Bangladeshi, other) 3
Black (Caribbean, African, other) 4
Chinese and other ethnic group 5
Don’t know 8
National Travel Survey → H7. Other surveys → H8.

H7. Does the household at present own or have available for use any cars or vans?
Include any company cars or vans if available for private use.
Yes 1
No 2
Don’t know / refused 3

H8. Please check that you have completed the Accommodation & Neighbourhood Section
(p.22-23) and enter the number of calls made to this address →

You have finished at this address. Thank you for completing the questionnaire.
Please key the information into your laptop at home and return the questionnaires to the
office when you have completed this quota.
Forthcoming conferences, seminars and courses
1. SRA training days
19th September 2001
Introduction to qualitative data analysis
This training day follows on from the Introduction to Qualitative
Research training day. Introduction to Qualitative Data Analysis is
aimed at people with little experience in this area. The day will
cover theoretical as well as practical approaches to qualitative data
analysis. Practical sessions will cover the interpretation of
qualitative data. There will be time for questions and discussions
throughout the day.
7th November 2001
Telephone research methods in social research
Use of the telephone as a methodology for social research is
increasing, as it can provide a cost effective and easily controllable
way of conducting fieldwork. This training day will help
participants to identify when a telephone methodology would be
appropriate and when it should be avoided. It is primarily aimed at
those with relatively little experience of research in this field,
although the day will provide an overview of telephone research
methods and their benefits and drawbacks and will provide an
opportunity to listen and debate with experienced practitioners.
Topics covered will include: a comparison of telephone and face-to-face methods; sampling issues including sample sources and
coverage; practicalities of conducting telephone research from
recruiting and training interviewers to maximising response rates;
and researching sensitive topics by telephone.
All events will be held at the London Voluntary Resource Centre, 356 Holloway Road,
London.
Courses cost £65 for members, £110 for non-members (includes membership of the SRA for
one year) and £16 for students/unwaged.
For further information on the contents of the courses contact:
Joanne Maher
Office for National Statistics
1, Drummond Gate
London SW1V 2QQ
Email: [email protected]
Or visit the SRA’s website: http://www.the-sra.org.uk
2. Centre for Applied Social Surveys
CASS is currently developing a programme of courses for the next academic year. Full details
will be available in September.
Jane Schofield
Department of Social Statistics
University of Southampton
Southampton SO17 1BJ
Tel: 02380 593048
Fax: 02380 593846
3. Royal Statistical Society Seminars
2-5 September 2002
University of Plymouth will be hosting the 2002 RSS conference
For more information on any of the courses listed above or to book a place, contact Celia
Macintyre (e-mail: [email protected]).
The Society has accepted an invitation to host an Ordinary Meeting organised by the
Research Section, at the IMS Annual Meeting in Banff, Canada, July 27-August 1, 2002,
subject to availability of a suitable paper.
Papers for reading at Ordinary meetings organised by the Research Section may be
submitted at any time although, to be considered for the Banff meeting, submission by July
31, 2001 is likely to be necessary.
Further information about Research Section read papers is available at
http://www.maths.soton.ac.uk/staff/JJForster/RS/rsobj.html
4. The Cathie Marsh Centre for Census and Survey Research
27 September 2001; 22 January 2002; 20 March 2002 - SPSS for Social Scientists (1 day)
3 October 2001; 14 January 2002 - Surveys and Sampling (1 day)
10 October 2001; 18 January 2002 - Questionnaire Design (1 day)
24 October 2001; 30 January 2002 - Exploratory Data Analysis (1 day)
28 November 2001; 6 February 2002 - Basic Data Analysis (1 day)
7 December 2001 - STATA for SARs (1 day)
27 February 2002 - Introduction to STATA (1 day)
23 November 2001 - Analysing Hierarchical Surveys (1 day)
23 November 2001 - Analysing Hierarchical Data (1 day)
15-16 January 2002; 4-5 June 2002 - Multiple Regression (2 days)
13 February 2002; 3 July 2002 - Logistic Regression (1 day)
4 March 2002 - Conceptualising Longitudinal Data Analysis (1 day)
5 March 2002 - Introducing Longitudinal Data Analysis (1 day)
13 March 2002 - Factor Analysis (1 day)
10 April 2002 - Cluster Analysis (1 day)
8 May 2002 - Multilevel Modelling (1 day)
23-25 January 2002 - Design and Analysis of Complex Surveys (3 days)
25-27 March 2002 - Longitudinal Data Analysis (3 days)
31 October - 1 November 2001; 8-9 January 2002 - Demographic Forecasting with POPGROUP
For further information on the above courses or to book a place contact Kate Thomas
Tel 0161 275 4736
Or visit Http:\\les1.man.ac.uk/ccsr/setvices.htm
5. Miscellaneous Courses and Conferences
12-14 September
2001
7 th International Blaise Users Conference
Washington, USA
http://www.blaiseusers.org
16-19 October 2001
Q2001 Symposium: Achieving Data Quality in
a Statistical Agency: A Methodological
Perspective
Quebec, Canada
http://www.statcan.ca/english/conferences/sym
posim2001/index.htm
18 February – 8
March 2002
Scaling and Cluster Analysis: 31 st Spring
Seminar at the Zentralarchiv
Köln, Germany
http://www.gesis.org/en/index.htm
16 November 2001
What have we learned from SARS: a one day
conference to showcase some of the most
important and innovative research findings
based on the 1991 SARS
MANDEC centre,
University of
Manchester
http://www.les1.man.ac.uk/ccsr/cmu/conferenc
eprograme.html
The National Statistics Methodology Series
This new series, aimed at disseminating National Statistics methodology quickly and easily,
comprises monographs with a substantial methodological interest produced by members of
the Government Statistical Service.
Currently available
1. Software to weight and gross survey data, Dave Elliot
2. Report of task force on seasonal adjustment
3. Report of the task force on imputation
4. Report of the task force on disclosure
5. GDP: Output Methodological guide, Peter Sharp
6. Interpolating annual data to monthly or quarterly data, Michael Baxter
7. Sample design options for an integrated household survey, Dave Elliot and Jeremy Barton
8. Evaluating non-response on household surveys, Kate Foster
9. Reducing statistical burdens in business, Andrew Machin
10. Statistics on Trade in Goods, David Ruffles
11. The 1997 UK Pilot of the Eurostat Time Use Survey, Patrick Sturgis and Peter Lynn
12. Monthly statistics of Public Sector Finances – a methodological guide, Jeff Golland,
David Savage, Tim Pike, Stephen Knight
13. A review of Sample Attrition and Representativeness in Three Longitudinal Surveys, Gad
Nathan
14. Measuring and Improving Data Quality, Vera Ruddock
15. Gross Domestic Product: Output Approach, Peter Sharp
16. Report of the Task Force on Weighting and Estimation, Dave Elliot
17. Methodological Issues in the Production and Analysis of Longitudinal Data from the
Labour Force Survey, P S Clarke and P F Tate (Winter 1999)
18. Comparisons of income data between the Family Expenditure Survey and the Family
Resources Survey, Margaret Frosztega and the Households Below Average Income team
(February 2000)
19. European Community Household Panel: Robustness Assessment Report for United
Kingdom Income Data, Waves 1 and 2 (March 2000)
20. Producer Price Indices: Principles and Procedures, Ian Richardson (March 2000)
21. Variance Estimation for Labour Force Survey Estimates of Level and Change, D J
Holmes and C J Skinner
22. Longitudinal Data for Policy Analysis by Michael White, Joan Payne and Jane Lakey
23. GSS Report of Survey Activity 1997, Survey Control Unit.
24. Obtaining information about drinking through surveys of the general population, Eileen
Goddard
25. Methods for automatic record matching and linkage and their use in National Statistics,
Leicester Gill
26. Designing surveys using variances from Census data, Sharon Bruce, Charles Lound and
Dave Elliot
Forthcoming
GSS Report of Survey Activity 1998
GSS Report of Survey Activity 1999
Evaluation criteria for statistical editing and imputation, Ray Chambers
Copies are available free of charge to GSS members, otherwise £5, from National Statistics
Direct (Tel 01633 812 078). If you would like to submit a monograph please send an e-mail
for details of house style and procedures to [email protected].
Recent Social Survey Division Publications
1. Family Resources Survey, Annual Technical Report 1999/2000
Mark Rowland and Mark McConaghy
ISBN 1 85774 422 5
2. Contraception and Sexual Health, 1999
Fiona Dawe and Howard Meltzer
ISBN 1 85774 444 6
3. Drinking: Adults’ Behaviour and Knowledge in 2000
Deborah Lader and Howard Meltzer
ISBN 1 85774 443 8
4. Psychiatric Morbidity Among Women Prisoners in England and Wales
Maureen O’Brien, Linda Mortimer, Nicola Singleton and Howard Meltzer
ISBN 1 85774 442 X
5. Overseas Travel and Tourism: Business Monitor MQ6 Data for Q3 2000
ISBN 0 11 538063 9
6. Overseas Travel and Tourism: Business Monitor MQ6 Data for Q4 2000
ISBN 0 11 538064 7
Copies of the above publications can be obtained from National Statistics Direct
Tel: 01633 812078
Fax: 01633 2762
In addition, a selection of recently released ONS publications, including some of those listed
above, is available to download from www.statistics.gov.uk/OnlineProducts/default.asp.
7. Assessing people's perceptions of their neighbourhood and community
involvement. Part 1: A guide to questions for use in the measurement of
social capital based on the General Household Survey module.
Melissa Coulthard, Alison Walker and Antony Morgan
This guide is available, free of charge from:
Marston Book Services, P O Box 269, Abingdon, Oxon OX14 4YN.
Tel: 01235 465565
Fax: 01235 465556.
or http://www.hda-online.org.uk/downloads/pdfs/peoplesperceptions.pdf
Subscriptions and Enquiries
The Survey Methodology Bulletin is published twice yearly, in January and July, price
£5 in the UK and EIRE or £6 elsewhere. Copies of the current and previous editions
are available from:
Ann McIntosh
Survey Methodology Bulletin Orders
Social Survey Division
Office for National Statistics
D1/15
1, Drummond Gate
London
SW1V 2QQ
[email protected]
Telephone: (020) 7533 5500
www.statistics.gov.uk
Social Survey Division
Office for National Statistics
1, Drummond Gate
London SW1V 2QQ
www.statistics.gov.uk
ISBN 1 85774 447 0
ISSN 0263 - 158X
Price £5