
THE DEVELOPMENT OF COMPUTER ASSISTED INTERVIEWING (CAI) FOR HOUSEHOLD
SURVEYS: THE CASE OF THE BRITISH LABOUR FORCE SURVEY
Tony Manners
1. The need for CAI
Computer assisted interviewing (CAI) for household sample surveys is, if
implemented correctly, a means of obtaining one or more of the following
improvements in data collection over pencil and paper (PAPI) methods:
better quality, improved speed and lower cost after the initial
investment.
Better quality is to be obtained essentially from data editing taking
place where it is most likely to be successful – in the interview. More
specifically, it follows from these features of CAI: (1) automatic routing
through the questionnaire, so that missing values arise only from
respondents being unable or unwilling to answer and not from interviewers’
mistakes; (2) range and consistency errors being detected at a point where
they can be checked with respondents as opposed to PAPI’s reliance on
manual or automatic imputation after the interview.
Improved speed arises from the omission of the time-consuming keying and
manual editing stages of PAPI processing. Data capture on computer also
provides the opportunity to send data to the central location by
telephone. Increased speed may also lead indirectly to better quality data
if more up-to-date information allows for better field management.
The potentially lower cost of surveys using CAI rather than PAPI derives
from lower staff costs and less use of mainframe computing capacity.
Manual keying and editing at the central location can be reduced or
omitted.
Questionnaire software which translates a researcher’s specifications into
programs for running CAI can reduce or eliminate the need for specialist
programmers.
CAI has been available since at least the early 1970s for telephone
interviewing (CATI) from central locations where the requisite computing
power could be provided by mainframes and minis. For many surveys, of
course, telephone interviewing was not the best method of data collection.
Even amongst surveys for which the telephone was an appropriate medium,
any advantages were often outweighed by the problems and costs associated
with incomplete telephone coverage of the population. The appearance in
the mid-1980s of hand-held and lap-top computers overcame this obstacle by
making feasible CAI for face-to-face or “personal” interviewing (CAPI is
the commonly-accepted acronym). At about the same time, desktop microcomputers and packages intended for end-users became sufficiently powerful
to be considered for CATI instead of the much more expensive hardware and
bespoke programming previously necessary. When lap-top computers (LTCs)
became able to run the same software as desktop micro-computers, as
rapidly happened, the possibility existed for large, complex household
surveys to be carried out in face-to-face interviews using CAI. The gains
in quality, speed and cost could be realised, particularly if telephone
interviewing (with its own speed and cost efficiencies) could be combined
with face-to-face interviewing in the survey design.
This paper was presented at the Conference of Commonwealth Statisticians,
April 1990.
There are no drawbacks in principle to CAI. In practice, however, there
are many potential pitfalls concerning the acceptability to public and
interviewers of the new technology and the robustness of the hardware,
software and the system in which they are configured. It has been the job
of the feasibility and development work to probe for potential problems
and find solutions to them.
There can clearly be a heavy initial investment for a statistical
organisation in the equipment for CAI and in the early feasibility and
development work while CAI itself is relatively new and untried. How heavy
it is will depend on the size and complexity of the tasks to be covered.
Software packages now exist for data collection by CAI which interface
with analysis packages such as SPSS; such packages could probably be used
with rather little development work for surveys which do not require
complex editing and derived variable creation. This paper describes the
feasibility and development work for CAI for a survey with a relatively
complex design: the British Labour Force Survey. The quarterly element of
this annual survey, referred to as the Quarterly LFS, was selected as the
lead survey for CAI development because it was of sufficient size and
continuity for the cost efficiencies of CAI to be realised despite the
large initial investment.
2. The lead survey in OPCS for CAI development: the Quarterly LFS
The Quarterly LFS is one element in the UK Annual Labour Force Survey
which is carried out for the Employment Department (ED). A CAI system for
the Quarterly LFS must maintain the performance and characteristics of the
present system or improve on them. This section describes those
characteristics.
The Quarterly LFS is a panel survey with sample rotation and a weekly
placing pattern in which the residents at the sampled address are to be
interviewed five times at 13-week intervals. In March to May each year
there are additional questions. At maximum the questionnaire covers some
15 questions about the household, about 30 questions of which most are
repeated for all household members, and nearly 160 questions of which a
sub-set applies to each household member aged 16 and over. Proxy
interviews are permitted. Virtually all the information from the previous
interview is available in the current interview, and is used to check if
any changes have occurred. The set sample yields over 15,000 responding
households per quarter (about 40,000 persons per quarter). Some 20% of the
interviews are with households and persons being interviewed for the first
time, and about 80% are recall interviews. The response rate is about 88%
for first interviews and about 78% (of the original sample) for fifth
interviews.
The first interview at an address is always carried out face-to-face.
Whenever respondents’ agreement can be secured at the first interview,
recall interviews are carried out by telephone. About 55% of all Quarterly
LFS interviews (70% of recall interviews) are carried out by telephone
from a central installation. The questionnaires used face-to-face and on
the telephone are identical. The interviewing techniques are essentially
the same for the two modes. At recall interviews, most of the information
from the previous interview is available to the interviewer, who uses it
to check if the situation has changed. The danger of under-estimation of
change by this method is considered to be more than outweighed by the
danger of over-estimation of change and of adverse effects on response if
questions are asked afresh each time. In the PAPI system cleaned data from
the current interview are preprinted onto the questionnaires for the next
interview. The CAI systems must be able to redisplay data on screen at the
next interview.
The PAPI system includes a field management system. An enhanced system for
CAI should be capable of producing daily reports on the progress of
fieldwork. Consideration was given to creating for the telephone
interviewing (CATI) an automated system for scheduling calls which would
be constantly updated with the latest information on completions and
appointments. After some initial trials, the difficulty of specifying an
algorithm which would deal adequately with every situation that might be
encountered led to the decision to retain the reliable existing manual
allocation system, at least for the start of the production system.
Changes to the questionnaire are permitted every 6 months, though as much
continuity as possible must be preserved. These amendments currently
require considerable programming effort (for edits and preprinting
programs) which would largely be saved by CAI.
3. The questionnaire and editing software: BLAISE
When OPCS investigated the software available for CAI in 1987 it concluded
that the BLAISE system developed by the Netherlands Central Bureau of
Statistics was the most promising for the Quarterly LFS. BLAISE has
continued to be developed in ways which seem particularly suitable for
data collection for government statistics; it is the software which OPCS
intends to use in its full-scale production system for CAI for the
Quarterly LFS. OPCS has obtained BLAISE as part of a most useful exchange
of in-house software with the Netherlands CBS, to whom it is very grateful
for the considerable attention and assistance given as BLAISE has
developed. A short description of the BLAISE system is given by Denteneer
et al (1987).
The advantages which OPCS saw in BLAISE were that: (1) it was designed
from the start as an integrated CAI system in which specification of the
questionnaire with its edit checks in a simple English-like language leads
to automatic generation of all the programs required for CAPI, CATI and
CADI (computer assisted data input from paper questionnaires) and for
interfaces to other standard databases and analysis packages; (2) the
BLAISE language itself was sufficiently powerful for LFS purposes yet did
not require professional programming expertise from the survey designer or
anyone else for writing the questionnaire or for preparing it for
interviewers to use; (3) a single 720K diskette for the LTC’s use in CAPI
could easily hold two weeks’ LFS interview data (30-40 households) in
addition to the BLAISE programs*; (4) Netherlands CBS was using BLAISE
extensively for government statistical work, and had considerable
experience of CAPI: its own Labour Force Survey has run entirely on CAPI
since January 1987 using a precursor of BLAISE.
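
To make the “specify once, generate everything” idea concrete, the
following is a minimal sketch, in present-day Python rather than the
Blaise language itself, of the kind of declarative specification BLAISE
supports: questions, ranges and routing rules are stated once as data and
a generic engine conducts the interview. The question names, ranges and
routing rules here are invented for illustration and are not the LFS
questionnaire or Blaise syntax.

    # A miniature questionnaire engine: the survey is specified once as
    # data, and the same specification drives routing and range checking
    # in the interview. All names and rules here are hypothetical.

    QUESTIONS = [
        # (name, prompt, valid codes, ask-this-question predicate)
        ("Age",     "Age of respondent?",            range(0, 120), lambda a: True),
        ("Working", "In paid work last week? (0/1)", range(0, 2),   lambda a: a["Age"] >= 16),
        ("Hours",   "Usual weekly hours?",           range(0, 100), lambda a: a.get("Working") == 1),
    ]

    def interview():
        answers = {}
        for name, prompt, valid, applies in QUESTIONS:
            if not applies(answers):   # automatic routing: inapplicable items skipped
                continue
            while True:
                reply = int(input(prompt + " "))
                if reply in valid:     # range errors resolved during the interview
                    answers[name] = reply
                    break
                print("Out of range - please re-enter")
        return answers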
4. The CAI system for the Quarterly LFS
Much of OPCS’s development effort has gone into providing a reliable
system for managing the data collected through CAI. Standard microcomputer packages have been used as far as possible. For each month the
planned system is:
(1) a new address sample from the automated Postcode Address File on
mainframe is moved to the LFS CAI database (CLIPPER) on a 386
micro-computer which already holds recall interview data (i.e. from
the last interviews);
(2) recall interview data is output from the database and converted
to BLAISE for telephone interviewing on up to 30 networked desktop
computers and, for face-to-face interviewing, organised in 100
interviewer workloads;
(3) each face-to-face interviewer receives two diskettes (BLAISE for
interviews and CROSSTALK for telephone transmission) for use with an
LTC (currently Toshiba T1000) and external modem;
(4) face-to-face interviewers contact households as suits them
within the weekly placing pattern; telephone interviewers are
allocated work manually by supervisors;
(5) for recall interviews any changes since the last interview
overwrite the former answers (new routing is automatically calculated
and redundant codes deleted);
(6) after each day’s work, face-to-face interviewers enter
occupation and some other codes and administrative details on
screen, then they convert the day’s partially and wholly completed
work to text files which they telephone to the CLIPPER database via
a computer mail sub-system (BT GOLD);
(7) there is a similar pattern for the telephone interviewers, who
use micro-computers networked to the database;
(8) a field management system reports the current status of each
sampled household;
(9) completed work for the month is sent to a more powerful database
(SIR) for such processes as automatic imputation, derived variable
creation, inter-wave linkage, final production and storage. There is
no manual coding or editing.

* This sizing is given only as a rough guide: perhaps double this number
of households could be included. Moreover, the Quarterly LFS, unlike many
surveys, needs the BLAISE conversion program on the diskette, halving the
space for survey data.
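
As a rough illustration of the reporting in step (8), the sketch below
(again in Python, with hypothetical status codes and field names) tallies
the status of each sampled household from the cases transmitted so far -
the kind of daily progress report the field management system is to
produce.

    # Daily progress report from transmitted case records. The status
    # codes and the shape of the input are assumptions for illustration.

    from collections import Counter

    STATUSES = ("issued", "partial", "complete", "refusal", "non-contact")

    def daily_report(cases):
        """cases: iterable of (serial_number, status) pairs received so far."""
        counts = Counter(status for _, status in cases)
        total = sum(counts.values()) or 1
        for status in STATUSES:
            n = counts.get(status, 0)
            print(f"{status:12s} {n:5d}  ({100.0 * n / total:4.1f}%)")

    daily_report([("0001", "complete"), ("0002", "partial"), ("0003", "issued")])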
5. Trials and results
Two small-scale trials were carried out in March and November 1987. The
second trial used BLAISE and telephone recalls as well as face-to-face
interviews. The purpose of these trials was to test the acceptability of
CAI to public and interviewers. The results were very encouraging, so a
full prototype system was created for a large-scale test in November 1988.
The sample for this trial comprised 320 addresses which were newly drawn
and 570 households which had already been interviewed five times in the
Quarterly LFS; 315 of the households for recall had agreed to be
interviewed by telephone. The work was carried out by 20 face-to-face and
6 telephone interviewers, selected by the usual criteria for allocating
Quarterly LFS workloads. They comprised about 10% of the panel of
Quarterly LFS interviewers. Very few had any previous experience of
computers or keyboard skills.
The trial showed that the system outlined in section 4 was capable of
carrying out Quarterly LFS data collection in present timetables (with
capacity for improvement) and achieving the same level of response. Nearly
all interviewers said that they preferred CAI to PAPI. Analysis of the
data, which has been successfully passed to the mainframe system, will
include runs of the edit program to check that errors were detected in
the CAI interview, and checks on the levels of change recorded.
The main problems encountered concerned interviewer training, the
automated call scheduling for CATI and some features of the LTC used by
face-to-face interviewers.
The attempt to give interviewers practice at home in advance of office
training in the use of LTCs was counter-productive: interviewers arrived
for office training more apprehensive about computers than the
interviewers in previous trials who had not had the LTCs in advance. For a
trial which started in February 1989 (see below) we gave no advance
training but ran a 2-day residential course with up to 3 further days
allowed for practice at home. This appears to have been very successful.
The CATI call-scheduling system did not deal adequately with the range of
different situations that were created by the interaction of many factors,
such as the need to spread appointments, work flow (particularly the
balance of new and previously unsuccessful calls), criteria for the
timing of repeated attempts to contact households, and interviewers’
shifts, so the telephone interviewers reverted to a manual system. The
weighting algorithm was very complex; it could be seriously distorted by
new situations and could not be altered quickly. It was decided that the
production system would use the reliable manual allocation system, at
least initially.
The Toshiba T1000 was selected because it was much lighter (at 6.1 lbs)
than other machines which met our needs. Interviewers found the screen
less clear than they would have liked. However, when asked if they were
prepared to use heavier LTCs with better screens they preferred to
continue with the T1000. The way that the interview was programmed
required considerable battery use outside the actual interview, to avoid a
hiatus in the household while programs were loaded, and this led to
batteries sometimes running out within the working day. For the February
1989 trial, the start of the interview has been improved so that batteries
can be used just for interviews; this appears to have solved the problem.
6. Future developments
The success of the trial in November 1988 was such that OPCS intends to
proceed to full implementation. The procurement procedure through
selective tender above GATT limits is lengthy; it is intended to replace
PAPI for the Quarterly LFS with CAI in September 1990. In the meantime,
several trials are envisaged to refine the CAI system. In February 1989 a
sample in the same PSUs as the conventional survey had the first of its 5
interviews using CAI. In the autumn of 1989 there will be volume tests
with simulation of full-size data being returned by face-to-face
interviewers in the conventional survey.
A large amount of interviewer training (100 face-to-face interviewers in
the first month of CAI) can only take place just before the start of the
production system, and this will need careful coordination with the other
demands on the training staff during the period. The CAI system and the
training of all the people operating it must be good enough to take over
seamlessly and reliably from the highly successful existing PAPI system in
September 1990.
Investigation of the potential uses of CAI on other household surveys
carried out by OPCS has begun.
Postscript (May 1990): the project has continued to develop as envisaged
in the paper. The procurement process has been completed with the
acquisition of improved CATI microcomputers and face-to-face laptop
computers which give satisfactory speed and battery performance. Large-
scale training, dress rehearsal testing and transfer of data from the
current computing system to the CAI system has been arranged for July and
August, ready for the first CAI fieldwork in September.
Reference
Denteneer, D. et al, Blaise, a New Approach to Computer Assisted Survey
Processing (Netherlands CBS, BPA no: 13436-87-M3, 1987)
DEVELOPING COMPUTER ASSISTED INTERVIEWING ON THE LABOUR FORCE SURVEY: A
FIELD BRANCH PERSPECTIVE
Nora Blackshaw, Dave Trembath and Alison Birnie
Introduction
To date Field Branch have been involved in four separate fieldwork trials
involving computer assisted interviewing (CAI). The first two trials,
designed to test the feasibility of interviewers using computers during
the interview, have already been reported on (Survey Methodology
Bulletins No 21 and 23). The aim of the latter two trials has been the
development and testing of a questionnaire suitable for main stage
fieldwork on the quarterly element of the Labour Force Survey (QLFS) and
the testing of other computer systems needed to run the survey. These
trials have allowed Field Branch to gain valuable practical experience in
the training of interviewers and in working with research staff to
eliminate sources of potential error in the QLFS questionnaire program.
As described elsewhere in this bulletin (paper by T. Manners), the QLFS
is a panel survey in which, at re-interview, virtually all of the data
collected at the previous wave is checked for changes. The QLFS is a
mixed mode survey; and, as a consequence, the QLFS computer assisted
interviewing program must work for both face-to-face and telephone
interviewing.
1. Developing a questionnaire suitable for computer assisted interviewing
The design of questionnaires on all surveys involves reaching a
compromise between the needs of the survey researcher and those of other
groups of staff working on the survey – the interviewers, data prep
staff, coders and programmers. In this respect, designing computer
assisted questionnaires is no different, except that fewer groups of
staff are involved in the process.
But, in several respects, designing computer assisted questionnaires is
different from designing paper questionnaires in that more effort needs
to be put into the design and testing of the questionnaire before main
stage fieldwork begins and, in particular, more thought needs to be given
to the needs of interviewers. In computer assisted interviewing,
interviewers are forced to follow a particular routing and to resolve
range and consistency errors before moving to the next question. If the
questionnaire software is poorly designed or the questionnaire program
does not cater for all eventualities – a not uncommon occurrence on
complex factual surveys – the interviewer can be left in a straitjacket,
either trying to force the situations encountered through an
inappropriate routing or to fit answers into question categories that do
not apply.
Throughout the trials on QLFS, with the exception of the first, Social
Survey Division has used a questionnaire software program called “Blaise”
which has been developed by the Netherlands Central Bureau of Statistics.
The software has a number of features which have allowed a very sensible
compromise to be reached between the needs of the researcher to obtain
good quality data and of interviewers to respond flexibly to the
situation they meet.
These features include:
a) Don’t Know/Answer Refused Facility.
When a respondent refuses to answer a question or genuinely cannot give
an answer the interviewer, working on a paper questionnaire, can simply
note this fact and move on. On CAI these situations must be allowed for.
Rather than offering “don’t know” and “refused to answer” as precode
categories with appropriate routing at every question, the Blaise
software designates a key for each of these responses. These keys make no
assumptions about routing and the program simply presents the next
question to the interviewer. Although the next question may not be
appropriate, by repeating the “don’t know” or “answer refused” key the
interviewer can continue moving through the questions in sequence until
the next appropriate question appears on the screen.
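
The behaviour described in (a) can be sketched as follows, with Python
used purely as pseudocode; the Question type and all names are
hypothetical, not Blaise. The point is that a “don’t know”/“refused” key
stores the special code and offers the literal next question, whereas a
substantive answer lets the routing predicates decide what applies.

    from collections import namedtuple

    # applies: a function of the answers so far; True if the item is asked.
    # Illustrative only: not Blaise syntax; names are hypothetical.
    Question = namedtuple("Question", "name applies")
    DONT_KNOW, REFUSED = "DK", "RF"

    def advance(questions, i, key, answers):
        """Return the index of the next question offered after keying at item i."""
        answers[questions[i].name] = key
        if key in (DONT_KNOW, REFUSED):
            return i + 1   # no routing assumption: next question in sequence;
                           # repeated DK/RF keys walk on until an applicable item
        j = i + 1
        while j < len(questions) and not questions[j].applies(answers):
            j += 1         # normal automatic routing on a substantive answer
        return j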
b) Hard and Soft Consistency Checks.
When the answers to two or more questions fail a consistency check, the
interviewer is asked to reconcile the situation. In doing so the
respondent may confirm that the original answers were correct even though
this combination may appear to be impossible. With a “hard check” the
interviewer cannot move on until the inconsistency is resolved and faces
a dilemma if rightly or wrongly the respondent insists that both answers
are correct. On the other hand, the “soft check” draws the interviewer’s
attention to the apparent inconsistency; but, if this cannot be resolved,
the interviewer can confirm that both answers are correct and move on.
Both checks have their place in the questionnaire program but the “soft
check” certainly allows the interviewer more scope to deal with
unanticipated situations.
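
A minimal sketch of the hard/soft distinction, again in illustrative
Python rather than Blaise; the check condition, the message and the
example edit are all invented.

    def run_check(answers, consistent, message, hard, confirm, amend):
        """consistent(answers) -> True when the data pass the check;
        confirm() asks whether a soft check may be overridden;
        amend(answers) lets the interviewer change earlier entries."""
        while not consistent(answers):
            print("CHECK:", message)
            if not hard and confirm():
                return "suppressed"   # soft check: both answers confirmed correct
            amend(answers)            # hard check (or unconfirmed soft): data changed
        return "passed"

    # Hypothetical edit in the spirit of Example 1 later in this paper:
    answers = {"ReltoHOH": "Wife", "Age": 15}
    run_check(answers,
              consistent=lambda a: not (a["ReltoHOH"] == "Wife" and a["Age"] < 16),
              message="Wife can't be under 16 - change Age or ReltoHOH",
              hard=True,
              confirm=lambda: False,
              amend=lambda a: a.update(Age=51))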
c) Correction of information after it has been entered.
Some software programs do not allow corrections to be made once the
answer is confirmed by pressing “Enter”. Such an arrangement would prove
very frustrating for interviewers as information frequently emerges in
the later stages of an interview which involves amendment to previously
given answers. The Blaise program allows the interviewer to go back to
change earlier data and provides a corrected routing where appropriate.
d) Skipping blocks of relevant questions.
In some situations, it is important to be able to skip a block of
questions and return to it at a later point in time. For example much of
the work of Social Survey Division involves interviewing households where
a separate set of questions is asked of each adult household member. On
some surveys each household member must be interviewed in person whilst,
on others, one delegated person can answer for all household members as
on the QLFS. Even in the latter situation an allowance needs to be made
for cases where the person delegated does not wish to take responsibility
for answering on behalf of another household member. If the questionnaire
software program forces the interviewer to deal with each person in a
prescribed order, this would result in interviewers having to make more
calls on some addresses to complete the interview. With the Blaise
software, the back correction facility allows household members to be
interviewed out of sequence simply by adding an extra question which asks
the interviewer to choose for each household member whether to interview
“now” or “later”. “Later” allows the respondent’s individual
questionnaire block to be skipped. At a subsequent call, amending “later”
to “now” brings up the uncompleted questionnaire block.
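
The now/later device amounts to a small amount of state per household
member; a sketch with hypothetical field names follows.

    def members_to_interview(household):
        """household: list of dicts, each with 'InterviewWhen' ('now'/'later')
        and 'block_done'. 'later' skips the member's block at this call;
        amending it back to 'now' at a later call reopens the unfinished block."""
        return [m for m in household
                if m["InterviewWhen"] == "now" and not m["block_done"]]

    household = [
        {"name": "A", "InterviewWhen": "now",   "block_done": True},
        {"name": "B", "InterviewWhen": "later", "block_done": False},
    ]
    print(members_to_interview(household))   # -> [] until B is amended to "now"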
Apart from features which allow the interviewer to respond flexibly to
situations encountered in the field, Blaise also has features which can
make the interviewer’s job easier. For example,
a) Post Interview Coding.
On a number of surveys carried out by Social Survey Division,
interviewers are asked to carry out some administrative tasks at home,
the most obvious example of which is the common practice of asking
interviewers to classify each job by industry and by the type of
occupation. On the QLFS, these data occur at different points in the
interview – main job, last job, second job. One useful facility Blaise
has is that it allows all the answer descriptions which require coding to
be moved to a separate block. This block can be separately accessed and
interviewers routed only to those items which require attention. Checks
can be built in to ensure that all post-interview coding has been carried
out.
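
The coding-block idea in (a) can be sketched as follows, with
hypothetical item names: the answer descriptions needing office coding
are gathered into one block, the interviewer is routed only to the items
still empty, and completeness is checked at the end.

    def items_to_code(coding_block):
        """coding_block: dict of item name -> code (None until coded).
        Routing presents only the items which still require attention;
        a completeness check passes when nothing is left."""
        return [name for name, code in coding_block.items() if code is None]

    block = {"MainJobSIC": None, "LastJobSIC": "6420", "SecondJobSOC": None}
    print(items_to_code(block))       # -> ['MainJobSIC', 'SecondJobSOC']
    print(not items_to_code(block))   # completeness check: False until all coded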
b) Menus and Help Screens.
The researcher is able to adapt menus, help screens and exiting
instructions (the latter is always present when the questionnaire is in
progress). This gives interviewers greater confidence that screen
instructions will be meaningful and helpful.
Generally Field Branch staff have been very satisfied with the way the
Blaise program has operated on the QLFS. Having said this, there are two
enhancements we would like to see made. Firstly, in the interview, it
would be helpful if the answer recorded was highlighted prior to pressing
“Enter” to assist the interviewer in checking that the correct code has
been keyed. As it stands, the interviewer has to look away from the part
of the screen which presents the question and code list to check what has
been entered in the answer space. Secondly, it would be useful if the
program could print a paper version of the questionnaire in a form which
could be used in the field by training staff assessing interviewer
performance (see section 5).
2. Developing questionnaire programs
It is a fallacy to think that a paper questionnaire can simply be
translated into a questionnaire program. The rigid nature of CAI means
that much more thought needs to be given to the needs of interviewers.
The program must take account of the practical situations an interviewer
may be faced with; and answer codes, routing and consistency checks need
to be well tested. This is likely to mean that research staff will need
to work more closely with field staff to identify and resolve potential
conflicts between interviewer working practices and the questionnaire
program, although a number of those already identified on the QLFS are
likely to be common to a number of our surveys.
A second point relating to questionnaire design is that it is more
difficult for staff not trained in the questionnaire programming language
to suggest sensible solutions to problems. On the QLFS it has proved more
efficient, once the questionnaire has been set up on the computer, for
the researcher and field officer to reach the best compromise by
discussing the areas causing concern and the possible solutions Blaise
may be able to offer.
Finally it is much more difficult for staff to identify routing errors in
CAI. As a consequence, customer testing should form part of the
questionnaire design phase.
3. Training interviewers in computer assisted interviewing
To date Field Branch has trained 50 face-to-face interviewers and 26
telephone interviewers from the pool of interviewers who work on the
QLFS. Interviewers were selected in a systematic way to ensure that they
were broadly representative of the interviewing force as a whole. In
addition, training was given to four face-to-face interviewers who had
not worked on the QLFS. (We have not as yet attempted to train any new
recruits in CAI as a first survey). All interviewers successfully made
the transition to computer assisted interviewing.
The training adopted in the latter trials and the one we shall use when
training interviewers for the main QLFS consisted of:
Telephone interviewers – 10 hours practical work, fully supported by
trainers, with recourse to further training as required.
Face-to-face interviewers – 2 days office training plus home study
and practice interviewing
(It should be noted that face-to-face interviewers needed to be trained in
a wider range of tasks).
The training for both groups of interviewers concentrated heavily on the
use of practice tapes and practice interviewing. Both are felt to be
important. The tapes allow staff to build in a variety of situations so
that interviewers gain experience in handling these. But, when using the
tapes, interviewers do tend to concentrate on the lower half of the
screen where the answer spaces occur. Practice interviews, on the other
hand, allow interviewers to familiarise themselves with the very
different appearance of the top half of the screen which contains the
question, instructions and, in some cases, a list of answer categories.
A number of points of wider interest emerged from the training given on
the various trials.
a) There is apprehension amongst interviewers who have not worked on
CAI as to how well they will be able to cope with the new technology.
This has eased somewhat as a growing number of interviewers have become
involved in the various trials but is still present to some extent.
Several of the interviewers who attended training described themselves as
the non technical member of the family who could not cope with modern
pieces of equipment e.g. the tape recorder, the video etc. Because of
this unease, in designing training, we have deliberately moved away from
sending detailed advance instructions and towards a brief reassuring
note.
b) Those with some interviewing experience but who had not previously
worked on the QLFS coped as well with the computerised survey as those
already working on the survey. In some ways, in fact, they found the task
easier. Interviewers already trained on the survey found the change in
the structure and appearance of the CAI questionnaire disorientating,
particularly if they needed to go back to alter an answer.
c) There was considerable variation in how quickly individual
interviewers came to grips with CAI. About 20% of interviewers were
noticeably slower than the average and a further 20% much quicker. This
means that any initial training has to cope with a wide variation in
assimilation and for this reason we made extensive use of tape recorded
interviews so that interviewers could work at their own pace.
d) Interviewers on the final trial re-interviewed households at
quarterly intervals. This meant that interviewers had a 2-month gap after
the initial quota before resuming work. Some home study materials were
provided to act as a refresher training. These worked very well, with
interviewers having few problems using the computers on their second
quota.
e) Learning how to handle the questionnaire proved more difficult for
interviewers than learning how to back-up data, convert from Blaise to
ASCII (a general format) and transmit data. This was partly because
Survey Branch were able to streamline the non questionnaire programs and
provide clear error messages and partly because these procedures consist
of following a list of printed steps. In the latter training courses a
decision was taken to spend less time on practical training in these
areas allowing more time to be spent on practising questionnaire
handling. This worked very well.
f) The areas of CAI which interviewers found most difficult were making
corrections to entries, finding and altering previously answered
questions, and dealing with consistency checks.
The first of these, making corrections to entries, was initially
confusing because of the number of steps involved, and because the steps
needed to correct an entry varied considerably with the nature of the
error. Once learned, however, this ceased to be a problem.
Finding and altering previously answered questions should become easier
once interviewers become accustomed to the new order and appearance of
the questions but is always likely to be more difficult than on paper
questionnaires because the interviewer no longer has to take decisions
about which question to ask next.
Finally consistency checks are always likely to be an area on which
training will need to concentrate. They will often occur when the
interviewer has not noticed that two items of information are in
conflict; and, as a result, the interviewer can be disconcerted when the
consistency check message appears and totally reliant on the error
message for cues as to how to resolve the problem. In Blaise, the error
message indicates which questions are in conflict using the questions’
eight-letter names, e.g. “reltoHOH” or “Avpothrs”, and the answers given to
each. In addition, an explanatory note can be provided. Although some
consistency error messages are self explanatory, others are less so.
Examples of each are given below.
Example 1 Self Explanatory Error Message
In this example which occurs at the question named Age, the explanatory
text makes it very clear how to correct the error.
    HOH, Wife and parents or grandparents can’t be under 16.
    Change Age or ReltoHOH
        Person[1].ReltoHOH    HOH
        Person[2].Age         =15
Example 2 Complex Error Message
At the end of the interview, the interviewer is routed past the
administrative block to a section which asks the interviewer to confirm
that all household members have been interviewed as a means of ensuring
that no one is overlooked. At home, after completing the administrative
section, the routing takes the interviewer through the same sequence
where some codes need to be altered to confirm that not only all
interviewing but also all administrative tasks have been completed. If
the interviewer miskeys at the end of the interview, the result is a
rather complex message as shown below.
    Someone’s coding block still has to be checked (go to
    CodChk) or DoneCode should be No (go to DoneCode) or
    IntvNow is wrong (go to IntvNow)
        Person[1].IntvNow(1)    Now
        Outcome.DoneCode        YesCoded
        C[1].CodChk             =Empty
g) With face-to-face interviewers working from home, it is essential that
the majority of problems they encounter in using the equipment can be
resolved by telephone discussion. This proved to be easier than expected.
Even on one trial where the interviewers’ office training proved to be
inadequate and where we received many calls from interviewers, all
problems were resolved by telephone. There were no cases where staff had
to be despatched to give further tuition in person.
4. Interviewer reactions to computer assisted interviewing
On the whole both telephone and face-to-face interviewers were very
enthusiastic about computer assisted interviewing. When asked,
approximately 90% said that they preferred the new method of interviewing
to using paper questionnaires. By far the most common reason given was
that it removed from interviewers the burden of deciding which questions
to ask next. This is not surprising in that the QLFS has particularly
complex routing instructions. Other reasons given were that the
interviewers felt a great sense of achievement that they had coped
successfully with the new technology and felt that, with the spread of
computers, their use for interviewing enhanced the job’s professionalism.
In addition to the above point, the telephone interviewers found the fact
that respondents could hear them entering the data very helpful in
controlling the interview. (There is a tendency for telephone respondents
to try to fill the gap if there is a pause whilst the interviewer records
a verbatim answer on the paper survey).
The areas of computer assisted interviewing with which the interviewers
were least satisfied were the screen quality (face-to-face interviewers)
and the pace of the interview (both groups of interviewers).
Face-to-face interviewers used Toshiba T1000 laptops. These machines have
an LCD supertwist screen display with the ability to adjust the angle of
the screen and the degree of contrast. Interviewers were advised to give
more thought to where they sat in relation to lighting and, if necessary,
to ask whether lights could be switched on. Even so, interviewers
reported that in 20% of interviews the screens were difficult to read. In
most cases they did manage to complete the interview although in just
under 1% of cases the lighting was so poor that the interviewer was
forced to revert to paper questionnaires. Despite these difficulties,
interviewers on the whole regarded the screens as adequate and preferable
to paper questionnaires. The majority of interviewers also preferred a
lightweight machine – the Toshiba’s all-inclusive weight (including
carrying case, papers and transformer for emergency use) was 8.5 to 9
pounds – to a heavier machine weighing 14 to 15 pounds with a better
quality screen.
Interviewers reported that the speed with which the next question
appeared on the screen was only just adequate particularly when dealing
with large households. Staff at the Netherlands Central Bureau of
Statistics are already working on improvements to the Blaise program
which are designed to deal with this problem.
Within the last few months, a number of new or improved lightweight
laptops have been put on the market (including a new version of the
Toshiba T1000). These latest laptop models generally run at a faster
speed and have better quality screens than the Toshiba T1000s used in the
trials. It is therefore expected that the laptop model selected for main
stage interviewing will be more than adequate for our requirements.
5. Reaction of interviewer trainers to computer assisted interviewing
As part of the trials, Field Branch needed to consider what effect the
move to CAI would have on the way in which interviewer trainers carry out
their role of training and assessing interviewer performance.
Several trainers were given the same training as interviewers and then
asked to accompany two CAI interviewers on their fieldwork. Trainers were
asked to record the interview on a laptop – in effect both interviewer
and trainer recording the interview – as this was the only way the
trainer could check that questions were correctly asked and probed. After
the interview, the trainers were instructed to discuss the interviewer’s
performance in the usual way.
Trainers reported that there were no adverse comments from members of the
public to two people entering answers on laptops. They did however find
that it was impossible to both keep a detailed record of the interview
and key in the respondent’s answers. If they stopped to make a note, they
were in danger of missing the next answer; and, if this formed part of
the questionnaire routing, they were unable to rejoin the interview in
the way that they can on a paper questionnaire. As a consequence they
felt that their feedback to interviewers was impressionistic and
incomplete. From our discussions with the trainers, we have come to the
conclusion that, in order to do their job effectively, trainers will need
some form of paper questionnaire although the style of this document may
be quite different from questionnaires used as data entry documents.
Trainers also reported that they felt at a disadvantage when joining
interviewers part-way through the CAI quota. By this time the interviewer
was feeling quite confident about using the equipment. Trainers, on the
other hand, felt less confident and were particularly concerned about
whether they would be able to help the interviewer if assistance was
needed during the course of an interview. It was felt that trainers
needed to undertake some practice interviewing in order to consolidate
their office training and home study. In addition, as trainers will not
be using laptops regularly in the field, thought needs to be given to
ways of maintaining their expertise and confidence.
6. In conclusion
We now have considerable experience in training interviewers to use
computers on the Labour Force Survey. The trials have shown that all
interviewers can successfully learn how to use the new technology but
that there is considerable variation in the speed with which they grasp
the new working techniques. Those designing CAI training will need to
take account of this fact.
In designing questionnaires for CAI, much thought needs to be given to
providing interviewers with an acceptable working environment and to
minimising the chances of an interviewer meeting an unresolvable problem
in an interview. Testing of questionnaires needs to be much more thorough
than for paper questionnaires.
One point that other researchers should bear in mind is that the QLFS is
a fairly short interview with complex routing but relatively simple
consistency checks. Those working on surveys with large numbers of
consistency checks or where the consistency checks are very complex may
need to consider whether interviewers should be expected to resolve all
inconsistencies during the interview or whether there remains a role for
a final office editing stage.
TOTAL HOUSEHOLD INCOME AS MEASURED BY GENERAL HOUSEHOLD SURVEY AND
NATIONAL TRAVEL SURVEY
Daniel Kelly
This article presents a secondary analysis of data from the 1986 General
Household Survey (GHS) and the 1985/86 National Travel Survey (NTS),
relating to income.
The variable ‘Usual gross household income’ on the NTS was compared to
the variable of the same name on the GHS to see how similar the data
were.
The GHS collects ‘Usual gross household income’ in the form of separate
amounts from different sources which are then added together; the NTS
simply asks the informant to indicate the band into which it falls.
Owing to differences in non-response in the two samples, the results are
inconclusive. However, the apparent similarity of NTS and GHS income data
found here suggests that further research might be useful.
Background
The GHS has a large core section including questions on Population and
Fertility, Housing, Employment, Education, Health, and Income. Core
questions are funded by OPCS and are selected partly because they are
used by many clients.
In 1979, it was decided to expand the GHS income section so that it would
be closer in design to the Family Expenditure Survey (FES), in particular
to produce comparable measures of current income. Previous to 1979, GHS
income data were used almost entirely for classificatory purposes, but it
was felt that the new section would provide useful income variables for
analysis in their own right. It was also suggested that FES and GHS
income data could be combined in order to give a larger sample.
Though the core now includes a 15-page income questionnaire, income data
supplied to clients by the GHS still consists mainly of tables which use
banded income data for classificatory purposes. Though means and medians
are occasionally used, it is very rare for these tables to include
component income information, e.g. on an individual benefit. The detailed
income data are available on tapes supplied to some government clients
and also on tapes deposited with the ESRC Data Archive for release to
academics but the GHS Unit is not aware of any use by secondary analysts
of GHS and FES income data in combination.
The GHS calculates the income variables used in analysis from the
detailed income data which are collected. This is a time-consuming and
expensive procedure which delays the production of GHS results.
This procedure also lowers the effective response rate for income
variables. Derived income variables on the GHS are not calculated for a
case where component data, such as building society interest, are missing.
The missing components usually make a negligible contribution to total
income. In 1986, the GHS derived ‘gross household income’ for only 71% of
households co-operating in the GHS. This gives an effective response rate
of only 60% for this variable, increasing the possibility of bias.
Recently the GHS has considered simplifying the income section by asking
directly for the banded information used in the majority of analyses.
Though largely modelled on the GHS, the Continuous Household Survey of
Northern Ireland asks directly for the income variables used in analysis.
The head of household (HoH) is asked into which income band the total
household income falls.
The proposal to simplify the GHS Income section was put to the CSO Subcommittee on the GHS in 1989. In the same year, however, the Department
of Social Security (DSS) proposed an extension of the income section for
a project on the characteristics of poorer households. DSS agreed that
banded data would be acceptable if they were sufficiently accurate.
The NTS asks a simple direct question about household income. It also
measures various household characteristics, such as the number of
children and the number of cars in the household. It was therefore
decided that secondary analysis of GHS and NTS income data should be
carried out in order to throw light on the extent to which the measures
of household income provided by the two surveys are similar.
Comparability of the Data from the Two Surveys
There were three ways in which the data were not exactly comparable.
The first was that on the two surveys, most questions on similar topics
were not asked in exactly the same ways. Question differences were
largely overcome by regrouping the responses. For example, on the GHS the
categories ‘waiting to take up job’ and ‘seeking work’ were combined to
make the category ‘unemployed’, in line with the NTS. When regrouping
responses to questions did not solve this problem, certain cases were
excluded.
Eleven household variables were derived which, it was felt, had
equivalent categories on the two surveys. These were: Number employed in
household, Number of household cars, Working status of HoH, Number of
adults in household, Number of children in household, Length of residence
of HoH, Address type, SEG of HoH, Number of persons in household, Family
structure and Tenure. These variables were then used for analysis.
The second problem of comparability was inflation. Data were taken from
the GHS (Jan-Dec 86) and from the NTS (July 85-June 86). To deal with
this problem, households on the GHS were split into six groups which were
roughly sextiles but were formed to coincide with the proportions of
households falling into NTS income bands, shown below. These six groups
were then used for analysis; a sketch of the grouping follows the table.
The Proportions of Households in Each Income Group

Group A (highest):   18.6% of households
Group B:             16.1%
Group C:             16.7%
Group D:             15.7%
Group E:             14.1%
Group F (lowest):    18.8%
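
The grouping itself can be sketched as follows (in Python, purely as an
illustration of method; the cut-point routine and its names are
assumptions, not a description of the program actually used): the income
cut-points are chosen so that the cumulative shares of GHS households
match the proportions observed in the NTS bands.

    def cut_points(incomes, shares):
        """incomes: GHS household incomes; shares: target proportions of
        households, highest group first (here the NTS band proportions).
        Returns the income thresholds separating the groups, descending."""
        ordered = sorted(incomes, reverse=True)
        cuts, taken = [], 0
        for share in shares[:-1]:              # last group takes the remainder
            taken += round(share * len(ordered))
            cuts.append(ordered[taken - 1])
        return cuts

    shares = [0.186, 0.161, 0.167, 0.157, 0.141, 0.188]   # groups A..F above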
The third problem of comparability resulted from a difference in
non-response between the two surveys, as shown below.
Response Rates for the Two Surveys

                                            GHS    NTS
a: response rate for survey                 84%    76%
b: response rate for gross hshld income     71%    86%
c: effective response rate (a*b)            60%    65%

[All analyses reported in this paper were carried out without any
imputation for missing income data.]
The GHS had a higher overall response rate. However, for reasons
mentioned above, gross household income was only calculated for 71% of
households co-operating with the GHS. The NTS, however, recorded this
variable for 86% of cooperating households so that the effective response
rate in the NTS was higher.
Non-response in the two surveys may have introduced bias. This bias may
be different in the two samples.
Analysis
The first part of the analysis compared the income characteristics of
sub-groups from the two samples. See Table 1. For each sub-group (e.g.
one-person households) the percentage of households falling into each
income sextile on one survey was compared to the other.
As can be seen in Table 1, the percentage of one-person households
falling into each income group is not significantly different in the two
surveys. The same is true for one-car households.
The above analysis was carried out for each value of the eleven variables
listed above. The results obtained from this analysis were generally as
similar as the examples shown here in Table 1.
The DSS is particularly interested in an analysis of the characteristics
of poorer households. It was therefore decided to compare the
characteristics of households in the lowest income group on the two
surveys, i.e. group F, those in the bottom 19% of the household income
distribution.
We first checked whether the overall distribution of each of our eleven
variables was similar in the two samples. A variable was considered
similar on the two surveys if none of the pairs of percentages showed a
difference significant at the 5% level.
Unfortunately, a comparison of the two samples showed that eight out of
the eleven variables had significantly different distributions for the
samples as a whole. This is probably due to the differences in sample bias
discussed above. These eight variables were excluded from this part of
the analysis. The other three variables are presented in the accompanying
diagram.
-------------------------------------------------------------------------
Table 1: Income Distribution for One-Person Households and One-Car
Households

                       One-person households    One-car households
Income Groups          GHS %   NTS %   C.I.     GHS %   NTS %   C.I.
GROUP A (highest):       2.8     2.5    y        19.5    19.4    y
GROUP B:                 3.4     3.8    y        20.8    20.3    y
GROUP C:                 8.5     8.6    y        23.7    22.1    y
GROUP D:                14.0    12.6    y        19.6    20.4    y
GROUP E:                16.2    15.8    y        11.7    13.4    y
GROUP F (lowest):       55.1    56.7    y         4.7     4.3    y
tot%:                    100     100              100     100
base:                   2103    2321             3172    3923

(C.I. = confidence interval: the difference between the two percentages
has a standard error of sqrt(p1(100-p1)/n1 + p2(100-p2)/n2), where pi is
the percentage and ni the base for survey i)
(y = yes, the difference between the two percentages is not significant
at the 5% level, applying a two-tailed test and assuming a design factor
of 1.3)
-------------------------------------------------------------------------
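
The ‘y’ entries can be reproduced from the notes to Table 1; a minimal
sketch in Python, where the 1.96 criterion corresponds to the two-tailed
5% level and the design factor of 1.3 is as stated in the table notes.

    from math import sqrt

    def significantly_different(p1, n1, p2, n2, deft=1.3):
        """p1, p2 in per cent; n1, n2 the bases; two-tailed test at 5% level."""
        se = deft * sqrt(p1 * (100 - p1) / n1 + p2 * (100 - p2) / n2)
        return abs(p1 - p2) > 1.96 * se

    # One-person households, Group F: 55.1% of 2103 (GHS) v 56.7% of 2321 (NTS)
    print(significantly_different(55.1, 2103, 56.7, 2321))   # -> False ('y')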
The proportion of households in the lowest income group owning or buying
their homes is 28.8% in the GHS and 30.7% in the NTS. This is the largest
difference that was found and it could be due to sampling error alone.
None of the three characteristics it was possible to compare in the
lowest income group on the NTS is significantly different from those in
the lowest group on the GHS. Using these three variables, the banded
income question from the NTS seems as good for classificatory purposes as
the derived variable from the GHS.
Conclusion
The results presented in this report are inconclusive. Though the income
data from a direct income question seem to be similar to GHS data, it is
unclear what effect the sample differences had on the analysis.
A comparison of detailed and banded income data collected from the same
individuals would be useful.
MEMORY PROBLEMS IN FAMILY BUDGET SURVEYS: I. DIARIES
Bob Redpath*
Last summer in Paris at the International Statistical Institute
Conference there were a number of papers presented which dealt with the
topic of memory errors in survey research; three papers dealt with
research carried out on behalf of the Bureau of Labour Statistics (BLS)
into memory errors which occur on the US family budget survey, the
Consumer Expenditure (CE) Survey, and with how to reduce these errors**.
References to Jacobs below relate to the paper by Jacobs, Jacobs and
Dippo. This article attempts to cover the research findings about similar
memory problems which occur on two continuous diary surveys carried out
in Great Britain, the National Food Survey (NFS) and the Family
Expenditure Survey (FES).
The NFS is sponsored by the Ministry of Agriculture, Fisheries and Food
and is carried out continuously throughout the year to provide the basis
for National Accounts estimates of consumer expenditure on food and for
economic and nutrition analyses. The fieldwork and coding are currently
contracted out to British Market Research Bureau: OPCS draws the sample
and oversees the methodology of the survey. Roughly 7,800 housewives
cooperate each year, a response rate of about 58%.
One person, the housewife (the person who is responsible for most of the
purchased food brought into the home), keeps a seven-day diary record of
expenditure on food brought home by her/himself and all other members of
the household, together with details of their menus and whether or not
each member of the household was present at the meal. Demographic details
about the household are collected and there is one retrospective question
which asks the respondent to recall how much food was bought for the
household during the week prior to the date of interview.
The FES is mainly sponsored by the Central Statistical Office and its
chief purpose is to provide the CSO with up-to-date expenditure weights
for the Retail Prices Index. The data are also used in some of the
estimates of consumers’ expenditure in the National Accounts. As well as
expenditure data the FES collects detailed information about income
and this is widely used by other government departments in analyses of
the impact of policy, such as the take-up of income-tested state benefits
and the impact of existing or alternative taxation systems. OPCS carries
out the sampling, fieldwork and coding of the survey as well as
methodological studies.

* Bob Redpath was the Principal Social Survey Officer in charge of the
Family Expenditure Survey from 1976 to early 1990.

** “The U.S. Consumer Expenditure Survey”, E Jacobs, C Jacobs, C Dippo;
“The Use of Cognitive Laboratory Techniques for Investigating Memory
Retrieval Errors in Retrospective Surveys”, C Dippo; “Reduction of Memory
Errors in Survey Research: a Research Agenda”, J Lessler. I.S.I. 47th
session, Paris, August 29 – September 6 1989.
All persons aged 16 years or over (called ‘spenders’) are asked to keep
14-day diaries of all personal expenditure and business expenditure
(which is ultimately deleted). In addition there is an initial interview
before diaries are placed, in which retrospective questions are asked
about both regular and infrequent or large expenditure and about income
sources. The approach to collecting data by the retrospective method
varies from asking about the last payment/receipt to asking for
expenditure or income over a fixed period which can range from as little
as a week to a year from the date of interview.
All spenders are asked to keep diaries and all spenders are asked the
interview questions; in fact cooperation is only counted if all spenders
offer to keep diaries and try their best to provide information in the
interview. An incentive of a £5 postal order is sent to each spender
several weeks after the completion of the survey.
Response is around 70-72%, yielding about 7,200 households a year in
Great Britain.
Memory problems on family budget surveys can be divided into memory
problems that occur when respondents are asked to keep diaries and memory
problems that occur when respondents are asked in an interview to recall
transactions that have occurred in the past. Kemsley referred to these
broadly as current enumeration and post enumeration. This article will be
presented in two parts, the first of which treats problems encountered
with diary record-keeping and the second retrospective recall.
Jacobs mentioned problems encountered with the CE diary survey, which is
one of two BLS surveys measuring consumer expenditure. Fourteen-day
diaries are used, as with the FES. Even though Bureau of Census
interviewers are instructed to place diaries on specific days of the week
in order to ensure an even distribution within the week, it is still the
case that the first day of recordkeeping shows higher expenditure than
subsequent days, and that expenditure in the first week is greater than
that of the second week. Jacobs concluded that respondents use the first
day more than the following days to comply with diary reporting and that
the interviewers are checking the first-day page more carefully than the
other pages.
Jacobs also mentioned a cueing effect on diaries which seemed to occur
when items were described on diaries. Listing more examples on the diary
page had a beneficial effect on reporting. Research also showed that
diaries with specific items preprinted on the lines, so that respondents
had only to record the price, led to greater first-day bias than diaries
with no preprinted descriptions.
First day effect
There is conclusive evidence from the American CE diary survey that
recordkeeping on the first day is higher than on subsequent days. However
this is not the case with the FES at the times when this has been tested.
It should be pointed out that OPCS interviewers do not have to “place”
diaries (that is, to get diary keeping started) on specified days of the
week, as the Bureau of Census interviewers do with the CE diary survey.
Table 1 below compares the average total household expenditure per
recordkeeping day as measured during February-March 1978 and also during
two experimental periods in 1976 and 1978. In order to facilitate
comparison, averages per recordkeeping day are expressed as indices based
on the average expenditure per day over the fortnight.
Table 1   Expenditure per recordkeeping day / average daily expenditure x 100

                        Family Budget    Family Budget    FES
                        Survey           Survey           (Feb-Mar 1978)
                        Experiment 1     Experiment 2
                        (1976)           (1978)
Day 1                        123              102              110
Day 2                        110              121              111
Day 3                        113              125              123
Day 4                        104              123              104
Day 5                         92               77               96
Day 6                         78               67               86
Day 7                         96               59               77
Day 8                        107              112              100
Day 9                         98              104              102
Day 10                        98              128              120
Day 11                       104              127              109
Day 12                        92              103               92
Day 13                        81               66               82
Day 14                       102               87               85
Base (£ per day)           £9.03            £6.43            £4.60

N.B. It should be pointed out that the experimental figures are higher
than those on the FES because they include all outgoings and not just
expenditure.
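The indices in Table 1 (and in Table 2 below) are straightforward to
compute: each day's figure is divided by the mean over the fourteen days
and multiplied by 100. A minimal sketch in Python, using invented daily
totals rather than survey microdata:

    # Index of expenditure per recordkeeping day (fortnight average = 100),
    # as in Tables 1 and 2. The daily totals here are illustrative only.
    daily_expenditure = [5.1, 5.2, 5.7, 4.8, 4.4, 4.0, 3.5,
                         4.6, 4.7, 5.5, 5.0, 4.2, 3.8, 3.9]  # £, days 1-14
    mean = sum(daily_expenditure) / len(daily_expenditure)
    indices = [round(100 * d / mean) for d in daily_expenditure]
    print(indices)  # values near 100 are days close to the fortnight average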
The experimental periods warrant some explanation. During the mid-1970s
there was concern about under-recording of alcohol and tobacco
expenditure in relation to National Accounts. Two feasibility studies,
entitled the Family Budget Survey experiments, tested a methodology which
asked respondents to balance all cash outgoings against cash incomings on
a daily basis. Respondents were asked to count and record all cash in the
house at the beginning of each recording day; then during the day to
record all cash outgoings and incomings. At the end of the day the
incomings were added to the cash counted at the beginning of the day and
the outgoings subtracted. The calculated residual should equal the cash
in the house remaining at the end of the day – which was again counted.
Any difference was an indication that either outgoings or incomings had
been unrecorded depending on the sign of the difference.
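Expressed as arithmetic, the daily balance check works as follows; a
minimal sketch with invented figures (the experiments themselves used
paper schedules, not a program):

    # Daily cash-balancing check from the Family Budget Survey experiments.
    # All figures are invented for illustration.
    cash_at_start = 25.40   # cash counted in the house at the start of the day
    incomings = 10.00       # cash received and recorded during the day
    outgoings = 12.75       # cash spent and recorded during the day
    cash_at_end = 21.65     # cash counted again at the end of the day

    expected_end = cash_at_start + incomings - outgoings
    discrepancy = expected_end - cash_at_end
    # discrepancy > 0 suggests unrecorded outgoings;
    # discrepancy < 0 suggests unrecorded incomings.
    print(round(discrepancy, 2))  # 1.0: £1 of outgoings went unrecorded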
Overall the amount of unrecorded expenditure was very small, roughly one
per cent of total outgoings; the discipline of balancing and probing for
unrecorded expenditure meant that recordkeeping was nearly perfect over
the 14-day period.
Table 1 shows that only in the first Family Budget Survey experiment was
first-day expenditure higher than on all other days. In the second
experiment the highest days were days 10 and 11 and, in the FES, the
third and tenth days.
On the other hand all three studies showed similar patterns: above-average
expenditure on days 1-4, below-average expenditure on days 5, 6 and 7,
and generally lower expenditure on the last two days of the fortnight
than on other days. What is noteworthy is the bi-modal feature of all
three indexes. The upsurge in expenditure in the second week would seem
to argue against the notion that respondents become increasingly fatigued
as recordkeeping progresses; put another way, recordkeeping fatigue does
not explain these distributions. Other possible causes were therefore
investigated.
Two hypotheses were tested to try to explain this pattern. First it was
assumed that recordkeeping would increase in anticipation of or directly
following the interviewer’s return visit to the household. Table 2 shows
the distribution of interviewer return visits to 564 households during
February-March 1978 by recording day. The number of visits per recording
day is divided by the average number of visits per day over the 14-day
period and expressed as an index (base = 100).
Table 2   Index of interviewer return visits per recording day

Day of recording    Index of interviewer return visits
 1                         3
 2                        10
 3                        62
 4                        98
 5                        93
 6                       101
 7                        83
 8                       176
 9                        94
10                        37
11                        37
12                        20
13                        23
14                       566

Base = 125 visits per day during fortnight
The peaks for interviewer return visits occur on days 8 and 14 and yet as
Table 1 shows there appears to be no anticipatory increase in spending
before either of these days; nor does it show any reactive increase in
expenditure in the days immediately following the two peaks in the
distribution of interviewer return visits.
Another hypothesis was that the recordkeeping pattern could be brought
about by uneven placing of diaries within the week. OPCS interviewers are
asked to place roughly four to five households each week over a month’s
duration; however they are not required to place on specific days during
the week (unlike their American counterparts).
Table 3 compares days of the week on which FES diaries were placed in the
February-March 1978 study and also throughout 1976.
Table 3   % of FES diaries placed on each day of the week

Day of week          Feb-Mar 1978      1976
                     % placed          % placed
Monday                    16               18
Tuesday                   22               23
Wednesday                 23               22
Thursday                  20               18
Friday                    14               12
Saturday                   4                5
Sunday                     1                2
Base (nos) = 100%       1179             7051
The placing patterns were fairly consistent between the two studies in
that almost two thirds of households commenced diary recordkeeping on
either Tuesday, Wednesday or Thursday. The effect of this skewed placing
pattern was, as is shown in the first column of Table 4, to concentrate a
higher than expected proportion (3 out of 7 = 43%) of Thursdays, Fridays
and Saturdays on days 3/10 (65%), 4/11 (61%) and 2/9 (57%). The second
column (Index) shows the percentage distribution for each day divided by
the expected percentage (43%), multiplied by 100.
Table 4   % of Thursdays, Fridays and Saturdays occurring by day of
          recordkeeping

Day of recordkeeping       %       Index
1/8                       38          88
2/9                       57         132
3/10                      65         151
4/11                      61         142
5/12                      39          91
6/13                      21          49
7/14                      19          44
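To make the Index column concrete: for days 3/10, 65% of placings put a
Thursday, Friday or Saturday on those days against the expected 43%,
giving 65 / 43 x 100 = 151. A minimal sketch of the whole column:

    # Index of over-representation of Thu/Fri/Sat by recordkeeping day
    # (Table 4). Percentages are the published first column.
    observed_pct = [38, 57, 65, 61, 39, 21, 19]   # days 1/8 ... 7/14
    expected_pct = 43                             # three of seven weekdays
    index = [round(100 * p / expected_pct) for p in observed_pct]
    print(index)  # [88, 133, 151, 142, 91, 49, 44]; within rounding of Table 4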
Based on the only data available (shown in Table 5), these three weekdays
were the days of highest average expenditure: Thursday because of
extended evening opening hours; Friday because wages were received; and
Saturday for obvious reasons. (The 1978 data predate legalised Sunday
shopping.)
Table 5   Average expenditure per weekday (1966-67)

                       £
Monday              4.18
Tuesday             4.18
Wednesday           3.54
Thursday            4.83
Friday              7.08
Saturday            6.76
Sunday              1.61
It then became apparent that the irregular pattern of expenditure by
recordkeeping day (Table 1) might be at least partially explained by the
irregular placing pattern (Table 3) in conjunction with the irregular
pattern of expenditure by weekday (Table 5). Figure 1 compares three
indexes: week 1 of the February-March 1978 FES; week 2 of the
February-March 1978 FES; and the index of the % of Thursdays, Fridays and
Saturdays occurring on each recordkeeping day (the placing pattern
index). The numbers on which Figure 1 is based are shown in Table 6.
Table 6   Indexes of placing pattern and daily expenditure

          Placing           Expenditure indexes (FES, Feb-Mar 1978)
Day       pattern index     Day    Index     Day    Index
1/8            88            1      110       8      100
2/9           132            2      111       9      102
3/10          151            3      123      10      120
4/11          142            4      104      11      109
5/12           91            5       96      12       92
6/13           49            6       86      13       82
7/14           44            7       77      14       85
Figure 1:  Comparison of index of expenditure per day (FES, Feb-Mar 1978)
with placing pattern index
The first noteworthy feature is the remarkable fit between the curves of
week 1 and week 2 of the FES data. In one sense this is reassuring in
that the biases are more or less equal in each of the two recordkeeping
weeks. The bimodal characteristic also dispels the notion that
recordkeeping fatigue increases with the length of recordkeeping. One
must hasten to add that these conclusions can only be tentative, based as
they are on one study.
However, comparing the curves of the FES data with the curve of the
placing pattern index seems to give substance to the hypothesis that
uneven placing patterns may give rise to the variations in expenditure by
day of recordkeeping. The peak in all three sets of data occurs on days
3/10 and the curves decline thereafter although at varying gradients.
One would not expect to explain day-by-day variation in expenditure by a
single all-embracing hypothesis. Nevertheless the point that Jacobs made
about the need to control diary placement by day of week is worth bearing
in mind. In the case of the FES the biases to expenditure were not
thought to be serious enough to warrant a major and costly revision to
fieldwork procedures. It is worth noting that the US CE diary survey was
designed from the outset to try to place diaries by day of week.
Inter-week variation
At the same time this clustering effect seems to be independent of any
inter-week bias, i.e. the tendency for the first week's expenditure on
average to be greater than that of the second week.
Average household expenditure as recorded in the diaries in week 1 of FES
recording has always been marginally higher than average expenditure in
week 2. In 1978, the average expenditure per spender over the whole
sample in week 1 was £32.63 and in week 2 £31.76, a decrease of 2.7%; in
1987 the decrease was 2% (Table 7).
Table 7 shows the inter-week variation per recording spender in 1987 for
the following:
- average expenditure per sample (base = total sample)
- number of recording spenders
- number of transactions recorded
- average expenditure per recording spender (base = recording spenders).
Table 7   Inter-week variation by expenditure groups

                                         Week 1    Week 2    Week 2/
                                                             Week 1 x 100
Food
  average expenditure (£)                 19.14     18.30        96
  number of recording spenders            13315     13043        98
  number of transactions                 327531    304808        93
  average exp/recording spender (£)       20.39     19.90        97
Alcohol
  average expenditure (£)                  4.65      4.48        96
  number of recording spenders             6894      6526        95
  number of transactions                  20188     19285        96
  average exp/recording spender (£)        9.57      9.74       102
Tobacco
  average expenditure (£)                  2.51      2.36        94
  number of recording spenders             4680      4512        95
  number of transactions                  18956     17739        96
  average exp/recording spender (£)        7.59      7.43        98
Fuel, light and power
  average expenditure (£)                  3.60      4.01       110
  number of recording spenders             2250      2254       100
  number of transactions                   5900      5695        99
  average exp/recording spender (£)       22.50     25.26       112
Housing
  average expenditure (£)                  6.42      6.06        94
  number of recording spenders             2540      2413        95
  number of transactions                   4025      3748        93
  average exp/recording spender (£)       35.85     35.62       100
Clothing and footwear
  average expenditure (£)                  6.63      6.35        94
  number of recording spenders             5184      4889        95
  number of transactions                  11101     10465        94
  average exp/recording spender (£)       18.13     18.42       102
Durable household goods
  average expenditure (£)                  5.49      5.25        96
  number of recording spenders             4424      4133        93
  number of transactions                   7121      6808        96
  average exp/recording spender (£)       17.60     18.00       102
Other goods
  average expenditure (£)                  8.01      7.58        95
  number of recording spenders            12654     12213        97
  number of transactions                  85819     79296        92
  average exp/recording spender (£)        8.98      8.80        98
Transport
  average expenditure (£)                 12.54     13.05       104
  number of recording spenders             9210      8738        95
  number of transactions                  25391     24534        97
  average exp/recording spender (£)       19.32     21.17       110
Services
  average expenditure (£)                 12.99     11.96        92
  number of recording spenders             9325      8786        95
  number of transactions                  22257     20338        97
  average exp/recording spender (£)       19.76     19.34        98
Miscellaneous
  average expenditure (£)                  0.31      0.31       100
  number of recording spenders             1022       864        85
  number of transactions                  22257     20338        97
  average exp/recording spender (£)        4.37      5.21       120
All expenditure
  average expenditure (£)                 82.27     79.72        97
  number of recording spenders            14182     14048        99
  number of transactions                 530575    494379        93
  average exp/recording spender (£)       82.27     80.48        98
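The final column of Table 7 is simply the week 2 figure divided by the
week 1 figure, multiplied by 100. A minimal sketch using the published
Food row:

    # Week 2 / Week 1 ratio index, as in the final column of Table 7.
    # The figures are the published Food row.
    week1 = {"avg_exp": 19.14, "spenders": 13315, "transactions": 327531}
    week2 = {"avg_exp": 18.30, "spenders": 13043, "transactions": 304808}
    ratios = {k: round(100 * week2[k] / week1[k]) for k in week1}
    print(ratios)  # {'avg_exp': 96, 'spenders': 98, 'transactions': 93}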
With the exception of fuel, light and power (+10%), transport (+4%) and
miscellaneous expenditure (no change), all expenditure groups showed
decreases in the second week. Admittedly significance tests have not been
carried out; but this study was based on recording by 14,182 spenders
during 1987.
The deterioration in recordkeeping is shown by the consistent decreases
in the numbers of recording spenders and the numbers of transactions
recorded in the second week for all expenditure groups (except for fuel
where the numbers of recording spenders grew by 4).
The second balancing experiment, which used residual discrepancies to
probe for unrecorded expenditure, did not exhibit any interweek bias. It
therefore seems likely that the FES results reflect a decline in
recording rather than a decline in spending, with fewer spenders keeping
records and more transactions being forgotten.
There are wider implications of this kind of study for diary
recordkeeping. Measuring interweek variation for different subgroups
provides the researcher with clues as to which groups have difficulty in
keeping diary records. Table 8 shows the interweek variations in
expenditure for sex and age subgroups.
Table 8   Ratios of second to first week expenditure x 100, by sex and age
          (1987)

Age                  Men      Women
Under 31              94        102
31-49                 99        101
50-64                 85        102
65-75                103         98
Over 75               92         87
Base (spenders)     6708       7474
It would appear that up until age 65 the recordkeeping of women does not
deteriorate in the second week, whereas it does for men of these ages.
From age 65 onwards it appears that men are the better recordkeepers. Of
course this does not take into account the fact that women make more
expenditure transactions than men and therefore have more diary entries
to make each day, as Table 9 shows.
Table 9   Average number of diary entries per day by sex and age (1987)

Age                  Men      Women
Under 31           24.33      39.84
31-49              24.93      60.10
50-64              22.15      49.20
65-75              21.18      40.19
Over 75            20.18      30.41
Base (spenders)     6708       7474
Women make more diary entries than men; almost double the number. There
is also an ageing effect, slight for men but more dramatic for women from
age 31 onwards. This suggests that memory problems may particularly
affect the elderly.
There is another implication of the study of interweek bias, particularly
if the deterioration in recordkeeping is accepted. Suggestions are often
made that recordkeeping should be extended, not necessarily in order to
capture all expenditure but to capture particular large or infrequent
items, as in the following example.
In 1985 FES respondents were asked if they would keep records of
expenditure on home improvements, extensions, central heating and a
number of other large infrequent purchases for varying periods of time.
The proportions prepared to carry on with records declined the longer the
period. 84% would keep records of expenditure on these items for an
additional month; 78% for an additional three months; 67% for six months
and only 54% for a full additional year (which was the client’s ideal
time period).
There was therefore a reasonable estimate of the potential wastage to the
sample from extending recordkeeping, in terms of numbers lost as well as
a possible increase in non-response bias. However, there is no measure
other than inter-week variation of the deterioration in recordkeeping
that certainly occurs over a two-week period and is likely to increase
the longer records are kept.
It is for this reason as well as for response reasons that diary
recordkeeping is limited to relatively short periods – short in terms of
the ideal period that clients would like captured and short even in terms
of what respondents feel is typical of their own individual patterns.
Often interviewers are told by respondents that they have been sampled in
an atypical two weeks.
In sum, the interest of OPCS and its clients in monitoring interweek
variation has been mainly to see if the understatement of FES alcohol and
tobacco expenditure when compared with National Accounts could be
explained by a decrease in expenditure on these two items during
recordkeeping. The interweek variation is too small to account for the
discrepancies of 40% (alcohol) and 30% (tobacco). Nevertheless measuring
interweek bias is a tool which can indicate subgroups which have
difficulty in keeping records and also remind the researcher how
'unnatural' recordkeeping is for most people, notably that their
attention span and ability to record accurately wane with time. Two
weeks may be just about as long as people can be asked to keep daily
records without beginning to attack reliability unduly.
Possible remedies
One does not want to leave the reader with the impression that
deterioration of recordkeeping is a phenomenon that researchers must
accept passively. The balancing of incomings and outgoings has been
mentioned but a warning should be added that this is a very testing
methodology and was used only experimentally without any intention of
implementing it on a large scale.
There are simpler devices, such as prompt lists which can be left with
the household, and checking schedules (see Annex A) which the interviewer
administers in the home while she checks the diaries. On the NFS the
interviewer often finds items of food listed in the family menu for a
meal which have not been listed as food purchased, and she then probes
whether or not the item has been recorded. For example, Annex B shows the
results of a probe that discovered an unrecorded mid-day meal comprising
trout, broccoli and potatoes (see the encircled writing on the menu
page). The respondent had forgotten to record these items in the menu and
also as food purchased that day. They were entered by the interviewer on
both pages.
Another example of how interviewers can help respondents with memory
problems is best illustrated by what happens when they are not present to
probe for missing detail.
In 1979 an experiment was carried out on the NFS which compared the
quality of data of 14-day diaries returned by post by respondents with
14-day diaries which had been checked at a final call by interviewers.
There were obvious gains in terms of cost if a final call by
interviewers could be avoided. However, first the quality of data
obtained without the benefit of interviewer probing needed to be
assessed.
Cooperating households in two separate treatment groups in the third
quarter 1979 NFS sample were given 14 day diary records. With one
treatment the interviewer would, as is currently done on the NFS, return
to the household at the end of the 14 days to collect the diaries and
probe for missing detail. With the other treatment the interviewer would
ask the respondent to return the diaries within three days of the end of
the 14 days to the survey agency. Pre-stamped and pre-addressed
envelopes were provided.
With both treatments interviewers made a call midway through the
recordkeeping to see how recordkeeping was proceeding and to probe for
missing detail, thereby ‘educating’ respondents about the degree of
detail needed. Although no financial incentive is offered to NFS
respondents, in this case a £2 postal order was offered to those in both
treatments who completed diaries.
There were two aspects to the experiment which were of interest: first,
there was an assumption that response would suffer if interviewers were
not present at the end of recordkeeping to pick up the diaries; secondly
that the detail entered by respondents would not be sufficient for
coding.
Overall response to the postal return treatment was lower by 3.5%, most
of which was accounted for by housewives abandoning recordkeeping – which
illustrates directly the effect of interviewers on response.
Comparing the proportions of respondents who produced usable diaries
between the two treatments showed no significant differences between age
groups, tenures or household types. However, the average expenditure per
week recorded by those who returned their diaries by post (£6.12) was
lower than that recorded by those whose diaries were collected by
interviewers (£6.36).
The main difference between the treatments was in the quality of the
detail recorded for items. NFS respondents are briefed by interviewers to
record price, quantity and description of each food item separately.
Although missing quantities can be imputed if they are standard, lack of
detail nevertheless can lead to difficulties for coders. If too many
items are omitted, this leads to diaries being rejected – although this
is rare. The proportion of rejected budgets for the postal treatment was
1.8%, compared with 1.2% for the interviewer treatment.
Table 10 shows the types of missing information for the two treatments.
Table 10   Omissions per diary by treatment

Omissions per diary      Postal     Interviewer
Quantities missing         5.03            1.29
Prices missing             0.38            0.16
Total                      5.41            1.45
There were on average 5.41 omissions per diary with the postal method
compared with only 1.45 with the interviewer collection method.
Quantities were the main missing item for both treatments and also the
most difficult to impute, as most of the missing items turned out not to
be weight-marked. (The housewife is supposed to weigh fruit and
vegetables.)
Another indication of the poorer quality of the postal method was that
only 14% of the postal sample were completely free of omitted details
compared with 53% of the interviewer collected sample.
The conclusion was not to proceed with postal return because of the lack
of confidence in the reliability of the results for the non weight-marked
foods and because of the lower response with the postal method.
Cueing
Jacobs mentioned the research which the BLS has carried out into the
effects of the presence or absence of lists of items on diaries.
Broadly, listing the items to be included leads to better reporting than
not listing them. She calls this 'cueing'. An example of possible
'over-cueing' may have occurred when there was an attempted merger of the
FES and the NFS in 1981. During two months, experiments were carried out
which asked respondents to record in diaries the more detailed food
requirements of the NFS along with all the remaining expenditure required
by the FES.
Because the NFS was carried out by another agency, most of the training
of OPCS interviewers, who had already mastered the FES requirements, was
to learn the detailed food requirements of the NFS. Some indication of
the greater detail required for food at least was the fact that there
were roughly 180 NFS food codes compared with only 60 FES food codes.
Also quantities and weights were required in addition to prices, whereas
for the FES only expenditure is recorded.
The results, reported in a separate New Methodology series report*,
showed that all expenditure was under-reported in relation to both the
NFS and the FES. This was attributed to the additional burden on
respondents leading to under-recording, and in turn to interviewers who
may have felt subconsciously that informants were overloaded and so were
less persistent in pressing for detail. This is indeed another insight
into the importance of the interviewer role: if she (or he) is not
confident that respondents can carry out all the tasks required, this may
affect respondents' performance of those tasks.
However, there is another possibility: an 'over-cueing' effect based on
an over-emphasis on recording food. During the second month, prompt cards
listing all the food items required were left with households, and
interviewers emphasised the need for detail in recording food, reflecting
their own growing mastery of the NFS requirements. The results for the
second month showed that the food averages, which had been 11% below
those of the NFS, were now more or less equal; however, the averages for
non-food items were depressed even further.
It is extremely difficult to be authoritative about the reasons why food
recording converged on the expected average while non-food items
diverged. However, one reason may have been a training factor, whereby
interviewers concentrating on the new food recording requirements gave
less attention to non-food recording. Also, the emphasis in all the
'cues' given to respondents was exclusively on food recording
requirements.
On the basis of the results obtained from the two months' studies there
were doubts about the reliability of information obtained from a combined
survey. However, there was also optimism that over time the problems of
interviewer confidence would be reduced.
_________________________________________________________________________
*  The Family Expenditure and Food Survey Feasibility Study 1979-81.
   R Barnes, R Redpath, E Breeze. NM 12.
The intention has been to report some of the research OPCS has carried
out into memory problems with diaries and with retrospective questions.
The papers given by the Jacobses, Dippo and Lessler at the ISI
Conference have served as a stimulus. This first article deals with OPCS
research* into diaries used on both the FES and the NFS. The next article
will cover research into retrospective recall on the FES.
_________________________________________________________________________
*
The author wishes to acknowledge the research contributions of
Patrick Heady, Terry Kenney, Elizabeth Breeze, Anne Milne,
Malcolm Smyth, Madge Brailsford and June Langham.
FURTHER INFORMATION REQUIRED
It would be helpful if you could have the following information and/or
documents available for the interviewer when they call next time:
(ruled lines for the respondent's entries)
THE CENSUS OF EMPLOYMENT AS A SAMPLING FRAME
Elizabeth Breeze
Recently the Census of Employment was used as a sampling frame for a
survey of employers in Bristol intended to find out about recruitment
practices and employers’ opinions on selected aspects of the labour
market. Using this experience I have compiled a short list of points to
take into account in choosing frames for future surveys to do with
employers or employees or the organisations they work in.
The Census is carried out by the Employment Department at intervals of
2-3 years. It is a mammoth undertaking with over two thirds of a million
entries to collect. The units are called data units and are based on PAYE
points. Most of them equate to workplaces, e.g. a shop, office or
factory, but they are not bound by any geographical or physical
constraints. For instance, we had casual staff and permanent staff as
separate data units within a workplace, and clusters of workplaces as one
unit. The data collected are industry, type of business and number of
employees. The Department can assign units to local authorities,
counties, parliamentary constituencies and to their travel-to-work areas.
For those with access to the Census it is a prime candidate for a
sampling frame because it is the most comprehensive list of employing
organisations and it is a central computerised list. It includes
organisations which would not be covered by Company registration, and
lists units smaller than companies (people in these units are more likely
to know about day-to-day practices for their employees than the people in
the Company Headquarters). Also, as in our survey, one can pick out a
particular geographical area. However, the Census is understandably
subject to strict conditions of confidentiality and can only be used with
the permission and assistance of the Employment Department. They do not
have any resources specifically assigned to research use. This severely
limits the range of users.
Our survey used size and industry as stratification factors and took
equal probability samples within each stratum. Units of fewer than 20
employees are subsampled for the Census itself, so this should be taken
into account in any probability calculations, as sketched below.
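A minimal sketch of the adjustment, with invented sampling fractions (the
actual Census subsampling fraction would have to be obtained from the
Employment Department):

    # Overall inclusion probability for a unit sampled from the Census frame.
    # Both fractions below are invented for illustration.
    census_subsampling_fraction = 0.5  # chance a unit of <20 employees is on the frame
    stratum_sampling_fraction = 0.1    # equal-probability sample within the stratum

    def inclusion_probability(employees: int) -> float:
        """Probability that a given unit ends up in the survey sample."""
        on_frame = census_subsampling_fraction if employees < 20 else 1.0
        return on_frame * stratum_sampling_fraction

    # The design weight is the reciprocal of the inclusion probability.
    print(1 / inclusion_probability(12))   # small unit: weight 20.0
    print(1 / inclusion_probability(150))  # large unit: weight 10.0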
The appropriate person/people to select for interview will depend on the
purpose of the survey. In our case we wanted to talk to people
responsible for recruitment; however most of the issues we had to resolve
were not exclusive to this group. It is important to be clear how the
individuals eligible for interview and the units of analysis relate to
the sampled units (data units) so that probabilities of selection are
known. We were aiming to collect data referring to the data unit as a
whole. Other surveys might be interested in one type of employee at the
workplace or a union branch or a production unit defined for their own
purposes. One of the most frustrating and challenging features of these
surveys is that organisations vary enormously in their structure so that
one cannot have a simple model which will neatly fit every case.
When drawing up procedures for sampling, fieldwork, and analysis allow
for the following possibilities:
i)   the unit of analysis coincides with the sampled unit;
ii)  the unit of analysis is smaller than the sampled unit, eg the survey
     picks out subgroups like part-time employees or supervisors: in this
     situation one needs to consider whether subsampling within the data
     unit is necessary;
iii) the unit of analysis is greater than the sampled unit, eg all units
     belonging to the same company in the survey area: probabilities may
     be difficult to calculate because the number of data units involved
     in the larger unit may not be known. Stratification of data units by
     size and industry would complicate calculations too.
Cutting across the units of analysis are considerations of the data
available from employees and how these can be analysed:
a)  will the interviewees be able to talk about that unit in answer to
    the questions;
or b) can the interviewees only give specific information about subunits,
    so that the answers have to be combined/aggregated to refer to the
    whole unit, for example different recruitment officers for different
    grades of staff;
or c) can the interviewees only give specific information about units
    larger than the data unit (eg statistics are only kept at regional
    level), for example central recruitment for all branches of a shop;
or d) will a mixture of these circumstances arise.
In quantitative surveys one has to define units in a way which will
enable aggregation of data from the interviewees. One needs to remember
that the categorisation of employees, production units, or other subject
of interest, which suits the researcher’s model may be very hard to apply
in some cases. In the recruitment survey employees were categorised into
5 groups but not all employers made distinctions; most could rework their
numbers by our categories but we had to accept approximations.
Businesses come and go, so with the best will in the world one is
unlikely to have a list of data units which is up-to-date at the time of
fieldwork. Decisions then have to be made about the following:
i)   the selected data units which are no longer at the given address:
     whether to trace named companies/organisations to their new address
     and, if so, what criteria to adopt to decide whether it is the
     "same" data unit or not; whether to select whatever is at the given
     address at the time of first call; or whether to lose the selection
     from the final sample;
ii)  the data units which have come into existence since the Census:
     whether to compile a supplementary sampling frame (probably an
     expensive undertaking), and if so how to check whether the frames
     are mutually exclusive;
iii) data units which have made some changes to their nature of business
     since the Census: what criteria to use in deciding how to categorise
     them and how to allow for mergers between data units.
It should be noted that the stratification information may be out of date
by the time of fieldwork, so that analysis by the stratification factors
may be inappropriate, although this does not debar stratification from
reducing sampling errors.
I have not attempted to give solutions because the most appropriate
answer for our survey will not necessarily be the best answer for another
survey. However, to be forewarned is to be forearmed.
RECENT METHODOLOGICAL PUBLICATIONS BY SSD STAFF
The use of diaries in data collection
A paper with this title has just been published in The Statistician
(1990, Vol. 39, pp 25-41). It was written by Bob Butcher and Jack
Eldridge and reports the results of a large-scale experiment carried out
on the National Travel Survey. Before the 1985-86 round of the survey,
pilot fieldwork was carried out, using a sample of 1840 addresses, to
test two methods of data collection, one using a seven-day diary, the
other a one-day diary. The two methods were compared for response rate,
data quality and cost. Both methods achieved acceptable quality and they
had similar response rates (75% and 78% respectively). On balance the
seven-day method was more cost effective, so this was the method used for
the 1985-86 NTS and for the continuous NTS that was launched in 1988.
Women’s Experience of Maternity Care – a Survey Manual
An SSD manual describing a specific survey methodology was published
under this title by HMSO last year. As it was not part of the Methodology
Series, it may not have come to the attention of Bulletin readers.
Written by Val Mason, the manual is a step-by-step guide on how to carry
out surveys of women’s experience of maternity care. It was produced for
the Department of Health and aims to help those in health authorities
wanting to measure consumers’ views of local services. It is intended
both for the experienced researcher and for those who have not done a
survey before and so contains a wealth of practical advice. Although
written around surveys of maternity services, much of the advice is more
widely relevant, particularly for local or small-scale surveys.
Feedback in its first year shows that the manual is proving useful in
districts throughout the UK. Indeed HMSO have already had to reprint the
manual. Some district health authorities have now carried out surveys and
others are planning them. The manual is also being used, however, as a
more general source book on survey design. It provides an example of how
theory is put into practice. It takes the reader through all the stages
of a survey, from early planning and design, through the practical
organisation to the analysis and the presentation and use of the results.
The chapters on sampling describe the principles behind sound sample
design and discuss the practical problems of obtaining a representative
sample from local records. The manual recommends postal survey methods
for these surveys and gives detail of how to carry out a postal survey,
including advice on maximising response rates. The manual includes two
model questionnaires, developed and tested in local surveys carried out
by Social Survey Division. These provide examples for the design and
layout of questionnaires for postal self-completion. On the whole, they
are designed to minimise the time and resources needed for coding and
keying the data. But the pros and cons of including more time-consuming
open questions are discussed and the chapter on coding provides
suggestions for their analysis including instructions for the development
of coding frames for open answers. Three chapters are devoted to computer
editing, analysis and the presentation of results. These assume little or
no experience in this aspect of survey practice and guide the researcher
through the use of the SPSSX survey package. An article in the June 1989
Survey Methodology Bulletin (No. 25) outlined the relatively
straightforward method devised for editing the data using SPSSX.
The manual was published as part of a package including the following:
a.  a pamphlet introducing the survey and giving suggestions for
    analysis;
b.  the manual;
c.  printing masters of the model questionnaires and a computer program
    on floppy disk to help with the computing analysis if using SPSSX.
NEW METHODOLOGY SERIES

NM1   The Census as an aid in estimating the characteristics of
      non-response in the GHS. R Barnes and F Birch.
NM2   FES: A study of differential response based on a comparison of the
      1971 sample with the Census. W Kemsley, Stats. News, No 31, Nov. 1975.
NM3   NFS: A study of differential response based on a comparison of the
      1971 sample with the Census. W Kemsley, Stats. News, No 35, Nov. 1976.
NM4   Cluster analysis. D Elliot.
NM5   Response to postal sift of addresses. A Milne.
NM6   The feasibility of conducting a national wealth survey in Great
      Britain. I Knight.
NM7   Age of buildings: A further check on the reliability of answers
      given on the GHS. F Birch.
NM8   Survey of rent rebates and allowances: A methodological note on the
      use of a follow-up sample. F Birch.
NM9   Rating lists: Practical information for use in sample surveys.
      E Breeze.
NM10  Variable quotas – an analysis of the variability. R Butcher.
NM11  Measuring how long things last – some applications of a simple life
      table technique to survey data. M Bone.
NM12  The Family Expenditure and Food Survey Feasibility Study 1979-1981.
      R Barnes, R Redpath and E Breeze.
NM13  A Sampling Errors Manual. R Butcher and D Elliot.
NM14  An assessment of the efficiency of the coding of occupation and
      industry by interviewers. P Dodd.
NM15  The feasibility of a national survey of drug use. E Goddard.
NM16  Sampling Errors on the International Passenger Survey. D Griffiths
      and D Elliot.

Prices: NM13 £6.00 UK, £7.00 overseas; all others £1.50 UK, £2.00
overseas.

Orders to: New Methodology Series, Room 304, OPCS, St Catherines House,
10 Kingsway, London WC2B 6JP.