THE DEVELOPMENT OF COMPUTER ASSISTED INTERVIEWING (CAI) FOR HOUSEHOLD SURVEYS: THE CASE OF THE BRITISH LABOUR FORCE SURVEY

Tony Manners

(This paper was presented at the Conference of Commonwealth Statisticians, April 1990.)

1. The need for CAI

Computer assisted interviewing (CAI) for household sample surveys is, if implemented correctly, a means of obtaining one or more of the following improvements in data collection over pencil and paper (PAPI) methods: better quality, improved speed and lower cost after the initial investment. Better quality is to be obtained essentially from data editing taking place where it is most likely to be successful: in the interview. More specifically, it follows from these features of CAI: (1) automatic routing through the questionnaire, so that missing values arise only from respondents being unable or unwilling to answer and not from interviewers' mistakes; (2) range and consistency errors being detected at a point where they can be checked with respondents, in contrast to PAPI's reliance on manual or automatic imputation after the interview.

Improved speed arises from the omission of the time-consuming keying and manual editing stages of PAPI processing. Data capture on computer also provides the opportunity to send data to the central location by telephone. Increased speed may also lead indirectly to better quality data if more up-to-date information allows for better field management. The potentially lower cost of surveys using CAI rather than PAPI derives from lower staff costs and less use of mainframe computing capacity. Manual keying and editing at the central location can be reduced or omitted. Questionnaire software which translates a researcher's specifications into programs for running CAI can reduce or eliminate the need for specialist programmers.

CAI has been available since at least the early 1970s for telephone interviewing (CATI) from central locations where the requisite computing power could be provided by mainframes and minis. For many surveys, of course, telephone interviewing was not the best method of data collection. Even amongst surveys for which the telephone was an appropriate medium, any advantages were often outweighed by the problems and costs associated with incomplete telephone coverage of the population. The appearance in the mid-1980s of hand-held and lap-top computers overcame this obstacle by making feasible CAI for face-to-face or "personal" interviewing (CAPI is the commonly accepted acronym). At about the same time, desktop microcomputers and packages intended for end-users became sufficiently powerful to be considered for CATI instead of the much more expensive hardware and bespoke programming previously necessary. When lap-top computers (LTCs) became able to run the same software as desktop microcomputers, as rapidly happened, the possibility existed for large, complex household surveys to be carried out in face-to-face interviews using CAI. The gains in quality, speed and cost could be realised, particularly if telephone interviewing (with its own speed and cost efficiencies) could be combined with face-to-face interviewing in the survey design.
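To make the editing gains concrete, the sketch below (written in Python purely for illustration; it is not the Blaise language, and the questions, ranges and names are invented) shows how automatic routing and interview-time range checks of the kind listed above might behave.

    # A minimal sketch of interview-time editing: not the OPCS system or the
    # Blaise language; questions, ranges and names are invented for illustration.

    DONT_KNOW, REFUSED = "DK", "REF"

    def ask(prompt, valid=None):
        """Ask one question; accept don't know/refusal, otherwise enforce the range."""
        while True:
            answer = input(prompt + " ").strip()
            if answer in (DONT_KNOW, REFUSED):
                return answer              # missing only by respondent choice
            if valid is None or answer in valid:
                return answer
            print("Out of range - please check the answer with the respondent.")

    def interview():
        record = {}
        record["working"] = ask("Any paid work last week? (Y/N)", {"Y", "N"})
        # Automatic routing: the hours question is never skipped by mistake
        # and is never asked of non-workers.
        if record["working"] == "Y":
            record["hours"] = ask("Usual weekly hours? (1-99)",
                                  {str(n) for n in range(1, 100)})
        return record

A consistency check would work the same way: the program compares two recorded answers and asks the interviewer to reconcile them with the respondent on the spot, rather than leaving the conflict to be edited or imputed away after the interview.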
There are no drawbacks in principle to CAI. In practice, however, there are many potential pitfalls concerning the acceptability to the public and interviewers of the new technology and the robustness of the hardware, software and the system in which they are configured. It has been the job of the feasibility and development work to probe for potential problems and find solutions to them. There can clearly be a heavy initial investment for a statistical organisation in the equipment for CAI and in the early feasibility and development work while CAI itself is relatively new and untried. How heavy it is will depend on the size and complexity of the tasks to be covered. Software packages now exist for data collection by CAI which interface with analysis packages such as SPSS; such packages could probably be used with rather little development work for surveys which do not require complex editing and derived variable creation. This paper describes the feasibility and development work for CAI for a survey with a relatively complex design: the British Labour Force Survey. The quarterly element of this annual survey, referred to as the Quarterly LFS, was selected as the lead survey for CAI development because it was of sufficient size and continuity for the cost efficiencies of CAI to be realised despite the large initial investment.

2. The lead survey in OPCS for CAI development: the Quarterly LFS

The Quarterly LFS is one element in the UK Annual Labour Force Survey which is carried out for the Employment Department (ED). A CAI system for the Quarterly LFS must maintain the performance and characteristics of the present system or improve on them. This section describes those characteristics. The Quarterly LFS is a panel survey with sample rotation and a weekly placing pattern in which the residents at the sampled address are to be interviewed five times at 13-week intervals. In March to May each year there are additional questions. At maximum the questionnaire covers some 15 questions about the household, about 30 questions of which most are repeated for all household members, and nearly 160 questions of which a sub-set applies to each household member aged 16 and over. Proxy interviews are permitted. Virtually all the information from the previous interview is available in the current interview, and is used to check if any changes have occurred. The set sample yields over 15,000 responding households per quarter (about 40,000 persons per quarter). Some 20% of the interviews are with households and persons being interviewed for the first time, and about 80% are recall interviews. The response rate is about 88% for first interviews and about 78% (of the original sample) for fifth interviews. The first interview at an address is always carried out face-to-face. Whenever respondents' agreement can be secured at the first interview, recall interviews are carried out by telephone. About 55% of all Quarterly LFS interviews (70% of recall interviews) are carried out by telephone from a central installation. The questionnaires used face-to-face and on the telephone are identical. The interviewing techniques are essentially the same for the two modes. At recall interviews, most of the information from the previous interview is available to the interviewer, who uses it to check if the situation has changed. The danger of under-estimation of change by this method is considered to be more than outweighed by the danger of over-estimation of change and of adverse effects on response if questions are asked afresh each time. In the PAPI system cleaned data from the current interview are preprinted onto the questionnaires for the next interview. The CAI systems must be able to redisplay data on screen at the next interview.
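The recall-interview behaviour just described can be pictured with another hypothetical sketch (again Python, not the production software): the previous wave's answer is redisplayed, and it is only overwritten if the respondent reports a change.

    # Hypothetical sketch of a recall (dependent) interview: redisplay last
    # quarter's answer and overwrite it only if the situation has changed.

    def recall_question(name, prompt, previous_wave):
        last = previous_wave.get(name)
        reply = input(f"{prompt} Last time the answer was: {last}. Still correct? (Y/N) ")
        if reply.strip().upper() == "Y":
            return last                               # carried forward unchanged
        return input("What is the answer now? ")      # change overwrites the old answer

    previous_wave = {"employer": "Acme Ltd", "hours": "38"}   # invented example data
    current_wave = {name: recall_question(name, prompt, previous_wave)
                    for name, prompt in [("employer", "Who is your employer?"),
                                         ("hours", "How many hours do you usually work?")]}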
The PAPI system includes a field management system. An enhanced system for CAI should be capable of producing daily reports on the progress of fieldwork. Consideration was given to creating for the telephone interviewing (CATI) an automated system for scheduling calls which would be constantly updated with the latest information on completions and appointments. After some initial trials, the difficulty of specifying an algorithm which would deal adequately with every situation that might be encountered led to the decision to retain the reliable existing manual allocation system, at least for the start of the production system. Changes to the questionnaire are permitted every 6 months, though as much continuity as possible must be preserved. These amendments currently require considerable programming effort (for edits and preprinting programs) which would largely be saved by CAI.

3. The questionnaire and editing software: BLAISE

When OPCS investigated the software available for CAI in 1987 it concluded that the BLAISE system developed by the Netherlands Central Bureau of Statistics was the most promising for the Quarterly LFS. BLAISE has continued to be developed in ways which seem particularly suitable for data collection for government statistics; it is the software which OPCS intends to use in its full-scale production system for CAI for the Quarterly LFS. OPCS has obtained BLAISE as part of a most useful exchange of in-house software with the Netherlands CBS, to whom it is very grateful for the considerable attention and assistance given as BLAISE has developed. A short description of the BLAISE system is given by Denteneer et al (1987). The advantages which OPCS saw in BLAISE were that:

(1) it was designed from the start as an integrated CAI system in which specification of the questionnaire with its edit checks in a simple English-like language leads to automatic generation of all the programs required for CAPI, CATI and CADI (computer assisted data input from paper questionnaires) and for interfaces to other standard databases and analysis packages;
(2) the BLAISE language itself was sufficiently powerful for LFS purposes yet did not require professional programming expertise from the survey designer or anyone else for writing the questionnaire or for preparing it for interviewers to use;
(3) a single 720K diskette for the LTC's use in CAPI could easily hold two weeks' LFS interview data (30-40 households) in addition to the BLAISE programs*;
(4) Netherlands CBS was using BLAISE extensively for government statistical work, and had considerable experience of CAPI: its own Labour Force Survey has run entirely on CAPI since January 1987 using a precursor of BLAISE.

4. The CAI system for the Quarterly LFS

Much of OPCS's development effort has gone into providing a reliable system for managing the data collected through CAI. Standard microcomputer packages have been used as far as possible. For each month the planned system is:

(1) a new address sample from the automated Postcode Address File on mainframe is moved to the LFS CAI database (CLIPPER) on a 386 microcomputer which already holds recall interview data (i.e. from the last interviews);
(2) recall interview data is output from the database and converted to BLAISE for telephone interviewing on up to 30 networked desktop computers and, for face-to-face interviewing, organised in 100 interviewer workloads;
(3) each face-to-face interviewer receives two diskettes (BLAISE for interviews and CROSSTALK for telephone transmission) for use with an LTC (currently Toshiba T1000) and external modem;
(4) face-to-face interviewers contact households as suits them within the weekly placing pattern; telephone interviewers are allocated work manually by supervisors;
(5) for recall interviews any changes since the last interview overwrite the former answers (new routing is automatically calculated and redundant codes deleted);
(6) after each day's work, face-to-face interviewers enter occupation and some other codes and administrative details on screen, then they convert the day's partially and wholly completed work to text files which they telephone to the CLIPPER database via a computer mail sub-system (BT GOLD);
(7) there is a similar pattern for the telephone interviewers, who use microcomputers networked to the database;
(8) a field management system reports the current status of each sampled household;
(9) completed work for the month is sent to a more powerful database (SIR) for such processes as automatic imputation, derived variable creation, inter-wave linkage, final production and storage. There is no manual coding or editing.

* This sizing is given only as a rough guide: perhaps double this number of households could be included. Moreover, the Quarterly LFS, unlike many surveys, needs the BLAISE conversion program on the diskette, halving the space for survey data.

5. Trials and results

Two small-scale trials were carried out in March and November 1987. The second trial used BLAISE and telephone recalls as well as face-to-face interviews. The purpose of these trials was to test the acceptability of CAI to the public and interviewers. The results were very encouraging, so a full prototype system was created for a large-scale test in November 1988. The sample for this trial comprised 320 addresses which were newly drawn and 570 households which had already been interviewed five times in the Quarterly LFS; 315 of the households for recall had agreed to be interviewed by telephone. The work was carried out by 20 face-to-face and 6 telephone interviewers, selected by the usual criteria for allocating Quarterly LFS workloads. They comprised about 10% of the panel of Quarterly LFS interviewers. Very few had any previous experience of computers or keyboard skills. The trial showed that the system outlined in section 4 was capable of carrying out Quarterly LFS data collection in present timetables (with capacity for improvement) and achieving the same level of response. Nearly all interviewers said that they preferred CAI to PAPI. Analysis of the data, which have been successfully passed to the mainframe system, will include runs of the edit program to check that errors were detected in the CAI interview, and checks on the levels of change recorded. The main problems encountered concerned interviewer training, the automated call scheduling for CATI and some features of the LTC used by face-to-face interviewers. The attempt to give interviewers practice at home in advance of office training in the use of LTCs was counter-productive: interviewers arrived for office training more apprehensive about computers than had the interviewers in previous trials who had not had the LTCs in advance. For a trial which started in February 1989 (see below) we gave no advance training but ran a 2-day residential course with up to 3 further days allowed for practice at home. This appears to have been very successful.
The CATI call-scheduling system did not deal adequately with the range of different situations that were created by the interaction of many factors, such as the need to spread appointments, work flow (particularly the balance of new and previously unsuccessful calls), criteria for the timing of repeated attempts to contact households, and interviewers' shifts, so the telephone interviewers reverted to a manual system. The weighting algorithm was very complex; it could be seriously distorted by new situations and could not be altered quickly. It was decided that the production system would use the reliable manual allocation system, at least initially. The Toshiba T1000 was selected because it was much lighter (at 6.1 lbs) than other machines which met our needs. Interviewers found the screen less clear than they would have liked. However, when asked if they were prepared to use heavier LTCs with better screens they preferred to continue with the T1000. The way that the interview was programmed required considerable battery use outside the actual interview, to avoid a hiatus in the household while programs were loaded, and this led to batteries sometimes running out within the working day. For the February 1989 trial, the start of the interview was improved so that batteries are used just for interviews; this appears to have solved the problem.

6. Future developments

The success of the trial in November 1988 was such that OPCS intends to proceed to full implementation. The procurement procedure through selective tender above GATT limits is lengthy; it is intended to replace PAPI for the Quarterly LFS with CAI in September 1990. In the meantime, several trials are envisaged to refine the CAI system. In February 1989 a sample in the same PSUs as the conventional survey had the first of its 5 interviews using CAI. In the autumn of 1989 there will be volume tests with simulation of full-size data being returned by face-to-face interviewers in the conventional survey. A large amount of interviewer training (100 face-to-face interviewers in the first month of CAI) can only take place just before the start of the production system, and this will need careful coordination with the other demands on the training staff during the period. The CAI system and the training of all the people operating it must be good enough to take over seamlessly and reliably from the highly successful existing PAPI system in September 1990. Investigation of the potential uses of CAI on other household surveys carried out by OPCS has begun.

Postscript (May 1990): the project has continued to develop as envisaged in the paper. The procurement process has been completed with the acquisition of improved CATI microcomputers and face-to-face laptop computers which give satisfactory speed and battery performance. Large-scale training, dress rehearsal testing and transfer of data from the current computing system to the CAI system have been arranged for July and August, ready for the first CAI fieldwork in September.

Reference

Denteneer, D. et al, Blaise, a New Approach to Computer Assisted Survey Processing (Netherlands CBS, BPA no. 13436-87-M3, 1987)

DEVELOPING COMPUTER ASSISTED INTERVIEWING ON THE LABOUR FORCE SURVEY: A FIELD BRANCH PERSPECTIVE

Nora Blackshaw, Dave Trembath and Alison Birnie

Introduction

To date Field Branch have been involved in four separate fieldwork trials involving computer assisted interviewing (CAI).
The first two trials, designed to test the feasibility of interviewers using computers during the interview, have already been reported on (Survey Methodology Bulletins Nos 21 and 23). The aim of the latter two trials has been the development and testing of a questionnaire suitable for main stage fieldwork on the quarterly element of the Labour Force Survey (QLFS) and the testing of other computer systems needed to run the survey. These trials have allowed Field Branch to gain valuable practical experience in the training of interviewers and in working with research staff to eliminate sources of potential error in the QLFS questionnaire program. As described elsewhere in this bulletin (paper by T. Manners), the QLFS is a panel survey in which, at re-interview, virtually all of the data collected at the previous wave is checked for changes. The QLFS is a mixed mode survey and, as a consequence, the QLFS computer assisted interviewing program must work for both face-to-face and telephone interviewing.

1. Developing a questionnaire suitable for computer assisted interviewing

The design of questionnaires on all surveys involves reaching a compromise between the needs of the survey researcher and the needs of other groups of staff working on the survey: the interviewers, data prep, coders and programmers. In this respect, designing computer assisted questionnaires is no different except that fewer groups of staff are involved in the process. But, in several respects, designing computer assisted questionnaires is different from designing paper questionnaires in that more effort needs to be put into the design and testing of the questionnaire before main stage fieldwork begins and, in particular, more thought needs to be given to the needs of interviewers. In computer assisted interviewing, interviewers are forced to follow a particular routing and to resolve range and consistency errors before moving to the next question. If the questionnaire software is poorly designed or the questionnaire program does not cater for all eventualities (a not uncommon occurrence on complex factual surveys) the interviewer can be left in a straitjacket, either trying to force the situations encountered through an inappropriate routing or to force answers into question categories that do not fit. Throughout the trials on the QLFS, with the exception of the first, Social Survey Division has used a questionnaire software program called "Blaise" which has been developed by the Netherlands Central Bureau of Statistics. The software has a number of features which have allowed a very sensible compromise to be reached between the needs of the researcher to obtain good quality data and of interviewers to respond flexibly to the situations they meet. These features include:

a) Don't Know/Answer Refused Facility. When a respondent refuses to answer a question or genuinely cannot give an answer the interviewer, working on a paper questionnaire, can simply note this fact and move on. On CAI these situations must be allowed for. Rather than offering "don't know" and "refused to answer" as precode categories with appropriate routing at every question, the Blaise software designates a key for each of these responses. These keys make no assumptions about routing and the program simply presents the next question to the interviewer.
Although the next question may not be appropriate, by repeating the "don't know" or "answer refused" key the interviewer can continue moving through the questions in sequence until the next appropriate question appears on the screen.

b) Hard and Soft Consistency Checks. When the answers to two or more questions fail a consistency check, the interviewer is asked to reconcile the situation. In doing so the respondent may confirm that the original answers were correct even though this combination may appear to be impossible. With a "hard check" the interviewer cannot move on until the inconsistency is resolved, and faces a dilemma if, rightly or wrongly, the respondent insists that both answers are correct. On the other hand, the "soft check" draws the interviewer's attention to the apparent inconsistency; but, if it cannot be resolved, the interviewer can confirm that both answers are correct and move on. Both checks have their place in the questionnaire program but the "soft check" certainly allows the interviewer more scope to deal with unanticipated situations.

c) Correction of information after it has been entered. Some software programs do not allow corrections to be made once the answer is confirmed by pressing "Enter". Such an arrangement would prove very frustrating for interviewers, as information frequently emerges in the later stages of an interview which involves amending previously given answers. The Blaise program allows the interviewer to go back to change earlier data and provides a corrected routing where appropriate.

d) Skipping blocks of questions. In some situations, it is important to be able to skip a block of questions and return to it at a later point. For example, much of the work of Social Survey Division involves interviewing households where a separate set of questions is asked of each adult household member. On some surveys each household member must be interviewed in person whilst, on others, one delegated person can answer for all household members, as on the QLFS. Even in the latter situation an allowance needs to be made for cases where the person delegated does not wish to take responsibility for answering on behalf of another household member. If the questionnaire software program forced the interviewer to deal with each person in a prescribed order, this would result in interviewers having to make more calls at some addresses to complete the interview. With the Blaise software, the back correction facility allows household members to be interviewed out of sequence simply by adding an extra question which asks the interviewer to choose for each household member whether to interview "now" or "later". "Later" allows the respondent's individual questionnaire block to be skipped. At a subsequent call, amending "later" to "now" brings up the uncompleted questionnaire block.

Apart from features which allow the interviewer to respond flexibly to situations encountered in the field, Blaise also has features which can make the interviewer's job easier. For example:

a) Post Interview Coding. On a number of surveys carried out by Social Survey Division, interviewers are asked to carry out some administrative tasks at home, the most obvious example of which is the common practice of asking interviewers to classify each job by industry and by the type of occupation. On the QLFS, these data occur at different points in the interview: main job, last job, second job.
One useful facility Blaise has is that it allows all the answer descriptions which require coding to be moved to a separate block. This block can be separately accessed and interviewers routed only to those items which require attention. Checks can be built in to ensure that all post-interview coding has been carried out.

b) Menus and Help Screens. The researcher is able to adapt menus, help screens and exiting instructions (the latter always present when the questionnaire is in progress). This gives interviewers greater confidence that screen instructions will be meaningful and helpful.

Generally Field Branch staff have been very satisfied with the way the Blaise program has operated on the QLFS. Having said this, there are two enhancements we would like to see made. Firstly, in the interview, it would be helpful if the answer recorded was highlighted prior to pressing "Enter", to assist the interviewer in checking that the correct code has been keyed. As it stands, the interviewer has to look away from the part of the screen which presents the question and code list to check what has been entered in the answer space. Secondly, it would be useful if the program could print a paper version of the questionnaire in a form which could be used in the field by training staff assessing interviewer performance (see section 5).

2. Developing questionnaire programs

It is a fallacy to think that a paper questionnaire can simply be translated into a questionnaire program. The rigid nature of CAI means that much more thought needs to be given to the needs of interviewers. The program must take account of the practical situations an interviewer may be faced with; and answer codes, routing and consistency checks need to be well tested. This is likely to mean that research staff will need to work more closely with field staff to identify and resolve potential conflicts between interviewer working practices and the questionnaire program, although a number of those already identified on the QLFS are likely to be common to a number of our surveys. A second point relating to questionnaire design is that it is more difficult for staff not trained in the questionnaire programming language to suggest sensible solutions to problems. On the QLFS it has proved more efficient, once the questionnaire has been set up on the computer, for the researcher and field officer to reach the best compromise by discussing the areas causing concern and what possible solutions Blaise may be able to offer. Finally, it is much more difficult for staff to identify routing errors in CAI. As a consequence, customer testing should form part of the questionnaire design phase.

3. Training interviewers in computer assisted interviewing

To date Field Branch has trained 50 face-to-face interviewers and 26 telephone interviewers from the pool of interviewers who work on the QLFS. Interviewers were selected in a systematic way to ensure that they were broadly representative of the interviewing force as a whole. In addition, training was given to four face-to-face interviewers who had not worked on the QLFS. (We have not as yet attempted to train any new recruits in CAI as a first survey.) All interviewers successfully made the transition to computer assisted interviewing. The training adopted in the latter trials, and the one we shall use when training interviewers for the main QLFS, consisted of:

Telephone interviewers – 10 hours practical work, fully supported by trainers, with recourse to further training as required.
Face-to-face interviewers – 2 days office training plus home study and practice interviewing. (It should be noted that face-to-face interviewers needed to be trained in a wider range of tasks.)

The training for both groups of interviewers concentrated heavily on the use of practice tapes and practice interviewing. Both are felt to be important. The tapes allow staff to build in a variety of situations so that interviewers gain experience in handling these. But, when using the tapes, interviewers do tend to concentrate on the lower half of the screen where the answer spaces occur. Practice interviews, on the other hand, allow interviewers to familiarise themselves with the very different appearance of the top half of the screen, which contains the question, instructions and, in some cases, a list of answer categories. A number of points of wider interest emerged from the training given on the various trials.

a) There is apprehension amongst interviewers who have not worked on CAI as to how well they will be able to cope with the new technology. This has eased somewhat as a growing number of interviewers have become involved in the various trials but is still present to some extent. Several of the interviewers who attended training described themselves as the non-technical member of the family who could not cope with modern pieces of equipment, e.g. the tape recorder, the video etc. Because of this unease, in designing training, we have deliberately moved away from sending detailed advance instructions and towards a brief reassuring note.

b) Those with some interviewing experience but who had not previously worked on the QLFS coped as well with the computerised survey as those already working on the survey. In some ways, in fact, they found the task easier. Interviewers already trained on the survey found the change in the structure and appearance of the CAI questionnaire disorientating, particularly if they needed to go back to alter an answer.

c) There was considerable variation in how quickly individual interviewers came to grips with CAI. About 20% of interviewers were noticeably slower than the average and a further 20% much quicker. This means that any initial training has to cope with a wide variation in assimilation, and for this reason we made extensive use of tape recorded interviews so that interviewers could work at their own pace.

d) Interviewers on the final trial re-interviewed households at quarterly intervals. This meant that interviewers had a 2-month gap after the initial quota before resuming work. Some home study materials were provided to act as refresher training. These worked very well, with interviewers having few problems using the computers on their second quota.

e) Learning how to handle the questionnaire proved more difficult for interviewers than learning how to back up data, convert from Blaise to ASCII (a general format) and transmit data. This was partly because Survey Branch were able to streamline the non-questionnaire programs and provide clear error messages, and partly because these procedures consist of following a list of printed steps. In the latter training courses a decision was taken to spend less time on practical training in these areas, allowing more time to be spent on practising questionnaire handling. This worked very well.

f) The areas of CAI which interviewers found most difficult were making corrections to entries, finding and altering previously answered questions, and dealing with consistency checks.
The first of these, making corrections to entries, was initially confusing because the number and nature of the steps needed to correct an entry varied considerably depending on the nature of the error. Once learned, however, this ceased to be a problem. Finding and altering previously answered questions should become easier once interviewers become accustomed to the new order and appearance of the questions, but is always likely to be more difficult than on paper questionnaires because the interviewer no longer has to take decisions about which question to ask next. Finally, consistency checks are always likely to be an area on which training will need to concentrate. They will often occur when the interviewer has not noticed that two items of information are in conflict; and, as a result, the interviewer can be disconcerted when the consistency check message appears and totally reliant on the error message for cues as to how to resolve the problem. In Blaise, the error message indicates which questions are in conflict using the questions' eight-letter names, e.g. "ReltoHOH" or "Avpothrs", and the answers given to each. In addition, an explanatory note can be provided. Although some consistency error messages are self explanatory, others are less so. Examples of each are given below.

Example 1: Self Explanatory Error Message

In this example, which occurs at the question named Age, the explanatory text makes it very clear how to correct the error.

    HOH, Wife and parents or grandparents can't be under 16. Change Age or ReltoHOH
    Person[1].ReltoHOH = HOH
    Person[2].Age = 15

Example 2: Complex Error Message

At the end of the interview, the interviewer is routed past the administrative block to a section which asks the interviewer to confirm that all household members have been interviewed, as a means of ensuring that no one is overlooked. At home, after completing the administrative section, the routing takes the interviewer through the same sequence, where some codes need to be altered to confirm that not only all interviewing but also all administrative tasks have been completed. If the interviewer miskeys at the end of the interview, the result is a rather complex message as shown below.

    Someone's coding block still has to be checked (go to CodChk) or DoneCode should be No (go to DoneCode) or IntvNow is wrong (go to IntvNow)
    Person[1].IntvNow = Now
    Outcome.DoneCode = Yes
    C[1].CodChk = Empty

g) With face-to-face interviewers working from home, it is essential that the majority of problems they encounter in using the equipment can be resolved by telephone discussion. This proved to be easier than expected. Even on one trial where the interviewers' office training proved to be inadequate and where we received many calls from interviewers, all problems were resolved by telephone. There were no cases where staff had to be despatched to give further tuition in person.

4. Interviewer reactions to computer assisted interviewing

On the whole both telephone and face-to-face interviewers were very enthusiastic about computer assisted interviewing. When asked, approximately 90% said that they preferred the new method of interviewing to using paper questionnaires. By far the most common reason given was that it removed from interviewers the burden of deciding which questions to ask next. This is not surprising in that the QLFS has particularly complex routing instructions.
Other reasons given were that the interviewers felt a great sense of achievement that they had coped successfully with the new technology and felt that, with the spread of computers, their use for interviewing enhanced the job's professionalism. In addition to the above points, the telephone interviewers found the fact that respondents could hear them entering the data very helpful in controlling the interview. (There is a tendency for telephone respondents to try to fill the gap if there is a pause whilst the interviewer records a verbatim answer on the paper survey.) The areas of computer assisted interviewing with which the interviewers were least satisfied were the screen quality (face-to-face interviewers) and the pace of the interview (both groups of interviewers). Face-to-face interviewers used Toshiba T1000 laptops. These machines have an LCD supertwist screen display with the ability to adjust the angle of the screen and the degree of contrast. Interviewers were advised to give more thought to where they sat in relation to lighting and, if necessary, to ask whether lights could be switched on. Even so, interviewers reported that in 20% of interviews the screens were difficult to read. In most cases they did manage to complete the interview, although in just under 1% of cases the lighting was so poor that the interviewer was forced to revert to paper questionnaires. Despite these difficulties, interviewers on the whole regarded the screens as adequate and preferable to paper questionnaires. The majority of interviewers also preferred a lightweight machine (the Toshiba's all-inclusive weight, including carrying case, papers and transformer for emergency use, was 8.5 to 9 pounds) to a heavier machine weighing 14 to 15 pounds with a better quality screen. Interviewers reported that the speed with which the next question appeared on the screen was only just adequate, particularly when dealing with large households. Staff at the Netherlands Central Bureau of Statistics are already working on improvements to the Blaise program which are designed to deal with this problem. Within the last few months, a number of new or improved lightweight laptops have been put on the market (including a new version of the Toshiba T1000). These latest laptop models generally run at a faster speed and have better quality screens than the Toshiba T1000s used in the trials. It is therefore expected that the laptop model selected for main stage interviewing will be more than adequate for our requirements.

5. Reaction of interviewer trainers to computer assisted interviewing

As part of the trials, Field Branch needed to consider what effect the move to CAI would have on the way in which interviewer trainers carry out their role of training and assessing interviewer performance. Several trainers were given the same training as interviewers and then asked to accompany two CAI interviewers on their fieldwork. Trainers were asked to record the interview on a laptop (in effect both interviewer and trainer recording the interview) as this was the only way the trainer could check that questions were correctly asked and probed. After the interview, the trainers were instructed to discuss the interviewer's performance in the usual way. Trainers reported that there were no adverse comments from members of the public to two people entering answers on laptops. They did, however, find that it was impossible both to keep a detailed record of the interview and to key in the respondent's answers.
If they stopped to make a note, they were in danger of missing the next answer; and, if this formed part of the questionnaire routing, they were unable to rejoin the interview in the way that they can on a paper questionnaire. As a consequence they felt that their feedback to interviewers was impressionistic and incomplete. From our discussions with the trainers, we have come to the conclusion that, in order to do their job effectively, trainers will need some form of paper questionnaire, although the style of this document may be quite different from questionnaires used as data entry documents. Trainers also reported that they felt at a disadvantage when joining interviewers part-way through the CAI quota. By this time the interviewer was feeling quite confident about using the equipment. Trainers, on the other hand, felt less confident and were particularly concerned about whether they would be able to help the interviewer if assistance was needed during the course of an interview. It was felt that trainers needed to undertake some practice interviewing in order to consolidate their office training and home study. In addition, as trainers will not be using laptops regularly in the field, thought needs to be given to ways of maintaining their expertise and confidence.

6. In conclusion

We now have considerable experience in training interviewers to use computers on the Labour Force Survey. The trials have shown that all interviewers can successfully learn how to use the new technology but that there is considerable variation in the speed with which they grasp the new working techniques. Those designing CAI training will need to take account of this fact. In designing questionnaires for CAI, much thought needs to be given to providing interviewers with an acceptable working environment and to minimising the chances of an interviewer meeting an unresolvable problem in an interview. Testing of questionnaires needs to be much more thorough than for paper questionnaires. One point that other researchers should bear in mind is that the QLFS is a fairly short interview with complex routing but relatively simple consistency checks. Those working on surveys with large numbers of consistency checks, or where the consistency checks are very complex, may need to consider whether interviewers should be expected to resolve all inconsistencies during the interview or whether there remains a role for a final office editing stage.

TOTAL HOUSEHOLD INCOME AS MEASURED BY GENERAL HOUSEHOLD SURVEY AND NATIONAL TRAVEL SURVEY

Daniel Kelly

This article presents a secondary analysis of data from the 1986 General Household Survey (GHS) and the 1985/86 National Travel Survey (NTS), relating to income. The variable 'Usual gross household income' on the NTS was compared to the variable of the same name on the GHS to see how similar the data were. The GHS collects 'Usual gross household income' in the form of separate amounts from different sources which are then added together; the NTS simply asks the informant to indicate the band into which it falls. Owing to differences in non-response in the two samples, the results are inconclusive. However, the apparent similarity of NTS and GHS income data found here suggests that further research might be useful.

Background

The GHS has a large core section including questions on Population and Fertility, Housing, Employment, Education, Health, and Income. Core questions are funded by OPCS and are selected partly because they are used by many clients.
In 1979, it was decided to expand the GHS income section so that it would be closer in design to the Family Expenditure Survey (FES), in particular to produce comparable measures of current income. Prior to 1979, GHS income data were used almost entirely for classificatory purposes, but it was felt that the new section would provide useful income variables for analysis in their own right. It was also suggested that FES and GHS income data could be combined in order to give a larger sample. Though the core now includes a 15-page income questionnaire, income data supplied to clients by the GHS still consist mainly of tables which use banded income data for classificatory purposes. Though means and medians are occasionally used, it is very rare for these tables to include component income information, e.g. on an individual benefit. The detailed income data are available on tapes supplied to some government clients and also on tapes deposited with the ESRC Data Archive for release to academics, but the GHS Unit is not aware of any use by secondary analysts of GHS and FES income data in combination. The GHS calculates the income variables used in analysis from the detailed income data which are collected. This is a time-consuming and expensive procedure which delays the production of GHS results. This procedure also lowers the effective response rate for income variables. Derived income variables on the GHS are not calculated for a case where component data, such as building society interest, are missing. The missing components usually make a negligible contribution to total income. In 1986, the GHS derived 'gross household income' for only 71% of households co-operating in the GHS. This gives an effective response rate of only 60% for this variable, increasing the possibility of bias. Recently the GHS has considered simplifying the income section by asking directly for the banded information used in the majority of analyses. Though largely modelled on the GHS, the Continuous Household Survey of Northern Ireland asks directly for the income variables used in analysis. The head of household (HoH) is asked into which income band the total household income falls. The proposal to simplify the GHS income section was put to the CSO Subcommittee on the GHS in 1989. In the same year, however, the Department of Social Security (DSS) proposed an extension of the income section for a project on the characteristics of poorer households. DSS agreed that banded data would be acceptable if they were sufficiently accurate. The NTS asks a simple direct question about household income. It also measures various household characteristics, such as the number of children and the number of cars in the household. It was therefore decided that secondary analysis of GHS and NTS income data should be carried out in order to throw light on the extent to which the measures of household income provided by the two surveys are similar.

Comparability of the Data from the Two Surveys

There were three ways in which the data were not exactly comparable. The first was that, on the two surveys, most questions on similar topics were not asked in exactly the same way. Question differences were largely overcome by regrouping the responses. For example, on the GHS the categories 'waiting to take up job' and 'seeking work' were combined to make the category 'unemployed', in line with the NTS. When regrouping responses to questions did not solve this problem, certain cases were excluded.
Eleven household variables were derived which, it was felt, had equivalent categories on the two surveys. These were: Number employed in household, Number of household cars, Working status of HoH, Number of adults in household, Number of children in household, Length of residence of HoH, Address type, SEG of HoH, Number of persons in household, Family structure and Tenure. These variables were then used for analysis.

The second problem of comparability was inflation. Data were taken from the GHS (Jan-Dec 86) and from the NTS (July 85-June 86). To deal with this problem, households on the GHS were split into six groups which were roughly sextiles but were formed to coincide with the proportions of households falling into NTS income bands. These six groups were then used for analysis.

The Proportions of Households in Each Income Group

Group A (highest):   18.6% of households
Group B:             16.1%
Group C:             16.7%
Group D:             15.7%
Group E:             14.1%
Group F (lowest):    18.8%

The third problem of comparability resulted from a difference in non-response between the two surveys, as shown below.

Response Rates for the Two Surveys

                                           GHS    NTS
a: response rate for survey                84%    76%
b: response rate for gross hshld income    71%    86%
c: effective response rate (a x b)         60%    65%

[All analyses reported in this paper were carried out without any imputation for missing income data.]

The GHS had a higher overall response rate. However, for reasons mentioned above, gross household income was only calculated for 71% of households co-operating with the GHS. The NTS, however, recorded this variable for 86% of cooperating households, so that the effective response rate in the NTS was higher. Non-response in the two surveys may have introduced bias. This bias may be different in the two samples.

Analysis

The first part of the analysis compared the income characteristics of sub-groups from the two samples. See Table 1. For each sub-group (e.g. one-person households) the percentage of households falling into each income sextile on one survey was compared to the other. As can be seen in Table 1, the percentage of one-person households falling into each income group is not significantly different in the two surveys. The same is true for one-car households. The above analysis was carried out for each value of the eleven variables listed above. The results obtained from this analysis were generally as similar as the examples shown here in Table 1. The DSS is particularly interested in an analysis of the characteristics of poorer households. It was therefore decided to compare the characteristics of households in the lowest income group on the two surveys, i.e. group F, those in the bottom 19% of the household income distribution. We first checked whether the overall distribution of each of our eleven variables was similar in the two samples. A variable was considered similar on the two surveys if none of the pairs of percentages showed a difference significant at the 5% level. Unfortunately, a comparison of the two samples showed that eight out of the eleven variables had significantly different distributions for the samples as a whole. This is probably due to the differences in sample bias discussed above. These eight variables were excluded from this part of the analysis. The other three variables are presented in the accompanying diagram.
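The significance criterion used in these comparisons is the one stated in the notes to Table 1 below: a two-tailed test at the 5% level, with the standard error of the difference between two percentages inflated by a design factor of 1.3. A minimal sketch of such a test (in Python, assuming the usual two-sample formula for the standard error; names are illustrative):

    import math

    def percentages_differ(p1, n1, p2, n2, design_factor=1.3, z=1.96):
        # Two-tailed 5% test of the difference between two sample percentages,
        # with the simple-random-sampling standard error inflated by a design
        # factor to allow for the complex sample designs.
        se = design_factor * math.sqrt(p1 * (100 - p1) / n1 + p2 * (100 - p2) / n2)
        return abs(p1 - p2) > z * se

    # Group A, one-person households (Table 1): GHS 2.8% of 2103 vs NTS 2.5% of 2321.
    print(percentages_differ(2.8, 2103, 2.5, 2321))   # False - not significant ('y')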
Table 1: Income Distribution for One-Person Households and One-Car Households

                      One-person households       One-car households
                      GHS %   NTS %   C.I.        GHS %   NTS %   C.I.
Group A (highest):     2.8     2.5     y           19.5    19.4     y
Group B:               3.4     3.8     y           20.8    20.3     y
Group C:               8.5     8.6     y           23.7    22.1     y
Group D:              14.0    12.6     y           19.6    20.4     y
Group E:              16.2    15.8     y           11.7    13.4     y
Group F (lowest):     55.1    56.7     y            4.7     4.3     y
Total %:             100     100                  100     100
Base:               2103    2321                 3172    3923

(C.I. = confidence interval, with a standard error of sqrt(p1(100-p1)/n1 + p2(100-p2)/n2), where pi is the percentage and ni the base for survey i)

(y = yes, the difference between the two percentages is not significant at the 5% level, applying a two-tailed test and assuming a design factor of 1.3)

The proportion of households in the lowest income group owning or buying their homes is 28.8% in the GHS and 30.7% in the NTS. This is the largest difference that was found, and it could be due to sampling error alone. None of the three characteristics it was possible to compare in the lowest income group on the NTS is significantly different from those in the lowest group on the GHS. On these three variables, the banded income question from the NTS seems as good for classificatory purposes as the derived variable from the GHS.

Conclusion

The results presented in this report are inconclusive. Though the income data from a direct income question seem to be similar to GHS data, it is unclear what effect the sample differences had on the analysis. A comparison of detailed and banded income data collected from the same individuals would be useful.

MEMORY PROBLEMS IN FAMILY BUDGET SURVEYS: I. DIARIES

Bob Redpath*

Last summer in Paris at the International Statistical Institute Conference a number of papers were presented which dealt with the topic of memory errors in survey research; three papers dealt with research carried out on behalf of the Bureau of Labour Statistics (BLS) into memory errors which occur on the US family budget survey, the Consumer Expenditure (CE) Survey, and also how to reduce these errors**. References to Jacobs below relate to the paper by Jacobs, Jacobs and Dippo. This article attempts to cover the research findings about similar memory problems which occur on two continuous diary surveys carried out in Great Britain, the National Food Survey (NFS) and the Family Expenditure Survey (FES).

* Bob Redpath was the Principal Social Survey Officer in charge of the Family Expenditure Survey from 1976 to early 1990.
** "The U.S. Consumer Expenditure Survey", E Jacobs, C Jacobs, C Dippo; "The Use of Cognitive Laboratory Techniques for Investigating Memory Retrieval Errors in Retrospective Surveys", C Dippo; "Reduction of Memory Errors in Survey Research: a Research Agenda", J Lessler. I.S.I. 47th session, Paris, August 29 – September 6 1989.

The NFS is sponsored by the Ministry of Agriculture, Fisheries and Food and is carried out continuously throughout the year to provide the basis for National Accounts estimates of consumer expenditure on food and for economic and nutrition analyses. The fieldwork and coding are currently contracted out to British Market Research Bureau: OPCS draws the sample and oversees the methodology of the survey. Roughly 7,800 housewives cooperate each year, a response rate of about 58%. One person, the housewife (the person who is responsible for most of the purchased food brought into the home), keeps a seven-day diary record of expenditure on food brought home by her/himself and all other members of the household, together with a record of their menus and whether or not each member of the household was present at the meal. Demographic details about the household are collected and there is one retrospective question which asks the respondent to recall how much food was bought for the household during the week prior to the date of interview.

The FES is mainly sponsored by the Central Statistical Office and its chief purpose is to provide the CSO with up-to-date expenditure weights for the Retail Prices Index. The data are also used in some of the estimates of consumers' expenditure in the National Accounts.
As well as expenditure data, the FES collects detailed information about income, and this is widely used by other government departments in analyses of the impact of policy, such as the take-up of income-tested state benefits and the impact of existing or alternative taxation systems. OPCS carries out the sampling, fieldwork and coding of the survey as well as methodological studies. All persons aged 16 years or over (called 'spenders') are asked to keep 14-day diaries of all personal expenditure and business expenditure (which is ultimately deleted). In addition there is an initial interview, before diaries are placed, in which retrospective questions are asked about both regular and infrequent or large expenditure, and about income sources. The approach to collecting data by the retrospective method varies from asking about the last payment/receipt to asking for expenditure or income over a fixed period, which can range from as little as a week to a year from the date of interview. All spenders are asked to keep diaries and all spenders are asked the interview questions; in fact cooperation is only counted if all spenders offer to keep diaries and try their best to provide information in the interview. An incentive of a £5 postal order is sent to each spender several weeks after the completion of the survey. Response is around 70-72%, yielding about 7,200 households a year in Great Britain.

Memory problems on family budget surveys can be divided into memory problems that occur when respondents are asked to keep diaries and memory problems that occur when respondents are asked in an interview to recall transactions that have occurred in the past. Kemsley referred to these broadly as current enumeration and post enumeration. This article will be presented in two parts, the first treating problems encountered with diary recordkeeping and the second retrospective recall.

Jacobs mentioned problems encountered with the CE diary survey, which is one of two BLS surveys measuring consumer expenditure. Fourteen-day diaries are used, as with the FES. Even though Bureau of Census interviewers are instructed to place diaries on specific days of the week in order to ensure an even distribution within the week, it is still the case that the first day of recordkeeping shows higher expenditure than subsequent days, and that expenditure in the first week is greater than that of the second week. Jacobs concluded that respondents use the first day more than the following days to comply with diary reporting and that interviewers check the first-day page more carefully than the other pages. Jacobs also mentioned a cueing effect on diaries which seemed to occur when items were described on diaries. Listing more examples on the diary page had a beneficial effect on reporting. Research also showed that, if diaries had specific items printed on lines allowing respondents to record only the price, this led to greater first day bias than diaries with no preprinted descriptions.
First day effect

There is conclusive evidence from the American CE diary survey that recordkeeping on the first day is higher than on subsequent days. However, this has not been the case with the FES on the occasions when it has been tested. It should be pointed out that OPCS interviewers do not have to "place" diaries (that is, to get diary keeping started) on specified days of the week, as the Bureau of Census interviewers do with the CE diary survey. Table 1 below compares the average total household expenditure per recordkeeping day as measured during February-March 1978 and also during two experimental periods in 1976 and 1978. In order to facilitate comparison, averages per recordkeeping day are expressed as indices based on the average expenditure per day over the fortnight (each day's average expenditure divided by the average daily expenditure over the fortnight, multiplied by 100).

Table 1: Expenditure per recordkeeping day as an index of average daily expenditure (average = 100)

                 Family Budget Survey    Family Budget Survey    FES
                 Experiment 1 (1976)     Experiment 2 (1978)     (Feb-Mar 1978)
Day 1                    123                     102                  110
Day 2                    110                     121                  111
Day 3                    113                     125                  123
Day 4                    104                     123                  104
Day 5                     92                      77                   96
Day 6                     78                      67                   86
Day 7                     96                      59                   77
Day 8                    107                     112                  100
Day 9                     98                     104                  102
Day 10                    98                     128                  120
Day 11                   104                     127                  109
Day 12                    92                     103                   92
Day 13                    81                      66                   82
Day 14                   102                      87                   85
Base (per day)         £9.03                   £6.43                £4.60

N.B. It should be pointed out that the experimental figures are higher than those on the FES because they include all outgoings and not just expenditure.

The experimental periods warrant some explanation. During the mid-1970s there was concern about under-recording of alcohol and tobacco expenditure in relation to National Accounts. Two feasibility studies, entitled the Family Budget Survey experiments, tested a methodology which asked respondents to balance all cash outgoings against cash incomings on a daily basis. Respondents were asked to count and record all cash in the house at the beginning of each recording day; then during the day to record all cash outgoings and incomings. At the end of the day the incomings were added to the cash counted at the beginning of the day and the outgoings subtracted. The calculated residual should equal the cash in the house remaining at the end of the day, which was again counted. Any difference was an indication that either outgoings or incomings had been unrecorded, depending on the sign of the difference. Overall the amount of unrecorded expenditure was very little, roughly one per cent of total outgoings; so the discipline of balancing and probing for missing recorded expenditure meant that recordkeeping was nearly perfect for a 14-day period.

Table 1 shows that only in the first Family Budget Survey experiment was first day expenditure higher than on all other days. In the second experiment the highest days occurred on days 10 and 11 and in the FES, on the third and tenth days. On the other hand all three studies showed similar patterns of above average expenditure on days 1-4 and below average expenditure on days 5, 6 and 7. Similarly, expenditure on the last two days of the second week was generally lower than on other days. What is noteworthy is the bi-modal feature of all three indexes. The upsurge in expenditure in the second week would seem to argue against the notion that respondents become increasingly fatigued as recordkeeping progresses. Or, put another way, record-keeping fatigue does not explain these distributions. Therefore other possible causes were investigated.
Table 1 shows that only in the first Family Budget Survey experiment was first-day expenditure higher than on all other days. In the second experiment the highest days were days 10 and 11, and in the FES the third and tenth days. On the other hand all three studies showed similar patterns of above-average expenditure on days 1-4 and below-average expenditure on days 5, 6 and 7; similarly, expenditure on the last two days of the second week was generally lower than on other days. What is noteworthy is the bi-modal feature of all three indices. The upsurge in expenditure in the second week would seem to argue against the notion that respondents become increasingly fatigued as recordkeeping progresses; put another way, recordkeeping fatigue does not explain these distributions. Therefore other possible causes were investigated, and two hypotheses were tested to try to explain the pattern.

First, it was assumed that recordkeeping would increase in anticipation of, or directly following, the interviewer's return visit to the household. Table 2 shows the distribution of interviewer return visits to 564 households during February-March 1978 by recording day. The number of visits per recording day is divided by the average number of visits per day over the 14-day period and expressed as an index with a base of 100.

Table 2 Index of interviewer return visits per recording day

Day of recording:   1    2    3    4    5    6    7    8    9   10   11   12   13   14
Index:              3   10   62   98   93  101   83  176   94   37   37   20   23  566

Base = 125 visits per day during the fortnight

The peaks for interviewer return visits occur on days 8 and 14, and yet, as Table 1 shows, there appears to be no anticipatory increase in spending before either of these days; nor is there any reactive increase in expenditure in the days immediately following the two peaks in the distribution of interviewer return visits.

Another hypothesis was that the recordkeeping pattern could be brought about by uneven placing of diaries within the week. OPCS interviewers are asked to place diaries with roughly four to five households each week over a month; however, they are not required to place on specific days of the week (unlike their American counterparts). Table 3 compares the days of the week on which FES diaries were placed in the February-March 1978 study and throughout 1976.

Table 3 Percentage of FES diaries placed on each day of the week

Day of week        (Feb-Mar 1978) % placed    (1976) % placed
Monday                    16                        18
Tuesday                   22                        23
Wednesday                 23                        22
Thursday                  20                        18
Friday                    14                        12
Saturday                   4                         5
Sunday                     1                         2
Base (nos) = 100%       1179                      7051

The placing patterns were fairly consistent between the two studies in that almost two thirds of households commenced diary recordkeeping on a Tuesday, Wednesday or Thursday. The effect of this skewed placing pattern, as shown in the first column of Table 4, was to concentrate a higher than expected proportion (3 out of 7 = 43%) of Thursdays, Fridays and Saturdays in days 3/10 (65%), 4/11 (61%) and 2/9 (57%). The second column (Index) shows the percentage for each day divided by the expected percentage (43%), multiplied by 100.

Table 4 Percentage of Thursdays, Fridays and Saturdays occurring by day of recordkeeping

Day of recordkeeping      %     Index
1/8                      38       88
2/9                      57      132
3/10                     65      151
4/11                     61      142
5/12                     39       91
6/13                     21       49
7/14                     19       44

Based on the only data available (shown in Table 5), these three weekdays were the days of highest average expenditure: Thursday because of extended evening opening hours; Friday because wages were received; and Saturday for obvious reasons. (The 1978 data predate legalised Sunday shopping.)

Table 5 Average expenditure per weekday (1966-67)

                £
Monday        4.18
Tuesday       4.18
Wednesday     3.54
Thursday      4.83
Friday        7.08
Saturday      6.76
Sunday        1.61

It then became apparent that the irregular pattern of expenditure by recordkeeping day (Table 1) might be at least partially explained by the irregular placing pattern (Table 3) in conjunction with the irregular pattern of expenditure by weekday (Table 5); a sketch of this derivation follows below.
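The placing-pattern percentages in Table 4 follow mechanically from Table 3: a household placed on weekday w reaches recordkeeping day k on the weekday w + k - 1 (mod 7), and days k and k + 7 fall on the same weekday. The short Python sketch below is added here as a verification aid, not as part of the original analysis; it reproduces the published percentages exactly, and the indices it computes differ from the published ones by at most a point of rounding.

    # Derive Table 4 from the Feb-Mar 1978 placing distribution (Table 3).
    WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
    placed = {"Mon": 16, "Tue": 22, "Wed": 23, "Thu": 20,
              "Fri": 14, "Sat": 4, "Sun": 1}   # % of diaries placed per weekday
    HIGH_SPEND = {"Thu", "Fri", "Sat"}          # high-expenditure weekdays (Table 5)
    EXPECTED = 100 * 3 / 7                      # 3 of the 7 weekdays, about 43%

    for k in range(1, 8):  # days 1-7; days 8-14 repeat the same weekdays
        pct = sum(share for day, share in placed.items()
                  if WEEKDAYS[(WEEKDAYS.index(day) + k - 1) % 7] in HIGH_SPEND)
        print(f"days {k}/{k + 7}: {pct}%  index {100 * pct / EXPECTED:.0f}")
    # Prints 38, 57, 65, 61, 39, 21, 19 per cent, matching Table 4.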
Figure 1 compares three indices: week 1 of the February-March 1978 FES; week 2 of the February-March 1978 FES; and the index of the percentage of Thursdays, Fridays and Saturdays occurring on each recordkeeping day (the placing pattern index). The numbers on which Figure 1 is based are shown in Table 6.

Table 6 Indices of placing pattern and daily expenditure

                                   Expenditure indices (FES, Feb-Mar 1978)
Days     Placing pattern index     Day   Week 1      Day   Week 2
1/8               88                1      110         8      100
2/9              132                2      111         9      102
3/10             151                3      123        10      120
4/11             142                4      104        11      109
5/12              91                5       96        12       92
6/13              49                6       86        13       82
7/14              44                7       77        14       85

Figure 1: Comparison of index of expenditure per day (FES, Feb-Mar 1978) with placing pattern index

The first noteworthy feature is the remarkable fit between the curves of week 1 and week 2 of the FES data. In one sense this is reassuring, in that the biases are more or less equal in each of the two recordkeeping weeks. The bimodal characteristic also dispels the notion that recordkeeping fatigue increases with the length of recordkeeping. One must hasten to add that these conclusions can only be tentative, based as they are on one study. However, comparing the curves of the FES data with the curve of the placing pattern index seems to give substance to the hypothesis that uneven placing patterns may give rise to the variations in expenditure by day of recordkeeping. The peak in all three sets of data occurs on days 3/10 and the curves decline thereafter, although at varying gradients. One would not expect to explain day-by-day variation in expenditure by a single all-embracing hypothesis. Nevertheless the point that Jacobs made about the need to control diary placement by day of week is worth bearing in mind. In the case of the FES the biases to expenditure were not thought to be serious enough to warrant a major and costly revision of fieldwork procedures. It is worth noting that the US CE diary survey was designed from the outset to try to place diaries by day of week.

Inter-week variation

This clustering effect seems to be independent of any inter-week bias, i.e. the tendency for the first week's expenditure on average to be greater than that of the second week. Average household expenditure as recorded in the diaries in week 1 of FES recording has always been marginally higher than average expenditure in week 2. In 1978, the average expenditure per spender over the whole sample in week 1 was £32.63 and in week 2 £31.76, a decrease of 2.7%; in 1987 the decrease was 2% (Table 7). Table 7 shows the inter-week variation in 1987 for the following:
- average expenditure per sample (base = total sample)
- number of recording spenders
- number of transactions recorded
- average expenditure per recording spender (base = recording spenders)
Table 7 Inter-week variation by expenditure groups

Expenditure group                        Week 1     Week 2    Week 2/Week 1 x 100
Food
  average expenditure (£)                 19.14      18.30           96
  number of recording spenders            13315      13043           98
  number of transactions                 327531     304808           93
  average exp/recording spender (£)       20.39      19.90           97
Alcohol
  average expenditure (£)                  4.65       4.48           96
  number of recording spenders             6894       6526           95
  number of transactions                  20188      19285           96
  average exp/recording spender (£)        9.57       9.74          102
Tobacco
  average expenditure (£)                  2.51       2.36           94
  number of recording spenders             4680       4512           95
  number of transactions                  18956      17739           96
  average exp/recording spender (£)        7.59       7.43           98
Fuel, light and power
  average expenditure (£)                  3.60       4.01          110
  number of recording spenders             2250       2254          100
  number of transactions                   5900       5695           99
  average exp/recording spender (£)       22.50      25.26          112
Housing
  average expenditure (£)                  6.42       6.06           94
  number of recording spenders             2540       2413           95
  number of transactions                   4025       3748           93
  average exp/recording spender (£)       35.85      35.62          100
Clothing and footwear
  average expenditure (£)                  6.63       6.35           94
  number of recording spenders             5184       4889           95
  number of transactions                  11101      10465           94
  average exp/recording spender (£)       18.13      18.42          102
Durable household goods
  average expenditure (£)                  5.49       5.25           96
  number of recording spenders             4424       4133           93
  number of transactions                   7121       6808           96
  average exp/recording spender (£)       17.60      18.00          102
Other goods
  average expenditure (£)                  8.01       7.58           95
  number of recording spenders            12654      12213           92
  number of transactions                  85819      79296           92
  average exp/recording spender (£)        8.98       8.80           98
Transport
  average expenditure (£)                 12.54      13.05          104
  number of recording spenders             9210       8738           95
  number of transactions                  25391      24534           97
  average exp/recording spender (£)       19.32      21.17          110
Services
  average expenditure (£)                 12.99      11.96           92
  number of recording spenders             9325       8786           95
  number of transactions                  22257      20338           97
  average exp/recording spender (£)       19.76      19.34           98
Miscellaneous
  average expenditure (£)                  0.31       0.31          100
  number of recording spenders             1022        864           85
  number of transactions                  22257      20338           97
  average exp/recording spender (£)        4.37       5.21          120
All expenditure
  average expenditure (£)                 82.27      79.72           97
  number of recording spenders            14182      14048           99
  number of transactions                 530575     494379           93
  average exp/recording spender (£)       82.27      80.48           98

With the exception of fuel, light and power (+10%), transport (+4%) and miscellaneous expenditure (no change), all expenditure groups showed decreases in the second week. Admittedly significance tests have not been carried out; yet this study was based on recording by 14,182 spenders during 1987. The deterioration in recordkeeping is shown by the consistent decreases in the numbers of recording spenders and the numbers of transactions recorded in the second week for all expenditure groups (except for fuel, where the number of recording spenders grew by 4). The second balancing experiment, which used residual discrepancies to probe for unrecorded expenditure, did not exhibit any inter-week bias. It therefore seems likely that the FES results reflect a decline in recording rather than a decline in spending, with fewer spenders keeping records and more transactions being forgotten. There are wider implications of this kind of study for diary recordkeeping. Measuring inter-week variation for different sub-groups provides the researcher with clues as to which groups have difficulty in keeping diary records.
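The final column of Table 7, like the body of Table 8 below, is simply the second-week figure expressed as a percentage of the first. A minimal sketch, using the Food group figures from Table 7, is:

    # Week 2 as a percentage of Week 1, as in the final column of Table 7.
    def interweek_ratio(week1, week2):
        """Second-week figure as a rounded percentage of the first."""
        return round(100 * week2 / week1)

    # Food group figures from Table 7: (week 1, week 2)
    food = {
        "average expenditure (GBP)":           (19.14, 18.30),
        "recording spenders":                  (13315, 13043),
        "transactions":                        (327531, 304808),
        "avg exp per recording spender (GBP)": (20.39, 19.90),
    }
    for measure, (w1, w2) in food.items():
        print(f"{measure}: {interweek_ratio(w1, w2)}")
    # Gives 96, 98, 93, 98; the published column shows 97 for the last
    # figure, a difference of rounding convention only.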
Table 8 shows the inter-week variations in expenditure for sex and age sub-groups.

Table 8 Ratios of second to first week expenditure x 100, by sex and age (1987)

Age                 Men     Women
Under 31             94       102
31-49                99       101
50-64                85       102
65-75               103        98
Over 75              92        87
Base (spenders)    6708      7474

It would appear that up until age 65 the recordkeeping of women does not deteriorate in the second week, whereas it does for men of these ages. From age 65 onwards it appears that men are the better recordkeepers. Of course this does not take into account the fact that women make more expenditure transactions than men and therefore have more diary entries to make each day, as Table 9 shows.

Table 9 Average number of diary entries per day by sex and age (1987)

Age                 Men     Women
Under 31          24.33     39.84
31-49             24.93     60.10
50-64             22.15     49.20
65-75             21.18     40.19
Over 75           20.18     30.41
Base (spenders)    6708      7474

Women make almost double the number of diary entries that men do. There is also an ageing effect, slight for men but more dramatic for women from age 31 onwards. This suggests that memory problems may affect the elderly particularly.

There is another implication of the study of inter-week bias, particularly if the deterioration in recordkeeping is admitted. There are often suggestions that recordkeeping should be extended, not necessarily in order to capture all expenditure. In 1985 FES respondents were asked if they would keep records of expenditure on home improvements, extensions, central heating and a number of other large infrequent purchases for varying periods of time. The proportions prepared to carry on with records declined the longer the period: 84% would keep records of expenditure on these items for an additional month; 78% for an additional three months; 67% for six months; and only 54% for a full additional year (which was the client's ideal time period). This gave a reasonable estimate of the potential wastage to the sample of extending recordkeeping, in terms of numbers lost as well as a possible increase in non-response bias. However, there is no measure other than inter-week variation which can capture the deterioration in recordkeeping that certainly occurs over a two-week period and is likely to occur increasingly the longer records are kept. It is for this reason, as well as for response reasons, that diary recordkeeping is limited to relatively short periods: short in terms of the ideal period that clients would like captured, and short even in terms of what respondents feel is typical of their own individual patterns. Interviewers are often told by respondents that they have been sampled in an atypical two weeks.

In sum, the interest of OPCS and its clients in monitoring inter-week variation has mainly been to see if the understatement of FES alcohol and tobacco expenditure compared with the National Accounts could be explained by a decrease in expenditure on these two items during recordkeeping. The inter-week variation is too small to account for the discrepancies of 40% (alcohol) and 30% (tobacco). Nevertheless measuring inter-week bias is a tool which can indicate sub-groups which have difficulty in keeping records, and also remind the researcher how 'unnatural' recordkeeping is for most people, notably that their attention span and ability to record accurately wane with time. Two weeks may be just about as long as people can be asked to keep daily records without beginning to attack reliability unduly.

Possible remedies

One does not want to leave the reader with the impression that deterioration of recordkeeping is a phenomenon that researchers must accept passively.
The balancing of incomings and outgoings has been mentioned, but a warning should be added that this is a very testing methodology and was used only experimentally, without any intention of implementing it on a large scale. There are simpler devices, such as prompt lists which can be left with the household, and checking schedules (see Annex A) which the interviewer administers in the home while she checks the diaries. On the NFS the interviewer often finds items of food listed in the family menu for a meal which have not been listed as food purchased, and she then probes whether or not the item has been recorded. For example, Annex B shows the results of a probe that discovered an unrecorded mid-day meal comprising trout, broccoli and potatoes (see the encircled writing on the menu page). The respondent had forgotten to record these items in the menu and also as food purchased that day. They were entered by the interviewer on both pages.

Another example of how interviewers can help respondents with memory problems is best illustrated by what happens when they are not present to probe for missing detail. In 1979 an experiment was carried out on the NFS which compared the quality of data from 14-day diaries returned by post by respondents with that from 14-day diaries which had been checked at a final call by interviewers. There were obvious gains in terms of cost if a final call by interviewers could be avoided. However, the quality of data obtained without the benefit of interviewer probing first needed to be assessed. Cooperating households in two separate treatment groups in the third quarter 1979 NFS sample were given 14-day diary records. With one treatment the interviewer would, as is currently done on the NFS, return to the household at the end of the 14 days to collect the diaries and probe for missing detail. With the other treatment the interviewer would ask the respondent to return the diaries to the survey agency within three days of the end of the 14 days; pre-stamped and pre-addressed envelopes were provided. With both treatments interviewers made a call midway through recordkeeping to see how it was proceeding and to probe for missing detail, thereby 'educating' respondents about the degree of detail needed. Although no financial incentive is normally offered to NFS respondents, in this case a £2 postal order was offered to those in both treatments who completed diaries.

Two aspects of the experiment were of interest: first, there was an assumption that response would suffer if interviewers were not present at the end of recordkeeping to pick up the diaries; secondly, that the detail entered by respondents would not be sufficient for coding. Overall response to the postal return treatment was lower by 3.5%, most of which was accounted for by housewives abandoning recordkeeping, which illustrates directly the effect of interviewers on response. Comparing the proportions of respondents who produced usable diaries between the two treatments showed no significant differences by age group, tenure or household type. However, the average expenditure per week recorded by those who returned their diaries by post (£6.12) was lower than that recorded by those whose diaries were collected by interviewers (£6.36). The main difference between the treatments was in the quality of the detail recorded for items. NFS respondents are briefed by interviewers to record the price, quantity and description of each food item separately.
Although missing quantities can be imputed if they are standard, lack of detail can nevertheless lead to difficulties for coders. If too many items are omitted the diary is rejected, although this is rare. The proportion of rejected budgets for the postal treatment was only 1.8%, compared with 1.2% for the interviewer treatment. Table 10 shows the types of missing information for the two treatments.

Table 10 Omissions per diary by treatment

Omissions per diary    Postal    Interviewer
Quantities missing      5.03        1.29
Prices missing          0.38        0.16
Total                   5.41        1.45

There were on average 5.41 omissions per diary with the postal method compared with only 1.45 with the interviewer collection method. Quantities were the main missing item for both treatments but also the most difficult to impute, as most of the missing items turned out not to be weight-marked. (The housewife is supposed to weigh fruit and vegetables.) Another indication of the poorer quality of the postal method was that only 14% of the postal sample were completely free of omitted details, compared with 53% of the interviewer-collected sample. The conclusion was not to proceed with postal return, because of the lack of confidence in the reliability of the results for the non weight-marked foods and because of the lower response with the postal method.

Cueing

Jacobs mentioned the research which the BLS has carried out into the effects of the presence or absence of lists of items on diaries. Broadly, listing the items to be included leads to better reporting than not listing them. She calls this 'cueing'. An example of possible 'over-cueing' may have occurred during the attempted merger of the FES and the NFS in 1981. During two months, experiments were carried out which asked respondents to record in diaries the more detailed food requirements of the NFS along with all the remaining expenditure required by the FES. Because the NFS was carried out by another agency, most of the training of OPCS interviewers, who had already mastered the FES requirements, was directed at learning the detailed food requirements of the NFS. Some indication of the greater detail required for food is the fact that there were roughly 180 NFS food codes compared with only 60 FES food codes. Quantities and weights were also required in addition to prices, whereas for the FES only expenditure is recorded. The results, reported in a separate New Methodology Series report*, showed that all expenditure was under-reported in relation to both the NFS and the FES. This was attributed to the additional burdens on respondents leading to under-recording, and in turn to interviewers who may have felt subconsciously that informants were overloaded and so were less persistent in pressing for detail. This is another insight into the importance of the interviewer's role: if she (or he) is not confident that respondents can carry out all the tasks required, this may affect respondents' performance of those tasks. However, there is another possibility: an 'over-cueing' effect based on an over-emphasis on recording food. During the second month, prompt cards listing all the food items required were left with households. Interviewers emphasised the need for detail in recording food, reflecting their own growing mastery of the NFS requirements. The results for the second month showed that the food averages, which had been 11% below those of the NFS, were now more or less equal; however, the averages for non-food items were depressed even further.
It is extremely difficult to be authoritative about the reasons why food recording converged on the expected average while non-food items diverged. One reason may have been a training factor, whereby interviewers concentrating on the new food recording requirements gave less attention to non-food recording. Also, the emphasis in all the 'cues' given to respondents related exclusively to food recording requirements. On the basis of the results obtained from the two months of studies there were doubts about the reliability of information obtained from a combined survey. However, there was also optimism that over time the problems of interviewer confidence would have been reduced.
_________________________________________________________________________
* The Family Expenditure and Food Survey Feasibility Study 1979-81. R Barnes, R Redpath, E Breeze. NM 12.

The intention has been to report some of the research OPCS has carried out into memory problems with diaries and with retrospective questions. The papers given by the Jacobses, Dippo and Lessler at the ISI Conference have served as a stimulus. This first article deals with OPCS research* into diaries used on both the FES and the NFS. The next article will cover research into retrospective recall on the FES.
_________________________________________________________________________
* The author wishes to acknowledge the research contributions of Patrick Heady, Terry Kenney, Elizabeth Breeze, Anne Milne, Malcolm Smyth, Madge Brailsford and June Langham.

FURTHER INFORMATION REQUIRED
It would be helpful if you could have the following information and/or documents available for the interviewer when they call next time:
..................................................
..................................................
..................................................

THE CENSUS OF EMPLOYMENT AS A SAMPLING FRAME
Elizabeth Breeze

Recently the Census of Employment was used as a sampling frame for a survey of employers in Bristol, intended to find out about recruitment practices and employers' opinions on selected aspects of the labour market. Using this experience I have compiled a short list of points to take into account in choosing frames for future surveys to do with employers or employees or the organisations they work in. The Census is carried out by the Employment Department at intervals of 2-3 years. It is a mammoth undertaking, with over two thirds of a million entries to collect. The units are called data units and are based on PAYE points. Most of them equate to workplaces, e.g. a shop, office or factory, but they are not bound by any geographical or physical constraints: for instance, we had casual staff and permanent staff as separate data units within a workplace, and clusters of workplaces as one unit. The data collected include industry, type of business and number of employees. The Department can assign units to local authorities, counties, parliamentary constituencies and to their travel-to-work areas. For those with access to the Census it is a prime candidate for a sampling frame, because it is the most comprehensive list of employing organisations and it is a central computerised list.
It includes organisations which would not be covered by company registration, and it lists units smaller than companies (people in these units are more likely to know about day-to-day practices for their employees than the people in the company headquarters). Also, as in our survey, one can pick out a particular geographical area. However, the Census is understandably subject to strict conditions of confidentiality and can only be used with the permission and assistance of the Employment Department, which does not have any resources specifically assigned to research use. This severely limits the range of users.

Our survey used size and industry as stratification factors and took equal-probability samples within each stratum. Units of fewer than 20 employees are subsampled for the Census, so this should be taken into account in any probability calculations, as the sketch below illustrates.
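To make the point about probability calculations concrete, the following minimal Python sketch computes design weights under a stratified equal-probability design in which small units enter the frame with a known subsampling fraction. The strata, counts and fractions are hypothetical illustrations, not the actual Census of Employment design.

    # Hedged sketch: design weights where units of fewer than 20 employees
    # enter the frame with subsampling fraction f. All figures illustrative.

    def inclusion_probability(n_sampled, n_frame, subsample_fraction=1.0):
        """P(selection) = frame subsampling fraction x within-stratum
        sampling fraction, assuming equal-probability sampling in the stratum."""
        return subsample_fraction * n_sampled / n_frame

    def design_weight(p):
        """Weight each responding unit by the inverse of its inclusion
        probability so that estimates refer to the whole population."""
        return 1.0 / p

    # stratum: (units on frame, units sampled, frame subsampling fraction)
    strata = {
        "retail, under 20 employees": (800, 40, 0.5),  # small units subsampled
        "retail, 20+ employees":      (300, 30, 1.0),
    }
    for name, (n_frame, n_sampled, f) in strata.items():
        p = inclusion_probability(n_sampled, n_frame, f)
        print(f"{name}: p = {p:.4f}, weight = {design_weight(p):.0f}")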
The appropriate person or people to select for interview will depend on the purpose of the survey. In our case we wanted to talk to people responsible for recruitment; however, most of the issues we had to resolve were not exclusive to this group. It is important to be clear how the individuals eligible for interview and the units of analysis relate to the sampled units (data units), so that probabilities of selection are known. We were aiming to collect data referring to the data unit as a whole. Other surveys might be interested in one type of employee at the workplace, or a union branch, or a production unit defined for their own purposes. One of the most frustrating and challenging features of these surveys is that organisations vary enormously in their structure, so that one cannot have a simple model which will neatly fit every case. When drawing up procedures for sampling, fieldwork and analysis, allow for the following possibilities:
i) the unit of analysis coincides with the sampled unit;
ii) the unit of analysis is smaller than the sampled unit, eg the survey picks out subgroups like part-time employees or supervisors: in this situation one needs to consider whether subsampling within the data unit is necessary;
iii) the unit of analysis is greater than the sampled unit, eg all units belonging to the same company in the survey area: probabilities may be difficult to calculate because the number of data units involved in the larger unit may not be known, and stratification of data units by size and industry would complicate the calculations too.

Cutting across the units of analysis are considerations of the data available from interviewees and how these can be analysed:
a) will the interviewees be able to talk about the whole unit in answer to the questions; or
b) can the interviewees only give specific information about subunits, so that the answers have to be combined or aggregated to refer to the whole unit (for example, different recruitment officers for different grades of staff); or
c) can the interviewees only give specific information about units larger than the data unit (eg statistics are only kept at regional level, or there is central recruitment for all branches of a shop); or
d) will a mixture of these circumstances arise?

In quantitative surveys one has to define units in a way which will enable aggregation of data from the interviewees. One needs to remember that the categorisation of employees, production units or other subjects of interest which suits the researcher's model may be very hard to apply in some cases. In the recruitment survey employees were categorised into 5 groups, but not all employers made these distinctions; most could rework their numbers into our categories but we had to accept approximations.

Businesses come and go, so with the best will in the world one is unlikely to have a list of data units which is up to date at the time of fieldwork. Decisions then have to be made about the following:
i) selected data units which are no longer at the given address: whether to trace named companies/organisations to their new address and, if so, what criteria to adopt in deciding whether it is the "same" data unit; whether to select whatever is at the given address at the time of first call; or whether to lose the selection from the final sample;
ii) data units which have come into existence since the Census: whether to compile a supplementary sampling frame (probably an expensive undertaking) and, if so, how to check that the frames are mutually exclusive;
iii) data units which have made some changes to the nature of their business since the Census: what criteria to use in deciding how to categorise them, and how to allow for mergers between data units.

It should be noted that the stratification information may be out of date by the time of fieldwork, so that analysis by the stratification factors may be inappropriate, although this does not debar stratification from reducing sampling errors. I have not attempted to give solutions, because the most appropriate answer for our survey will not necessarily be the best answer for another survey. However, to be forewarned is to be forearmed.

RECENT METHODOLOGICAL PUBLICATIONS BY SSD STAFF

The use of diaries in data collection

A paper with this title has just been published in The Statistician (1990, Vol 39, pp 25-41). It was written by Bob Butcher and Jack Eldridge and reports the results of a large-scale experiment carried out on the National Travel Survey. Before the 1985-86 round of the survey, pilot fieldwork was carried out, using a sample of 1840 addresses, to test two methods of data collection, one using a seven-day diary and the other a one-day diary. The two methods were compared for response rate, data quality and cost. Both methods achieved acceptable quality and they had similar response rates (75% and 78% respectively). On balance the seven-day method was more cost-effective, so this was the method used for the 1985-86 NTS and for the continuous NTS that was launched in 1988.

Women's Experience of Maternity Care - a Survey Manual

An SSD manual describing a specific survey methodology was published under this title by HMSO last year. As it was not part of the Methodology Series, it may not have come to the attention of Bulletin readers. Written by Val Mason, the manual is a step-by-step guide on how to carry out surveys of women's experience of maternity care. It was produced for the Department of Health and aims to help those in health authorities wanting to measure consumers' views of local services. It is intended both for the experienced researcher and for those who have not done a survey before, and so contains a wealth of practical advice. Although written around surveys of maternity services, much of the advice is more widely relevant, particularly for local or small-scale surveys. Feedback in its first year shows that the manual is proving useful in districts throughout the UK; indeed HMSO have already had to reprint it. Some district health authorities have now carried out surveys and others are planning them.
The manual is, however, also being used as a more general source book on survey design. It provides an example of how theory is put into practice. It takes the reader through all the stages of a survey, from early planning and design, through the practical organisation, to the analysis and the presentation and use of the results. The chapters on sampling describe the principles behind sound sample design and discuss the practical problems of obtaining a representative sample from local records. The manual recommends postal survey methods for these surveys and gives details of how to carry out a postal survey, including advice on maximising response rates. It includes two model questionnaires, developed and tested in local surveys carried out by Social Survey Division. These provide examples for the design and layout of questionnaires for postal self-completion. On the whole they are designed to minimise the time and resources needed for coding and keying the data, but the pros and cons of including more time-consuming open questions are discussed, and the chapter on coding provides suggestions for their analysis, including instructions for the development of coding frames for open answers. Three chapters are devoted to computer editing, analysis and the presentation of results. These assume little or no experience in this aspect of survey practice and guide the researcher through the use of the SPSSX package. An article in the June 1989 Survey Methodology Bulletin (No. 25) outlined the relatively straightforward method devised for editing the data using SPSSX.

The manual was published as part of a package including the following:
a. a pamphlet introducing the survey and giving suggestions for analysis
b. the manual
c. printing masters of the model questionnaires and a computer program on floppy disk to help with the computing analysis if using SPSSX.

NEW METHODOLOGY SERIES

NM1 The Census as an aid in estimating the characteristics of non-response in the GHS. R Barnes and F Birch.
NM2 FES: a study of differential response based on a comparison of the 1971 sample with the Census. W Kemsley. Stats. News, No 31, Nov. 1975.
NM3 NFS: a study of differential response based on a comparison of the 1971 sample with the Census. W Kemsley. Stats. News, No 35, Nov. 1976.
NM4 Cluster analysis. D Elliot.
NM5 Response to postal sift of addresses. A Milne.
NM6 The feasibility of conducting a national wealth survey in Great Britain. I Knight.
NM7 Age of buildings: a further check on the reliability of answers given on the GHS. F Birch.
NM8 Survey of rent rebates and allowances: a methodological note on the use of a follow-up sample. F Birch.
NM9 Rating lists: practical information for use in sample surveys. E Breeze.
NM10 Variable quotas: an analysis of the variability. R Butcher.
NM11 Measuring how long things last: some applications of a simple life table technique to survey data. M Bone.
NM12 The Family Expenditure and Food Survey Feasibility Study 1979-1981. R Barnes, R Redpath and E Breeze.
NM13 A Sampling Errors Manual. R Butcher and D Elliot.
NM14 An assessment of the efficiency of the coding of occupation and industry by interviewers. P Dodd.
NM15 The feasibility of a national survey of drug use. E Goddard.
NM16 Sampling Errors on the International Passenger Survey. D Griffiths and D Elliot.

Prices: NM13 £6.00 UK, £7.00 overseas; all others £1.50 UK, £2.00 overseas.
Orders to: New Methodology Series, Room 304, OPCS, St Catherines House, 10 Kingsway, London WC2B 6JP