American Journal of Epidemiology
Copyright © 1997 by The Johns Hopkins University School of Hygiene and Public Health
All rights reserved
Vol. 146, No. 11
Printed in U.S.A.

Cognitive Aspects of Recalling and Reporting Health-related Events: Papanicolaou Smears, Clinical Breast Examinations, and Mammograms

Richard B. Warnecke,1 Seymour Sudman,1,2 Timothy P. Johnson,1 Diane O'Rourke,1 Andrew M. Davis,3 and Jared B. Jobe4

This paper reports an examination of cognitive processes used by 178 women aged 50 years and older in retrieving information about the frequency with which they received Papanicolaou smears, mammograms, and clinical breast examinations. Women were selected from a health maintenance organization in which they had been enrolled for at least 5½ years. The literature suggested that reporting of regular events such as these kinds of tests is likely to be based on schemas, an estimation technique in which events are reported in a format with generic content. Thus, if the procedure is believed to occur annually, the respondent will report receiving five tests in 5 years. The study attempted to evaluate whether use of episodic recall, in which respondents are forced to report individual events, would be more accurate than reports based on estimation using a schema format. The results indicated that most of the errors occurred in Papanicolaou smear reporting, which is consistent with the literature, and that the fewest errors occurred with mammograms. Regardless of the questionnaire format, respondents persisted in using schemas based on the date of annual physical examination. Most reporting errors occurred because the interval between examinations was estimated incorrectly. Am J Epidemiol 1997;146:982-92.

mammography; Papanicolaou smear; questionnaires; retrospective studies

Epidemiologic research generally relies on self-reports or histories obtained via interview or questionnaire to acquire information about events or activities that may have taken place within days, weeks, months, or even many years prior to the study. Collection of these histories relies heavily on respondents' recall, and only rarely are existing records available for validation of whatever information the respondent provides. These retrospective interviews require the subjects to recall autobiographical information about the nature of exposures or other events, the circumstances under which they occurred, and, quite often, the date. The focus of this paper is respondent recall in response to retrospective questions. To illustrate the potential impact of recall on histories obtained by interview, we examine recall strategies used by respondents to reconstruct histories of Papanicolaou smears, mammograms, and clinical breast examinations.

COGNITIVE THEORY AND RETROSPECTIVE RECALL

In recent years, improvement of the accuracy of self-report in surveys has received considerable attention from survey specialists who have joined with cognitive psychologists to apply the theories and methods of cognitive psychology to question design. This collaboration has enhanced understanding of how respondents retrieve autobiographical information (1) and how to design questions that improve the validity of retrieved information (2). Most researchers who use cognitive methods to understand survey measurement error now agree that the response process has four steps (3-5): 1) interpretation of the question; 2) retrieval of either the answer to the question or relevant information that will be used to construct an answer; 3) judgment formation, during which the respondent uses recalled information to formulate an answer; and 4) editing, or deciding which judgments formed in the preceding step will be disclosed to the interviewer (6). Not every step is equally important for all questions.
The focus here is on step 2, retrieving information.

Received for publication January 17, 1997, and in final form September 19, 1997.
Abbreviation: HMO, health maintenance organization.
1 Survey Research Laboratory, College of Urban Planning and Public Affairs, University of Illinois at Chicago, Chicago, IL
2 College of Commerce, University of Illinois at Urbana-Champaign, Urbana, IL
3 RUSH-Anchor Health Maintenance Organization, Chicago, IL
4 National Institute of Aging, National Institutes of Health, Department of Health and Human Services, Bethesda, MD.
Reprint requests to Dr. Richard B. Warnecke, Survey Research Laboratory, University of Illinois at Chicago, 910 W. Van Buren, Suite 500, M/C 336, Chicago, IL 60607.

There is a substantial body of research conducted by both cognitive psychologists and survey researchers on the processes of retrieving information from memory and the effects of question wording on the validity of this retrospective reporting (1, 7). For example, most cognitive psychologists believe that some kinds of activities are not retrieved as individual events, but as "schema." Schema are generalizations based on experience and are recalled as patterns with generic content rather than as specifically enumerated events (8). The accuracy of information obtained from schemas depends on the regularity of the behavior (9) and the length of time since the event occurred (10). Research suggests that response accuracy can be influenced by asking questions that deliberately evoke or discourage the use of schemas. Brewer (11) has made a strong case that the date of an event is its least well-remembered attribute, relative to such other aspects as where the event occurred, who was present, and what took place (12, 13). It has been found, for example, that in many instances the date on which an event occurred may not even be incorporated as part of the memory of that event.
In such cases, obtaining information about when an event occurred may require the interviewer to use other aspects of the event as cues, ordering the questions so that the better-remembered aspects of the event provide a context that helps the respondent recall the date (10, 11, 14).

We present here findings from a study conducted for the National Center for Health Statistics (15) of the cognitive processes that older female respondents (aged 50 years or more) used to answer questions related to their experiences with obtaining Papanicolaou smears, clinical breast examinations, and mammograms. The purpose was to ascertain ways to improve the validity of self-report data about the frequency of obtaining these procedures.

WEAKNESSES OF SELF-REPORT DATA FOR PAPANICOLAOU SMEARS, CLINICAL BREAST EXAMINATIONS, AND MAMMOGRAMS

Most of what is known about women's knowledge of cancer screening procedures, and about how often they receive them, has come from self-report survey interviews, such as the National Health Interview Survey and the Behavioral Risk Factor Surveillance System. The accuracy of self-reports of cancer detection tests obtained by interviews is still a topic of debate. To examine the validity of these self-reports, several researchers have matched medical records with responses from self-report interviews. Four response quality measures were calculated to summarize respondent reports:

1) Concordance is the percentage of all respondents whose report, whether "test received" or "no test," agreed with the records (16). This ratio is also commonly referred to as an indicator of "gross accuracy" (15) or the "raw agreement rate" (17).
2) Sensitivity is the number of correctly recalled tests divided by the number of cases with tests found in the medical record. In previous research, measures that are operationally identical to sensitivity have been referred to as measures of completeness (18, 19) or as the medical record confirmation rate (17).
3) Specificity is the number of correctly reported "no tests" divided by the number of cases with no test found in the records.
4) Finally, the report-to-records ratio is the percentage who reported that they received the test divided by the percentage for whom the records showed a test, that is, confirmed reports plus false-negatives (those cases in which no test was reported, but the records indicated that one was received). The report-to-records ratio is a measure of net bias in test reporting (17).

Reporting accuracy of the Papanicolaou smear

As can be seen in table 1, most of the self-report validation work has been done with Papanicolaou smear data. Table 1 provides a summary of 12 published studies that evaluated Papanicolaou smear-reporting quality. These studies vary along several important dimensions, as shown in table 1. For example, subjects have been identified from a variety of sources, including random community samples (20-24), health maintenance organization (HMO) membership rolls (15, 16, 24, 25), public health clinics (26), cancer tumor registries (27), and case-control studies (28), and across diverse geographic locations. In addition, subjects varied by age and ethnicity. They were asked to recall their screening histories across a variety of time intervals ranging from immediately after a medical encounter to some time in the 6 years after the test. Sufficient information was nevertheless available for us to calculate a comparable set of measures for each study. Concordance of respondent reports of Papanicolaou smear history ranged from 0.39 through 0.80 (weighted average = 0.65). Although sensitivity was high (range = 0.61-0.97; weighted average = 0.81), specificity was only moderate (range = 0.19-0.76; weighted average = 0.40).
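As a minimal sketch of how the four measures relate, the following Python computes them from a 2 × 2 cross-classification of self-reports against records. The class and function names and the example counts are hypothetical and are not taken from any study in table 1. The report-to-records ratio is taken here as reported tests divided by record-confirmed tests, and the weighted mean assumes weighting by study sample size, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ValidationCounts:
    """Hypothetical 2 x 2 table of self-report against medical record."""
    true_pos: int   # reported a test, record shows a test
    false_pos: int  # reported a test, record shows none
    false_neg: int  # reported no test, record shows a test
    true_neg: int   # reported no test, record shows none


def quality_measures(c: ValidationCounts) -> Dict[str, float]:
    """Return the four response quality measures as proportions.

    Assumes at least one record-positive and one record-negative case.
    """
    total = c.true_pos + c.false_pos + c.false_neg + c.true_neg
    record_pos = c.true_pos + c.false_neg    # tests found in the record
    record_neg = c.true_neg + c.false_pos    # no test found in the record
    reported_pos = c.true_pos + c.false_pos  # tests claimed by respondents
    return {
        "concordance": (c.true_pos + c.true_neg) / total,
        "sensitivity": c.true_pos / record_pos,
        "specificity": c.true_neg / record_neg,
        # Values above 1 indicate net overreporting.
        "report_to_records": reported_pos / record_pos,
    }


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average, e.g. per-study measures weighted by sample size."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    # Illustrative counts only.
    example = ValidationCounts(true_pos=80, false_pos=30, false_neg=20, true_neg=70)
    for name, value in quality_measures(example).items():
        print(f"{name}: {value:.2f}")
```

With these illustrative counts, sensitivity is high while the report-to-records ratio exceeds 1, the same qualitative pattern of overreporting described for the published studies.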
Report-to-records ratios indicate that receipt of Papanicolaou smears was overreported in each study (range = 1.21-3.32; weighted average = 2.10). Other Papanicolaou smear validation studies are available that did not contain information necessary for inclusion in table 1 (27-32). The results of these studies, however, are consistent with those summarized in table 1.

TABLE 1. Previous Papanicolaou smear, clinical breast examination, and mammogram validation studies
Columns: Study (reference no.); No.; Location; Sample*; Recall period; Concordance; Sensitivity; Specificity; Report-to-records ratio
Papanicolaou smears: Sawyer et al. (20); Bowman et al. (21); Gordon et al. (16); Walter et al. (28); McKenna et al. (27); Walter et al. (28); Michielutte et al. (26); Warnecke and Graham (24); Loftus et al. (25); Paskett et al. (22); Hiatt et al. (23); Sudman et al. (15)
Mammogram: King et al. (33); Brown and Adams (17); Gordon et al. (16); Dignan et al. (34); Loftus et al. (25); Paskett et al. (22); Hiatt et al. (23); Sudman et al. (15)
Breast examination: Gordon et al. (16); Loftus et al. (25); Sudman et al. (15)
Weighted means: Papanicolaou smear; mammogram; breast examination; all examinations
* HMO, health maintenance organization; STD, sexually transmitted disease.

Reporting accuracy of the mammogram

A summary of findings from eight verified mammography studies is also presented in table 1. Note that these study populations are far more homogeneous than those examined for Papanicolaou smear reporting. Six of the eight mammography studies selected subjects from HMO rosters (15-17, 23, 25, 33); two (22, 34) came from random community surveys. These studies examined a range of recall intervals (from 1-2 months to 6 years) similar to those in the Papanicolaou studies reviewed, and all were published during the 1990s. Data from seven of these studies allowed us to make all the calculations in table 1. The data of Loftus et al. (25) allowed only calculation of the report-to-records ratio. Overall, the data from these studies showed higher degrees of concordance (range = 0.51-0.97; weighted average = 0.76), sensitivity (range = 0.70-1.00; weighted average = 0.86), and specificity (range = 0.30-0.94; weighted average = 0.68) compared with the Papanicolaou smear validation data.
Although overreporting was also a problem with mammography self-reports, report-to-records ratios were smaller than those calculated for Papanicolaou smear data (range = 0.84-2.34; weighted average = 1.39).

Reporting accuracy of the clinical breast examination

Three studies have attempted to validate self-reports of clinical breast examinations. Gordon et al. (16) reported a concordance rate of 0.70 for this procedure, with high sensitivity (0.97) but low specificity (0.22). Sudman et al. (15) reported a similar concordance rate (0.73), a somewhat lower sensitivity (0.78), and significantly higher specificity (0.70). As with mammography, the data of Loftus et al. (25) allow only calculation of the report-to-records ratio. Unlike Papanicolaou smears and mammograms, medical record validation of clinical breast examinations is generally derived from physician notes as opposed to radiology or pathology reports. As a result, true reports may have an increased likelihood of being classified as "false positives" if the physician failed to record them. Consistent with findings for the other two screening procedures, clinical breast examinations were also likely to be overreported. The weighted report-to-records mean was 1.54.

PRELIMINARY STUDIES AND HYPOTHESES

We initially identified issues corresponding to each of the four major cognitive process stages discussed above that might be associated with reporting accuracy. We examined three indicators of comprehension: 1) Do older women know what the terms "Papanicolaou smear" and "mammogram" mean? 2) Do they confuse procedures such as clinical breast examination and mammograms? 3) Were any of the questions we planned to ask prone to substantial misinterpretation? We next examined three aspects of information retrieval: 1) How do older women retrieve information on medical screening examinations? 2) Are tests stored in discrete memory, or do respondents use schema?
3) If schema are used, what are the schema? Two dimensions of judgment formation were evaluated: 1) How do older women form judgments about how often they are screened? 2) Do they count each time, or do they estimate the frequency? Finally, editing of responses was assessed by considering three issues: 1) How comfortable are older women discussing their experiences with Papanicolaou smears and mammograms? 2) Do they withhold sensitive personal information or overreport socially prescribed behavior? 3) Does the interviewer's gender influence their comfort with the subject of the interview? We conducted a series of pilot studies with women employed by the University of Illinois at Chicago who had been members of the HMO that was to be used for this project for at least 5 years. This HMO had a policy of providing Papanicolaou smears, mammograms, and clinical breast examinations as part of the annual physical examination offered to all female members of the HMO, so variation in opportunity was held constant. The pilot studies are discussed in detail elsewhere (15); however, they included two focus groups and 16 "think-aloud" or in-depth interviews. The focus group participants were also selected by race (black/white) to allow us to ascertain if there were differences by race in the women's understanding of these procedures; there were no differences. The content of focus group discussion addressed the need for regular screening for cancer and the women's understanding of the nature of and recommended screening interval for the Papanicolaou smear, mammogram, and clinical breast examination. The focus group results confirmed that these women knew what the procedures were and understood the recommended screening intervals. As part of the discussion, they were asked whether the gender of the interviewer would affect their comfort discussing Papanicolaou smears, mammograms, or breast examinations. Gender of the interviewer was not an issue. Four investigators (T. P. J., D. 
O., S. S., and R. B. W.) conducted 16 think-aloud interviews with members of the selected HMO. Several questionnaire formats were used. In one format, the questions were asked exactly as they are on the National Health Interview Survey; the other formats varied the way in which the questions were asked. All versions were followed up with requests for the respondents to think aloud about how they arrived at the answers to the questions, what they thought the questions meant, and what kinds of information they thought about when making their judgments about frequency. At the end of the interview, each respondent was asked about her level of comfort with discussing these procedures in the interview. With the patients' consent, we verified all reported screening experience in their records at the HMO. We asked respondents whether they received any procedures from other sources. As all respondents interviewed came from a single HMO and were selected because they were long-term members, we are quite confident that the respondents did not receive these procedures from other sources.

The results of these preliminary tests indicated that comprehension was not an issue. Women over age 50 years knew what these procedures were and understood the questions about frequency. There was some confusion between breast examinations and mammograms; most of the women did not distinguish between them. When we used the term "mammogram" first, followed by the term "breast examination," the confusion was eliminated. Confusion about what constituted a physical examination was resolved by using the phrase "a complete physical examination."

From the focus groups and the think-aloud responses to the questionnaire versions finally selected, it was clear that few respondents were retrieving isolated individual screening examinations. Almost all respondents relied on schemas (11). The reasons for this are fairly clear.
First, a majority of respondents reported that the behavior being studied occurred regularly. (An examination of the records indicated that actual behavior was less regular than what was remembered.) Second, 5 years is a long time period, which makes it difficult to retrieve individual events. Both regularity (9) and long time periods (35) have been shown to lead to the retrieval of schema rather than individual episodes. By far the most common schema used, even when unprompted, was to associate screening tests with annual complete physical checkups. The other major schema was to associate receiving the test with serious illnesses or conditions that required one or more of the procedures for monitoring the condition or as part of follow-up after the condition was corrected. It appeared to be easy for respondents to remember that they had such tests and to count the number they had received. Gynecologic examinations were much less frequent than complete physical checkups, but when they did occur, Papanicolaou tests and mammograms were usually part of the process. Another schema that was mentioned involved retests when the original results were inconclusive. It has been found that memories are often stored in selected subsets that are highly organized cognitively, but are subject to errors of incompleteness and distortion (36). Although respondents appeared to have little trouble retrieving schema related to screening tests, there were some indications of possibly faulty retrieval or incompleteness. For example, several respondents reported getting regular annual checkups, but in later discussion indicated that the time period between checkups varied from 12 to 15 and sometimes even 18 months (incompleteness). A few respondents reported that they knew they were supposed to get the test annually, but skipped 1 or more years out of 5 for a variety of reasons (distortion). 
However, regardless of the accuracy, the schema comprised the basic structure for estimating frequencies in most instances.

DESIGN OF THE MAIN STUDY

Objectives and scope

Upon completion of phase I, a field study was designed and executed to assess the methods suggested by the pilot phase. Cognitive methods were investigated to improve responses to questions about health promotion and disease prevention, specifically focused on accuracy of reporting Papanicolaou smears, mammograms, and clinical breast examinations. The objectives of this research were to measure how well the experience of receiving these procedures is recalled by the respondents and to determine what methods are most likely to elicit accurate recall and reporting of having received them. This paper examines three hypotheses that the pilot research suggested would be most viable. 1) More accurate reporting will be found for early cancer detection tests that are conducted separately rather than as part of a physical examination because separate events are more salient and easier to retrieve. 2) Controlling for frequency, more accurate reporting (less over- and underreporting) will be found for respondents for whom the diagnostic tests are conducted regularly rather than irregularly because regular events are easier to schematize accurately. 3) More accurate reporting will be found among respondents who use a questionnaire form that activates schemas relating to health events, physical checkups, and gynecologic examinations, compared with a questionnaire that asks about each screening test separately.

Hypotheses 1 and 2 are based on characteristics of the early detection procedures and the context in which they occur. These are events that are not controllable by the researchers, although both regularity and the context in which the tests are delivered can be ascertained from existing data.
Hypothesis 1 is based on the idea that screening procedures that are delivered separately are more likely than those that are delivered as part of general physical examinations to be stored in memory as unique events. This logic assumes that the respondent is made aware of the event because of the distinctiveness of the circumstances in which it occurred compared with events that are parts of familiar processes and therefore are not distinct (37, 38).

Hypothesis 2 follows from the work of Menon (9) and others who have suggested that the formation and use of schemas to make estimates of one's behavior is facilitated by the regularity of that behavior. The very regularity of the behavior promotes and enhances recall. The context facilitates recall. This hypothesis is distinct from hypothesis 1 because it addresses regular behavior, whereas hypothesis 1 refers to irregular behavior.

Hypothesis 3 is also derived from ideas about schema formation. It asserts that schemas will be easier to retrieve and use in estimation if they have been activated previously by questions that parallel the schemas that most respondents use. (Questionnaires are available from the investigators.)

Target population

As with the pilot phase, women aged 50 years and older who had been members of the RUSH-Anchor HMO for at least 5 years comprised the target population; however, they did not have to be university employees. As with the pilot sample, we selected women in this age group because they are the group most likely to misreport having received a Papanicolaou smear (24, 31, 32). Moreover, women over age 50 are the primary target population for mammograms and breast physical examinations. The characteristics of the population are presented in table 2. A total of 211 women met the criteria for the study. However, of these, only 178 signed consent forms, allowing us to access their records for validation. Thus, the sample size in table 2 is 178.

TABLE 2. Demographic characteristics of respondents (n = 178)

Characteristic                            %
Age (years)*
  50-59                                57.9
  60-69                                33.7
  ≥70                                   8.4
Race
  Black                                87.6
  White                                 8.4
  Hispanic                              3.4
  Asian/Pacific Islander                0.6
Education
  Less than high school graduate       15.2
  High school graduate                 34.8
  Some college                         25.8
  College graduate                     24.2
Employment status
  Employed full time                   71.9
  Employed part time                    5.6
  Not employed                         22.5

* Mean age, 59.6 years.

Questionnaire formats

To test the hypotheses, we randomly assigned half of the sample (105 subjects) to be interviewed with a questionnaire designed to encourage the use of schema to recall screening, general checkups, and gynecologic examinations (schematic version). The rest of the sample (106 subjects) was interviewed with a questionnaire that attempted to enumerate each screening test separately (episodic version). Because only 178 respondents agreed to allow us to validate their reports, 86 were interviewed with the schematic version and 92 with the episodic version.

Medical record abstraction

The sources of specific data for each eligible patient included physician progress notes, cytology reports, laboratory reports, and radiology reports. Information collected about events in the previous 5½ years included 1) total number of medical visits; 2) frequency, dates, and results of Papanicolaou smears; 3) frequency, dates, and results of mammograms; 4) frequency, dates, and results of clinical breast examinations; and 5) whether a full or partial hysterectomy was ever performed. The matching between the record and the questionnaire was done by a medical records specialist. Initially, matching was done regardless of discrepancies by date. Once a match was established, the difference in months between the record date and the interview date was computed if a date was given on the interview.
Reports of frequency of procedure were classified as "regular," "irregular," or "not received." We followed the procedure, generally adopted in validating self-reports, of relying on the medical record as the standard. However, we recognize that many who rely on records have found them to be incomplete and not totally reliable (39-42). A record reabstraction study was undertaken to evaluate whether complete information was abstracted from medical records. A credentialed medical records technician employed by Survey Research Laboratory reabstracted 20 records. Overall, there was 98 percent agreement between the initial and the reabstracted data, indicating a very high level of abstractor reliability. A self-reported instance of screening was considered valid if a procedure was found in the chart and was dated within ± 3 months of the date reported by the respondent.

RESULTS OF THE MAIN STUDY

In formally testing the three hypotheses discussed in the paper, we relied both upon intuitive interpretation of the measures using bivariate analyses and on random effects logistic regression models (43). Thus, for each hypothesis the analyses will begin with bivariate analyses, and then the final evaluation of the hypothesis will use a random effects model. Random effects statistical modeling is used to adjust for the fact that the observations in these analyses are not independent of one another. In fact, most of the 178 respondents contributed 18 self-reports (three screening procedures for each of 5½ years) to the analysis. The clustering in these data therefore requires an analytic approach capable of testing our hypotheses while simultaneously modeling this dependency.

Our basic finding is that respondents overreported receiving screening tests regardless of the format in which the question was asked.
Over the 5½-year period of interest (January 1987 to June 1992), the relative overreporting was 29 percent for all tests combined (16 percent for mammograms, 26 percent for clinical breast examinations, and 51 percent for Papanicolaou smears). Most of the error was in false-positive reports, which averaged 16 percent, compared with omissions, which averaged 5 percent. The null hypothesis that memory errors were unbiased (i.e., the rate of false reports equals the rate of omissions) was tested by the McNemar test for related samples (44) on each type of examination for each year. The null hypothesis was rejected (p < 0.005), since overreporting was significantly greater than underreporting in all reporting years for Papanicolaou smears and for 3 of 5½ reporting years for clinical breast examinations and mammograms.

The effects of time lend further support for the general use of schemas. Except in the partial data for 1992, there was no statistical effect of time in the bivariate data on false reports (range, 14.1-21.3) during the years studied. Another indication that time is not important is seen in the consistent concordance scores over the study period (range, 0.74-0.77), again with the exception that the partial data obtained in 1992 showed a higher rate (0.91). These data are consistent with the basic finding of the pervasive use of schemas. Respondents reported that they received regular, annual physical examinations that included the procedures and forgot possible exceptions.

Null model

For testing our hypotheses in these regression analyses, summarized in table 3, we have selected concordance as the dependent variable of interest. The null model is presented as model 1 of table 3. This model estimates the odds ratio and 95 percent confidence interval for one parameter, reporting year. Variability associated with the random, person-specific effect was also assessed in the model and was significant.
This result indicates that there is a clustering effect within these data (the intracluster correlation in the model is 0.10, p < 0.001). The results of the null model indicate that concordance improved with time (due in large part to inclusion of partial data for 1992, as expected based on the bivariate data), as shown by the significant odds ratio for report year in model 1 of table 3. This model is the baseline for evaluating each of our hypotheses.

TABLE 3. Odds ratios (and 95% confidence intervals) for random effects logistic regression equations predicting respondent-record concordance (1 = yes)

                                 Model 1†             Model 2‡             Model 3§
Report year                      1.17** (1.13-1.22)   1.19** (1.14-1.23)   1.19** (1.14-1.23)
Papanicolaou smear (1 = yes)                          0.64* (0.51-0.79)    0.64** (0.50-0.81)
Breast examination (1 = yes)                          0.48** (0.38-0.59)   0.50** (0.40-0.62)
Test regularity (1 = yes)                                                  1.32* (1.05-1.67)

* p < 0.01; ** p < 0.001.
† Model 1, null model.
‡ Model 2, effects of type of test.
§ Model 3, effects of regularity.

Hypothesis 1: effects of type of test

Hypothesis 1 proposes that tests conducted separately will be better remembered. For this sample, almost all Papanicolaou smears and breast examinations were conducted as parts of the physical or gynecologic examination. The exceptions were too few to analyze. Mammograms were conducted at a separate location across the street from the clinic, and in addition, there was some variation in when this test was done. For hypothesis 1, we compared mammograms that were done elsewhere with Papanicolaou smears and breast examinations usually done at the time of the physical.
Hypothesis 1 was confirmed using the report-to-records ratio (mammograms, 1.16; clinical breast examinations, 1.26; and Papanicolaou smears, 1.51), concordance (mammograms, 0.84; clinical breast examinations, 0.73; and Papanicolaou smears, 0.78), and specificity (mammograms, 0.81; clinical breast examinations, 0.70; and Papanicolaou smears, 0.73). Another difference is in the percentage of false reports, which averaged 18.7 percent for Papanicolaou smears and breast examinations and 11.1 percent for mammograms. These results are consistent over time.

For the analysis using the random effects logistic regression model, dummy variables representing concordance in Papanicolaou smears and clinical breast examinations were added to the model to contrast the degree of concordance with that observed for mammograms. The significant odds ratios in table 3, model 2, confirm the effects observed in the bivariate data, in that mammograms were reported with greater accuracy than either of the other procedures. Moreover, the model chi-square improved significantly (Δχ² = 43.6, df = 2). The clustering effects were still present in the model.

Hypothesis 2: effects of regularity

Hypothesis 2 returns to the effects of schema and time on the accuracy of recall. Hypothesis 2 postulated more accurate reporting among respondents for whom the diagnostic tests are conducted regularly rather than irregularly, since regular events are easier to schematize accurately. This hypothesis is supported.

Testing hypothesis 2 required that the term "regularity" be defined. Respondents were classified as regularly receiving the procedure if there was evidence in the medical record that they were tested over some regular interval (every year, every other year) or if they received examinations during 4 of the 5 years included in the study.
All other respondents were classified as irregular, except for those with no medical record evidence of cancer screening examinations during the 5-year study period, who were excluded from the comparison of regular and irregular respondents.

Bivariate results reveal that the major difference between regular and irregular test recipients is that the percentage of false reports was higher among irregular recipients than among regular recipients (23.0 percent vs. 11.0 percent for Papanicolaou smears, 23.2 percent vs. 6.9 percent for breast examinations, and 14.0 percent vs. 7.1 percent for mammograms). Overall, the net biases (report-to-records ratios) were far smaller for respondents who received tests regularly for each procedure. They were 8 percent for Papanicolaou smears, -18 percent for breast examinations, and only 1 percent for mammograms. On the other hand, the net biases for those who were tested irregularly ranged from 31 percent for those who had mammograms irregularly to 90 percent among those who received Papanicolaou smears irregularly. Review of the concordance measures supports the finding of Menon (9) that regularity increases accurate reporting.

Model 3 in table 3 provides an overall test of the effect of obtaining tests regularly on concordance, controlling for test type. Both the regression estimate for this variable and the difference in chi-square likelihood ratio (Δχ² = 81.4, df = 1) for the overall model indicate that test regularity improves the accuracy of reporting cancer screening tests.

Hypothesis 3: effects of questionnaire format

Hypothesis 3 proposed that more accurate reporting would be found for respondents who used a questionnaire form that activates schemas related to health events, physical checkups, and gynecologic examinations compared with a questionnaire that asks about each of the screening tests separately. The results do not support this hypothesis.
For example, the ratios of reporting to records by form range from 1.26 to 1.33. These differences are not statistically significant, nor do they have any practical importance. There was no consistent pattern of superiority of either questionnaire form across all three screening tests or over years. In retrospect, it appears that the treatment was ineffective because most respondents used schemas in answering the questions regardless of the form. To put it another way, use of schemas was not influenced by the form of the questionnaire; they were already activated simply by the topic. A random effects model not presented here confirmed this finding (15).

DISCUSSION AND RECOMMENDATIONS

On the basis of phase I focus groups and phase II interviews, most women regularly used schemas to report their tests, such as "I get a mammogram every year" or "I get a Papanicolaou smear along with my annual physical." We thought that we could affect this use of schemas by revising the questionnaire format, but we were unsuccessful, as shown by the results of the test of hypothesis 3. It was clear that the specific events in question were embedded in the recall of the routine physical examination experience and could not be recalled independent of that key, initiating experience.

Schemas do not necessarily provide poor estimates. For very regular behavior, schemas may provide better estimates than would efforts to remember individual episodes. Schemas can, however, result in overstatements of behavior when respondents forget occasions when the regular behavior is interrupted. That appears to be the case for all of the health care behaviors studied. As noted above, respondents, on average, overstated receiving health care procedures by 29 percent compared with the records. The overstatement ranged from 16 percent for mammograms to 51 percent for Papanicolaou smears. One consequence of using schemas was the small differences in the levels of false positives by year.
Researchers typically observe sharp increases in underreporting caused by forgetting due to longer recall periods, as was observed in the data we presented on false negatives. Since the false positives are far larger, they dominate the net results.

When the respondent reported regularly receiving the procedures, using schemas facilitated recall, as shown by the positive results of our test of hypothesis 2; when they received the procedures irregularly there was less accuracy. The cognitive explanation of this finding makes it almost tautological. We have shown that most respondents use regularity schemas in reporting about these tests. Those respondents whose records indicate that, indeed, the tests were received regularly would certainly have a lower level of false reports than would respondents who did not receive the tests regularly but who thought they did. It is also of interest to note that there really was very little decay in recall associated with the length of the recall period. The effects of time seemed to be entirely due to the effect of including the partial data for 1992.

For screening procedures, the important issue is whether the respondent is getting them according to the recommended intervals. The current recommendations for Papanicolaou smear, clinical breast examination, and mammogram vary, but for the Papanicolaou smear, where inaccurate reporting is most serious, the recommendation is once every 3 years after two consecutive negative smears (45). Hence, knowing whether the respondent received a Papanicolaou smear once a year or two to three times during a 5-year interval is immaterial, since either pattern is consistent with the recommendations. Thus, one needs to think about what level of recall and length of recall period are necessary when taking such histories (35), and the question of what is sufficient needs to be considered.
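The point that a sizable counting error can leave the substantive conclusion unchanged can be made concrete with a short sketch. Everything in it is an assumption made for illustration: the function name meets_interval is hypothetical, the test years are invented, and "once every 3 years" is read as at least one test in every run of 3 consecutive calendar years within the study window.

```python
def meets_interval(test_years, first_year, last_year, interval=3):
    """Return True if every run of `interval` consecutive calendar years
    between first_year and last_year (inclusive) contains a test.

    This reading of the screening recommendation is an assumption.
    """
    tested = set(test_years)
    for start in range(first_year, last_year - interval + 2):
        window = range(start, start + interval)
        if not any(year in tested for year in window):
            return False
    return True


FIRST, LAST = 1987, 1991  # five full calendar years, chosen for illustration

# A respondent answering from an "every year" schema reports five tests.
schema_report = list(range(FIRST, LAST + 1))

# Case A: the record shows a test every other year (invented data).
record_a = [1987, 1989, 1991]
# Case B: the record shows tests only in the first two years (invented data).
record_b = [1987, 1988]

overreport_a = len(schema_report) / len(record_a)
same_conclusion_a = (
    meets_interval(schema_report, FIRST, LAST)
    == meets_interval(record_a, FIRST, LAST)
)
same_conclusion_b = (
    meets_interval(schema_report, FIRST, LAST)
    == meets_interval(record_b, FIRST, LAST)
)

print(f"Case A report-to-records ratio: {overreport_a:.2f}")
print(f"Case A report and record agree on adherence: {same_conclusion_a}")
print(f"Case B report and record agree on adherence: {same_conclusion_b}")
```

In case A the schema-based report overstates the count, yet report and record lead to the same judgment about adherence. In case B they do not, which is the situation in which the length and detail of the recall task would matter.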
On the basis of our results, we did not recommend any changes in the format of the questions currently used in the National Health Interview Survey to obtain information about Papanicolaou smears, mammograms, and breast examinations. Despite the error, the data obtained are sufficient to answer the question regarding regularity.

Another finding relevant to the reliance on schemas is that the assumption that membership in an HMO always means a complete annual checkup, linked to these screening procedures, may not always hold. The changes in health care delivery may create increasing problems in tracking such behavior as the US medical care system becomes more competitive and pressed for time. Some of the HMO physicians noted that when they knew patients would visit several times a year for chronic care, they would be more likely to conduct a physical examination and preventive services over several visits. If these screening procedures are not delivered with the annual physical, this will create additional obstacles for accurate patient recall of these events. Moreover, should changes in insurance coverage require a change of HMO, there might not be consistent record data over time, since the patients' records may not follow changes in HMO due to employer contracting procedures.

Future research, particularly in non-HMO populations, would be desirable. Studies should test the assumptions regarding the meaning of regularity. It may also be that Papanicolaou smears, the most overreported procedure, are not really offered to women annually but that regularity is actually defined by a less frequent interval.

Neither questionnaire form changed the way that women retrieved information. This may be due to the homogeneity of the patients selected from one HMO where the care is delivered in a uniform way. We had speculated that women who had hysterectomies would report Papanicolaou smears more accurately.
They did not report more accurately, although, of course, they reported fewer tests. Nevertheless, the percentage of false positives and false negatives did not differ between women who did and those who did not have hysterectomies.

It should also be pointed out that nothing in our study rules out the possibility that some of the overstatement of receipt of procedures was caused by the perceived social desirability of regularly receiving early cancer detection tests. Social desirability would not necessarily imply that respondents deliberately falsified their answers. It could also be that respondents who may have been uncertain about whether they had a procedure every year said that they did because they knew that this was what they should have done. This recall strategy is consistent with the generic nature of schema content (1).

Our sample was desirable because of our ability to validate information from records. It may not be representative of the general public, however, since HMO patients may be more likely to receive cancer screening procedures than are patients receiving care from fee-for-service physicians (46-48). It is also possible that membership in an HMO results in more regular behavior and greater use of schemas than is found in the general population of women over age 50 years. Finally, as we used a small sample, it is also possible that with larger numbers some of the differences we predicted would have been statistically significant.

BROADER IMPLICATIONS FOR COGNITIVE RESEARCH ON SURVEY RESPONSE IN EPIDEMIOLOGIC RESEARCH

Aside from a better understanding of how women report on these three cancer screening tests, the results also have implications for future research on cognitive aspects of taking medical histories.
First, these results strongly suggest that respondents are likely to use schemas in reporting about health behavior even when the total number of events is small if they perceive the events as regular. The use of schemas is even more likely if respondents are asked about less recent events. One might also expect that the likelihood of forgetting exceptions would increase with longer time periods, but we saw no evidence of it in this study. When schemas were used, our results showed that the order in which questions about details of an event were asked had no effect on the accuracy of reporting that the event occurred.

As many epidemiologically relevant events occur early in life, taking histories that cover lifetime exposures may be especially susceptible to recall error. Moreover, we examined regularly occurring events; irregular events that occur early in a lifetime may present different and more complex problems.

Obviously, if we could predict when it would help, it would be desirable to tell respondents what retrieval method they should use for greatest accuracy, but this research is in agreement with other studies that indicate that it is enormously difficult to get respondents to change the way they retrieve information. We do not say that it is impossible to do so, but we were unable to do it, even though our focus groups and think-aloud interviews had given us a good understanding of what methods respondents were actually using. The most beneficial aspects of this research enhance our understanding of the levels of accuracy and the need to allow for this in such studies.

ACKNOWLEDGMENTS

Supported by contract 200-91-7035 from the US Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Health Statistics. The authors acknowledge the generous cooperation of RUSH-Anchor Health Management Organization for making the subjects for this study accessible.
Rene Bucio, Jeanaette Cunningham, and Veshane Smith did the abstracting. Dr. George Wilbanks participated in the formulation of the questions. Mary Kobialka did the quality review of the abstracting. The late Dr. Loretta Lacy conducted the focus groups, and Dr. Donald Hedeker and Sui Chi Wong assisted in the multivariate statistical analysis.

REFERENCES

1. Schwarz N, Sudman S. Autobiographical memory and the validity of retrospective reports. New York, NY: Springer-Verlag, 1994.
2. Jobe JB, Mingay DJ. Cognition and survey measurement: history and overview. Appl Cognitive Psychol 1991;5:175-92.
3. Strack F, Martin LL. Thinking, judging, and communicating: a process account of context effects in attitude surveys. In: Hippler H-J, Schwarz N, Sudman S, eds. Social information processing and survey methodology. New York, NY: Springer-Verlag, 1987:123-48.
4. Tourangeau R. Attitude measurement: a cognitive perspective. In: Hippler H-J, Schwarz N, Sudman S, eds. Social information processing and survey methodology. New York, NY: Springer-Verlag, 1987:149-62.
5. Tourangeau R, Rasinski KA. Cognitive processes underlying context effects in attitude measurement. Psychol Bull 1988;103:299-314.
6. Bradburn NM, Sudman S, associates. Improving interview method and questionnaire design: response effects to threatening questions in survey research. San Francisco, CA: Jossey-Bass, 1979.
7. Sudman S, Bradburn NM, Schwarz N. Thinking about answers: the application of cognitive processes to survey methodology. San Francisco, CA: Jossey-Bass, 1996.
8. Hastie R. Information processing theory for the survey researcher. In: Hippler H-J, Schwarz N, Sudman S, eds. Social information processing and survey methodology. New York, NY: Springer-Verlag, 1987:42-70.
9. Menon G. Judgments of behavioral frequencies: memory search and retrieval strategies. In: Sudman S, Bradburn NM, Schwarz N, eds. Thinking about answers: the application of cognitive processes to survey methodology. San Francisco, CA: Jossey-Bass, 1996:161-72.
10. Blair E, Burton J. Cognitive processes used by survey respondents in answering behavioral frequency questions. J Consumer Res 1987;14:280-8.
11. Brewer WF. Autobiographical memory and survey research. In: Schwarz N, Sudman S, eds. Autobiographical memory and the validity of retrospective reports. New York, NY: Springer-Verlag, 1994:11-20.
12. Wagenaar WA. My memory: a study of autobiographical memory over six years. Cognitive Psychol 1986;18:225-52.
13. Means B, Swan GE, Jobe JB, et al. An alternative approach to obtaining personal history data. In: Biemer PP, Groves RM, Lyberg LE, et al., eds. Measurement errors in surveys. New York, NY: John Wiley & Sons, 1991:167-83.
14. Sudman S, Schwarz N. Contributions of cognitive psychology to advertising research. J Advertising Res 1989;29:43-53.
15. Sudman S, Warnecke RB, Johnson TP, et al. Cognitive aspects of reporting cancer prevention examinations and tests. Rockville, MD: National Center for Health Statistics, 1994. (Vital and health statistics, Series 6, no. 7). (DHHS publication no. 94-1082).
16. Gordon NP, Hiatt RA, Lampert DI. Concordance of self-reported data and medical record audit for six cancer screening procedures. J Natl Cancer Inst 1993;85:566-70.
17. Brown JB, Adams ME. Patients as reliable reporters of the medical care process. Med Care 1992;30:400-11.
18. Jobe JB, White AA, Kelley CL. Recall strategies and memory for health care visits. Milbank Q 1990;68:171-89.
19. Loftus EF, Smith KD, Klinger MR, et al. Memory and mismemory for health events. In: Tanur J, ed. Questions about questions. New York, NY: Russell Sage, 1992:102-37.
20. Sawyer JA, Earp JA, Fletcher RH, et al. Accuracy of women's self-report of their last Pap smear. Am J Public Health 1989;79:1036-7.
21. Bowman JA, Redman S, Dickinson JA, et al. The accuracy of Pap smear utilization self-report: a methodological consideration in cervical screening research. Health Services Res 1991;26:97-107.
22. Paskett ED, Tatum C, Mack DW, et al. Validation of self-reported breast and cervical screening tests among low-income minority women. Cancer Epidemiol Biomarkers Prev 1996;5:721-6.
23. Hiatt RA, Pérez-Stable EJ, Quesenberry C, et al. Agreement between self-reported early detection practices and medical audits among Hispanic and non-Hispanic white health plan members in northern California. Prev Med 1995;24:278-85.
24. Warnecke R, Graham S. Characteristics of blacks obtaining Pap smears. Cancer 1976;37:2015-25.
25. Loftus EF, Klinger MR, Smith KD, et al. A tale of two questions: the benefits of asking more than one question. Public Opinion Q 1990;54:330-45.
26. Michielutte R, Dignan M, Wells HB, et al. Errors in reporting cervical cancer among public health clinic patients. J Clin Epidemiol 1991;44:403-8.
27. McKenna MT, Speers M, Mallin K, et al. Agreement between patient self-reports and medical records for Pap smear histories. Am J Prev Med 1992;8:287-91.
28. Walter SD, Clarke EA, Hatcher J, et al. A comparison of physician and patient reports of Pap smear histories. J Clin Epidemiol 1987;41:401-10.
29. Hochstim JR. A critical comparison of three strategies of collecting data from households. J Am Stat Assoc 1967;62:976-89.
30. Peters RK, Dear MB, Thomas D. Barriers to screening of cancer of the cervix. Prev Med 1989;18:133-46.
31. Warnecke RB. Intervention in black populations. In: Mettlin C, Murphy G, eds. Cancer among black populations. New York, NY: Alan R. Liss, 1981:167-83.
32. Warnecke RB, Havlicek PL, Manfredi C. Awareness and use of screening by old-aged persons. In: Yancik R, ed. Perspectives on prevention and treatment of cancer in the elderly. New York, NY: Raven Press, 1983:275-87.
33. King ES, Rimer BK, Trock B, et al. How valid are mammography self-reports? Am J Public Health 1990;80:1386-8.
34. Dignan D, Harris R, Ranney J, et al. Measuring use of mammography: two methods compared. Am J Public Health 1992;82:1386-8.
35. Burton S, Blair EA. Task conditions, response formulation processes, and response accuracy for behavioral frequency questions in surveys. Public Opinion Q 1991;55:50-79.
36. Eisenhower D, Mathiowetz NA, Morganstein D. Recall error sources and bias reduction techniques. In: Biemer PP, Groves RM, Lyberg LE, et al., eds. Measurement errors in surveys. New York, NY: John Wiley & Sons, 1991:127-44.
37. Tulving E. Elements of episodic memory. Oxford, England: Clarendon Press, 1983.
38. Herrmann DJ. The validity of retrospective reports as a function of directness of the retrieval processes. In: Schwarz N, Sudman S, eds. Autobiographical memory and the validity of retrospective reports. New York, NY: Springer-Verlag, 1994:21-37.
39. Demlo LK, Campbell PM, Brown SS. Reliability of information abstracted from patients' medical records. Med Care 1978;16:995-1005.
40. Feigl P, Glaefke G, Ford L, et al. Studying patterns of cancer care: how useful is the medical record? Am J Public Health 1988;78:526-33.
41. Romm F, Putnam S. The validity of the medical record. Med Care 1981;19:310-15.
42. Sudman S, Bradburn NM. Response effects in surveys: a review and synthesis. Chicago, IL: Aldine, 1974.
43. Hedeker D, Gibbons RD. A random-effects ordinal regression model for multilevel analysis. Biometrics 1994;50:933-44.
44. Daniel WW. Applied nonparametric statistics. Boston, MA: Houghton-Mifflin, 1978.
45. Czaja R, McFall S, Warnecke R, et al. Preferences of community physicians for cancer screening guidelines. Ann Intern Med 1994;120:602-8.
46. Berstein MB, Thompson GB, Harlan LC. Differences in rates of cancer screening by usual source of care. Med Care 1991;29:196-201.
47. Kang SH, Bloom JR. Social support and cancer screening among older black Americans. J Natl Cancer Inst 1993;85:737-42.
48. Udvarhelyi IS, Jennison K, Phillips RS, et al. Comparison of quality of ambulatory care for fee-for-service and prepaid patients. Ann Intern Med 1991;115:394-400.