American Journal of Epidemiology
Copyright © 1997 by The Johns Hopkins University School of Hygiene and Public Health
All rights reserved
Vol. 146, No. 11
Printed in U.S.A.
Cognitive Aspects of Recalling and Reporting Health-related Events:
Papanicolaou Smears, Clinical Breast Examinations, and Mammograms
Richard B. Warnecke,1 Seymour Sudman,1,2 Timothy P. Johnson,1 Diane O'Rourke,1 Andrew M. Davis,3 and
Jared B. Jobe4
This paper reports an examination of cognitive processes used by 178 women aged 50 years and older in
retrieving information about the frequency with which they received Papanicolaou smears, mammograms, and
clinical breast examinations. Women were selected from a health maintenance organization in which they had
been enrolled for at least 5½ years. The literature suggested that reporting of regular events such as these
kinds of tests is likely to be based on schemas, an estimation technique in which events are reported
in a format with generic content. Thus, if the procedure is believed to occur annually, the respondent will report
receiving five tests in 5 years. The study attempted to evaluate whether use of episodic recall, in which
respondents are forced to report individual events, would be more accurate than reports based on estimation
using a schema format. The results indicated that most of the errors occurred in Papanicolaou smear
reporting, which is consistent with the literature, and that the fewest errors occurred with mammograms.
Regardless of the questionnaire format, respondents persisted in using schemas based on the date of annual
physical examination. Most reporting errors occurred because the interval between examinations was estimated incorrectly. Am J Epidemiol 1997;146:982-92.
mammography; Papanicolaou smear; questionnaires; retrospective studies
Epidemiologic research generally relies on self-reports or histories obtained via interview or questionnaire to acquire information about events or activities
that may have taken place within days, weeks, months,
or even many years prior to the study. Collection of
these histories relies heavily on respondents' recall,
and only rarely are there existing records available for
validation of whatever information the respondent provides. These retrospective interviews require the subjects to recall autobiographical information about the
nature of exposures or other events, the circumstances
under which they occurred, and, quite often, the date.
The focus of this paper is respondent recall in response
to retrospective questions. To illustrate the potential
impact of recall on histories obtained by interview, we
examine recall strategies used by respondents to reconstruct
histories of Papanicolaou smears, mammograms, and clinical breast examinations.
COGNITIVE THEORY AND RETROSPECTIVE
RECALL
In recent years, improvement of the accuracy of
self-report in surveys has received considerable attention by survey specialists who have joined with cognitive psychologists to apply the theories and methods
of cognitive psychology to question design. This collaboration has enhanced understanding of how respondents retrieve autobiographical information (1) and
how to design questions that improve the validity of
retrieved information (2).
It is now generally agreed upon by most researchers
who use cognitive methods to understand survey measurement error that there are four steps in the response
process (3-5): 1) interpretation of the question; 2)
retrieval of either the answer to the question or relevant information that will be used to construct an
answer; 3) judgment formation, during which the respondent uses recalled information to formulate an
answer; and 4) editing or deciding what judgments
formed in the preceding step will be disclosed to the
interviewer (6). Not every step is equally important for
all questions. The focus here is on step 2, retrieving
information.
Received for publication January 17, 1997, and in final form
September 19, 1997.
Abbreviation: HMO, health maintenance organization.
1 Survey Research Laboratory, College of Urban Planning and Public Affairs, University of Illinois at Chicago, Chicago, IL.
2 College of Commerce, University of Illinois at Urbana-Champaign, Urbana, IL.
3 RUSH-Anchor Health Maintenance Organization, Chicago, IL.
4 National Institute of Aging, National Institutes of Health, Department of Health and Human Services, Bethesda, MD.
Reprint requests to Dr. Richard B. Warnecke, Survey Research
Laboratory, University of Illinois at Chicago, 910 W. Van Buren,
Suite 500, M/C 336, Chicago, IL 60607.
There is a substantial body of research conducted by
both cognitive psychologists and survey researchers
on the processes of retrieving information from memory and the effects of question wording on the validity
of this retrospective reporting (1, 7). For example,
most cognitive psychologists believe that some kinds
of activities are not retrieved as individual events, but
as "schema." Schema are generalizations based on
experience and are recalled as patterns with generic
content rather than as specifically enumerated events
(8). The accuracy of information obtained from schemas depends on the regularity of the behavior (9) and
the length of time since the event occurred (10). Research suggests that response accuracy can be influenced by asking questions that deliberately evoke or
discourage the use of schemas.
Brewer (11) has made a strong case that the date of
an event is its least well-remembered attribute, relative
to such other aspects as where the event occurred, who
was present, and what took place (12, 13). It has been
found, for example, that in many instances the date on
which an event occurred may not even be incorporated
as part of the memory of that event. In such cases,
obtaining information about when an event occurred
may require the interviewer to use other aspects of the
event as cues to influence recall by ordering the questions about the better-remembered aspects of the event
to provide a context for helping the respondent recall
the date (10, 11, 14).
We present here findings from a study conducted for
the National Center for Health Statistics (15) of the
cognitive processes that older (50 years or more),
female respondents used to answer questions related to
their experiences with obtaining Papanicolaou smears,
clinical breast examinations, and mammograms. The
purpose was to ascertain ways to improve the validity
of self-report data about the frequency of obtaining
these procedures.
WEAKNESSES OF SELF-REPORT DATA FOR
PAPANICOLAOU SMEARS, CLINICAL BREAST
EXAMINATIONS, AND MAMMOGRAMS
Most of what is known about females' knowledge
about and frequency of receiving cancer screening
procedures has come from self-report, survey interviews, such as the National Health Interview Survey
and the Behavioral Risk Factor Surveillance System.
The accuracy of self-reports of cancer detection tests
obtained by interviews is still a topic of debate. To
examine the validity of these self-reports, several researchers have matched medical records with responses from self-report interviews.
Four response quality measures were calculated to
summarize respondent reports: 1) Concordance is the
percentage of all respondents whose report that the
test was received, or whose report of "no test," agreed
with the records (16). This ratio is also commonly
referred to as an indicator of "gross accuracy" (15) or
the "raw agreement rate" (17). 2) Sensitivity is the
number of correctly recalled tests divided by the number
of cases with tests found in the medical record. In
previous research, measures that are operationally
identical to sensitivity have been referred to as measures of completeness (18, 19) or as the medical record
confirmation rate (17). 3) Specificity is the number of
correctly reported "no tests" divided by the number of
cases with no test found in the records. 4) Finally, the
report-to-records ratio is the percentage who reported
that they received the test divided by the percentage for
whom the records showed a test, that is, those who
correctly reported a test plus false-negatives (those cases
in which no test was reported, but the records indicated
that one was received). The report-to-records ratio is a measure of net
bias in test reporting (17).
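The four measures can be illustrated with a minimal sketch computed from a 2 x 2 cross-classification of self-reports against records. The counts below are hypothetical and are not drawn from any study in table 1; the names `Counts` and `quality_measures` are introduced only for illustration. The report-to-records ratio is taken here as reported tests divided by recorded tests, which is an assumption consistent with its use as a measure of net bias.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Cross-classification of self-report against the medical record."""
    tp: int  # test reported, test found in record
    fp: int  # test reported, no test in record (false positive)
    fn: int  # no test reported, test found in record (false negative)
    tn: int  # no test reported, no test in record


def quality_measures(c: Counts) -> dict:
    """Return the four response quality measures as proportions or ratios."""
    n = c.tp + c.fp + c.fn + c.tn
    return {
        "concordance": (c.tp + c.tn) / n,            # agreement with records
        "sensitivity": c.tp / (c.tp + c.fn),         # recorded tests recalled
        "specificity": c.tn / (c.tn + c.fp),         # "no test" correctly reported
        "report_to_records": (c.tp + c.fp) / (c.tp + c.fn),  # > 1 means overreporting
    }


# Hypothetical counts, for illustration only.
measures = quality_measures(Counts(tp=60, fp=20, fn=5, tn=15))
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

A ratio above 1 with high sensitivity and low specificity is the pattern that the validation studies below describe as overreporting.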
Reporting accuracy of the Papanicolaou smear
As can be seen in table 1, the majority of the
self-report validation has been done on studies using
Papanicolaou smear data. Table 1 provides a summary
of 12 published studies that evaluated Papanicolaou
smear-reporting quality. These studies vary along several important dimensions, as shown in table 1. For
example, subjects have been identified from a variety
of sources, including random community samples
(20-24), health maintenance organization (HMO)
membership rolls (15, 16, 24, 25), public health clinics
(26), cancer tumor registries (27), and case-control
studies (28), and across diverse geographic locations.
In addition, subjects varied by age and ethnicity. They
were asked to recall their screening histories across a
variety of time intervals ranging from immediately
after a medical encounter to some time in the 6 years
after the test.
Sufficient information was nevertheless available
for us to calculate a comparable set of measures for
each study. Concordance of respondent reports of
Papanicolaou smear history ranged from 0.39
through 0.80 (weighted average = 0.65). Although
sensitivity was high (range = 0.61-0.97; weighted
average = 0.81), specificity was only moderate
(range = 0.19-0.76; weighted average = 0.40).
Report-to-records ratios indicate that receipt of
Papanicolaou smears was overreported in each
study (range = 1.21-3.32; weighted average =
2.10). Other Papanicolaou smear validation studies
are available that did not contain information necessary for inclusion in table 1 (27-32). The results
of these studies, however, are consistent with those
summarized in table 1.
TABLE 1. Previous Papanicolaou smear, clinical breast examination, and mammogram validation studies
Columns: study (reference no.), sample*, no., location, recall period, concordance, sensitivity, specificity, report-to-records ratio.
Papanicolaou smears: Sawyer et al. (20); Bowman et al. (21); Gordon et al. (16); Walter et al. (28); McKenna et al. (27); Walter et al. (28); Michielutte et al. (26); Warnecke and Graham (24); Loftus et al. (25); Paskett et al. (22); Hiatt et al. (23); Sudman et al. (15).
Mammogram: King et al. (33); Brown and Adams (17); Gordon et al. (16); Dignan et al. (34); Loftus et al. (25); Paskett et al. (22); Hiatt et al. (23); Sudman et al. (15).
Breast examination: Gordon et al. (16); Loftus et al. (25); Sudman et al. (15).
Weighted means are given for Papanicolaou smear, mammogram, breast examination, and all examinations.
* HMO, health maintenance organization; STD, sexually transmitted disease.
Reporting accuracy of the mammogram
A summary of findings from eight verified mammography studies is also presented in table 1. Note
that these study populations are far more homogeneous than those examined for Papanicolaou smear
reporting. Six of the eight mammography studies selected subjects from HMO rosters (15-17, 23, 25, 33);
two (22, 34) came from random community surveys.
These studies examined a range of recall intervals
(from 1-2 months to 6 years) similar to those in the
Papanicolaou studies reviewed, and all were published
during the 1990s.
Data from seven of these studies allowed us to make
all the calculations in table 1. The data of Loftus et al.
(25) allowed only calculation of report-to-records ratio. Overall, the data from these studies showed higher
degrees of concordance (range = 0.51-0.97; weighted
average = 0.76), sensitivity (range = 0.70-1.00;
weighted average = 0.86), and specificity (range =
0.30-0.94; weighted average = 0.68) compared with
the Papanicolaou smear validation data. Although
overreporting was also a problem with mammography
self-reports, report-to-records ratios were smaller than
those calculated for Papanicolaou smear data
(range = 0.84-2.34; weighted average = 1.39).
Reporting accuracy of the clinical breast
examination
Three studies have attempted to validate self-reports of
clinical breast examinations. Gordon et al. (16) reported
a concordance rate of 0.70 for this procedure, with high
sensitivity (0.97) but low specificity (0.22). Sudman et al.
(15) reported a similar concordance rate (0.73), a somewhat lower sensitivity (0.78), and significantly higher
specificity (0.70). As with mammography, the data of
Loftus et al. (25) allow only calculation of the report-to-records ratio. Unlike Papanicolaou smears and mammograms, medical record validation of clinical breast examinations is generally derived from physician notes as
opposed to radiology or pathology reports. As a result,
true reports may have an increased likelihood of being
classified as "false positives" if the physician failed to
record them. Consistent with findings for the other two
screening procedures, clinical breast examinations were
also likely to be overreported. The weighted report-to-records mean was 1.54.
PRELIMINARY STUDIES AND HYPOTHESES
We initially identified issues corresponding to each of
the four major cognitive process stages discussed above
that might be associated with reporting accuracy. We
examined three indicators of comprehension: 1) Do older
women know what the terms "Papanicolaou smear" and
"mammogram" mean? 2) Do they confuse procedures
such as clinical breast examination and mammograms?
3) Were any of the questions we planned to ask prone to
substantial misinterpretation?
We next examined three aspects of information retrieval: 1) How do older women retrieve information
on medical screening examinations? 2) Are tests
stored in discrete memory, or do respondents use
schema? 3) If schema are used, what are the schema?
Two dimensions of judgment formation were evaluated: 1) How do older women form judgments about
how often they are screened? 2) Do they count each
time, or do they estimate the frequency?
Finally, editing of responses was assessed by considering three issues: 1) How comfortable are older women
discussing their experiences with Papanicolaou smears
and mammograms? 2) Do they withhold sensitive personal information or overreport socially prescribed behavior? 3) Does the interviewer's gender influence their
comfort with the subject of the interview?
We conducted a series of pilot studies with women
employed by the University of Illinois at Chicago who
had been members of the HMO that was to be used for
this project for at least 5 years. This HMO had a policy
of providing Papanicolaou smears, mammograms, and
clinical breast examinations as part of the annual physical examination offered to all female members of the
HMO, so variation in opportunity was held constant.
The pilot studies are discussed in detail elsewhere
(15); however, they included two focus groups and 16
"think-aloud" or in-depth interviews.
The focus group participants were also selected by
race (black/white) to allow us to ascertain if there were
differences by race in the women's understanding of
these procedures; there were no differences. The content of focus group discussion addressed the need for
regular screening for cancer and the women's understanding of the nature of and recommended screening
interval for the Papanicolaou smear, mammogram, and
clinical breast examination. The focus group results
confirmed that these women knew what the procedures were and understood the recommended screening intervals. As part of the discussion, they were
asked whether the gender of the interviewer would
affect their comfort discussing Papanicolaou smears,
mammograms, or breast examinations. Gender of the
interviewer was not an issue.
Four investigators (T. P. J., D. O., S. S., and
R. B. W.) conducted 16 think-aloud interviews with
members of the selected HMO. Several questionnaire
formats were used. In one format, the questions were
asked exactly as they are on the National Health Interview Survey; the other formats varied the way in
which the questions were asked. All versions were
followed up with requests for the respondents to think
aloud about how they arrived at the answers to the
questions, what they thought the questions meant, and
what kinds of information they thought about when
making their judgments about frequency. At the end of
the interview, each respondent was asked about her
level of comfort with discussing these procedures in
the interview.
With the patients' consent, we verified all reported
screening experience in their records at the HMO. We
asked respondents whether they received any procedures from other sources. As all respondents interviewed came from a single HMO and were selected
because they were long-term members, we are quite
confident that the respondents did not receive these
procedures from other sources.
The results of these preliminary tests indicated that
comprehension was not an issue. Women over age 50
years knew what these procedures were and understood the questions about frequency. There was, however, some
confusion between breast examinations and mammograms, which most of the
women did not distinguish. When we
used the term "mammogram" first, followed by the
term "breast examination," the confusion was eliminated. Confusion about what constituted a physical
examination was resolved by using the phrase "a complete physical examination."
From the focus groups and the think-aloud responses to the questionnaire versions finally selected,
it was clear that few respondents were retrieving isolated individual screening examinations. Almost all
respondents relied on schemas (11). The reasons for
this are fairly clear. First, a majority of respondents
reported that the behavior being studied occurred regularly. (An examination of the records indicated that
actual behavior was less regular than what was remembered.) Second, 5 years is a long time period,
which makes it difficult to retrieve individual events.
Both regularity (9) and long time periods (35) have
been shown to lead to the retrieval of schema rather
than individual episodes.
By far the most common schema used, even when
unprompted, was to associate screening tests with annual complete physical checkups. The other major
schema was to associate receiving the test with serious
illnesses or conditions that required one or more of the
procedures for monitoring the condition or as part of
follow-up after the condition was corrected. It appeared to be easy for respondents to remember that
they had such tests and to count the number they had
received. Gynecologic examinations were much less
frequent than complete physical checkups, but when
they did occur, Papanicolaou tests and mammograms
were usually part of the process. Another schema that
was mentioned involved retests when the original results were inconclusive.
It has been found that memories are often stored in
selected subsets that are highly organized cognitively,
but are subject to errors of incompleteness and distortion (36). Although respondents appeared to have little
trouble retrieving schema related to screening tests,
there were some indications of possibly faulty retrieval
or incompleteness. For example, several respondents
reported getting regular annual checkups, but in later
discussion indicated that the time period between
checkups varied from 12 to 15 and sometimes even 18
months (incompleteness). A few respondents reported
that they knew they were supposed to get the test
annually, but skipped 1 or more years out of 5 for a
variety of reasons (distortion). However, regardless of
the accuracy, the schema comprised the basic structure
for estimating frequencies in most instances.
DESIGN OF THE MAIN STUDY
Objectives and scope
Upon completion of phase I, a field study was
designed and executed to assess the methods suggested by the pilot phase. Cognitive methods were
investigated to improve responses to questions about
health promotion and disease prevention, specifically
focused on accuracy of reporting Papanicolaou
smears, mammograms, and clinical breast examinations. The objectives of this research were to measure
how well the experience of receiving these procedures
is recalled by the respondents and to determine what
methods are most likely to elicit accurate recall and
reporting of having received them.
This paper examines three hypotheses that the pilot
research suggested would be most viable. 1) More
accurate reporting will be found for early cancer detection tests that are conducted separately rather than
as part of a physical examination because separate
events are more salient and easier to retrieve. 2) Controlling for frequency, more accurate reporting (less
over- and underreporting) will be found for respondents for whom the diagnostic tests are conducted
regularly rather than irregularly because regular events
are easier to schematize accurately. 3) More accurate
reporting will be found among respondents who use a
questionnaire form that activates schemas relating to
health events, physical checkups, and gynecologic examinations, compared with a questionnaire that asks
about each screening test separately.
Hypotheses 1 and 2 are based on characteristics of
the early detection procedures and the context in
which they occur. These are events that are not controllable by the researchers, although both regularity
and the context in which the tests are delivered can be
ascertained from existing data. Hypothesis 1 is based
on the idea that screening procedures that are delivered
separately are more likely than those that are delivered
as part of general physical examinations to be stored in
memory as unique events. This logic assumes that the
respondent is made aware of the event because of the
distinctiveness of the circumstances in which it occurred compared with events that are parts of familiar
processes and therefore are not distinct (37, 38).
Hypothesis 2 follows from the work of Menon (9)
and others who have suggested that the formation and
use of schemas to make estimates of one's behavior is
facilitated by the regularity of that behavior. The very
regularity of the behavior promotes and enhances recall. The context facilitates recall. This hypothesis is
distinct from hypothesis 1 because it addresses regular
behavior, whereas hypothesis 1 refers to irregular
behavior.
Hypothesis 3 is also derived from ideas about
schema formation. It asserts that schemas will be easier to retrieve and use in estimation if they have been
activated previously by questions that parallel the
schemas that most respondents use. (Questionnaires
are available from the investigators.)
Target population
As with the pilot phase, women aged 50 years and
older who were members of the RUSH-Anchor HMO
for at least 5 years comprised the target population;
however, they did not have to be university employees. As with the pilot sample, we selected women in
this age group because they are the group most likely
to misreport having received a Papanicolaou smear
(24, 31, 32). Moreover, women over age 50 are the
primary target population for mammograms and breast
physical examinations. The characteristics of the population are presented in table 2. A total of 211 women
met the criteria for the study. However, of these, only
178 signed consent forms, allowing us to access their
records for validation. Thus, the sample size in table 2
is 178.
TABLE 2. Demographic characteristics of respondents (n = 178)
Characteristic                          %
Age (years)*
  50-59                              57.9
  60-69                              33.7
  ≥70                                 8.4
Race
  Black                              87.6
  White                               8.4
  Hispanic                            3.4
  Asian/Pacific Islander              0.6
Education
  Less than high school graduate     15.2
  High school graduate               34.8
  Some college                       25.8
  College graduate                   24.2
Employment status
  Employed full time                 71.9
  Employed part time                  5.6
  Not employed                       22.5
* Mean age, 59.6 years.
Questionnaire formats
To test the hypotheses, we randomly assigned half
of the sample (105 subjects) to be interviewed with a
questionnaire designed to encourage the use of schema
to recall screening, general checkups, and gynecologic
examinations (schematic version). The rest of the sample
(106 subjects) was interviewed with a questionnaire that attempted to enumerate each screening test
separately (episodic version). Because only 178 respondents agreed to allow us to validate their reports,
86 were interviewed with the schematic version and 92
with the episodic version.
Medical record abstraction
The sources of specific data for each eligible patient
included physician progress notes, cytology reports,
laboratory reports, and radiology reports. Information
collected about events in the previous 5½ years included 1) total number of medical visits; 2) frequency,
dates, and results of Papanicolaou smears; 3) frequency, dates, and results of mammograms; 4) frequency, dates, and results of clinical breast examinations; and 5) whether a full or partial hysterectomy
was ever performed. The matching between the record
and the questionnaire was done by a medical records
specialist. Initially, matching was done regardless of
discrepancies by date. Once a match was established,
the difference in months between the record date and
the interview date was computed if a date was given
on the interview. Reports of frequency of procedure
were classified as "regular," "irregular," or "not received."
We followed the procedure of relying on the medical record as the standard, generally adopted in validating self-reports. However, we recognize that many
who rely on records have found them to be incomplete
and not totally reliable (39-42). A record reabstraction study was undertaken to evaluate whether complete information was abstracted from medical
records. A credentialed medical records technician
employed by Survey Research Laboratory reabstracted
20 records. Overall, there was 98 percent agreement
between the initial and the reabstracted data, indicating a very high level of abstractor reliability.
A self-reported instance of screening was considered valid if a procedure was found in the chart and
was dated within ± 3 months of the date reported by
the respondent.
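The validation rule can be sketched as follows. This is a minimal illustration that assumes dates are compared at calendar-month resolution (the day of the month is ignored), which the text does not specify; the function names and the example dates are hypothetical.

```python
from datetime import date


def months_between(a: date, b: date) -> int:
    """Absolute difference in calendar months, ignoring the day of the month."""
    return abs((a.year - b.year) * 12 + (a.month - b.month))


def is_valid_report(reported: date, chart_dates: list, tolerance_months: int = 3) -> bool:
    """A self-report is valid if any charted procedure falls within the tolerance."""
    return any(months_between(reported, d) <= tolerance_months for d in chart_dates)


# Hypothetical chart with two recorded procedures.
chart = [date(1989, 12, 1), date(1991, 3, 15)]

print(is_valid_report(date(1991, 5, 1), chart))   # 2 months from the 1991 record
print(is_valid_report(date(1990, 8, 1), chart))   # 7 and 8 months from the records
```

A finer rule, for example one based on days, would classify a few borderline reports differently; the month-level comparison is chosen here only because recall dates are typically reported by month.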
RESULTS OF THE MAIN STUDY
In formally testing the three hypotheses discussed in
the paper, we relied both upon intuitive interpretation
of the measures using bivariate analyses and on random effects logistic regression models (43). Thus, for
each hypothesis the analyses will begin with bivariate
analyses, and then the final evaluation of the hypothesis will use a random effects model. Random effects
statistical modeling is used to adjust for the fact that
the observations in these analyses are not independent
of one another. In fact, most of the 178 respondents
contributed 18 self-reports (three screening procedures
for each of 5½ years) to the analysis. The clustering in
these data therefore requires an analytic approach capable of testing our hypotheses while simultaneously
modeling this dependency.
Our basic finding is that respondents overreported receiving screening tests regardless of the format in which
the question was asked. Over the 5½-year period of
interest (January 1987 to June 1992), the relative overreporting was 29 percent for all tests combined (16 percent for mammograms, 26 percent for clinical breast
examinations, and 51 percent for Papanicolaou smears).
Most of the error was in false-positive reports, which
averaged 16 percent, compared with omissions, which
averaged 5 percent. The null hypothesis that memory
errors were unbiased (i.e., the rate of false reports equals
the rate of omissions) was tested by the McNemar test for
related samples (44) on each type of examination for
each year. The null hypothesis was rejected (p < 0.005),
since overreporting was significantly greater than underreporting in all reporting years for Papanicolaou smears
and for 3 of 5½ reporting years for clinical breast examinations and mammograms.
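The logic of the McNemar comparison can be sketched with the exact binomial form of the test, which uses only the two discordant counts: false reports and omissions. This is an illustration, not the computation used in the study; the exact form is a choice made here for a self-contained example, and the counts are hypothetical values of roughly the magnitude implied by 16 percent and 5 percent of 178 respondents.

```python
from math import comb


def mcnemar_exact_p(false_reports: int, omissions: int) -> float:
    """Two-sided exact McNemar p value from the two discordant counts.

    Under the null hypothesis of unbiased memory errors, each discordant
    case is equally likely to be a false report or an omission.
    """
    n = false_reports + omissions
    if n == 0:
        return 1.0
    k = min(false_reports, omissions)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts for one test in one year (not study data).
p_value = mcnemar_exact_p(false_reports=28, omissions=9)
print(f"exact McNemar p = {p_value:.4f}")
```

Concordant cases do not enter the test at all, which is why it speaks to the direction of error rather than to overall accuracy.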
The effects of time lend further support for the
general use of schemas. Except in the partial data
for 1992, there was no statistical effect of time in the
bivariate data on false reports (range, 14.1-21.3 percent) during the years studied. Another indication that time is
not important is seen in the consistent concordance
scores over the study period (range, 0.74-0.77), again
with the exception that the partial data obtained in
1992 showed a higher rate (0.91).
These data are consistent with the basic finding
of the pervasive use of schemas. Respondents reported
that they received regular, annual physical examinations that included the procedures and forgot possible
exceptions.
Null model
For testing our hypotheses in these regression analyses, summarized in table 3, we have selected concordance as the dependent variable of interest. The null
model is presented as model 1 of table 3. This model
estimates the odds ratio and 95 percent confidence
interval for one parameter, reporting year. Variability
associated with random, person-specific effect was
also assessed in the model and was significant. This
result indicates that there is a clustering effect within
these data (the intracluster correlation in the model is
0.10, p < 0.001). The results of the null model indicate
that concordance improved with time (due in large part
to inclusion of partial data for 1992, as expected based
on the bivariate data) as shown by the significant odds
ratio for report year in model 1 of table 3. This model
is the baseline for evaluating each of our hypotheses.
TABLE 3. Odds ratios (and 95% confidence intervals) for random effects logistic regression equations predicting respondent-record concordance (1 = yes)
                                Model 1†              Model 2‡              Model 3§
Report year                     1.17** (1.13-1.22)    1.19** (1.14-1.23)    1.19** (1.14-1.23)
Papanicolaou smear (1 = yes)                          0.64* (0.51-0.79)     0.64** (0.50-0.81)
Breast examination (1 = yes)                          0.48** (0.38-0.59)    0.50** (0.40-0.62)
Test regularity (1 = yes)                                                   1.32* (1.05-1.67)
* p < 0.01; ** p < 0.001.
† Model 1, null model.
‡ Model 2, effects of type of test.
§ Model 3, effects of regularity.
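The relation between an odds ratio, its confidence interval, and the underlying logistic coefficient can be sketched for the report-year estimate in model 1 (odds ratio 1.17, 95 percent confidence interval 1.13-1.22). The sketch assumes a Wald interval that is symmetric on the log-odds scale with z = 1.96; the text does not state how the intervals were computed, so this is an assumption, and the function name is hypothetical.

```python
import math


def log_odds_and_se(odds_ratio: float, ci_low: float, ci_high: float, z: float = 1.96):
    """Recover the coefficient and its standard error from a published
    odds ratio and confidence interval, assuming a Wald interval."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


beta, se = log_odds_and_se(1.17, 1.13, 1.22)

# Rebuild the interval from the recovered values as a consistency check.
rebuilt_low = math.exp(beta - 1.96 * se)
rebuilt_high = math.exp(beta + 1.96 * se)
print(f"beta = {beta:.3f}, se = {se:.4f}, "
      f"rebuilt CI = {rebuilt_low:.2f}-{rebuilt_high:.2f}")
```

Because the published values are rounded to two decimals, the recovered standard error is approximate.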
Hypothesis 1: effects of type of test
Hypothesis 1 proposes that tests conducted separately will be better remembered. For this sample,
almost all Papanicolaou smears and breast examinations were conducted as parts of the physical or gynecologic examination. The exceptions were too few to
analyze. Mammograms were conducted at a separate
location across the street from the clinic, and in addition, there was some variation in when this test was
done. For hypothesis 1, we compared mammograms
that were done elsewhere with Papanicolaou smears
and breast examinations usually done at the time of the
physical.
Hypothesis 1 was confirmed using the report-to-records
ratio (mammograms, 1.16; clinical breast examinations, 1.26; and Papanicolaou smears, 1.51), concordance (mammograms, 0.84; clinical breast examinations, 0.73; and Papanicolaou smears, 0.78), and
specificity (mammograms, 0.81; clinical breast examinations, 0.70; and Papanicolaou smears, 0.73). Another difference is in the percentage of false reports,
which averaged 18.7 percent for Papanicolaou smears
and breast examinations and 11.1 percent for mammograms. These results are consistent over time.
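
The measures compared above can be illustrated with a short sketch. The definitions used here are assumptions based on common practice in validation studies, not the authors' stated formulas: the report-to-records ratio is the number of reported tests divided by the number of recorded tests, concordance is the share of person-years in which report and record agree, and specificity is the share of person-years without a recorded test in which no test was reported. The function name and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple


def accuracy_measures(pairs: List[Tuple[int, int]]) -> Dict[str, float]:
    """Summarize agreement between self-reports and medical records.

    pairs: one (reported, recorded) pair of 0/1 flags per person-year.
    Assumes at least one recorded test and at least one person-year
    without a recorded test, so that no denominator is zero.
    """
    tp = sum(1 for rep, rec in pairs if rep == 1 and rec == 1)
    fp = sum(1 for rep, rec in pairs if rep == 1 and rec == 0)
    fn = sum(1 for rep, rec in pairs if rep == 0 and rec == 1)
    tn = sum(1 for rep, rec in pairs if rep == 0 and rec == 0)
    ratio = (tp + fp) / (tp + fn)
    return {
        "report_to_records_ratio": ratio,
        # Net bias expressed as percent over- or underreporting.
        "net_bias_percent": (ratio - 1.0) * 100.0,
        "concordance": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical data for two respondents over 5 years.
toy = [
    (1, 1), (1, 1), (1, 0), (1, 1), (1, 1),  # reports a test every year, one year missed
    (1, 1), (0, 0), (1, 1), (0, 0), (1, 0),  # tested every other year, one false report
]
measures = accuracy_measures(toy)
```

Under these definitions, a ratio of 1.16 corresponds to a net overstatement of 16 percent, and a ratio of 1.51 to 51 percent.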
For the analysis using the random effects logistic
regression model, dummy variables representing concordance in Papanicolaou smear and clinical breast
examinations were added to the model to contrast the
degree of concordance with that observed for mammograms. The significant odds ratios in table 3, model
2, confirm the effects observed in the bivariate data, in
that mammograms were reported with greater accuracy than either of the other procedures. Moreover, the
model chi-square improved significantly (Δχ² = 43.6,
df = 2). The clustering effects were still present in the
model.
Hypothesis 2: effects of regularity
Hypothesis 2 returns to the effects of schema and
time on the accuracy of recall. Hypothesis 2 postulated
more accurate reporting among respondents for whom
the diagnostic tests are conducted regularly rather than
irregularly, since regular events are easier to schematize accurately. This hypothesis is supported.
Testing hypothesis 2 required that the term "regularity" be defined. Respondents were classified as regularly receiving the procedure if there was evidence in
the medical record that they were tested over some
regular interval (every year, every other year) or if
they received examinations during 4 of the 5 years
included in the study. All other respondents were
classified as irregular, except for those with no medical record evidence of cancer screening examinations
during the 5-year study period, who were excluded
from the comparison of regular and irregular respondents.
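
The classification rule just described can be sketched as follows. The text does not say how "every other year" was judged within a 5-year window, so the operational rule below (all gaps between tests equal to 2 years, with the first test in the first 2 years and the last test in the final 2 years) is an assumption made for illustration, as is the function name.

```python
from typing import List, Optional

MIN_YEARS_TESTED = 4  # "4 of the 5 years included in the study"


def classify_regularity(tested_by_year: List[int]) -> Optional[str]:
    """Classify one respondent's record for one procedure.

    tested_by_year: 0/1 flags from the medical record, oldest year first.
    Returns "regular", "irregular", or None when the record shows no
    test at all (such respondents were excluded from the comparison).
    """
    n = len(tested_by_year)
    years = [i for i, tested in enumerate(tested_by_year) if tested]
    if not years:
        return None
    if len(years) >= MIN_YEARS_TESTED:
        return "regular"
    # Assumed rule for "every other year": consecutive tests are always
    # 2 years apart and the pattern covers the whole study window.
    gaps = {later - earlier for earlier, later in zip(years, years[1:])}
    spans_window = years[0] <= 1 and years[-1] >= n - 2
    if gaps == {2} and spans_window:
        return "regular"
    return "irregular"
```

For example, a record of tests in years 1, 3, and 5 is classified as regular, whereas tests in years 1 and 2 only are classified as irregular.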
Bivariate results reveal that the major difference
between regular and irregular test recipients is that the
percentage of false reports was higher among irregular
than among regular test recipients (23.0 percent vs. 11.0
percent for Papanicolaou smears, 23.2 percent vs. 6.9
percent for breast examinations, and 14.0 percent vs.
7.1 percent for mammograms).
Overall, the net biases (report-to-records ratios)
were far smaller for respondents who received tests
regularly for each procedure. They were 8 percent for
Papanicolaou smears, -18 percent for breast examinations, and only 1 percent for those who received
mammograms regularly. On the other hand, the net
biases for those who were tested irregularly ranged
from 31 percent for those who had mammograms
irregularly to 90 percent among those who received
Papanicolaou smears irregularly. Review of the concordance measures supports the finding of Menon (9)
that regularity increases accurate reporting.
Model 3 in table 3 provides an overall test of the
effect of obtaining tests regularly on concordance,
controlling for test type. Both the regression estimate
for this variable and the difference in the likelihood ratio chi-square (Δχ² = 81.4, df = 1) for the overall model
indicate that test regularity improves the accuracy of
reporting cancer screening tests.
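
The chi-square differences reported for models 2 and 3 (43.6 on 2 degrees of freedom and 81.4 on 1 degree of freedom) can be converted to p values with the sketch below. It assumes that each statistic is a likelihood ratio difference between nested models and is referred to a chi-square distribution. The function `chi2_sf` is a hypothetical helper that uses the exact closed forms available for 1 and 2 degrees of freedom, so no statistical library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable.

    Only df = 1 and df = 2 are handled, because both have simple exact
    closed forms; other values raise an error in this sketch.
    """
    if x < 0:
        raise ValueError("the statistic must be non-negative")
    if df == 1:
        # A chi-square with 1 df is the square of a standard normal.
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # A chi-square with 2 df is exponential with mean 2.
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 is supported here")


p_model2 = chi2_sf(43.6, 2)  # adding the two test-type indicators
p_model3 = chi2_sf(81.4, 1)  # adding the test regularity indicator
print(f"model 2: p={p_model2:.2g}; model 3: p={p_model3:.2g}")
```

Both p values fall far below 0.001, in line with the conclusion that each added term improved the fit of the model.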
Hypothesis 3: effects of questionnaire format
Hypothesis 3 proposed that more accurate reporting
would be found for respondents who used a questionnaire form that activates schema related to health
events, physical checkups, and gynecologic examinations compared with a questionnaire that asks about
each of the screening tests separately. The results do
not support this hypothesis. For example, the ratios of
reporting to records by form range from 1.26 to 1.33.
These differences are not statistically significant, nor
do they have any practical importance. There was no
consistent pattern of superiority of either questionnaire
form across all three screening tests or over years. In
retrospect, it appears that the treatment was ineffective
because most respondents used schema in answering
the questions regardless of the form. To put it another
way, use of schemas was not influenced by the form of
the questionnaire; they were already activated simply
by the topic. A random effects model not presented
here confirmed this finding (15).
DISCUSSION AND RECOMMENDATIONS
On the basis of phase I focus groups and phase II
interviews, most women regularly used schema to
report their tests, such as "I get a mammogram every
year" or "I get a Papanicolaou smear along with my
annual physical." We thought that we could affect this
use of schema by revising the questionnaire format,
but we were unsuccessful, as shown by the results of
the test of hypothesis 3. It was clear that the specific
events in question were imbedded in the recall of the
routine physical examination experience and could not
be recalled independent of that key, initiating experience.
Schemas do not necessarily provide poor estimates.
For very regular behavior, schemas may provide better
estimates than would efforts to remember individual
episodes. Schemas can, however, result in overstatements of behavior when respondents forget occasions
when the regular behavior is interrupted. That appears
to be the case for all of the health care behaviors
studied. As noted above, respondents, on average,
overstated receiving health care procedures by 29 percent compared with the records. The overstatement
ranged from 16 percent for mammograms to 51 percent for Papanicolaou smears. One consequence of
using schemas was the small differences in the levels
of false positives by year. Researchers typically observe sharp increases in underreporting caused by forgetting due to longer recall periods, as was observed in
the data we presented on false negatives. Since the
false positives are far larger, they dominate the net
results.
When the respondent reported regularly receiving
the procedures, using schema facilitated recall, as
shown by the positive results of our test of hypothesis
2; when they received the procedures irregularly, there was
less accuracy. The cognitive explanation of this finding makes it almost tautological. We have shown that
most respondents use regularity schema in reporting
about these tests. Those respondents whose records
indicate that, indeed, the tests were received regularly
would certainly have a lower level of false reports than
would respondents who did not receive the tests regularly but who thought they did.
It is also of interest to note that there really was very
little decay in recall associated with the length of the
recall period. The effects of time seemed to be entirely
due to the effect of including the partial data for 1992.
For screening procedures, the important issue is
whether the respondent is getting them according to
the recommended intervals. The current recommendations
for Papanicolaou smear, clinical breast examination, and mammogram vary, but for the Papanicolaou
smear, where inaccurate reporting is most serious, the
recommendation is once every 3 years after two consecutive negative smears (45). Hence, knowing
whether the respondent received a Papanicolaou smear
once a year or two to three times during a 5-year
interval is immaterial, since either pattern is consistent
with the recommendations. Thus, one needs to think
about what level of recall and length of recall period
are necessary when taking such histories (35), and the
question of what is sufficient needs to be considered.
On the basis of our results, we did not recommend any
changes in the format of the questions currently used
in the National Health Interview Survey to obtain
information about Papanicolaou smears, mammograms, and breast examinations. Despite the error, the
data obtained are sufficient to answer the question
regarding regularity.
Another finding relevant to the reliance on
schema is that the assumption that membership
in an HMO always means a complete annual checkup,
linked to these screening procedures, may not
hold. The changes in health care delivery may create
increasing problems in tracking such behavior as the
US medical care system becomes more competitive
and pressed for time. Some of the HMO physicians
noted that when they knew patients would visit several
times a year for chronic care, they would be more
likely to conduct a physical examination and preventive services over several visits. If these screening
procedures are not delivered with the annual physical,
this will create additional obstacles to accurate patient
recall of these events. Moreover, should changes in
insurance coverage require a change of HMO, there
might not be consistent record data over time, since
the patients' records may not follow changes in HMO
due to employer contracting procedures.
Future research, particularly in non-HMO populations, would be desirable. Studies should test the assumptions regarding the meaning of regularity. It may
also be that Papanicolaou smears, the most overreported procedure, are not really offered to women
annually but that regularity is actually defined by a
less frequent interval.
Neither questionnaire form changed the way that
women retrieved information. This may be due to the
homogeneity of the patients selected from one HMO
where the care is delivered in a uniform way. We had
speculated that women who had hysterectomies would
report Papanicolaou smears more accurately. They did
not report more accurately, although, of course, they
reported fewer tests. Nevertheless, the percentage of
false positives and false negatives did not differ between
women who did and those who did not have
hysterectomies.
It should also be pointed out that nothing in our
study rules out the possibility that some of the overstatement of receipt of procedures was caused by the
perceived social desirability of regularly receiving
early cancer detection tests. Social desirability would
not necessarily imply that respondents deliberately
falsified their answers. It could also be that respondents who may have been uncertain about whether
they had a procedure every year said that they did
because they knew that this was what they should have
done. This recall strategy is consistent with the generic
nature of schema content (1).
Our sample was desirable because of our ability to
validate information from records. It may not be representative of the general public, however, since HMO
patients may be more likely to receive cancer screening procedures than are patients receiving care from
fee-for-service physicians (46-48). It is also possible
that membership in an HMO results in more regular
behavior and greater use of schemas than is found in
the general population of women over age 50 years.
Finally, as we used a small sample, it is also possible
that with larger numbers some of the differences we
predicted would have been statistically significant.
BROADER IMPLICATIONS FOR COGNITIVE
RESEARCH ON SURVEY RESPONSE IN
EPIDEMIOLOGIC RESEARCH
Aside from a better understanding of how women
report on these three cancer screening tests, the results
also have implications for future research on cognitive
aspects of taking medical histories. First, these results
strongly suggest that respondents are likely to use
schema in reporting about health behavior even when
the total number of events is small if they perceive the
events as regular. The use of schema is even more
likely if respondents are asked about less recent
events. One might also expect that the likelihood of
forgetting exceptions would increase with longer time
periods, but we saw no evidence of it in this study.
When schema were used, our results showed that the
order in which questions about details of an event were
asked had no effect on the accuracy of reporting that
the event occurred. As many epidemiologically relevant events occur early in life, taking histories that
cover lifetime exposures may be especially susceptible
to recall error. Moreover, we examined regularly occurring events; irregular events that occur early in a
lifetime may present different and more complex problems.
Obviously, if we could predict when it would help,
it would be desirable to tell respondents what retrieval
method they should use for greatest accuracy, but this
research is in agreement with other studies that indicate that it is enormously difficult to get respondents to
change the way they retrieve information. We do not
say that it is impossible to do so, but we were unable
to do it, even though our focus groups and think-aloud
interviews had given us a good understanding of what
methods respondents were actually using. The most
beneficial aspects of this research enhance our understanding of the levels of accuracy and the need to
allow for this in such studies.
ACKNOWLEDGMENTS
Supported by contract 200-91-7035 from the US Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Health
Statistics.
The authors acknowledge the generous cooperation of
RUSH-Anchor Health Management Organization for making the subjects for this study accessible. Rene Bucio,
Jeanaette Cunningham, and Veshane Smith did the abstracting. Dr. George Wilbanks participated in the formulation of
the questions. Mary Kobialka did the quality review of the
abstracting. The late Dr. Loretta Lacy conducted the focus
groups, and Dr. Donald Hedeker and Sui Chi Wong assisted
in the multivariate statistical analysis.
REFERENCES
1. Schwarz N, Sudman S. Autobiographical memory and the validity of retrospective reports. New York, NY: Springer-Verlag, 1994.
2. Jobe JB, Mingay DJ. Cognitive and survey measurement: history and overview. Appl Cognitive Psychol 1991;5:175-92.
3. Strack F, Martin LL. Thinking, judging, and communicating: a process account of context effects in attitude surveys. In: Hippler H-J, Schwarz N, Sudman S, eds. Social information processing and survey methodology. New York, NY: Springer-Verlag, 1987:123-48.
4. Tourangeau R. Attitude measurement: a cognitive perspective. In: Hippler H-J, Schwarz N, Sudman S, eds. Social information processing and survey methodology. New York, NY: Springer-Verlag, 1987:149-62.
5. Tourangeau R, Rasinski KA. Cognitive processes underlying context effects in attitude measurement. Psychol Bull 1988;103:299-314.
6. Bradburn NM, Sudman S, associates. Improving interview method and questionnaire design: response effects to threatening questions in survey research. San Francisco, CA: Jossey-Bass, 1979.
7. Sudman S, Bradburn NM, Schwarz N. Thinking about answers: the application of cognitive processes to survey methodology. San Francisco, CA: Jossey-Bass, 1996.
8. Hastie R. Information processing theory for the survey researcher. In: Hippler H-J, Schwarz N, Sudman S, eds. Social information processing and survey methodology. New York, NY: Springer-Verlag, 1987:42-70.
9. Menon G. Judgments of behavioral frequencies: memory search and retrieval strategies. In: Sudman S, Bradburn NM, Schwarz N, eds. Thinking about answers: the application of cognitive processes to survey methodology. San Francisco, CA: Jossey-Bass, 1996:161-72.
10. Blair E, Burton J. Cognitive processes used by survey respondents in answering behavioral frequency questions. J Consumer Res 1987;14:280-8.
11. Brewer WF. Autobiographical memory and survey research. In: Schwarz N, Sudman S. Autobiographical memory and the validity of retrospective reports. New York, NY: Springer-Verlag, 1994:11-20.
12. Wagenaar WA. My memory: a study of autobiographical memory over six years. Cognitive Psychol 1986;18:225-52.
13. Means B, Swan GE, Jobe JB, et al. An alternative approach to obtaining personal history data. In: Biemer PP, Groves RM, Lyberg LE, et al., eds. Measurement errors in surveys. New York, NY: John Wiley & Sons, 1991:167-83.
14. Sudman S, Schwarz N. Contributions of cognitive psychology to advertising research. J Advertising Res 1989;29:43-53.
15. Sudman S, Warnecke RB, Johnson TP, et al. Cognitive aspects of reporting cancer prevention examinations and tests. Rockville, MD: National Center for Health Statistics, 1994. (Vital and health statistics, Series 6, no. 7). (DHHS publication no. 94-1082).
16. Gordon NP, Hiatt RA, Lampert DI. Concordance of self-reported data and medical record audit for six cancer screening procedures. J Natl Cancer Inst 1993;85:566-70.
17. Brown JB, Adams ME. Patients as reliable reporters of the medical care process. Med Care 1992;30:400-11.
18. Jobe JB, White AA, Kelley CL. Recall strategies and memory for health care visits. Milbank Q 1990;68:171-89.
19. Loftus EF, Smith KD, Klinger MR, et al. Memory and mismemory for health events. In: Tanur J, ed. Questions about questions. New York, NY: Russell Sage, 1992:102-37.
20. Sawyer JA, Earp JA, Fletcher RH, et al. Accuracy of women's self-report of their last Pap smear. Am J Public Health 1989;79:1036-7.
21. Bowman JA, Redman S, Dickinson JA, et al. The accuracy of Pap smear utilization self-report: a methodological consideration in cervical screening research. Health Services Res 1991;26:97-107.
22. Paskett ED, Tatum C, Mack DW, et al. Validation of self-reported breast and cervical screening tests among low-income minority women. Cancer Epidemiol Biomarkers Prev 1996;5:721-6.
23. Hiatt RA, Pérez-Stable EJ, Quesenberry C, et al. Agreement between self-reported early detection practices and medical audits among Hispanic and non-Hispanic white health plan members in northern California. Prev Med 1995;24:278-85.
24. Warnecke R, Graham S. Characteristics of blacks obtaining Pap smears. Cancer 1976;37:2015-25.
25. Loftus EF, Klinger MR, Smith KD, et al. A tale of two questions: the benefits of asking more than one question. Public Opinion Q 1990;54:330-45.
26. Michielutte R, Dignan M, Wells HB, et al. Errors in reporting cervical cancer among public health clinic patients. J Clin Epidemiol 1991;44:403-8.
27. McKenna MT, Speers M, Mallin K, et al. Agreement between patient self-reports and medical records for Pap smear histories. Am J Prev Med 1992;8:287-91.
28. Walter SD, Clarke EA, Hatcher J, et al. A comparison of physician and patient reports of Pap smear histories. J Clin Epidemiol 1987;41:401-10.
29. Hochstim JR. A critical comparison of three strategies of collecting data from households. J Am Stat Assoc 1967;62:976-89.
30. Peters RK, Dear MB, Thomas D. Barriers to screening of cancer of the cervix. Prev Med 1989;18:133-46.
31. Warnecke RB. Intervention in black populations. In: Mettlin C, Murphy G, eds. Cancer among black populations. New York, NY: Alan R. Liss, 1981:167-83.
32. Warnecke RB, Havlicek PL, Manfredi C. Awareness and use of screening by old-aged persons. In: Yancik R, ed. Perspectives on prevention and treatment of cancer in the elderly. New York, NY: Raven Press, 1983:275-87.
33. King ES, Rimer BK, Trock B, et al. How valid are mammography self-reports? Am J Public Health 1990;80:1386-8.
34. Dignan D, Harris R, Ranney J, et al. Measuring use of mammography: two methods compared. Am J Public Health 1992;82:1386-8.
35. Burton S, Blair EA. Task conditions, response formulation processes, and response accuracy for behavioral frequency questions in surveys. Public Opinion Q 1991;55:50-79.
36. Eisenhower D, Mathiowetz NA, Morganstein D. Recall error sources and bias reduction techniques. In: Biemer PP, Groves RM, Lyberg LE, et al., eds. Measurement errors in surveys. New York, NY: John Wiley & Sons, 1991:127-44.
37. Tulving E. Elements of specific memory. Oxford, England: Clarendon Press, 1983.
38. Herrmann DJ. The validity of retrospective reports as a function of directness of the retrieval processes. In: Schwarz N, Sudman S, eds. Autobiographical memory and the validity of retrospective reports. New York, NY: Springer-Verlag, 1994:21-37.
39. Demlo LK, Campbell PM, Brown SS. Reliability of information abstracted from patients' medical records. Med Care 1978;16:995-1005.
40. Feigl P, Glaefke G, Ford L, et al. Studying patterns of cancer care: how useful is the medical record? Am J Public Health 1988;78:526-33.
41. Romm F, Putnam S. The validity of the medical record. Med Care 1981;19:310-15.
42. Sudman S, Bradburn NM. Response effects in surveys: a review and synthesis. Chicago, IL: Aldine, 1974.
43. Hedeker D, Gibbons RD. A random-effects ordinal regression model for multilevel analysis. Biometrics 1994;50:933-44.
44. Daniel WW. Applied nonparametric statistics. Boston, MA: Houghton-Mifflin, 1978.
45. Czaja R, McFall S, Warnecke R, et al. Preferences of community physicians for cancer screening guidelines. Ann Intern Med 1994;120:602-8.
46. Berstein MB, Thompson GB, Harlan LC. Differences in rates of cancer screening by usual source of care. Med Care 1991;29:196-201.
47. Kang SH, Bloom JR. Social support and cancer screening among older black Americans. J Natl Cancer Inst 1993;85:737-42.
48. Udvarhelyi IS, Jennison K, Phillips RS, et al. Comparison of quality of ambulatory care for fee-for-service and prepaid patients. Ann Intern Med 1991;115:394-400.