Structure and strategy in the associative false memory paradigm

MEMORY, 2001, 9 (3), 145–163
Lisa K. Libby and Ulric Neisser
Cornell University, USA
List-learning experiments can have several levels of structure: individual words, the gist (if any) of each
list, and the task in which those lists are embedded. The usual presentation of the DRM associative
paradigm (Deese, 1959; Roediger & McDermott, 1995) strongly encourages a focus on gist and produces a
high rate of false recall of key words (FRK). The experiments reported here were designed to invite the
use of memory strategies based on structures other than the gist and thus reduce FRK. The crucial
condition of Experiment 1, short lists followed by rehearsal, encouraged a focus on individual words and
produced a low rate of FRK. In Experiment 2, the lists were embedded in a guessing game, which virtually
eliminated FRK. FRK was also low in Experiments 3a and 3b when participants engaged in a complex task
involving the first letters of list words. The relevance of these findings to false memories in the DRM
paradigm, and their connection to false autobiographical memories, is discussed.
The DRM paradigm, originally devised by Deese
(1959) and developed further by Roediger and
McDermott (1995), has received much attention
recently due to the robust false memory effect it
creates. Participants in a DRM experiment hear
word lists, each composed of 12 to 15 common
associates of a single non-presented key word: the
list based on the key word, sleep, includes bed, rest,
awake, tired, dream, etc. but not sleep itself. On
subsequent memory tests, participants often recall
(e.g., McDermott, 1996; Payne, Elie, Blackwell, &
Neuschatz, 1996; Robinson & Roediger, 1997) or
recognise (e.g., Israel & Schacter, 1997; Mather,
Henkel, & Johnson, 1997; Payne et al., 1996) the
non-presented key word. The effect is a strong
one; notably, it is not extinguished by attempts to
warn participants (Gallo, Roberts, & Seamon,
1997; Lampinen, Neuschatz, & Payne, 1997;
McDermott & Roediger, 1998; Neuschatz &
Payne, 1996). One interpretation of the DRM
effect is based on the associative nature of the
lists. From this perspective, false memories of the
key word arise because the key word is repeatedly
activated by the presented list items; it is the total
associative strength of the individual list items
together that predicts the likelihood of falsely
recalling the key word (Robinson & Roediger,
1997; Roediger & McDermott, 1995).
However, another aspect of the paradigm has
also been cited as integral to producing the false
memory effect: the gist of each list. The idea is that
false recall of the key word occurs when participants use a gist representation as the basis for
remembering (Brainerd & Reyna, 1998; Melo,
Winocur, & Moscovitch, 1999; Payne et al., 1996;
Schacter, Verfaellie, & Pradere, 1996). There are
several pieces of evidence that support this claim.
Amnesic patients with damage to the medial
temporal lobe but intact frontal lobes would be
able to extract gist from a DRM list, but not retain
memory for individual items. Indeed, such
patients were impaired at recalling studied words
from DRM lists but were more likely to falsely
recall key words than were controls (Melo et al.,
1999). Norman and Schacter (1997) compared the
performance of younger and older adults in the
DRM task. Older adults falsely recognised key
words at a significantly greater rate than did
Requests for reprints should be sent to Lisa K. Libby, Department of Psychology, Uris Hall, Cornell University, Ithaca, New York
14850, USA. Email: [email protected]
© 2001 Psychology Press Ltd
http://www.tandf.co.uk/journals/pp/09658211.html
DOI:10.1080/09658210042000085
younger adults, a result attributed to age-related
changes in memory: with age, the ability to recall
specific details of previously studied items
declines and there is greater reliance on memory
for gist. Kensinger and Schacter (1999) found that
older adults show increases in veridical recall, but
not decreases in false recall with repeated presentation of DRM lists. The older adults’ continual focus on gist appears to have been
responsible for their persistent false recall:
younger adults, who were able to capitalise on the
opportunity to gain more item-specific knowledge
with repeated presentation of the lists, showed a
decrease in false recall with repeated presentation, as well as an increase in veridical recall (see
also McDermott, 1996). Other experiments
focusing on younger adults show that, in general,
false memories of the key words are less likely
under conditions that make the gist less prominent: when DRM lists are mixed together (Mather
et al., 1997; McDermott, 1996), when pictures are
presented along with list words thus allowing
participants to use memory of a particular drawing
as a recognition criterion (Israel & Schacter,
1997), when memories of individual words are
examined extensively (Lampinen et al., 1997;
Mather et al., 1997), or in incidental learning
(Tussing & Greene, 1997). These various lines of
research all converge on the idea that false
memories of the key word are common under
conditions in which gist is the basis for memory,
but less common otherwise.
Our purpose in this paper is to show that the
structure of the DRM list-learning situation can
influence participants’ readiness to use a gist-based memory strategy, and hence affect the rate
of false recall. Consider the strategies available in
a typical DRM situation. Confronted with 12 or
more words to remember, participants have little
choice but to look for some kind of intra-list
structure: verbatim memory cannot accommodate
a list of this length. Moreover, the presentation of
so many semantically related words makes the gist
of the list quite obvious. Those participants who
notice it may well adopt a gist strategy, accepting
any gist-related word that comes to mind as a
legitimate list member—a strategy that can easily
produce false recall of the key word. With changes
in the experimental situation, other strategies may
become available. Results from our pilot experiments suggest that even very small changes in the
structure of the situation can have such an effect.
A pilot experiment in which all of the DRM lists
were five words long produced false recall of the
key word on only a small proportion of lists
(M = .08, SD = .10). However, the same five-word
lists produced a much higher rate of false recall
(M = .27, SD = .20) in another pilot experiment
that differed only in that the five-word lists were
intermixed with DRM lists of seven and nine
words. One reason for the difference in false recall
rates between the two experiments may be the
different strategies that the experimental situations made available. If it is clear that all lists are
five words long, a strategy of rehearsing the entire
list in working memory is always viable. If the
length of a list is not known when it begins, participants may be more likely to focus on gist than
they would otherwise: the list may turn out to be
too long for a working memory strategy.
The three experiments we report here were
designed to vary the memory strategies made
possible by the structure of the DRM situation.
We predict that false recall should be much
reduced in situations that divert attention from
the gist and encourage different encoding and
retrieval strategies. One way to do this is to divert
participants’ attention away from the gist by
focusing attention on the individual words of the
list. A verbatim memory strategy focused on those
words in particular should reduce false recall of
the key word (because it is not among them).
Another very different way to reduce reliance on
the gist is to direct participants’ attention towards
a competing higher-order structure by manipulating the task in which the list is embedded. For
example, consider the list: candles, frosting, ice
cream, party hats, plates, forks, spoons, streamers,
balloons, confetti. The gist of this list is something
like birthday party; in the DRM paradigm, false
reports of the key word, cake, should occur rather
frequently. However, encountering the list in
another situation might lead to a very different
pattern of recall. Suppose you and a friend are
throwing a birthday party for someone. You go to
your friend’s house to prepare for the party and
find him in the middle of baking the cake. He says,
‘‘Why don’t you go to the store and get the rest of
the stuff we need?’’ and proceeds to rattle off the
aforementioned list. Although all of the words on
the shopping list are associates of cake, when you
get to the store you will not falsely remember that
your friend told you to buy a cake: you know a
cake is already made, and you know that the party
only needs one cake. Even if the word, cake, were
to pop into your mind, you would not consider this
as an indication that your friend had said to buy a
cake. Your understanding of the situation in which
you heard the shopping list precludes the possibility that cake would have been on it. This
example makes the point that knowledge about
the higher-order structure of a situation can
influence people’s memories of information
acquired in that situation.
To test our hypothesis about the effect of
situational structure on memory strategies and
false recall with DRM lists, we conducted three
experiments, each designed to shift the basis of the
memory strategy away from the gist structure of
the lists—either downward to individual words
(Experiment 1) or upward to the task in which the
lists were embedded (Experiments 2 and 3a/b). In
all of the experiments we expected that directing
attention away from the gist would dramatically
reduce false recall of key words. In the general
discussion we consider how our results relate to
other theories about the role of gist in DRM false
memories. We also comment on the parallels
between false memories produced in the DRM
paradigm and those that occur outside the
laboratory.
EXPERIMENT 1
If the high rate of falsely recalling key words in the
DRM paradigm is related to participants’ use of a
gist-based strategy, any change in procedure that
makes an alternative strategy more viable should
reduce the rate of falsely recalling key words. The
alternative strategy of interest in Experiment 1
was reliance on working memory and rehearsal:
we wanted our participants to repeat the list words
to themselves as continuously as possible from the
beginning of the list until the time of test. This
approach should be more effective with short than
with long lists, but only if the time between presentation and recall is not filled with an interfering
activity. The four conditions of Experiment 1 were
generated by crossing list length (short or long)
with interference (present or absent). We
predicted that the condition with short lists and no
interference would produce the fewest false
recalls of the key word, and the fewest false recalls
of other non-presented words as well.
Previous experiments have investigated the
variables of DRM list length and distraction, but
not together. Crossing these two variables allows
one to vary the associative strength of lists independently of the availability of memory strategies,
whereas this has arguably not been the case in
previous experiments that studied the effects of
list length and distraction. In Experiment 1 of
Robinson and Roediger (1997), DRM list length
was varied from 3 to 15 words. The result was
clear: the rate of falsely recalling and recognising
key words increased monotonically with list
length. Robinson and Roediger took these data as
evidence that the associative strengths of the list
words (i.e., their tendencies to elicit the key word)
combine in a cumulative manner. We believe that
another factor should also be considered: participants are more likely to use a gist strategy with
long lists than with short ones. As list length
increases, the gist structure of the list becomes
more and more obvious while the alternative
working memory strategy becomes less and less
viable.
In a second experiment, Robinson and Roediger (1997) used the same lists of words related to
the key words but added unrelated filler items so
that all the lists were 15 words long. The results
were essentially the same as before: the rate of
falsely recalling and recognising the key word
increased with the number of items on the list that
were related to the key word. However, this result
may be due to the offsetting effects of list length
and gist strength on the memory strategies participants use. On one hand, the sheer length of the
filled 15-word lists would discourage a working
memory strategy. On the other hand, increasing
the number of filler items would make the gist
structure less obvious.
An important aspect of Robinson and Roediger’s (1997) procedure was the use of an interpolated distractor task: participants were given
addition problems to solve during the 30 seconds
between the end of each list and the beginning of
recall. This type of distractor makes working
memory almost useless, forcing participants to use
a gist strategy. One would therefore expect
interference to increase the rate of falsely recalling the key word on trials where participants
would otherwise be using working memory (i.e.,
trials with short lists). Unfortunately, the only
DRM study that has explicitly varied the presence
of distraction (McDermott, 1996) used 15-word
lists exclusively.
McDermott’s (1996) study is also of interest for
another reason. Her participants clearly used a
‘‘late working memory strategy’’ on the no-distraction trials, recalling words from the end of the
list first. Indeed, McDermott notes that this
strategy gives ‘‘. . . little chance for the key word to
appear . . .’’ during that part of the recall (p. 217).
Once the contents of working memory have been
dumped in this way, however, a participant who is
trying to remember 15 words must still depend on
the gist strategy to recall the rest of the list. It is for
that reason, we believe, that McDermott found no
significant effect of distraction on the rate of
falsely recalling the key word. In our Experiment
1 we expected that, as in McDermott’s study, long
lists would encourage reliance on gist even without distraction. However, we predicted that in the
short-list conditions of our Experiment 1, participants would falsely recall the key word less often
under no-distraction conditions than they would
when distraction was present.
Method
Participants. A total of 72 Cornell undergraduates, 20 males and 52 females, were given
extra course credit for participating in the
experiment. The first 36 participants were randomly assigned to the short-list/distraction and
short-list/rehearsal conditions; the next 36 were
randomly assigned to the long-list/distraction and
long-list/rehearsal conditions.
Design and materials. The stimulus materials were based on the middle 18 lists from Roediger and McDermott’s (1995) Appendix. The
key words for these lists are spider, needle, cold,
doctor, high, foot, soft, fruit, mountain, man,
sleep, chair, river, music, girl, slow, rough, and
king. Each list consists of 15 associates of its key
word, arranged in descending order of associative strength. The full chair list, for example, is
table, sit, legs, seat, couch, desk, recliner, sofa,
wood, cushion, swivel, stool, sitting, rocking,
bench.
All participants were presented with 18 lists in
the same random order in which the key words
have just been listed. In the long-list/distraction
and long-list/rehearsal conditions the lists were all
15 words long (like the chair list just given). In
contrast, the lists in the short-list/distraction and
short-list/rehearsal conditions varied in length.
Participants in the short-list conditions heard six
lists of length 6, six of length 7, and six of length 8;
these were always the first 6, 7, or 8 words from
the corresponding Roediger-McDermott list. As
explained in the introduction, pilot experiments
had shown that mixing list lengths in this way
increases the rate of falsely recalling key words.
Because one aim of the present experiment was to
show a difference between distraction and
rehearsal conditions on short lists, we adopted the
mixed-length design to avoid a floor effect on false
recall rates for short lists. The assignment of lists
to lengths and the sequence of lengths actually
presented were counterbalanced across participants so that (a) each Roediger-McDermott list
was used once at each length, and (b) each length
appeared equally often at each position in the
sequence of 18 lists.
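The two counterbalancing constraints can be illustrated with a short sketch. The paper does not describe the actual assignment scheme, so the three-version rotation below (`schedule`, `versions`) is a hypothetical one that merely satisfies constraints (a) and (b); only the key words, the fixed presentation order, and the three list lengths come from the text.

```python
from collections import Counter

# Key words in the fixed presentation order given in the text.
KEY_WORDS = [
    "spider", "needle", "cold", "doctor", "high", "foot",
    "soft", "fruit", "mountain", "man", "sleep", "chair",
    "river", "music", "girl", "slow", "rough", "king",
]
LENGTHS = (6, 7, 8)


def schedule(version):
    """Return (key_word, list_length) pairs for one hypothetical version (0, 1, or 2).

    Assumption: lengths are rotated across sequence positions; the paper
    does not state how its counterbalancing was actually constructed.
    """
    return [(word, LENGTHS[(pos + version) % 3])
            for pos, word in enumerate(KEY_WORDS)]


versions = [schedule(v) for v in range(3)]

# (a) and (b): because the list order is fixed, each list (and therefore each
# sequence position) receives every length exactly once across the versions.
for pos in range(len(KEY_WORDS)):
    assert sorted(v[pos][1] for v in versions) == [6, 7, 8]

# Within any one version, a participant hears six lists at each length.
for v in versions:
    assert Counter(length for _, length in v) == Counter({6: 6, 7: 6, 8: 6})

# Words presented per participant in each kind of condition.
short_total = sum(length for _, length in versions[0])
long_total = len(KEY_WORDS) * 15
print(short_total, long_total)
```

Under this rotation each short-list participant hears 6 × 6 + 6 × 7 + 6 × 8 = 126 words and each long-list participant hears 18 × 15 = 270, which agrees with the totals given in the note to Table 1.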
Procedure. Groups of 1 to 6 participants were
tested together. The lists, which had been recorded in a female voice at a rate of approximately
1.5 s per word, were presented by tape recorder.
Each list was followed by a 30 s interval during
which the participants either counted or rehearsed
(see later); this was followed by a signal to recall
the list. To avoid rushed recall in the long-list
conditions and idle waiting time in the short-list
conditions, the time allotted for recall was varied
by condition. (Pilot testing showed that the long
lists took longer to recall than did the short lists.
When we provided short-list participants with the
full amount of recall time required by the long-list
participants, short-list participants did not use the
extra time to recall. Rather, they finished writing
in about 30 s and then appeared to turn their
attention to other matters—looking out of the
window, attempting to open up a book. They did
not return to the recall task.) In the present
experiment, 40 s were allotted for recall in the two
short-list conditions and 70 s were given in the two
long-list conditions. After the recall interval, the
next list began.
In the long-list/distraction condition, participants were told that they would be asked to
remember some lists of words and to do some
arithmetic; they should do their best on both
parts because the relationship between the two
abilities was under study. Each 15-word list was
followed by a different 3-digit number. Participants wrote down the number and immediately
began counting backwards from it by sevens,
writing down each number along the way. (Given
‘‘107’’, for example, they were to write 107, 100,
93, 86, etc.) After 30 s the experimenter said
‘‘recall’’; the participants stopped counting backwards and wrote down as many words from the
preceding list as they could. As in the typical
DRM experiment, participants were given a
standard warning (subsequently used in all
experiments reported here) to be careful that the
words they wrote down were actually from the
list and not to guess. After 70 s for recall, the tape
started again with the next list. Booklets of
response sheets were provided; columns for each
list were marked with spaces for the counting and
recall tasks. A practice trial, using a list of unrelated words with no obvious relation to the lists
of the main experiment, was given at the end of
the instructions.
Participants in the long-list/rehearsal condition
were told that this was an experiment on memory,
that they would be asked to remember some lists
of words, and that they should do their best.
Instructions were similar to those just described,
except that counting backwards was omitted.
There was an empty 30 s interval between the end
of each list and the recall signal; participants were
told that they could use this interval to practise the
words to themselves.
Similar procedures were followed in the shortlist/distraction and short-list/rehearsal conditions,
except that the lists varied in length as noted
earlier and participants were given 40 s for recall.
For both long and short lists, all members of a testing group were assigned either to
count or rehearse in the 30 s following list
presentation; groups were randomly assigned to
conditions.
Results
False recall of key words. For each condition, the mean proportion of (the 18 total) lists on
which the key word was falsely recalled is
presented in Table 1. False recall of key words
occurred rather frequently in the long-list/
distraction and long-list/rehearsal conditions
(Ms = .33 and .38, SDs = .21 and .19), somewhat
less often (M = .28, SD = .19) in short-list/
distraction, and rarely (M = .12, SD = .09) in
short-list/rehearsal. A 2 (list length: short, long) × 2 (interpolated activity: distraction,
rehearsal) ANOVA revealed that the main effect of list length on false recall of the key word
is highly significant, F(1, 68) = 14.5, p < .001, whereas that of interpolated activity is not.
More important, the predicted interaction between list length and activity is significant,
F(1, 68) = 6.74, p < .01; this reflects the unique status of the short-list/rehearsal condition
with its very small number of false reports. The rate of falsely recalling the key word in the
short-list/rehearsal condition is significantly lower than in the short-list/distraction
condition, t(34) = 3.26, p < .003, and also significantly lower than in the two long-list
conditions, ts(34) > 4.00, ps < .001. The rates of falsely recalling the key word in the
short-list/distraction and long-list/distraction conditions do not differ significantly,
t(34) = .76, nor do the rates in the long-list/distraction and long-list/rehearsal conditions,
t(34) = .80.
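As a rough check on the 2 × 2 analysis, the F ratios can be approximated from the cell means and standard deviations reported in Table 1. The sketch below is not the authors' analysis: it assumes equal cell sizes (n = 18), takes the error mean square to be the average of the four cell variances, and works from values rounded to two decimals, so its results will only approximate the reported statistics. The function and variable names are illustrative.

```python
# Cell means and SDs for false recall of the key word, copied from Table 1.
N_PER_CELL = 18
CELLS = {
    ("long", "distraction"): (0.33, 0.21),
    ("long", "rehearsal"): (0.38, 0.19),
    ("short", "distraction"): (0.28, 0.19),
    ("short", "rehearsal"): (0.12, 0.09),
}

# With equal cell sizes, the error mean square is the average cell variance.
ms_error = sum(sd ** 2 for _, sd in CELLS.values()) / len(CELLS)
df_error = len(CELLS) * (N_PER_CELL - 1)  # 4 * 17 = 68


def f_ratio(sign):
    """F(1, df_error) for a 1-df effect defined by a +1/-1 weight per cell."""
    contrast = sum(sign(length, activity) * mean
                   for (length, activity), (mean, _) in CELLS.items())
    # Sum of squared weights is 4 (one +1 or -1 per cell).
    ss_effect = N_PER_CELL * contrast ** 2 / len(CELLS)
    return ss_effect / ms_error


def length_sign(length, activity):
    return 1 if length == "long" else -1


def activity_sign(length, activity):
    return 1 if activity == "distraction" else -1


def interaction_sign(length, activity):
    return length_sign(length, activity) * activity_sign(length, activity)


f_length = f_ratio(length_sign)
f_activity = f_ratio(activity_sign)
f_interaction = f_ratio(interaction_sign)

print(df_error, round(f_length, 1), round(f_activity, 1), round(f_interaction, 1))
```

The approximations come out near F = 13.9 for list length, 1.8 for interpolated activity, and 6.4 for the interaction, each on 1 and 68 degrees of freedom. These are close to the reported values of 14.5 and 6.74 for the two significant effects; the remaining gap is the kind of discrepancy expected from rounding in the table.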
Considering just the short-list conditions, false
recall rate was submitted to a 2 (interpolated
activity: distraction, rehearsal) × 3 (list length: 6,
7, 8) ANOVA with repeated measures on the
second variable. This analysis showed a main
effect for list length, F(2, 68) = 4.78, p < .01, and a
main effect for activity, F(1, 34) = 10.75, p < .002,
with no significant interaction. However, within the short-list/distraction and
short-list/rehearsal groups there is only one case in which rates of falsely recalling the
key word differ significantly by list length.¹ In addition, the critical 2 (list length:
short, long) × 2 (interpolated activity: distraction, rehearsal) ANOVA on false recall
discussed earlier was recalculated using each of the short list lengths alone and the pattern
of results did not change. Thus, for the sake of simplicity, within each of the two short-list
conditions data for the three different list lengths were combined in the crucial analyses.

TABLE 1
Mean rates of recall for presented and non-presented words in Experiment 1 as a
function of list length and interpolated activity

                                        Word type
Activity      List length   Presented (c)   Key non-presented (d)   Other non-presented (d)
Distraction   Long (a)      .55 (.06)       .33 (.21)               .30 (.21)
Distraction   Short (b)     .76 (.09)       .28 (.19)               .21 (.18)
Rehearsal     Long (a)      .57 (.07)       .38 (.19)               .35 (.19)
Rehearsal     Short (b)     .88 (.04)       .12 (.09)               .12 (.12)

n = 18 in each condition. Values enclosed in parentheses represent standard deviations.
(a) Lists were 15 words long. (b) Lists were 6, 7, and 8 words long. (c) Proportions are out of
270 words total in the long condition and out of 126 words total in the short condition.
(d) Proportions are out of 18 lists total.

False recall of other non-presented words. The mean number of other non-presented words
besides the key words that were falsely recalled per list is also shown in Table 1.
The pattern across conditions is similar to that observed for rates of falsely recalling the
key non-presented words: the rate is lowest in the short-list/rehearsal condition. A 2 (list
length: short, long) × 2 (interpolated activity: distraction, rehearsal) × 2 (non-presented
item type: key word, other non-presented) ANOVA with repeated measures on the last variable
showed no significant interactions involving the non-presented item-type factor. (There is a
significant main effect of length, F(1, 68) = 18.36, p < .001; a significant interaction of
length and activity, F(1, 68) = 6.07, p < .02; and a marginally significant main effect of
non-studied item type, F(1, 68) = 3.07, p < .08.) In addition, a 2 (list length: short,
long) × 2 (interpolated activity: distraction, rehearsal) ANOVA on false recall of
non-presented words other than the key words revealed a similar pattern of results to that
obtained from the comparable ANOVA on false recall of the key words. There is a significant
effect of list length on false recall of other non-presented words, F(1, 68) = 13.74,
p < .001; this is qualified by a marginally significant interaction between list length and
activity, F(2, 68) = 2.97, p < .09. There is no significant main effect of activity.
Veridical recall of list words. Table 1 shows
the mean proportions of actual list words that
were correctly recalled in the various conditions.
As might be expected, these proportions are
higher with short than with long lists, and also
higher in the rehearsal than in the distraction
conditions. A 2 (list length: short, long) × 2 (interpolated activity: distraction,
rehearsal) ANOVA showed that the main effects of list length and interpolated activity are
both highly significant, Fs(1, 68) = 266.0 and 17.2, ps < .001. The interaction is also
significant, F(1, 68) = 10.3, p < .01, reflecting the high rate of veridical recall in
the short-list/rehearsal condition. The rate of veridical recall is significantly higher in
the short-list/rehearsal condition than in the short-list/distraction condition,
t(34) = 5.29, p < .001, and also significantly higher than in the two long-list
conditions, ts(34) > 16.21, ps < .001.
Considering just the short-list conditions, veridical recall rate was submitted to a 2
(interpolated activity: distraction, rehearsal) × 3 (list length: 6, 7, 8) ANOVA with
repeated measures on the second variable. This analysis showed main effects for list length,
F(2, 68) = 27.64, p < .001, and for activity, F(1, 34) = 36.66, p < .001, with no significant
interaction. Within the short-list/distraction and short-list/rehearsal groups veridical
recall rates differ significantly according to list length in all but one case.² However,
again, the critical 2 (list length: short, long) × 2 (interpolated activity: distraction,
rehearsal) ANOVA on veridical recall discussed earlier was recalculated using each of
the short list lengths alone and the pattern of results did not change. Thus, for the sake of
simplicity, within each of the two short-list conditions data for the three different list
lengths were combined in the crucial analyses.
¹ For the short-list/distraction condition, there was no significant difference between the
rates of falsely recalling the key word at length 6 (M = .24, SD = .22) and length 7 (M = .25,
SD = .22), between the rates at length 7 and length 8 (M = .35, SD = .29), or between the
rates at length 6 and length 8, ts(17) < 1.64, ps > .12. For the short-list/rehearsal
condition, the difference between the rate of falsely recalling the key words at length 6
(M = .06, SD = .02) and length 7 (M = .11, SD = .11) was not significant, nor was the
difference between the rate of falsely recalling the key words at length 7 and length 8
(M = .18, SD = .18), ts(17) < 1.43, ps > .15. There was a significant difference between the
rates at length 6 and length 8, t(17) = 2.61, p < .02.
² For the short-list/distraction condition, there was no significant difference between
veridical recall at length 6 (M = .81, SD = .10) and length 7 (M = .78, SD = .09); the
difference between veridical recall rates at length 7 and length 8 (M = .78, SD = .09) was
significant, as was the difference between the rates at length 6 and length 8,
ts(17) > 3.37, ps < .004. For the short-list/rehearsal condition, the difference between
veridical recall rates at length 6 (M = .95, SD = .04) and length 7 (M = .89, SD = .07) was
significant, as was the difference between rates at length 7 and length 8 (M = .83,
SD = .06), and at lengths 6 and 8, ts(17) > 1.51, ps < .005.
Discussion
When given short lists and time to rehearse before
responding, participants in Experiment 1 falsely
recalled the key words significantly less often than
did participants given the same short lists with
distraction, and significantly less often than participants given long lists with or without distraction. A pure associative account would have
predicted a list-length effect: there are fewer
associations leading to the key word in shorter
lists. However, the interaction between list length
and interpolated activity is not so easily explained
on this basis. In the two short-list conditions the
associative strength of the lists was the same, yet
false recall of the key words occurred significantly
less often in the short-list/rehearsal condition than
in the short-list/distraction condition. In combination with this result, rates of falsely recalling the
key word in the long-list conditions make the
point that the effect of rehearsal depended on its
strategic value. Allowing for rehearsal on long
lists when list length alone precluded a verbatim
memory strategy did not significantly affect the
rate of falsely recalling the key word.
Although one might have expected rehearsal
always to strengthen associations to the key word
and thus increase false recall across the board, the
data are more consistent with a strategy account.
When lists were short, rehearsal maximised the
attractiveness of a verbatim memory strategy,
thereby reducing reliance on the gist of the list,
and reducing false recall as well.³ If this interpretation is correct, recall in the short-list/
rehearsal condition should be more accurate in all
respects. Indeed, compared with participants in
the short-list/distraction condition, participants in
the short-list/rehearsal condition not only falsely
recalled the key word less often, but also correctly
recalled list words more often. In addition, similar
to the pattern for falsely recalling the key word,
false recall of other non-presented words was
lowest in the short-list/rehearsal condition.
³ An anonymous reviewer expressed concern about the
confound between recall time and list length in Experiment 1.
The reviewer suggested that participants given less time to
recall will be less likely to falsely recall the key word, and that
this could account for our results. The difference in false recall
between the two short-list conditions (which had the same
amount of recall time) together with the lack of difference
between the short- and long-list distraction conditions (which
had different amounts of recall time) is inconsistent with this
alternate interpretation.
Results from Experiment 1 are consistent with
other research showing that increasing the distinctive features of presented words reduces false
memories in the DRM paradigm (Hicks & Marsh,
1999; Israel & Schacter, 1997; McDermott, 1996).
In our next experiment we go on to test the idea
that false recall of key words can also be reduced
in a very different way that does not rely on distinctive characteristics of presented words.
EXPERIMENT 2
The gist of each list is the prominent higher-order
structure in the typical DRM situation and
encourages false recall of the key word. However,
a different higher-order organisation of the list—
one that does not subsume the key word—may
reduce the likelihood of falsely recalling the key
word. The party-planning example from the
introduction is consistent with this claim; Experiment 2 empirically tests it. Here, words from
DRM lists were presented as clues to a secret word
that the experimenter had in mind. Each participant’s task was to guess this one secret word that
was related to all the clues but not actually presented. (In fact, of course, the secret word was the
key word.) A plausible strategy with such a task is
to use the first few list words to determine a candidate guess for the secret word and then attend to
the remaining list words to make sure the candidate is not among them. A participant using this
strategy can be very certain that his or her candidate guess (usually, but not always, the key word)
has not occurred on the list of clues. Even though
some attention to the gist of the list is necessary to
establish a candidate word in the first place, that
word is remembered as having a particular status
incompatible with appearance on the list itself.
This strategy should effectively prevent false
recall of the word guessed as the secret word,
regardless of list length. The participant does not
have to remember all the list words; most of them
need only be checked against the candidate word
being held in mind. In contrast, veridical recall
should be affected by list-length just as in the
rehearse conditions of Experiment 1. To be sure,
participants may sometimes hit on words other
than the key words as candidates for the secret
words. On such trials, the candidate words should
not be falsely recalled, but false recalls of the key
word should be just as likely as in the rehearse
conditions of Experiment 1. Finally, due to the
strategy we expect participants to adopt, guessing
the key word as the secret word should selectively
eliminate false recall of the key word; it should not
influence the rate at which other non-presented
words are falsely recalled.
Method
Participants. A total of 37 Cornell undergraduates, 20 males and 17 females, were given
extra course credit for participating in the
experiment. 18 were randomly assigned to the
short-list/game condition and 19 to the long-list/
game condition.
Materials. The same lists were used as in
Experiment 1. Participants in the long-list/game
condition heard the lists that had been used in the
long-list/rehearsal and long-list/distraction conditions. Participants in the short-list/game condition
heard the same counterbalanced sets of lists as in
the short-list/rehearsal and short-list/distraction
conditions.
Procedure. Groups of 1 to 6 participants were
tested together. All participants were given 18
lists. The overall procedure was the same as in
Experiment 1. Participants heard a list, there was
a 30 s interval, the experimenter gave a signal to
recall the list; then after either 40 s (short-list
condition) or a maximum of 70 s (long-list condition) for response, the next list began. The
important difference between Experiments 1 and
2 is in the instructions given to participants.
Experiment 2 was explained as a game in which
the experimenter would be thinking of a secret
word and the participants were each to try to figure out what that word was from a list of clues.
The experimenter (LKL, who had also recorded
the lists for Experiment 1) read the lists aloud at a
rate of approximately 1.5 s per word. After presentation of the list there were 30 s for participants
to decide on their guesses for the secret word.
Then, when the experimenter said ‘‘answer’’,
participants recorded their guesses for the secret
word and also as many of the clues as they could
remember. Participants worked independently
and were given the standard warning against
guessing used in Experiment 1.
Booklets of 18 response sheets were provided,
each marked with a blank for the secret word
(‘‘What’s my word?____’’) and a space beneath to
write the clues. The front page of the booklet
showed an example of a completed answer sheet:
‘‘bread’’ was filled in as the secret word and the
words from Roediger and McDermott’s (1995)
bread list (not used in the main experiment here)
were filled in as the clues. The experimenter
explained that a clue could be related to the secret
word in any number of ways: as an opposite, an
exemplar, a descriptor, or just a word that often
occurs with the secret word. Before the experimental
trials began, a practice trial was given; it was based
on the Roediger-McDermott list for sweet, which
was not used in the main experiment.
Results
Trials on which a participant guessed correctly
(i.e., chose the key word as the secret word) will be
called C-trials. Out of the 18 total trials, the mean
number of C-trials per participant in the long-list/
game condition is 14 (SD = .63); in the short-list/
game condition the mean number is 10 (SD = .78).
Trials on which a participant failed to guess the
key word will be called X-trials (i.e., for an
individual participant the number of X-trials = 18 − the number of C-trials). (One participant guessed the key word correctly on every trial.
Thus, this participant has no X-trials and is
excluded from all analyses involving X-trials.)
Most of the X-trials (52% in long-list/game, 82%
in short-list/game) occurred when participants
guessed plausible related words that were not on
the list. Others occurred when participants—
despite the instructions—guessed a clue word
from the list, or simply failed to respond.
For each participant, recall performance was
calculated separately for C-trials and for X-trials.
A participant’s C-trial rate of falsely recalling the
key word is the proportion of his or her C-trials on
which he or she falsely recalled the key word. A
participant’s C-trial veridical recall rate is the
proportion of words from his or her C-trial lists
that he or she correctly recalled. Individual scores
for X-trials were calculated in the same manner.
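The scoring just described can be sketched in code. This is a minimal illustration under stated assumptions, not the analysis actually used in the study: the `Trial` record, its field names, and the `score` function are all hypothetical, and each trial is assumed to store the key word, the participant's guess, the presented words, and the words written down at recall.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    # All field names are hypothetical, chosen for illustration only.
    key_word: str          # the non-presented key word for this list
    guess: Optional[str]   # secret-word guess (None if no response)
    presented: List[str]   # list words actually heard
    recalled: List[str]    # words written down as remembered clues


def score(trials: List[Trial], c_trials: bool) -> Optional[dict]:
    """Score one participant's C-trials (c_trials=True) or X-trials."""
    # A C-trial is a trial on which the guess equals the key word.
    subset = [t for t in trials if (t.guess == t.key_word) == c_trials]
    if not subset:
        return None  # e.g. a participant with no X-trials is excluded
    # Trials on which the key word was written down among the clues.
    false_key = sum(t.key_word in t.recalled for t in subset)
    # Presented words across these trials, and how many were recalled.
    n_presented = sum(len(t.presented) for t in subset)
    n_correct = sum(len(set(t.presented) & set(t.recalled)) for t in subset)
    return {
        "n_trials": len(subset),
        "false_key_rate": false_key / len(subset),
        "veridical_rate": n_correct / n_presented,
    }
```

Because every trial is either a C-trial or an X-trial, the two `n_trials` counts for a participant sum to the number of lists presented.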
False recall of key words. Table 2 shows that
participants in both conditions almost never falsely recalled the key word on C-trials (long-list/
game: M = .01, SD = .05; short-list/game: M = .005,
SD = .02). (Of the 37 participants, 34 did not falsely recall any of the key words; 2 participants did
so a total of three times in the long-list condition,
and 1 did so once in the short-list condition.) On
X-trials, however, participants falsely recalled the
key word more frequently: on an average of .41
(SD = .29) of the X-trials in the long-list/game
condition and on an average of .15 (SD = .12) in
the short-list/game condition. A 2 (list length:
short, long) × 2 (trial type: C-trial, X-trial)
ANOVA with repeated measures on the second
variable showed that both main effects on rate of
falsely recalling the key word are highly significant, Fs(1, 34) > 13.25, ps < .001. These main
effects are qualified by a significant interaction,
F(1, 34) = 10.71, p < .002, reflecting the fact that list
length did not significantly affect the rate of
falsely recalling the key word on C-trials,
t(35) = .8, but list length did have an effect on X-trials, t(35) = 3.5, p < .001.

TABLE 2
Experiment 2

                                      Word type
List length    Presentedᶜ    Key non-presentedᵈ    Other non-presentedᵈ
C-trials
  Longᵃ        .58 (.06)     .01 (.05)             .31 (.29)
  Shortᵇ       .81 (.14)     .005 (.02)            .08 (.10)
X-trials
  Longᵃ        .45 (.13)     .41 (.29)             .32 (.55)
  Shortᵇ       .75 (.11)     .15 (.12)             .16 (.13)

Mean rates of recall for presented and non-presented words when key word guessed
(C-trials) and key word not guessed (X-trials) in the two conditions of Experiment 2.
n = 18 or 19 in each condition.
ᵃ Lists were 15 words long. ᵇ Lists were 6, 7, and 8 words long. ᶜ Proportions are out of
the total number of words presented per trial type. ᵈ Proportions are out of the total
number of lists per trial type.

On C-trials, long- and short-list/game participants falsely recalled the key word on a far smaller
proportion of lists than did participants in the
comparable rehearse conditions of Experiment 1
(long conditions: .01 vs .38; short conditions: .005
vs .12). A 2 (list length: short, long) × 2 (activity:
game, rehearse) ANOVA showed that both main
effects and the interaction are significant,
Fs(1, 68) > 24.94, ps < .001. As predicted, however,
the corresponding comparisons for X-trials
produced a very different result: similar rates of
falsely recalling the key word in game and
rehearse (long conditions: .41 vs .38; short conditions: .15 vs .12). Indeed, a 2 (list length: short,
long) × 2 (activity: game, rehearse) ANOVA
showed only a significant effect of length,
F(1, 68) = 34.3, p < .001, in this case.
False recall of other non-presented words. The mean number of other non-presented words besides the key words that were
falsely recalled per list on C- and on X-trials is also
shown in Table 2. A 2 (list length: short, long) × 2
(trial type: C-trial, X-trial) ANOVA with repeated measures on the second factor showed only a
significant effect of list length, F(1, 34) = 5.68,
p < .023. In contrast to the comparable analysis of
false recall of the key word, the main effect of trial
type on false recall of other non-presented words
is not significant, nor is the interaction effect, Fs
(1, 34) < .43. This result is consistent with the
prediction that the effect of guessing the key word
as the secret word would be different for false
recall of the key word than for false recall of other
non-presented words.4
The rates of falsely recalling other non-presented words in the C-trials of the game and in the
rehearse conditions of Experiment 1 were submitted to a 2 (list length: short, long) × 2 (activity:
game, rehearse) ANOVA. The only significant
effect was a main effect of list length,
F(1, 69) = 26.32, p < .001; the corresponding analysis using the game X-trials also showed only a
main effect of list length, F(1, 69) = 7.27, p < .009.
Thus, for both C- and X-trials of Experiment 2,
the rates of falsely recalling other non-presented
words at each list length are comparable to the
rates observed in the corresponding rehearsal
conditions of Experiment 1. This contrasts with
the pattern of rates of falsely recalling the key
word: only the X-trials of Experiment 2 show
rates comparable to those observed at each list
length in Experiment 1; the C-trials do not.
4
This contrast between the patterns of false recall observed
for key words and other non-presented words is
reflected in the marginally significant three-way interaction
obtained from a 2 (list length: short, long) × 2 (trial type: C-trial, X-trial) × 2 (non-presented word type: key, other)
ANOVA with repeated measures on the second two factors,
F(1, 34) = 3.64, p < .06. The two-way interaction between non-presented word type and trial type was significant,
F(1, 34) = 7.00, p < .01. The only other significant effect was a
main effect of non-presented word type, F(1, 34) = 24.47,
p < .001.
Veridical recall. Table 2 shows that participants correctly recalled a larger proportion of the
list words on C-trials than on X-trials. A 2 (list
length: short, long) × 2 (guessing performance: C-trials, X-trials) repeated measures ANOVA
showed that the main effect of guessing performance on veridical recall is significant,
F(1, 34) = 26.5, p < .001. The main effect of list
length is also significant, F(1, 34) = 67.2, p < .001:
participants in short-list/game recalled a larger
proportion of list words than did participants in
long-list/game on C- and on X-trials. There is no
significant interaction.
A 2 (list length: short, long) × 2 (activity: game,
rehearse) ANOVA showed that, on the C-trials in
the game conditions and in the rehearse conditions of Experiment 1, list length significantly
affected the proportion of list words correctly
recalled, F(1, 69) = 185.1, p < .001; activity had no
significant effect. The interaction is not significant. A similar analysis for X-trials showed
main effects for both list length, F(1, 68) = 197.6,
p < .001, and activity, F(1, 68) = 32.9, p < .001, with
no significant interaction: veridical recall was
better on short lists and in the rehearse condition.
Thus, as predicted, the effect of list-length on
veridical recall does not differ between
Experiments 1 and 2, regardless of whether C- or
X-trials of Experiment 2 are considered.
Discussion
Experiment 2 showed that presenting DRM lists
in a context emphasising a higher-order structure that excludes the key word can virtually
eliminate false recall of the key word, even with
long lists. When participants proposed the key
word as the experimenter’s secret word they
almost never falsely recalled the key word,
regardless of whether the DRM list was long or
short (i.e., regardless of the associative strength
of the list). In contrast, when participants did
not propose the key word as the secret word,
they were no better at avoiding falsely recalling
the key word than were participants in the comparable rehearse conditions of Experiment 1,
where the higher-order gist structure included
the key word. Considering C- and X-trials from
both conditions of Experiment 2 together
strengthens the claim that the game reduced
false recall of the key word due to an effect of
higher-order structure on memory strategies.
Across both conditions on all trials, the words
that participants proposed as the secret words
were almost never listed among the clues. Even
on those X-trials where participants falsely recalled the key word, they did know that they had
not heard their own candidate words.
According to our reasoning, the game condition eliminated false recall of the key word on C-trials because the higher-order structure excluded
that word in particular. If this were the case, then
veridical recall rates on C-trials in the long- and
short-list/game conditions should not be different
from those observed in the comparable rehearse
conditions of Experiment 1; indeed they were not.
Analyses of false recall of non-presented words
other than the key word are also consistent with
our reasoning. Unlike rates of falsely recalling the
key word, rates of falsely recalling other non-presented words were not affected by whether or
not the participant guessed the key word as the
secret word. Moreover, on both C- and X-trials,
the rates of falsely recalling other non-presented words were no different from the rates
observed in the comparable rehearsal conditions
of Experiment 1.
The effect of our game instructions could be
described as a warning to participants about the
special structure of the lists. Previous experiments have used explicit warnings and, in general, have found little effect on false reports of
the key word. Comparing Experiment 2 with
these earlier warning studies suggests that what
matters is not the warning itself, but the kinds of
strategies that warning makes available. Our
game task not only warned participants about the
structure of the DRM lists, but also suggested a
specific strategy for thinking about them: to pick
one word and make sure it is not on the list. This
strategy enabled participants to avoid the key
word on trials in which they had guessed the key
word as the secret word. In contrast, the design of
most previous warning experiments (Gallo et al.,
1997; Lampinen et al., 1997; Neuschatz & Payne,
1996) limited the strategic value of the warning:
long study lists composed of up to 10 DRM lists
were presented back to back. The presentation of
so many list words apparently forces participants
to focus on gists, thus making it difficult to keep
key words separate from list words. Gallo et al.
(1997) did find an effect of warning in this case,
but it was far from an elimination. Neuschatz and
Payne (1996) found no effect of their warning,
perhaps in part because the directions did not
specify that there was only one key word per
category. Neuschatz and Payne also suggested
participants pay special attention to the words
they did hear; in contrast, our instructions
focused attention on words not presented.
Finally, Lampinen et al.’s (1997) warning presentation has very little strategic value. Not only
did participants hear multiple DRM lists back to
back, but the warning was not given until after
these lists were presented. Once participants have
focused on the gist at list presentation, they have
little hope of distinguishing key words from the
words actually presented.
The only tests of an explicit warning in a single-list format are McDermott and Roediger’s
(1998) Experiments 2 and 3, which did not eliminate false recognition of the key word. Although
McDermott and Roediger used a recognition
measure, it seems likely that the same sort of
organisational strategy was made available by
their warning as by our game task. In our interpretation, such warnings should be only as effective in eliminating false recall of the key word as
participants are in guessing what the key word is.
In our experiment participants did not always
correctly figure out what the key word was—
apparently, neither did McDermott and Roediger’s participants.
In any case, the guessing task used in Experiment 2 is not the only higher-order task structure
that can be used to reduce false recall of the key
word with DRM lists. Another such task structure
was explored in Experiments 3a and 3b.
EXPERIMENT 3A
The gist of each list in Experiment 2 provided key
information about the secret word. On trials in
which participants took full advantage of that
information and proposed the key word as the
secret word, there were virtually no false reports
that the key word had been on the list. This finding contrasts with the high rates of falsely recalling the key word in most other DRM studies. In
our view, these results reflect the particular
strategy that was adopted by our participants in
response to the guessing-game structure. The
game in Experiment 2 provided a higher-order
structure for the DRM lists that excluded the key
words. However, the game instructions also
explicitly communicated to participants the special construction of the DRM lists: all the presented words are related to one key non-presented word. In this sense, the game instructions could be interpreted as a warning about
being misled by the gist. The purpose of Experiment 3a, in which the gist structure of the lists
was never mentioned at any time, was to show
that changes of higher-order structure can reduce
the rate of falsely recalling the key words even
without such a warning.
The DRM lists of Experiment 3a were again
embedded in a task that provided a higher-order
structure for the list that competed with the gist
organisation. This time, however, that task had
nothing to do with the semantic structure of the
lists: it only concerned the first letters of individual
words. Before presenting each list, the experimenter announced a focus letter. (In fact, this was
always the first letter of the key word for that list.)
Words that began with the focus letter were called
focus words. The participants’ tasks were (a) to
count and report the number of focus words on the
list, (b) to remember those focus words in the
order in which they were presented, and (c) to
remember as many of the other list words as
possible.
Paying special attention to the focus words
during encoding should keep them in the forefront
of working memory; loosely speaking, working
memory would ‘‘contain’’ all the focus words a
participant has heard in the current list. The
availability of these words in working memory
would make a particular inference possible at
retrieval if the key word should come to mind and
be considered as a possible list word. Noticing that
the key word began with the focus letter, the
participant could check whether the key word was
among the focus words still being held in working
memory. As it would not be, he or she could
conclude (correctly) that the key word had not
appeared on the list.
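The retrieval-time inference described above can be written out as a short procedure. This is only an illustrative sketch of the hypothesised strategy, not a model proposed in the paper: the function name and its arguments are hypothetical, and the example words are invented for illustration rather than taken from the appendix lists.

```python
from typing import List


def accept_candidate(candidate: str, focus_letter: str,
                     held_focus_words: List[str]) -> bool:
    """Decide whether a word that comes to mind at recall is reported.

    Hypothetical sketch: a candidate beginning with the focus letter is
    reported only if it is among the focus words held in working memory.
    Candidates beginning with other letters are not checked by this rule.
    """
    if candidate.lower().startswith(focus_letter.lower()):
        return candidate.lower() in (w.lower() for w in held_focus_words)
    return True  # the focus-letter check says nothing about other words


# Illustrative words only, not the lists from the appendix.
held = ["snooze", "slumber"]
print(accept_candidate("sleep", "s", held))   # key word fails the check
print(accept_candidate("snooze", "s", held))  # held focus word passes
```

The check is only as good as the set of held focus words: a word that entered that set during presentation would pass it.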
The stimulus material for Experiment 3a consisted of DRM lists that were 9, 10, and 11 words
long. There were one, two, three, or four focus
words per list. Participants in the key-focus condition were given the special instructions described earlier; those in the control condition were
given standard DRM instructions. We predicted
that key-focus participants would produce substantially fewer false reports of the key word than
would control participants.
Method
Participants. A total of 36 Cornell undergraduates, 10 males and 26 females, were given
extra course credit for participating in the
experiment. Of these, 18 were randomly assigned
to the key-focus condition and 18 to the control
condition.
Materials. Fifteen of Roediger and McDermott’s (1995) lists include at least one word that
starts with the same letter as the key word; some
lists have two, three, or four such words. Ten of
these lists, representing a variety of initial focus
letters, were modified for use in Experiment 3a
(number of focus words in parentheses): slow (4),
mountain (1), black (3), sweet (3), doctor (1), chair
(2), bread (1), sleep (3), river (1), man (2). The lists
were 9, 10, or 11 words long (list length was varied
for similar reasons as in Experiment 1). These
were not necessarily the first 9/10/11 words of the
corresponding Roediger-McDermott list; it was
sometimes necessary to replace earlier words with
later ones to achieve adequate variation in the
number and placement of focus words. The actual
lists used are shown in Appendix A.
Procedure. In both conditions, groups of 1 to
6 participants were tested together. The lists,
which LKL had recorded in the order mentioned
earlier at a rate of approximately 1.5 s per word,
were presented by tape recorder. Each list was
followed by a 30 s blank interval, after which the
experimenter gave a signal to recall the list.
Participants had 40 s to do so before the next list
began.
In the key-focus condition, the experimenter
explained that for each list participants were to
count how many words began with the focus letter, to remember those words in the order they
were presented, and then to remember as many of
the other words as possible. Appropriate response
sheets showing the focus letter for each list were
provided; these sheets had specific areas for each
list in which to write the number of focus words,
the focus words themselves, and the rest of the
words from the list. Participants were given the
standard warning against guessing used in the
preceding experiments. The experimenter
reminded participants of the appropriate focus
letter before each list began. A practice trial, using
words unrelated to each other and unrelated to
any of the experimental lists, was given before the
main experiment.
In the control condition, the experimenter
explained that this was an experiment on memory,
that participants would be asked to remember
some lists of words, and that they should do their
best. The response booklets simply provided
spaces in which to write down the list words.
Participants were given the same warning against
guessing and the same practice list as in the key-focus condition.
Results
False recall of key words. Table 3 shows that
the proportion of lists on which participants falsely recalled the key word was much lower in the
key-focus (M = .09, SD = .08) than in the control
condition (M = .22, SD = .15). This difference is
highly significant (U = 79, p < .01), as shown by a
Mann-Whitney U-test. (This test was used due to
unequal variances in the two groups: the majority
of key-focus participants never or only once falsely recalled the key word, whereas most of the
control participants did so from two to five times.)
Across participants in the key-focus condition,
there were only 16 falsely recalled key words. Of
these, 10 were reported in the focus-word section
of the response sheets; in all 10 cases, the participants’ reports of the number of focus words indicated that they had counted the key word among
them. The remaining six false recalls of the key
word appeared among the ‘‘other’’ words on the
response sheets. On five occasions, key-focus
participants listed actual focus words among the
‘‘other’’ words.
TABLE 3
Experiments 3a & b

                        Word type
Condition        Presentedᵃ    Key non-presentedᵇ
Experiment 3a
  Control        .73 (.07)     .22 (.15)
  Key-focus      .62 (.09)     .09 (.08)
Experiment 3b
  Control        .71 (.10)     .20 (.19)
  Key-focus      .65 (.08)     .07 (.14)
  Other-focus    .65 (.07)     .18 (.15)

Mean rates of recall for presented and non-presented words
in each condition of Experiments 3a and 3b.
n = 18 or 19 in each condition. Values enclosed in parentheses represent standard deviations.
ᵃ Lists were 9, 10, and 11 words long. Proportions are out of
90 words total. ᵇ Proportions are out of 9 key words total.

Veridical recall. In Experiments 1 and 2, the
conditions with the lowest rates of falsely recalling
the key word also had the highest rates of veridical
recall. This was not the case in Experiment 3a,
however. Table 3 shows that the key-focus condition had significantly fewer veridical recalls than
the control condition, Ms = .62 and .73, SDs = .09
and .07, respectively; t(34) = 4.45, p < .001.
Discussion
As predicted, introduction of the focus-letter task
substantially reduced the rate of falsely recalling
the key word. Nevertheless, that rate did not go to
zero: across all participants, there were 16 false
recalls of the key word in the 180 trials of the key-focus condition. The fact that most of these falsely
recalled key words were reported as focus words
suggests that our initial analysis of the strategies
available in this condition may have been incomplete. The strategy we described (checking a key
word that comes to mind during recall against the
focus words held in working memory) will prevent
false reports of the key word only if the focus
words in working memory are the ones that
actually appeared on the list. If the key word had
already come to mind during list presentation it
might have been stored in working memory along
with the real focus words, and later reported. Just
this seems to have happened on 10 occasions in
the key-focus condition. That the majority of false
recalls of the key word (10/16) followed this
pattern suggests that false recall of key words in
the DRM paradigm may often be produced at
encoding.
The fact that our key-focus instructions
reduced veridical recall of the list words (as well
as false recall of the key word) probably reflects a
simple interference effect. The additional task of
counting and keeping track of the focus words,
required in this more difficult condition, has
much in common with the main task of remembering the list words. Such a conflict would be
expected to produce a certain amount of
interference, and it apparently did so. But,
whatever its cause, this reduction in veridical
recall introduces a competing explanation for the
drop in the rate of falsely recalling the key word.
Is it possible that both recall rates went down as a
simple result of increased task difficulty?
Experiment 3b was conducted as a direct
empirical test.
EXPERIMENT 3B
We believe that the key-focus task in Experiment
3a reduced false recall of the key word because the
task made certain strategies involving the focus
letter available to participants. However, the
decrease in veridical recall suggests another possibility. Perhaps the difficulty of the key-focus task
reduced false recall of the key word simply by
causing participants to report fewer words overall.
To test this hypothesis, we devised a task that
taxed working memory to the same extent as the
key-focus task of Experiment 3a but did not confer the same strategic benefits. This was a focus-letter task in which the focus letters were not the
first letters of the key words.
For Experiment 3b we modified the focus lists
from Experiment 3a such that each list contained
two alternative sets of focus words—one set that
began with the first letter of the key word and one
set that began with a different letter. Accordingly,
Experiment 3b had two focus conditions: participants were given either the key-word letters (key-focus) or the other letters (other-focus) as focus
letters. Participants in a third (control) condition
did not engage in the focus task at all but were
simply asked to listen to the lists during presentation, as were control participants in Experiment 3a. We expected that the focus task would
have adverse effects on veridical recall (compared
to the control task), regardless of whether the
focus letter was the first letter of the key word or
not. Nevertheless, the control and other-focus
conditions should produce comparable rates of
falsely recalling the key words, and both should be
substantially higher than the rate in the key-focus
condition.
Method
Participants. A total of 55 Cornell undergraduates, 24 males and 31 females, were given
extra course credit for participating in the
experiment. They were randomly assigned to one
of three conditions: 18 to control and other-focus,
19 to key-focus.
Materials. The lists used in Experiment 3a
were modified so that each list contained two sets
of an equal number of focus words: the words in
one set began with the first letter of the key word
and the words in the other set began with a different letter. This arrangement could only be
achieved in 9 of the 10 lists used in Experiment 3a,
so participants in Experiment 3b were given only 9
lists, which are shown in Appendix B.
Procedure. In all conditions, groups of 1 to 8
participants were tested together. In the control
condition, participants were given the same type
of answer booklets and directions as were control
participants in Experiment 3a. In the key-focus
and other-focus conditions, participants were
given the same type of answer booklets and
directions as were the focus participants in
Experiment 3a. The only difference between the
key- and other-focus conditions was in the focus
letters given to participants. As in Experiment 3a,
lists were presented on a tape recorder at a rate of
1.5 s per word; there was a blank 30 s interval after
each list, followed by a 40 s interval in which
participants recalled words from that list.
Results
False recall of key words. Table 3 shows that
the mean proportion of the total nine lists on
which participants falsely recalled the key word
was much lower in the key-focus condition
(M = .07, SD = .14) than in the other two conditions, where the rates were similar (other-focus:
M = .18, SD = .15; control: M = .20, SD = .19). A
Kruskal-Wallis One-Way ANOVA showed a significant effect of condition on false recall of the
key word, χ²(2) = 9.06, p < .01. A Mann-Whitney
U-test revealed no significant difference between
the rates of falsely recalling the key word in the
control and other-focus conditions, U = 161.5, ns.
However, the planned contrast between the rate
of falsely recalling the key word in the key-focus
condition and that in the other two conditions
combined showed that the difference was highly
significant, U = 179, p < .01. As in Experiment 3a,
non-parametric tests were used here due to
unequal variances. The number of key words falsely recalled by participants in the control and
other-focus conditions ranged from 0 to 6. In
contrast, 12 of the 19 participants in the key-focus
condition never falsely recalled the key word; 5 of
them falsely recalled the key word only once. (It
should be noted that the score of one unusual key-focus participant, z = 3.6, greatly affected the
mean proportion of lists on which the key word
was falsely recalled in the key-focus condition.
When this participant is removed, the mean rate of
falsely recalling the key word in this condition
drops to .04, SD = .07.)
Across all key-focus participants, there were
only 12 falsely recalled key words. Nine of these
were reported in the focus-word section of the
response sheets; in all nine cases, the participants’
reports of the number of focus words indicated
that the key word had been counted among them.
The remaining three false recalls of the key word
appeared among the ‘‘other’’ words on the
response sheets. On one occasion in the key-focus
condition and two occasions in the other-focus
group, participants listed actual focus words
among the ‘‘other’’ words.
Veridical recall. Table 3 shows that the mean
proportions of list words veridically recalled were
identical in the key-focus and other-focus conditions (Ms = .65, SDs = .08 and .07, respectively),
and less than in the control condition (M = .71,
SD = .10). A one-way ANOVA shows a marginally significant effect of condition on veridical
recall, F(2, 54) = 2.83, p = .07. The planned contrast between the rate of veridical recall in both
focus conditions together and that in the control
condition was significant, t(35) = 2.39, p < .05.
Discussion
If the (key-) focus task reduces false recall of the
key word simply by increasing task difficulty, then
the reduction in false recall of key words should be
independent of the particular letters of focus. The
results of Experiment 3b contradict this hypothesis. False recall of key words was reduced only
when participants focused on the first letters of the
key words, not when participants focused on different letters. In contrast, veridical recall was
reduced no matter what the focus letters were. We
conclude that the reduced incidence of falsely
recalling the key word observed in the key-focus
conditions of Experiments 3a and 3b appeared
because participants took advantage of the
opportunity for strategic avoidance of the key
word that the key-focus structure provides.5
5
One question that remains in both Experiments 3a and 3b
is how the number of focus words per list affected recall performance. It might be predicted that the more focus words, the
harder it is to keep all of them in verbatim memory and the
more likely participants would be to believe the key word was
presented. However, an analysis of our data would not provide
an adequate test of this hypothesis. As the lists in our experiments vary not only by number of focus words but also by
length and the particular key word around which the list is
constructed, any analysis of the effect of number of focus letters would be misleading.
GENERAL DISCUSSION
We have reported three DRM experimental
designs in which false recall of key words was
sharply reduced. In our view, these reductions
occurred because the structure of the experimental situations encouraged participants to use
strategies other than simply depending on the gist
of the list. In the short-list/rehearsal condition of
Experiment 1, participants relied more on individual words retained in working memory than on
overall gist. Thus, the rate of falsely recalling the
key word in this condition was uniquely low and
the rate of veridical recall uniquely high. Participants who played the guessing-game in Experiment 2 knew that the ‘‘secret words’’ they figured
out were not on the lists. Thus, regardless of list
length, false recall of the key word was essentially
eliminated on trials where participants guessed
the secret words correctly. In Experiments 3a and
3b false recall of the key word rarely occurred
when the instructions required participants to
organise the DRM list in memory according to the
first letter of the key word; Experiment 3b showed
that this effect was due to the strategy such an
organisation makes possible. Keeping track of all
list words that began with the key letter allowed
participants to be sure that the key word was not
on the list; keeping track of all list words that
began with some other letter did not allow such a
strategy.
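The logic of this strategy can be made concrete with a minimal sketch. It uses the Sleep list from Appendix B, with the key-focus letter S and the other-focus letter D. The function names are hypothetical, and the sketch makes the idealised assumption that a participant retains the tracked focus words perfectly, which our participants did not always do.

```python
# Sleep list from Experiment 3b (Appendix B); the key word "sleep" is never presented.
SLEEP_LIST = ["bed", "snooze", "rest", "dream", "awake", "tired",
              "doze", "slumber", "snore", "drowsy", "wake"]


def focus_words(list_words, focus_letter):
    """Words a participant is asked to track: those starting with focus_letter."""
    letter = focus_letter.lower()
    return [w for w in list_words if w.lower().startswith(letter)]


def can_rule_out(candidate, list_words, focus_letter):
    """True when the tracked focus set alone shows the candidate was not presented.

    Idealised assumption: the participant retains the focus set perfectly.
    """
    candidate = candidate.lower()
    if not candidate.startswith(focus_letter.lower()):
        return False  # candidate falls outside the tracked set, so no verdict
    tracked = {w.lower() for w in focus_words(list_words, focus_letter)}
    return candidate not in tracked


key_focus_verdict = can_rule_out("sleep", SLEEP_LIST, "s")    # key-focus letter
other_focus_verdict = can_rule_out("sleep", SLEEP_LIST, "d")  # other-focus letter
```

Under these assumptions the key word can be ruled out when the focus letter is S but not when it is D, which mirrors the asymmetry between the key-focus and other-focus conditions.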
Previous research has shown that increasing the
salience of characteristics of individual list items
reduces false recall of the key word; we have
linked this effect to a more general phenomenon
regarding the influence of situational structure on
memory strategy in the DRM paradigm. Just as
making individual items salient reduces the reliance on gist (Experiment 1), so does making an
alternate higher-order structure salient (Experiments 2 and 3a/b); both manipulations result in
substantial reductions in false recall of the key
word. Our point is not simply that the rate of
falsely recalling the key word was much reduced
in these experiments (it never quite reached zero),
but that participants’ memories of the lists
appeared to have been greatly affected by the
contexts of the tasks.
From this perspective, the few intrusions participants did make are instructive. In most cases,
these intrusions reflect the strategies appropriate
to the contexts in which the lists were encountered. The short-list/rehearsal condition of
Experiment 1 focused attention on individual
words, but nothing about the organisation of the
task signalled that the key word should not be
among those words. Thus, if the key word came to
mind it might well be accepted as a list word and
(falsely) recalled. In Experiment 2, false recall of
the key word only occurred on trials where the
participant had not guessed the key word as the
secret word. Participants in Experiments 3a and
3b who falsely recalled the key word usually listed
it as one of the focus words, which is where it
should be if the list were organised according to
the focus letter. These patterns of intrusions show
that even when the key word may be activated in
memory, participants do not accept such traces of
activation blindly but rather interpret them within
the structure that the situation provides.
Others have also pointed out that the gist of
DRM lists plays a role in producing false memories of the key word. Schacter and colleagues
(Israel & Schacter, 1997; Schacter, Israel, &
Racine, 1999; Schacter et al., 1996) argue that
presentation of numerous related items highlights
the gist and, when participants do not retain the
distinctive details of words that were actually
presented, false recall is likely. This reasoning led
to the prediction that study conditions that
encourage encoding of distinctive features of
presented words should reduce false recall.
Indeed, this appears to be the case (e.g., Israel &
Schacter, 1997; Schacter et al., 1999). Mather et al.
(1997) make a similar argument from a source-monitoring perspective. They propose that in the
DRM paradigm the dimension of semantic similarity (which does not differentiate between presented words and the key word) is so salient that it
overrides other dimensions that would differentiate between presented words and the key
word. In the DRM, focusing on the salient (yet
non-diagnostic) semantic characteristics of memories causes key words that were internally generated to be misattributed to the list read by the
experimenter. Finally, a third explanation involving the role of gist is that put forth by Payne et al.
(1996). They apply the principles of fuzzy trace
theory (Reyna & Brainerd, 1995) which proposes
that people encode, in parallel, a verbatim and a
gist representation of events as they occur. The
process of establishing the gist is called ‘‘gist
extraction’’ and this is what allows people to pick
up on patterns of stimuli within an event. Memory
can be based on either verbatim or gist representations; Payne et al. propose that false memories of the key word in the DRM are based on
gist representations. Two of their experiments
(Experiments 2 and 3) showed that repeated
testing increased false recall, a result they attributed to increased opportunities for gist extraction,
and thereby stronger gist representations on
which false memories of the key word were based.
All of these theories would seem to suggest the
design of our Experiment 1, and also to make the
same predictions as we did. Allowing for verbatim
rehearsal of all presented words would make the
distinctive features of presented words more salient, would
encourage the use of perceptual detail (rather than
semantic characteristics) for source monitoring,
and would eliminate the use of the extracted gist as
a basis for responding. However, our focus on the
role that the structure of the situation plays in
determining memory strategies led us to the design
and predictions of Experiments 2 and 3, which do
not follow directly from the other approaches. The
reduction in falsely recalling the key word in our
Experiment 2 was achieved by introducing a
higher-order structure for the list that put constraints on the relevance of gist to the presented
words. All of the words on the list were related to
the gist, but so was one word that was not on the
list. In this case, false recall of the key word was
reduced not by paying closer attention to the
presented words, but by knowledge of how the gist
related to the structure of the game. Indeed, the
game may even have enhanced the gist-extraction
process, which was necessary to figure out the
secret word. This did not increase false recall (as
Payne et al.’s interpretation might predict); rather,
the game worked against false recall because its
structure differentiated this element of the gist
from the presented words. In Experiment 3 we
expected that grouping the list according to the
first letter of the key word would alert participants
to the absence of the key word. In the key-focus
condition participants would be paying close
attention to a feature of presented words (the
initial letter) that was the same as a feature of the
non-presented key word. Neither the rationale
proposed by Schacter and colleagues nor Mather
et al.’s focus on memory characteristics would
readily predict the success of this design, as these
theories emphasise the salience of differences
between the presented and non-presented words
for reducing false memories of the key word. It is
noteworthy that false recall was not reduced in the
other-focus condition of Experiment 3b, when
participants focused on organising the list around a
letter that did distinguish presented words from
the key word.
To make these contrasts with other theories
regarding the role of gist is not to argue that they
are invalid, or that they are necessarily incompatible
with our own. However, our focus on the role
of higher-order structural knowledge illuminates
new ways of reducing false recall of key words in
the DRM. In so doing, our approach adds to an
understanding of how gist is related to DRM false
memories. Memory always occurs in the context
of a particular activity, and that context affects
how people go about the task, as well as what they
actually remember. This principle applies in list-learning experiments, even when particular associations or traces are strongly activated, as they
surely are in the DRM paradigm. We do not doubt
that such activation occurs; our point is that there
is no automatic link from the activation of a trace
to a person’s belief that this activation is evidence
of a real past occurrence.
In this regard, the perspective we offer on the
DRM paradigm is consistent with other recent
research that has focused on the role of situational
and metacognitive knowledge in memory. The
source-monitoring framework (Johnson, Hashtroudi, & Lindsay, 1993; Johnson & Raye, 1981)
proposes that both memory characteristics and
more general knowledge about the way the world
works figure into reality-monitoring decisions
(e.g., ‘‘That memory of a money tree in my back
yard is extremely vivid, but it must have been a
dream because I know that money does not grow
on trees’’, Johnson & Raye, 1981, p. 72). However,
Bayen and colleagues (Bayen, Nakamura, Dupuis,
& Yang, 2000) point out that there is relatively
little empirical evidence for the latter factor, as
most source-monitoring experiments arbitrarily
assign items to sources. For example, items may
be presented by two different speakers, yet these
speakers are not identified by any social roles,
characteristics, or opinions that are relevant to the
words they present. In this case, participants must
rely solely on memory characteristics in a later
source-monitoring test because there is no higher-order information to use. Several recent studies
(Bayen et al., 2000; Mather, Johnson, & DeLeonardis, 1999; Sherman & Bessenoff, 1999) have
introduced meaningful relationships between
sources and presented items and found that participants used these relationships during subsequent source-monitoring tasks. In addition,
other studies suggest that people use their beliefs
about how their memories should work, given
particular encoding conditions, to interpret
recollective experiences they have when their
memory is later tested (Bink, Marsh, & Hicks,
1999; Forster & Strack, 1998). The conclusion
from these studies is consistent with our results
from Experiment 2 (e.g., ‘‘Because I figured out
the secret word, I know it was not presented’’) and
the key-focus conditions of Experiments 3a and 3b
(e.g., ‘‘I know that ‘sleep’ was not presented
because if it were, I would have been rehearsing it
as one of the focus words’’).
Demonstrations of the effect of higher-order
knowledge on memory in verbal learning situations fit with observations of autobiographical
remembering. For example, people are more
likely to accept a suggested false memory from
childhood when this event fits within the framework of relevant self-knowledge than when it does
not (Hyman & Billings, 1998; Hyman, Husband,
& Billings, 1995). In this connection, our results
and discussion illustrate an important point. One
of the most intriguing questions about the DRM
paradigm is whether the false memories that it
produces so readily have anything in common
with the more personal false memories that can
often occur in clinical settings. People’s notions
about what really occurred in the past depend not
only on what associations may have been activated in their minds but also on the contexts in
which they find themselves and the goals they are
trying to achieve. We reduced false recall with
DRM lists by introducing more complex tasks and
contexts. But things do not always work this way:
in other settings (e.g., psychotherapy), changing
the context of a memory task may increase false
recall rather than reduce it (Engel, 1999). Generally speaking, the accuracy of a given autobiographical memory is determined by the
likelihood that the higher-order reconstructive
tools used on that occasion will lead one to
recreate events as they actually occurred (Bahrick, Hall, & Berger, 1996; Ross, 1989). Apparently this principle holds for the DRM paradigm
as well.
Manuscript received 9 November 1999
Manuscript accepted 10 October 2000
REFERENCES
Bahrick, H.P., Hall, L.K., & Berger, S.A. (1996).
Accuracy and distortion in memory for high school
grades. Psychological Science, 7, 265–271.
Bayen, U.J., Nakamura, G.V., Dupuis, S.E., & Yang, C.
(2000). The use of schematic knowledge about
sources in source monitoring. Memory & Cognition, 28, 480–500.
Bink, M.L., Marsh, R.L., & Hicks, J.L. (1999). An
alternative conceptualization to memory ‘‘strength’’
in reality monitoring. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 25,
804–809.
Brainerd, C.J., & Reyna, V.F. (1998). When things that
were never experienced are easier to ‘‘remember’’
than things that were. Psychological Science, 9, 484–
493.
Deese, J. (1959). On the prediction of occurrence of
particular verbal intrusions in immediate recall.
Journal of Experimental Psychology, 58, 17–22.
Engel, S. (1999). Context is everything: The nature of
memory. New York: W.H. Freeman & Company.
Forster, J., & Strack, F. (1998). Subjective theories
about encoding may influence recognition: Judgmental regulation in human memory. Social Cognition, 16, 78–92.
Gallo, D.A., Roberts, M.J., & Seamon, J.G. (1997).
Remembering words not presented in lists: Can we
avoid creating false memories? Psychonomic Bulletin & Review, 4, 271–276.
Hicks, J.L., & Marsh, R.L. (1999). Attempts to reduce
the incidence of false recall with source monitoring.
Journal of Experimental Psychology: Learning,
Memory, and Cognition, 25, 1195–1209.
Hyman, I.E., & Billings, J. (1998). Individual differences and the creation of false childhood memories.
Memory, 6, 1–20.
Hyman, I.E., Husband, T.H., & Billings, J.F. (1995).
False memories of childhood experiences. Applied
Cognitive Psychology, 9, 181–197.
Israel, L., & Schacter, D.L. (1997). Pictorial encoding
reduces false recognition of semantic associates.
Psychonomic Bulletin & Review, 4, 577–581.
Johnson, M.K., Hashtroudi, S., & Lindsay, D.S. (1993).
Source monitoring. Psychological Bulletin, 114, 3–28.
Johnson, M.K., & Raye, C.L. (1981). Reality monitoring. Psychological Review, 88, 67–85.
Kensinger, E.A., & Schacter, D.L. (1999). When true
memories suppress false memories: Effects of ageing. Cognitive Neuropsychology, 16, 399–415.
Lampinen, J.M., Neuschatz, J.S., & Payne, D.G. (1997).
Source attributions and false memories: A test of the
demand characteristics account. Psychonomic Bulletin & Review, 6, 130–135.
Mather, M., Henkel, L.A., & Johnson, M.K. (1997).
Evaluating characteristics of false memories:
Remember/know judgments and memory characteristics questionnaire compared. Memory &
Cognition, 25, 826–837.
Mather, M., Johnson, M.K., & DeLeonardis, D.M.
(1999). Stereotype reliance in source-monitoring:
Age differences and neuropsychological test correlates. Cognitive Neuropsychology, 16, 437–458.
McDermott, K.B. (1996). The persistence of false
memories in list recall. Journal of Memory and
Language, 35, 212–230.
McDermott, K.B., & Roediger, H.L. (1998). Attempting to avoid illusory memories: Robust false recognition of associates persists under conditions of
explicit warnings and immediate testing. Journal of
Memory and Language, 39, 508–520.
Melo, B., Winocur, G., & Moscovitch, M. (1999). False
recall and false recognition: An examination of the
effects of selective and combined lesions to the
medial temporal lobe/diencephalon and frontal lobe
structures. Cognitive Neuropsychology, 16, 343–359.
Neuschatz, J.S., & Payne, D.G. (1996). The influence of
warnings and encoding instructions on the magnitude
of the false memory effect. Paper presented at the
Eastern Psychological Association, Philadelphia,
PA, USA.
Norman, K.A., & Schacter, D.L. (1997). False recognition in young and older adults: Exploring the characteristics of illusory memories. Memory &
Cognition, 25, 838–848.
Payne, D.G., Elie, C.J., Blackwell, J.M., & Neuschatz,
J.S. (1996). Memory illusions: Recalling, recognizing,
and recollecting events that never occurred. Journal
of Memory and Language, 35, 261–285.
Reyna, V.F., & Brainerd, C.J. (1995). Fuzzy trace theory: An interim synthesis. Learning and Individual
Differences, 7, 1–75.
Robinson, K.J., & Roediger, H.L. (1997). Associative
processes in false recall and false recognition. Psychological Science, 8, 231–237.
Roediger, H.L., & McDermott, K.B. (1995). Creating
false memories: Remembering words not presented
in lists. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 803–814.
Ross, M. (1989). Relation of implicit theories to the
construction of personal histories. Psychological
Review, 96, 341–357.
Schacter, D.L., Israel, L., & Racine, C. (1999). Suppressing false recognition in younger and older
adults: The distinctiveness heuristic. Journal of
Memory and Language, 40, 1–24.
Schacter, D.L., Verfaellie, M., & Pradere, D. (1996).
The neuropsychology of memory illusions: False
recall and recognition in amnesic patients. Journal of
Memory and Language, 35, 319–334.
Sherman, J.W., & Bessenoff, G.R. (1999). Stereotypes
as source-monitoring cues: On the interaction
between episodic and semantic memory. Psychological Science, 10, 106–110.
Tussing, A.A., & Greene, R.L. (1997). False recognition
of associates: How robust is the effect? Psychonomic
Bulletin & Review, 4, 572–576.
APPENDIX A
The ten lists used in Experiment 3a
Slow: fast, lethargic, stop, listless, snail, speed, cautious, delay, sluggish, traffic
Mountain: hill, valley, molehill, climb, summit, top, peak, plain, glacier
Black: white, dark, cat, charred, night, blue, funeral, color, bottom, grief, brown
Sweet: sour, candy, sugar, bitter, good, taste, soda, tooth, nice
Doctor: nurse, sick, lawyer, medicine, health, hospital, dentist, physician, ill, patient, office
Chair: table, sit, legs, seat, couch, desk, recliner, cushion, sofa, wood
Bread: butter, food, eat, sandwich, rye, jam, milk, flour, jelly
Sleep: bed, snooze, rest, awake, tired, slumber, snore, dream, wake, blanket
River: water, stream, lake, Mississippi, boat, tide, swim, flow, run, barge, creek
Man: woman, husband, uncle, lady, mouse, male, father, strong, friend, beard
Key-focus words are in italics.
APPENDIX B
The nine lists used in Experiment 3b
Slow (L): fast, LETHARGIC, stop, LISTLESS, snail, cautious, delay, traffic, turtle, hesitant
Mountain (G): hill, valley, molehill, climb, summit, top, peak, plain, GLACIER
Black (C): white, dark, CAT, CHARRED, night, blue, funeral, COLOR, bottom, grief, brown
Sweet (T): sour, candy, sugar, bitter, good, TASTE, soda, TOOTH, nice, TART
Chair (S): table, SIT, legs, SEAT, couch, desk, recliner, cushion, wood
Bread (R): butter, food, eat, sandwich, RYE, jam, milk, flour, jelly
Sleep (D): bed, snooze, rest, DREAM, awake, tired, DOZE, slumber, snore, DROWSY, wake
River (B): water, stream, lake, Mississippi, BOAT, tide, swim, flow, run, creek
Doctor (M): nurse, sick, lawyer, MEDICINE, health, hospital, dentist, physician, ill, patient, office
Letters in parentheses are other-focus letters. Key-focus words are in italics. Other-focus words
are in capitals.