MEMORY, 2001, 9 (3), 145–163

Structure and strategy in the associative false memory paradigm

Lisa K. Libby and Ulric Neisser
Cornell University, USA

List-learning experiments can have several levels of structure: individual words, the gist (if any) of each list, and the task in which those lists are embedded. The usual presentation of the DRM associative paradigm (Deese, 1959; Roediger & McDermott, 1995) strongly encourages a focus on gist and produces a high rate of false recall of key words (FRK). The experiments reported here were designed to invite the use of memory strategies based on structures other than the gist and thus reduce FRK. The crucial condition of Experiment 1, short lists followed by rehearsal, encouraged a focus on individual words and produced a low rate of FRK. In Experiment 2, the lists were embedded in a guessing game, which virtually eliminated FRK. FRK was also low in Experiments 3a and 3b when participants engaged in a complex task involving the first letters of list words. The relevance of these findings to false memories in the DRM paradigm and the connection to false autobiographical memories are discussed.

The DRM paradigm, originally devised by Deese (1959) and developed further by Roediger and McDermott (1995), has received much attention recently due to the robust false memory effect it creates. Participants in a DRM experiment hear word lists, each composed of 12 to 15 common associates of a single non-presented key word: the list based on the key word, sleep, includes bed, rest, awake, tired, dream, etc. but not sleep itself. On subsequent memory tests, participants often recall (e.g., McDermott, 1996; Payne, Elie, Blackwell, & Neuschatz, 1996; Robinson & Roediger, 1997) or recognise (e.g., Israel & Schacter, 1997; Mather, Henkel, & Johnson, 1997; Payne et al., 1996) the non-presented key word.
The effect is a strong one; notably, it is not extinguished by attempts to warn participants (Gallo, Roberts, & Seamon, 1997; Lampinen, Neuschatz, & Payne, 1997; McDermott & Roediger, 1998; Neuschatz & Payne, 1996).

One interpretation of the DRM effect is based on the associative nature of the lists. From this perspective, false memories of the key word arise because the key word is repeatedly activated by the presented list items; it is the total associative strength of the individual list items together that predicts the likelihood of falsely recalling the key word (Robinson & Roediger, 1997; Roediger & McDermott, 1995).

However, another aspect of the paradigm has also been cited as integral to producing the false memory effect: the gist of each list. The idea is that false recall of the key word occurs when participants use a gist representation as the basis for remembering (Brainerd & Reyna, 1998; Melo, Winocur, & Moscovitch, 1999; Payne et al., 1996; Schacter, Verfaellie, & Pradere, 1996). There are several pieces of evidence that support this claim. Amnesic patients with damage to the medial temporal lobe but intact frontal lobes should be able to extract gist from a DRM list but not to retain memory for individual items. Indeed, such patients were impaired at recalling studied words from DRM lists but were more likely to falsely recall key words than were controls (Melo et al., 1999). Norman and Schacter (1997) compared the performance of younger and older adults in the DRM task. Older adults falsely recognised key words at a significantly greater rate than did younger adults, a result attributed to age-related changes in memory: with age, the ability to recall specific details of previously studied items declines and there is greater reliance on memory for gist. Kensinger and Schacter (1999) found that older adults show increases in veridical recall, but not decreases in false recall, with repeated presentation of DRM lists. The older adults’ continual focus on gist appears to have been responsible for their persistent false recall: younger adults, who were able to capitalise on the opportunity to gain more item-specific knowledge with repeated presentation of the lists, showed a decrease in false recall with repeated presentation, as well as an increase in veridical recall (see also McDermott, 1996). Other experiments focusing on younger adults show that, in general, false memories of the key words are less likely under conditions that make the gist less prominent: when DRM lists are mixed together (Mather et al., 1997; McDermott, 1996), when pictures are presented along with list words, thus allowing participants to use memory of a particular drawing as a recognition criterion (Israel & Schacter, 1997), when memories of individual words are examined extensively (Lampinen et al., 1997; Mather et al., 1997), or in incidental learning (Tussing & Greene, 1997). These various lines of research all converge on the idea that false memories of the key word are common under conditions in which gist is the basis for memory, but less common otherwise.

Requests for reprints should be sent to Lisa K. Libby, Department of Psychology, Uris Hall, Cornell University, Ithaca, New York 14850, USA. Email: [email protected]
© 2001 Psychology Press Ltd http://www.tandf.co.uk/journals/pp/09658211.html DOI:10.1080/09658210042000085

Our purpose in this paper is to show that the structure of the DRM list-learning situation can influence participants’ readiness to use a gist-based memory strategy, and hence affect the rate of false recall. Consider the strategies available in a typical DRM situation.
Confronted with 12 or more words to remember, participants have little choice but to look for some kind of intra-list structure: verbatim memory cannot accommodate a list of this length. Moreover, the presentation of so many semantically related words makes the gist of the list quite obvious. Those participants who notice it may well adopt a gist strategy, accepting any gist-related word that comes to mind as a legitimate list member—a strategy that can easily produce false recall of the key word. With changes in the experimental situation, other strategies may become available. Results from our pilot experiments suggest that even very small changes in the structure of the situation can have such an effect. A pilot experiment in which all of the DRM lists were five words long produced false recall of the key word on only a small proportion of lists (M = .08, SD = .10). However, the same five-word lists produced a much higher rate of false recall (M = .27, SD = .20) in another pilot experiment that differed only in that the five-word lists were intermixed with DRM lists of seven and nine words. One reason for the difference in false recall rates between the two experiments may be the different strategies that the experimental situations made available. If it is clear that all lists are five words long, a strategy of rehearsing the entire list in working memory is always viable. If the length of a list is not known when it begins, participants may be more likely to focus on gist than they would otherwise: the list may turn out to be too long for a working memory strategy. The three experiments we report here were designed to vary the memory strategies made possible by the structure of the DRM situation. We predict that false recall should be much reduced in situations that divert attention from the gist and encourage different encoding and retrieval strategies. 
One way to do this is to divert participants’ attention away from the gist by focusing attention on the individual words of the list. A verbatim memory strategy focused on those words in particular should reduce false recall of the key word (because it is not among them). Another very different way to reduce reliance on the gist is to direct participants’ attention towards a competing higher-order structure by manipulating the task in which the list is embedded. For example, consider the list: candles, frosting, ice cream, party hats, plates, forks, spoons, streamers, balloons, confetti. The gist of this list is something like birthday party; in the DRM paradigm, false reports of the key word, cake, should occur rather frequently. However, encountering the list in another situation might lead to a very different pattern of recall. Suppose you and a friend are throwing a birthday party for someone. You go to your friend’s house to prepare for the party and find him in the middle of baking the cake. He says, “Why don’t you go to the store and get the rest of the stuff we need?” and proceeds to rattle off the aforementioned list. Although all of the words on the shopping list are associates of cake, when you get to the store you will not falsely remember that your friend told you to buy a cake: you know a cake is already made, and you know that the party only needs one cake. Even if the word, cake, were to pop into your mind, you would not consider this as an indication that your friend had said to buy a cake. Your understanding of the situation in which you heard the shopping list precludes the possibility that cake would have been on it. This example makes the point that knowledge about the higher-order structure of a situation can influence people’s memories of information acquired in that situation.
To test our hypothesis about the effect of situational structure on memory strategies and false recall with DRM lists, we conducted three experiments, each designed to shift the basis of the memory strategy away from the gist structure of the lists, either downward to individual words (Experiment 1) or upward to the task in which the lists were embedded (Experiments 2 and 3a/b). In all of the experiments we expected that directing attention away from the gist would dramatically reduce false recall of key words. In the general discussion we consider how our results relate to other theories about the role of gist in DRM false memories. We also comment on the parallels between false memories produced in the DRM paradigm and those that occur outside the laboratory.

EXPERIMENT 1

If the high rate of falsely recalling key words in the DRM paradigm is related to participants’ use of a gist-based strategy, any change in procedure that makes an alternative strategy more viable should reduce the rate of falsely recalling key words. The alternative strategy of interest in Experiment 1 was reliance on working memory and rehearsal: we wanted our participants to repeat the list words to themselves as continuously as possible from the beginning of the list until the time of test. This approach should be more effective with short than with long lists, but only if the time between presentation and recall is not filled with an interfering activity. The four conditions of Experiment 1 were generated by crossing list length (short or long) with interference (present or absent). We predicted that the condition with short lists and no interference would produce the fewest false recalls of the key word, and the fewest false recalls of other non-presented words as well. Previous experiments have investigated the variables of DRM list length and distraction, but not together.
Crossing these two variables allows one to vary the associative strength of lists independently of the availability of memory strategies, whereas this has arguably not been the case in previous experiments that studied the effects of list length and distraction. In Experiment 1 of Robinson and Roediger (1997), DRM list length was varied from 3 to 15 words. The result was clear: the rate of falsely recalling and recognising key words increased monotonically with list length. Robinson and Roediger took these data as evidence that the associative strengths of the list words (i.e., their tendencies to elicit the key word) combine in a cumulative manner. We believe that another factor should also be considered: participants are more likely to use a gist strategy with long lists than with short ones. As list length increases, the gist structure of the list becomes more and more obvious while the alternative working memory strategy becomes less and less viable. In a second experiment, Robinson and Roediger (1997) used the same lists of words related to the key words but added unrelated filler items so that all the lists were 15 words long. The results were essentially the same as before: the rate of falsely recalling and recognising the key word increased with the number of items on the list that were related to the key word. However, this result may be due to the offsetting effects of list length and gist strength on the memory strategies participants use. On the one hand, the sheer length of the filled 15-word lists would discourage a working memory strategy. On the other hand, increasing the number of filler items would make the gist structure less obvious. An important aspect of Robinson and Roediger’s (1997) procedure was the use of an interpolated distractor task: participants were given addition problems to solve during the 30 seconds between the end of each list and the beginning of recall.
This type of distractor makes working memory almost useless, forcing participants to use a gist strategy. One would therefore expect interference to increase the rate of falsely recalling the key word on trials where participants would otherwise be using working memory (i.e., trials with short lists). Unfortunately, the only DRM study that has explicitly varied the presence of distraction (McDermott, 1996) used 15-word lists exclusively. McDermott’s (1996) study is also of interest for another reason. Her participants clearly used a “late working memory strategy” on the no-distraction trials, recalling words from the end of the list first. Indeed, McDermott notes that this strategy gives “. . . little chance for the key word to appear . . .” during that part of the recall (p. 217). Once the contents of working memory have been dumped in this way, however, a participant who is trying to remember 15 words must still depend on the gist strategy to recall the rest of the list. It is for that reason, we believe, that McDermott found no significant effect of distraction on the rate of falsely recalling the key word. In our Experiment 1 we expected that, as in McDermott’s study, long lists would encourage reliance on gist even without distraction. However, we predicted that in the short-list conditions of our Experiment 1, participants would falsely recall the key word less often under no-distraction conditions than they would when distraction was present.

Method

Participants. A total of 72 Cornell undergraduates, 20 males and 52 females, were given extra course credit for participating in the experiment. The first 36 participants were randomly assigned to the short-list/distraction and short-list/rehearsal conditions; the next 36 were randomly assigned to the long-list/distraction and long-list/rehearsal conditions.

Design and materials. The stimulus materials were based on the middle 18 lists from Roediger and McDermott’s (1995) Appendix.
The key words for these lists are spider, needle, cold, doctor, high, foot, soft, fruit, mountain, man, sleep, chair, river, music, girl, slow, rough, and king. Each list consists of 15 associates of its key word, arranged in descending order of associative strength. The full chair list, for example, is table, sit, legs, seat, couch, desk, recliner, sofa, wood, cushion, swivel, stool, sitting, rocking, bench. All participants were presented with 18 lists in the same random order in which the key words have just been listed. In the long-list/distraction and long-list/rehearsal conditions the lists were all 15 words long (like the chair list just given). In contrast, the lists in the short-list/distraction and short-list/rehearsal conditions varied in length. Participants in the short-list conditions heard six lists of length 6, six of length 7, and six of length 8; these were always the first 6, 7, or 8 words from the corresponding Roediger-McDermott list. As explained in the introduction, pilot experiments had shown that mixing list lengths in this way increases the rate of falsely recalling key words. Because one aim of the present experiment was to show a difference between distraction and rehearsal conditions on short lists, we adopted the mixed-length design to avoid a floor effect on false recall rates for short lists. The assignment of lists to lengths and the sequence of lengths actually presented were counterbalanced across participants so that (a) each Roediger-McDermott list was used once at each length, and (b) each length appeared equally often at each position in the sequence of 18 lists. Procedure. Groups of 1 to 6 participants were tested together. The lists, which had been recorded in a female voice at a rate of approximately 1.5 s per word, were presented by tape recorder. Each list was followed by a 30 s interval during which the participants either counted or rehearsed (see later); this was followed by a signal to recall the list. 
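The counterbalancing constraints described under Design and materials can be illustrated with a small sketch. The paper does not say how the assignment was actually constructed, so the three-version rotation below is only an assumption, and the names `length_schedule`, `truncate`, and `schedules` are hypothetical. Because all participants heard the lists in the same order, a rotation of this kind satisfies both constraints at once: each list occurs once at each length, and each length occurs equally often at each serial position.

```python
from collections import Counter

LENGTHS = (6, 7, 8)   # short-list lengths used in Experiment 1
N_LISTS = 18          # number of lists heard by each participant


def length_schedule(version):
    """List length at each of the 18 positions for one hypothetical
    counterbalancing version (0, 1, or 2). This rotation is an assumption."""
    return [LENGTHS[(pos + version) % len(LENGTHS)] for pos in range(N_LISTS)]


def truncate(full_list, length):
    """A short list is the first `length` words of the 15-word list."""
    return full_list[:length]


schedules = [length_schedule(v) for v in range(len(LENGTHS))]

for schedule in schedules:
    # Six lists at each length, hence 126 presented words per participant.
    assert Counter(schedule) == Counter({6: 6, 7: 6, 8: 6})
    assert sum(schedule) == 126

for pos in range(N_LISTS):
    # Across versions, every position (and so every list) is used once at each length.
    assert sorted(s[pos] for s in schedules) == [6, 7, 8]

chair = ["table", "sit", "legs", "seat", "couch", "desk", "recliner", "sofa",
         "wood", "cushion", "swivel", "stool", "sitting", "rocking", "bench"]
print(truncate(chair, 6))  # ['table', 'sit', 'legs', 'seat', 'couch', 'desk']
```

The totals checked here match the denominators given in the notes to Table 1: 126 presented words in the short-list conditions and 18 × 15 = 270 in the long-list conditions.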
To avoid rushed recall in the long-list conditions and idle waiting time in the short-list conditions, the time allotted for recall was varied by condition. (Pilot testing showed that the long lists took longer to recall than did the short lists. When we provided short-list participants with the full amount of recall time required by the long-list participants, short-list participants did not use the extra time to recall. Rather, they finished writing in about 30 s and then appeared to turn their attention to other matters: looking out of the window, attempting to open up a book. They did not return to the recall task.) In the present experiment, 40 s were allotted for recall in the two short-list conditions and 70 s were given in the two long-list conditions. After the recall interval, the next list began. In the long-list/distraction condition, participants were told that they would be asked to remember some lists of words and to do some arithmetic; they should do their best on both parts because the relationship between the two abilities was under study. Each 15-word list was followed by a different 3-digit number. Participants wrote down the number and immediately began counting backwards from it by sevens, writing down each number along the way. (Given “107”, for example, they were to write 107, 100, 93, 86, etc.) After 30 s the experimenter said “recall”; the participants stopped counting backwards and wrote down as many words from the preceding list as they could. As in the typical DRM experiment, participants were given a standard warning (subsequently used in all experiments reported here) to be careful that the words they wrote down were actually from the list and not to guess. After 70 s for recall, the tape started again with the next list. Booklets of response sheets were provided; columns for each list were marked with spaces for the counting and recall tasks.
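As a small illustration of the procedure just described, the sketch below generates the backward-counting sequence used as the distractor and estimates the duration of a single trial. The function names are hypothetical, and the timing figure is only approximate because the presentation rate is reported as roughly 1.5 s per word.

```python
SECONDS_PER_WORD = 1.5  # approximate presentation rate reported in the Procedure
INTERVAL_S = 30         # counting or rehearsal interval after each list


def count_backwards(start, step=7, n_terms=4):
    """First n_terms numbers a participant would write when counting
    backwards from `start` by `step`."""
    return [start - step * i for i in range(n_terms)]


def trial_seconds(list_length, recall_s):
    """Approximate trial duration: presentation + interval + recall."""
    return list_length * SECONDS_PER_WORD + INTERVAL_S + recall_s


print(count_backwards(107))   # [107, 100, 93, 86]
print(trial_seconds(15, 70))  # 122.5 (long-list trial)
print(trial_seconds(6, 40))   # 79.0 (six-word short-list trial)
```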
A practice trial, using a list of unrelated words with no obvious relation to the lists of the main experiment, was given at the end of the instructions. Participants in the long-list/rehearsal condition were told that this was an experiment on memory, that they would be asked to remember some lists of words, and that they should do their best. Instructions were similar to those just described, except that counting backwards was omitted. There was an empty 30 s interval between the end of each list and the recall signal; participants were told that they could use this interval to practice the words to themselves. Similar procedures were followed in the short-list/distraction and short-list/rehearsal conditions, except that the lists varied in length as noted earlier and participants were given 40 s for recall. For both long and short lists, all of the members of a testing group were assigned either to count or rehearse in the 30 s following list presentation; groups were randomly assigned to conditions.

Results

False recall of key words. For each condition, the mean proportion of (the 18 total) lists on which the key word was falsely recalled is presented in Table 1. False recall of key words occurred rather frequently in the long-list/distraction and long-list/rehearsal conditions (Ms = .33 and .38, SDs = .21 and .19), somewhat less often (M = .28, SD = .19) in short-list/distraction, and rarely (M = .12, SD = .09) in short-list/rehearsal. A 2 (list length: short, long) × 2 (interpolated activity: distraction, rehearsal) ANOVA revealed that the main effect of list length on false recall of the key word is highly significant, F(1, 68) = 14.5, p < .001, whereas that of interpolated activity is not. More important, the predicted interaction between list length and activity is significant, F(1, 68) = 6.74, p < .01; this reflects the unique status of the short-list/rehearsal condition with its very small number of false reports.
The rate of falsely recalling the key word in the short-list/rehearsal condition is significantly lower than in the short-list/distraction condition, t(34) = 3.26, p < .003, and also significantly lower than in the two long-list conditions, ts(34) > 4.00, ps < .001. The rates of falsely recalling the key word in the short- and long-list/distraction conditions do not differ significantly, t(34) = .76, nor do the rates in the long-list/distraction and rehearsal conditions, t(34) = .80. Considering just the short-list conditions, false recall rate was submitted to a 2 (interpolated activity: distraction, rehearsal) × 3 (list length: 6, 7, 8) ANOVA with repeated measures on the second variable. This analysis showed a main effect for list length, F(2, 68) = 4.78, p < .01, and a main effect for activity, F(1, 34) = 10.75, p < .002, with no significant interaction. However, within the short-list/distraction and short-list/rehearsal groups there is only one case in which rates of falsely recalling the key word differ significantly by list length.¹ In addition, the critical 2 (list length: short, long) × 2 (interpolated activity: distraction, rehearsal) ANOVA on false recall discussed earlier was recalculated using each of the short list lengths alone and the pattern of results did not change. Thus, for the sake of simplicity, within each of the two short-list conditions data for the three different list lengths were combined in the crucial analyses.

TABLE 1
Mean rates of recall for presented and non-presented words in Experiment 1 as a function of list length and interpolated activity

                                          Word type
Interpolated  List      Presentedᶜ   Key              Other
activity      length                 non-presentedᵈ   non-presentedᵈ
Distraction   Longᵃ     .55 (.06)    .33 (.21)        .30 (.21)
Distraction   Shortᵇ    .76 (.09)    .28 (.19)        .21 (.18)
Rehearsal     Longᵃ     .57 (.07)    .38 (.19)        .35 (.19)
Rehearsal     Shortᵇ    .88 (.04)    .12 (.09)        .12 (.12)

n = 18 in each condition. Values enclosed in parentheses represent standard deviations.
ᵃ Lists were 15 words long.
ᵇ Lists were 6, 7, and 8 words long.
ᶜ Proportions are out of 270 words total in the long condition and out of 126 words total in the short condition.
ᵈ Proportions are out of 18 lists total.

False recall of other non-presented words. The mean number of other non-presented words besides the key words that were falsely recalled per list is also shown in Table 1. The pattern across conditions is similar to that observed for rates of falsely recalling the key non-presented words: the rate is lowest in the short-list/rehearsal condition. A 2 (list length: short, long) × 2 (interpolated activity: distraction, rehearsal) × 2 (non-presented item type: key word, other non-presented) ANOVA with repeated measures on the last variable showed no significant interactions involving the non-presented item-type factor. (There is a significant main effect of length, F(1, 68) = 18.36, p < .001; a significant interaction of length and activity, F(1, 68) = 6.07, p < .02; and a marginally significant main effect of non-studied item type, F(1, 68) = 3.07, p < .08.) In addition, a 2 (list length: short, long) × 2 (interpolated activity: distraction, rehearsal) ANOVA on false recall of non-presented words other than the key words revealed a similar pattern of results to that obtained from the comparable ANOVA on false recall of the key words. There is a significant effect of list length on false recall of other non-presented words, F(1, 68) = 13.74, p < .001; this is qualified by a marginally significant interaction between list length and activity, F(2, 68) = 2.97, p < .09.
There is no significant main effect of activity.

Veridical recall of list words. Table 1 shows the mean proportions of actual list words that were correctly recalled in the various conditions. As might be expected, these proportions are higher with short than with long lists, and also higher in the rehearsal than in the distraction conditions. A 2 (list length: short, long) × 2 (interpolated activity: distraction, rehearsal) ANOVA showed that the main effects of list length and interpolated activity are both highly significant, Fs(1, 68) = 266.0 and 17.2, ps < .001. The interaction is also significant, F(1, 68) = 10.3, p < .01, reflecting the high rate of veridical recall in the short-list/rehearsal condition. The rate of veridical recall is significantly higher in the short-list/rehearsal condition than in the short-list/distraction condition, t(34) = 5.29, p < .001, and also significantly higher than in the two long-list conditions, ts(34) > 16.21, ps < .001. Considering just the short-list conditions, veridical recall rate was submitted to a 2 (interpolated activity: distraction, rehearsal) × 3 (list length: 6, 7, 8) ANOVA with repeated measures on the second variable. This analysis showed main effects for list length, F(2, 68) = 27.64, p < .001, and for activity, F(1, 34) = 36.66, p < .001, with no significant interaction. Within the short-list/distraction and short-list/rehearsal groups veridical recall rates differ significantly according to list length in all but one case.² However, again, the critical 2 (list length: short, long) × 2 (interpolated activity: distraction, rehearsal) ANOVA on veridical recall discussed earlier was recalculated using each of the short list lengths alone and the pattern of results did not change. Thus, for the sake of simplicity, within each of the two short-list conditions data for the three different list lengths were combined in the crucial analyses.
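The pairwise comparisons in the Results can be roughly cross-checked from the rounded means and standard deviations in Table 1. The sketch below is not the authors' analysis; it assumes an ordinary independent-samples t test with pooled variance and equal group sizes (n = 18 per condition), and the function name `t_from_summary` is hypothetical.

```python
import math


def t_from_summary(m1, sd1, m2, sd2, n):
    """Independent-samples t (pooled variance) for two groups of equal size n,
    computed from summary statistics only. Returns (t, degrees of freedom)."""
    pooled_var = (sd1 ** 2 + sd2 ** 2) / 2
    standard_error = math.sqrt(pooled_var * 2 / n)
    return (m1 - m2) / standard_error, 2 * n - 2


# False recall of the key word: short-list/distraction vs. short-list/rehearsal.
t, df = t_from_summary(0.28, 0.19, 0.12, 0.09, 18)
print(f"t({df}) = {t:.2f}")
```

With the rounded values from Table 1 this gives a t of about 3.2 on 34 degrees of freedom, in line with the reported t(34) = 3.26; the small gap is plausibly due to rounding of the published means and standard deviations.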
¹ For the short-list/distraction condition, there was no significant difference between the rates of falsely recalling the key word at length 6 (M = .24, SD = .22) and length 7 (M = .25, SD = .22); between the rates at length 7 and length 8 (M = .35, SD = .29), or between the rates at length 6 and length 8, ts(17) < 1.64, ps > .12. For the short-list/rehearsal condition, the difference between the rate of falsely recalling the key words at length 6 (M = .06, SD = .02) and length 7 (M = .11, SD = .11) was not significant, nor was the difference between the rate of falsely recalling the key words at length 7 and length 8 (M = .18, SD = .18), ts(17) < 1.43, ps > .15. There was a significant difference between the rates at length 6 and length 8, t(17) = 2.61, p < .02.

² For the short-list/distraction condition, there was no significant difference between veridical recall at length 6 (M = .81, SD = .10) and length 7 (M = .78, SD = .09); the difference between veridical recall rates at length 7 and length 8 (M = .78, SD = .09) was significant, as was the difference between the rates at length 6 and length 8, ts(17) > 3.37, ps < .004. For the short-list/rehearsal condition, the difference between veridical recall rates at length 6 (M = .95, SD = .04) and length 7 (M = .89, SD = .07) was significant, as was the difference between rates at length 7 and length 8 (M = .83, SD = .06), and at lengths 6 and 8, ts(17) > 1.51, ps < .005.

Discussion

When given short lists and time to rehearse before responding, participants in Experiment 1 falsely recalled the key words significantly less often than did participants given the same short lists with distraction, and significantly less often than participants given long lists with or without distraction. A pure associative account would have predicted a list-length effect: there are fewer associations leading to the key word in shorter lists.
However, the interaction between list length and interpolated activity is not so easily explained on this basis. In the two short-list conditions the associative strength of the lists was the same, yet false recall of the key words occurred significantly less often in the short-list/rehearsal condition than in the short-list/distraction condition. In combination with this result, rates of falsely recalling the key word in the long-list conditions make the point that the effect of rehearsal depended on its strategic value. Allowing for rehearsal on long lists when list length alone precluded a verbatim memory strategy did not significantly affect the rate of falsely recalling the key word. Although one might have expected rehearsal always to strengthen associations to the key word and thus increase false recall across the board, the data are more consistent with a strategy account. When lists were short, rehearsal maximised the attractiveness of a verbatim memory strategy, thereby reducing reliance on the gist of the list, and reducing false recall as well.³ If this interpretation is correct, recall in the short-list/rehearsal condition should be more accurate in all respects. Indeed, compared with participants in the short-list/distraction condition, participants in the short-list/rehearsal condition not only falsely recalled the key word less often, but also correctly recalled list words more often. In addition, similar to the pattern for falsely recalling the key word, false recall of other non-presented words was lowest in the short-list/rehearsal condition.

Results from Experiment 1 are consistent with other research showing that increasing the distinctive features of presented words reduces false memories in the DRM paradigm (Hicks & Marsh, 1999; Israel & Schacter, 1997; McDermott, 1996). In our next experiment we go on to test the idea that false recall of key words can also be reduced in a very different way that does not rely on distinctive characteristics of presented words.

³ An anonymous reviewer expressed concern about the confound between recall time and list length in Experiment 1. The reviewer suggested that participants given less time to recall will be less likely to falsely recall the key word, and that this could account for our results. The difference in false recall between the two short-list conditions (which had the same amount of recall time) together with the lack of difference between the short- and long-list distraction conditions (which had different amounts of recall time) is inconsistent with this alternate interpretation.

EXPERIMENT 2

The gist of each list is the prominent higher-order structure in the typical DRM situation and encourages false recall of the key word. However, a different higher-order organisation of the list, one that does not subsume the key word, may reduce the likelihood of falsely recalling the key word. The party-planning example from the introduction is consistent with this claim; Experiment 2 empirically tests it. Here, words from DRM lists were presented as clues to a secret word that the experimenter had in mind. Each participant’s task was to guess this one secret word that was related to all the clues but not actually presented. (In fact, of course, the secret word was the key word.) A plausible strategy with such a task is to use the first few list words to determine a candidate guess for the secret word and then attend to the remaining list words to make sure the candidate is not among them. A participant using this strategy can be very certain that his or her candidate guess (usually, but not always, the key word) has not occurred on the list of clues.
Even though some attention to the gist of the list is necessary to establish a candidate word in the first place, that word is remembered as having a particular status incompatible with appearance on the list itself. This strategy should effectively prevent false recall of the word guessed as the secret word, regardless of list length. The participant does not have to remember all the list words; most of them need only be checked against the candidate word being held in mind. In contrast, veridical recall should be affected by list length just as in the rehearse conditions of Experiment 1. To be sure, participants may sometimes hit on words other than the key words as candidates for the secret words. On such trials, the candidate words should not be falsely recalled, but false recalls of the key word should be just as likely as in the rehearse conditions of Experiment 1. Finally, due to the strategy we expect participants to adopt, guessing the key word as the secret word should selectively eliminate false recall of the key word; it should not influence the rate at which other non-presented words are falsely recalled.

Method

Participants. A total of 37 Cornell undergraduates, 20 males and 17 females, were given extra course credit for participating in the experiment. Of these, 18 were randomly assigned to the short-list/game condition and 19 to the long-list/game condition.

Materials. The same lists were used as in Experiment 1. Participants in the long-list/game condition heard the lists that had been used in the long-list/rehearsal and long-list/distraction conditions. Participants in the short-list/game condition heard the same counterbalanced sets of lists as in the short-list/rehearsal and short-list/distraction conditions.

Procedure. Groups of 1 to 6 participants were tested together. All participants were given 18 lists. The overall procedure was the same as in Experiment 1.
Participants heard a list; after a 30 s interval, the experimenter gave a signal to recall the list; then, after either 40 s (short-list condition) or a maximum of 70 s (long-list condition) for response, the next list began. The important difference between Experiments 1 and 2 is in the instructions given to participants. Experiment 2 was explained as a game in which the experimenter would be thinking of a secret word and the participants were each to try to figure out what that word was from a list of clues. The experimenter (LKL, who had also recorded the lists for Experiment 1) read the lists aloud at a rate of approximately 1.5 s per word. After presentation of the list there were 30 s for participants to decide on their guesses for the secret word. Then, when the experimenter said “answer”, participants recorded their guesses for the secret word and also as many of the clues as they could remember. Participants worked independently and were given the standard warning against guessing used in Experiment 1. Booklets of 18 response sheets were provided, each marked with a blank for the secret word (“What’s my word?____”) and a space beneath to write the clues. The front page of the booklet showed an example of a completed answer sheet: “bread” was filled in as the secret word and the words from Roediger and McDermott’s (1995) bread list (not used in the main experiment here) were filled in as the clues. The experimenter explained that a clue could be related to the secret word in any number of ways: as an opposite, an exemplar, a descriptor, or just a word that often occurs with the secret word. Before the experimental trials began, a practice trial was given; it was based on the Roediger-McDermott list for sweet, which was not used in the main experiment.

Results

Trials on which a participant guessed correctly (i.e., chose the key word as the secret word) will be called C-trials.
Out of the 18 total trials, the mean number of C-trials per participant in the long-list/game condition is 14 (SD = .63); in the short-list/game condition the mean number is 10 (SD = .78). Trials on which a participant failed to guess the key word will be called X-trials (i.e., for an individual participant the number of X-trials = 18 − the number of C-trials). (One participant guessed the key word correctly on every trial. Thus, this participant has no X-trials and is excluded from all analyses involving X-trials.) Most of the X-trials (52% in long-list/game, 82% in short-list/game) occurred when participants guessed plausible related words that were not on the list. Others occurred when participants, despite the instructions, guessed a clue word from the list, or simply failed to respond. For each participant, recall performance was calculated separately for C-trials and for X-trials. A participant’s C-trial rate of falsely recalling the key word is the proportion of his or her C-trials on which he or she falsely recalled the key word. A participant’s C-trial veridical recall rate is the proportion of words from his or her C-trial lists that he or she correctly recalled. Individual scores for X-trials were calculated in the same manner.

False recall of key words. Table 2 shows that participants in both conditions almost never falsely recalled the key word on C-trials (long-list/game: M = .01, SD = .05; short-list/game: M = .005, SD = .02). (Of the 37 participants, 34 did not falsely recall any of the key words; 2 participants did so a total of three times in the long-list condition, and 1 did so once in the short-list condition.)
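The C-trial and X-trial scoring rules described above can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the authors' scoring procedure: the `Trial` record, the function names, and the two example trials are hypothetical, and the example words are not data from the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    key_word: str          # non-presented key word of the list
    list_words: List[str]  # clues actually presented
    guess: Optional[str]   # secret-word guess (None if no response)
    recalled: List[str]    # words the participant wrote down as clues


def score_trials(trials):
    """Return (key-word false recall rate, veridical recall rate), or None if no trials."""
    if not trials:
        return None  # e.g. a participant with no X-trials has no X-trial score
    # Proportion of trials on which the key word was written down as a clue.
    key_false = sum(t.key_word in t.recalled for t in trials)
    # Proportion of all presented words on these trials that were recalled.
    presented = sum(len(t.list_words) for t in trials)
    correct = sum(len(set(t.recalled) & set(t.list_words)) for t in trials)
    return key_false / len(trials), correct / presented


def score_participant(trials):
    """Split trials by guessing outcome and score each subset separately."""
    c_trials = [t for t in trials if t.guess == t.key_word]
    x_trials = [t for t in trials if t.guess != t.key_word]
    return {"C": score_trials(c_trials), "X": score_trials(x_trials)}


# Hypothetical two-trial participant (illustrative words only).
example = [
    Trial("sleep", ["bed", "rest", "awake", "tired"], "sleep", ["bed", "rest"]),
    Trial("chair", ["table", "sit", "legs", "seat"], "stool",
          ["table", "chair", "sit", "legs"]),
]
scores = score_participant(example)
print(scores)
```

In this sketch a trial counts as an X-trial whenever the guess is anything other than the key word, including a missing response, which matches the definition of X-trials as the complement of C-trials.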
On X-trials, however, participants falsely recalled the key word more frequently: on an average of .41 (SD = .29) of the X-trials in the long-list/game condition and on an average of .15 (SD = .12) in the short-list/game condition.

TABLE 2
Experiment 2

                                  C-trials                  X-trials
List length                 Longᵃ       Shortᵇ        Longᵃ       Shortᵇ
Word type
  Presentedᶜ                .58 (.06)   .81 (.14)     .45 (.13)   .75 (.11)
  Key non-presentedᵈ        .01 (.05)   .005 (.02)    .41 (.29)   .15 (.12)
  Other non-presentedᵈ      .31 (.29)   .08 (.10)     .32 (.55)   .16 (.13)

Mean rates of recall for presented and non-presented words when key word guessed (C-trials) and key word not guessed (X-trials) in the two conditions of Experiment 2. n = 18 or 19 in each condition.
ᵃ Lists were 15 words long.
ᵇ Lists were 6, 7, and 8 words long.
ᶜ Proportions are out of the total number of words presented per trial type.
ᵈ Proportions are out of the total number of lists per trial type.

A 2 (list length: short, long) × 2 (trial type: C-trial, X-trial) ANOVA with repeated measures on the second variable showed that both main effects on rate of falsely recalling the key word are highly significant, Fs(1, 34) > 13.25, ps < .001. These main effects are qualified by a significant interaction, F(1, 34) = 10.71, p < .002, reflecting the fact that list length did not significantly affect the rate of falsely recalling the key word on C-trials, t(35) = .8, but list length did have an effect on X-trials, t(35) = 3.5, p < .001. On C-trials, long- and short-list/game participants falsely recalled the key word on a far smaller proportion of lists than did participants in the comparable rehearse conditions of Experiment 1 (long conditions: .01 vs .38; short conditions: .005 vs .12). A 2 (list length: short, long) × 2 (activity: game, rehearse) ANOVA showed that both main effects and the interaction are significant, Fs(1, 68) > 24.94, ps < .001.
As predicted, however, the corresponding comparisons for X-trials produced a very different result: similar rates of falsely recalling the key word in game and rehearse (long conditions: .41 vs .38; short conditions: .15 vs .12). Indeed, a 2 (list length: short, long) × 2 (activity: game, rehearse) ANOVA showed only a significant effect of length, F(1, 68) = 34.3, p < .001, in this case.

False recall of other non-presented words. The mean number of other non-presented words besides the key words that were falsely recalled per list on C- and on X-trials is also shown in Table 2. A 2 (list length: short, long) × 2 (trial type: C-trial, X-trial) ANOVA with repeated measures on the second factor showed only a significant effect of list length, F(1, 34) = 5.68, p < .023. In contrast to the comparable analysis of false recall of the key word, the main effect of trial type on false recall of other non-presented words is not significant, nor is the interaction effect, Fs(1, 34) < .43. This result is consistent with the prediction that the effect of guessing the key word as the secret word would be different for false recall of the key word than for false recall of other non-presented words.⁴ The rates of falsely recalling other non-presented words in the C-trials of the game and in the rehearse conditions of Experiment 1 were submitted to a 2 (list length: short, long) × 2 (activity: game, rehearse) ANOVA. The only significant effect was a main effect of list length, F(1, 69) = 26.32, p < .001; the corresponding analysis using the game X-trials also showed only a main effect of list length, F(1, 69) = 7.27, p < .009.
Thus, for both C- and X-trials of Experiment 2, the rates of falsely recalling other non-presented words at each list length are comparable to the rates observed in the corresponding rehearsal conditions of Experiment 1. This contrasts with the pattern of rates of falsely recalling the key word: only the X-trials of Experiment 2 show rates comparable to those observed at each list length in Experiment 1; C-trials do not.

⁴ This contrast between the patterns of false recall observed for key words and for other non-presented words is reflected in the marginally significant three-way interaction obtained from a 2 (list length: short, long) × 2 (trial type: C-trial, X-trial) × 2 (non-presented word type: key, other) ANOVA with repeated measures on the second two factors, F(1, 34) = 3.64, p < .06. The two-way interaction between non-presented word type and trial type was significant, F(1, 34) = 7.00, p < .01. The only other significant effect was a main effect of non-presented word type, F(1, 34) = 24.47, p < .001.

Veridical recall. Table 2 shows that participants correctly recalled a larger proportion of the list words on C-trials than on X-trials. A 2 (list length: short, long) × 2 (guessing performance: C-trials, X-trials) repeated measures ANOVA showed that the main effect of guessing performance on veridical recall is significant, F(1, 34) = 26.5, p < .001. The main effect of list length is also significant, F(1, 34) = 67.2, p < .001: participants in short-list/game recalled a larger proportion of list words than did participants in long-list/game on C- and on X-trials. There is no significant interaction. A 2 (list length: short, long) × 2 (activity: game, rehearse) ANOVA showed that, on the C-trials in the game conditions and in the rehearse conditions of Experiment 1, list length significantly affected the proportion of list words correctly recalled, F(1, 69) = 185.1, p < .001; activity had no significant effect.
The interaction is not significant. A similar analysis for X-trials showed main effects for both list length, F(1, 68) = 197.6, p < .001, and activity, F(1, 68) = 32.9, p < .001, with no significant interaction: veridical recall was better on short lists and in the rehearse condition. Thus, as predicted, the effect of list length on veridical recall does not differ between Experiments 1 and 2, regardless of whether C- or X-trials of Experiment 2 are considered.

Discussion

Experiment 2 showed that presenting DRM lists in a context emphasising a higher-order structure that excludes the key word can virtually eliminate false recall of the key word, even with long lists. When participants proposed the key word as the experimenter’s secret word they almost never falsely recalled the key word, regardless of whether the DRM list was long or short (i.e., regardless of the associative strength of the list). In contrast, when participants did not propose the key word as the secret word, they were no better at avoiding falsely recalling the key word than were participants in the comparable rehearse conditions of Experiment 1, where the higher-order gist structure included the key word. Considering C- and X-trials from both conditions of Experiment 2 together strengthens the claim that the game reduced false recall of the key word due to an effect of higher-order structure on memory strategies. Across both conditions on all trials, the words that participants proposed as the secret words were almost never listed among the clues. Even on those X-trials where participants falsely recalled the key word, they did know that they had not heard their own candidate words. According to our reasoning, the game condition eliminated false recall of the key word on C-trials because the higher-order structure excluded that word in particular.
If this were the case, then veridical recall rates on C-trials in the long- and short-list/game conditions should not be different from those observed in the comparable rehearse conditions of Experiment 1; indeed they were not. Analyses of false recall of non-presented words other than the key word are also consistent with our reasoning. Unlike rates of falsely recalling the key word, rates of falsely recalling other non-presented words were not affected by whether or not the participant guessed the key word as the secret word. Moreover, on both C- and X-trials, the pattern of falsely recalling other non-presented words was no different from the rates observed in the comparable rehearsal conditions of Experiment 1. The effect of our game instructions could be described as a warning to participants about the special structure of the lists. Previous experiments have used explicit warnings and, in general, have found little effect on false reports of the key word. Comparing Experiment 2 with these earlier warning studies suggests that what matters is not the warning itself, but the kinds of strategies that warning makes available. Our game task not only warned participants about the structure of the DRM lists, but also suggested a specific strategy for thinking about them: to pick one word and make sure it is not on the list. This strategy enabled participants to avoid the key word on trials in which they had guessed the key word as the secret word. In contrast, the design of most previous warning experiments (Gallo et al., 1997; Lampinen et al., 1997; Neuschatz & Payne, 1996) limited the strategic value of the warning: long study lists composed of up to 10 DRM lists were presented back to back. The presentation of so many list words apparently forces participants to focus on gists, thus making it difficult to keep key words separate from list words. Gallo et al.
(1997) did find an effect of warning in this case, but it was far from an elimination. Neuschatz and Payne (1996) found no effect of their warning, perhaps in part because the directions did not specify that there was only one key word per category. Neuschatz and Payne also suggested participants pay special attention to the words they did hear; in contrast, our instructions focused attention on words not presented. Finally, Lampinen et al.’s (1997) warning presentation has very little strategic value. Not only did participants hear multiple DRM lists back to back, but the warning was not given until after these lists were presented. Once participants have focused on the gist at list presentation, they have little hope of distinguishing key words from the words actually presented. The only tests of an explicit warning in a single-list format are McDermott and Roediger’s (1998) Experiments 2 and 3, which did not eliminate false recognition of the key word. Although McDermott and Roediger used a recognition measure, it seems likely that the same sort of organisational strategy was made available by their warning as by our game task. In our interpretation, such warnings should be only as effective in eliminating false recall of the key word as participants are in guessing what the key word is. In our experiment participants did not always correctly figure out what the key word was— apparently, neither did McDermott and Roediger’s participants. In any case, the guessing task used in Experiment 2 is not the only higher-order task structure that can be used to reduce false recall of the key word with DRM lists. Another such task structure was explored in Experiments 3a and 3b. EXPERIMENT 3A The gist of each list in Experiment 2 provided key information about the secret word. On trials in which participants took full advantage of that information and proposed the key word as the secret word, there were virtually no false reports that the key word had been on the list. 
This finding contrasts with the high rates of falsely recalling the key word in most other DRM studies. In our view, these results reflect the particular strategy that was adopted by our participants in response to the guessing-game structure. The game in Experiment 2 provided a higher-order structure for the DRM lists that excluded the key words. However, the game instructions also explicitly communicated to participants the special construction of the DRM lists: all the presented words are related to one key non-presented word. In this sense, the game instructions could be interpreted as a warning about being misled by the gist. The purpose of Experiment 3a, in which the gist structure of the lists was never mentioned at any time, was to show that changes of higher-order structure can reduce the rate of falsely recalling the key words even without such a warning. The DRM lists of Experiment 3a were again embedded in a task that provided a higher-order structure for the list that competed with the gist organisation. This time, however, that task had nothing to do with the semantic structure of the lists: it only concerned the first letters of individual words. Before presenting each list, the experimenter announced a focus letter. (In fact, this was always the first letter of the key word for that list.) Words that began with the focus letter were called focus words. The participants’ tasks were (a) to count and report the number of focus words on the list, (b) to remember those focus words in the order in which they were presented, and (c) to remember as many of the other list words as possible. Paying special attention to the focus words during encoding should keep them in the forefront of working memory; loosely speaking, working memory would “contain” all the focus words a participant has heard in the current list.
The availability of these words in working memory would make a particular inference possible at retrieval if the key word should come to mind and be considered as a possible list word. Noticing that the key word began with the focus letter, the participant could check whether the key word was among the focus words still being held in working memory. As it would not be, he or she could conclude (correctly) that the key word had not appeared on the list. The stimulus material for Experiment 3a consisted of DRM lists that were 9, 10, and 11 words long. There were one, two, three, or four focus words per list. Participants in the key-focus condition were given the special instructions described earlier; those in the control condition were given standard DRM instructions. We predicted that key-focus participants would produce substantially fewer false reports of the key word than would control participants.

Method

Participants. A total of 36 Cornell undergraduates, 10 males and 26 females, were given extra course credit for participating in the experiment. Of these, 18 were randomly assigned to the key-focus condition and 18 to the control condition.

Materials. Fifteen of Roediger and McDermott’s (1995) lists include at least one word that starts with the same letter as the key word; some lists have two, three, or four such words. Ten of these lists, representing a variety of initial focus letters, were modified for use in Experiment 3a (number of focus words in parentheses): slow (4), mountain (1), black (3), sweet (3), doctor (1), chair (2), bread (1), sleep (3), river (1), man (2). The lists were 9, 10, or 11 words long (list length was varied for similar reasons as in Experiment 1). These were not necessarily the first 9/10/11 words of the corresponding Roediger-McDermott list; it was sometimes necessary to replace earlier words with later ones to achieve adequate variation in the number and placement of focus words.
The actual lists used are shown in Appendix A.

Procedure. In both conditions, groups of 1 to 6 participants were tested together. The lists, which LKL had recorded in the order mentioned earlier at a rate of approximately 1.5 s per word, were presented by tape recorder. Each list was followed by a 30 s blank interval, after which the experimenter gave a signal to recall the list. Participants had 40 s to do so before the next list began. In the key-focus condition, the experimenter explained that for each list participants were to count how many words began with the focus letter, to remember those words in the order they were presented, and then to remember as many of the other words as possible. Appropriate response sheets showing the focus letter for each list were provided; these sheets had specific areas for each list in which to write the number of focus words, the focus words themselves, and the rest of the words from the list. Participants were given the standard warning against guessing used in the preceding experiments. The experimenter reminded participants of the appropriate focus letter before each list began. A practice trial, using words unrelated to each other and unrelated to any of the experimental lists, was given before the main experiment. In the control condition, the experimenter explained that this was an experiment on memory, that participants would be asked to remember some lists of words, and that they should do their best. The response booklets simply provided spaces in which to write down the list words. Participants were given the same warning against guessing and the same practice list as in the key-focus condition.

Results

False recall of key words. Table 3 shows that the proportion of lists on which participants falsely recalled the key word was much lower in the key-focus (M = .09, SD = .08) than in the control condition (M = .22, SD = .15). This difference is highly significant (U = 79, p < .01), as shown by a Mann-Whitney U-test.
(This test was used due to unequal variances in the two groups: the majority of key-focus participants never or only once falsely recalled the key word, whereas most of the control participants did so from two to five times.) Across participants in the key-focus condition, there were only 16 falsely recalled key words. Of these, 10 were reported in the focus-word section of the response sheets; in all 10 cases, the participants’ reports of the number of focus words indicated that they had counted the key word among them. The remaining six false recalls of the key word appeared among the “other” words on the response sheets. On five occasions, key-focus participants listed actual focus words among the “other” words.

TABLE 3
Experiments 3a & b

                              Word type
Condition           Presentedᵃ    Key non-presentedᵇ
Experiment 3a
  Control           .73 (.07)     .22 (.15)
  Key-focus         .62 (.09)     .09 (.08)
Experiment 3b
  Control           .71 (.10)     .20 (.19)
  Key-focus         .65 (.08)     .07 (.14)
  Other-focus       .65 (.07)     .18 (.15)

Mean rates of recall for presented and non-presented words in each condition of Experiments 3a and 3b. n = 18 or 19 in each condition. Values enclosed in parentheses represent standard deviations.
ᵃ Lists were 9, 10, and 11 words long. Proportions are out of 90 words total.
ᵇ Proportions are out of 9 key words total.

Veridical recall. In Experiments 1 and 2, the conditions with the lowest rates of falsely recalling the key word also had the highest rates of veridical recall. This was not the case in Experiment 3a, however. Table 3 shows that the key-focus condition had significantly fewer veridical recalls than the control condition, Ms = .62 and .73, SDs = .09 and .07, respectively; t(34) = 4.45, p < .001.

Discussion

As predicted, introduction of the focus-letter task substantially reduced the rate of falsely recalling the key word.
Nevertheless, that rate did not go to zero: across all participants, there were 16 false recalls of the key word in the 180 trials of the key-focus condition. The fact that most of these falsely recalled key words were reported as focus words suggests that our initial analysis of the strategies available in this condition may have been incomplete. The strategy we described (checking a key word that comes to mind during recall against the focus words held in working memory) will prevent false reports of the key word only if the focus words in working memory are the ones that actually appeared on the list. If the key word had already come to mind during list presentation it might have been stored in working memory along with the real focus words, and later reported. Just this seems to have happened on 10 occasions in the key-focus condition. That the majority of false recalls of the key word (10/16) followed this pattern suggests that false recall of key words in the DRM paradigm may often be produced at encoding. The fact that our key-focus instructions reduced veridical recall of the list words (as well as false recall of the key word) probably reflects a simple interference effect. The additional task of counting and keeping track of the focus words, required in this more difficult condition, has much in common with the main task of remembering the list words. Such a conflict would be expected to produce a certain amount of interference, and it apparently did so. But, whatever its cause, this reduction in veridical recall introduces a competing explanation for the drop in the rate of falsely recalling the key word. Is it possible that both recall rates went down as a simple result of increased task difficulty? Experiment 3b was conducted as a direct empirical test.

EXPERIMENT 3B

We believe that the key-focus task in Experiment 3a reduced false recall of the key word because the task made certain strategies involving the focus letter available to participants.
However, the decrease in veridical recall suggests another possibility. Perhaps the difficulty of the key-focus task reduced false recall of the key word simply by causing participants to report fewer words overall. To test this hypothesis, we devised a task that taxed working memory to the same extent as the key-focus task of Experiment 3a but did not confer the same strategic benefits. This was a focus-letter task in which the focus letters were not the first letters of the key words. For Experiment 3b we modified the focus lists from Experiment 3a such that each list contained two alternative sets of focus words: one set that began with the first letter of the key word and one set that began with a different letter. Accordingly, Experiment 3b had two focus conditions: participants were given either the key-word letters (key-focus) or the other letters (other-focus) as focus letters. Participants in a third (control) condition did not engage in the focus task at all but were simply asked to listen to the lists during presentation, as were control participants in Experiment 3a. We expected that the focus task would have adverse effects on veridical recall (compared to the control task), regardless of whether the focus letter was the first letter of the key word or not. Nevertheless, the control and other-focus conditions should produce comparable rates of falsely recalling the key words, and both should be substantially higher than the rate in the key-focus condition.

Method

Participants. A total of 55 Cornell undergraduates, 24 males and 31 females, were given extra course credit for participating in the experiment. They were randomly assigned to one of three conditions: 18 to control and other-focus, 19 to key-focus.

Materials.
The lists used in Experiment 3a were modified so that each list contained two sets of an equal number of focus words: the words in one set began with the first letter of the key word and the words in the other set began with a different letter. This arrangement could only be achieved in 9 of the 10 lists used in Experiment 3a, so participants in Experiment 3b were given only 9 lists, which are shown in Appendix B.

Procedure. In all conditions, groups of 1 to 8 participants were tested together. In the control condition, participants were given the same type of answer booklets and directions as were control participants in Experiment 3a. In the key-focus and other-focus conditions, participants were given the same type of answer booklets and directions as were the focus participants in Experiment 3a. The only difference between the key- and other-focus conditions was in the focus letters given to participants. As in Experiment 3a, lists were presented on a tape recorder at a rate of 1.5 s per word; there was a blank 30 s interval after each list, followed by a 40 s interval in which participants recalled words from that list.

Results

False recall of key words. Table 3 shows that the mean proportion of the total nine lists on which participants falsely recalled the key word was much lower in the key-focus condition (M = .07, SD = .14) than in the other two conditions, where the rates were similar (other-focus: M = .18, SD = .15; control: M = .20, SD = .19). A Kruskal-Wallis One-Way ANOVA showed a significant effect of condition on false recall of the key word, χ²(2) = 9.06, p < .01. A Mann-Whitney U-test revealed no significant difference between the rates of falsely recalling the key word in the control and other-focus conditions, U = 161.5, ns.
However, the planned contrast between the rate of falsely recalling the key word in the key-focus condition and that in the other two conditions combined showed that the difference was highly significant, U = 179, p < .01. As in Experiment 3a, non-parametric tests were used here due to unequal variances. The number of key words falsely recalled by participants in the control and other-focus conditions ranged from 0 to 6. In contrast, 12 of the 19 participants in the key-focus condition never falsely recalled the key word; 5 of them falsely recalled the key word only once. (It should be noted that the score of one unusual key-focus participant, z = 3.6, greatly affected the mean proportion of lists on which the key word was falsely recalled in the key-focus condition. When this participant is removed, the mean rate of falsely recalling the key word in this condition drops to .04, SD = .07.) Across all key-focus participants, there were only 12 falsely recalled key words. Nine of these were reported in the focus-word section of the response sheets; in all nine cases, the participants’ reports of the number of focus words indicated that the key word had been counted among them. The remaining three false recalls of the key word appeared among the “other” words on the response sheets. On one occasion in the key-focus condition and two occasions in the other-focus group, participants listed actual focus words among the “other” words.

Veridical recall. Table 3 shows that the mean proportions of list words veridically recalled were identical in the key-focus and other-focus conditions (Ms = .65, SDs = .08 and .07, respectively), and less than in the control condition (M = .71, SD = .10). A one-way ANOVA shows a marginally significant effect of condition on veridical recall, F(2, 54) = 2.83, p = .07. The planned contrast between the rate of veridical recall in both focus conditions together and that in the control condition was significant, t(35) = 2.39, p < .05.
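The rank-based comparisons of false recall reported above rest on the Mann-Whitney U statistic. As a minimal sketch of how that statistic is defined (not the authors' analysis code), U can be computed by direct pairwise comparison. The function name and the per-participant counts below are hypothetical and chosen only for illustration; they are not data from these experiments.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) by pairwise comparison, counting ties as 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0   # a "wins" this pair
            elif a == b:
                u_a += 0.5   # tied pair is split between the two groups
    # The two statistics always sum to the number of pairs.
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Hypothetical numbers of falsely recalled key words per participant.
key_focus = [0, 0, 1, 0, 2, 1]
control = [2, 3, 1, 4, 5, 2]
u_key, u_control = mann_whitney_u(key_focus, control)
print(min(u_key, u_control))  # the smaller U is the value usually reported
```

The sketch stops at the statistic itself; judging significance would additionally require reference tables or a normal approximation, which are not shown here.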
Discussion

If the (key-) focus task reduces false recall of the key word simply by increasing task difficulty, then the reduction in false recall of key words should be independent of the particular letters of focus. The results of Experiment 3b contradict this hypothesis. False recall of key words was reduced only when participants focused on the first letters of the key words, not when participants focused on different letters. In contrast, veridical recall was reduced no matter what the focus letters were. We conclude that the reduced incidence of falsely recalling the key word observed in the key-focus conditions of Experiments 3a and 3b appeared because participants took advantage of the opportunity for strategic avoidance of the key word that the key-focus structure provides.⁵

⁵ One question that remains in both Experiments 3a and 3b is how the number of focus words per list affected recall performance. It might be predicted that the more focus words, the harder it is to keep all of them in verbatim memory and the more likely participants would be to believe the key word was presented. However, an analysis of our data would not provide an adequate test of this hypothesis. As the lists in our experiments vary not only by number of focus words but also by length and the particular key word around which the list is constructed, any analysis of the effect of number of focus letters would be misleading.

GENERAL DISCUSSION

We have reported three DRM experimental designs in which false recall of key words was sharply reduced. In our view, these reductions occurred because the structure of the experimental situations encouraged participants to use strategies other than simply depending on the gist of the list. In the short-list/rehearsal condition of Experiment 1, participants relied more on individual words retained in working memory than on overall gist.
Thus, the rate of falsely recalling the key word in this condition was uniquely low and the rate of veridical recall uniquely high. Participants who played the guessing-game in Experiment 2 knew that the ‘‘secret words’’ they figured out were not on the lists. Thus, regardless of list length, false recall of the key word was essentially eliminated on trials where participants guessed the secret words correctly. In Experiments 3a and 3b false recall of the key word rarely occurred when the instructions required participants to organise the DRM list in memory according to the first letter of the key word; Experiment 3b showed that this effect was due to the strategy such an organisation makes possible. Keeping track of all list words that began with the key letter allowed participants to be sure that the key word was not on the list; keeping track of all list words that began with some other letter did not allow such a strategy. Previous research has shown that increasing the salience of characteristics of individual list items reduces false recall of the key word; we have linked this effect to a more general phenomenon regarding the influence of situational structure on memory strategy in the DRM paradigm. Just as making individual items salient reduces the reliance on gist (Experiment 1), so does making an alternate higher-order structure salient (Experiments 2 and 3a/b); both manipulations result in substantial reductions in false recall of the key word. Our point is not simply that the rate of falsely recalling the key word was much reduced in these experiments (it never quite reached zero), but that participants’ memories of the lists appeared to have been greatly affected by the contexts of the tasks. From this perspective, the few intrusions participants did make are instructive. In most cases, these intrusions reflect the strategies appropriate to the contexts in which the lists were encountered. 
The short-list/rehearsal condition of Experiment 1 focused attention on individual words, but nothing about the organisation of the task signalled that the key word should not be among those words. Thus, if the key word came to mind it might well be accepted as a list word and (falsely) recalled. In Experiment 2, false recall of the key word only occurred on trials where the participant had not guessed the key word as the secret word. Participants in Experiments 3a and 3b who falsely recalled the key word usually listed it as one of the focus words, which is where it should be if the list were organised according to the focus letter. These patterns of intrusions show that even when the key word may be activated in memory, participants do not accept such traces of activation blindly but rather interpret them within the structure that the situation provides.

Others have also pointed out that the gist of DRM lists plays a role in producing false memories of the key word. Schacter and colleagues (Israel & Schacter, 1997; Schacter, Israel, & Racine, 1999; Schacter et al., 1996) argue that presentation of numerous related items highlights the gist and, when participants do not retain the distinctive details of words that were actually presented, false recall is likely. This reasoning led to the prediction that study conditions that encourage encoding of distinctive features of presented words should reduce false recall. Indeed, this appears to be the case (e.g., Israel & Schacter, 1997; Schacter et al., 1999). Mather et al. (1997) make a similar argument, from a source-monitoring perspective. They propose that in the DRM paradigm the dimension of semantic similarity (which does not differentiate between presented words and the key word) is so salient that it overrides other dimensions that would differentiate between presented words and the key word.
In the DRM, focusing on the salient (yet non-diagnostic) semantic characteristics of memories causes key words that were internally generated to be misattributed to the list read by the experimenter. Finally, a third explanation involving the role of gist is that put forth by Payne et al. (1996). They apply the principles of fuzzy trace theory (Reyna & Brainerd, 1995), which proposes that people encode, in parallel, a verbatim and a gist representation of events as they occur. The process of establishing the gist is called ‘‘gist extraction’’ and this is what allows people to pick up on patterns of stimuli within an event. Memory can be based on either verbatim or gist representations; Payne et al. propose that false memories of the key word in the DRM are based on gist representations. Two of their experiments (Experiments 2 and 3) showed that repeated testing increased false recall, a result they attributed to increased opportunities for gist extraction, and thereby stronger gist representations on which false memories of the key word were based.

All of these theories would seem to suggest the design of our Experiment 1, and also to make the same predictions as we did. Allowing for verbatim rehearsal of all presented words would increase the distinctive features of presented words, would encourage the use of perceptual detail (rather than semantic characteristics) for source monitoring, and would eliminate the use of the extracted gist as a basis for responding. However, our focus on the role that the structure of the situation plays in determining memory strategies led us to the design and predictions of Experiments 2 and 3, which do not follow directly from the other approaches. The reduction in falsely recalling the key word in our Experiment 2 was achieved by introducing a higher-order structure for the list that put constraints on the relevance of gist to the presented words.
All of the words on the list were related to the gist, but so was one word that was not on the list. In this case, false recall of the key word was reduced not by paying closer attention to the presented words, but by knowledge of how the gist related to the structure of the game. Indeed, the game may even have enhanced the gist-extraction process, which was necessary to figure out the secret word. This did not increase false recall (as Payne et al.’s interpretation might predict); rather, the game worked against false recall because its structure differentiated this element of the gist from the presented words. In Experiment 3 we expected that grouping the list according to the first letter of the key word would alert participants to the absence of the key word. In the key-focus condition participants would be paying close attention to a feature of presented words (the initial letter) that was the same as a feature of the non-presented key word. Neither the rationale proposed by Schacter and colleagues nor Mather et al.’s focus on memory characteristics would readily predict the success of this design, as these theories emphasise the salience of differences between the presented and non-presented words for reducing false memories of the key word. It is noteworthy that false recall was not reduced in the other-focus condition of Experiment 3b, when participants focused on organising the list around a letter that did distinguish presented words from the key word.

To make these contrasts with other theories regarding the role of gist is not to argue that they are invalid, or that they are necessarily incompatible with our own. However, our focus on the role of higher-order structural knowledge illuminates new ways of reducing false recall of key words in the DRM. In so doing, our approach adds to an understanding of how gist is related to DRM false memories.
Memory always occurs in the context of a particular activity, and that context affects how people go about the task, as well as what they actually remember. This principle applies in list-learning experiments, even when particular associations or traces are strongly activated, as they surely are in the DRM paradigm. We do not doubt that such activation occurs; our point is that there is no automatic link from the activation of a trace to a person’s belief that this activation is evidence of a real past occurrence. In this regard, the perspective we offer on the DRM paradigm is consistent with other recent research that has focused on the role of situational and metacognitive knowledge in memory. The source-monitoring framework (Johnson, Hashtroudi, & Lindsay, 1993; Johnson & Raye, 1981) proposes that both memory characteristics and more general knowledge about the way the world works figure into reality-monitoring decisions (e.g., ‘‘That memory of a money tree in my back yard is extremely vivid, but it must have been a dream because I know that money does not grow on trees’’, Johnson & Raye, 1981, p. 72). However, Bayen and colleagues (Bayen, Nakamura, Dupuis, & Yang, 2000) point out that there is relatively little empirical evidence for the latter factor, as most source-monitoring experiments arbitrarily assign items to sources. For example, items may be presented by two different speakers, yet these speakers are not identified by any social roles, characteristics, or opinions that are relevant to the words they present. In this case, participants must rely solely on memory characteristics in a later source-monitoring test because there is no higher-order information to use. Several recent studies (Bayen et al., 2000; Mather, Johnson, & DeLeonardis, 1999; Sherman & Bessenoff, 1999) have introduced meaningful relationships between sources and presented items and found that participants used these relationships during subsequent source-monitoring tasks.
In addition, other studies suggest that people use their beliefs about how their memories should work, given particular encoding conditions, to interpret recollective experiences they have when their memory is later tested (Bink, Marsh, & Hicks, 1999; Forster & Strack, 1998). The conclusion from these studies is consistent with our results from Experiment 2 (e.g., ‘‘Because I figured out the secret word, I know it was not presented’’) and the key-focus conditions of Experiments 3a and 3b (e.g., ‘‘I know that ‘sleep’ was not presented because if it were, I would have been rehearsing it as one of the focus words’’).

Demonstrations of the effect of higher-order knowledge on memory in verbal learning situations fit with observations of autobiographical remembering. For example, people are more likely to accept a suggested false memory from childhood when this event fits within the framework of relevant self-knowledge than when it does not (Hyman & Billings, 1998; Hyman, Husband, & Billings, 1995). In this connection, our results and discussion illustrate an important point. One of the most intriguing questions about the DRM paradigm is whether the false memories that it produces so readily have anything in common with the more personal false memories that can often occur in clinical settings. People’s notions about what really occurred in the past depend not only on what associations may have been activated in their minds but also on the contexts in which they find themselves and the goals they are trying to achieve. We reduced false recall with DRM lists by introducing more complex tasks and contexts. But things do not always work this way: in other settings (e.g., psychotherapy), changing the context of a memory task may increase false recall rather than reduce it (Engel, 1999).
Generally speaking, the accuracy of a given autobiographical memory is determined by the likelihood that the higher-order reconstructive tools used on that occasion will lead one to recreate events as they actually occurred (Bahrick, Hall, & Berger, 1996; Ross, 1989). Apparently this principle holds for the DRM paradigm as well.

Manuscript received 9 November 1999
Manuscript accepted 10 October 2000

REFERENCES

Bahrick, H.P., Hall, L.K., & Berger, S.A. (1996). Accuracy and distortion in memory for high school grades. Psychological Science, 7, 265–271.
Bayen, U.J., Nakamura, G.V., Dupuis, S.E., & Yang, C. (2000). The use of schematic knowledge about sources in source monitoring. Memory & Cognition, 28, 480–500.
Bink, M.L., Marsh, R.L., & Hicks, J.L. (1999). An alternative conceptualization to memory ‘‘strength’’ in reality monitoring. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 804–809.
Brainerd, C.J., & Reyna, V.F. (1998). When things that were never experienced are easier to ‘‘remember’’ than things that were. Psychological Science, 9, 484–493.
Deese, J. (1959). On the prediction of occurrence of particular verbal intrusions in immediate recall. Journal of Experimental Psychology, 58, 17–22.
Engel, S. (1999). Context is everything: The nature of memory. New York: W.H. Freeman & Company.
Forster, J., & Strack, F. (1998). Subjective theories about encoding may influence recognition: Judgmental regulation in human memory. Social Cognition, 16, 78–92.
Gallo, D.A., Roberts, M.J., & Seamon, J.G. (1997). Remembering words not presented in lists: Can we avoid creating false memories? Psychonomic Bulletin & Review, 4, 271–276.
Hicks, J.L., & Marsh, R.L. (1999). Attempts to reduce the incidence of false recall with source monitoring. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 1195–1209.
Hyman, I.E., & Billings, J. (1998). Individual differences and the creation of false childhood memories. Memory, 6, 1–20.
Hyman, I.E., Husband, T.H., & Billings, J.F. (1995). False memories of childhood experiences. Applied Cognitive Psychology, 9, 181–197.
Israel, L., & Schacter, D.L. (1997). Pictorial encoding reduces false recognition of semantic associates. Psychonomic Bulletin & Review, 4, 577–581.
Johnson, M.K., Hashtroudi, S., & Lindsay, D.S. (1993). Source monitoring. Psychological Bulletin, 114, 3–28.
Johnson, M.K., & Raye, C.L. (1981). Reality monitoring. Psychological Review, 88, 67–85.
Kensinger, E.A., & Schacter, D.L. (1999). When true memories suppress false memories: Effects of ageing. Cognitive Neuropsychology, 16, 399–415.
Lampinen, J.M., Neuschatz, J.S., & Payne, D.G. (1997). Source attributions and false memories: A test of the demand characteristics account. Psychonomic Bulletin & Review, 6, 130–135.
Mather, M., Henkel, L.A., & Johnson, M.K. (1997). Evaluating characteristics of false memories: Remember/know judgments and memory characteristics questionnaire compared. Memory & Cognition, 25, 826–837.
Mather, M., Johnson, M.K., & DeLeonardis, D.M. (1999). Stereotype reliance in source-monitoring: Age differences and neuropsychological test correlates. Cognitive Neuropsychology, 16, 437–458.
McDermott, K.B. (1996). The persistence of false memories in list recall. Journal of Memory and Language, 35, 212–230.
McDermott, K.B., & Roediger, H.L. (1998). Attempting to avoid illusory memories: Robust false recognition of associates persists under conditions of explicit warnings and immediate testing. Journal of Memory and Language, 39, 508–520.
Melo, B., Winocur, G., & Moscovitch, M. (1999). False recall and false recognition: An examination of the effects of selective and combined lesions to the medial temporal lobe/diencephalon and frontal lobe structures. Cognitive Neuropsychology, 16, 343–359.
Neuschatz, J.S., & Payne, D.G. (1996). The influence of warnings and encoding instructions on the magnitude of the false memory effect. Paper presented at the Eastern Psychological Association, Philadelphia, PA, USA.
Norman, K.A., & Schacter, D.L. (1997). False recognition in young and older adults: Exploring the characteristics of illusory memories. Memory & Cognition, 25, 838–848.
Payne, D.G., Elie, C.J., Blackwell, J.M., & Neuschatz, J.S. (1996). Memory illusions: Recalling, recognizing, and recollecting events that never occurred. Journal of Memory and Language, 35, 261–285.
Reyna, V.F., & Brainerd, C.J. (1995). Fuzzy trace theory: An interim synthesis. Learning and Individual Differences, 7, 1–75.
Robinson, K.J., & Roediger, H.L. (1997). Associative processes in false recall and false recognition. Psychological Science, 8, 231–237.
Roediger, H.L., & McDermott, K.B. (1995). Creating false memories: Remembering words not presented in lists. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 803–814.
Ross, M. (1989). Relation of implicit theories to the construction of personal histories. Psychological Review, 96, 341–357.
Schacter, D.L., Israel, L., & Racine, C. (1999). Suppressing false recognition in younger and older adults: The distinctiveness heuristic. Journal of Memory and Language, 40, 1–24.
Schacter, D.L., Verfaellie, M., & Pradere, D. (1996). The neuropsychology of memory illusions: False recall and recognition in amnesic patients. Journal of Memory and Language, 35, 319–334.
Sherman, J.W., & Bessenoff, G.R. (1999). Stereotypes as source-monitoring cues: On the interaction between episodic and semantic memory. Psychological Science, 10, 106–110.
Tussing, A.A., & Greene, R.L. (1997). False recognition of associates: How robust is the effect? Psychonomic Bulletin & Review, 4, 572–576.
APPENDIX A

The ten lists used in Experiment 3a

Slow: fast lethargic stop listless snail speed cautious delay sluggish traffic
Mountain: hill valley molehill climb summit top peak plain glacier
Black: white dark cat charred night blue funeral color bottom grief brown
Sweet: sour candy sugar bitter good taste soda tooth nice
Doctor: nurse sick lawyer medicine health hospital dentist physician ill patient office
Chair: table sit legs seat couch desk recliner cushion sofa wood
Bread: butter food eat sandwich rye jam milk flour jelly
Sleep: bed snooze rest awake tired slumber snore dream wake blanket
River: water stream lake Mississippi boat tide swim flow run barge creek
Man: woman husband uncle lady mouse male father strong friend beard

Key-focus words are in italics.

APPENDIX B

The nine lists used in Experiment 3b

Slow (L): fast LETHARGIC stop LISTLESS snail cautious delay traffic turtle hesitant
Mountain (G): hill valley molehill climb summit top peak plain GLACIER
Black (C): white dark CAT CHARRED night blue funeral COLOR bottom grief brown
Sweet (T): sour candy sugar bitter good TASTE soda TOOTH nice TART
Chair (S): table SIT legs SEAT couch desk recliner cushion wood
Bread (R): butter food eat sandwich RYE jam milk flour jelly
Sleep (D): bed snooze rest DREAM awake tired DOZE slumber snore DROWSY wake
River (B): water stream lake Mississippi BOAT tide swim flow run creek
Doctor (M): nurse sick lawyer MEDICINE health hospital dentist physician ill patient office

Letters in parentheses are other-focus letters. Key-focus words are in italics. Other-focus words are in capitals.
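The two-set structure of the Experiment 3b lists can be illustrated with a short sketch using the Sleep list from Appendix B. This is an illustration only, not part of the original materials: the function name focus_words is hypothetical, and key-focus words are identified here by their initial letter (the first letter of the key word), following the description of the lists in the Method.

```python
def focus_words(words, letter):
    """Return the list words that begin with the given focus letter."""
    return [w for w in words if w.lower().startswith(letter.lower())]


key_word = "sleep"  # never presented to participants
sleep_list = ["bed", "snooze", "rest", "dream", "awake", "tired",
              "doze", "slumber", "snore", "drowsy", "wake"]

# Key-focus set: words sharing the first letter of the key word ("s").
key_focus = focus_words(sleep_list, key_word[0])
# Other-focus set: words beginning with the other-focus letter ("d").
other_focus = focus_words(sleep_list, "d")

print(key_focus)    # ['snooze', 'slumber', 'snore']
print(other_focus)  # ['dream', 'doze', 'drowsy']

# The key word would belong to the key-focus set had it been presented,
# so tracking that set is enough to notice that it is absent.
print(key_word in key_focus)  # False
```

The two sets are of equal size, as the design requires. Only the key-focus set bears on the key word: a participant tracking the "d" words learns nothing about whether "sleep" was presented.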