Long-term Memory for 400 Pictures on a CommonTheme Stine Vogta* and Svein Magnussen,a,b a Department of Psychology, University of Oslo, Norway b Centre for Advanced Study, The Norwegian Academy of Science and Letters, Norway Word count: main text 2472, abstract 134 *Corresponding author: Stine Vogt, Department of Psychology, University of Oslo, Box 1094 Blindern, 0317 Oslo, Norway. E-mail: [email protected] 2 Abstract – Long-term memory for large numbers of colour photographs with a common motif - doors - was studied using pictures with two levels of informative cues: original photographs, and edited pictures in which extraneous information on details such as vegetation, paint scratches, sings and lamp posts, was removed. In the study phase, subjects viewed 400 pictures, and were subsequently tested for memory on 2AFC discriminations between studied and distracter pictures, at retention intervals between 0.5 hour and one week. When tested with the non-edited original photographs, immediate memory performance was close to 85% correct, when pictorial details were removed, memory performance dropped by 20 percent points. The decay functions were shallow with parallel paths for the categories of pictures. It is concluded that specific details of visual scenes contributes to long-term memory of those scenes. Keywords: Visual long-term memory – complex scenes 3 Introduction Classical studies of pictorial memory demonstrate that human subjects have an astonishing ability for remembering previously viewed objects and scenes (Nickerson, 1965; Shepard, 1967; Standing, Conezio & Haber, 1970, Standing 1973). Shepard (1967) found that subjects were 98% correct at immediate testing of 68 familiar-unfamiliar pairs following learning trials of 612 pictures. Standing (1973) presented participants with 10.000 different photographs for 5 seconds each, and found that in a subsequent two alternative forced choice test of recognition, mean percentage of correct responses was 83%, which suggests the retention of approximately 6,600 pictures. In the same experiment, memory testing of 400 study pictures after en interval of 2 days showed a memory decline of 11,4 percent points. Such results might indicate that long-term visual representations of natural scenes are complete, detailed and verdical (Neisser, 1967), a view currently referred to as the composite scene representation hypothesis (Hollingworth & Henderson, 2002; Castelhano & Henderson, 2005). However, one liability in these early experiments was that the distracters were typically selected to be very different from studied images, and this makes it difficult to identify the types of information supporting the memory performance. Already in 1974, Goldstein and Chance argued that the high performance levels in pictorial memory do not necessarily imply that the entire image is stored in long-term memory. They pointed out that the extreme visual variety in the pictures employed in many of the early studies indicated that subjects had a wealth of different cues available by which to differentiate familiar from unfamiliar 4 pictures, and that in a two alternative forced choice paradigm, subjects need merely remember a single feature or aspect of a complex image – gist or layout - to distinguish a target from a distracter picture. Gist refers to an immediate assessment of overall contextual information comprised of low-level features that fill the scene, object relations and spatial layout as well as the potential presence of objects (Wolfe, 1998). Tatler, Gilchrist and Rusted (2003) describe gist as the general meaning of scenes, for example whether it is a scene of an office or a kitchen, and is independent on explicit knowledge of what the scenes contain. More recently, the composite scene representation hypothesis has been challenged by the results of a large body of research on change blindness, showing that people frequently fail to detect large changes in a visual scene when online perception is briefly interrupted, both in laboratory and in natural contexts (Simons, 2000; Rensink, 2002). The alternative account, termed localist-minimalist theories by Castelhano and Henderson (2005), assume that visual scene representation is limited to the object that is currently in the focus of attention and that non-attended objects quickly fade from memory and that long-term visual scene memory depends upon the more global aspects of the scenes. Rensink (2000, 2002) has proposed a theory of visual scene perception in terms of three systems: a low-level vision system that analyzes the basic features of visual stimuli, a second system of focused attention that provides information about particular attended objects, and a third system that takes care of scene schema, providing information about layout and gist; it is the latter system that carries long-term memories of natural visual scenes. According to this view, then, the robust long-term memory of naturalistic pictures found in the classical experiments (Nickerson, 1965; 5 Shepard, 1967; Standing, Conezio & Haber, 1970; Standing, 1973) is principally the result of gist memory for pictures displaying a wide variety of motifs. However, recent experiments suggest that visual details may preserved in scene memory, and are perhaps more important in long-term memory that assumed by the localist-minimalist view. Simons, Chabris, Schnur and Levin (2002) found that scene changes that were not detected in a naturalistic change blindness task, nevertheless were remembered when participants were questioned about the critical details. Hollingworth and Henderson (2002) found that local changes in natural scenes were accurately detected after attention was withdrawn from the objects after intervals up to 30 min, and Hollingworth (2005) found very little decline in performance for memory intervals up to 24 hours. Castelhano and Henderson (2005) compared memory for visual details in natural scenes with participants performing an intentional memorization task or performing a visual search task in which participants were unaware that memory was to be tested at a later stage, and found comparable memory for visual detail in the two conditions. These results indicate that long-term memory for visual detail in a scene is quite robust, and is a natural product of visual scene perception. The fact that visual details of scenes are preserved in long-term memory when specifically tested does not necessarily imply that such details contribute to the long-term pictorial memory tested in two-alternative forced choice tasks; pictorial long-term memory in general and the impressive performance evinced in the classical studies might well rely on gist memory. The present experiment was designed to provide some information on this question, testing long-term memory for a large number (n = 400) of 6 briefly viewed colour photographs introducing two modifications of the classical studies: First, the pictures were confined to a single motif – photographs of front doors - thus limiting the probability of coding in terms of gist and layout. Second, the photographs were studied and tested in one of two versions, as originals or in a cue-stripped version where some minor, hardly noticeable details were removed – e.g. lamp posts, flower pots, paint scratches and name tags - on the assumption that if such “irrelevant” details assisted pictorial long-term memory, their removal should impair memory performance. Method Participants Twenty-four participants between the ages of 23 and 30 with normal or corrected to normal vision and no reported colour blindness were recruited from the student body at the University of Oslo. They were given a small fee for their efforts. There were 12 subjects in each condition, and no participant took part in more than one condition. Stimuli Stimuli were presented on an 21’ Eizo Flexscan T 960 colour monitor with 1152 x 864 screen resolution and 32 bit colour for all conditions. The software Superlab, designed for experimental purposes by Cedrus was used to run the trials, controlling the experiment and compiling the participants’ response type and times. Subjects viewed the monitor binocularly at a distance of 75 cm at which the picture stimuli, all of the same height, 258 7 mm (725 pixels), and between 71 mm (401 pixels) and 198 mm (559 pixels) wide, subtended the visual angles of 20º vertically, and from 6º to 16º horizontally. The pictures were of 800 different doors taken in the Oslo city area and surrounding towns. They were prepared for presentation in such a way as to comprise a decrease of meaningful cues in two conditions: original, in which the pictures were presented as they were taken; edited, in which the pictures were edited using Adobe Photoshop 5 so that all features unrelated to doors as such were removed, thus curtailing the availability of the range of meaningful cues in different object categories present in the first condition (e.g. parts of vehicles in front of the doors, signs, vegetation etc.) while retaining the figurative nature of the pictures. In most pictures these modifications were barely noticeable in a brief glance, but required careful comparisons between originals and edited pictures to be detected (see examples in Fig. 1) Thus, the variety in the pictures was contained in very few categories, obstructing verbal or semantic encoding, particularly in the last condition. None of the pictures affords many depth cues. Thus, there is variety in terms of perceptual input, and very little in terms of object categories. The available semantic cues were thus limited to the names of colours and architectural styles, and perhaps to estimates of the socioeconomic status of the occupants behind the doors.1 1 The original experiment included a third condition where a separate group of participants were tested with a blocked version of the stimuli, pixellated by a factor of 40, thus removing all detailed information leaving colour and coarse structure. However, the performance under this condition was at chance and it is not included. 8 Fig. 1. Examples of pictures used in the present study, showing original photographs of doors (top row) and edited photographs (bottom row). Note that it is not immediately obvious which cues have been removed in the edited pictures. Procedure Subjects were randomly assigned to the two conditions, originals and edited. In the study phase, subjects were shown a series of 400 pictures for the original and edited picture types. Each picture was presented for 5 seconds with intervals of 600 msec. This session lasted for approximately ¾ hr, including time for (verbal) instruction. Subjects were requested to concentrate on the pictures in order to remember as many as possible. Memory was tested at four different time intervals, 0.5 hours, 2 hours, 2 days and 7 days. For each test session, subjects were shown 100 pictures from the study sample together 9 with 100 novel pictures, and decided by two-alternative forced choice which picture had been presented in the study phase. Pictures were presented in pairs, and the subject signalled her/his choice by pressing the appropriate key on the computer keyboard. The response to one stimulus initiated presentation of the next. Pictures were presented in random order (generated by the software) during both study and test sessions, and completely different samples of both study and distractor pictures were shown at the various test sessions. Test sessions lasted approximately 15 minutes. Results Accuracy Figure 2 shows percent correct recognition for the two types of pictures as a function of retention interval on a logarithmic scale, with corresponding curves individually fitted to each data set. Firstly, the results that the level of memory performance in the 2AFC task is strongly dependent upon picture type. When memory was tested 30 min after the study session, the average score for original pictures was 83.7 % correct, dropping to 63.6 % correct recognition for edited pictures. Thus, stripping the pictures of minor, apparently “insignificant”, elements had a large detrimental effect on memory. Secondly, the stimulus manipulation affected the process of encoding pictures into memory, but not the subsequent memory decay, as the individual curves follow a parallel course. Confirming the results of previous studies, the long-term memory for naturalistic pictures is very good. However, a comparison with the results of Standing (1973), representing the 10 Fig. 2. Memory for original ( ) and edited ( ) pictures at different time delays. Each point is the mean of 12 subjects, each making 100 2AFC decisions at each interval. Single point shows results from Standing (1973), representing memory for ‘normal’ ( ) pictures in a study set of 400 pictures. memory for “normal” pictures of great variability in pictorial motif with a study set of 400 pictures (represented by filled circle in Fig. 2), shows that the level of memory performance in the present experiment is 15 percent point below. The trends were supported by statistical analyses, showing a main effect of picture type: [F (1.20) = 26.69, p<.000], and retention interval:[F (1.20) = 12.23, p=.002], but no interaction between the variables. No difference was found in the slopes of the two conditions. 11 Fig. 3 Response times for original ( ) and edited ( ) pictures at different time delays. Each point is the mean of 12 subjects, each making 100 forced-choice 2AFC decisions at each interval. Response time The instructions to the participants did not stress speed of response, leaving the participants free to work at their own pace. However, response times were recorded and Fig. 3 plots mean response times for corrects decisions, with response times longer than 7000 msec removed as outliers. No effect of picture type was observed, but response times were observed to decrease as a function of the length of the memory interval: [F (1.1) = 10.14, p < .005]. A plot of accuracy versus response times for original and edited pictures (Fig. 4) indicates that picture type and memory interval act independently on reaction times, as they do on accuracy. 12 Fig. 4 A plot of correct responses versus response times for original ( ) and edited ( ) pictures Discussion The results of the present paper confirm previous studies that pictorial memory is very good and that information is maintained in long-term memory with little loss of information across intervals of one week. When fairly limited amounts of information were removed from the pictures - details that may appear hardly noticeable – the memory performance dropped dramatically, by 20 percent points. This suggests that details of the visual scene are important in forming long-term memories, in line with the results of Simons et al. (2002), Hollingworth (2005; Hollingworth & Henderson, 2002) and Castelhano and Henderson (2005). However, comparing the present results for naturalistic pictures, all representing a single category of objects – doors – with the 13 results of Standing (1973), whose study sample included a wide range of scenes and memory was tested with test and distracter pictures that were very different, indicates that gist and layout memory is also factor in pictorial long-term memory. Figure 3 shows that response times decrease as a function of the memory interval, and Fig. 4 shows that response times are correlated with accuracy but not in speedaccuracy trade-off, quite to the contrary response times are positively correlated with accuracy, forming parallel functions for the two experimental conditions, favoring the least memorable edited pictures. Both findings appear counterintuitive and at odds with other findings. For example, several studies of short-term visual memory reviewed by Magnussen (2000) have found that decision times in a two-interval forced choice task increased as the retention interval was prolonged, and were uncorrelated with memory accuracy. However, Jiang, Song and Rigas (2005) trained participants on a visual search task, detecting a target ‘T’ among distractor ‘L’s’ for five consecutive days and tested the memory for the displays one week later. Using reaction times as a measure of recognition for old displays, they found a very high capacity for spatial context learning for close to 2000 displays, with accuracy measures of 97% or higher in all five sessions, As in the present study, they observed a significant decrease of reaction times with time. Why should it take a longer time to match an online stimulus to at set of memory representations established 30 minutes ago than to match the stimulus to representations there were established two weeks ago? One possible answer is that older representations are better consolidated and therefore more easily accessible than newly formed representations. The evidence suggests that memory consolidation is a process that may 14 persist over weeks and even years (Wixted, 2004). A second factor that would account for the effect of picture type is that it may take a shorter time to search memory with a smaller set of retained memory representations than with a larger set, if the system uses something like an exhaustive serial search process (Sternberg, 1966). Thus, as the forgetting process proceeds or the informational content of the stimuli is reduced, the available set of memory representations decreases, and the time spent in memory research is reduced. The fewer cues and the longer the time interval, the fewer items need to be assessed. The results of the present experiment further indicate that the stimulus factors affecting pictorial memory operate on the encoding into long-term memory, and that the memory process follows a similar course independent of the level of memory performance. The rate of decay is quite slow: Extrapolating from the average slopes predicts an additional decay of 11.5% after 50 years for original and edited pictures. Thus the memory for the original pictures would still be well above chance after 50 years when tested with the 2AFC method. The results of the present paper thus confirm the large capacity of long-term visual memory for natural scenes shown by the classical studies of pictorial memory (Nickerson, 1965, Shepard, 1967, Standing, Conezio & Haber, 1970; Standing, 1973). 15 Acknowledgements 1 This study was supported by the Norwegian Research Council (MH). We thank T. Endestad and D. E. Eilertsen for assistance with statistical analyses. References Castelhano, M. S. & Henderson, J. M. (2005). Incidental visual memory for objects in scenes. Visual Cognition, 12, 1017-1040. Goldstein, A. G. & Chance, J. E. (1974). Some factors in picture recognition memory. Journal of General Psychology, 90, 69-85. Psychology: Human Perception and Performance, 25, 210-228. Hollingworth, A. & Henderson, J. M. (2002). Accurate visual memory for previously attended objects in natural scenes. Journal of Experimental Psychology: Human Perception and Performance, 28, 113 – 136. Hollingworth, A. (2005). The relationship between online visual representation of a scene and long-term scene memory. Journal of Experimental Psychology; Learning, Memory and Cognition, 31, 396-411. Jiang, Y., Song, J. H. & Rigas, A. (2005). High-capacity spatial contextual memory. Psychonomic Bulletin & Review, 12, 524-529. Magnussen, S. (2000). Low-level memory processes in vision. Trends in Neurosciences, 16 23, 247-251. Neisser, U. (1967). Cognitive Psychology. Appleton: Century Crofts. Nickerson, R. S. (1965). Short-term memory for complex meaningful visual configuration: a demonstration of capacity. Canadian Journal of Psychology, 19, 155-160. Reinvang, I., Magnussen, S. & Greenlee, M.W. (2002). Hemispheric asymmetry in visual discrimination and memory: ERP-evidence for the spatial frequency hypothesis. Experimental Brain Research, 144, 483-495. Rensink, R. A. (2000). The dynamic representation of scenes. Visual Cognition, 7, 17-42. Rensink, R. A. (2002). Change detection. Annual Review of Psychology, 53, 245-277. Shepard, R. N. (1967). Recognition memory for words, sentences and pictures. Journal of Verbal Learning and Verbal Behavior, 6, 156-163. Simons, D. J. (2000). Current approaches to change blindness. Visual Cognition, 7, 1-15. Simons, D. J., Chabris, C. F., Schnur, T. & Levin, D. T. (2002). Evidence for preserved representations in change blindness. Consciousness and Cognition, 11, 78-97. Standing, L. (1973). Learning 10.000 pictures. Quarterly Journal of Experimental Psychology, 25, 207-222.. Standing, L., Conezio, J. & Haber, R. N. (1970). Perception and memory for pictures: single trial learning of 2500 visual stimuli. Psychonomic Science, 19, 73-74. Sternberg, S. (1966). High-speed scanning in human memory. Science, 153, 652 – 654. Tatler, B. W., Gilchrist, I. D. & Rusted, J. (2003). The time course of abstract visual representation. Perception, 32, 579-592. 17 Wixted, J. T. (2004). The psychology and neuroscience of forgetting. Annual Review of Psychology, 55, 235-269. Wolfe, J. E. (1998). What do you know about what you saw? Current Biology, 8 R303-R304.
© Copyright 2026 Paperzz