Long-term Memory for 400 Pictures on a CommonTheme

Long-term Memory for 400 Pictures on a CommonTheme
Stine Vogta* and Svein Magnussen,a,b
a
Department of Psychology, University of Oslo, Norway
b
Centre for Advanced Study,
The Norwegian Academy of Science and Letters, Norway
Word count: main text 2472, abstract 134
*Corresponding author: Stine Vogt, Department of Psychology, University of Oslo, Box
1094 Blindern, 0317 Oslo, Norway. E-mail: [email protected]
2
Abstract – Long-term memory for large numbers of colour photographs with a common
motif - doors - was studied using pictures with two levels of informative cues: original
photographs, and edited pictures in which extraneous information on details such as
vegetation, paint scratches, sings and lamp posts, was removed. In the study phase,
subjects viewed 400 pictures, and were subsequently tested for memory on 2AFC
discriminations between studied and distracter pictures, at retention intervals between 0.5
hour and one week. When tested with the non-edited original photographs, immediate
memory performance was close to 85% correct, when pictorial details were removed,
memory performance dropped by 20 percent points. The decay functions were shallow
with parallel paths for the categories of pictures. It is concluded that specific details of
visual scenes contributes to long-term memory of those scenes.
Keywords: Visual long-term memory – complex scenes
3
Introduction
Classical studies of pictorial memory demonstrate that human subjects have an
astonishing ability for remembering previously viewed objects and scenes (Nickerson,
1965; Shepard, 1967; Standing, Conezio & Haber, 1970, Standing 1973). Shepard (1967)
found that subjects were 98% correct at immediate testing of 68 familiar-unfamiliar pairs
following learning trials of 612 pictures. Standing (1973) presented participants with
10.000 different photographs for 5 seconds each, and found that in a subsequent two
alternative forced choice test of recognition, mean percentage of correct responses was
83%, which suggests the retention of approximately 6,600 pictures. In the same
experiment, memory testing of 400 study pictures after en interval of 2 days showed a
memory decline of 11,4 percent points. Such results might indicate that long-term visual
representations of natural scenes are complete, detailed and verdical (Neisser, 1967), a
view currently referred to as the composite scene representation hypothesis
(Hollingworth & Henderson, 2002; Castelhano & Henderson, 2005). However, one
liability in these early experiments was that the distracters were typically selected to be
very different from studied images, and this makes it difficult to identify the types of
information supporting the memory performance. Already in 1974, Goldstein and Chance
argued that the high performance levels in pictorial memory do not necessarily imply that
the entire image is stored in long-term memory. They pointed out that the extreme visual
variety in the pictures employed in many of the early studies indicated that subjects had a
wealth of different cues available by which to differentiate familiar from unfamiliar
4
pictures, and that in a two alternative forced choice paradigm, subjects need merely
remember a single feature or aspect of a complex image – gist or layout - to distinguish a
target from a distracter picture. Gist refers to an immediate assessment of overall
contextual information comprised of low-level features that fill the scene, object relations
and spatial layout as well as the potential presence of objects (Wolfe, 1998). Tatler,
Gilchrist and Rusted (2003) describe gist as the general meaning of scenes, for example
whether it is a scene of an office or a kitchen, and is independent on explicit knowledge
of what the scenes contain. More recently, the composite scene representation hypothesis
has been challenged by the results of a large body of research on change blindness,
showing that people frequently fail to detect large changes in a visual scene when online
perception is briefly interrupted, both in laboratory and in natural contexts (Simons,
2000; Rensink, 2002). The alternative account, termed localist-minimalist theories by
Castelhano and Henderson (2005), assume that visual scene representation is limited to
the object that is currently in the focus of attention and that non-attended objects quickly
fade from memory and that long-term visual scene memory depends upon the more
global aspects of the scenes. Rensink (2000, 2002) has proposed a theory of visual scene
perception in terms of three systems: a low-level vision system that analyzes the basic
features of visual stimuli, a second system of focused attention that provides information
about particular attended objects, and a third system that takes care of scene schema,
providing information about layout and gist; it is the latter system that carries long-term
memories of natural visual scenes. According to this view, then, the robust long-term
memory of naturalistic pictures found in the classical experiments (Nickerson, 1965;
5
Shepard, 1967; Standing, Conezio & Haber, 1970; Standing, 1973) is principally the
result of gist memory for pictures displaying a wide variety of motifs.
However, recent experiments suggest that visual details may preserved in scene
memory, and are perhaps more important in long-term memory that assumed by the
localist-minimalist view. Simons, Chabris, Schnur and Levin (2002) found that scene
changes that were not detected in a naturalistic change blindness task, nevertheless were
remembered when participants were questioned about the critical details. Hollingworth
and Henderson (2002) found that local changes in natural scenes were accurately detected
after attention was withdrawn from the objects after intervals up to 30 min, and
Hollingworth (2005) found very little decline in performance for memory intervals up to
24 hours. Castelhano and Henderson (2005) compared memory for visual details in
natural scenes with participants performing an intentional memorization task or
performing a visual search task in which participants were unaware that memory was to
be tested at a later stage, and found comparable memory for visual detail in the two
conditions. These results indicate that long-term memory for visual detail in a scene is
quite robust, and is a natural product of visual scene perception.
The fact that visual details of scenes are preserved in long-term memory when
specifically tested does not necessarily imply that such details contribute to the long-term
pictorial memory tested in two-alternative forced choice tasks; pictorial long-term
memory in general and the impressive performance evinced in the classical studies might
well rely on gist memory. The present experiment was designed to provide some
information on this question, testing long-term memory for a large number (n = 400) of
6
briefly viewed colour photographs introducing two modifications of the classical studies:
First, the pictures were confined to a single motif – photographs of front doors - thus
limiting the probability of coding in terms of gist and layout. Second, the photographs
were studied and tested in one of two versions, as originals or in a cue-stripped version
where some minor, hardly noticeable details were removed – e.g. lamp posts, flower pots,
paint scratches and name tags - on the assumption that if such “irrelevant” details assisted
pictorial long-term memory, their removal should impair memory performance.
Method
Participants
Twenty-four participants between the ages of 23 and 30 with normal or corrected to
normal vision and no reported colour blindness were recruited from the student body at
the University of Oslo. They were given a small fee for their efforts. There were 12
subjects in each condition, and no participant took part in more than one condition.
Stimuli
Stimuli were presented on an 21’ Eizo Flexscan T 960 colour monitor with 1152 x 864
screen resolution and 32 bit colour for all conditions. The software Superlab, designed for
experimental purposes by Cedrus was used to run the trials, controlling the experiment
and compiling the participants’ response type and times. Subjects viewed the monitor
binocularly at a distance of 75 cm at which the picture stimuli, all of the same height, 258
7
mm (725 pixels), and between 71 mm (401 pixels) and 198 mm (559 pixels) wide,
subtended the visual angles of 20º vertically, and from 6º to 16º horizontally. The pictures
were of 800 different doors taken in the Oslo city area and surrounding towns. They were
prepared for presentation in such a way as to comprise a decrease of meaningful cues in
two conditions: original, in which the pictures were presented as they were taken; edited,
in which the pictures were edited using Adobe Photoshop 5 so that all features unrelated
to doors as such were removed, thus curtailing the availability of the range of meaningful
cues in different object categories present in the first condition (e.g. parts of vehicles in
front of the doors, signs, vegetation etc.) while retaining the figurative nature of the
pictures. In most pictures these modifications were barely noticeable in a brief glance, but
required careful comparisons between originals and edited pictures to be detected (see
examples in Fig. 1) Thus, the variety in the pictures was contained in very few categories,
obstructing verbal or semantic encoding, particularly in the last condition. None of the
pictures affords many depth cues. Thus, there is variety in terms of perceptual input, and
very little in terms of object categories. The available semantic cues were thus limited to
the names of colours and architectural styles, and perhaps to estimates of the socioeconomic status of the occupants behind the doors.1
1
The original experiment included a third condition where a separate group of participants were tested with
a blocked version of the stimuli, pixellated by a factor of 40, thus removing all detailed information leaving
colour and coarse structure. However, the performance under this condition was at chance and it is not
included.
8
Fig. 1. Examples of pictures used in the present study, showing original photographs of
doors (top row) and edited photographs (bottom row). Note that it is not immediately
obvious which cues have been removed in the edited pictures.
Procedure
Subjects were randomly assigned to the two conditions, originals and edited. In the study
phase, subjects were shown a series of 400 pictures for the original and edited picture
types. Each picture was presented for 5 seconds with intervals of 600 msec. This session
lasted for approximately ¾ hr, including time for (verbal) instruction. Subjects were
requested to concentrate on the pictures in order to remember as many as possible.
Memory was tested at four different time intervals, 0.5 hours, 2 hours, 2 days and 7 days.
For each test session, subjects were shown 100 pictures from the study sample together
9
with 100 novel pictures, and decided by two-alternative forced choice which picture had
been presented in the study phase. Pictures were presented in pairs, and the subject
signalled her/his choice by pressing the appropriate key on the computer keyboard. The
response to one stimulus initiated presentation of the next. Pictures were presented in
random order (generated by the software) during both study and test sessions, and
completely different samples of both study and distractor pictures were shown at the
various test sessions. Test sessions lasted approximately 15 minutes.
Results
Accuracy
Figure 2 shows percent correct recognition for the two types of pictures as a function of
retention interval on a logarithmic scale, with corresponding curves individually fitted to
each data set. Firstly, the results that the level of memory performance in the 2AFC task
is strongly dependent upon picture type. When memory was tested 30 min after the study
session, the average score for original pictures was 83.7 % correct, dropping to 63.6 %
correct recognition for edited pictures. Thus, stripping the pictures of minor, apparently
“insignificant”, elements had a large detrimental effect on memory. Secondly, the
stimulus manipulation affected the process of encoding pictures into memory, but not the
subsequent memory decay, as the individual curves follow a parallel course. Confirming
the results of previous studies, the long-term memory for naturalistic pictures is very
good. However, a comparison with the results of Standing (1973), representing the
10
Fig. 2. Memory for original ( ) and edited ( ) pictures at different time delays. Each
point
is the mean of 12 subjects, each making 100 2AFC decisions at each interval. Single point
shows results from Standing (1973), representing memory for ‘normal’ ( ) pictures in a
study set of 400 pictures.
memory for “normal” pictures of great variability in pictorial motif with a study set of
400 pictures (represented by filled circle in Fig. 2), shows that the level of memory
performance in the present experiment is 15 percent point below. The trends were
supported by statistical analyses, showing a main effect of picture type: [F (1.20) =
26.69, p<.000], and retention interval:[F (1.20) = 12.23, p=.002], but no interaction
between the variables. No difference was found in the slopes of the two conditions.
11
Fig. 3 Response times for original ( ) and edited ( ) pictures at different time delays.
Each point is the mean of 12 subjects, each making 100 forced-choice 2AFC decisions at
each interval.
Response time
The instructions to the participants did not stress speed of response, leaving the
participants free to work at their own pace. However, response times were recorded and
Fig. 3 plots mean response times for corrects decisions, with response times longer than
7000 msec removed as outliers. No effect of picture type was observed, but response
times were observed to decrease as a function of the length of the memory interval: [F
(1.1) = 10.14, p < .005]. A plot of accuracy versus response times for original and edited
pictures (Fig. 4) indicates that picture type and memory interval act independently on
reaction times, as they do on accuracy.
12
Fig. 4 A plot of correct responses versus response times for original ( )
and edited ( ) pictures
Discussion
The results of the present paper confirm previous studies that pictorial memory is
very good and that information is maintained in long-term memory with little loss of
information across intervals of one week. When fairly limited amounts of information
were removed from the pictures - details that may appear hardly noticeable – the memory
performance dropped dramatically, by 20 percent points. This suggests that details of the
visual scene are important in forming long-term memories, in line with the results of
Simons et al. (2002), Hollingworth (2005; Hollingworth & Henderson, 2002) and
Castelhano and Henderson (2005). However, comparing the present results for
naturalistic pictures, all representing a single category of objects – doors – with the
13
results of Standing (1973), whose study sample included a wide range of scenes and
memory was tested with test and distracter pictures that were very different, indicates
that gist and layout memory is also factor in pictorial long-term memory.
Figure 3 shows that response times decrease as a function of the memory interval,
and Fig. 4 shows that response times are correlated with accuracy but not in speedaccuracy trade-off, quite to the contrary response times are positively correlated with
accuracy, forming parallel functions for the two experimental conditions, favoring the
least memorable edited pictures. Both findings appear counterintuitive and at odds with
other findings. For example, several studies of short-term visual memory reviewed by
Magnussen (2000) have found that decision times in a two-interval forced choice task
increased as the retention interval was prolonged, and were uncorrelated with memory
accuracy. However, Jiang, Song and Rigas (2005) trained participants on a visual search
task, detecting a target ‘T’ among distractor ‘L’s’ for five consecutive days and tested the
memory for the displays one week later. Using reaction times as a measure of recognition
for old displays, they found a very high capacity for spatial context learning for close to
2000 displays, with accuracy measures of 97% or higher in all five sessions, As in the
present study, they observed a significant decrease of reaction times with time.
Why should it take a longer time to match an online stimulus to at set of memory
representations established 30 minutes ago than to match the stimulus to representations
there were established two weeks ago? One possible answer is that older representations
are better consolidated and therefore more easily accessible than newly formed
representations. The evidence suggests that memory consolidation is a process that may
14
persist over weeks and even years (Wixted, 2004). A second factor that would account
for the effect of picture type is that it may take a shorter time to search memory with a
smaller set of retained memory representations than with a larger set, if the system uses
something like an exhaustive serial search process (Sternberg, 1966). Thus, as the
forgetting process proceeds or the informational content of the stimuli is reduced, the
available set of memory representations decreases, and the time spent in memory
research is reduced. The fewer cues and the longer the time interval, the fewer items need
to be assessed.
The results of the present experiment further indicate that the stimulus factors
affecting pictorial memory operate on the encoding into long-term memory, and that the
memory process follows a similar course independent of the level of memory
performance. The rate of decay is quite slow: Extrapolating from the average slopes
predicts an additional decay of 11.5% after 50 years for original and edited pictures. Thus
the memory for the original pictures would still be well above chance after 50 years when
tested with the 2AFC method. The results of the present paper thus confirm the large
capacity of long-term visual memory for natural scenes shown by the classical studies of
pictorial memory (Nickerson, 1965, Shepard, 1967, Standing, Conezio & Haber, 1970;
Standing, 1973).
15
Acknowledgements
1
This study was supported by the Norwegian Research Council (MH). We thank T.
Endestad and D. E. Eilertsen for assistance with statistical analyses.
References
Castelhano, M. S. & Henderson, J. M. (2005). Incidental visual memory for objects in
scenes. Visual Cognition, 12, 1017-1040.
Goldstein, A. G. & Chance, J. E. (1974). Some factors in picture recognition memory.
Journal of General Psychology, 90, 69-85.
Psychology: Human Perception and Performance, 25, 210-228.
Hollingworth, A. & Henderson, J. M. (2002). Accurate visual memory for previously
attended objects in natural scenes. Journal of Experimental Psychology: Human
Perception and Performance, 28, 113 – 136.
Hollingworth, A. (2005). The relationship between online visual representation of a scene
and long-term scene memory. Journal of Experimental Psychology; Learning,
Memory and Cognition, 31, 396-411.
Jiang, Y., Song, J. H. & Rigas, A. (2005). High-capacity spatial contextual memory.
Psychonomic Bulletin & Review, 12, 524-529.
Magnussen, S. (2000). Low-level memory processes in vision. Trends in Neurosciences,
16
23, 247-251.
Neisser, U. (1967). Cognitive Psychology. Appleton: Century Crofts.
Nickerson, R. S. (1965). Short-term memory for complex meaningful visual
configuration: a demonstration of capacity. Canadian Journal of Psychology, 19,
155-160.
Reinvang, I., Magnussen, S. & Greenlee, M.W. (2002). Hemispheric asymmetry in visual
discrimination and memory: ERP-evidence for the spatial frequency hypothesis.
Experimental Brain Research, 144, 483-495.
Rensink, R. A. (2000). The dynamic representation of scenes. Visual Cognition, 7, 17-42.
Rensink, R. A. (2002). Change detection. Annual Review of Psychology, 53, 245-277.
Shepard, R. N. (1967). Recognition memory for words, sentences and pictures. Journal of
Verbal Learning and Verbal Behavior, 6, 156-163.
Simons, D. J. (2000). Current approaches to change blindness. Visual Cognition, 7, 1-15.
Simons, D. J., Chabris, C. F., Schnur, T. & Levin, D. T. (2002). Evidence for preserved
representations in change blindness. Consciousness and Cognition, 11, 78-97.
Standing, L. (1973). Learning 10.000 pictures. Quarterly Journal of Experimental
Psychology, 25, 207-222..
Standing, L., Conezio, J. & Haber, R. N. (1970). Perception and memory for pictures:
single trial learning of 2500 visual stimuli. Psychonomic Science, 19, 73-74.
Sternberg, S. (1966). High-speed scanning in human memory. Science, 153, 652 – 654.
Tatler, B. W., Gilchrist, I. D. & Rusted, J. (2003). The time course of
abstract visual representation. Perception, 32, 579-592.
17
Wixted, J. T. (2004). The psychology and neuroscience of forgetting. Annual Review of
Psychology, 55, 235-269.
Wolfe, J. E. (1998). What do you know about what you saw? Current Biology,
8 R303-R304.