Visual memories are stored along a compressed timeline

bioRxiv preprint first posted online Jan. 18, 2017; doi: http://dx.doi.org/10.1101/101295. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
Visual memories are stored along a compressed timeline
Inder Singh
Center for Memory and Brain
Department of Psychological and Brain Sciences
Boston University
Aude Oliva
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Marc W. Howard
Center for Memory and Brain
Department of Psychological and Brain Sciences
Boston University
Abstract
In continuous recognition the recency effect manifests as a decrease in accuracy and a sublinear increase in response time (RT) with the lag of a
repeated stimulus. The recency effect could result from the gradual weakening of mnemonic traces. Alternatively, the recency effect could result from
a search through a compressed timeline of recent experience. These two hypotheses make very different predictions about the shape of response time
distributions. Using highly-memorable pictures to mitigate changes in accuracy enabled a detailed examination of the effect of recency on retrieval
dynamics. The recency at which pictures were repeated ranged over two
orders of magnitude across three experiments. Analysis of the RT distributions showed that the time at which memories became accessible changed
with the recency of the probe, as predicted by a serial search model suggesting that visual memories can be accessed by sequentially scanning along a
compressed representation of the past.
In recognition memory experiments participants must determine whether a probe
stimulus has been previously experienced or not. As the recency of a repeated probe decreases, the accuracy of the judgment decreases and response time (RT) increases (Donkin
& Nosofsky, 2012; Hockley, 1982; Monsell, 1978; Murdock & Anderson, 1975; Shepard &
The complete set of images can be downloaded from http://cvcl.mit.edu/MM/stimuli.html. Address
correspondence to Marc Howard, Center for Memory and Brain, Department of Psychological and Brain
Sciences, 2 Cummington Mall, Boston, MA 02215, [email protected]. Supported by NSF IIS 1631460 and
PHY 1444389 (MWH and IS) and a Google award (AO).
bioRxiv preprint first posted online Jan. 18, 2017; doi: http://dx.doi.org/10.1101/101295. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
2
SCANNING ALONG A COMPRESSED REPRESENTATION
a
b
c
.
..
g3
La
3.8
s
...
Figure 1. Two forms of memory representations that account for the recency effect.
a. Schematic of a continuous recognition experiment. Participants experience a long sequence
of pictures. Their task is to detect occasional repeated pictures. The variable “lag” measures the
difference in position between the two presentations of the repeated picture. b. Cartoon representing
a composite memory representation. Traces of more recent items (alligator) are more prominent than
the traces of less recent items (clock). c. Cartoon representing a compressed timeline. This form
of representation contains information about the order in which items were experienced. Because
the timeline is ordered it can be serially scanned starting at the present (alligator) and going back
through the past. The foreshortening of the image is intended to suggest compression.
Teghtsoonian, 1961). Despite decades of empirical and modeling work, it remains unclear
what changes in the state of memory cause the recency effect in recognition memory.
In continuous recognition, an item is presented at each time step; the participant
1
indicates whether it was previously experienced. Because previously-experienced items must
be identified from a stream of information, continuous recognition is somewhat similar to
the experience of memory in the real world. Consider the task of an individual engaged in
continuous recognition (Figure 1a). In order to correctly identify an item as old, one must
compare it to the contents of their memory. In continuous recognition, the recency effect
manifests as a sublinear increase in RT with increasing lag of the repeated probe (Hintzman,
1969; Hockley, 1982; Okada, 1971); Hockley (1982) found a logarithmic increase in RT with
increasing lag. Why does it take longer to retrieve memories from further in the past?
Many distributed memory models assume that memory is a composite store containing
a noisy record of features from all the studied items (e.g., Anderson, 1973; Murdock, 1982;
Shiffrin, Ratcliff, Murnane, & Nobel, 1993). A composite memory store can account for
the recency effect if the features of items experienced further in the past are stored with
less fidelity than items experienced more recently (Figure 1b). In contrast to the “bag of
features” of a composite memory, another class of models proposes that features are stored
along a timeline of experience (Figure 1c; G. D. A. Brown, Neath, & Chater, 2007; Murdock,
1974; Howard, Shankar, Aue, & Criss, 2015). As an analogy, the memory store behaves
like a conveyor belt that recedes into the past (Murdock, 1974). As each item is presented,
it is placed at the front of the belt; previously-stored items shift back towards the past.1
Both frameworks can accommodate the sub-linear increase in RTs with lag. A composite
1
In this study time per se and number of intervening items are confounded so we will not attempt to
disentangle them.
bioRxiv preprint first posted online Jan. 18, 2017; doi: http://dx.doi.org/10.1101/101295. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
SCANNING ALONG A COMPRESSED REPRESENTATION
3
memory can account for the sub-linear increase if the strength of the match between a
probe and the contents of memory decreases appropriately and this strength is coupled
with a model of information accumulation (e.g., S. Brown & Heathcote, 2005; Ratcliff,
1978; Usher & McClelland, 2001). Information along a timeline can be sequentially accessed
during memory search, which terminates when a match to the probe is found. Critically, if
this timeline is compressed into a logarithmic scale (G. D. A. Brown et al., 2007; Chater &
Brown, 2008; Shankar & Howard, 2013; Howard et al., 2015), then a logarithmic increase in
RT with lag naturally results from this self-terminating search. On a logarithmic scale the
difference between lag 1 and lag 2 is larger than the difference between lag 100 and lag 101.
Rather, the difference between 1 and 2 is equivalent to the difference between 100 and 200.
This property naturally results in a sublinear increase in RT for probes experienced further
in the past.
While these two models cannot be distinguished based on RT alone, they make very
different predictions regarding the shape of the RT distributions (Figure 2). Because all
of the traces are stored together in a single composite representation (Figure 2a), the time
needed to access a composite memory should be the same regardless of how far in the past
the probe was experienced. However, the strength of that match should depend on the
probe’s recency. This is analogous to changing the drift rate in a drift diffusion process
with increasing lag (Donkin & Nosofsky, 2012; Ratcliff, 1978). A composite memory representation suggests a parallel access model in which the RT distributions rise from zero at the
same time but differ systematically in the tail of the distribution as a function of recency.
If memories are aligned along a timeline very different qualitative predictions are possible.
If the timeline is accessed via a self-terminating serial scan, then the time it takes to access
the right memory should depend on the recency of the probe stimulus. Thus this kind of
sequential access model results in RT distributions that start at different times (Figure 2b).
Although the rate of information accumulation may also depend on lag in a scanning model,
a change in the time to initiate the search with lag is a distinctive prediction of a scanning
model.
To the best of our knowledge, a systematic change in the time to initiate the memory search has not been observed in continuous recognition. The main issue in continuous
recognition is that as lag increases, accuracy decreases (Hockley, 1982; Shepard & Teghtsoonian, 1961), making it more difficult to measure the effect of recency on retrieval dynamics
independently of changes in accuracy. Brady, Konkle, Alvarez, and Oliva (2008) showed
participants hundreds of memorable images in a continuous recognition task with lags varying over more than two orders of magnitude, with lags from 1 (no intervening items) to 128.
Because the pictures were highly memorable, there was little variability in accuracy, even at
very long lags. In addition the RT data are minimally affected by sequential dependencies,
which are known to affect RTs in recognition memory (Malmberg & Annis, 2012). Because
of the use of highly memorable pictures, wide range of lags tested, and the elimination of
sequential dependencies, the Brady et al. (2008) is well-suited to study the effect of recency
on RT distributions. This paper analyzes the RT data collected during the Brady et al.
(2008) task (referred to as Experiment 1) and replicated the basic finding twice. Experiment 2 was a straightforward replication; In Experiment 3 a visual mask separated the
images and studied a wider range of lags.
bioRxiv preprint first posted online Jan. 18, 2017; doi: http://dx.doi.org/10.1101/101295. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
SCANNING ALONG A COMPRESSED REPRESENTATION
4
Cumulative reaction time
Accumulated evidence
a
Time
Time
b
.
Cumulative reaction time
Accumulated evidence
..
Time
Time
Figure 2. Distinguishing the two forms of memory representation. a. A composite memory
implies that the items are accessed in parallel. The rate of information accumulation is higher for
more recent probes and the time to start accessing the memory of the probe item does not depend
on its lag. Thus a parallel access hypothesis results in RT distributions that rise at about the same
time but show systematically longer tails as the probe becomes less recent. This trend can be seen
readily in cumulative RT plots (right). b. A timeline can accommodate a self-terminating scanning
hypothesis. Under this hypothesis, more recent probes do not differ in their rate of information
accumulation, but rather the time at which information starts to accumulate. Thus a scanning
hypothesis results in RT distributions that rise at different times but maintain the same shape.
Materials and Methods
1
We analyzed the data collected as a part of the repeat detection task in the visual long
term memory experiment conducted by Brady et al. (2008) using mTurk (Experiment 1)
and we also did two in lab replications of the same task (Experiments 2 & 3). The main
difference between Experiments 2 & 3 is that we used a mask to separate the images and
added repetitions at longer longs in Experiment 3. Categorically distinct images were
obtained from a commercially available database (Hemera Photo-Objects,Vol. I and II) and
through internet searches using Google Image Search. Examples of the images are used in
Figure 1. The images (subtending approximately 7.5◦ by 7.5◦ of visual angle) were presented
one at a time at the center of the screen. Participants were required to respond to repeated
images but not required to respond “no” to new items. The sequence of presentations did
not include successive repetitions. As a result, sequential responses were only made when
bioRxiv preprint first posted online Jan. 18, 2017; doi: http://dx.doi.org/10.1101/101295. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
SCANNING ALONG A COMPRESSED REPRESENTATION
5
a repetition was followed by a false alarm or vice versa. Because the false alarm rate was
very low, this was quite rare.
Lag was defined as the difference between the position of a repetition and the previous
presentation of that stimulus; immediate repetition thus corresponds to a lag of 1. In
Experiment 1, we analyzed memory for repeated pictures that were presented within the
same block at lags from 1 to 128.2 The way the experiment was constructed, repetitions at
longer lags were somewhat less likely than repetitions at shorter lags. The average number
of observations per participant included in our main analyses ranged from 27 at lag 1 to
9 at lag 128 for Experiment 1. In Experiment 2 the average number of observations per
participant varied from 30 at lag 1 to 16 at lag 128 and in Experiment 3 the average number
of observations per participant varied from 30 at lag 1 to 12 at lag 512.
There is strong evidence suggesting that the time to access immediate repetitions is
very different from the time to access repetitions after intervening stimuli, (e.g., McElree &
Dosher, 1989). In order to ensure that the findings were not driven by immediate repetitions,
we excluded lag 1 from the analyses. Noting the sublinear effect of lag on RT, we perform
statistical analyses on the base 2 logarithm of lag. That is, for the purpose of statistical
analyses lags 2, 4, 8, 16 . . . are given values 1, 2, 3, 4 . . . . This means that when a regression
coefficient is reported for lag, it is interpretable as the change associated with doubling of
lag. For simplicity of exposition, in some cases we will not explicitly state “base 2 log of
lag” but simply refer to “lag” in describing statistical analyses.
Some of the images were repeated multiple times during the course of the experiment.
Only the first repetitions are included in the first round of analyses. Second repetitions are
analyzed later in the subsection entitled “Analysis of second repetitions”. The specific
differences between the three experiments are listed below.
Experiment 1
The accuracy data from Experiment 1 were originally reported in Brady et al. (2008),
which provides a detailed description of the methods. During this task a total of 2896
images (2,500 new and 396 repeated images) were shown to 14 participants across 10 study
blocks of approximately 20 minutes each. The images were presented for 3 s each, followed
by an 800 ms fixation cross.
Experiment 2 & 3
Experiments 2 and 3 were implemented in PyEPL (Geller, Schlefer, Sederberg, Jacobs,
& Kahana, 2007). Participants from Boston University were recruited for one session each
and were paid $15 per hour for their time. In Experiment 2, a total of 900 images (650
new and 250 repeated images) were shown to 35 participants across 2 study blocks of
approximately 20 minutes each. In Experiment 3, a total of 1360 images (1084 new and
276 repeated images) were shown to 39 participants across 2 study blocks of approximately
35 minutes each. The images were presented for 2.6 s each, followed by a 400 ms of cross
for Experiment 2. In Experiment 3 the images were separated by phase scrambled images
as masked separators instead of the cross.
2
We excluded lag 256 in Experiment 1, which was repeated fewer than 5 times per participant.
bioRxiv preprint first posted online Jan. 18, 2017; doi: http://dx.doi.org/10.1101/101295. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
SCANNING ALONG A COMPRESSED REPRESENTATION
6
Model fitting
In addition to standard distributional measures, we also characterized RT distributions using the shifted Wald distribution. The shifted Wald distribution gives the finishing
times for a drift diffusion process (Wald, 1947) with one absorbing boundary. This is appropriate for this dataset because the participants only provided “yes” responses. The density
function of the shifted Wald distribution is described by three parameters, µ, λ, and σ and
is given by:
f (t; µ, λ, σ) =
λ
{λ − µ (t − σ)}2
exp
−
2 (t − σ)
2π (t − σ)3
(1)
The parameter µ describes the rate of information accumulation, i.e., the drift rate. The
parameter λ describes the distance of the boundary from the starting point of the diffusion
process and σ describes the non-decision time before information begins to accumulate.
To determine if there was a significant effect of the time at which information accumulation begins, we considered a model where σ and µ were allowed to vary freely as a
function of lag.3 Since we are fitting three parameters per participant for each lag, we only
included the subjects who had at least four correct responses per lag. All participants in
Experiments 1 and 2 met this threshold. In Experiment 3, this cut-off was met by 32 out
of 39 participants.
Log likelihood of each response was computed using the analytic expression in Eq. 1
for each participant assuming responses are independent. The optimization function tried to
minimize the negative log likelihood of the data given the parameters varying for each model
using the Nelder-Mead algorithm. To avoid local minima, each parameter optimization was
run from multiple starting points and the parameters from one iteration were passed back
to the optimizer until the parameter values stopped changing.
Results
There was a recency effect on accuracy, but hit rate remained high at all lags
In all three experiments, accuracy decreased with increasing lag, but remained quite
high (Figure 3a). In Experiment 1, the overall hit rate was .95, compared to a false alarm
rate (incorrect detection of new images) of .01, corresponding to a d0 of about 4. Hit rate
decreased with lag; at lag 128, the hit rate was .89 corresponding to a d0 of about 3.5. In
Experiment 2, the overall hit rate was .89 and the false alarm rate was .03, corresponding
to a d0 of about 3. The hit rate varied from 0.96 (d0 = 3.57) for immediate repetitions
to .69 (d0 = 2.3) for a lag of 128. In Experiment 3 the overall hit rate was .81 and the
false alarm rate was .03, corresponding to a d0 of about 2.8. The hit rate for a lag 1 was
0.92 (d0 = 3.28) and dropped to 0.57 (d0 = 1.9) for lag 512. Overall, the base 2 logarithm
of lag was a significant predictor of the hit rate, across the three experiments (see slope of
Hit rates in Table 1). This drop in hit rate seemed to accelerate at longer lags.
3
Note that it is not sensible to allow boundary separation to vary as a function of recency—this would
require that the participant know the recency of the probe before information begins to accumulate.
bioRxiv preprint first posted online Jan. 18, 2017; doi: http://dx.doi.org/10.1101/101295. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
7
SCANNING ALONG A COMPRESSED REPRESENTATION
a
b
c
Hit rate
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0.6
●
●
0.4
0.2
Exp 1
Exp 2
Exp 3
0.0
1
2
4
8
16
Lag
64
1st Decile by Lag
1.1
●
●
●
0.8
Hit Rate
●
256
1.1
1.0
0.9
●
●
●
●
●
●
●
0.7
0.5
●
●
0.8
0.6
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
Exp 1
Exp 2
Exp 3
●
0.4
1
2
4
8
16
Lag
64
256
Response Time (s)
●
●
Response Time (s)
1.0
Median RT by Lag
1.0
0.9
●
●
0.8
0.7
●
●
●
●
●
●
●
●
●
●
●
4
8
16
●
●
●
●
●
0.6
0.5
●
●
●
●
Exp 1
Exp 2
Exp 3
●
●
0.4
1
2
●
●
64
256
Lag
Figure 3. a. Hit Rate as a function of lag on log2 paper for Experiments 1, 2, and 3. The hit
rate goes down with lag. There appears to be a more pronounced drop in the hit rate at higher
lags. b. Median response time as a function of lag on log2 paper. Median response time increased
approximately linearly with the logarithm of lag. c. The 1st decile as a function of lag on log2
paper. Error bars in all figures represent the 95% confidence interval of the mean across participants
normalized using the method described in Morey (2008).
Non-parametric measures of RT distributions showed that more recent memories were accessed more quickly
The median RT increased as a function of (base 2 logarithm of) lag in all three
experiments (Figure 3b). Allowing for independent intercepts for each participant using
linear mixed effects model, we found that lag was a significant predictor of the median
response time across Experiment 1, 2 & 3 (Table 1).
A serial self-terminating model predicts RT distributions that change systematically
with lag in terms of their starting positions (Figure 2b). This implies that the difference
due to lag should be observable at the earliest parts of the distribution. Examination of the
cumulative RT distributions (Figure 4) appeared to show a systematic change in the starting
position of the distributions as a function of lag across all three experiments. In order to
quantify this visual impression, we analyzed the 1st decile of the response time distribution.
Across the three experiments, the slope of the 1st decile was significant. Notably, the slope
of the 1st decile was comparable over the three experiments, with overlapping confidence
intervals in Experiments 1 & 3. The value of the slope indicates that each doubling of lag
resulted in a shift of about 20 ms in the distribution.
Model-based analyses of RT distributions showed that more recent memories were accessed
more quickly
The model-based analyses of RT distributions confirmed the impression from nonparametric statistics that the RT distributions shifted with increasing lag. Figure 5 summarizes the results for the shift and drift parameters of the shifted Wald distribution;
statistics are shown in Table 1. To summarize the results, the shift parameter showed a
consistent effect of lag across all three experiments. The intercept for the shift parameter,
about 400 ms, was similar for the three experiments despite large changes in median RT. All
three experiments showed reliable slopes demonstrating that the shift parameter σ changed
bioRxiv preprint first posted online Jan. 18, 2017; doi: http://dx.doi.org/10.1101/101295. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
8
SCANNING ALONG A COMPRESSED REPRESENTATION
a
Experiment 2
0.8
0.6
0.4
0.2
0.0
0.3
0.6
0.9
1.2
Experiment 3
1.0
Cumulative probability
Cumulative probability
Cumulative probability
Experiment 1
1.0
0.8
0.6
0.4
0.2
0.0
1.5
0.3
Response Time (s)
0.6
0.9
1.2
1.0
0.8
0.6
0.4
0.2
0.0
1.5
0.3
0.6
Response Time (s)
0.9
1.2
Response Time (s)
b
Figure 4. Time to access memory changed systematically with lag in all three experiments. a. Unsmoothed across-participant cumulative response distributions for each lag. Shorter
lags are darker shades. Note that the cumulative distributions shift with decreasing recency. The
three panels are Experiments 1-3 (left to right). Compare to Figure 2 (right). b. Inset to make the
shift more readily apparent.
a
b
Shift parameter
Boundary/Drift rate parameter
0.7
0.7
0.6
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
seconds
seconds
●
0.4
0.6
●
●
0.5
0.3
0.2
Exp 1
Exp 2
Exp 3
0.1
0.0
2
4
8
16
Lag
64
256
●
●
●
●
0.5
●
●
●
0.4
●
●
●
0.3
●
●
●
●
●
●
●
●
●
●
●
●
0.2
Exp 1
Exp 2
Exp 3
0.1
0.0
2
4
8
16
64
256
Lag
Figure 5. The time to access memory, estimated by the shift parameter of the Wald
distribution, changed systematically with lag and was consistent across the three experiments. a. The shift parameter of the Wald distribution as a function of lag on log2 paper for
the three experiments. b. The drift parameter of the Wald distribution as a function of lag across
the three experiments.
1.5
bioRxiv preprint first posted online Jan. 18, 2017; doi: http://dx.doi.org/10.1101/101295. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
SCANNING ALONG A COMPRESSED REPRESENTATION
Hit Rate
Intercept
Slope
Median
Intercept
Slope
1st decile
Intercept
Slope
Shift
Intercept
Slope
Drift
Intercept
Slope
Shifttill lag 32
Intercept
Slope
Drifttill lag 32
Intercept
Slope
Experiment 1
1.02 ± 0.01
t(83) = 82.12∗∗
−0.014 ± 0.002
t(83) = −6.46∗∗
0.93 ± 0.04
t(83) = 22.22∗∗
0.013 ± 0.003
t(83) = 4.54∗∗
0.73 ± 0.03
t(83) = 22.05∗∗
0.023 ± 0.002
t(83) = 9.29∗∗
0.37 ± 0.06
t(83) = 6.72∗∗
0.022 ± 0.003
t(83) = 6.66∗∗
0.57 ± 0.07
t(83) = 7.91∗∗
−0.003 ± 0.005
t(83) = −0.72, ns
0.36 ± 0.06
t(55) = 6.57∗∗
0.024 ± 0.005
t(55) = 5.15∗∗
0.59 ± 0.07
t(55) = 8.23∗∗
−0.011 ± 0.006
t(55) = −1.94
Experiment 2
0.99 ± 0.02
t(209) = 48.64∗∗
−0.032 ± 0.003
t(209) = −11.46∗∗
0.58 ± 0.01
t(209) = 43.87∗∗
0.021 ± 0.002
t(209) = 13.27∗∗
0.493 ± 0.008
t(209) = 59.28∗∗
0.017 ± 0.001
t(209) = 15.21∗∗
0.394 ± 0.009
t(209) = 42.21∗∗
0.015 ± 0.001
t(209) = 13.28∗∗
0.23 ± 0.02
t(209) = 14.74∗∗
0.011 ± 0.002
t(209) = 4.46∗∗
0.401 ± 0.009
t(139) = 44.2∗∗
0.012 ± 0.002
t(139) = 7.79∗∗
0.24 ± 0.02
t(139) = 14.89∗∗
0.004 ± 0.003
t(139) = 1.37
9
Experiment 3
0.99 ± 0.02
t(255) = 43.89∗∗
−0.045 ± 0.002
t(255) = −18.62∗∗
0.598 ± 0.02
t(255) = 30.61∗∗
0.032 ± 0.002
t(255) = 17.52∗∗
0.52 ± 0.01
t(255) = 37.43∗∗
0.022 ± 0.001
t(255) = 17.65∗∗
0.41 ± 0.01
t(255) = 33∗∗
0.019 ± 0.002
t(255) = 12.67∗∗
0.25 ± 0.02
t(255) = 11.41∗∗
0.014 ± 0.002
t(255) = 6.23∗∗
0.43 ± 0.01
t(127) = 40.35∗∗
0.011 ± 0.002
t(127) = 5.14∗∗
0.28 ± 0.02
t(127) = 12.16∗∗
0 ± 0.004
t(127) = 0.06
Table 1: The slopes and intercepts of the empirical and model fits across the three experiments.
Statistical tests were done on the base 2 logarithm of lag; regression coefficients with respect to lag
are understandable as the rate per doubling of lag. Effects significant at p < 0.001 are indicated by
∗∗
.
bioRxiv preprint first posted online Jan. 18, 2017; doi: http://dx.doi.org/10.1101/101295. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
SCANNING ALONG A COMPRESSED REPRESENTATION
10
systematically as a function of lag. The regression coefficients were similar for the three
experiments (the confidence intervals for Experiments 1 and 3 overlapped); each doubling
of lag shifted the time to access memory by about 15-20 ms.
To assess the effect of lag on µ in similar units, we took the λ divided by µ4 as a
dependent measure. In contrast the drift parameter µ did not show a consistent trend across
experiments (see Figure 5).
Qualitatively, it seems that the drift parameter did not change reliably at relatively
short lags. In order to assess this, noting that lag 32 was the point at which hit rate began
to fall off more abruptly (Fig. 3a), we recomputed statistics for shift and drift for lags 2-32.
The results are in Table 1. Over this range of lags, the shift parameter still showed a reliable
linear trend, whereas the drift parameter did not. Moreover, the confidence intervals for
the regression coefficients did not overlap in any of the three experiments, demonstrating
that for lags from 2 to 32 there was a differential effect of recency on the shift parameter
vs the drift parameter. Over this range of lags, corresponding to delays of up to about a
minute and a half, the effect of recency was solely carried by the shift parameter.
We find converging evidence in favor of a systematic change in the non-decision time
with lag from the model based analyses. The shift parameter has a significant slope across
the three experiments. Thus the evidence from non-parametric analyses and model-based
analyses strongly support the hypothesis that the time necessary to access visual memories
changes with recency.
Analysis of second repetitions argued against a non-scanning account of the results
Analysis of the first repetitions showed evidence that recency affects RT distributions
via a shift in the distribution, consistent with a change in the time to access memory
as predicted by a self-terminating search model. While a shift in the RT distribution is
consistent with a serial self-terminating search during the memory comparison phase, this
is not the only possible explanation. Presumably, the probe must be encoded before it
can be compared to memory. The shift in the RT distributions is also consistent with the
hypothesis that encoding of recently-experienced probes is facilitated but that there is no
effect of recency on the memory-comparison stage per se (Sternberg, 1969).
If the recency of a repeated item allows it to be processed faster as a probe, then
repeating the item again should have an additional effect on RT. Thus far we have examined
RTs to the first repetition (second presentation) of the probe stimulus. In order to evaluate
the hypothesis that the recency effect was attributable to processing fluency, we examined
RTs to the second repetition (third overall presentation). If changes in RTs were being
driven by greater facility of processing of recently-presented probes, then the lags of the
two presentations ought to both affect RT. In contrast, if the changes in RT are driven by a
self-terminating scanning model during the memory comparison stage, then only the most
recent lag should affect RT. Scanning further predicts that the effect of the most recent lag
should be the same on both the first and second presentations and that in both cases the
effect of lag should be to cause a shift of the RT distribution. All three of the predictions
of a self-terminating scanning model held across all three experiments.
4
The drift rate µ is in units of evidence per unit time. The boundary separation λ is in units of evidence.
λ/µ thus has units of time, making it directly comparable to σ.
bioRxiv preprint first posted online Jan. 18, 2017; doi: http://dx.doi.org/10.1101/101295. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
11
SCANNING ALONG A COMPRESSED REPRESENTATION
b
c
Experiment 1
●
●
●
●
●
●
0.9
0.8
0.7
0.6
0.5
Rep once
Rep twice
0.4
2
4
8
16
Lag
32
64
128
Rep once
Rep twice
1.0
0.9
0.8
0.7
●
●
0.6
1st Decile by Lag
Experiment 3
●
0.5
0.4
1.1
Response Time (s)
1.1
●
Response Time (s)
Response Time (s)
1.1
1.0
d
Experiment 2
Rep once
Rep twice
1.0
0.9
0.8
0.7
●
●
●
0.6
0.5
4
16
Lag
1.0
0.9
●
●
0.8
●
●
●
2
4
16
Lag
●
●
0.7
0.6
●
0.5
●
●
●
●
●
0.4
0.4
2
1.1
Response Time (s)
a
2
4
Exp 1
Exp 2
Exp 3
16
Lag
Figure 6. The effect of lag on RT was the same for first and second repetitions. a-c.
Median response time as a function of the most recent lag for first (solid) and second (repetitions)
for the three experiments. To the extent the lines are parallel, it means that the effect of recency on
RT was the same on the first and second presentations. Analyses reported in the text demonstrated
that only the most recent lag affected the RT of old probes repeated twice. d. The 1st decile of the
second repetitions as a function of lag on log2 paper. The results are comparable to those for first
repetitions (Fig. 3c). Distributions for second repetitions are shown in Fig. 7.
Accuracy was very high for probes that were repeated a second time. In Experiment 1,
49 images that were repeated a second time and the overall hit rate for the second repetitions
was 0.99 ± 0.006. In Experiments 2 and 3, 54 images were presented a third time and the
overall hit rates were 0.98 ± 0.01 and 0.96 ± 0.01 respectively.
Only the most recent lag affected RT on second repetitions. There are two lags associated with the third presentation of a probe. Let us refer to the lag between the first and
second presentation as lag1 and the lag between the second and third presentation as lag2 , so
that at the third presentation lag2 is the most recent lag. Allowing each participant to have
an independent intercept (to account for between participant variability) in a linear mixed
effects model using the base 2 logarithm of the two lags as regressors, there were reliable
effects of lag2 but no effect of lag1 in all three experiments. In Experiment 1 lag2 showed
a reliable effect, .016 ± .005, t(600) = 3.39, p < .01; but no effect of lag1 , −.003 ± .005,
t(600) = −.60. In Experiment 2, lag2 showed a reliable effect, .012 ± .003, t(1812) = 3.84,
p < .01, but lag1 did not, −.001 ± .003, t(1812) = .02. In Experiment 3, the same pattern
held, lag2 : .015 ± .004, t(1623) = 3.44, p < .01; lag1 , −.007 ± .005, t(1623) = 1.5.
The effect of lag on RT was the same for first and second repetitions. In the analysis
shown above, RT to images repeated a third time depended on the most recent lag (second
repetition) and did not depend on the lag to the first repetition. A scanning model predicts
further that the effect of lag2 on the second repetition should be the same as the effect of
lag on the initial repetition. Figure 6a-c summarize these comparisons. Across all three
experiments, plots of RT as a function of lag (lag2 for repeated stimuli) showed parallel
curves for first and second repetitions on log2 paper. This visual impression was confirmed
by a multiple regression with lag, repetition and an interaction term of lag and repetition.
In all the three experiments, there was a significant effect of lag and repetition but the
interaction term was not significant (statistics are in Table 2).
bioRxiv preprint first posted online Jan. 18, 2017; doi: http://dx.doi.org/10.1101/101295. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
12
SCANNING ALONG A COMPRESSED REPRESENTATION
a
b
c
Experiment 2
0.8
0.6
0.4
0.2
0.0
0.3
0.6
0.9
1.2
Experiment 3
1.0
Cumulative probability
1.0
Cumulative probability
Cumulative probability
Experiment 1
0.8
0.6
0.4
0.2
0.0
1.5
Response Time (s)
0.3
0.6
0.9
1.2
Response Time (s)
1.5
1.0
0.8
0.6
0.4
0.2
0.0
0.3
0.6
0.9
1.2
1.5
Response Time (s)
Figure 7. For probes repeated a second time, the most recent lag affected the time
to access memory. Unsmoothed across-participant cumulative response distributions for second
repetitions are shown for the most recent lag. Shorter lags are darker shades. The inset zooms in
on the time at which the distributions start rising from 0. a. Experiment 1. b. Experiment 2. c.
Experiment 3.
Exp
1
2
3
Lag
0.013 ± 0.004
t(179) = 3.47∗∗
0.013 ± 0.002
t(172) = 5.33∗∗
0.010 ± 0.002
t(57) = 3.78∗∗
Repetition
−0.07 ± 0.02
t(179) = −2.96∗
−0.068 ± 0.009
t(172) = −7.31∗
−0.08 ± 0.01
t(157) = −7.96∗
Interaction
0.004 ± 0.005
t(179) = 0.84
−0.004 ± 0.004
t(172) = −1.02
−0.003 ± 0.004
t(157) = 0.75
Table 2: The RTs to second repetitions are faster than to first repetitions but both repetitions show
the same effect of lag. Results of a multiple regression analysis with (base 2 logarithm of) lag,
repetition and their interaction. See also Figure 6a-c. Effects significant at p < 0.001 are indicated
by ∗∗ and at p < 0.01 are indicated by ∗ .
RT distributions shifted as a function of recency on second repetitions. A scanning
account predicts that the effect of lag on second repetitions should not only be of a similar
magnitude as the effect of lag on first repetitions, but that in both cases the effect should
be associated with a shift in the distributions. Figure 7 shows the cumulative RT distributions for second repetitions for different lags. As in the initial repetitions (Fig. 4), the
RT distributions appeared to shift with lag. This visual impression was confirmed by nonparametric analysis.5 Figure 6d shows the 1st decile of the RT distributions as a function
of lag for all three experiments. A regression of the 1st decile of the RT for the second
repetitions onto lag was significant across the three experiments (Exp. 1: 0.015 ± 0.003
,t(83) = 4.5, p < 0.001; Exp. 2: 0.010 ± 0.001, t(69) = 6.9, p < 0.001; Exp. 3: 0.012 ± 0.002,
t(63) = 5.08, p < 0.001).
To summarize, while there was an effect of the lag to the most recent presentation of
5
Because the number of responses per lag to second repetitions was relatively small, we did not attempt
model-based analyses of second repetitions.
bioRxiv preprint first posted online Jan. 18, 2017; doi: http://dx.doi.org/10.1101/101295. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
SCANNING ALONG A COMPRESSED REPRESENTATION
13
the probe stimulus, there was no additional effect of its prior presentation. The response
time distributions for the second repetitions show a systematic change in the time at which
cumulative distributions rise from zero and the effect of multiple repetitions manifests itself
in the intercept term of the response time vs. lag line. These observations fail to provide
any evidence for the hypothesis that the effect of recency on RT was due to a facilitation
of probe encoding. In contrast, these findings are exactly as one would have predicted if
the effect of recency on RT was caused by serial self-terminating scanning of a compressed
representation of the past.
Discussion
It has long been known that response time in a continuous recognition experiment
increases with the lag to the probe. If memory search requires scanning of a timeline to
find the appropriate memory, then the increase in RT should be associated with a shift in
the distribution. Three experiments showed that lag consistently shifted RT distributions.
This effect was observable both using non-parametric analyses of RT distributions (Fig. 4)
as well as a model-based analysis (Fig. 5). Consistent with the scanning account, RTs to
second repetitions depended only the most recent lag, as if the search terminated when
the first match is found. Although RTs were faster to second repetitions overall, the effect
of the most recent lag on second repetition RTs was the same as the effect of lag on first
presentation RTs (Fig. 6a-c). This effect on second repetitions was manifest as a shift of
the RT distributions (Fig. 7).
Consistent with earlier findings (e.g., Hockley, 1982) the results of these experiments
showed a sublinear shift with lag. The results of this paper are roughly consistent with
a logarithmic increase in RT as a function of lag; each doubling of lag resulted in a shift
of approximately 15-20 ms in the RT distribution. We observed a continuous shift in
RT distributions from lags covering a few seconds up to lags of more than 20 minutes.
The results of this study are consistent with scanning along a logarithmically-compressed
timeline.
There is extensive evidence for self-terminating serial search models in short-term
memory tasks (Hacker, 1980; Hockley, 1984; McElree & Dosher, 1993; Sternberg, 2016).
There is also evidence consistent with scanning along a timeline in JOR tasks over longer
time scales (Hintzman, 2010). However, most previous studies of item recognition have
found evidence for parallel access to memory, not sequential scanning, in study-test recognition (Nosofsky, Little, Donkin, & Fific, 2011; Ratcliff & Murdock, 1976; McElree & Dosher,
1989; Hockley, 1984; Nosofsky, Cox, Cao, & Shiffrin, 2014). Several potentially important
methodological differences may account for the discrepancy between those studies and the
results in this paper. This experiment used continuous recognition rather than the studytest procedure (Nosofsky et al., 2011; Ratcliff & Murdock, 1976; McElree & Dosher, 1989;
Hockley, 1984; Nosofsky et al., 2014) and highly memorable trial-unique visual stimuli. In
addition, this study only required positive responses to repeated stimuli and more recent
lags were tested more frequently than more remote lags. The question of which combination
of these methodological differences accounts for the evidence for serial scanning is an extremely important one that merits further investigation. It is worth noting that there is no
reason in principle that a compressed timeline could not be accessed in parallel (Howard et
al., 2015), whereas it is not clear how (or why) a composite representation could be scanned
bioRxiv preprint first posted online Jan. 18, 2017; doi: http://dx.doi.org/10.1101/101295. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
SCANNING ALONG A COMPRESSED REPRESENTATION
14
in a recognition memory task. Notably, if a logarithmically-compressed timeline is accessed
in parallel, the recency effect for strength of match would fall off like a power law (Howard
et al., 2015), much like the change Donkin and Nosofsky (2012) observed experimentally in
drift rate.
On its face, serial scanning of a compressed timeline requires a more elaborate memory
representation than a simple composite memory. However, it also suggests a deep analogy
between search through memory and directed attention along perceptual dimensions. Neural receptive fields in vision form a compressed representation of retinal space, with broader
receptive fields at locations further from the fovea (Hubel & Wiesel, 1974; Schwartz, 1977).
By analogy, “time cells” in the hippocampus (MacDonald, Lepage, Eden, & Eichenbaum,
2011), prefrontal cortex (Tiganj, Kim, Jung, & W., in press) and striatum (Mello, Soares,
& Paton, 2015) can be understood as constructing a compressed timeline. We can deploy
attention strategically to sensory dimensions resulting in preferential access to information
available along those dimensions (e.g., Teder-Sälejärvi, Münte, Sperlich, & Hillyard, 1999;
Shomstein & Yantis, 2004). To the extent both perception and memory make use of the
same form of compressed neural representation, scanning in memory can be understood as
exploiting the same kind of computational mechanisms used to direct visual attention.
References
Anderson, J. A. (1973). A theory for the recognition of items from short memorized lists. Psychological Review , 80 , 417-438.
Brady, T. F., Konkle, T., Alvarez, G. A., & Oliva, A. (2008). Visual long-term memory has a
massive storage capacity for object details. Proceedings of the National Academy of Sciences,
105 (38), 14325–14329.
Brown, G. D. A., Neath, I., & Chater, N. (2007). A temporal ratio model of memory. Psychological
Review , 114 (3), 539-76.
Brown, S., & Heathcote, A. (2005). A ballistic model of choice response time. Psychological Review ,
112 (1), 117-28. doi: 10.1037/0033-295X.112.1.117
Chater, N., & Brown, G. D. A. (2008). From universal laws of cognition to specific cognitive models.
Cognitive Science, 32 (1), 36-67. doi: 10.1080/03640210701801941
Donkin, C., & Nosofsky, R. M. (2012). A power-law model of psychological memory strength in
short- and long-term recognition. Psychological Science. doi: 10.1177/0956797611430961
Geller, A. S., Schlefer, I. K., Sederberg, P. B., Jacobs, J., & Kahana, M. J. (2007). PyEPL: a
cross-platform experiment-programming library. Behavior Research Methods, 39 (4), 950-8.
Hacker, M. J. (1980). Speed and accuracy of recency judgments for events in short-term memory.
Journal of Experimental Psychology: Human Learning and Memory, 15 , 846-858.
Hintzman, D. L. (1969). Recognition time: Effects of recency, frequency and the spacing of repetitions. Journal of Experimental Psychology, 79 (1p1), 192.
Hintzman, D. L. (2010). How does repetition affect memory? Evidence from judgments of recency.
Memory & Cognition, 38 (1), 102-15.
Hockley, W. E. (1982). Retrieval processes in continuous recognition. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 8 , 497-512.
Hockley, W. E. (1984). Analysis of response time distributions in the study of cognitive processes.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 10 (4), 598-615.
Howard, M. W., Shankar, K. H., Aue, W., & Criss, A. H. (2015). A distributed representation of
internal time. Psychological Review , 122 (1), 24-53.
bioRxiv preprint first posted online Jan. 18, 2017; doi: http://dx.doi.org/10.1101/101295. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
SCANNING ALONG A COMPRESSED REPRESENTATION
15
Hubel, D. H., & Wiesel, T. N. (1974). Uniformity of monkey striate cortex: a parallel relationship between field size, scatter, and magnification factor. Journal of Comparative Neurology,
158 (3), 295-305. doi: 10.1002/cne.901580305
MacDonald, C. J., Lepage, K. Q., Eden, U. T., & Eichenbaum, H. (2011). Hippocampal “time cells”
bridge the gap in memory for discontiguous events. Neuron, 71 , 737-749.
Malmberg, K. J., & Annis, J. (2012). On the relationship between memory and perception: Sequential dependencies in recognition memory testing. Journal of Experimental Psychology:
General , 141 (2), 233-59. doi: 10.1037/a0025277
McElree, B., & Dosher, B. A. (1989). Serial position and set size in short-term memory: The time
course of recognition. Journal of Experimental Psychology: General , 118 , 346-373.
McElree, B., & Dosher, B. A. (1993). Serial recovery processes in the recovery of order information.
Journal of Experimental Psychology: General , 122 , 291-315.
Mello, G. B., Soares, S., & Paton, J. J. (2015). A scalable population code for time in the striatum.
Current Biology, 25 (9), 1113–1122.
Monsell, S. (1978). Recency, immediate recognition memory, and reaction time. Cognitive Psychology, 10 , 465-501.
Morey, R. D. (2008). Confidence intervals from normalized data: A correction to cousineau (2005).
Tutorial in Quantitative Methods for Psychology, 4 (2), 61–64.
Murdock, B. B. (1974). Human memory: Theory and data. Potomac, MD: Erlbaum.
Murdock, B. B. (1982). A theory for the storage and retrieval of item and associative information.
Psychological Review , 89 , 609-626.
Murdock, B. B., & Anderson, R. E. (1975). Encoding, storage and retrieval of item information. In
R. L. Solso (Ed.), Information Processing and Cognition: The Loyola Symposium (p. 145-194).
Hillsdale, New Jersey: Erlbaum.
Nosofsky, R. M., Cox, G. E., Cao, R., & Shiffrin, R. M. (2014). An exemplar-familiarity model
predicts short-term and long-term probe recognition across diverse forms of memory search.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 40 (6), 1524.
Nosofsky, R. M., Little, D. R., Donkin, C., & Fific, M. (2011). Short-term memory scanning viewed
as exemplar-based categorization. Psychological Review , 118 (2), 280-315.
Okada, R. (1971). Decision latencies in short-term recognition memory. Journal of Experimental
Psychology, 90 (1), 27.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review , 85 , 59-108.
Ratcliff, R., & Murdock, B. B. (1976). Retrieval processes in recognition memory. Psychological
Review , 83 (3), 190-214.
Schwartz, E. L. (1977). Spatial mapping in the primate sensory projection: analytic structure and
relevance to perception. Biological Cybernetics, 25 (4), 181-94.
Shankar, K. H., & Howard, M. W. (2013). Optimally fuzzy scale-free memory. Journal of Machine
Learning Research, 14 , 3753-3780.
Shepard, R. N., & Teghtsoonian, M. (1961). Retention of information under conditions approaching
a steady state. Journal of Experimental Psychology, 62 , 302-309.
Shiffrin, R. M., Ratcliff, R., Murnane, K., & Nobel, P. (1993). TODAM and the list-strength and
list-length effects: A reply to Murdock and Kahana. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 19 , 1445-1449.
Shomstein, S., & Yantis, S. (2004). Control of attention shifts between vision and audition in human
cortex. The Journal of Neuroscience, 24 (47), 10702–10706.
Sternberg, S. (1969). The discovery of processing stages: Extensions of Donders’ method. Acta
Psychologica, 30 , 276-315.
Sternberg, S. (2016). In defence of high-speed memory scanning. Quarterly Journal of Experimental
Psychology, 69 , 2020-2075.
Teder-Sälejärvi, W. A., Münte, T. F., Sperlich, F.-J., & Hillyard, S. A. (1999). Intra-modal and
cross-modal spatial attention to auditory and visual stimuli. an event-related brain potential
bioRxiv preprint first posted online Jan. 18, 2017; doi: http://dx.doi.org/10.1101/101295. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
SCANNING ALONG A COMPRESSED REPRESENTATION
16
study. Cognitive Brain Research, 8 (3), 327–343.
Tiganj, Z., Kim, J., Jung, M. W., & W., H. M. (in press). Sequential firing codes for time in rodent
mPFC. Cerebral Cortex .
Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: the leaky, competing
accumulator model. Psychological Review , 108 (3), 550-92.
Wald, A. (1947). Foundations of a general theory of sequential decision functions. Econometrica,
Journal of the Econometric Society, 279–313.