Variability in the quality of visual working memory

ARTICLE
Received 22 Feb 2012 | Accepted 31 Oct 2012 | Published 27 Nov 2012
DOI: 10.1038/ncomms2237

Daryl Fougnie, Jordan W. Suchow & George A. Alvarez
Working memory is a mental storage system that keeps task-relevant information accessible
for a brief span of time, and it is strikingly limited. Its limits differ substantially across people
but are assumed to be fixed for a given person. Here we show that there is substantial
variability in the quality of working memory representations within an individual.
This variability can be explained neither by fluctuations in attention or arousal over time, nor
by uneven distribution of a limited mental commodity. Variability of this sort is inconsistent
with the assumptions of the standard cognitive models of working memory capacity,
including both slot- and resource-based models, and so we propose a new framework for
understanding the limitations of working memory: a stochastic process of degradation that
plays out independently across memories.
Department of Psychology, Harvard University, William James Hall, 33 Kirkland Street, Cambridge, Massachusetts 02138, USA. Correspondence and
requests for materials should be addressed to D.F. (email: [email protected]).
NATURE COMMUNICATIONS | 3:1229 | DOI: 10.1038/ncomms2237 | www.nature.com/naturecommunications
& 2012 Macmillan Publishers Limited. All rights reserved.
Here, using variants of the behavioral task outlined in Fig. 1,
we show that there is variability in the quality of working memory
within an individual. This variability can be explained neither by
fluctuations in attention or arousal over time, nor by uneven
distribution of a limited mental commodity. We propose that a
complete understanding of the capacity and architecture of
working memory must consider both the allocation of a mental
commodity and the stability of the information that is stored.
Results
Revealing variability in the quality of working memory.
Participants were asked to perform a working memory task using
displays with 1, 3 or 5 colourful dots (Fig. 1). The standard fixed-precision model1 and our variable-precision model were fit to each
participant’s distribution of errors (see Methods). Figure 3
overlays the maximum a posteriori fixed- and variable-precision
models on a histogram of errors made by one of the participants,
for each of three set sizes: 1, 3 and 5. The variable-precision
model provided a better fit to the data for each participant at each
set size. When the two models were compared using the Akaike
Information Criterion (AIC), a measure of goodness of fit, we
found lower AIC values for the variable-precision model for each
Working memory is critical to navigation, communication, problem solving and other activities, but there are
stark limits to how much we can remember1–9. A goal
of research on working memory is to describe these limitations
and determine their source. Cognitive models typically explain
the capacity of working memory by postulating a mental
commodity that is divided among the items or events to be
remembered1,3,5,9–12. A core assumption of these models is that
the quality (‘fidelity’, ‘uncertainty’) of memories is determined
solely by the amount of the commodity that is allocated to each
item, being otherwise fixed1,5,10–13. Items receiving more of the
commodity are better remembered.
A common approach to measuring the capacity of visual
working memory is through the use of simple behavioral tasks.
In one such task, participants are asked to remember the colours
of a set of colourful dots, and then, after some delay, to report the
colour of a dot selected at random (Fig. 1)1,9. Error is recorded as
the difference between the correct colour and the reported colour.
The distribution of these errors is used to test models of memory.
Standard models1,5,10–13 assume that for each dot in the display,
the participant is in one of two states, either remembering or
forgetting. Importantly, these two states lead to different errors
when the participant is asked to report the dot’s colour: errors for
a forgotten item are uniformly distributed on the colour
wheel, whereas errors for a remembered item cluster around
the correct colour, with a spread that depends on how well it was
remembered. The standard models assume that the form of the
error distribution for remembered items is a circular normal
(von Mises) distribution, with a spread that can be fully
characterized by a single number: its precision, here defined as
the s.d. (ref. 1).
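The error measure underlying these analyses, the angular difference between reported and correct colour wrapped to the colour wheel, can be sketched as follows (a minimal illustration, not the authors' analysis code):

```python
def circular_error(reported_deg, correct_deg):
    """Signed angular difference in degrees, wrapped to (-180, 180]."""
    diff = (reported_deg - correct_deg) % 360  # in [0, 360)
    return diff - 360 if diff > 180 else diff

# The wheel wraps around: reporting 350 deg for a correct colour of
# 10 deg is an error of -20 deg, not 340.
print(circular_error(350, 10))  # -20
```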
It is possible to relax the assumption that the quality of
memory is fixed, instead supposing that it varies within an
individual. To determine whether such variability is present, we
can inspect the shape of the error distribution. Critically,
variability in the precision of memory within an individual leads
to an error distribution that is more peaked than a circular
normal distribution because it involves mixing error distributions
that differ in precision but have the same mean (Fig. 2). With
sufficient data, it is possible to measure this peakedness to
estimate the extent of high-order variability in the quality of
memory within an individual.
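The predicted peakedness is easy to demonstrate in simulation (a sketch under simplifying assumptions: normal rather than circular errors, with s.d. values small enough that wrapping is negligible):

```python
import random

random.seed(1)

def peakedness(xs):
    """Sample kurtosis: ~3 for a normal, higher for peaked/heavy-tailed data."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    return sum((x - m) ** 4 for x in xs) / n / var ** 2

# Fixed precision: every error is drawn with s.d. 20 deg.
fixed = [random.gauss(0, 20) for _ in range(100_000)]

# Variable precision: the s.d. itself varies from draw to draw (here 10 or
# 30 deg), mixing distributions with a common mean but different spreads.
variable = [random.gauss(0, random.choice([10, 30])) for _ in range(100_000)]

print(round(peakedness(fixed), 2))     # near 3 (a plain normal)
print(round(peakedness(variable), 2))  # well above 3 (peaked, heavy tails)
```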
Figure 1 | Timeline of a working memory task. First, a display of colourful
dots is briefly presented (lower left). This is followed by an interval during
which the participant is asked to hold the colours in mind. Finally, a
response screen appears. On the response screen, the position of a
randomly sampled item is cued and the participant is asked to select the
colour of the item that was presented at that location (upper right).
Figure 2 | Fixed- and variable-precision models produce different
distributions of error. (a) Each curve shows the distribution of precision
(the s.d. of error on the memory task) for a different fixed-precision model.
(b) A variable-precision model can be thought of as a higher-order
distribution of fixed-precision models. One such higher-order distribution is
shown here; it is a truncated normal with a mean of 24° and a s.d. of 7°,
bounded at 0° and 100°. (c) Variable-precision models have different
distributions of error. The green line is the distribution of errors for a model
whose precision is fixed at 24°. (The green distributions in panels a and c are
the same.) The red line is the expected distribution of errors if precision
were drawn from the full distribution in panel b. Mixing together
distributions with different precision produces a peaked shape that is a
hallmark of the variable-precision model.
participant at each set size (Supplementary Table S1); the average
magnitude of these AIC differences (7, 11.3 and 12) is a decisive
victory for the variable-precision model, leaving the fixed-precision model with ‘essentially no support’ from the data14.
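For reference, AIC = 2k − 2 ln L, where k counts free parameters and ln L is the maximized log-likelihood; lower is better, and a difference of roughly 10 or more leaves the losing model with essentially no support14. A minimal sketch with hypothetical numbers (not the paper's actual fits):

```python
def aic(n_params, log_likelihood):
    """Akaike Information Criterion: goodness of fit penalized by complexity."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical maximized log-likelihoods for one participant. The
# variable-precision model spends one extra parameter (the s.d. of
# precision) but fits substantially better.
aic_fixed = aic(2, -1250.0)     # parameters: guess rate, precision
aic_variable = aic(3, -1243.0)  # + s.d. of precision

print(aic_fixed - aic_variable)  # 12.0: decisive support for variable precision
```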
Furthermore, the amount of variation in memory precision
increased with set size for each participant (Fig. 4). The average
precision across subjects was 11.6°, 17.4° and 23.7° for set sizes 1,
3 and 5, respectively. The average s.d. of precision, a measure of
variability in memory quality, was 3.0°, 5.4° and 8.9°. These
results were obtained with a variable-precision model in which
the variability in precision was assumed to follow a truncated
normal distribution. Evidence of variable quality was also found
when substituting a gamma distribution over the inverse of the
variance, similar to ref. 15 (average inverse variance 0.009,
Figure 3 | Distinguishing between models with fixed and variable
precision. Each panel shows a histogram of errors on the working memory
task and the best-fit fixed-precision (green line) and variable-precision (red
line) models. The working memory task used 1, 3 or 5 items (panels a, b and
c, respectively). These data are from a representative participant.
Determining the source of the variability. There are many
possible sources of variability, including both true variability in
memory precision as well as variability from other sources, such
as perceptual noise, eye movements and differences in the
memorability of certain colours or locations. A series of control
experiments were performed that demonstrate that these other
sources made only a negligible contribution to the observed
variability in memory quality.
When the memory demands of the task were removed by
keeping the stimulus present while the participant responded,
variability was eliminated and the data were better fit by a
fixed-precision model (Supplementary Table S2). Even when the
average error on the perception task was increased to match that
of the memory task (Supplementary Methods), variability was
still smaller (2.2° versus 8.9°, t(4) = 3.86, P < 0.05, independent
samples t-test).
A further experiment showed that there were negligible differences in the memorability of different colours (Supplementary
Methods). We measured the variation in precision for six colour
bins, each spanning 30° of the colour wheel (0–30°, 60–90°, …,
300–330°). Even though the range of colours within each bin was
now only a small subset of the colour wheel, we still found
substantial variability within a bin (Supplementary Fig. S1). In
fact, the variability within a bin was comparable to that in the full
data set (Supplementary Fig. S1c, one-sample t-tests, all t(5)’s
< 0.5, all P > 0.65 for both participants). Furthermore, the
variation in estimates of precision across bins was minimal and
accounts for at most a small (3.9%) portion of the total variance.
Similarly, we tested whether performance differed depending
on the location of the tested item, a possible contributor to the
observed variability in memory. For both the three- and five-item
displays, we found little variance in the estimated precision across
locations (4.15 deg² for three-item displays, 3.24 deg² for five-item
displays). (This is an upper bound on the contribution to variance
because it includes measurement error.) Furthermore, the
observed variance explains only 12.7% and 3.8% of the total
variance for the three-item and five-item conditions, respectively.
Representational quality could differ for items that are fixated
compared with items that are not. The encoding duration used in
Experiment 1 was long enough to allow for a saccade, therefore
raising the possibility that variability in precision is due entirely to
imbalances in encoding introduced by eye movements. We found
that memories are variably precise even when eye movements are
not possible. Two participants were asked to remember the colour
of three items presented for 100 ms. Although this encoding
duration is too short for an eye movement, we nonetheless found
0.005, 0.003 deg⁻²; average s.d. of inverse variance, 2.3 × 10⁻⁵,
1.1 × 10⁻⁵, 9.0 × 10⁻⁶ deg⁻²).
Figure 4 | Variability in the quality of working memory depends on the load. The estimated distribution of precision according to the best-fitting variable-precision model for set sizes 1 (red), 3 (green) and 5 (blue), for each of three subjects (panels a–c). The bars along the horizontal axis show the precision
values of the best-fit fixed-precision models.
There is variability between items within a trial. In the following experiments, we show that this variability can be
explained neither by state-based fluctuations in attention or
arousal across trials16, nor by the uneven allocation of a mental
commodity within a trial. This pattern of results is not easily
captured by models of working memory in which the quality of a
memory is determined solely by the allocation of a finite
commodity.
In a new experiment, on half of the trials participants were
given the opportunity to report the colour of the item they
remembered best. On the other half of the trials, they reported the
colour of an item that was selected for them at random. These
two conditions were intermixed in random order. Note that only
variability within a trial can contribute to the participant’s
decision in this task. Therefore, if this variability is accessible to
the participant, precision will be better for the best-remembered
item than for one selected at random. This enables us to measure
the extent to which the quality of memory varies within versus
across trials.
When probed on a random item, the mean precision was
19.9 ± 1.2° (s.e.m.) and its s.d. was 6.2 ± 0.60° (Fig. 5a). However,
when given the opportunity to choose which item to report, the
mean precision was better (15.4 ± 0.75°, paired samples t-test,
t(9) = 5.3, P < 0.001) and less variable (4.4 ± 0.38°, t(9) = 3.4,
P = 0.01) (Fig. 5b).
This outcome provides an existence proof of within-trial
variability and implies that it is cognitively present and useful. But
how much of the variation is contained within a trial, and how
much of it is across trials? To answer this question, we ran
simulations under conditions of purely within- and purely across-trial variability (Fig. 5c; see Methods). We did not find a
difference between the observed distribution and the simulated
distribution for purely within-trial variation, either in mean
(15.4° observed versus 15.5 ± 0.75° simulated; paired samples
t-test, t(9) = 0.05, P > 0.85) or s.d. (4.4° observed versus
5.0 ± 0.41° simulated; t(9) = 1.21, P > 0.25) (Fig. 5d). Nor did
we find a difference in the mean or s.d. of precision between the
trials in which a random item was probed and trials in another
control experiment, run with ten new participants, that did not
include a ‘choose the best’ condition (19.9° versus 22.1 ± 1.18°,
independent samples t-tests, t(18) = 0.91, P = 0.38; 6.2° versus
6.0 ± 0.74°, t(18) = 0.74, P = 0.48). This suggests that giving the
participant the opportunity to choose the best-remembered
item on some trials did not alter how items were encoded on
the other trials.
In showing that nearly all of the variability in memory quality
is accessible to the participant on a single trial, these results rule
out a sizeable contribution of across-trial variability in the present
study and suggest near-perfect metamemory for the relative
quality of working memory representations. These findings also
speak to an alternative account of the results of the first
variability in the precision of working memory. Both participants’
data were better fit by the variable- than fixed-precision model
(average difference in AICc values of 4.0 in favour of the variable-precision model).
The previous manipulation eliminated the possibility of
eye movements during stimulus presentation, but it does not
address the possibility that gaze was not centered on fixation.
We performed an additional follow-up experiment in which stimulus presentation was conditional on 1,000 ms of continuous,
uninterrupted fixation (Supplementary Methods). The data for
two out of the three participants were better fit by the variable- than fixed-precision model, with an average difference in AICc of
6.7 in favour of the variable-precision model.
Figure 5 | Choosing the item that was remembered best. (a) When
participants were asked to report an item selected at random, they made
errors. (b) When allowed to report the item they remembered best, they
performed better, seen as a narrower distribution of error. The ability to pick
out the best-remembered item is evidence that the items differed in how well
they were remembered. (c) Using the distribution in panel a, we simulated the
distribution of errors that we would expect to find in panel b if participants had
perfect metamemory, always picking out the best-remembered item. (d) The
observed and simulated distributions of memory quality match, suggesting
that participants have nearly perfect metamemory.
experiment, where we found that the distribution of response
error was more peaked than is predicted by standard fixed-precision models1,5,10–13. Is it possible that the distribution’s
peaked shape arose not because of variability in precision, but
because of a fixed-precision error distribution that is naturally
more peaked than a circular normal distribution? The results of
the current experiment make this alternative account unlikely
because nearly all of the peakedness in the distribution of report
error is accounted for by variability across items within a trial,
suggesting that the peakedness of the error distribution reflects
variability across items and leaving little reason to posit an
additional source.
Variability in memory quality is independent across items.
Many theories hold that the quality of a memory is determined by
the allocation of a finite commodity, where items receiving more
Figure 6 | Evidence that memory degradation is independent across items. Distributions of report error for the first, second and third responses sorted
by whether the response to another item in the trial was below (black) or above (gray, flipped) the median error. Memory quality does not differ between
the two, which is evidence of memory degradation that is independent across items.
of the commodity are better remembered1,3,5,9,12. These theories
predict that some of the within-trial variation in precision can be
accounted for by uneven allocation: when more of the commodity
has been allocated to one item, less is available for the others,
thereby producing tradeoffs between items within a display. To
determine whether the variability we observed was due to uneven
commodity allocation, we performed an experiment in which
participants were asked to report the colour of every item in the
display. We then tested for tradeoffs by comparing the errors
made for one item when the response to another item was good
versus bad (that is, above versus below the median absolute error
for that response across trials) (Fig. 6). We found no difference in
precision between above-median and below-median splits for any
combination of items, regardless of whether the estimates of
precision were derived from the fixed-precision model (Fig. 7a)
(paired samples t-tests, all t(15)’s < 1.6 and all P > 0.1) or the
variable-precision model (Fig. 7b) (all t(15)’s < 1.2 and all
P > 0.25).
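The median-split logic of this test can be sketched as follows (hypothetical data layout and helper names; the actual analysis fit the full mixture models to each split):

```python
import statistics

# One dict per trial: absolute report error (deg) for each of the three items.
trials = [
    {"first": 12.0, "second": 30.0, "third": 8.0},
    {"first": 25.0, "second": 5.0, "third": 40.0},
    {"first": 18.0, "second": 22.0, "third": 15.0},
    # ... hundreds of trials in the real experiment
]

def split_by_other(trials, target, other):
    """Split the target item's errors by whether the response to a different
    item on the same trial was below or above its median absolute error."""
    med = statistics.median(t[other] for t in trials)
    below = [t[target] for t in trials if t[other] <= med]
    above = [t[target] for t in trials if t[other] > med]
    return below, above

# A commodity tradeoff predicts worse target errors on trials where the other
# item was reported well; independence predicts no difference between splits.
below, above = split_by_other(trials, "first", "second")
```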
This evidence of independence is based on a null result.
However, using Monte Carlo simulations we found that our
tradeoff-detection method is powerful enough to reliably detect
tradeoffs if they were to exist (Supplementary Notes and
Methods). We considered two sources of tradeoffs: (1) noiseless
uneven allocation, where the participant knowingly attends or
gives preferential processing to a chosen item, and (2) noisy even
allocation, where the participant tries to distribute her resources
evenly, but fails, unwittingly giving preference to some items over
others. We found that any unevenness strong enough to produce
the observed variability in precision would have been readily
detected through our procedure for detecting tradeoffs
(Supplementary Notes and Methods). Finding independence of
precision for items within a display17,18 rules out the possibility that the
observed variability in precision reflects unevenness in the
allocation of discrete slots1 or a graded resource5.
Discussion
Visual working memory is not perfect. Its imperfections have
been used to fashion cognitive models of visual working memory
in which the quality of a memory is determined solely by the
amount of a mental commodity that it receives. In contrast to the
predictions of these models, the present results show that even
under conditions of even allocation, the quality of working
memory varies across trials and items. This finding is consistent
with a stochastic process that degrades memory for each
individual item and plays out independently across them. The
outcome of this process leaves each memory in its own state, with
its own precision, which when combined across items, produces
the characteristic peaked shape of report-error distributions.
Thus, visual working memory is stochastic both at the level of a
memory’s content and at the level of its quality.
A physiological mechanism that might account for variability
in the quality of working memory representations is cortical noise
that impedes the sustained activation of the neural populations
that code for the remembered information5,12,19–23. The precision
of memory representations may correspond to the amount of
signal drift over time and the extent to which noisy signals are
pooled within a neural population1,5,9,20,24–26. The recruitment of
larger neural populations will improve the precision of working
memory representations through decreased signal drift or
increased redundancy in coding. Critically, the probability that
neurons within a population will self-sustain after stimulus offset
is known to be affected by internal noise20–22. Under these
circumstances, the precision of memory representations would be
determined by the outcome of stochastic physiological processes
operating independently across representations.
Existing models explain the source of limitations in working
memory solely by reference to a finite capacity for storing
information. Here, we have shown that a complete account of
visual working memory must also consider the stability of stored
information. To put forward one model capable of producing
variability of the sort described here, consider a finite mental
commodity that must be divided among the items to be
remembered. As in existing models, this information limit can
be formalized as a set of N independent samples that are allocated
to the items being remembered1,9–12, with slot models setting N
to ~3, and with continuous resource models considering the
limit as N tends to infinity. Suppose that the samples are unstable
and that each has some probability of surviving until the time of
the test. This leads to variability in memory quality of a specific
form: a binomial distribution of samples per item. This random
process, known as the pure death process in studies of population
[Figure 7 overlays, for each report (first, second or third) and each sorting, the error distributions for above-median (gray, flipped) versus below-median (black) splits, under the fixed-precision model (panel a) and the variable-precision model (panel b); the best-fit s.d. values of the two splits are closely matched throughout, all falling between roughly 19.5° and 26.9°.]
Figure 7 | Model fits for distributions of report error. (a) We separately fit a fixed-precision model to the data from each participant, response (first,
second or third), sorting and split (above versus below the median error). The best-fit s.d. did not differ between the splits, which means that memory
quality was comparable. (b) We did the same using the variable-precision model and found the same result: evidence of memory degradation that is
independent across items.
genetics, is one of many possible random processes that might
degrade memory.
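Under the stated assumptions, N independent samples per item, each surviving to test with probability p, the surviving count per item is Binomial(N, p), and the process can be simulated directly (a sketch with hypothetical parameter values):

```python
import random

random.seed(0)

def surviving_samples(n_samples, p_survive):
    """Pure death process: each allocated sample independently survives."""
    return sum(random.random() < p_survive for _ in range(n_samples))

# Even allocation: 12 samples to each item; each sample survives with p = 0.5.
counts = [surviving_samples(12, 0.5) for _ in range(10_000)]

# The counts follow Binomial(12, 0.5): mean N*p = 6, variance N*p*(1-p) = 3.
# Since an item's precision grows with its surviving samples (pooling k noisy
# samples shrinks the estimate's s.d. by ~1/sqrt(k)), this spread in counts
# yields variable memory quality, independently for each item.
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(round(mean, 2), round(var, 2))  # near 6 and 3
```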
Our model includes a guess state, a catch-all for trials in which
the participant guesses randomly, either because of lapses in
memory or because of other hiccups, such as blinks and slips of
the hand. A recent paper15 has suggested that allowing for
variability in the quality of working memory obviates the need to
include a guess state. Though we agree that there is risk of
overestimating the rate of guessing if variability is ignored, we
found that even taking variability into account, a guess state still
accounts for a sizeable proportion of trials (Supplementary Notes
and Methods). Moreover, the goal of the present work is not to
argue that participants never guess, but to show that variability in
the quality of working memory reflects a random process of
degradation that plays out independently across memories.
Models of working memory that are constrained by the known
properties of the brain uniformly propose that the maintenance
of working memory representations involves stochastic processes
that operate at all levels of processing21,27,28. Yet most cognitive
models explain working memory solely by reference to the
division of a mental commodity: a resource divided among stored
items1,3–5,9,10,12. In these models, the quality of a memory
representation is deterministic and based solely on the amount of
the commodity allocated to it. Demonstrating independent,
stochastic variation in memory quality within an individual
requires a new framework that includes a role for stochastic
processes, helping to bridge the gap between biologically plausible
neural models on the one hand and cognitive models on the
other.
Methods
Participants. Participants were between the ages of 18 and 28 years, had normal or
corrected-to-normal vision and received either $10 per hour or course credit.
Participants whose testing ran over multiple days were given a bonus of $10
per session after the first. The studies were performed in accordance with Harvard
University regulations and approved by the Committee on the Use of Human
Subjects in Research under the Institutional Review Board for the Faculty of
Arts and Sciences.
Stimuli. The stimulus was 1, 3 or 5 colourful dots arranged in a ring around a
central fixation mark. The radius of each dot was 0.4° in visual angle and the radius
of the ring was 3.8°. Each dot was randomly assigned one of 180 equally spaced
equiluminant colours drawn from a circle (radius 59) in the CIE L*a*b* colour
space, centered at L = 54, a = 18 and b = −8. The grouping strength of the colours
was controlled by transforming the colour values to vectors in polar space, selecting
displays with the constraint that the magnitude of the mean vector of each display
was equal to that expected of a randomly sampled display.
Presentation. Stimuli were rendered by a computer running MATLAB with the
Psychophysics Toolbox29,30. The display’s resolution was 1,920 × 1,200 at 60 Hz,
with a pixel density of 38 pixels per cm. The viewing distance was 60 cm.
The background was gray, with a luminance of 45.4 cd m⁻².
Procedure. On each trial, the stimulus was presented for 600 ms and then
removed. After a 900-ms delay, a filled white circle appeared at the location of a
randomly selected item and hollow circles appeared at the location of the other
items. Participants were asked to report the probed item’s colour by selecting it
from a response screen with the full colour wheel, 6.5° in radius, centered around
fixation. A black indicator line was placed at the outer edge of the response wheel at
the position closest to the cursor. Once the participants moved the mouse, the filled
circle’s colour was continuously updated to match the currently selected colour.
Participants registered their choice by clicking a mouse. Responses were not
speeded. Feedback was provided after each response by displaying onscreen
the error in degrees.
Experiment 1 methods. Three participants performed three sessions, each lasting
1.5–2 h. Each session used 1, 3 or 5 items, and their order (and therefore, set size)
was counterbalanced across participants. There were 800 trials per session for set
sizes 1 and 3 and 720 trials per session for set size 5.
Experiment 2 methods. Ten participants performed two 700-trial sessions with a
set size of 3. On half of the trials, participants reported the item they remembered
best. On the other half, participants reported the colour of a randomly selected
item. Probe types were intermixed in random order. On ‘choose the best’ trials,
instructions appeared at the top of the screen telling participants to report a colour
and then to select the location of the best-remembered item by clicking its location.
On these trials, filled white circles appeared at the locations of the probed and
non-probed items.
Experiment 3 methods. A total of 16 participants performed one 500-trial session
with a set size of 3. Instead of reporting only one item, participants were asked
to report all three. Items were probed in random order. Feedback was given.
Non-probed items, even those already reported, were indicated by hollow circles.
General framework for modeling the error distribution. All of the models
considered here can be seen as special cases of an infinite scale mixture, a general
framework that describes error distributions with a fixed mean and a precision
(scale) that is sampled from some higher-order distribution, known as the ‘mixing
distribution’.
Fixed-precision model. In the fixed-precision model, the participant is in one of
two states for each item on the display. With probability 1–g, she remembers the
item and has some fixed amount of information about it. Limited memory leads to
errors in recall, which are assumed to be distributed according to a von Mises
distribution (the circular analogue of a normal distribution) centered at zero
(or, perhaps, with some bias μ) and with spread determined by a concentration
parameter κ that reflects the memory’s precision. The mixing distribution is thus
the Dirac delta function at κ, a distribution concentrated at that one point. With
probability g, she remembers nothing about the item and guesses randomly, producing errors distributed according to a circular uniform. Together, the probability
density function of this model is given by
1
þ ð1 ! gÞfðx ; m; kÞ;
360
where f is the von Mises density, defined by
f ðx ; g; m; kÞ ¼ ðgÞ
fðx ; m; kÞ ¼
ek cosðx ! mÞ
;
360I0 ðkÞ
where I0 is the modified Bessel function of order zero.
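As an illustration, the fixed-precision density is straightforward to evaluate numerically. The sketch below is our own minimal version, not the analysis code; it works in degrees, matching the 360° colour wheel, and the helper names are invented for this example:

```python
import numpy as np
from scipy.special import i0

def von_mises_pdf_deg(x, mu, kappa):
    # Von Mises density on a 360-degree circle, centered at mu.
    return np.exp(kappa * np.cos(np.radians(x - mu))) / (360.0 * i0(kappa))

def fixed_precision_pdf(x, g, mu, kappa):
    # Mixture: circular-uniform guess with probability g,
    # von Mises memory response otherwise.
    return g / 360.0 + (1.0 - g) * von_mises_pdf_deg(x, mu, kappa)
```

A quick sanity check is that the density integrates to one over a full 360° turn for any guess rate.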
Variable-precision model. The variable-precision model is the same as the fixed-precision
model, except that precision is distributed according to some higher-order
distribution, for example, a truncated normal or gamma distribution. Unless
guided by a theory for the source of variability in precision, the choice of mixing
distribution is arbitrary, but certain combinations of error distribution and mixing
distribution are convenient. For example, when error is normally distributed about
the correct value, and when precision (that is, the inverse of variance) is gamma
distributed, the resulting experimental data take the form of a generalized
Student's t-distribution. (This result generalizes to the case of distributions wrapped
on the circle, which is useful for representing colours and orientations.) For guess
rate g, bias μ and s.d. s, the probability density function of this model is given by

$$f(x; \mu, s, g, v) = \frac{g}{360} + (1 - g)\,\psi(x; \mu, s, v),$$

where $\psi$ is the wrapped generalized Student's t-distribution31, given by

$$\psi(x; \mu, s, v) = \frac{c}{s} \sum_{l=-\infty}^{\infty} \left[1 + \frac{(x + 360l - \mu)^2}{s^2 v}\right]^{-(v+1)/2},$$

with

$$c = \Gamma\!\left(\frac{v+1}{2}\right) \Big/ \left[\Gamma\!\left(\frac{v}{2}\right)\sqrt{\pi v}\right].$$
Incorporating higher-order variability into new models of working memory is
therefore a simple matter of replacing the usual choice of error distribution, a von
Mises distribution, with a wrapped generalized Student’s t-distribution.
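A minimal numerical sketch of the wrapped density (our own illustration, not the analysis code) truncates the infinite wrap sum; this is accurate because the terms decay polynomially in the wrap index l:

```python
import math

def wrapped_t_pdf(x, mu, s, v, n_wraps=20):
    # Wrapped generalized Student's t on a 360-degree circle: sum the
    # unwrapped density over integer numbers of full turns.
    c = math.gamma((v + 1) / 2) / (math.gamma(v / 2) * math.sqrt(math.pi * v))
    total = 0.0
    for l in range(-n_wraps, n_wraps + 1):
        z = x + 360.0 * l - mu
        total += (1.0 + z * z / (s * s * v)) ** (-(v + 1) / 2)
    return (c / s) * total
```

Because the unwrapped t density integrates to one over the real line, the wrapped version integrates to one over any 360° interval, which makes a convenient correctness check.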
Some combinations of error and mixing distributions give rise to named
distributions with known analytic expressions32,33, but others, like the truncated
normal mixing distribution, must be simulated. The mean precision and its s.d. were
free parameters, bounded between 0° and 100°. (The upper bound is arbitrary, but an
s.d. above 100° is typically indistinguishable from guessing anyway.)
This should not be taken as a strong claim about the form of the distribution of
memory precision, which will depend crucially on the process that degrades memories.
However, no matter which distribution is chosen, the presence of variability produces a
signature effect on the data—peakedness in the error distribution—seen most clearly in
Fig. 2 and readily detected using the present methods.
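The peakedness signature can be demonstrated in a toy simulation. The sketch below is our own simplified, non-circular version: errors generated with gamma-distributed precision are compared against a fixed-precision control matched in overall variance, and only the variable-precision errors show clearly positive excess kurtosis (peakedness with heavy tails):

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
n = 200_000

# Variable precision: each response's precision (inverse variance) is drawn
# from a gamma distribution, then the error is sampled at that precision.
tau = rng.gamma(shape=5.0, scale=1.0, size=n)
variable = rng.normal(0.0, 1.0 / np.sqrt(tau))

# Fixed-precision control matched to the same overall error variance.
fixed = rng.normal(0.0, variable.std(), size=n)

# Excess kurtosis: near zero for fixed precision, clearly positive otherwise.
print(kurtosis(fixed), kurtosis(variable))
```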
Model fitting. Markov Chain Monte Carlo was used to find the maximum a
posteriori parameter set of a model given the errors on the memory task. Each
model was fit separately for each subject and experimental condition. The data
were not binned. We used the Metropolis–Hastings algorithm, which takes a
random walk over the parameter space, sampling locations in proportion to how
well they describe the data and match prior beliefs. A non-informative Jeffreys
prior was placed over each parameter. The proposal distribution used to recommend jumps was a multivariate Gaussian centered at the current parameter set,
whose s.d. was tuned during a burn-in period that ended when all of the chains,
which started at different locations in the parameter space, converged. We collected
15,000 samples from these converged chains and report the sample with the
maximum posterior probability.
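The fitting procedure can be sketched as follows. This toy version (ours) simplifies the published pipeline: it uses flat rather than Jeffreys priors, fixes the bias at zero, and omits burn-in tuning, but it illustrates the random-walk Metropolis–Hastings search for a best-fitting parameter set:

```python
import numpy as np
from scipy.special import i0

rng = np.random.default_rng(1)

def log_likelihood(errors, g, kappa):
    # Fixed-precision mixture likelihood (bias fixed at zero for simplicity).
    vm = np.exp(kappa * np.cos(np.radians(errors))) / (360.0 * i0(kappa))
    return np.sum(np.log(g / 360.0 + (1.0 - g) * vm))

# Simulate one subject: 20% guesses, the rest von Mises errors with kappa = 10.
n = 1000
is_guess = rng.random(n) < 0.2
errors = np.where(is_guess,
                  rng.uniform(-180.0, 180.0, n),
                  np.degrees(rng.vonmises(0.0, 10.0, n)))

# Random-walk Metropolis-Hastings over (g, kappa) with flat priors.
current = np.array([0.5, 5.0])
lp = log_likelihood(errors, *current)
best, best_lp = current.copy(), lp
for _ in range(5000):
    proposal = current + rng.normal(0.0, [0.02, 0.5])
    if not (0.0 < proposal[0] < 1.0 and proposal[1] > 0.0):
        continue
    lp_prop = log_likelihood(errors, *proposal)
    # Accept jumps in proportion to how well they describe the data.
    if np.log(rng.random()) < lp_prop - lp:
        current, lp = proposal, lp_prop
        if lp > best_lp:
            best, best_lp = current.copy(), lp
print(best)  # highest-likelihood (g, kappa) sample visited by the chain
```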
Model comparison. To compare the relative goodness of fit of the fixed- and
variable-precision models, we computed each model’s AIC, a measure that includes
a penalty term for the number of free parameters34. For a model with k parameters
and maximum log-likelihood L, the AIC is given by
$$\mathrm{AIC} = -2L + 2k. \qquad (1)$$
Because this formulation of the AIC assumes infinite data, it is necessary to
correct the value when estimating it from experimental data. The corrected
AIC, AICc, is given by
$$\mathrm{AIC_c} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}, \qquad (2)$$
where n is the sample size—that is, the number of trials performed by a participant
on the working memory task.
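In code, the two criteria are one-liners (a sketch; parameter names are ours):

```python
def aic(log_l, k):
    # Akaike information criterion: penalizes each free parameter.
    return -2.0 * log_l + 2.0 * k

def aicc(log_l, k, n):
    # Small-sample correction, where n is the number of trials.
    return aic(log_l, k) + 2.0 * k * (k + 1) / (n - k - 1)

print(aic(-100.0, 3), aicc(-100.0, 3, 100))
```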
Simulating across- and within-trial variation in Experiment 2. We performed
Monte Carlo simulations to estimate the distribution of precision when ‘choosing
the best’ under conditions of purely across-trial variation or purely within-trial
variation. For each subject, 1,000 trials (of three-item displays) were simulated
according to that participant's maximum a posteriori parameter values for the
random probe condition. When assuming across-trial variation, the precision value
for each trial was selected at random from the three precision values drawn for
that trial. Error was simulated by sampling from a von Mises distribution
with the drawn precision value. For each participant, the resulting error
distributions were fit with the variable-precision model, yielding parameter
estimates that matched the random probe data (mean 20.2° and s.d. 6.4°; comparable
to Fig. 5, red line). When assuming within-trial variation, we selected the best
precision value of the three that were randomly sampled. These values were used to
simulate error responses that were then fit with the variable-precision model
(Fig. 5, blue line, mean 15.4° and s.d. 5.0°).
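The logic of the two simulations can be sketched in a few lines (an illustrative toy of our own, not the actual fitting pipeline; here precision is measured as error s.d. in degrees, so the ‘best’ item is the one with the smallest value, and the gamma distribution is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials = 10_000

# Per-item precision (error s.d. in degrees) for three items per trial.
prec = rng.gamma(shape=4.0, scale=5.0, size=(n_trials, 3))

# Across-trial variation only: the reported item is a random one of the three.
random_pick = prec[np.arange(n_trials), rng.integers(0, 3, size=n_trials)]

# Within-trial variation: 'choose the best' selects the most precise item,
# i.e., the one with the smallest error s.d.
best_pick = prec.min(axis=1)

print(random_pick.mean(), best_pick.mean())
```

As in the Experiment 2 analysis, choosing the best of three within-trial values yields a lower (better) mean precision than probing an item at random.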
References
1. Zhang, W. & Luck, S. Discrete fixed-resolution representations in visual
working memory. Nature 453, 233–235 (2008).
2. Vogel, E. K., Woodman, G. F. & Luck, S. J. Storage of features, conjunctions
and objects in visual working memory. J. Exp. Psychol. Hum. Percept. Perform.
27, 92–114 (2001).
3. Alvarez, G. A. & Cavanagh, P. The capacity of visual short-term memory is set
both by visual information load and by number of objects. Psychol. Sci. 15,
106–111 (2004).
4. Luck, S. J. & Vogel, E. K. The capacity of visual working memory for features
and conjunctions. Nature 390, 279–281 (1997).
5. Bays, P. M. & Husain, M. Dynamic shifts of limited working memory resources
in human vision. Science 321, 851–854 (2008).
6. Rensink, R. A. The dynamic representation of scenes. Vis. Cogn. 7, 17–42
(2000).
7. Baddeley, A. D. & Logie, R. In Models of Working Memory: Mechanisms of
Active Maintenance and Executive Control. (eds Miyake, A. & Shah, P.) 28–61
(Cambridge University Press, 1999).
8. Baddeley, A. D. Working Memory. (Oxford University Press, 1986).
9. Wilken, P. A detection theory account of change detection. J. Vis. 4, 1120–1135
(2004).
10. Anderson, D. E., Vogel, E. K. & Awh, E. Precision in visual working memory
reaches a stable plateau when individual item limits are exceeded. J. Neurosci.
31, 1128–1138 (2011).
11. Zhang, W. & Luck, S. J. The number and quality of representations in working
memory. Psychol. Sci. 22, 1434–1441 (2011).
12. Bays, P. M., Catalao, R. F. G. & Husain, M. The precision of visual working
memory is set by allocation of a shared resource. J. Vis. 9, 1–11 (2009).
13. Fougnie, D., Asplund, C. L. & Marois, R. What are the units of storage in visual
working memory? J. Vis. 10, 1–11 (2010).
14. Burnham, K. P. & Anderson, D. R. Multimodel inference: understanding AIC
and BIC in model selection. Sociol. Methods Res. 33, 261–304 (2004).
15. Van den Berg, R., Shin, H., Chou, W. C., George, R. & Ma, W. J. Variability in
encoding precision accounts for visual short-term memory limitations.
Proc. Natl Acad. Sci. USA 109, 8780–8785 (2012).
16. Kahneman, D. Attention and Effort. (Prentice-Hall, 1973).
17. Fougnie, D. & Alvarez, G. A. Object features fail independently in visual
working memory: Evidence for a probabilistic feature-store model. J. Vis. 11,
1–12 (2011).
18. Huang, L. Visual working memory is better characterized as a distributed
resource rather than discrete slots. J. Vis. 10, 1–8 (2010).
19. Deco, G., Rolls, E. T. & Romo, R. Stochastic dynamics as a principle of
brain function. Prog. Neurobiol. 88, 1–16 (2009).
20. Camperi, M. & Wang, X. J. A model of visuospatial short-term memory in
prefrontal cortex: recurrent network and cellular bistability. J. Comput.
Neurosci. 5, 383–405 (1998).
21. Wang, X. J. Synaptic reverberation underlying mnemonic persistent activity.
Trends Neurosci. 24, 455–463 (2001).
22. Constantinidis, C. & Wang, X. J. A neural circuit basis for spatial working
memory. Neuroscientist 10, 553–565 (2004).
23. Fuster, J. & Alexander, G. Neuron activity related to short-term memory.
Science 173, 652–654 (1971).
24. Luce, D. R. Thurstone’s discriminal processes fifty years later. Psychometrika
42, 461–489 (1977).
25. Bonnel, A. M. & Miller, J. Attentional effects on concurrent psychophysical
discriminations: Investigations of a sample-size model. Attention, Perception, &
Psychophysics 55, 162–179 (1994).
26. Thurstone, L. L. A law of comparative judgment. Psychol. Rev. 34, 273–286
(1927).
27. Durstewitz, D., Seamans, J. K. & Sejnowski, T. J. Neurocomputational models of
working memory. Nature Neurosci. 3, 1184–1191 (2000).
28. Durstewitz, D., Kelc, M. & Güntürkün, O. A neurocomputational theory of
the dopaminergic modulation of working memory functions. J. Neurosci. 19,
2807–2822 (1999).
29. Brainard, D. H. The Psychophysics Toolbox. Spat. Vis. 10, 433–436 (1997).
30. Pelli, D. G. The VideoToolbox software for visual psychophysics: transforming
numbers into movies. Spat. Vis. 10, 437–442 (1997).
31. Pewsey, A., Lewis, T. & Jones, M. C. The wrapped t family of circular
distributions. Aust. NZ J. Stat. 49, 79–91 (2007).
32. Andrews, D. F. & Mallows, C. L. Scale mixtures of normal distributions.
J. Royal Stat. Soc. Series B (Methodological) 36, 99–102 (1974).
33. Gneiting, T. Normal scale mixtures and dual probability densities.
J. Stat. Comput. Simulat. 59, 375–384 (1997).
34. Akaike, H. A new look at the statistical model identification. IEEE Trans.
Automatic Control 19, 716–723 (1974).
Acknowledgements
We thank Sarah Cormeia and Christian Forhby for help with data collection.
We are grateful to Geoff Woodman, Chris Asplund, Michael Cohen, Olivia Cheung,
Justin Jungé and Tim Brady for helpful discussions. This work was supported by NIH
Grant 1F32EY020706 to D.F. and NIH Grant R03-086743 to G.A.A.
Author contributions
D.F., J.W.S. and G.A.A. designed the experiments. D.F. performed the experiments.
D.F. and J.W.S. analysed the results. D.F., J.W.S. and G.A.A. wrote the paper.
Additional information
Supplementary Information accompanies this paper at http://www.nature.com/
naturecommunications
Competing financial interests: The authors declare no competing financial interests.
Reprints and permission information is available online at http://npg.nature.com/
reprintsandpermissions/
How to cite this article: Fougnie D. et al. Variability in the quality of visual working
memory. Nat. Commun. 3:1229 doi: 10.1038/ncomms2237 (2012).
Supplementary Figure S1. Memorability differences between colors do not drive
variation in precision. (a) Best-fit precision distributions for six color bins, each of
30°. Each distribution is labeled with the color at the bin’s center, and the black
curve is all of the data together. The legend shows the range of colors included in
each bin. The curves were fit simultaneously to all subjects’ data. (b) How much
variability is to be expected by chance? We simulated six data sets by sampling
precision values from the black curve. Even though each simulation is based on the
same distribution, the estimated model parameters display a range of values. This
range is comparable to that seen in the observed binned data in panel (a). (c) The
variability in the full data set (solid black line) is the same as that for the average bin
(dashed black line).
Supplementary Figure S2. Simulating uneven allocation policies. (a) A ternary plot
that shows the noiseless uneven allocation policies used for simulations. At each
point, the allocation to x1, x2, and x3 sums to 1. The corners correspond to the three
policies for which one item is given all of the commodity. The center corresponds to
a policy of even allocation across the items. Each filled circle corresponds to a fixed,
uneven allocation policy that could produce at least as much variability in precision
as we observed in our data. (b,c) Noisy allocation distributions are formalized as
a Dirichlet distribution over the space of allocation policies, centered on even
allocation. The distribution in panel (b) has lower noise levels (concentration
parameters of 9, 9, and 9) than the one in panel (c) (concentration parameters of 4,
4, and 4).
Supplementary Figure S3. Variability from noisy even allocation produces detectable
tradeoffs. Noisy even allocation was formalized as a symmetric Dirichlet distribution
on the simplex, centered at even allocation, with concentration parameters (α,α,α)
that determine the noise level. Monte Carlo simulations were performed for 15
levels of noise, evenly-spaced in log units from low noise where α = (10,10,10) to
high noise where α = (1,1,1). Simulated data was fit with the variable-precision
model in order to estimate the standard deviation of precision (blue line). We also
performed a trade-off analysis to determine the percentage of simulations at each
noise level (red line) in which at least one of the six possible sortings showed a
difference in estimated precision. The dashed line is drawn at the variability
estimated in the data. Critically, at the level of allocation noise necessary to produce
this amount of variability our tradeoff detection procedure is nearly perfect.
Supplementary Table S1. AICc values for Experiment 1.

Subject   Fixed (n=1)   Variable (n=1)   Fixed (n=3)   Variable (n=3)   Fixed (n=5)   Variable (n=5)
1         6371          6367             7876          7870             7550          7545
2         6463          6457             7785          7780             7326          7312
3         6171          6160             7207          7184             7390          7373

Lower AICc values are better fits.
Supplementary Table S2. AICc values for perceptual control study.

Subject   Fixed (no noise)   Variable (no noise)   Fixed (noise)   Variable (noise)
1         4528               4528                  7141            7140
2         4734               4744                  7116            7116
3         4423               4435                  7866            7882

Lower AICc values are better fits.
Supplementary Notes
The power of tradeoff detection
In Experiment 3, we asked participants to report multiple items per display and
found independence of precision across items. From this we concluded that
variability in precision is not due to tradeoffs in the allocation of a commodity to
remembered items. In the Supplementary Methods, we describe our method for
showing that the tradeoff-detection method is powerful enough to reliably detect
tradeoffs if they were to exist. We considered two sources of tradeoffs: (1) noiseless
uneven allocation, where the participant knowingly attends or gives preferential
processing to a chosen item, and (2) noisy even allocation, where the participant
tries to distribute her commodity evenly, but fails, unwittingly giving preference to
some of the items.
Each allocation (i.e., allotment of resources) can be described as a vector of
proportions of the commodity dedicated to each item (e.g., [1/3, 1/3, 1/3] for even
allocation across three items, or [0.1, 0.1, 0.1, 0.2, 0.5] for uneven allocation across
five). The set of all such possible allocations defines a simplex (Supplementary
Figure 2A).
Guessing
The present data were not as well described by models that did not include a
uniform error component. Indeed, the AICc values indicated that models with a
uniform component provided a better fit to the data (all AICc differences were
decisive, >100). Furthermore, estimates of this guess state do not differ to a large
extent between the variable- and fixed-precision models. Estimates of the rate of
guessing in Experiment 1 were 1.1%, 14.1%, and 21.6% for the variable-precision
model for set sizes 1, 3, and 5, respectively. This was comparable to the guess rates
for the fixed-precision model (1.4%, 15.9%, and 22.0%). In Experiment 2 there was
a large drop in guess rate between the ‘random probe’ and ‘choose the best’
conditions (variable-precision: 18.7% to 7.5%; fixed-precision: 20% to 8.2%). The
data do suggest that fixed-precision models may slightly over-estimate guess rates,
presumably because these models misclassify some highly imprecise responses as
guesses.
Supplementary Methods
Variability in perception
We eliminated the memory demands of the task by keeping the stimulus present
while the participant responded. The stimulus was a single colorful circle, 0.75° in
radius, placed in the center of the display. The participant was asked to select the
color of the dot from a color wheel containing all of the possible colors. Three
participants completed 750 trials of this perceptual task.
Naturally, participants were better at the perceptual task than at the memory
task, and so in a variant of the perceptual task, we added noise to the stimulus so
that the error matched that of the working memory task. This was accomplished by
setting eighty percent of the stimulus’ pixels to either black or white. The pixels
were selected randomly for each participant but held constant across trials to
minimize variability caused by differences in the noise. Three participants
completed 750 trials of this perceptual task with noise.
Variability in memory for specific colors
To measure any contribution of differences in memorability to the variability in
precision observed in the first experiment, we replicated the study and compared
the variability measured in the full data set to that measured in subsets of trials
binned by color. Variability found within one of these subsets cannot be attributed
to differences in the memorability of colors because all of the trials within a bin
were similar in color. Two participants performed 3,120 trials of a color working
memory task with three items.
Variability in memory is not due to eye movements
We performed an experiment on three participants using an EyeLink eye
tracker, in which the stimulus was displayed conditional on 1000 ms of continuous,
uninterrupted fixation. The fixation region was defined by a square (1.35° from
center to edge) centered on the fixation mark. The stimulus display consisted of the
presentation of three colorful circles for 100 ms (see Methods for information on
the visual angle and screen position of the stimuli).
Noiseless uneven allocation
To detect whether our analysis is sensitive to trade-offs due to noiseless uneven
allocation (perhaps as a byproduct of a quantized storage capacity that cannot be
evenly divided among items, e.g., 4 slots among 3 objects), we performed Monte
Carlo simulations for each point on the simplex capable of producing sufficient
variability to explain the present data. The possible allocation policies of the
commodity to n items are defined by the set of points (x1, x2, …, x n), where x i is the
proportion of the commodity dedicated to the ith item and where the x i’s sum to 1.
In the case of n = 3 items, these policies can be visualized in a ternary plot whose
vertices correspond to the three policies where one item is given all of the
commodity, and whose center corresponds to even allocation across the items
(Supplementary Fig. 2a). To convert from the proportion of allocated commodity to
the precision of report error, we used a sample-size model in which the variance of
expected report error is inversely proportional to an item’s allocation 1,38,39. The
proportionality constant was determined by setting the precision for an item under
even allocation to 21.5°, the median precision across participants in Experiment 3.
The sample size model makes it possible to estimate the expected precision under
different allocation policies. Note that since we have defined precision as the square
root of the report error variance, changing the allocation to one item will lead to a
change in precision equivalent to dividing 21.5° by the square root of the change in
allocation. For example, doubling the allocation to an item (66.7%) will lead to an
expected precision of 21.5°/√2 =15.2°.
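The conversion from allocated share to expected precision can be sketched as follows (our helper, with names invented for this example; the 21.5° baseline is the median precision under even three-way allocation reported above):

```python
import math

def precision_under_allocation(share, baseline=21.5, even_share=1.0 / 3.0):
    # Sample-size model: report-error variance is inversely proportional to
    # the share of the commodity an item receives, so precision (error s.d.)
    # scales with 1/sqrt(change in allocation relative to even allocation).
    return baseline / math.sqrt(share / even_share)

# Doubling an item's share (to 2/3) improves precision to 21.5/sqrt(2).
print(precision_under_allocation(2.0 / 3.0))
```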
Rather than make assumptions about the particular allocation policy chosen by
the participant, we simulated data for each of the possible allocation
strategies, in increments of 2%, that are capable of producing the variability
observed in our experiments. (For example, there’s no need to consider the case of
even allocation because it is incapable of producing any variability at all.) For each
of the 963 allocation policies that met these requirements, we simulated response
error data for 16 participants, where each participant makes three responses on
each of 450 trials. We then asked whether sorting by the first response led to an
observable difference on the precision of the second response, and vice versa. If
either result was statistically significant at α = 0.05, we considered it as evidence of
non-independence between the two responses. We repeated this procedure 1,000
times. For the worst-case allocation policy (i.e., the one that was hardest to detect),
we found evidence of non-independence 96.6% of the time. Averaged over all tested
allocation policies, we found evidence of non-independence 99.9% of the time. Thus,
comparing responses within a trial is a powerful test for detecting tradeoffs in
allocation between items.
Noisy even allocation
We performed Monte Carlo simulations to determine whether our analysis
method is sensitive to trade-offs induced by noisy even allocation. We formalize
noisy even allocation through a symmetric Dirichlet distribution on the simplex,
centered at perfect evenness, and whose concentration parameters correspond to
the amount of noise. We simulated 15 levels of noise, ranging from little noise
(concentration parameters of 10, 10, 10) to very high noise (concentration
parameters of 1, 1, 1) (Supplementary Fig. 2b); the levels were evenly spaced in
log units between these extremes. For each noise level,
50 separate simulations were performed. For each simulation, 500 trials (consisting
of 3 items) were independently generated for 16 participants. Precision values for
each item were generated by applying a sample-size model1,38,39 (as in the
noiseless uneven allocation simulations) to allocation vectors drawn from a
Dirichlet distribution. Measurement error for each simulation was generated by
drawing values from a von Mises distribution according to the simulated precision
of each item. We then asked whether sorting responses by whether another
response was an accurate or inaccurate judgment led to an observable difference in
precision. We determined the percentage of simulations at each noise level that
showed evidence of a trade-off for at least one of the six possible sortings. At low
noise there is no measurable variability and consequently trade-offs are rarely
detected. With increased noise, variability increases and trade-offs are easier to
detect. Importantly, the likelihood of detecting tradeoffs rises above 95% well
before the mean estimate of variability reaches 8.5° (the estimate of the variability
in precision in the observed data) (Supplementary Figure 3).
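The core of the noisy-allocation simulation can be sketched as follows (our toy version, with invented helper names; it converts Dirichlet-sampled shares to precision via the same sample-size model and shows that lower concentration parameters produce more variable precision):

```python
import numpy as np

rng = np.random.default_rng(3)

def precision_sd(alpha, n_trials=20_000, baseline=21.5):
    # Noisy even allocation: shares drawn from a symmetric Dirichlet
    # centered on even allocation; smaller alpha means noisier allocation.
    shares = rng.dirichlet([alpha] * 3, size=n_trials)
    # Sample-size model: precision (error s.d.) scales with 1/sqrt(share).
    precision = baseline / np.sqrt(shares * 3.0)
    return precision.std()

# Noisier allocation (lower concentration) = more variable precision.
print(precision_sd(10.0), precision_sd(1.0))
```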
These simulations demonstrate that our method of sorting error distributions by
performance on another item in the same trial is a highly sensitive test for
detecting trade-offs between items, regardless of whether these trade-offs emerge
from a noisy allocation policy or a fixed but uneven allocation policy.
Supplementary References
38. Bonnel, A. M. & Miller, J. Attentional effects on concurrent psychophysical
discriminations: investigations of a sample-size model. Percept. Psychophys. 55,
162–179 (1994).
39. Palmer, J. Attentional limits on the perception and memory of visual
information. J. Exp. Psychol. Hum. Percept. Perform. 16, 332–350 (1990).