
SUPPLEMENTARY INFORMATION

A rational theory of the limitations of working memory and attention

Ronald van den Berg & Wei Ji Ma

Contents

MODEL DETAILS
    Relation between J and κ
    Variable precision
    Expected behavioral loss function by task
    The behavioral loss function drops out when the behavioral error is binary
    Conditions under which optimal precision declines with set size
REFERENCES
SUPPLEMENTARY FIGURES
MODEL DETAILS
Relation between J and κ

We measure encoding precision as Fisher information, denoted J. As derived in earlier work [1], the mapping between J and the concentration parameter κ of a Von Mises encoding noise distribution is $J = \kappa \frac{I_1(\kappa)}{I_0(\kappa)}$, where $I_1$ is the modified Bessel function of the first kind of order 1. Larger values of J map to larger values of κ, corresponding to narrower noise distributions.
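To make the mapping concrete, the following sketch (our illustration, not the authors' code; it assumes SciPy is available) computes J from κ and numerically inverts the relation, which plays the role of the mapping F(·) used later in the delayed-estimation model.

```python
# Illustrative sketch of the J-kappa relation and its numerical inverse (assumes SciPy).
from scipy.special import i0e, i1e
from scipy.optimize import brentq

def J_from_kappa(kappa):
    """Fisher information J = kappa * I1(kappa) / I0(kappa) of Von Mises noise.
    Exponentially scaled Bessel functions are used to avoid overflow at large kappa."""
    return kappa * i1e(kappa) / i0e(kappa)

def kappa_from_J(J, kappa_max=1e4):
    """Numerically invert J(kappa); this is the mapping F(.) referred to in the text."""
    return brentq(lambda k: J_from_kappa(k) - J, 1e-12, kappa_max)
```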
Variable precision
In all our models, we incorporated variability in precision [2,3] by drawing the precision for each encoded item independently from a Gamma distribution with mean $\bar{J}$ and scale parameter τ. We denote the distribution of a single precision value by $p(J \mid \bar{J}, \tau)$ and the joint distribution of the precision values of all N items in a display by $p(\mathbf{J} \mid \bar{J}, \tau) = \prod_{i=1}^{N} p(J_i \mid \bar{J}, \tau)$.
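As an illustration (not the authors' code), precision values with this distribution can be drawn as follows; note that NumPy's Gamma parameterization with shape $\bar{J}/\tau$ and scale τ gives mean $\bar{J}$, as required.

```python
# Illustrative sketch: draw one precision value per item from Gamma(mean=Jbar, scale=tau).
import numpy as np

def draw_precisions(Jbar, tau, N, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # shape * scale = (Jbar / tau) * tau = Jbar, so the mean equals Jbar as stated in the text
    return rng.gamma(shape=Jbar / tau, scale=tau, size=N)
```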
Expected behavioral loss function by task
As a consequence of variability in precision, computation of expected behavioral loss requires integration over both the behavioral error, ε, and the vector of precision values, $\mathbf{J}$,

$$\bar{L}_{\text{behavioral}}\left(\bar{J}, N\right) = \begin{cases} \displaystyle\sum_{\varepsilon} \int_{0}^{\infty} L_{\text{behavioral}}(\varepsilon)\, p(\varepsilon \mid \mathbf{J}, N)\, p(\mathbf{J} \mid \bar{J}, \tau)\, d\mathbf{J} & \text{if } \varepsilon \text{ is discrete,} \\[2ex] \displaystyle\int_{0}^{\infty}\!\!\int_{0}^{\infty} L_{\text{behavioral}}(\varepsilon)\, p(\varepsilon \mid \mathbf{J}, N)\, p(\mathbf{J} \mid \bar{J}, \tau)\, d\mathbf{J}\, d\varepsilon & \text{if } \varepsilon \text{ is continuous.} \end{cases}$$

The distribution of precision, $p(\mathbf{J} \mid \bar{J}, \tau)$, is the same in all models, but $L_{\text{behavioral}}(\varepsilon)$ and $p(\varepsilon \mid \mathbf{J}, N)$ are task-specific. We next specify these two components separately for each task.
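Because this integral generally has no closed form, it can be approximated by Monte Carlo sampling. The sketch below is our illustration only: `sample_error` and `loss` stand for the task-specific components defined in the following subsections, and `draw_precisions` is the sampler from the previous sketch.

```python
# Illustrative Monte Carlo approximation of the expected behavioral loss.
import numpy as np

def expected_behavioral_loss(Jbar, tau, N, sample_error, loss, n_samples=10000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(n_samples):
        J = draw_precisions(Jbar, tau, N, rng)   # J ~ p(J | Jbar, tau), see earlier sketch
        eps = sample_error(J, rng)               # eps ~ p(eps | J, N), task-specific
        total += loss(eps)                       # L_behavioral(eps), task-specific
    return total / n_samples
```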
Delayed estimation. In delayed estimation, the behavioral error only depends on the memory representation of the target item. We assume that this representation is corrupted by Von Mises noise,

$$p(\varepsilon \mid \mathbf{J}, N) = p(\varepsilon \mid J_T) = \frac{1}{2\pi I_0\!\left(F(J_T)\right)}\, e^{F(J_T)\cos\varepsilon},$$

where $J_T$ is the precision of the target item and F(·) maps Fisher information to a concentration parameter κ; we implement this mapping by numerically inverting the mapping specified above. Furthermore, the behavioral loss function is assumed to be a power-law function of the absolute estimation error, $L_{\text{behavioral}}(\varepsilon) = |\varepsilon|^{\beta}$, where β > 0 is a free parameter.
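For delayed estimation, the two task-specific components can be sketched as follows (again our illustration, reusing `kappa_from_J` and `expected_behavioral_loss` from the earlier sketches; treating item 0 as the target is our convention).

```python
# Illustrative task-specific components for delayed estimation.
import numpy as np

def sample_error_delayed_estimation(J, rng):
    kappa = kappa_from_J(J[0])        # precision of the target item (item 0 by our convention)
    return rng.vonmises(0.0, kappa)   # Von Mises estimation error centered on the true value

def make_power_law_loss(beta):
    return lambda eps: np.abs(eps) ** beta   # L_behavioral(eps) = |eps|^beta
```

For example, `expected_behavioral_loss(Jbar=5.0, tau=1.0, N=4, sample_error=sample_error_delayed_estimation, loss=make_power_law_loss(2.0))` approximates the expected squared estimation error under these assumptions.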
Change detection. We assume that subjects report “change present” whenever the posterior ratio for a change exceeds 1,

$$\frac{p(\text{change present} \mid \mathbf{x}, \mathbf{y})}{p(\text{change absent} \mid \mathbf{x}, \mathbf{y})} > 1,$$

where $\mathbf{x}$ and $\mathbf{y}$ denote the vectors of noisy measurements of the stimuli in the first and second displays, respectively. Under the Von Mises assumption, this rule evaluates to [4]

$$\frac{p_{\text{change}}}{1 - p_{\text{change}}}\, \frac{1}{N} \sum_{i=1}^{N} \frac{I_0(\kappa_{x,i})\, I_0(\kappa_{y,i})}{I_0\!\left(\sqrt{\kappa_{x,i}^2 + \kappa_{y,i}^2 + 2\kappa_{x,i}\kappa_{y,i}\cos(y_i - x_i)}\right)} > 1,$$

where $p_{\text{change}}$ is a free parameter representing the subject’s prior belief that a change will occur, and $\kappa_{x,i}$ and $\kappa_{y,i}$ denote the concentration parameters of the Von Mises distributions associated with the observations of the stimuli at the i-th location in the first and second displays, respectively.
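This decision rule can be evaluated on a single trial as in the following sketch (our illustration; `x`, `y`, `kappa_x`, and `kappa_y` are length-N arrays, and SciPy's `i0` is the modified Bessel function of order 0).

```python
# Illustrative evaluation of the change-detection decision rule for one trial.
import numpy as np
from scipy.special import i0

def report_change(x, y, kappa_x, kappa_y, p_change):
    kappa_c = np.sqrt(kappa_x**2 + kappa_y**2 + 2 * kappa_x * kappa_y * np.cos(y - x))
    # local evidence for a change at each location
    # (for very large kappa, SciPy's exponentially scaled i0e would be numerically safer)
    d = i0(kappa_x) * i0(kappa_y) / i0(kappa_c)
    return p_change / (1 - p_change) * d.mean() > 1   # report "change present" if posterior ratio > 1
```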
The behavioral error, ε, takes only two values in this task: correct and incorrect. We assume that observers map each of these values to a loss value,

$$L_{\text{behavioral}}(\varepsilon) = \begin{cases} L_{\text{incorrect}} & \text{if } \varepsilon \text{ is ``incorrect''}, \\ L_{\text{correct}} & \text{if } \varepsilon \text{ is ``correct''}. \end{cases}$$

For example, an observer might assign a loss of 0 to any correct decision and a loss of 1 to any incorrect decision. The expected behavioral loss is a weighted sum of $L_{\text{incorrect}}$ and $L_{\text{correct}}$,

$$\bar{L}_{\text{behavioral}}\left(\bar{J}, N\right) = p_{\text{correct}}\left(\bar{J}, N\right) L_{\text{correct}} + \left(1 - p_{\text{correct}}\left(\bar{J}, N\right)\right) L_{\text{incorrect}},$$

where $p_{\text{correct}}(\bar{J}, N)$ is the probability of a correct decision. This probability is not analytic, but can easily be approximated using Monte Carlo simulations.
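A sketch of such a Monte Carlo approximation for change detection is given below. It is our illustration only and makes several assumptions not specified here: circular stimuli drawn uniformly, a change on a proportion `p_change` of trials with the changed value redrawn uniformly, and an observer whose prior matches the true change probability. It reuses `draw_precisions`, `kappa_from_J`, and `report_change` from the earlier sketches.

```python
# Illustrative Monte Carlo estimate of p_correct for whole-display change detection.
import numpy as np

def p_correct_change_detection(Jbar, tau, N, p_change=0.5, n_trials=5000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n_correct = 0
    for _ in range(n_trials):
        s1 = rng.uniform(-np.pi, np.pi, N)                    # first-display stimuli
        s2 = s1.copy()
        change = rng.random() < p_change
        if change:
            s2[rng.integers(N)] = rng.uniform(-np.pi, np.pi)  # change one randomly chosen item
        kx = np.array([kappa_from_J(J) for J in draw_precisions(Jbar, tau, N, rng)])
        ky = np.array([kappa_from_J(J) for J in draw_precisions(Jbar, tau, N, rng)])
        x = rng.vonmises(s1, kx)                              # noisy measurements, first display
        y = rng.vonmises(s2, ky)                              # noisy measurements, second display
        n_correct += (report_change(x, y, kx, ky, p_change) == change)
    return n_correct / n_trials
```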
Change localization. Expected behavioral loss is computed in the same way as in the change-detection task, except that a different decision rule must be used to compute $p_{\text{correct}}(\bar{J}, N)$. As shown in earlier work [3], the Bayes-optimal rule for the change-localization task is to report the location that maximizes

$$\frac{I_0(\kappa_{x,i})\, I_0(\kappa_{y,i})}{I_0\!\left(\sqrt{\kappa_{x,i}^2 + \kappa_{y,i}^2 + 2\kappa_{x,i}\kappa_{y,i}\cos(y_i - x_i)}\right)},$$

where all terms are defined in the same way as in the model for the change-detection task.
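A minimal sketch of this rule (our illustration) simply reports the location with the largest local evidence for a change:

```python
# Illustrative change-localization rule: report the location with maximal evidence for a change.
import numpy as np
from scipy.special import i0

def localize_change(x, y, kappa_x, kappa_y):
    kappa_c = np.sqrt(kappa_x**2 + kappa_y**2 + 2 * kappa_x * kappa_y * np.cos(y - x))
    d = i0(kappa_x) * i0(kappa_y) / i0(kappa_c)
    return int(np.argmax(d))
```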
Visual search. The expected behavioral loss in the model for visual search is also computed in the same way as in the model for change detection, again with the only difference being the decision rule used to compute $p_{\text{correct}}(\bar{J}, N)$. The Bayes-optimal rule for this task is to report “target present” when

$$\frac{p_{\text{present}}}{1 - p_{\text{present}}}\, \frac{1}{N} \sum_{i=1}^{N} \frac{I_0(\kappa_D)\, e^{\kappa_i \cos(x_i - s_T)}}{I_0\!\left(\sqrt{\kappa_i^2 + \kappa_D^2 + 2\kappa_i \kappa_D \cos(x_i - s_T)}\right)} > 1,$$

where $p_{\text{present}}$ is the subject’s prior belief that the target will be present, $\kappa_D$ the concentration parameter of the distribution from which the distractors are drawn, $\kappa_i$ the concentration parameter of the noise distribution associated with the stimulus at location i, $x_i$ the noisy observation of the stimulus at location i, and $s_T$ the value of the target (see [5] for a derivation).
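A sketch of this rule (our illustration; `x` and `kappa` are length-N arrays of noisy observations and their concentration parameters, and `kappa_D`, `s_T`, and `p_present` are assumed known to the observer):

```python
# Illustrative visual-search decision rule: report "target present" if the posterior ratio exceeds 1.
import numpy as np
from scipy.special import i0

def report_target_present(x, kappa, kappa_D, s_T, p_present):
    kappa_c = np.sqrt(kappa**2 + kappa_D**2 + 2 * kappa * kappa_D * np.cos(x - s_T))
    d = i0(kappa_D) * np.exp(kappa * np.cos(x - s_T)) / i0(kappa_c)   # local evidence for the target
    return p_present / (1 - p_present) * d.mean() > 1
```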
The behavioral loss function drops out when the behavioral error is binary
When the behavioral error ε takes only two values, the behavioral loss can also take only two values. The integral in the expected behavioral loss (Eq. (2) in the main text) then simplifies to a sum of two terms,

$$\begin{aligned} \bar{L}_{\text{behavioral}}\left(\bar{J}, N\right) &= p_{\text{correct}}\left(\bar{J}, N\right) L_{\text{correct}} + \left(1 - p_{\text{correct}}\left(\bar{J}, N\right)\right) L_{\text{incorrect}} \\ &= p_{\text{correct}}\left(\bar{J}, N\right)\left(L_{\text{correct}} - L_{\text{incorrect}}\right) + L_{\text{incorrect}}. \end{aligned}$$

The optimal (loss-minimizing) value of $\bar{J}$ is then

$$\begin{aligned} \bar{J}_{\text{optimal}}(N) &= \operatorname*{argmin}_{\bar{J}} \left[ p_{\text{correct}}\left(\bar{J}, N\right)\left(L_{\text{correct}} - L_{\text{incorrect}}\right) + L_{\text{incorrect}} + \lambda \bar{L}_{\text{neural}}\left(\bar{J}, N\right) \right] \\ &= \operatorname*{argmin}_{\bar{J}} \left[ p_{\text{correct}}\left(\bar{J}, N\right) \Delta L + \lambda \bar{L}_{\text{neural}}\left(\bar{J}, N\right) \right], \end{aligned}$$

where ΔL ≡ L_correct − L_incorrect. Since ΔL and λ have interchangeable effects on $\bar{J}_{\text{optimal}}$, we fix ΔL to −1 and fit only λ as a free parameter.
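As an illustration of how $\bar{J}_{\text{optimal}}(N)$ can be obtained numerically, the sketch below minimizes the bracketed expression with ΔL fixed to −1. It is our illustration only: the neural loss term is a stand-in (we assume, purely for concreteness, $\bar{L}_{\text{neural}}(\bar{J}, N) = N\bar{J}$), and `p_correct_fn` is any approximation of $p_{\text{correct}}(\bar{J}, N)$, for instance a wrapper around the Monte Carlo estimate above.

```python
# Illustrative numerical search for Jbar_optimal(N), with Delta_L = -1 and an assumed neural loss N*Jbar.
from scipy.optimize import minimize_scalar

def optimal_Jbar(p_correct_fn, N, lam, Jbar_max=50.0):
    # expected total loss: -p_correct(Jbar, N) + lambda * N * Jbar (constant terms dropped)
    total_loss = lambda Jbar: -p_correct_fn(Jbar, N) + lam * N * Jbar
    res = minimize_scalar(total_loss, bounds=(1e-3, Jbar_max), method="bounded")
    return res.x
```

In practice one would pass something like `lambda Jbar, N: p_correct_change_detection(Jbar, tau=1.0, N=N)`; note that Monte Carlo noise in `p_correct_fn` may require smoothing, larger trial counts, or a fixed random seed for the minimization to behave well.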
Conditions under which optimal precision declines with set size
In this section, we show that when the expected behavioral loss is independent of set size (as in delayed estimation, but also single-probe change detection), the rational model predicts optimal precision to decline with set size whenever the following four conditions are satisfied:

1) Expected behavioral loss is a strictly decreasing function of encoding precision, i.e., an increase in precision results in an increase in performance.

2) Expected behavioral loss is subject to a law of diminishing returns [6]: the behavioral benefit obtained from a unit increase in precision decreases with precision. This law will hold when condition 1 holds and the loss function is bounded from below, which is generally the case as errors cannot be negative.

3) Expected neural loss is an increasing function of encoding precision.

4) Expected neural loss is subject to a law of increasing loss: the amount of loss associated with a unit increase in precision increases with precision. This condition amounts to stating that the loss per spike must either be constant or increase with spike rate, which has been found to be generally the case [7].
These conditions translate to the following constraints on the first and second derivatives of the expected loss functions:

1. $\bar{L}'_{\text{behavioral}}\left(\bar{J}\right) < 0$
2. $\bar{L}''_{\text{behavioral}}\left(\bar{J}\right) > 0$
3. $\bar{L}'_{\text{neural}}\left(\bar{J}\right) > 0$
4. $\bar{L}''_{\text{neural}}\left(\bar{J}\right) \geq 0.$
The loss-minimizing value of precision is found by setting the derivative of the expected total loss function to 0,

$$0 = \bar{L}'_{\text{total}}\left(\bar{J}\right) = \bar{L}'_{\text{behavioral}}\left(\bar{J}\right) + \lambda N \bar{L}'_{\text{neural}}\left(\bar{J}\right),$$

which is equivalent to

$$-\frac{\bar{L}'_{\text{behavioral}}\left(\bar{J}\right)}{\bar{L}'_{\text{neural}}\left(\bar{J}\right)} = \lambda N. \tag{S1}$$
The left-hand side is strictly positive for any $\bar{J}$, because of constraints 1 and 3 above. In addition, it is a strictly decreasing function of $\bar{J}$, because

$$\frac{d}{d\bar{J}}\, \frac{\bar{L}'_{\text{behavioral}}\left(\bar{J}\right)}{\bar{L}'_{\text{neural}}\left(\bar{J}\right)} = \frac{\bar{L}''_{\text{behavioral}}\left(\bar{J}\right) \bar{L}'_{\text{neural}}\left(\bar{J}\right) - \bar{L}'_{\text{behavioral}}\left(\bar{J}\right) \bar{L}''_{\text{neural}}\left(\bar{J}\right)}{\left(\bar{L}'_{\text{neural}}\left(\bar{J}\right)\right)^{2}}$$

is necessarily greater than 0 due to the four constraints specified above, so that the ratio $\bar{L}'_{\text{behavioral}}(\bar{J})/\bar{L}'_{\text{neural}}(\bar{J})$ increases with $\bar{J}$ and its negative, the left-hand side of Eq. (S1), therefore decreases. As illustrated in Supplementary Figure S2, Eq. (S1) can be interpreted as the intersection point between the function specified by its left-hand side (solid curve) and a flat line at a value λN (dashed lines). The value of $\bar{J}$ at which this intersection occurs (i.e., $\bar{J}_{\text{optimal}}$) necessarily decreases with N.
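The argument can be checked numerically with any pair of loss functions satisfying constraints 1–4. The sketch below uses made-up example losses (our illustration, not the model's actual loss functions): $\bar{L}_{\text{behavioral}}(\bar{J}) = 1/(1+\bar{J})$ and $\bar{L}_{\text{neural}}(\bar{J}) = \bar{J}$, for which the left-hand side of Eq. (S1) equals $1/(1+\bar{J})^2$.

```python
# Illustrative numerical check of Eq. (S1): Jbar_optimal decreases with N for example losses
# L_behavioral(J) = 1/(1+J) (decreasing, convex) and L_neural(J) = J (increasing, linear).
from scipy.optimize import brentq

lam = 0.01
lhs = lambda J: 1.0 / (1.0 + J) ** 2           # -L'_behavioral(J) / L'_neural(J) for these losses
for N in (2, 4, 6, 8):
    J_opt = brentq(lambda J: lhs(J) - lam * N, 1e-9, 1e6)
    print(N, round(J_opt, 2))                  # prints a decreasing sequence of Jbar_optimal values
```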
When expected behavioral loss does depend on set size (such as in whole-array change detection or change localization), the proof above does not apply, and we were not able to extend it to this domain.
REFERENCES
1. Keshvari, S., van den Berg, R. & Ma, W. J. Probabilistic computation in human perception under variability in encoding precision. PLoS One 7 (2012).
2. Fougnie, D., Suchow, J. W. & Alvarez, G. A. Variability in the quality of visual working memory. Nat. Commun. 3, 1229 (2012).
3. van den Berg, R., Shin, H., Chou, W.-C., George, R. & Ma, W. J. Variability in encoding precision accounts for visual short-term memory limitations. Proc. Natl. Acad. Sci. U.S.A. 109, 8780–8785 (2012).
4. Keshvari, S., van den Berg, R. & Ma, W. J. No evidence for an item limit in change detection. PLoS Comput. Biol. 9 (2013).
5. Mazyar, H., van den Berg, R., Seilheimer, R. L. & Ma, W. J. Independence is elusive: Set size effects on encoding precision in visual search. J. Vis. 13, 1–14 (2013).
6. Mankiw, N. G. Principles of economics (2004).
7. Sterling, P. & Laughlin, S. Principles of neural design (MIT Press, 2015).
8. Anderson, D. E. & Awh, E. The plateau in mnemonic resolution across large set sizes indicates discrete resource limits in visual working memory. Atten. Percept. Psychophys. 74, 891–910 (2012).
9. Anderson, D. E., Vogel, E. K. & Awh, E. Precision in visual working memory reaches a stable plateau when individual item limits are exceeded. J. Neurosci. 31, 1128–1138 (2011).
10. Rademaker, R. L., Tredway, C. H. & Tong, F. Introspective judgments predict the precision and likelihood of successful maintenance of visual working memory. J. Vis. 12, 21 (2012).
SUPPLEMENTARY FIGURES
[Figure S1 shows four panels (Anderson & Awh 2012, 180 deg; Anderson & Awh 2012, 360 deg; Anderson et al. 2011, Exp 1; Rademaker et al. 2012), plotting circular variance (top row) and circular kurtosis (bottom row) against set size (2–8), with data and model fits.]

Supplementary Figure S1. Fits to the three delayed-estimation benchmark data sets that were excluded from the main analyses [8–10]. Circular variance (top) and circular kurtosis (bottom) of the estimation error distributions as a function of set size, split by experiment. Error bars and shaded areas represent 1 s.e.m. across subjects. The first three datasets were excluded from the main analyses on the grounds that they were published in papers that were later retracted (Anderson & Awh, 2012; Anderson et al., 2011). The Rademaker et al. (2012) dataset was excluded from the main analyses because it contains only two set sizes, which makes it less suitable for a fine-grained study of the relationship between encoding precision and set size.
[Figure S2 shows the left-hand side of Eq. (S1) plotted against mean encoding precision $\bar{J}$, together with flat lines at λN for N = 2, 4, 6, and 8; the intersection points give $\bar{J}_{\text{optimal}}(2) = 6.1$, $\bar{J}_{\text{optimal}}(4) = 3.2$, $\bar{J}_{\text{optimal}}(6) = 2.1$, and $\bar{J}_{\text{optimal}}(8) = 1.7$.]

Supplementary Figure S2. Graphical illustration of Eq. (S1). The value of $\bar{J}$ at which the equality described by Eq. (S1) holds is the intersection point between the function specified by the left-hand side (red curve) and a flat line at a value λN. Since the left-hand side is strictly positive and also a strictly decreasing function of $\bar{J}$, the value at which this intersection occurs (i.e., $\bar{J}_{\text{optimal}}$) necessarily decreases with N.