Learning the Exception to the Rule: Model-Based

Supplementary Material for
Learning the Exception to the Rule: Model-Based fMRI Reveals
Specialized Representations for Surprising Category Members
Tyler Davis*, Bradley C. Love, Alison R. Preston
*To whom correspondence should be addressed. Email: [email protected]
This file includes:
Tables S1 to S5
Supplementary Methods
1
Supplemental Tables
Table S1. Regions correlated with the recognition strength measure from SUSTAIN during the
stimulus presentation period.
Region
Hemisphere
Z
Anterior Cingulate Cortex
Caudate
Cerebellum
R
R
L
Fusiform Gyrus
L
R
3.24
4.42
4.05
4.40
3.59
3.50
3.44
3.88
3.25
3.52
4.26
3.42
3.88
3.90
3.62
4.25
4.53
3.71
3.33
5.40
4.15
3.31
4.52
4.03
3.33
3.88
4.50
3.81
4.28
3.78
4.44
3.75
4.87
4.19
3.73
4.27
4.56
4.01
4.17
4.06
4.71
3.95
3.37
Frontal Pole
Hippocampus
Inferior Frontal Gyrus—Pars Triangularis
Inferior Temporal Gyrus
Insula
Intracalcerine Cortex
Midbrain
Middle Frontal Gyrus
L
R
L
R
L
R
L
L
L
R
Bilateral
L
R
Lateral Occipital Cortex—Inferior Division
Lateral Occipital Cortex—Superior Division
R
L
R
Nucleus Accumbens
Occipital Pole
Orbital Frontal Cortex
Pallidum
Paracingulate Gyrus
Precuneus
Posterior Cingulate Gyrus
Supramarginal Gyrus
Superior Frontal Gyrus
Superior Parietal Lobule
Thalamus
L
R
R
R
L
R
R
R
L
R
R
L
R
MNI coordinates
(x, y, z)
12, 28, 26
10, 10, -2
-8, -22, -30
-22, -58, -36
0, -62, -34
-6, -76, -24
-22, -86, -6
40, -58, -10
38, -44, -18
-44, 40, 32
50, 28, 24
-24, -32, -6
20, -38, 2
-52, 24, 18
48, 28, 6
-40, -58, -6
-32, 16, -10
-16, -72, 6
14, -68, 16
0, -24, -24
-36, 4, 42
-38, 20, 28
48, 14, 32
46, 6, 52
30, 2, 42
44, -86, 12
-26, -60, 40
-26, -76, 52
30, -66, 32
18, -62, 52
-8, 12, -4
16, -88, 2
34, 28, -6
12, -6, -6
-4, 10, 52
4, 26, 44
14, -70, 36
8, -18, 30
-42, -44, 44
12, 12, 58
32, -50, 42
-12, -10, -4
20, -24, 8
2
Table S2. Regions correlated with the error correction measure from SUSTAIN during the
feedback period.
Region
Angular Gyrus
Anterior Cingulate Cortex
Caudate
Cerebellum
Fusiform Gyrus
Frontal Pole
Hippocampus
Inferior Frontal Gyrus—Pars Opercularis
Inferior Frontal Gyrus—Pars Triangularis
Insula
Juxtapositional Lobule
Midbrain
Middle Frontal Gyrus
Middle Temporal Gyrus
Lateral Occipital Cortex—Inferior Division
Lateral Occipital Cortex—Superior Division
Hemisphere
Z
L
R
R
L
R
L
L
R
L
R
L
R
L
R
R
L
L
R
R
L
R
L
L
R
4.55
4.05
4.89
3.32
3.95
3.94
3.73
4.36
4.12
3.62
3.42
4.76
4.24
4.47
3.29
4.83
4.17
5.00
5.24
4.08
3.36
4.15
3.36
5.4
4.69
3.76
4.45
3.95
3.40
4.76
4.89
3.89
4.63
4.55
3.99
4.47
5.07
4.24
3.70
4.01
3.99
3.94
3.60
L
Orbital Frontal Cortex
Paracingulate Gyrus
R
R
Precuneus
Precentral Gyrus
Posterior Cingulate Gyrus
R
L
R
L
R
R
L
L
R
L
Supramarginal Gyrus
Superior Frontal Gyrus
Superior Temporal Gyrus
Superior Parietal Lobule
Thalamus
MNI coordinates
(x, y, z)
-62, -56, 14
2, 6, 26
10, 6, 2
-14, 8, 6
6, -56, -38
-18, -76, -28
-30, -76, -28
34, -48, -16
-32, -50, -12
32, 48, 4
-32, 54, 0
24, -40, 0
-44, 22, 20
48, 26, 10
30, 12, -18
-32, 18, -8
-14, 10, 46
4, -28, -6
36, 20, 32
-40, 12, 58
72, -30, -4
-50, -28, -6
-50, -69, 10
24, -60, 50
36, -74, 38
26, 78, 50
-34, -66, 60
-22, -78, 54
-28, -86, 54
38, 22, -6
10, 38, 22
14, 22, 38
14, -64, 34
-36, 6, 26
8, -26, 26
-8, -30, 26
42, -46, 40
12, 16, 54
-50, 2, -16
-32, -54, 38
10, -14, 8
-10, -8, 0
-10, -26, 6
3
Table S3. Regions correlated with the error correction measure of ALCOVE during the feedback
period.
Region
Hemisphere
Z
Angular Gyrus
Anterior Cingulate Cortex
L
R
Caudate
Cerebellum
R
R
3.69
4.39
3.39
4.20
4.48
3.49
3.36
4.70
4.03
4.43
4.03
4.96
3.60
4.67
3.63
3.71
4.46
4.93
4.70
4.19
3.74
3.79
3.45
4.68
5.36
4.57
5.57
4.20
4.67
3.91
4.72
3.92
5.32
4.74
4.43
3.99
3.47
3.23
4.91
5.11
4.68
4.85
4.55
3.45
4.81
4.68
3.92
L
Fusiform Gyrus
Frontal Pole
Hippocampus
Inferior Frontal Gyrus—Pars Opercularis
R
L
R
L
R
R
L
Inferior Occipital Gyrus
Inferior Temporal Gyrus
R
R
Juxtapositional Lobule
Midbrain
Middle Temporal Gyrus
L
R
L
R
L
R
Lateral Occipital Cortex—Inferior Division
Lateral Occipital Cortex—Superior Division
L
L
R
Middle Frontal Gyrus
L
Orbital Frontal Cortex
Pallidum
Paracingulate Gyrus
Planum Polare
Precuneus
R
L
L
R
L
R
R
L
MNI coordinates
(x, y, z)
-40, -54, 24
4, 6, 26
10, 44, 6
10, 8, 2
0, -58, -28
30, -66, -28
10, -76, -22
-2, -46, -14
-16, -36, -26
-14, -74, -28
-32, -74, -30
34, -50, -18
40, -12, -36
-30, -52, -12
32, 48, 4
-24, 52, 14
24, -20, 0
46, 24, 10
-44, 22, 20
-46, 24, -4
44, -80, 12
48, -54, -10
56, -20, -32
-14, 8, 48
4, -28, -4
-12, -16, -10
34, 18, 34
-42, 12, 58
48, -18, -12
62, -30, -6
-50, -26, -6
-46, -78, 12
22, -58, 50
30, -76, 40
-20, -72, 40
-32, -62, 56
-28, -86, 34
-32, -84, 18
40, 20, -8
-24, 22, -10
-12, 2, 2
10, 40, 22
8, 26, 34
-8, 44, 12
48, 0, -16
14, -66, 34
-18, -60, -36
4
Table S3. Continued
Region
Hemisphere
Z
Precentral Gyrus
R
L
Posterior Cingulate Gyrus
R
3.49
4.85
4.67
4.25
4.66
4.90
4.51
3.37
4.75
3.17
4.01
3.80
3.69
3.55
4.76
Supramarginal Gyrus
Superior Frontal Gyrus
Superior Temporal Gyrus
Superior Parietal Lobule
Temporal Pole
Thalamus
L
R
L
R
L
L
L
R
R
MNI coordinates
(x, y, z)
36, -8, 44
-48, -4, -20
-42, 0, 32
2, -24, -22
10, -26, 28
-8, -28, 26
42, -44, 40
-44, -34, 38
12, 16, 56
-24, -4, 54
-4, 46, 32
-66, -40, 4
-32, -54, 38
38, 4, -40
8, -14, 6
5
Table S4. Regions demonstrating greater activation for correctly categorized exception items
relative to correct rule-following items during stimulus presentation.
Region
Anterior Cingulate Cortex
Caudate
Cerebellum
Hemisphere
Z
L
R
L
R
3.57
5.98
5.93
4.74
4.08
3.97
4.81
4.21
3.87
3.76
3.73
3.53
4.18
4.37
4.37
4.25
4.72
4.31
3.92
3.14
3.73
3.40
4.67
4.64
4.09
4.91
4.48
MNI coordinates
(x, y, z)
-8, 40, 16
8, 8, -2
-8, 12, -2
36, -56, -32
14, -50, -16
26, -66, -22
-26, -64, -30
-6, -76, -24
-4, -60, -34
-42, -60, -38
0, -66, -8
-24, -46, -30
40, -50, -10
-20, -84, -8
-30, -70, -10
-42, 20, 6
48, 38, 24
42, 58, 12
26, 50, -16
40, 52, -6
-18, 50, -20
-14, 66, -14
26, -26, -8
18, -36, 4
48, 28, 8
-58, 26, 20
-40, -58, -6
5.78
4.12
4.65
5.70
5.18
4.81
5.29
5.10
4.65
4.37
3.34
3.65
3.92
5.10
5.72
3.77
4.82
4.67
3.71
3.70
5.59
4.75
34, 24, -4
14, -78, 14
-10, -78, 12
2, -20, -12
44, 16, 30
36, 6, 40
-42, 4, 50
-36, 8, 32
-40, 30, 20
-34, 2, 64
62, -34, -10
-54, -34, -14
36, -84, 4
32, -66, 32
-28, -62, 40
-18, -64, 54
30, -56, 2
18, -90, 2
6, -94, 14
-12, -96, 0
-32, 24, -6
22, -8, -4
L
Fusiform Gyrus
R
L
Frontal Operculum Cortex
Frontal Pole
L
R
L
Hippocampus
R
Inferior Frontal Gyrus—Pars Triangularis
R
R
L
Inferior Temporal Gyrus
Insula
Intracalcerine Cortex
Midbrain
Middle Frontal Gyrus
R
R
L
R
R
L
Middle Temporal Gyrus
Lateral Occipital Cortex—Inferior Division
Lateral Occipital Cortex—Superior Division
R
L
R
R
L
Lingual Gyrus
Occipital Pole
R
R
Orbital Frontal Cortex
Pallidum
L
L
R
6
Table S4. Continued
Region
Hemisphere
Z
Paracingulate Gyrus
R
Precuneus
R
L
R
L
R
R
L
L
L
5.88
4.60
4.41
5.29
5.86
3.57
3.66
4.23
5.54
5.38
3.98
4.05
3.24
Precentral Gyrus
Posterior Cingulate Gyrus
Superior Parietal Lobule
Temporal Pole
Thalamus
MNI coordinates
(x, y, z)
6, 30, 40
6, 10, 54
10, 40, 20
16, -70, 44
-6, -74, 42
32, -18, 52
-58, 6, 38
4, -30, 32
34, -50, 42
-36, -46, 40
-54, 18, -12
-10, -20, 12
-20, -32, -2
7
Table S5. Regions correlated with the recognition strength measure from SUSTAIN, controlling
for the effect of reaction time.
Region
Caudate
Cerebellum
Frontal Pole
Fusiform Gyrus
Hippocampus
Inferior Frontal Gyrus—Pars Triangularis
Inferior Temporal Gyrus
Insula
Intracalcerine Cortex
Lateral Occipital Cortex—Inferior Division
Lateral Occipital Cortex—Superior Division
Lingual Gyrus
Midbrain
Middle Frontal Gyrus
Nucleus Accumbens
Occipital Pole
Orbital Frontal Cortex
Pallidum
Paracingulate Gyrus
Posterior Cingulate Gyrus
Precuneus
Superior Frontal Gyrus
Superior Parietal Lobule
Supramarginal Gyrus
Thalamus
Hemisphere
Z
R
L
L
L
L
L
R
R
R
R
L
R
R
L
R
L
L
L
L
R
4.42
4.05
4.40
3.61
3.50
3.52
4.26
3.88
3.44
3.25
3.44
3.88
3.77
3.42
3.62
3.90
4.25
3.95
4.53
3.24
MNI coordinates
(x, y, z)
10 , 10, -2
-8, -22, -30
-22, -58, -36
-32, -62, -32
-6, -76, -24
-44, 40, 32
50, 38, 24
40, -58, -10
32, -52, -16
38, -44, -18
-22, -86, -6
20, -38, 2
26, -28, -8
-24, -32, -6
48, 28, 6
-52 , 24, 18
-40, -58, -6
-22, 46, -28
-32, 16, -10
18, -78, 6
L
L
R
R
L
R
L
R
R
Bilateral
L
L
R
R
R
R
L
R
R
R
R
3.71
3.48
3.33
3.88
4.50
4.28
3.81
3.78
3.51
5.40
4.15
3.31
4.52
4.12
4.03
3.33
4.44
3.75
4.87
4.19
4.27
-16, -72, 6
-10, -78, 14
14, -68, 16
44, -86, 12
-26, -60, 40
30, -66, 32
-26, -76, 52
18, -62, 52
30, -58, 0
0, -24, -14
-36, 4, 42
-38, 20, 28
48, 14, 32
36, 6, 36
46, 6, 52
30, 2, 42
-8, 12, -4
16, -88, 2
34, 28, -6
12, -6, -6
4, 26, 44
L
L
R
R
R
R
L
L
R
3.73
3.58
3.99
4.56
4.06
4.71
4.17
3.95
3.37
-4, 10, 52
-8, 18, 48
8, -18, 30
14, -70, 36
12, 12, 58
32, -50, 42
-42, -44, 44
-12, -10, 4
20, -24, 8
8
Table S6. Abstract category structure. Each row represents a unique stimulus (i.e., beetle). The
four values assigned to a stimulus denote the four stimulus dimensions (e.g., antenna, legs, etc.)
assigned to a beetle. Each numeric value (1 or 2) represents a specific feature instantiation (e.g.,
red or green eyes). The first dimension represents the rule-relevant dimension. Most Hole A
beetles have a 1 on the first dimension (e.g., red eyes) whereas most Hole B beetles have a 2
(e.g., green eyes). The first stimulus in each of the columns is therefore an exception.
Hole A Beetles
Hole B Beetles
2 2 2 2*
1 2 2 2*
1112
2112
1121
2121
1211
2211
*Exception item
Recognition Test Foils
1111
1122
1212
1221
2222
2211
2121
2112
Supplemental Methods: Model Formalism and Procedures
SUSTAIN
Input Stimulus Representation. An input stimulus is represented to SUSTAIN as a pattern of
activation on input units that code the different stimulus features and possible values that these
features can take. For each stimulus feature, i (e.g., a beetle’s eye color), with k possible values
(two in the present experiment; e.g., red or green eyes), there are k input units. Input units are set
to one if the unit represents the feature value or zero otherwise. The entire stimulus is
represented by I posik , with i indicating the stimulus feature and k indicating the value for feature i.
“Pos” indicates that the stimulus is represented as a point in a multidimensional space.
The distance µij between the ith stimulus feature and cluster j’s position along the ith feature is
!
vi
ik
µij = 1/2" | I posik # H pos
|
j
(S1)
k=1
!
ik
where vi is the number of possible values that the ith stimulus feature can take and H pos
is
j
cluster j’s position on the ith feature for value k. Distance µij is always between 0 and 1,
inclusive.
!
9
Response Selection. After encoding, each cluster is activated based on the similarity of the
cluster to the input stimulus. Cluster activation is given by:
na
% (" ) e
#
$ "i µ ij
i
H act
j =
(S2)
i=1
na
% (" )
#
i
i=1
!
where H act
is cluster j’s activation, na is the number of stimulus features, and "i is the receptive
j
field tuning (which controls the model’s attention to a given feature) for feature i, and r is the
attentional parameter (constrained to be non-negative). "i is set to 1 at the start of learning.
!
! Clusters enter into a competition to respond to an input stimulus through
mutual inhibition. The
out
final output of each cluster j is H j , which is !
given by:
H
out
j
=
"
(H act
j )
nc
# (H
act "
j
)
H act
j
!
(S3)
i=1
!
where nc is the current number of clusters and β is a lateral inhibition parameter (constrained to
be non-negative) that controls the level of cluster competition.
out
Only the cluster with the largest output value is allowed to pass its output, H j , across its
connections to the output layer. The cluster that wins the competition, Hm, passes its output to
the k output units of the unknown feature dimension z
!
(S4)
Czkout = w m,zk H mout
out
!
where Czk is the output of the unit representing the kth feature value of the zth feature, and wm,zk
is the weight from the winning cluster, Hm, to the output unit Czk. In the simulations present here
out
where only the category label is queried, z always stands for the category label, and Czk is
! calculated for each of the k (k=2) values that the category label can take (Hole A or B).
The probability of making a response k for a queried dimension, z, on a given trial is:
!
out
P(k) =
e(dC zk
vz
"e
)
(dC zkout
(S5)
)
j=1
!
10
Cluster recruitment. In the present simulation, SUSTAIN was initialized with two clusters, one
for each category, that represent the modal stimulus in the category, excluding exceptions. This
initialization reflects the cuing procedure used in the study, which alerted subjects to the rulerelevant dimension. Other clusters are recruited in SUSTAIN in response to surprising stimulus.
Similar to recent uses of SUSTAIN (Love & Gureckis, 2007), we included a hippocampal
function parameter, τh (constrained to be between 0 and 1) that probabilistically determines
whether an error will lead to new cluster recruitment. If SUSTAIN makes a prediction error, and
τh exceeds q, where q is a randomly generated value (between 0 and 1), a new cluster is
recruited. If the value of τh is 1, all errors lead to new cluster recruitment, and the model is
equivalent to SUSTAIN’s original formulation (i.e., Love, Medin, & Gureckis, 2007).
Learning. The learning rules determine how the clusters are updated. Only the winning clusters
are updated. If a new cluster is recruited on a trial, it will be the winner. Otherwise, the cluster
that is most similar to the current stimulus will be the winner. The winning cluster Hm, will have
its position adjusted by:
(S6)
"H mposik = #(I posik $ H mposik )
!
where η is the learning rate parameter. The result of the updating is that the winning cluster
moves toward the current stimulus. Over the course of learning, each cluster will tend toward
the center of its members.
Receptive field tunings are updated according to
(S7)
"#i = $e% #i µ im (1% #iµim )
where m indexes the winning cluster.
!
The weights from the winning cluster to the output units are adjusted by the one layer delta
learning rule (Rumelhart, Hinton & Williams, 1986).
"w m,zk = #(t zk $ Czkout )H mout
!
(S8)
Simulations. For the present simulation, items were presented to SUSTAIN using the same order
and randomization methods as for human subjects. To reflect the rule cuing procedure, the model
was initialized with a cluster for each category that represented the average of the rule-following
items within the category, and the attention weight on the rule-relevant dimension was initialized
and fixed at 5.1 (cf. Love & Gureckis, 2007). The free parameters, γ, β, η, d, and τh, were fit to
the average learning curve using standard optimization techniques. Obtained values were: γ =
1.259, β = 0.702, η = 0.415, d = 25.264, τh = 0.174.
The recognition strength measure R was calculated by summing the output H out
j for all clusters:
!
11
nc
R = " H out
j
(S9)
j=1
!
The error correction measure was given by the summing absolute value of the difference
between the winning cluster’s output on unit k, for the queried feature dimension, z, and the
item’s actual position on this dimension.
vz
E = " | Czkout # I poszk |
(S10)
k=1
where vz, again, equals the number of output units for feature dimension z.
!
Trial-by-trial values for recognition strength and error correction measures were averaged over
1000 simulations.
ALCOVE
Input Stimulus Representation. A stimulus is represented to ALCOVE as a pattern of activation a
along input nodes that code the stimulus’ values v for each feature dimension i. An entire
stimulus is represented by the column vector ain. Input nodes are gated by dimensional attention
strength αi that reflects the relevance of each feature dimension for the task.
Input stimuli activate each hidden node according to the psychological similarity of the input
stimulus to the hidden node, where hidden nodes are each of the previously observed exemplars.
The activation of the jth hidden node is:
"c(
a hid
j =e
!
$ # i |hji"a iin |
j
r q/r
)
(S11)
where c a constant (c > 0) that determines the specificity of the hidden node, and where r and q
are constants that determine the psychological distance metric and similarity gradient,
respectively. Both r and q are set to 1 in the present applications, which corresponds to cityblock distance metric with an exponential similarity gradient.
Response Selection. Hidden nodes are connected to k output nodes that reflect the available
response categories (i.e., hole A or hole B) by a weight wkj that gives the strength of the
association between hidden node j and output node k. Output node activation is given by
akout = " w kj a hid
j
(S12)
hid
j
The probability of classifying a given stimulus into category k on a trial is given by
!
12
out
e"a k
Pr(K) =
out
# e"ak
(S13)
out
k
where is φ a mapping constant.
!
Learning. The dimensional attention strengths αi and association weights wkj are learned via
gradient descent on sum squared error. The error generated by the model on a given trial is given
by
E = 1/2# (t k " akout ) 2
(S14a)
out
k
where tk are teacher values that code the feedback given to ALCOVE such that
!
!
!
tk = {
max(+1,a kout ) if the stimulus is in category K
min("1,a kout ) if the stimulus is not in category K
}
(S14b)
The dimensional attention strengths and association weights are updated on each trial according
to the learning rules
"w kjout = #w (t k $ akout )a hid
j
(S15)
'
*
)
,
in
"# = $ %# &)& (t k $ akout )w kj ,a hid
j c | h ji $ ai |
hid out
,+
j )
(k
(S16)
where λw and λα are learning rate constants that are constrained to be greater than 0.
!
Simulations. ALCOVE was trained using the same procedure as the SUSTAIN simulations, with
obtained values for the free parameters: λα = 0.01, λw = 0.40, c = 0.66, φ = 3.125.
Recognition strength is computed in ALCOVE as familiarity or similarity of an item to all
exemplars stored in memory (i.e., the sum of the hidden node activations).
"a
hid
j
(S17)
hid
j
Error is computed for each trial as in S14a.
!
13