Distributed representations and the bilingual lexicon

Thomas, M.S.C. & de Wet, N.M. (1998). Stochastic double dissociations in
distributed models of semantic memory. In: G. Humphreys & D. Heinke (Eds.),
Proceedings of the 5th Neural Computation and Psychology Workshop. Springer.
Stochastic double dissociations in distributed
models of semantic memory
Michael S. C. Thomas
Neurocognitive Development Unit,
Institute of Child Health,
University College London Medical School,
London, U.K.
Nicola M. de Wet
Department of Psychology, King Alfred’s College,
Winchester, UK
Abstract
We present a preliminary set of connectionist models of impairments
to semantic memory, exploring the conditions under which double
dissociations between knowledge of living and non-living entities
occur. Small et al [1] argue that category specific impairments are a
consequence of semantic feature based representations in a fully
distributed memory system. Farah and McClelland [2] argue that
category specific impairments arise due to modular structure in
semantic memory, albeit structure that is specific to modality; they
hypothesise that living and non-living entities have a differential
reliance on perceptual and functional features. We evaluated these
respective claims by lesioning a simple autoassociative model of
semantic memory, using a 2x2 design: fully distributed architecture
versus partially modular architecture with modality specific channels;
Small et al’s training set versus a training set constructed according to
Farah and McClelland’s perceptual/functional scheme. One thousand
stochastic lesions were applied to each network ‘subject.’ The results
supported Farah and McClelland: on average, double dissociations
required modular structure and differential reliance on modalities.
However, by choosing select (i.e. rare) lesions from each set of 1000,
double dissociations of living versus non-living knowledge were found
in both networks using both training sets. We discuss the idea that
statistical distributions of impairments in patients with similar lesions
are necessary to compare against the predictions of functional models,
and thus that single case studies may be insufficient to distinguish
distributed and modular architectures.
1 Introduction
The functional structure of semantic memory has been explored through impairments
found in patients with brain damage. Of particular interest have been category
specific impairments of semantic knowledge, found across a range of tasks including
reading [3], object naming [4, 5], and comprehension [6, 7], implying that it is
semantic memory itself which has been functionally damaged. Selective impairments
have been found to categories such as animate objects [6], inanimate objects [8],
concrete objects [3], animals, fruits and vegetables [5], fruits and vegetables [4], and
animals [9]. Such category specific impairments were generally caused by bilateral
temporal lobe injury, arising through infection by herpes encephalitis [1]. However,
these selective impairments are rare [1]. In this paper, we will seek to evaluate two
accounts of this pattern of selective impairments, both based on parallel distributed
processing models of semantic memory. First we will examine a model proposed by
Farah and McClelland in 1991, and then a model proposed by Small et al in 1996.
1.1 Farah and McClelland (1991)
Farah and McClelland (F&M) sought to resolve a conflict between two types of
neuropsychological evidence. On the one hand, most selective losses of knowledge
after brain damage appear to be tied to specific modalities. This would imply that
semantic memory is subdivided by modality of knowledge, which F&M claimed is
consistent with neurophysiological data showing that most cortical neurons are
modality specific. On the other hand, there are the more infrequent cases of category
specific impairment. Pursuing an idea proposed by Warrington and colleagues, F&M
resolved this conflict by suggesting that semantic memory is subdivided by modality,
but that categories are differentially reliant on different modalities (for example,
visual information vs motor associations vs auditory information vs tactile
information, and so on). Using subjects to rate dictionary definitions, they showed
that living things on average have 7.7 times as many perceptually based as
functionally based features, whereas non-living things only have 1.4 times as many
perceptual features as functional features. They then put forward a connectionist
model, which used distributed representations to map interactively between verbal
information (words), semantic information, and visual information (pictures).
Semantic information was split into 60 perceptual and 20 functional features.
Representations for 10 living and 10 non-living objects were defined, reflecting the
ratios of perceptual and functional features generated from their empirical study. On
lesioning the network, F&M found that living things were more severely affected by
lesions to perceptual features, and non-living things by lesions to functional features.
The modality specific hypothesis suffers from one inconsistency: if impairment to
knowledge of living things is caused by an impairment to perceptual knowledge
alone, why is it that patients also show an impairment to their knowledge of the
functional attributes of living things [5, 6, 10, 11, 12]? However, F&M’s model also
showed this characteristic. The authors explained this effect in terms of a ‘critical
mass’. The claim was that distributed representations need a certain critical mass to
become activated: if this critical mass (in this case, of perceptual features) is
eliminated in the representations of living things, the preserved part of the
representation (i.e. the functional features) would not become fully activated.
1.2 Small, Hart, Nguyen, and Gordon (1996)
Small et al, on the other hand, resist the idea that category specific impairments
necessitate any explicit partitioning or organisation of semantic memory. Their
hypothesis is that ‘a significant degree of semantic memory organisation could
devolve naturally from the demands of a system that encodes features of objects’ (p.
127). In this view, categories emerge in a fully distributed semantic memory from the
structure inherent within the information being encoded. Small et al explored this idea
using connectionist models which either autoassociated semantic features of objects,
or mapped between semantic features and a (localist) name for each object. They
defined semantic codes for 57 objects, over a set of 77 semantic features. The objects
were taken from pictures used by Hart and Gordon [9] in their study of a patient with
a category specific deficit. Small et al’s feature set was initially based on the features
used by Hinton and Shallice [13] in their connectionist model of deep dyslexia, but
also ‘derived from several studies of patients with category specific anomias, where
explicit knowledge of the features of items were assessed [4, 9]. The feature set was
chosen to depict adequately the specific items in the study based upon their
perceptual qualities, physical attributes, motor associations, and functional/
associative attributes deemed pertinent to describe the objects verbally.’ (p. 111).
Although Small et al performed no lesions on their networks, they established that
their feature set generated internal representations which demonstrated precisely
those categories which are found in specific impairments after brain damage.
Hierarchical cluster analysis, principal components analysis, and dependency analysis
all revealed emergent categories of animals, fruits and vegetables, tools, and vehicles.
Small et al further found that an unsupervised network trained on the pattern set
developed clusters of units responsible for representing each category. Selectively
removing such clusters of units would cause category specific impairments1,2. Small
et al concluded that their computational study provided ‘one explanation for the
cognitive neurological manifestations of diseases such as herpes simplex encephalitis
that affect primarily the temporal neocortex and limbic system’ (p. 127).
1 However, such a model would imply that category specific deficits can only be generated by focal
lesions. Patient data suggests that they also regularly occur through diffuse damage, such as in
Alzheimer’s disease [23].
2 In a similar vein, French and Mareschal [24] demonstrated that an autoassociative network trained on
Living and Non-living patterns but using a training algorithm that progressively compressed hidden unit
representations, led to certain hidden units specialising in representing certain categories. Removal of
these units would cause selective deficits for that category.
1.3 Comparison
We thus have two competing hypotheses. The first accepts the notion that the double
dissociation of living and non-living categories must derive from some structural
partitioning of semantic memory, albeit by modality rather than category. The second
maintains that a fully distributed semantic memory will generate category specific
impairments simply through latent structure in the knowledge that the system is
representing. Although, in the absence of lesioning evidence from their models,
Small et al’s claim must be considered the weaker, we were nevertheless interested in
comparing these approaches. It seemed to us important to establish whether the
organisation of distributed representations established by network analysis
techniques such as PCA could ever appear as selective impairments. If not, then it
seems reasonable to accept Farah and McClelland’s line that selective impairments
must reveal structural partitioning of the system in question. Given the reliance of
standard cognitive neuropsychology on deducing structure from single case studies,
we were also interested in exploring whether there is any difference between specific
case studies and the average effect of lesioning networks. Thus, following an approach
recently suggested by Juola and Plunkett when lesioning a connectionist model of
inflectional morphology [14], we lesioned our models many times and examined the
distribution of the pattern of results.
2 Simulations
2.1 Design
We based our simulations around Small et al’s autoassociative semantic memory
network, shown in Figure 1a. While autoassociation can only be a component of a
semantic memory system that must be accessed in different ways, it nevertheless
allowed us to develop a distributed system representing the knowledge base and
investigate the effect of lesions to this system3. We trained the network to
autoassociate Small et al’s training set, 57 objects (23 living, 34 non-living)
represented using 59 named perceptual features (e.g. colour, size, texture) and 18
named functional features (e.g. makes_noise, propelled, for_cooking). Features were
binary, 1 if the feature was present, 0 if not. We then lesioned this network.
Lesioning was performed by damaging the connections at various points in the
network (see Bullinaria and Chater [15] for a discussion of the implications of using
different forms of damage). We examined the effect of lesioning the Small et al
network at 3 different sites: between the input layer and the first hidden layer,
between the two hidden layers, and between the second hidden layer and the output
layer (see Fig 1a). We used 6 network ‘subjects’, trained using different random
seeds. For each subject, we lesioned each site 1000 times, and took the mean
impairment at each site. We then averaged across our six subjects to find the results at
each site. We repeated this with 30%, 50%, 75%, and 100% impairment to the
connections at each site. The different degrees of damage were implemented by
subjecting each weight to a .30, .50, .75, or 1.00 chance of being set to zero. In effect,
this schedule of lesionings is equivalent to assessing 72,000 patients with damage to
their semantic memory systems.
3 If semantic memory were accessed in different ways, this would change the nature of the computations
required of the distributed model. For example, a naming network might take a featural representation as
input and map to localist word nodes as output [1]. Mappings of this sort would alter the similarity
structure established over internal representations: the nature of the mappings is thus likely to change the
patterns of deficits which arise through damaging such representations. In this preliminary work, we chose
to start with the simpler case of autoassociation.
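The stochastic lesioning scheme described above can be sketched in a few lines. This is a minimal illustration assuming NumPy; the helper name `lesion_weights`, the shape of the stand-in weight matrix, and the random seed are hypothetical and are not taken from the original simulations.

```python
import numpy as np

def lesion_weights(weights, p_remove, rng):
    """Return a copy of `weights` in which each connection is independently
    set to zero with probability `p_remove` (hypothetical helper)."""
    keep = rng.random(weights.shape) >= p_remove
    return weights * keep

rng = np.random.default_rng(0)                  # arbitrary seed
site = rng.uniform(-0.5, 0.5, size=(77, 36))    # stand-in weight matrix for one lesion site
for p in (0.30, 0.50, 0.75, 1.00):
    damaged = lesion_weights(site, p, rng)
    removed = np.mean(damaged == 0.0)
    print(f"p={p:.2f}: {removed:.2%} of connections removed")
```

Because each weight is removed independently, the proportion of connections actually lost varies slightly from lesion to lesion around the nominal level, except at 1.00, where every connection is removed.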
Secondly, we modified the original Small model to implement the assumptions
employed by F&M. F&M proposed modality specific structure for the semantic
memory system. We implemented this by splitting the first hidden layer in two, and
having channels specialised for processing perceptual and functional knowledge. The
first hidden layer was split in proportion to the ratio of the number of perceptual and
functional input units. This architecture is shown in Figure 1b. We now needed a
training set which embodied F&M’s assumptions about the ratio of perceptual
features to functional features for living and non-living things. Accordingly, binary
patterns were generated at random under the following constraints: there were 23
living patterns, 34 non-living patterns; 58 perceptual features and 19 functional
features; patterns were generated around prototypes, where each exemplar had a 20%
chance of each of its features being different from that of its prototype; living
exemplars were based around 4 prototypes, non-living exemplars were based around
6 prototypes; the overall ratio of perceptual to functional features in the living
exemplars was 7.8:1, in the non-living exemplars it was 1.8:1. Living exemplars had
on average 33.9 perceptual features active, and 4.3 functional features active;
non-living exemplars had on average 15.1 perceptual features active and 8.5 functional
features active. (The equivalent values for the Small et al training set were, for living,
1.6:1, 8.9 perceptual features and 5.5 functional features; for non-living, 2.9:1,
10.3 perceptual features and 3.6 functional features.) As before, we took 6 network
‘subjects’, trained using different random seeds. For each subject, we lesioned each
of the 5 sites 1000 times (compared to F&M’s 5 times), and took the mean
impairment at each of the sites in this semi-modular network. We then averaged across
our six subjects. We repeated this at 30%, 50%, 75%, and 100% impairment. This
is equivalent to assessing 120,000 patients with damage to their semantic memory
systems.
Figure 1. Architecture of (a) the fully distributed Small et al model, and
(b) the semi-modular ‘Farah and McClelland’ model.
[Diagram labels: (a) Perceptual Features (59) and Functional Features (18) at input and output; Hiddens (36); Hiddens (12); lesion Sites 1 to 3. (b) Perceptual Features (59) and Functional Features (18) at input and output; first hidden layer split into (27) and (9) units; Hiddens (12); lesion Sites 1 to 5.]
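The construction of the ratio training set can be illustrated with a short sketch, assuming NumPy. The pattern counts, feature counts, prototype counts and 20% flip probability follow the text; the prototype activation probabilities (`p_perc`, `p_func`) and the rotation used to assign exemplars to prototypes are assumptions made purely for illustration, and have not been tuned to reproduce the ratios reported above.

```python
import numpy as np

N_PERCEPTUAL, N_FUNCTIONAL = 58, 19   # feature counts given in the text
FLIP_PROB = 0.20                      # chance that a feature differs from its prototype

def make_category(n_exemplars, n_prototypes, p_perc, p_func, rng):
    """Generate binary exemplars around random prototypes.
    p_perc and p_func are assumed prototype activation probabilities."""
    protos = np.hstack([
        rng.random((n_prototypes, N_PERCEPTUAL)) < p_perc,
        rng.random((n_prototypes, N_FUNCTIONAL)) < p_func,
    ])
    # Assumption: exemplars are assigned to prototypes in rotation.
    base = protos[np.arange(n_exemplars) % n_prototypes]
    flips = rng.random(base.shape) < FLIP_PROB
    return np.logical_xor(base, flips).astype(int)

rng = np.random.default_rng(0)        # arbitrary seed
living = make_category(23, 4, p_perc=0.6, p_func=0.1, rng=rng)
non_living = make_category(34, 6, p_perc=0.2, p_func=0.4, rng=rng)
patterns = np.vstack([living, non_living])   # 57 patterns by 77 features
```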
Finally, we trained the Small et al architecture on the F&M training set, and the
F&M architecture on the Small et al training set, yielding 192,000 further ‘patients’.
This design allowed us to investigate separately the effect of training set and of
architecture.
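The ‘patient’ counts quoted for the four conditions follow from simple multiplication, on the assumption that every lesion at every damage level counts as one simulated patient. The names below are illustrative only.

```python
SUBJECTS = 6
LESIONS_PER_SITE = 1000
DAMAGE_LEVELS = 4          # 30%, 50%, 75%, 100%

def simulated_patients(n_sites):
    """Number of lesioned networks generated for an architecture with n_sites lesion sites."""
    return SUBJECTS * n_sites * LESIONS_PER_SITE * DAMAGE_LEVELS

distributed = simulated_patients(3)     # fully distributed architecture, 3 sites
semi_modular = simulated_patients(5)    # semi-modular architecture, 5 sites
crossed = distributed + semi_modular    # each architecture retrained on the other training set
print(distributed, semi_modular, crossed)   # 72000 120000 192000
```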
Simulation details: Networks employed sigmoid processing units; weights were
initially randomised between +/- 0.5; networks were trained using a learning rate of 0.01
and no momentum, using backpropagation with cross-entropy [16]. Training was
terminated when all the features for all patterns fell on the correct side of 0.5. For fully
distributed networks this took on average 1514 epochs for semantic features, 1400
epochs for ratio features; for semi-modular networks, this took 1556 epochs and 832
epochs respectively. Piloting of the above scheme revealed that the F&M training set,
with less internal structure, required more hidden units to learn to criterion.
Accordingly, the distributed and semi-modular networks trained on this set used
double the number of hidden units in each layer (see discussion).
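The training regime can be sketched as follows, assuming NumPy. The sigmoid units, the initial weight range, the learning rate, the cross-entropy error and the stopping criterion follow the simulation details above; the use of bias terms, pattern-by-pattern weight updates, the `max_epochs` cap and all function names are assumptions, since the text does not specify them.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_net(sizes, rng):
    """One (weights, biases) pair per layer, drawn uniformly from [-0.5, 0.5]."""
    return [(rng.uniform(-0.5, 0.5, (m, n)), rng.uniform(-0.5, 0.5, n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(net, x):
    """Return the activations of every layer, input first."""
    acts = [x]
    for w, b in net:
        acts.append(sigmoid(acts[-1] @ w + b))
    return acts

def train_epoch(net, patterns, lr=0.01):
    """One pass of pattern-by-pattern backpropagation (autoassociation)."""
    for x in patterns:
        acts = forward(net, x[None, :])
        delta = acts[-1] - x[None, :]      # cross-entropy gradient at sigmoid outputs
        for i in reversed(range(len(net))):
            w, b = net[i]
            grad_w = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            # Propagate the error using the weights as they were before the update.
            delta = (delta @ w.T) * acts[i] * (1.0 - acts[i])
            net[i] = (w - lr * grad_w, b - lr * grad_b)

def reached_criterion(net, patterns):
    """True when every output is on the correct side of 0.5 for every pattern."""
    out = forward(net, patterns)[-1]
    return bool(np.all((out > 0.5) == (patterns > 0.5)))

def train(net, patterns, lr=0.01, max_epochs=5000):
    """Train to criterion; return the epoch count, or None if the cap is reached."""
    for epoch in range(1, max_epochs + 1):
        train_epoch(net, patterns, lr)
        if reached_criterion(net, patterns):
            return epoch
    return None
```

Under these assumptions, a network of the shape shown in Figure 1a would be created with `init_net([77, 36, 12, 77], rng)` and passed to `train` together with a 57 by 77 array of binary patterns.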
2.2 Assessment of Performance
We evaluated the performance of the networks under lesioning in terms of the mean
increase in RMS error across the living and non-living patterns. In terms of the
response of the network, it may seem more appropriate to perform a nearest
neighbour transform on the network output, and determine whether the output most
resembles the target from among the training set; that is, is the network still closest to
the correct pattern? However, to perform such a transform on the raw activations
risks warping the pattern of results: it discriminates against those patterns which are
tightly clustered in the pattern space, and in favour of those which have few close
neighbours. Since we are interested in the effects on the underlying representations,
we will concentrate on changes in output accuracy. Note, it may well be that some
dissociations in the patient data result from differences in the density of the pattern
space. However, here our emphasis remains on underlying representations.
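The two candidate measures can be contrasted in a brief sketch, assuming NumPy. Computing RMS error per pattern and then averaging within each category is one reading of the measure described above, not a confirmed detail of the original analysis; the function names are hypothetical.

```python
import numpy as np

def rms_error(outputs, targets):
    """Root mean squared error, one value per pattern (row)."""
    return np.sqrt(np.mean((outputs - targets) ** 2, axis=1))

def mean_increase(intact_out, lesioned_out, targets, is_living):
    """Mean increase in RMS error after lesioning, for Living and Non-living patterns."""
    increase = rms_error(lesioned_out, targets) - rms_error(intact_out, targets)
    return increase[is_living].mean(), increase[~is_living].mean()

def nearest_neighbour_correct(outputs, targets):
    """For contrast: is each output closest (Euclidean distance) to its own target?"""
    dists = np.linalg.norm(outputs[:, None, :] - targets[None, :, :], axis=2)
    return dists.argmin(axis=1) == np.arange(len(targets))
```

The nearest neighbour measure depends on how close the other training patterns lie to each target, which is why it penalises tightly clustered patterns; the RMS measure depends only on each pattern's own output.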
2.3 Results
2.3.1 Analyses of the training sets
A principal component analysis was used to explore the structure of the two training
sets. A plot of the first two principal components showed that in both the Small et al
pattern set and the F&M ratio training set, living and non-living patterns were
separated. In the Small et al set, there were additional clusters for animals, fruit and
vegetables, cooking ingredients, tools, and vehicles. In the F&M ratio training set
there was loose clustering of the exemplars generated around each prototype.
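A principal component projection of this kind can be obtained from the singular value decomposition of the centred pattern matrix. The sketch below assumes NumPy; the function name is hypothetical and the random patterns merely stand in for a real training set.

```python
import numpy as np

def first_two_components(patterns):
    """Project patterns (rows) onto their first two principal components."""
    centred = patterns - patterns.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:2].T

rng = np.random.default_rng(0)                               # arbitrary seed
demo = (rng.random((57, 77)) < 0.3).astype(float)            # illustrative data only
coords = first_two_components(demo)                          # 57 points in 2 dimensions
```

Plotting the two columns of `coords` against each other, with Living and Non-living patterns marked differently, gives the kind of plot described above.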
2.3.2 Mean population results
Table 1 shows the mean error on Living and Non-living patterns in each combination
of network architecture and training set, as increasing levels of lesioning were
applied. Figures are averaged over sets of 1000 lesions within subjects, and then
across the set of 6 network subjects. Underlined numbers show dissociations between
Living and Non-living patterns at each site. Figure 2 shows these data graphically for
3 sites in the semi-modular network trained on the F&M ratio pattern set.
Table 1: Mean error for Living and Non-living patterns for each network and
training set after lesions at each lesion site. Scores are averaged over the 6
network ‘subjects’. Underlined figures show where a dissociation has occurred
between the two types of patterns, defined here as a difference greater than 0.01.
Distributed Network, Small et al Training Set
        Site 1        Site 2        Site 3
  %     L     NL      L     NL      L     NL
  0    0.02  0.02    0.02  0.02    0.02  0.02
 30    0.05  0.05    0.08  0.08    0.11  0.11
 50    0.07  0.07    0.11  0.11    0.16  0.16
 75    0.12  0.11    0.16  0.16    0.22  0.21
100    0.18  0.17    0.21  0.21    0.26  0.26
Distributed Network, F&M Ratio Training Set
        Site 1        Site 2        Site 3
  %     L     NL      L     NL      L     NL
  0    0.05  0.04    0.05  0.04    0.05  0.04
 30    0.20  0.18    0.23  0.23    0.22  0.23
 50    0.26  0.24    0.29  0.28    0.29  0.30
 75    0.35  0.30    0.37  0.34    0.37  0.38
100    0.48  0.41    0.49  0.42    0.47  0.48
Semi-Modular Network, Small et al Training Set
        Perceptual Channel           Functional Channel           Common
        Site 1        Site 3         Site 2        Site 4         Site 5
  %     L     NL      L     NL       L     NL      L     NL       L     NL
  0    0.02  0.02    0.02  0.02     0.02  0.02    0.02  0.02     0.02  0.02
 30    0.04  0.04    0.06  0.06     0.03  0.02    0.04  0.03     0.10  0.10
 50    0.05  0.05    0.08  0.07     0.04  0.03    0.04  0.04     0.15  0.15
 75    0.07  0.08    0.10  0.10     0.05  0.03    0.05  0.05     0.21  0.21
100    0.11  0.13    0.13  0.13     0.08  0.04    0.06  0.06     0.25  0.26
Semi-Modular Network, F&M Ratio Training Set
        Perceptual Channel           Functional Channel           Common
        Site 1        Site 3         Site 2        Site 4         Site 5
  %     L     NL      L     NL       L     NL      L     NL       L     NL
  0    0.04  0.04    0.04  0.04     0.04  0.04    0.04  0.04     0.04  0.04
 30    0.13  0.09    0.15  0.14     0.06  0.06    0.07  0.07     0.21  0.21
 50    0.20  0.12    0.20  0.19     0.06  0.07    0.09  0.09     0.28  0.28
 75    0.32  0.18    0.28  0.25     0.08  0.10    0.10  0.11     0.37  0.37
100    0.50  0.25    0.37  0.31     0.09  0.11    0.12  0.13     0.48  0.48
Notes: a) % refers to the percentage of connections removed at the relevant site during lesioning. b) scores are averaged over 6 network
‘subjects’; the standard errors of these means are all  0.01.
Four main points may be taken from these results. (1) The distributed network
trained on the Small et al pattern set showed no dissociations. Small et al’s claim that
latent structure in the training set may explain category specific deficits is not
supported by these data. (2) The distributed network trained on the F&M ratio
training set showed a single dissociation, in which knowledge of living objects was
more vulnerable to damage. This may reflect the additional hidden units used in this
network, or more likely, that the structure of the ratio pattern set itself leaves Living
patterns more open to damage. Specifically, Living patterns are much more similar to
each other over their perceptual features, and thus harder to distinguish in compressed
internal representations. (3) In the semi-modular network trained on the Small et al
pattern set, a double dissociation between knowledge of living objects and non-living
objects was found at high levels of damage. Damage to the perceptual channel
impaired Non-living more than Living. Damage to the functional channel damaged
Living more than Non-living. In this model, we see that damage to modular structure
did lead to a double dissociation of category specific knowledge (albeit in the
opposite direction to Farah and McClelland’s hypothesis, since Living patterns
appeared to rely more on the functional channel than the perceptual channel).
However, these results come at high levels of damage at sites close to the input.
Performance after such damage reflects a fixed response, irrespective of varying
input, driven only by the resting activations of hidden and output units. Thus the
double dissociation reflects specialisation in the resting activation of the hidden units
in the perceptual and functional channels. (4) The semi-modular network trained on
the F&M ratio pattern set replicated the results of Farah and McClelland’s initial
study. Double dissociations were found between categories of knowledge when
modality specific structures were damaged. No such dissociation was found when the
common processing channel was damaged. The pattern of damage reflected the
asymmetry found in patient data, with Living patterns suffering most impairment. For
Living patterns, damage to the perceptual channel also impaired functional
knowledge, due to the common processing stage.
Figure 2. Increases in error with increasing lesions to network connections for the
semi-modular network trained on the F&M ratio training set. (a) lesions at Site 1,
in the perceptual channel; (b) lesions at Site 2, in the functional channel; (c)
lesions at Site 5, in shared representations.
[Panels (a) to (c), each titled ‘Living vs Non-living’: RMS Error (0 to 0.5) against Level of Impairment (None, 30%, 50%, 75%, 100%), for Living and Non-living patterns.]
In summary, the results supported the idea that the pattern of impairments found in
patient data reflect an interaction between a modality specific modular structure, and
the latent structure of a particular problem domain. Averaged over populations,
distributed networks cannot demonstrate double dissociations.
2.3.3 Individual ‘case study’ results
In using multiple stochastic lesions to a given network, some lesions will cause
greater impairment than others. Moreover, some lesions by chance will lead to
greater impairment of Living patterns than Non-living patterns, and some to greater
impairment of Non-living patterns than Living patterns. If we allow ourselves to pick
out the maximal dissociations in each direction, as interesting ‘case studies’, we may
explore whether the findings for populations of ‘patients’ hold for individual cases.
Figure 3 shows plots of these selected dissociations from sites 1, 2, and 3 of the
distributed network trained on the Small et al pattern set. Picking any ‘patient’ from
the left hand column and any ‘patient’ from the right hand column, we may
demonstrate a double dissociation between the Living and Non-living categories of
knowledge. This characteristic emerged for all lesions to all networks and sites,
except for the semi-modular network trained on the F&M pattern set. In this latter
network, damage to the functional channel never caused Living patterns to be
impaired worse than Non-living; and in the perceptual channel, damage to Site 1
never caused Non-living patterns to be impaired worse than Living patterns.
Nevertheless, dissociations in both directions could be found in the common hidden
layer in the network. Figure 4 demonstrates that the ability to find dissociations
depends on the distribution of deficits that each model can generate under stochastic
lesioning of weights. Note that each such distribution is determined both by network
architecture and by the nature of the training set.
In sum, these results show that, thus far, our dismissal of distributed models as
candidates to generate double dissociations depends on averaging over populations of
lesions (equivalent to populations of patients). However, cognitive neuropsychology
traditionally relies upon single case studies rather than population data to deduce
functional structure. The results from this analysis suggest that such an approach
would not exclude distributed models as candidate functional structures, even in the
face of double dissociation evidence. It is therefore important for us to consider
whether the presence of stochastic double dissociations in these distributed models of
semantic memory is a central or irrelevant feature.
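The selection of ‘case studies’ amounts to picking, from each set of lesions, the lesion with the largest difference in each direction. A minimal sketch, assuming NumPy, is given below; the function name is hypothetical and the random numbers merely stand in for the outcomes of 1000 lesions.

```python
import numpy as np

def select_case_studies(err_living, err_non_living):
    """Return the indices of the lesions with the largest dissociation in each
    direction: (Living most impaired, Non-living most impaired)."""
    diff = np.asarray(err_living) - np.asarray(err_non_living)
    return int(diff.argmax()), int(diff.argmin())

rng = np.random.default_rng(0)               # arbitrary seed
err_l = rng.normal(0.08, 0.02, 1000)         # illustrative data only
err_nl = rng.normal(0.08, 0.02, 1000)        # illustrative data only
worst_living, worst_non_living = select_case_studies(err_l, err_nl)
double_dissociation = (err_l[worst_living] > err_nl[worst_living]
                       and err_nl[worst_non_living] > err_l[worst_non_living])
```

With two overlapping distributions such as these, some lesion will almost always favour each category, so a double dissociation can be assembled from selected cases even when the population means are identical.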
Figure 3. Selected ‘case studies’ from network lesions. All lesions are from the
distributed network trained on the Small et al pattern set. (a) and (d) are lesions at
Site 3, (b) and (e) are lesions at Site 2, and (c) and (f) are lesions at Site 1. The left
hand column plots the instances of a network lesion from each set of 1000 where
damage to Living patterns was maximally greater than that to Non-living patterns.
The right hand column plots the opposite dissociation.
[Panels (a) to (f), each titled ‘Living vs Non-living’: RMS Error (0 to 0.3) against Level of Impairment (None, 30%, 50%, 75%, 100%), for Living and Non-living patterns.]
Figure 4. Increases in error under 30% lesioning for Living and Non-living patterns,
for 1000 lesions of a single network. (a) Site 2 in the distributed model trained on
the Small et al pattern set; (b) Site 1 in the perceptual channel of the semi-modular
network trained on the F&M ratio set.
[Panels (a) and (b): Increased error on Non-living (0.0 to 0.2) against Increased error on Living (0.0 to 0.2).]
3 Discussion
The generation of double dissociations from distributed networks is an established
finding. Woods [17] and Sartori [18] have shown that lesions to small networks may
independently impair performance on separate patterns stored in the network.
However, these were simple theoretical demonstrations rather than models, and the
dissociations at stake here are not of individual patterns but categories. Nevertheless,
Shallice [19] and Plaut [20] have argued that individual dissociations of categories
like those found in the current set of models are artefacts of scale. In properly sized
models, such dissociations will disappear. For example, double dissociations in the
common layer of the semi-modular network would arise because, with only 24 hidden
units, certain units may end up playing significant roles in generating Living or
Non-living patterns. Chance elimination of connections from these important units to
outputs would cause dissociations. With many more hidden units, no significant roles
should be attributed to hidden units.
How many hidden units must a model contain? Juola and Plunkett [14] trained a
model with 130 inputs, 200 hidden units, and 160 outputs to inflect 3200 nouns and
verbs to plurals and past tenses. Using 1065 stochastic lesions per subject and
selected case studies, they found double dissociations between regular and irregular
inflections for both nouns and verbs. They also found dissociations between regular
noun plurals and irregular verb past tenses, regular verbs and irregular nouns, and
arbitrarily defined categories, although the distribution of deficits was different in
each case. Two hundred is thus still too few hidden units for a realistic model.
Bullinaria and Chater [15] examined such double dissociations between regular and
irregular mappings in three layer networks in some detail. Their view was that unless
many more hidden units were used than were necessary to learn the training set,
representations would not be “fully distributed” and spurious double dissociations
would be found. Given that the brain has many neurons and no a priori way to
establish the right number to use for a given problem, they suggested that realistic,
‘psychologically and biologically relevant’ models should employ large numbers of
hidden units. Bullinaria and Chater trained a network with 9 input units and 9 output
units to learn 256 mappings. Ten hidden units were sufficient to learn the problem.
Using 20 lesions per subject, they determined that spurious double dissociations did
not disappear until 600 hidden units were employed. Using this multiple, Juola and
Plunkett’s model should have used 12,000 hidden units to avoid spurious
dissociations.
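The figure of 12,000 follows from applying Bullinaria and Chater's ratio to Juola and Plunkett's model, treating the latter's 200 hidden units as the number needed to learn the task. The variable names are illustrative.

```python
hidden_needed_bc = 10     # hidden units sufficient to learn Bullinaria and Chater's task
hidden_clean_bc = 600     # hidden units at which spurious dissociations disappeared
multiple = hidden_clean_bc / hidden_needed_bc      # 60.0

hidden_jp = 200           # hidden units in Juola and Plunkett's model
required_jp = multiple * hidden_jp
print(required_jp)        # 12000.0
```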
We would like to conclude with the following points. Firstly, the models used in the
current simulations are small and unrealistic. In this preliminary work, our aim was to
compare the claims of Small et al and Farah and McClelland as directly as possible.
In future work we will explore the implications of using many more hidden units and
where possible, more realistic training sets.4
Secondly, while it seems likely Bullinaria and Chater’s finding will extend to the
domain of models of semantic memory, it is worth pointing out that their findings
relate to quasi-regular domains. In models of such domains, single dissociations of
irregular patterns are easy to find, and damage to rules is characteristic of small
networks without “fully distributed” representations. In our study, double
dissociations were found between two sub-clusters of patterns distinguished by
different regularities.
Thirdly, Shallice, Plaut, and Bullinaria and Chater [19, 20, 15] argue that
distributed networks will not show double dissociations so long as they have fully
distributed internal representations. Bullinaria and Chater claim that realistic models
will have fully distributed representations because (i) they will have many hidden
units, and (ii) a learning algorithm will develop representations that are fully
distributed, so that hidden units will seek to participate in as many representations as
possible. However, Bullinaria and Chater do not justify (ii), and in fact there is
neurobiological evidence [21, 22] that argues against it. This work suggests that
neural representations are sparsely coded, so that units tend to participate in few
representations. If more realistic representations are indeed sparse, then clusters of
processing units may end up playing an exaggerated role in producing outputs
(particularly if they are organised topographically), thus increasing the likelihood of
atypical double dissociations in distributed networks. In short, the addition of more
units does not necessarily lead to more fully distributed representations.
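The contrast between sparse and fully distributed coding can be illustrated with a toy simulation. The sketch below is not the autoassociative model used in this study: it assumes that each item is simply a weight vector over units, that a lesion removes each unit independently with a fixed probability, and that an item's deficit is the share of its total weight lost. All function names and parameter values (make_codes, dissociation_scores, 200 units, 10 items per category, lesion probability 0.3) are hypothetical choices made for illustration only.

```python
import random
import statistics

def make_codes(n_items, n_units, units_per_item, rng):
    """Return one weight vector per item; units an item does not use have weight 0."""
    codes = []
    for _ in range(n_items):
        active = rng.sample(range(n_units), units_per_item)
        weights = [0.0] * n_units
        for u in active:
            weights[u] = rng.uniform(0.5, 1.5)
        codes.append(weights)
    return codes

def category_deficit(codes, lesioned):
    """Mean fraction of each item's total weight that sat on lesioned units."""
    losses = []
    for weights in codes:
        lost = sum(w for u, w in enumerate(weights) if u in lesioned)
        losses.append(lost / sum(weights))
    return statistics.mean(losses)

def dissociation_scores(units_per_item, n_units=200, n_items=10,
                        p_lesion=0.3, n_lesions=300, seed=0):
    """Living-minus-non-living deficit for each of n_lesions random lesions."""
    rng = random.Random(seed)
    living = make_codes(n_items, n_units, units_per_item, rng)
    nonliving = make_codes(n_items, n_units, units_per_item, rng)
    scores = []
    for _ in range(n_lesions):
        # Assumption: each unit is removed independently with probability p_lesion.
        lesioned = {u for u in range(n_units) if rng.random() < p_lesion}
        scores.append(category_deficit(living, lesioned)
                      - category_deficit(nonliving, lesioned))
    return scores

# Fully distributed: every item uses all 200 units. Sparse: every item uses 10.
dense = dissociation_scores(units_per_item=200)
sparse = dissociation_scores(units_per_item=10)
spread_dense = statistics.pstdev(dense)
spread_sparse = statistics.pstdev(sparse)
print(f"spread of deficit difference: dense={spread_dense:.4f}, "
      f"sparse={spread_sparse:.4f}")
```

Under these assumptions the number of units is the same in both conditions; only the number of units each item relies on differs. The wider spread of category differences in the sparse condition is the sense in which sparse codes make atypical dissociations more likely after a random lesion.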
Lastly, analyses such as those presented here suggest that in any stochastic lesioning
of a distributed or partially distributed system, a distribution of deficits will be
created. The exact nature of the distribution (and thus the likelihood of finding
chance dissociations) will depend on the architecture of the system and the
knowledge it is storing (as shown in Fig.4). Only in certain types of model will the
distribution be very tight (such as the ‘realistic’ models proposed by Bullinaria and
Chater which employ fully distributed representations). If various functional models
of the cognitive system which use distributed representations are distinguished by
their different distributions of deficits, then it might seem appropriate to test the
predicted deficit distributions of models against distributions of deficits in the
population of patients with similar lesions. We may draw two conclusions. Firstly, the
current use of single case studies to derive functional structure in the cognitive system
relies heavily on a certain class of models turning out to be the correct one, namely
the class that generates very tight distributions of deficits (tight enough, that is, to
rule out chance double dissociations). Secondly, if the correct models turn
out to be those that generate wide distributions of deficits, the relevant empirical data
to test competing models may be difficult to come by: we must find patients who have
similar damage, yet who can be diagnosed independently of their pattern of
behavioural deficits (because such models predict that the behavioural deficits may
vary widely). This would place a heavy onus on techniques that can diagnose patients
according to the type of damage they have sustained, before we can collect the
relevant empirical data to distinguish functional models.
4 We are currently engaged in this work. Provisional results show that when the Small et al
architecture employed 1000 hidden units in each hidden layer, stochastic double dissociations
between semantic categories still occurred. Indeed we found no tightening in the distribution
of deficits.
References
1. Small SL, Hart J, Nguyen T, and Gordon B. Distributed representations of semantic
knowledge in the brain: Computational experiments using feature based codes. In Reggia
J, Ruppin E, and Berndt RS (Eds.), Neural modelling of brain and cognitive disorders.
World Scientific, 1996.
2. Farah MJ and McClelland JL. A computational model of semantic memory impairment:
Modality specificity and emergent category specificity. Journal of Experimental
Psychology 1991, Vol. 120(4), 339-357.
3. Warrington EK. Concrete word dyslexia. British Journal of Psychology 1981, 72, 175-196.
4. Hart J Jr, Berndt RS, and Caramazza A. Category-specific naming deficit following
cerebral infarction. Nature 1985, 316, 439-440.
5. Sartori G and Job R. The oyster with four legs: A neuropsychological study on the
interaction of visual and semantic information. Cognitive Neuropsychology 1988, 5(1),
105-132.
6. Warrington EK and Shallice T. Category specific semantic impairments. Brain 1984, 107,
829-854.
7. Warrington EK and McCarthy R. Categories of knowledge: Further fractionation and an
attempted integration. Brain 1987, 110, 1273-1296.
8. Warrington EK and McCarthy R. Category specific access dysphasia. Brain 1983, 106,
859-878.
9. Hart J Jr and Gordon B. Neural subsystems for object knowledge. Nature 1992, 359, 60-64.
10. Silveri MC and Gainotti G. Interaction between vision and language in category-specific
semantic impairment. Cognitive Neuropsychology 1988, 5, 677-709.
11. Basso A, Capitani E, and Laiacona M. Progressive language impairment without
dementia: A case with isolated category specific semantic defect. Journal of Neurology,
Neurosurgery and Psychiatry 1988, 51, 1201-1207.
12. Farah MJ, Hammond KH, Mehta Z, and Ratcliff G. Category-specificity and
modality-specificity in semantic memory. Neuropsychologia 1989, 8, 1-19.
13. Hinton GE and Shallice T. Lesioning an attractor network: Investigations of acquired
dyslexia. Psychological Review 1991, 98(1), 74-95.
14. Juola P and Plunkett K. Why double dissociations don’t mean much. In Gernsbacher MA
and Derby SJ (Eds.), Proceedings of the 20th Annual Conference of the Cognitive
Science Society. Lawrence Erlbaum Associates, 1998, 561-566.
15. Bullinaria JA and Chater N. Connectionist modelling: Implications for cognitive
neuropsychology. Language and Cognitive Processes 1995, 10(3/4), 227-264.
16. Hinton GE. Connectionist learning procedures. Artificial Intelligence 1989, 40, 185-234.
17. Wood CC. Variations on a theme of Lashley: Lesion experiments on the neural model of
Anderson, Silverstein, Ritz & Jones. Psychological Review 1978, 85, 582-591.
18. Sartori G. From neuropsychological data to theory and vice versa. In Denes G, Bisiacchi
P, Semenza C and Andrewsky E (Eds.), Perspectives in cognitive neuropsychology.
Hillsdale NJ: Erlbaum, 1988.
19. Shallice T. From neuropsychology to mental structure. Cambridge: Cambridge University
Press, 1988.
20. Plaut DC. Double dissociation without modularity: Evidence from connectionist
neuropsychology. Journal of Clinical and Experimental Neuropsychology 1995, 17(2),
291-321.
21. Baddeley RJ, Abbott LF, Booth MJA, Sengpiel F, Freeman T, Wakeman EA and Rolls
ET. Responses of neurons in primary and inferior visual cortices to natural scenes.
Proceedings of the Royal Society 1998, B, in press.
22. Rolls ET and Treves A. Neural networks and brain function. Oxford University Press,
1998.
23. Gonnerman LM, Andersen ES, Devlin JT, Kempler D, and Seidenberg MS. Double
dissociation of semantic categories in Alzheimer’s disease. Brain and Language 1997, 57,
254-279.
24. French RM and Mareschal D. Could category-specific semantic deficits reflect
differences in the distributions of features within a unified semantic memory? In
Proceedings of the 20th Annual Conference of the Cognitive Science Society, 1998,
NJ:LEA, 374-379.