A Constructivist Neural Network Learns the Past Tense of English Verbs

In: Proceedings of the GALA ’97 Conference on Language Acquisition (1997), pp. 393–398. Edinburgh, UK: HCRC
Gert Westermann
Centre for Cognitive Science
University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW
[email protected]
Abstract
A constructivist neural network is presented that models the acquisition of the past tense of English verbs.
The network constructs its architecture in response
to the learning task, corresponding to neurobiological
and psychological evidence. The model outperforms
other connectionist and symbolic models in learning
and in displaying psychologically realistic learning
and generalization behavior. It is argued that the success of the network is due to its constructivist nature,
and that the distinction between fixed architecture and
constructivist models is fundamental. Given this distinction, constructivist systems constitute better models of cognitive development.
1. Introduction
The acquisition of the English past tense has in recent years become a touchstone for different theories of language acquisition and of cognition in general. Different theories and models have not only been
used in the debate between proponents of symbolic
and connectionist accounts of language learning, but
have also raised the question whether regular and irregular past tense forms necessarily rely on different
mechanisms for their formation or whether a single
mechanism can account for both. While most connectionist accounts (Rumelhart & McClelland 1986,
MacWhinney & Leinbach 1991, Plunkett & Marchman 1993) generally argue that a homogeneous architecture is sufficient for both forms, hybrid theories (Pinker 1991) explain regular cases with a rule
and irregulars with an associative memory. There exist, however, modular connectionist (Westermann &
Goebel 1995) and homogeneous symbolic (Ling &
Marinov 1993) models of inflection acquisition.
What is common to most of these models is that
they rely on a fixed, pre-defined architecture, an assumption which, as argued below, is unrealistic and
poses severe problems for their usefulness as models of cognitive development. In this paper, a constructivist neural network for learning the past tense
is described that builds its architecture in response
to the learning task. The network is compared with
three other implemented models of past tense acquisition: the original pattern associator (Rumelhart & McClelland 1986, R&M), the improved backpropagation
network by MacWhinney & Leinbach (1991) (M&L)
which took the extensive criticism of the R&M model
into account, and the Symbolic Pattern Associator
(SPA, Ling & Marinov 1993), which took up the challenge posed by M&L to present an implemented symbolic system (rather than just a theory) for past tense
acquisition. It is shown that the constructivist neural network presented here outperforms the existing
connectionist and symbolic models both in learning
the task and in the display of psychologically realistic
learning and generalization behavior. It will be argued that such a model can help bridge the gap between symbolic and connectionist, and between modular and homogeneous, theories of inflection acquisition.
The rest of this paper is organized as follows: in
section 2 the argument is made that constructivist development is a necessary condition for realistic models of cognitive development. In section 3 a specific
constructivist neural network algorithm, Supervised
Growing Neural Gas, is described that was used for
the simulations in this paper. Section 4 is concerned
with the experimental setup, and in sections 5, 6, and
7, the results of the simulations are analyzed with respect to learning, a U-shaped learning curve, and generalization performance, respectively. These results
are then discussed in section 8.
2. Why Constructivist Learning?
Cognitive development has recently been argued to
closely correlate with the structural development of
the cortex, with an increase in structural complexity
leading to an increase in cognitive capacities (Quartz
& Sejnowski 1998, Johnson 1997). In order to understand the principles of cognitive development it is
therefore important to take the mechanisms of brain
development into account. Recent work in this area
has provided evidence that the development of cortex is activity dependent on different levels (see e.g.,
Van Ooyen 1994): activity can determine the rate and
direction of dendritic and axonal growth and the formation of synapses (e.g., Quartz & Sejnowski 1998).
Stabilization and loss of these synapses is also activity
dependent (Fields & Nelson 1992). It has further been
shown in transplantation and rewiring studies that cortical areas are not innately prespecified to assume a
certain functionality, but readily adapt to process afferent signals from different domains (O’Leary 1989).
Further, cortex remains flexible to a certain degree throughout life, with dendritic density increasing until a late age (Uylings et al. 1978). These results indicate that neural development proceeds in a constructivist way, with the neural organization of the brain being modified through constructive and regressive events in a complex interaction between genetic predispositions and environmental inputs.
Cognitive development which is based on cortical development will thus proceed in the same constructivist way, with activity-dependent architectural modifications leading to increasingly complex cognitive representations.
Most significantly, research in learning theory (Baum 1989, Quartz 1993) has shown that incorporating activity-dependent structural modification into a learning system is not just a way to tune performance, but leads to entirely different learning properties of that system, evading many of the problems that are associated with fixed-architecture systems. Such constructivist systems can overcome Fodor’s paradox (Fodor 1980), which claims that no new representations can be learned and thus argues for innate representations (Quartz 1993), and they can overcome the prohibitive time complexity of even simple learning tasks in fixed-architecture systems (Baum 1989).
Any model which aims to capture the essential properties of human cognitive development must take these results into account: cognitive models should therefore employ neural networks which, like the brain, adapt their architecture in a way specific to the learning task. These models can be called constructivist networks, reflecting their proximity to the constructivist developmental theories of Piaget, in which structural modification of the learning system occurs in response to environmental input.
In this paper a constructivist neural network model is employed for the simulation of past tense acquisition and compared with previous past tense models. This allows an empirical assessment of the suitability of constructivist networks for the modeling of cognitive development.
3. The Supervised Growing Neural Gas Algorithm
A great number of constructivist neural network algorithms now exist, most of which have been designed to overcome the shortcomings of fixed-architecture networks (the need to choose a predefined architecture, slow learning, and the uniform allocation of resources to tasks of varying complexity). For the cognitive simulations described here, a modified version of the Supervised Growing Neural Gas (SGNG) algorithm (Fritzke 1994) was used, because it incorporates constructive and regressive events which depend on the learning task, and because it provides mechanisms to produce outputs based on both the structure and the identity of input signals, conforming to both neurobiological and psychological evidence.
Figure 1: A Gaussian activation function which acts as a receptive field for nearby inputs (viewed from the side and from the top; the plotted activation is a function of the input x and the unit position wc).
The SGNG algorithm constructively builds the hidden layer of a radial basis function (RBF) network.
Such an RBF network is different from the more
common backpropagation networks in that the hidden units do not have a sigmoid but a Gaussian, ‘bellshaped’ activation function (see figure 1). This allows
each hidden unit to be active only for inputs within a
certain range (as opposed to being active for all inputs
above a certain threshold, as with sigmoidal units) and
it can thus be viewed as a receptive field for a region
of the input space. All input vectors obtain a position in this space (determined by their values), and the
hidden units are placed at different positions to cover
the whole space. Hidden units will be activated by an
input if it falls within their receptive fields, and the
closer the input is to the center of the field, the more
the unit will be activated.
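As a minimal sketch of such a unit (in Python with numpy; center and width are the unit's only parameters), the activation can be written as a function of the distance between the input vector and the unit's position:

```python
import numpy as np

def rbf_activation(x, center, width):
    """Gaussian receptive field: the activation is maximal when the
    input x lies at the unit's center and decays with distance, so
    the unit responds appreciably only to inputs in a local region."""
    return np.exp(-np.sum((x - center) ** 2) / (2.0 * width ** 2))
```

An input far outside the receptive field thus yields an activation near zero, rather than the saturated response a sigmoidal unit would give to any input beyond its threshold.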
The problem in building RBF networks is to decide
on the number and positions of the hidden units, because inputs falling into a common receptive field will
lead to similar outputs. The SGNG algorithm solves
this problem by building the hidden layer and adding
units when and where they are needed. The algorithm
starts with just two hidden units. When an input is presented to the network, the hidden unit which is closest to this input (i.e., the winning unit) together with
its direct topological neighbors are moved towards the
input signal – this prevents hidden units from remaining in regions of the input space where no inputs occur; hidden units which never win eventually die off. The
activation from the hidden units is propagated to the
output and the output error is calculated. This error
is added onto a local counter variable of the winning
unit, and the weights between the hidden and output
units are adjusted (e.g., with the delta rule). A new
hidden unit is inserted when the performance of the
network no longer improves in the current architecture (i.e., when the error decreases less than a certain
value within a certain number of epochs). The new
unit is inserted next to the hidden unit which has accumulated the highest error (note that only winning units
can accumulate error). The idea here is that a winning
unit which produces a high output error is inadequate
(because it covers inputs with conflicting outputs), and
more structural resources are needed in that area. On
insertion of a unit the sizes of the receptive fields are
shrunk so that they slightly overlap with each other;
this in effect leads to a more “fine-grained” resolution
in that area of the input space.
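The growth dynamics can be summarized in code. The following is a simplified sketch (assuming numpy; all names are ours): it approximates the winner's topological neighbors by the second-closest unit, whereas Fritzke's algorithm maintains an explicit neighborhood graph, and it leaves the decision of when to insert (stagnating error) to the caller:

```python
import numpy as np

def sgng_epoch(X, Y, centers, widths, W_out, err_count,
               lr_win=0.05, lr_nbr=0.005, lr_out=0.1):
    """One pass over the data: move the winning unit (and, as a
    simplification, the second-closest unit) towards each input,
    accumulate the output error at the winner, and train the
    hidden-to-output weights with the delta rule."""
    for x, y in zip(X, Y):
        d = np.linalg.norm(centers - x, axis=1)
        win, nbr = np.argsort(d)[:2]               # winner and stand-in neighbor
        centers[win] += lr_win * (x - centers[win])
        centers[nbr] += lr_nbr * (x - centers[nbr])
        h = np.exp(-d ** 2 / (2.0 * widths ** 2))  # receptive-field activations
        err = y - h @ W_out
        err_count[win] += float(err @ err)         # local error counter
        W_out += lr_out * np.outer(h, err)         # delta rule
    return centers, W_out, err_count

def insert_unit(centers, widths, W_out, err_count, shrink=0.9):
    """When learning stagnates, insert a new unit next to the unit with
    the highest accumulated error and shrink all receptive fields so
    that they only slightly overlap."""
    worst = int(np.argmax(err_count))
    offset = 0.01 * np.random.default_rng().standard_normal(centers.shape[1])
    centers = np.vstack([centers, centers[worst] + offset])
    widths = np.append(widths, widths[worst]) * shrink
    W_out = np.vstack([W_out, W_out[worst]])
    err_count = np.zeros(len(centers))             # reset the error counters
    return centers, widths, W_out, err_count
```

In the full algorithm, edges between units are created and aged as inputs are presented, and units whose edges have all expired die off, as described above.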
Figure 2: Covering of the two-dimensional input space by receptive fields at the beginning (left) and the end (right) of learning.
Figure 2 shows a hypothetical start and end state in
a two-dimensional input space. While initially only
two receptive fields cover the whole of the space, at
the end hidden units have been inserted with different
densities across the space to account for the specific
learning task.
Figure 3: The initial SGNG network modified with direct input-output connections. All layers are fully connected. (The figure shows an input layer and an output layer, each a phonological template, a hidden layer with Gaussian units, and the example mapping from the stem bring to the past tense brought.)
Figure 3 shows the whole SGNG network. For the cognitive simulations described here, the original SGNG network was extended with direct connections between the input and the output layer. These connections allow the past tense to be produced through a direct structural transformation of the input stem. By contrast, the (growing) hidden layer acts as a memory: it produces an output based on the identity, not the structure, of the input verb. Initially, similar input verbs fall into the same receptive fields even when they require different outputs (e.g., hear and fear requiring heard and feared, respectively). This problem is overcome in the training of the network through the insertion of new receptive fields in the area of such verbs, and eventually similar verbs with dissimilar past tense forms are discriminated. When similar verbs lead to similar outputs, however (e.g., look and cook with looked and cooked), no new receptive field is inserted there, and one such field will cover several verbs without producing output error. Thus, the internal structure of the network adapts to reflect the learning task, and observing this adaptation can lead to insights into the past tense learning process. A minimal sketch of this two-route output computation is given below.
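The division of labor between the two routes can be made explicit in a sketch of the forward pass (assuming, as a simplification the paper does not spell out, that the two contributions are simply added):

```python
import numpy as np

def forward(x, centers, widths, W_hidden_out, W_direct, lesioned=False):
    """Past tense output as the sum of two routes: a direct structural
    transformation of the input stem (input-output weights) and, unless
    the hidden layer is lesioned, an identity-based memory contribution
    from the Gaussian hidden units."""
    out = x @ W_direct                             # direct route: stem transformation
    if not lesioned:
        d = np.linalg.norm(centers - x, axis=1)
        h = np.exp(-d ** 2 / (2.0 * widths ** 2))
        out = out + h @ W_hidden_out               # memory route: verb identity
    return out
```

The lesioned flag anticipates the analysis in section 6, where the hidden layer is silenced in order to measure how much of the network's performance rests on the direct connections alone.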
The next section describes the specific simulations that were undertaken with the SGNG network model.
4. Experiments
For our simulations, we borrowed the corpus from
MacWhinney & Leinbach (1991) which consists of
1,404 stem/past tense pairs of English verbs. This corpus was also used by Ling & Marinov (1993) in their
SPA to allow direct comparisons between models.
The verbs were transcribed using UNIBET and, following MacWhinney & Leinbach (1991), represented
in a templated format containing slots for consonants
and vowels. Table 1 shows examples for the templated
phonological encoding of some verbs. Each phoneme
was represented by ten features, such as voiced, labial,
dental for consonants, and front, center, high for vowels. A template consisted of 18 slots, resulting in a
180-bit feature vector for the representation of each
verb.
Input        Encoding             Output        Encoding
bring        br-I-N-----------    brought       br-O-t-----------
explain      ---I-ksp--l--e-n-    explained     ---I-ksp--l--e-nd
point        p--2-nt----------    pointed       p--2-nt-I-d------
recognize    r--E-k--I-gn-3-z-    recognized    r--E-k--I-gn-3-zd
shake        S--e-k-----------    shook         S--U-k-----------

Table 1: Some examples of the template encoding of verb pairs; each phoneme string is aligned to the 18-slot consonant/vowel template CCCVVCCCVVCCCVVCCC.
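To make the encoding concrete, the following hypothetical sketch expands a slot string into the 180-bit vector; the feature values are illustrative placeholders, not the feature set actually used in the simulations:

```python
TEMPLATE = "CCCVVCCCVVCCCVVCCC"   # 18 consonant/vowel slots
N_FEATURES = 10                   # phonological features per slot

# Toy feature table (assumed values; the paper names features such as
# voiced, labial, dental for consonants and front, center, high for vowels).
FEATURES = {
    "b": (1, 1, 0, 0, 0, 0, 0, 0, 0, 0),
    "r": (1, 0, 0, 1, 0, 0, 0, 0, 0, 0),
    "I": (0, 0, 0, 0, 0, 1, 1, 0, 0, 0),
    "N": (1, 0, 0, 0, 1, 0, 0, 0, 0, 0),
    "-": (0,) * N_FEATURES,       # empty slot
}

def encode(slots):
    """Expand an 18-character slot string into a flat 180-bit vector."""
    assert len(slots) == len(TEMPLATE)
    return [bit for ch in slots for bit in FEATURES[ch]]

bring_vector = encode("br-I-N------------")   # the stem 'bring' -> 180 bits
```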
From the original corpus of 24,802 tokens, 8,000
tokens were randomly extracted according to the frequency of their past tense forms. The resulting training corpus thus consisted of 8,000 tokens (57.2% regular, 42.8% irregular), corresponding to 1,066 types
(88.4% regular, 11.6% irregular).
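A minimal sketch of this frequency-proportional sampling (names assumed):

```python
import random

def sample_tokens(verbs, past_tense_freqs, n=8000, seed=1):
    """Draw n training tokens, each verb being chosen with probability
    proportional to the frequency of its past tense form."""
    return random.Random(seed).choices(verbs, weights=past_tense_freqs, k=n)
```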
Training of the SGNG network proceeded in a nonincremental fashion: the whole training set of 8,000
stem/past tense pairs was presented to the network in
random order at each epoch. Hidden units were inserted depending on the learning progress (see section 3), and the network was tested for its performance
on the training set prior to each insertion.
5. Learning Results
After 912 epochs, the network produced 100% of the
irregular and 99.8% (all but two) of the regular past
tense forms correctly. At that point the network had
inserted a total of 400 hidden units. On average, therefore, each of the 400 hidden-unit receptive fields covered 2.67 verbs.
                      R&M     M&L     SPA     Constructivist
Verb types            420     1,650   1,038   1,066
Percentage correct:
  Total               97.0    99.3    99.2    99.8
  Regulars            98.0    100.0   99.6    99.8
  Irregulars          95.0    90.7    96.6    100.0

Table 2: Performance on training of the four compared models (extended from Ling & Marinov 1993).
Table 2 shows a comparison of the training results
of the different models. While all models performed
almost equally well on regular verbs, the constructivist network clearly outperformed the other models
on irregular verbs. This result indicated that the ability to add structure where needed was advantageous
in that it allowed the network to allocate resources
specifically for learning the irregular past tense forms.
This was confirmed by an analysis of the hidden layer:
while on average 2.7 regular verbs clustered to the
same hidden unit (with a maximum of 16 regular verbs in one receptive field), this number was only 1.1 for irregular verbs. This result clearly shows the advantage of constructivist learning over learning in
fixed-architecture systems: resources are not evenly
distributed a priori to handle all cases, but they can be
specifically allocated for the more difficult, or exceptional cases, whereas fewer resources are needed for
the easy, regular cases.
6. U-shaped Learning Curve
A plausible model of past tense acquisition should
follow the documented course of acquisition in children, that is, the U-shaped learning curve which has
been extensively studied (see e.g., Marcus et al. 1992):
while children initially produce a number of irregular verbs correctly, they subsequently overregularize
the same verbs and only in a final step produce them
correctly again. This phenomenon has theoretically
been explained with the inappropriate application of a
linguistic rule, but connectionist theories have argued
that it can arise due to regularities in a subtly changing
speech environment of the child.
The constructivist model described in this paper displayed a U-shaped learning curve for many of the irregular verbs: A period of overregularization was preceded by a phase of correct production of the past
tense form; this was the case e.g., for knew, sat,
made, took and said. Often, all forms were produced (irregular past, stem + regular ending, irregular past + regular ending; e.g., knew – knowed
– knewed – knew). For other, less frequent verbs
(e.g., wet, sell, cost), the first sampled past tense
form was overregularized. This corresponds to data
on overregularization in children (Marcus et al. 1992):
While in the corpora of Adam, Eve, and Sarah several
past tense forms were produced correctly before their
first overregularization, this was not always the case.
It is unclear, however, whether this is merely due to a lack of speech samples; presumably, a child would also overregularize an infrequent verb on its first use.
The network displayed psychologically plausible
behavior in more specific aspects as well: on average,
the more frequent verbs were less often overregularized than less frequent ones. Further, clusters of irregular verbs acted as protection from overregularization:
the verbs ring, sing, and spring were overregularized only in 2.9% of all cases. By contrast, the
verbs hang, slide, and bear, which had a comparable token frequency, had an average overregularization rate of 15.4%.
The constructivist network model was thus successful in modeling the left side of the U-shaped learning
curve, i.e., a correct production of past tense forms
before their subsequent overregularization, and its performance corresponded to the details of children’s past
tense learning.
How does the U-shaped learning in the constructivist network occur? Since the verb set was held constant throughout training, the change in network performance could only be a consequence of the internal
reorganization of the network architecture. Initially,
the network had only two hidden units which were
of little use since they each covered about half of all
verbs with their varied past tense forms, and the network therefore had to rely on the direct input-output
connections for producing the past tense forms. Given
these restrictions the network initially learned to produce the past tense forms of the frequent irregulars
(because they were so frequent) and of many regular
verbs (because there were so many of them). When
learning stagnated and no more past tense forms could
be produced correctly, the network gradually grew its
hidden layer, adding more receptive fields which were
then used to account for more past tense forms. The
output forms were produced through a ‘collaboration’
of the direct connections with the newly established
hidden unit connections. The growing process led
to a phase in which representations were already relocated to the hidden layer, but the few receptive fields
were large and included verbs that required different past tense forms, thereby leading to errors even
for verbs that had initially been produced correctly
through just the direct input-output connections. This
phase corresponds to the overregularization stage in
children. It is evident that with this mechanism, different verbs would be overregularized at different times,
depending on whether they had been allocated an individual receptive field. This process of internal reorganization of the network’s representations becomes
evident in figure 4, which shows the learning curves
for the regular and irregular past tense forms, but also
how many of these forms were still produced correctly
when the hidden layer was lesioned and only the direct
input-output connections were used.
Figure 4: The learning curves for the regular and irregular past tense forms in the intact network and with a lesioned hidden layer (percentage correct over 1,000 training epochs, with separate curves for regulars and irregulars, each intact and lesioned).
Initially, with few hidden units, lesioning the hidden layer did not lead to a strong decrease in network performance: with or without the hidden layer, initially about 20% of the irregular and 60% of the regular past tense forms were produced correctly. As the hidden layer grew, however, lesioning led to a marked decrease in performance, and at around epoch 200, when the network had constructed 91 hidden units, deletion of these units resulted in none of the irregular and only 7.2% of the regular verbs being produced correctly. This confirmed the collaboration of the two pathways, the direct input-output connections and the route via the hidden units, in producing the past tense forms of most verbs, and it showed that even the representations of initially correct verbs were transferred from the direct connections into the growing hidden layer, leading in many cases to the temporarily incorrect production of initially correct past tense forms. The internal reorganization of the network due to the constructivist adaptation of its structure could thus account for the unlearning of initially correct outputs and for the U-shaped learning curve in the acquisition of the English past tense.
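In terms of the forward-pass sketch at the end of section 3 (hypothetical names throughout), this lesioning analysis amounts to evaluating the same network twice, once with and once without the hidden-layer route:

```python
import numpy as np

def mean_error(X, Y, net, lesioned=False):
    """Mean squared output error over a verb set; net is the tuple
    (centers, widths, W_hidden_out, W_direct) expected by forward()."""
    outs = np.array([forward(x, *net, lesioned=lesioned) for x in X])
    return float(np.mean((outs - np.asarray(Y)) ** 2))
```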
7. Generalization
The network was also tested on its generalization to
pseudo-verbs. As Ling & Marinov (1993) pointed out,
testing the generalization ability of a model on existing verbs is misleading because irregular verbs are
by their nature unpredictable, and in line with Ling
& Marinov (1993) we therefore used the set of 60
pseudo-verbs which had been devised by Prasada &
Pinker (1993) and had been tested by them on human
subjects. These verbs consisted of blocks of ten which
were prototypical, intermediate and distant with respect to existing regular and irregular verbs.
The results of the generalization experiments are shown in figure 5. The generalization performance of the constructivist network was similar to that of human subjects for both regular and irregular cases. It performed similarly to the SPA, and better than the R&M network model.
Figure 5: Generalization of the constructivist network to different classes of pseudo-verbs, in comparison with humans, the SPA, and R&M’s network (extended from Ling & Marinov 1993). P = Prototypical, I = Intermediate, D = Distant.
8. Discussion
The experiments reported here show empirically that
a constructivist neural network can model the acquisition of the English past tense more closely than other,
fixed-architecture networks. This is due to the fact that
a constructivist network is capable of adding structure
when and where needed, thereby adapting to the specific learning task, and to the resulting internal reorganization of representations, which leads to the U-shaped development that is also found in children’s learning. These results, together with those from learning theory (see section 2), indicate that constructivist
learning is superior to learning in fixed-architecture
systems. In fact this is also true for the symbolic
SPA: this model builds a decision tree in response
to the learning task and therefore also constitutes a
constructivist system. It is likely that the fact that
the SPA outperformed both R&M’s and M&L’s neural
network model is not based on it being symbolic, but
is due to its constructivist nature. It seems therefore
that the dichotomy constructivist/fixed-architecture is
more fundamental than the symbolic/subsymbolic distinction which previous past tense models have aimed
to emphasize. Direct comparisons between symbolic and subsymbolic models can thus only be made fairly when both lie within, or both outside, the constructivist framework, and,
as seen in this paper, models within the constructivist
framework conform better to evidence from neural
and cognitive development.
Comparing the constructivist network with the constructivist symbolic SPA indicates, however, that the
network constitutes a more realistic psychological
model: it both learns better than the SPA and it explains the U-shaped learning curve more realistically.
In the SPA, U-shaped learning was achieved by the explicit manipulation of a learning parameter that controlled how many times a verb had to be seen to be
memorized as an exception – if it occurred less often, it was overregularized. Besides “hard-wiring” the
theory that children possess such a variable parameter and using the resulting U-shaped learning curve
as evidence for just that theory, leading to a circular
argument, this procedure also established an unrealistically direct relationship between the frequency of a
verb and its overregularization. In the constructivist
network, however, U-shaped learning arose as a direct
outcome of the learning algorithm due to the internal
reorganization of the network architecture.
The constructivist network contradicts the view that
connectionist learning implies a homogeneous architecture, a view often assumed in connectionist past tense models. Although learning was based, as in conventional fixed-architecture networks, on the complex
interactions of many simple units and on the gradual adjustment of connection weights, the constructivist network developed a “pseudo-modular” architecture where more space was given to the harder,
irregular cases, and where a memory in the form of
hidden unit receptive fields developed in addition to
the direct input-output connections. Goebel & Indefrey (in press) and Westermann & Goebel (1995)
showed how learning in (fixed-architecture) modular connectionist networks modeled cognitive development more closely than homogeneous architectures,
and the present paper shows how a similar modular architecture can develop in a constructivist framework.
The results obtained with the present and with
previous past tense models thus suggest an extension of the common symbolic/connectionist distinction by the dimensions of modular/homogeneous and
fixed-architecture/constructivist. Given this three-dimensional classification matrix, the present paper
indicates that connectionist modular constructivist
systems constitute the most realistic models of cognitive development in the child.
Future work will address an extension to the SGNG
algorithm: in its present form, it only learns to discriminate between similar inputs requiring different
outputs (such as hear and fear) but has in its hidden layer no mechanism of integrating different inputs requiring similar outputs, such as note and decide. Further research will also involve assessment
of the neurobiological plausibility of the constructivist
growth process and its modification to that end.
Such research might then contribute to the understanding of the connection between neural and cognitive development, an area which is currently only
beginning to be addressed.
9. Acknowledgements
This research was supported by the ESRC (award
no. R00429624342) and by the Gottlieb Daimler- und Karl Benz-Stiftung (grant no. 02.95.29).
References
Baum, E. B. (1989), ‘A proposal for more powerful learning
algorithms’, Neural Computation 1, 201–207.
Fields, R. D. & Nelson, P. G. (1992), ‘Activity-dependent
development of the vertebrate nervous system’, International Review of Neurobiology 34, 133–214.
Fodor, J. (1980), Fixation of belief and concept acquisition, in M. Piattelli-Palmarini, ed., ‘On Language and
Learning: The Debate between Jean Piaget and Noam
Chomsky’, Routledge & Kegan Paul, London and Henley, pp. 143–149.
Fritzke, B. (1994), ‘Fast learning with incremental RBF networks’, Neural Processing Letters 1, 2–5.
Goebel, R. & Indefrey, P. (in press), The performance of
a recurrent network with short term memory capacity
learning the German s-plural, in P. Broeder & J. Murre,
eds, ‘Cognitive Models of Language Acquisition’, MIT Press, Cambridge, MA.
Johnson, M. H. (1997), Developmental Cognitive Neuroscience, Blackwell, Oxford, UK; Cambridge, MA.
Ling, C. X. & Marinov, M. (1993), ‘Answering the connectionist challenge: A symbolic model of learning the past
tenses of English verbs’, Cognition 49, 235–290.
MacWhinney, B. & Leinbach, J. (1991), ‘Implementations
are not conceptualizations: Revising the verb learning
model’, Cognition 40, 121–157.
Marcus, G. F., Pinker, S., Ullman, M., Hollander, M., Rosen,
T. J. & Xu, F. (1992), ‘Overregularization in language
acquisition’, Monographs of the Society for Research in
Child Development, Serial No. 228, Vol. 57, No. 4.
O’Leary, D. D. M. (1989), ‘Do cortical areas emerge from a
protocortex?’, Trends in Neuroscience 12, 400–406.
Pinker, S. (1991), ‘Rules of language’, Science 253, 530–
535.
Plunkett, K. & Marchman, V. (1993), ‘From rote learning to
system building: Acquiring verb morphology in children
and connectionist nets’, Cognition 48, 21–69.
Prasada, S. & Pinker, S. (1993), ‘Generalization of regular and irregular morphological patterns’, Language and
Cognitive Processes 8(1), 1–56.
Quartz, S. R. (1993), ‘Neural networks, nativism, and the
plausibility of constructivism’, Cognition 48, 223–242.
Quartz, S. R. & Sejnowski, T. J. (1998), ‘The neural basis of
cognitive development: A constructivist manifesto’, Behavioral and Brain Sciences 21.
Rumelhart, D. E. & McClelland, J. L. (1986), On learning past tenses of English verbs, in D. E. Rumelhart &
J. L. McClelland, eds, ‘Parallel Distributed Processing,
Vol. 2’, MIT Press, Cambridge, MA, pp. 216–271.
Uylings, H. B. M., Kuypers, K., Diamond, M. C. & Veltman,
W. A. M. (1978), ‘Effects of differential environments on
plasticity of dendrites of cortical pyramidal neurons in
adult rats’, Experimental Neurology 62, 658–677.
Van Ooyen, A. (1994), ‘Activity-dependent neural network
development’, Network 5, 401–423.
Westermann, G. & Goebel, R. (1995), Connectionist rules of
language, in ‘Proceedings of the 17th Annual Conference
of the Cognitive Science Society’, Erlbaum, pp. 236–241.