Neural Networks 13 (2000) 149–183
www.elsevier.com/locate/neunet

Contributed article
A neural network theory of proportional analogy-making
Nilendu G. Jani a, Daniel S. Levine b,*
a Iconoci, Inc., 2528 Oak Brook Drive, Bedford, TX 76021-7223, USA
b Department of Psychology, University of Texas at Arlington, 501 South Nedderman Drive, Arlington, TX 76019-0528, USA
Received 28 May 1999; received in revised form 29 November 1999; accepted 29 November 1999
Abstract
A neural network model that can simulate the learning of some simple proportional analogies is presented. These analogies include, for example, (a) red-square:red-circle :: yellow-square:?, (b) apple:red :: banana:?, (c) a:b :: c:?. Underlying the development of this network
is a theory for how the brain learns the nature of association between pairs of concepts. Traditional Hebbian learning of associations is
necessary for this process but not sufficient. This is because it simply says, for example, that the concepts “apple” and “red” have been
associated, but says nothing about the nature of this relationship. The types of context-dependent interlevel connections in the network
suggest a semilocal type of learning that in some manner involves association among more than two nodes or neurons at once. Such connections have been called synaptic triads, and have been related to potential cell responses in the prefrontal cortex. Some additional types of
connections are suggested by the problem of modeling analogies. These types of connections have not yet been verified by brain imaging, but
the work herein suggests that they may occur and, possibly, be made and broken quickly in the course of working memory encoding. These
working memory connections are referred to as differential, delayed and anti-Hebbian connections. In these connections, one can learn
transitions such as “keep red the same”; “change red to yellow”; “turn off red”; “turn on yellow,” and so forth. Also, included in the network
is a kind of weight transport so that, for example, red to red can be transported to a different instance of color, such as yellow to yellow. The
network instantiation developed here, based on common connectionist building blocks such as associative learning, competition, and
adaptive resonance, along with additional principles suggested by analogy data, is a step toward a theory of interactions among several
brain areas to develop and learn meaningful relationships between concepts. © 2000 Elsevier Science Ltd. All rights reserved.
Keywords: Analogy; Synaptic triads; Working memory; Associative memory; Adaptive resonance theory; Concept learning; Weight transport
1. Introduction
1.1. Analogy-making
The capacity for making and learning analogies is clearly
at the heart of advanced human cognitive capabilities and of
creativity (Holyoak & Thagard, 1995; Indurkhya, 1991;
Lakoff & Johnson, 1980). The ability to perform analogical
reasoning seems to arise earlier in child development than
was previously supposed, in some cases as early as three
years of age (Goswami, 1991).
Analogy simply means sameness or resemblance of two
objects or processes at some level of abstraction. Analogy-making is the ability to see familiar things in a different manner than usual, thus enabling some unfamiliar things
to look similar to some familiar ones. This process depends
on the ability to combine, split and rearrange existing
concepts, and on the ability to reason about relationships.
* Corresponding author. Tel.: +1-817-272-3598; fax: +1-817-272-2364.
E-mail address: [email protected] (D.S. Levine).
Analogical reasoning can take on an incredible variety of
forms, all of which have different developmental histories.
First of all, there is the classic type of analogy which dates
back to Aristotle and is familiar on college entrance examinations: the proportional analogy of the type “A is to B as C
is to ?.” Second, there are simple statements that something
is “analogous” to something else, without specifying the
nature of the analogy; this type of analogy is akin to a
metaphor. Finally, there is the drawing of conclusions in one domain based on results in another domain to which it is
analogous. If, for example, an analogy is drawn between an
atom (electrons surrounding a nucleus) and a solar system
(planets surrounding a sun), it may be possible to conclude
details about the atom from corresponding details about the
solar system.
In an attempt to circumscribe this problem, attention here
is restricted to the classical, or proportional, type of analogy.
Solving a proportional analogy “A:B :: C:D” requires that
the organism, or network, characterize the relationship
between entities A and B, and in many cases also the relationship between entities A and C.
This poses a challenge for traditional connectionist
theories (for example, Grossberg, 1988; Hertz, Krogh &
Palmer, 1991; Levine, 1991; Rumelhart & McClelland,
1986), because many connectionist networks have used
some variety of a Hebbian learning rule, whereby an association between two entities is strengthened by co-occurrence. Strengthening or weakening of an association
typically gives no clue as to the nature of the association,
or the relationship between the two entities. For example, a
strong Hebbian connection between nodes representing
“Paris” and “France” tells us only that Paris and France
are somehow related: it does not tell us that Paris is a part
of France, that it is “in” France, or that it is the capital of
France.
This shortcoming of networks based on simple Hebbian
learning rules is one cause for the criticism leveled at
connectionist networks by some cognitive scientists such
as Fodor and Pylyshyn (1988) who claim that such networks
lack “compositionality” and “systematicity.” This is one reason there have been few neural networks so far that
have simulated complex reasoning and inference processes.
Yet our brains constitute an “existence proof” that a network
constructed using non-linear dynamics, rather than heuristic
programming, can perform complex reasoning, including
analogical reasoning. This has inspired us to search for a
way to model analogy learning and solving by means of a
network incorporating principles that have already proved
successful in modeling a wide variety of other cognitive and
mental processes, including sensory perception (Grossberg,
1987; Grossberg & Mingolla, 1985), pattern categorization
(Carpenter & Grossberg, 1987; Cohen & Grossberg, 1987),
sequence learning (Bapi & Levine, 1994, 1997; Dehaene,
Changeux & Nadal, 1987), and motor control (Bullock &
Grossberg, 1988).
Some preliminary connectionist models of analogy have
already appeared. Mitchell (1993) developed a system
called COPYCAT, a hybrid of connectionist and symbolic
systems, to model proportional analogies involving strings
of letters of the alphabet. Hummel and Holyoak (1997)
developed a fully connectionist model for some semantic
analogies, based on a previous neural network model of
property binding (Shastri & Ajjanagadde, 1993). Both of
these networks have yielded valuable insights into the organization of the cognitive processes involved in analogy-making. The premise here, though, is that further advances in connectionist modeling are needed to better approximate the process by which humans actually perform analogy-related tasks.
2. Experimental data
Today there is more psychological (Vosniadou & Ortony,
1989) and developmental (Diamond, 1991; Goswami, 1991,
1992, 1998; Mandler, 1990; Meltzoff, 1990; Simon &
Klahr, 1995; Simon, Hespos & Rochat, 1995) data available
than neurophysiological (Wharton and Grafman, 1998;
Wharton et al., 1998) data on analogy-making and related
cognitive skills. This is partly because of the difficulty
involved in studying the brain in vivo and localizing the
representation of concepts.
2.1. Psychological data
A review of data on the development of the analogy-making capacity in young children is given by Goswami (1991). These data hint that the capacity for learning analogies occurs sooner in development than is often thought,
frequently between the ages of 2 and 3 years old. The
theories of Piaget (see for example, Gruber & Vonèche,
1995) hint that analogical reasoning should not become
well established until the stage of formal operational
thought, which Piaget believed did not start until about
11 years of age. Yet the data Goswami reviews show that
while analogical reasoning ability increases steadily with
age, some proportional analogies that are sufficiently natural
can be solved in both semantic and pictorial domains by
many 4-year-olds.
The data that Goswami reviewed hint that the formal
operational development Piaget talked about facilitates
analogical reasoning but is not necessary for all cases of
it. Yet there are a variety of other factors mediating how
early children learn a particular type of analogy (either a
proportional analogy or an analogy between problem
domains). For example, if the type of associations being
mapped are relatively simple ones (e.g. “functional” ones
like shoes:feet, or antonyms like cold:hot), or if the knowledge domain is one with which the children are familiar, the
analogies are easier to learn.
Other data reviewed by Vosniadou and Ortony (1989)
hinted that young children do less well than older children
or adults on analogies mainly because of a lack of domain
knowledge, not because of an inability to reason relationally. In transferring properties from a source to a target
domain, children are as apt to transfer relational properties
as descriptive ones. For example, when told that white blood
cells are like soldiers, they will not infer that these cells
wear uniforms, but might infer that the cells can die from
an infection.
Analogical thinking has also been found to occur in chimpanzees. Thompson, Oden and Boysen (1997) tested chimpanzees on a matching-to-sample task that required that the
animals learn relations among relations. Their results did
not support a previous conjecture that such relations could
only be learned by animals previously trained in language.
Rather, chimpanzees who had some experience with the test
apparatus and with abstract thinking, but not language per
se, could learn such relations readily. Yet there are also
interesting analogies that language-trained chimpanzees
make when defining words. For example, one language-trained chimp described a cucumber as a green banana,
and another referred to an Alka-Seltzer as a listen drink
(Goodall, 1990).
All these developmental results lead to a metasuggestion
that analogical reasoning is not as sharp a break from lower-order processes like inter-concept association as is often thought. This suggests further that modeling of analogy-making may be able to utilize some of the same network
architectures involved in modeling lower-level thought,
with greater complexity and a few additional specialized
mechanisms.
2.2. Neurophysiological data
Preliminary results from physiological (PET) studies
(Wharton et al., 1998) implicate inferior frontal cortex and
inferior parietal cortex, in the left hemisphere, as brain
regions mediating analogy making. This is consistent with
the suggested role of prefrontal cortex and related circuitry
in working memory, reasoning, generating and searching
through alternatives (e.g. Fuster, 1997; Goldman-Rakic,
1987).
These PET results do not, however, yet provide enough
data to suggest brain pathways for analogical reasoning
processes. This means that models such as ours could
provide suggestions for cognitively significant pathways
involving parts of the prefrontal and parietal cortices.
Perhaps, as suggested elsewhere for other reasoning
processes (Levine, 1996), these pathways are hard wired
for general analogical capabilities, but the specific content
of analogies is learned through modifiable connections with
other parts of the cortex.
3. Previous models
There has been a variety of computational models
proposed for different aspects of analogy learning and
formation. Some of them have used a full neural network
structure, whereas others have combined partial neural
network (connectionist) realizations with elements of
symbolic programming from traditional artificial intelligence (Barnden & Holyoak, 1994; Blank, 1997; Burns,
1996; Hofstadter et al., 1995; Holyoak & Barnden, 1994;
Hummel & Holyoak, 1997; Long & Garigliano, 1994;
Mitchell, 1993; Plate, 1998). There is considerable literature on analogy models that are based on symbolic
programming alone, but in the interests of brevity any
approaches to analogy that do not include a neural
network component are not discussed here: for example,
models due to Vosniadou and Ortony (1989), Hammond
(1989), Jani (1991), Sun (1991), Gentner, Ratterman and
Forbus (1993), and Cook (1994) are not covered. Also,
these models have differed widely both in the general cognitive problems they were attempting to solve and in the
specific problem domains they were investigating.
3.1. Connectionist models of analogy
Several of the analogy models in the network literature,
starting with Barnden and Srinivas (1992), have been based
in problem domains that involve elaborate semantic structures and relationships between sentences. In particular, the
LISA model of Hummel and Holyoak (1997) relied on
previous models of the process whereby particular entities
were bound to particular roles in a sentence (Shastri &
Ajjanagadde, 1993). The LISA model was designed to
account for the two analogical processes of “access,” that
is, how potential analogs in both the source and target
domains are retrieved from memory, and “mapping,” the
working memory process by which relationships between
source and target elements are discerned.
Hummel and Holyoak’s model can account for various
psychological data on the differential factors influencing
access and mapping. For this reason it can reproduce
many characteristic human patterns of analogical inference,
such as learning close and natural analogies better than it
learns logically consistent but contrived analogies. The
limitations of this model are that it relies heavily on the
assumed previous learning of very high-level abstract
concepts. Also, its structure does not appear to be based in
any way on biologically realistic models of simpler mental
processes.
A variation of this type of semantic model is due to Plate
(1998). Plate used a form of holographic vector representations to map parts of sentences in a semantic source domain
to their closest analogs in the target domain. The changes
from one to the other are then treated as a mapping and
applied to other elements in the source domain.
Blank (1997) developed a model called Analogator that
can learn analogies between visual scenes including different geometric objects that could be light or dark. These were
not proportional analogies but questions such as “what is the
analog of Figure A in Domain B?” In other words, Analogator had to learn to distinguish figure from ground in a
novel visual situation, given a prespecified figure–ground
distinction in the source scene.
3.2. Hybrid models of analogy
Mitchell (1993) utilized COPYCAT, an architecture that
is part connectionist and part symbolic, to perform analogies
on a circumscribed domain—strings of letters of the alphabet. Characteristic transitions such as successor and predecessor mappings could be learned within this domain. Some
of the analogies COPYCAT was designed for were obvious,
such as “abc:abd < rst:?.” There were others that involved
generalizing mappings from one concept to a related one (a
process the author called “conceptual slippage”), such as
“abc:abd < kji:?,” or even “abc:abd < abbccc:?.” This
network, while severely domain-restricted, captured some
mappings that are characteristic of human analogical
inference processes. Also, it found difficult or ambiguous the same analogies that human solvers do.

Fig. 1. Adaptive Resonance Theory (ART) network.
3.3. Summary of analogy models
Previous analogy models have differed enough in their
objectives that it is hard to compare them either with each
other or with the model presented here. All of them possess
detailed capabilities in particular problem domains that the
proposed model, which is not designed yet to deal with
those domains, lacks. On the other hand, none of these other models does what the model presented here is designed to do:
learn some low-level analogies between simple, natural
concepts using generic architectural principles that have
previously been applied to lower-level mental processes
and to other aspects of concept formation, as discussed next.
3.4. Adaptive resonance models of concept learning
Neural network classifiers, such as Adaptive Resonance
Theory (ART) (Carpenter & Grossberg, 1987), Back Propagation (Rumelhart & McClelland, 1986; Werbos, 1993),
Self-Organizing Maps (Kohonen, 1984), and Brain-State-in-a-Box (BSB) (Anderson, Silverstein, Ritz & Jones,
1977), accept exemplar vectors or patterns as input defined
over different sensory modality fields and learn to put them
into an existing perceptual category or into a new category if
they do not belong to any existing category. A network that
has learned to classify fruits using their color, taste, and
shape would classify an input vector representing red,
sweet, and round as an apple, or yellow, sweet, and cylindrical as a banana. Conversely, when asked what an apple is,
the network can readily describe it as red, sweet and round.
Our network is based partly on ART, with some additional processes that we believe facilitate analogy-making.
We utilize ART because it is one of the few neural network
architectures today that explicitly attempt to model all three
basic cognitive processes: sensation, perception, and attention. ART and its extensions are also better equipped than other networks to function in a realistic environment, as
they make minimal assumptions about their inputs and
require minimal preprocessing. Their input patterns can be
either spatial with different intensities and orientations, or
temporal with asynchronous elements having different
duration.
ART (Fig. 1) consists of two kinds of nodes in two different layers: one representing sensory features such as red,
yellow, sweet, round, and cylindrical in the feature layer,
and the other depicting classes of perceptual objects such as
apple and banana in the category layer. Nodes in the category layer hold together their corresponding features
through self-organized learnable reciprocal connections
(also referred to as weights) between the two layers.
In ART, as in most other neural networks, short-term and
long-term memory involve different methods of pattern
storage. Short-term memory is achieved by transiently
increasing the activation of certain nodes in the feature
layer on receiving the input. Bottom–up signals from the
feature layer in turn transiently activate nodes at the category layer. These category nodes then compete via recurrent
lateral inhibition, and the input is tentatively interpreted as
being in the category coded by the node with the largest activation. Then other processes, not essential to the current
discussion, are utilized to compare the input with a
previously stored category prototype to see if there is sufficient match to make this a permanent classification.
Long-term memory is achieved via changes in connections. A pattern is learned, or stored in long-term memory,
by gradually setting the strength of the mutual connections
between feature and category layers, over repeated exposures to various exemplars. Pattern formation takes place
during each presentation of every exemplar, according to
an associative learning rule that detects the co-occurrence
of specific pattern elements in terms of presence and
absence, or different proportions, of these elements, and
gains strength with the number of repetitions. Patterns are
considered as classified when input through bottom–up (from feature to category layer) connections resonates with top–down (from category to feature layer) expectancies.

Fig. 2. Proportional analogy-making: task description.

For example, after substantial exposure to various kinds of
apples, the network would unequivocally activate the node
for apple on receiving sensation in the feature nodes “red,”
“sweet,” and “round.” Conversely, on exciting the apple
node it would activate its corresponding features in an
anticipation of what the input should be at the feature
layer. If the input does not resonate with the top–down
expectations then a mismatch occurs, which in turn triggers
either the search for a new category node or the modification
of an existing category through further learning.
There are several noteworthy attempts at extending ART
to model higher-level cognition. For example, Nigrin’s
(1993) model called SONNET simulates learning of synonyms, while Kant’s (1995) model called Categ_ART and
Kant and Levine’s (1998) model called RALF simulate rule
formation.
Yet analogy-making relies on a variety of cognitive
processes that are not captured by other recent extensions
of ART. These are processes involved in discerning and
mapping relationships and transitions between pairs of
concepts. Hence our network, described in the next section,
combines the basic building blocks of ART (including associative learning and lateral inhibition) with additional neural
structures that are specifically designed to model these types
of interconcept relationships.
4. Network architecture
Now consider the proportional analogies which are of the
form A:B :: C:D, where A, B, C, and D could represent any
concepts. Typically this kind of analogy is posed as the
question: A is to B as C is to what? The answer, D, is arrived
at by applying or transferring the relationship defined over a
set of attributes of A and B to the relevant attributes of C.
The first step in producing the right answer to an analogy
question is finding and remembering the relationship
defined over a set of attributes of A and B. Rephrased this
means finding what transformed A into B. The second step
in analogy-making is applying or transferring the relationship to the relevant attributes of C. This can be rephrased as
asking what the outcome would be if the same transformation (as the one that took A to B) were applied to C. A more
detailed high-level description including other steps in the
proportional analogy-making task is given in Fig. 2.
The network proposed here accomplishes these steps in
analogy making by superimposing on the vector classification structure of ART some additional structure that incorporates relationships between attribute vectors that
represent concepts. A description of this additional structure
is now given.
4.1. Building blocks
The analogy-making neural network developed here is
based on the recurrent use of a few fundamental “building
blocks.” This design philosophy in cognitive and neural
modeling is also described in other books and articles
(e.g. Grossberg, 1982; Hestenes, 1998; Levine, 1991;
Nigrin, 1993). The first two building blocks are competitive-cooperative interactions (also known as lateral inhibition) and associative learning. These are already in
extensive use in modeling and are based on well-established
psychological and neurophysiological findings. The rest of
Table 1
Cognitive functions and building blocks

Cognitive function: analogy-making and semantic memory (relation learning, transformation rule learning in WM, and rule application). Building blocks: synaptic/conceptual triads; non-Hebbian connections; contextual modulation of weights; weight transport.

Cognitive function: perception (object recognition in STM, applying categorization in LTM, perceptual binding). Building blocks: competitive–cooperative networks; associative (Hebbian) learning; adaptive resonance; opponent processing.

Cognitive function: sensation and attention (feature and boundary detection, figure–ground separation in STM). Building block: competitive–cooperative networks.
the blocks—relation learning, transition encoding, and rule
application—are proposed here for the purposes of the
analogy-making network, but could also potentially explain
some other cognitive and neural phenomena. These are
discussed in more detail, with generic equations, in the
next few subsections. The complete detailed analogy-making network equations are discussed later in Appendix
A.
Table 1 summarizes the utility of different building
blocks in modeling different cognitive phenomena. It
describes simpler basic cognitive functions at the bottom
and climbs up to more complex functions such as
analogy-making at the top.
4.2. Nodes, clusters, and layers
The analogy-making network is organized in five layers:
feature layer F1, category (or perceptual binding or HAS-A)
layer F2, abstract category (generalizations or IS-A) layer
F3, relation layer F4, and modulator (context or task-specific) layer F5. Each layer in turn is organized in clusters. Each
cluster consists of individual nodes representing concepts or
percepts of similar significance. Characteristic structure of
each layer is illustrated independently in Fig. 3. The first
two layers are carried over from standard ART networks.
The other three layers are additions made herein and not
present in ART.
The sensory feature layer F1 is divided into clusters of
nodes which detect specific classes of features. For example,
each node in the form cluster represents a distinct shape
such as round, cylindrical, or square. Each node in the
color cluster represents a distinct color such as red, yellow,
or orange. Each node in the taste cluster represents a distinct
taste such as sweet, sour, or tart. Each node in the word
cluster represents a distinct label such as “square-word” or
“apple-word.”
The category layer F2 is divided into clusters of nodes
which bind together separate attributes of coherent recognizable objects. This includes a cluster for fruits (binding
form, color, taste, and word) and one for geometric figures
(binding just form and word).
The abstract category layer F3 consists of nodes representing abstract generalizations such as color, taste, shape,
and fruit. Clusters represent types of edibles (fruit or vegetable) or of senses (color, form, taste, or word). Our network
reflects the fact that the nature of relations between layers F2
and F3 (IS-A) is not the same as between layers F1 and F2
(HAS-A). For example, while it is appropriate to think of an
apple as made up of “red,” “sweet,” and “round,” it is not
appropriate to think of a fruit as made up of “apple,”
“banana,” and “orange.”
The relation layer F4 has one cluster that consists of
relation nodes such as “has,” “is,” “forward” and “reverse.”
These nodes encode the nature of relationships between F3
and F2 and between F2 and F1. For example, the connection
between “apple” and “fruit” is mediated by the node “is,”
and the connection between “apple” and “red” by the
node “has.” (In ART, by contrast, all relations between
nodes in consecutive layers are implicitly of the nature
“HAS-A.”)
Another cluster in the F4 layer consists of relation nodes
representing generic transition categories: “activate,”
“suppress,” “maintain,” and “change.” These nodes
describe one role for working memory in proportional
analogy-making, namely, to temporarily remember the
changes in specific features in going from one input pattern
to the next. For example, in the transition from apple to
banana, “yellow” is activated, “red” is suppressed, “fruit”
and “color” are maintained, and “red” is changed to
“yellow.”
Finally, the modulator layer F5 has one cluster that
consists of nodes that represent transition stages in the
analogy task. For example, the node that represents the
transition “1–2” between first and second items modulates
working memory connections between attributes of item 1
and corresponding attributes of item 2. Another cluster of F5
consists of nodes that encode other types of markers for
particular contexts within the task, that also modulate specific inter-item connections. An example is the marker for a
situation that involves weight transport (see below).
Input patterns: As shown in Fig. 4, input pulses to the
network are represented as square blocks (indicating the
duration for which a particular item is presented to the
network). Combined with the exponential decay they
become “hat-shaped” pulse activities in the nodes. Duration
of each input pulse (on time) and length of an individual
presentation step are defined as in the bottom part of Fig. 4.
For simplicity, all input pulses are assumed to be of the
same duration.
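The “hat-shaped” traces of Fig. 4 follow from the usual leaky-integrator node equation. The sketch below, with illustrative parameter values, integrates dx/dt = -a·x + I(t) for one square input pulse.

def node_activity(t_end=4.0, dt=0.01, on_start=0.5, on_time=1.0, a=3.0, I=1.0):
    """Leaky integrator dx/dt = -a*x + I(t) driven by one square pulse."""
    xs, x, t = [], 0.0, 0.0
    while t < t_end:
        inp = I if on_start <= t < on_start + on_time else 0.0
        x += dt * (-a * x + inp)        # forward-Euler step
        xs.append(x)
        t += dt
    return xs

trace = node_activity()
print(max(trace))   # rises toward I/a ~ 0.33 during the pulse, then decays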
The following sections rely on caricatures such as the
one given above to explain time courses of several variables on one diagram. These are idealized depictions
intended to help better understand the concepts
discussed here, and should not be interpreted as actual
network outputs.
Fig. 3. Types of nodes, clusters and layers: (A) Sensory feature layer F1, and clusters. (B) Category or binding layer F2, and clusters. (C) Abstract category layer
F3, and clusters. (D) Relation layer F4, and clusters. (E) Modulator (context or task specific) layer F5, and clusters. Input to the feature detector nodes in layer F1
represents an external pattern, whereas direct input to nodes in other layers represents an internally generated pattern. All layers allow working and long-term
memory modifiable weights within and across clusters. Short-term memory interactions are allowed only within individual clusters except in specific cases, for
example between form and word clusters in F1.
Fig. 4. Input pulses and corresponding node activation.
4.3. Node interactions (weights)
The analogy-making network requires not only short-term and long-term memory as modeled in ART-like
networks, but also semantic spatio-temporal pattern processing in both working and long-term memory. Tree diagrams
in Fig. 5 summarize all the different types of weights used in
the analogy-making network.
Non-specific working (Fig. 5C) and long-term (Fig. 5B)
memory connections are either spatial (that is, between
node activities at the same time) or temporal (that is,
between node activities at different times). Short-term
memory (Fig. 5A) connections are only spatial, whereas
task specific working memory (Fig. 5D) connections are
only temporal. Spatial connections are node-to-node and
obey Hebbian associative learning laws. Temporal
connections are either triadic (weight-to-node/node-to-weight) or dyadic (node-to-node). Triadic connections
follow triadic learning laws (to be explained below).
Dyadic connections follow various learning laws that
are non-Hebbian in that they do not strictly detect simultaneity in node activities. Klopf (1988) and Kosko
(1986) have proposed similar non-Hebbian laws in their
models.
Decay rates are highest to lowest from short-term to
working to long-term memory. Working memory connections use both passive and active decay, whereas long-term
memory only uses active decay. Specifically, working
memory triadic weights follow additive learning laws.
That is, they gain strength during presentation of an item
but passively decay as soon as it is removed. Minai and
Levy (1993) have studied similar rapid single-trial generalizations in the hippocampus.
Interactions between the weights (to be described in
Section 4.4) are summarized along with the types of weights
in Fig. 5E.
4.3.1. Triadic temporal weights and relation learning
Concept triplets such as “apple, has, and red,” or “red, is,
and color,” along with their corresponding connections, are
described here as conceptual triads. Conceptual triads are
useful in learning the nature of relations. These structures
are similar in spirit to synaptic triads that are used in modeling learning of temporal sequences in bird songs by
Dehaene et al. (1987).
Fig. 6 illustrates a possible use of triads in modeling long-term semantic memory. Clockwise from the top-left quadrant, these triads can be read as “apple has red,” “red is a
color,” “color has instance red,” and “red is part of apple.”
Alternatively, they can be read as responses to the following
queries: (1) what color does an apple have? (red); (2) what is
red? (a color); (3) what is the relation between red and
color? (is); (4) what is the relation between apple and red?
(has).

Fig. 6. Example of relation nodes.
Motivation for conceptual triads: To understand specific
connections and learning within a triad, consider the triad
“red, is, and color.” The first pair, “red and is,” can be
thought of as the query “red is ?” The network is expected
to produce “color” as the answer in this case. The second
pair, “red and color,” can be thought of as being asked “how
is red related to color?” In this case “red” followed by
“color” should produce the answer “is.” The third pair, “is
and color,” is not uniquely relevant to this triad, in that “is”
followed by “color” should not produce “red,” except in a
specific context. The relevant relationships are temporal and
directional. That is, simultaneous occurrence of “red” and
“color” should not produce “is,” and “color” followed by
“is” should not produce “red.”
Conceptual triads in our network follow a semilocal associative rule that facilitates learning the behavior just
described. As shown in Fig. 7, a triad consists of three
nodes and three connections (node-to-node, node-to-weight,
and weight-to-node). The first three subsections to follow
N.G. Jani, D.S. Levine / Neural Networks 13 (2000) 149–183
157
Fig. 5. Types of node connections: (A) STM interactions. (B) LTM weights. (C) WM weights. (D) WM task-specific modulator weights. (E) Weight
interactions.
explain the behavior of each component in a partial triad
(that is, a triad without the weight-to-node connection). The
last two subsections explain another partial triad (that is, a
triad without the node-to-weight connection). In the following discussion, the general roles attributed to the nodes B, A,
and C (refer to Fig. 7), are, respectively, source, relation and
target. The three connections wBC, wA,BC, and wBC,A, respectively, depict the weights source-to-target, relation-to-{source-to-target}, and {source-to-target}-to-relation. The latter two weights obey triadic learning rules.
Node-to-weight connection: The first step in understanding the triadic construction is to see how the node-to-weight
connection is created. Imagine an associative connection
wAB′ between nodes A and B′ as shown in Fig. 7A. The rate of change of wAB′ is proportional to the product of the activities of A and B′.
Now substitute for the activation of node B′ the activation
of the entire associative node-to-node assembly consisting
of nodes B, C and weight wBC as shown in the dotted bubble.
This substitution is defined as the product of the activities of
B and C and the weight wBC. One way to think of this definition is as the “energy level” of the entire node-to-node assembly. (This sort of substitution can be carried out recursively to build a network of triads, which leads to representations of relations among relations.) The weight wAB′ is renamed wA,BC and increases in the presence of A, B, C and wBC, but not otherwise:
\frac{dw_{AB'}}{dt} \propto x_A x'_B, \quad \text{where } x'_B \equiv w_{BC}\,x_B\,x_C, \;\; w_{AB'} \equiv w_{A,BC} \tag{1}

\frac{dw_{A,BC}}{dt} \propto x_A\,(w_{BC}\,x_B\,x_C) \tag{2}
This type of learnable connection weight, from one node to a weight between two nodes, is rarely seen in neural network models because there does not seem to be a biological basis for such a connection at the neuronal level.
However, we can suggest a more biologically plausible
mechanism whose mathematical dynamics approximate
those of Eq. (2) for our node-to-weight connection. As
shown in Fig. 8, our suggested mechanism involves adding
to the network of nodes A, B, and C an axon collateral to an
inter-neuron and some gating interactions between nodes.
Guigon, Dorizzi, Burnod & Schultz (1995) included in their
model of sequence learning in the prefrontal cortex some
matching neurons (we rename them matching nodes) that
combine inputs from two sources via multiplicative gating.
Guigon et al. review evidence for such multiplicative
combinations occurring in various higher-order sensory
and motor areas of cortex, such as multiplication between
arm position and visual trajectory in the motor and premotor
cortex (Burnod, Grandguillaume, Otto, Ferraina, Johnson &
Caminiti, 1992). As explained in the caption of Fig. 8, the
node xM2 has dynamics that approximate those of the nodeto-weight connection.
The weight-to-node connection (see later in this section),
weight transport (see Section 4.4), and inter-weight competition (also see Section 4.4) can also be approximated using
suitable combinations of collateral pathways and matching
nodes. For ease of exposition, though, we are using the
shorthand of representing such complex networks by direct
interactions between weights.
As discussed earlier, a triad typically involves temporal
connections from nodes B to C. Specifically the activity in
node B is at a previous time. Substituting “red” for B, “is”
for A and “color” for C, the node-to-weight connection
wA,BC represents the event “red” followed by “is” and
“color.” wA,BC gains strength only when this precise event
is repeated. That is, it will not gain strength when “red” is
followed by either “color” or “is” alone. Nor will it
strengthen when both “is” and “color” are simultaneously
present but without “red” in the previous time step.
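A minimal numerical sketch of this behavior, assuming a simple Euler discretization of Eq. (2) and a one-step delay on the source node B, is given below; the learning rate and the fixed value of wBC are illustrative.

def triadic_learning(events, lr=0.5, w_bc=1.0):
    """Discrete-time version of dw_{A,BC}/dt ∝ x_A (w_BC x_B x_C).

    `events` is a list of (x_B_prev, x_A, x_C) activity triples; x_B_prev is
    node B's activity on the previous time step (the temporal connection).
    """
    w_a_bc = 0.0
    for x_b_prev, x_a, x_c in events:
        w_a_bc += lr * x_a * (w_bc * x_b_prev * x_c)   # Eq. (2), Euler step
    return w_a_bc

# "red" (B) followed by "is" (A) and "color" (C): the weight grows.
print(triadic_learning([(1, 1, 1)]))              # 0.5
# "red" then "color" alone, or "is"+"color" without prior "red": no growth.
print(triadic_learning([(1, 0, 1), (0, 1, 1)]))   # 0.0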
Node-to-node connection: To understand the behavior of
the node-to-node connection in a triad, imagine an associative connection wBC between nodes B and C as shown in Fig.
7B. Particularly note the peculiar placement of node A′ beside the weight wBC. The rate of change of wBC is in this case proportional to the product of the activities of B and C plus the activity of A′.
Now replace the activity of node A′ with the activity of
the assembly consisting of node A and weight wA,BC as
shown in Fig. 7B in the larger dotted bubble on the right,
and defined as the product of the node activation xA and the
weight wA,BC. The assembly activation can again be thought
of as representing the assembly’s “energy level.” The
weight wBC can now be seen as receiving contribution
from node A via wA,BC as well as from the simultaneous
activities in B and C:
\frac{dw_{BC}}{dt} \propto x_B x_C + x'_A, \quad \text{where } x'_A \equiv w_{A,BC}\,x_A \tag{3}

\frac{dw_{BC}}{dt} \propto x_B x_C + w_{A,BC}\,x_A \tag{4}
The additive contribution of the assembly node A and
weight wA,BC is important in working memory encoding.
When the node-to-node connection wBC follows an additive
learning law while the node-to-weight connection wA,BC
observes a multiplicative learning law, wBC is learnt and
forgotten as soon as the inputs to B and C are removed,
whereas wA,BC is remembered for a longer time. The utility
of this becomes clear in the following working memory
scenario: replace B with “red” at the previous time step, A
with “1–2,” and C with “yellow.” The function of “1–2”
(A) and its triadic connection wA,BC is to remember for
future use what happened in moving from item 1—”red”
(B) at the previous time step—to item 2—“yellow” (C),
while the actual connection from “red” to “yellow” (wBC)
is forgotten after “yellow” is removed.

Fig. 7. Conceptual triads: (A) Node-to-weight connection, wA,BC. (B) Node-to-node connection, wBC. (C) To-node activation, xC. (D) Weight-to-node connection, wBC,A. (E) Relation node activation, xA. (F) Summary.

Suppose that the
triadic weight from “1–2” has learnt this transition and at
some later time the node “1–2” becomes active again. At
this point the node “1–2” starts contributing to the weight
“red-to-yellow” via its triadic connection, helping that
weight gain strength even when neither “red” nor “yellow”
is present. The effect of this is to prepare the weight “red-to-yellow” in anticipation of “red” becoming active soon. If that
happens then the weight “red-to-yellow” will immediately
follow “red” up with “yellow.” In doing so the triad would have successfully reproduced at a later time what it had experienced previously in going from item 1 to 2.

Fig. 8. One possible way to approximate the effects of a node-to-weight connection by the activity of a node, xM2 in the figure. xC′ is the terminus of an axon collateral and has activity proportional to xB times the weight wBC. xM1 is a matching node (see text) that multiplicatively gates inputs from xC′ and xA. xM2 is another matching node that in turn gates inputs from xM1 and xC; hence its activity is proportional to xC xM1 ∝ xC′ xA xC ∝ xA (wBC xB xC) (with suitable time delays), as in Eq. (2) of the text. This node modulates the weight wBC.
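Returning to the anticipation scenario just traced through, the sketch below simulates it using Eq. (4) with an added passive-decay term (the additive working-memory law described above); the rates are illustrative, and the triadic weight from “1–2” is assumed already learned.

def step_wbc(w_bc, x_b, x_c, x_a, w_a_bc, lr=1.0, decay=0.5):
    """Eq. (4) plus passive decay: dw_BC/dt ∝ x_B x_C + w_{A,BC} x_A - decay*w_BC."""
    return w_bc + lr * (x_b * x_c + w_a_bc * x_a) - decay * w_bc

w_bc, w_a_bc = 0.0, 1.0          # assume the "1-2" triad is already learned
w_bc = step_wbc(w_bc, x_b=1, x_c=1, x_a=0, w_a_bc=w_a_bc)   # red then yellow on
print(round(w_bc, 2))            # 1.0: transition held while the items are on
w_bc = step_wbc(w_bc, 0, 0, 0, w_a_bc)                      # items removed
print(round(w_bc, 2))            # 0.5 and falling: passive forgetting
w_bc = step_wbc(w_bc, 0, 0, 1, w_a_bc)                      # "1-2" reactivates later
print(round(w_bc, 2))            # 1.25: weight re-primed, anticipating "red"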
To-node activation: To understand the behavior of the
“to-node” activation, imagine an associative connection
w′BC between nodes B and C as shown on the left in Fig. 7C. The rate of change of the node activity C is proportional to the product of w′BC and the sigmoid of node activity B.
Now replace the weight w′BC with the assembly consisting of node A, node-to-weight weight wA,BC and node-to-node weight wBC as shown in the dotted bubble in Fig. 7C,
and defined as the product of weight wA,BC, sigmoid of node
activity A, and weight wBC. Node C can now be seen as
receiving contribution from node B via this assembly.
This means that the rate of change in node activity C is
proportional to the product of sigmoidal node activities of
A and B, and weights wA,BC and wBC:
dxC
/ w 0BC f …xB †
dt
where : w 0BC ; wA;BC f …xA †wBC …5†
dxC
/ wA;BC f …xA †wBC f …xB †
dt
…6†
Substituting “red” for B, “is” for A and “color” for C, and
considering the temporal nature of the connection wBC as
discussed earlier, the behavior of the “to-node” just
described leads to the desirable activation of “color” when
“red” is followed by “is.”
Weight-to-node connection: The construction of the triadic weight-to-node connection in Fig. 7D is similar to the construction of the node-to-weight connection described earlier. The behavior of this connection is the same as the other, except for the change in direction. That is, the assembly here is presynaptic as opposed to postsynaptic in the previous case. Learning in this weight can be interpreted as the detection of the simultaneity in activation of the assembly (the dotted bubble in Fig. 7D) and the node A. The motivation behind connections such as wBC,A is to learn events such as, for example, “red” (B) followed by “color” (C) and “is” (A):

\frac{dw_{B'A}}{dt} \propto x'_B\,x_A, \quad \text{where } x'_B \equiv w_{BC}\,x_B\,x_C, \;\; w_{B'A} \equiv w_{BC,A} \tag{7}

\frac{dw_{BC,A}}{dt} \propto (w_{BC}\,x_B\,x_C)\,x_A \tag{8}
Relation node activation: Activation of the relation node
A is aptly understood by replacing B′ in Fig. 7E with the
assembly in the dotted bubble. Node activity A is proportional to the product of the presynaptic assembly activity
and the weight wBC,A. The behavior results in the activation
of “is” when “red” is followed by “color.”
\frac{dx_A}{dt} \propto w_{B'A}\,f(x'_B), \quad \text{where } f(x'_B) \equiv w_{BC}\,f(x_B)\,f(x_C), \;\; w_{B'A} \equiv w_{BC,A} \tag{9}

\frac{dx_A}{dt} \propto w_{BC,A}\,w_{BC}\,f(x_B)\,f(x_C) \tag{10}
All these relations are summarized in Fig. 7F.
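The following sketch evaluates the readout rules of Eqs. (6) and (10) for a fully trained “red, is, color” triad; the steep logistic signal function and the unit weight values are illustrative assumptions, not values from the paper.

import math

def f(x):
    """Illustrative steep sigmoid signal function."""
    return 1.0 / (1.0 + math.exp(-10.0 * (x - 0.5)))

def to_node_drive(x_a, x_b_prev, w_a_bc=1.0, w_bc=1.0):
    """Eq. (6): drive on target C when source B (previous step) and relation A fire."""
    return w_a_bc * f(x_a) * w_bc * f(x_b_prev)

def relation_drive(x_b_prev, x_c, w_bc_a=1.0, w_bc=1.0):
    """Eq. (10): drive on relation A when B (previous step) is followed by C."""
    return w_bc_a * w_bc * f(x_b_prev) * f(x_c)

# Query "red is ?": "red" then "is" drives "color" strongly.
print(round(to_node_drive(x_a=1.0, x_b_prev=1.0), 2))    # ~0.99
# Query "how are red and color related?": "red" then "color" drives "is".
print(round(relation_drive(x_b_prev=1.0, x_c=1.0), 2))   # ~0.99
# "color" followed by "is" gives almost no drive (wrong temporal order).
print(round(to_node_drive(x_a=1.0, x_b_prev=0.0), 2))    # ~0.01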
4.3.2. Transition learning and dyadic temporal weights
The analogy network makes use of various types of
dyadic (node-to-node) temporal connections, which employ
non-Hebbian associative rapid learning laws that detect and
compare the activation of nodes across different time steps.
These laws make use of temporal integration (or a moving
average) to represent past activity and to accommodate
asynchronous inputs. The integration operation spreads
out the node activities over a time window. A significant
value for a temporally integrated node activity indicates that
the particular node was active sometime in the near past.
The comparison of past and present is facilitated by quenching and saturating the node activities with respect to certain
thresholds.
The following defines the operations of quenching,
saturation, inversion, temporal integration, and differential
activation. Then we introduce each non-Hebbian learning
law that specifically encodes a particular kind of transition—activation, suppression, maintenance, or change.
Quenching, saturation, and inversion: Quenching sets the
node activity to a pre-specified value if it is below a quenching threshold. Otherwise, the node activity is not changed.
Saturation sets the node activity to a pre-specified value if it
is above a saturation threshold. Otherwise, the node activity
is not changed. Inversion is defined by quenching and saturating node activity, then subtracting this activity from a prespecified value. In Fig. 9A, the first row shows two “regular”
N.G. Jani, D.S. Levine / Neural Networks 13 (2000) 149–183
161
Fig. 9. (A) Quenching: r…x; u r ; c r † ˆ x if x $ u r ; c r otherwise; Saturation: j…x; u j ; c j † ˆ x if x , u j ; c j otherwise; Quenching and saturation: x^ ˆ
j…r…x; u r ; c r †; uj ; c j †; where: u r ˆ uj ; c r , c j ; Inversion:
^ c h † ˆ ux^ 2 c h u where: c h ˆ c j : Here u s represent thresholds and c s represent
Rt 2 ton x~ ˆ h…x;
^ c h † ˆ
u r ; c r †; u j ; c j † where: u r ˆ u j ; x~ ˆ h…x;
node activity levels. (B) Temporal integration: x ˆ … t 2…t on 1 tavg † x^ dt†=tavg where: tavg . ton ; x^ ˆ j…r…x;
^
ux^ 2 c h u: (C) Differential activation: x_ ˆ ux^ 2 xu:
consecutive node activity pulses. The second row shows the
effects of applying both quenching and saturation to the node
activity pulses in the first row. The third row shows the effects
of applying inversion to the pulses in the second row.
Temporal integration: Temporal integration (or moving
window average) is shown in Fig. 9B. It is defined in terms
of a definite integral over the window starting at current
time minus the on time of the pulse minus the length of
the averaging window, and ending at current time minus
the on time of the pulse. That is, the only activities being
averaged are those before the start of the current pulse. This
prevents the current pulse from being considered as past
activity. The duration of the averaging window determines
how much past is considered relevant. By averaging the
quenched and saturated node activity, it becomes easier to
compare the past activity with the present. In Fig. 9B, the
first row shows two quenched and saturated consecutive
node activity pulses. The second row shows the effects of
applying temporal integration to the node activity pulses in
the first row. The third row shows the effects of applying
quenching and saturation to the pulses in the second row.
The fourth row shows the effects of inversion on the pulses
in the third row.
Differential activation: Differential node activation is
defined as the absolute difference between the quenched,
saturated and temporally integrated node activity and the
quenched and saturated “regular” node activity. This essentially represents the comparison of the past node activity
with the present. In Fig. 9C, the second row shows two
quenched and saturated consecutive node activity pulses.
The first row shows the effects of applying temporal integration, quenching and saturation to the node activity pulses
in the second row. The third row shows the absolute difference between the first and second rows.
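All five operations translate directly into code. The sketch below, with illustrative threshold, floor, ceiling, and window values, applies them to a sampled activity trace in the manner of Fig. 9.

def quench(x, theta=0.2, c=0.0):
    """Set activity below the quenching threshold to a floor value."""
    return x if x >= theta else c

def saturate(x, theta=0.2, c=1.0):
    """Set activity above the saturation threshold to a ceiling value."""
    return x if x < theta else c

def qs(x):
    """Quench then saturate: activity becomes binary-like (0 or 1)."""
    return saturate(quench(x))

def invert(x_hat, c=1.0):
    """Inversion: distance of the quenched/saturated activity from the ceiling."""
    return abs(x_hat - c)

def integrate_past(trace, t, t_on=10, t_avg=20):
    """Moving-window average of the qs-activity over the window just before
    the current pulse; a significant value means 'recently active'."""
    lo, hi = max(0, t - t_on - t_avg), max(0, t - t_on)
    window = [qs(v) for v in trace[lo:hi]]
    return sum(window) / t_avg if window else 0.0

def differential(trace, t):
    """|past - present| of the quenched/saturated activity (Fig. 9C)."""
    return abs(qs(integrate_past(trace, t)) - qs(trace[t]))

# A pulse that was on earlier and is off now yields a large differential signal.
trace = [1.0] * 20 + [0.0] * 20
print(differential(trace, 35))   # 1.0: past on, present off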
Motivation for non-Hebbian associative rapid learning
laws: Consider a network that has built-in sensors for red,
yellow, round and cylindrical, but has no prior experience
with apples or bananas, and has learned the generalizations
such as color and shape (but obviously not fruit). Suppose
that network is exposed for the first time ever to an apple
and after some time to a banana. Even if the network has not
learned the individual concepts of either apple or banana, it
should still be able to interpret the transition from apple to
banana in terms of temporal changes in individual features.
That is, the network should be able to learn local individual
transitions in each modality cluster, such as, “red is
suppressed,” “yellow is activated,” “color-instance is changed from red to yellow,” “round is suppressed,” “cylindrical
is activated,” and “shape-instance is changed from round to
cylindrical.” This kind of learning is characteristic of working memory, which includes both retrieval of semantic
stores and encoding of new relations that are most pertinent
to the present context (Baddeley, 1986; Banquet, Gaussier,
Contreras-Vidal, Gissler, Burnod & Long, 1998).
Transition categories: The analogy-making network
models four kinds of generic working memory transitions:
maintenance, activation, suppression, and change. Each one
of these transitions is encoded in a separate working
memory dyadic temporal connection. The first three
transition types are self-weights, that is, they encode
transitions for the same node. The last type, change,
is encoded in weights between different nodes within the
same layer.
Relation nodes (including the four generic transition category nodes) in layer F4 mediate dyadic temporal connections through triadic learning laws. Learning in these triadic
weights is enabled during every transition by activating the
generic transition category nodes.
Fig. 10. Transition categories and corresponding conceptual triads: (A) Connections for sameness (maintenance). (B) Time course for sameness. (C)
Connections for activation (turning on). (D) Time course for activation. (E) Connections for suppression (turning off). (F) Time course for suppression.
(G) Connections for change. (H) Time course for change. Here superscript v denotes working-memory weights and superscript t denotes triadic weights; subscripts m, o, s, and x denote sameness (maintenance), activation, suppression, and change, respectively. The time courses of learning in node-to-node temporal connections are similar but not shown here.
Maintenance: The first transition category is “maintenance,” or detection of what remained the same from
previous to current presentation. An example is detecting
that “red” is on during the presentations of both “apple” and
“red,” in the event “apple followed by red.”
Fig. 10A shows a conceptual triad consisting of node
activity at previous presentation (source), node activity at
present (target) and the “maintenance” node (relation). It
also shows a dyadic temporal weight between source and
target nodes, which is mediated by two temporal triadic
weights connecting it to the relation node. All three weights
follow a delayed Hebbian learning law. That is, the rate of
change in weights is proportional to the product of the
previous node activity with the present. The weights gain
strength only when a node is on during both its past and
present, but not otherwise. The formulations are based on
the generic triadic formulations introduced earlier (see Figs.
9 and 10 and their captions for definitions of the following
terms):
\frac{dw^{vt}_m}{dt} \propto \hat{x}_m\,(w^v_m\,\hat{\bar{x}}\,\hat{x}) \tag{11}

\frac{dw^v_m}{dt} \propto \hat{\bar{x}}\,\hat{x} + w^{vt}_m\,\hat{x}_m \tag{12}
In Fig. 10B, the second row shows two quenched and saturated consecutive node activity pulses. The first row
shows the effects of applying temporal integration, quenching and saturation to the node activity pulses in the second
row. The third row shows the quenched and saturated pulse
activities in the “maintenance” relation node. The fourth
row shows the pulse-like learning in the triadic connections
(node-to-weight and weight-to-node) only when all three activities are present: past, present, and “maintenance” node activity.

Fig. 11. Task-specific rule learning and application: (A) 1–2 to transition-weight connections. (B) Time course for 1–2 to transition-weight connections. Here $w^{vt}_{1-2}$ represents the triadic working-memory weight from task-specific node 1–2 to a non-task-specific triadic working-memory weight $w^{vt}$. Learning in $w^{vt}_{1-2}$ takes place only when there is activation in task-specific node 1–2 ($\hat{x}_{1-2}$) but no activation in 3–4 ($\tilde{x}_{3-4}$). 1–2 receives input when item 2 is presented and item 4 is expected, but not at any other time. In contrast, 3–4 receives input only when item 4 is expected.
Activation: The second transition category is “activation”, or detecting what turned on during the current presentation that was off before. An example is detecting that
“circle” turns on when “red circle” follows “red square.”
All weights in an “activation” triad, shown in Fig. 10C,
observe a delayed anti-Hebbian learning law. That is,
their rate of change is proportional to the product of the inverted
past activity and the quenched and saturated current activity.
\frac{dw^{vt}_o}{dt} \propto \hat{x}_o\,(w^v_o\,\tilde{\bar{x}}\,\hat{x}) \tag{13}

\frac{dw^v_o}{dt} \propto \tilde{\bar{x}}\,\hat{x} + w^{vt}_o\,\hat{x}_o \tag{14}
…14†
In Fig. 10D, the first row shows inverted past activity. The
second row shows present node activity. The third row
shows the quenched and saturated pulse activities in the
“activation” relation node. The fourth row shows the
pulse-like learning in the triadic connection.
Suppression: The third transition category is “suppression,” or detecting what turned off during current presentation that was on before. An example is detecting that
“round” turns off when “red” follows “apple.” All weights
in a “suppression” triad, shown in Fig. 10E, observe a different form of delayed anti-Hebbian learning law. That is, their
rate of change is proportional to the product of past activity and
inverted current activity:
\frac{dw^{vt}_s}{dt} \propto \hat{x}_s\,(w^v_s\,\hat{\bar{x}}\,\tilde{x}) \tag{15}
In Fig. 10F, the second row shows the inverted present
activity. The first row shows the past activity. The third row
shows the quenched and saturated pulse activities in the
“suppression” relation node. The fourth row shows the
pulse-like learning in the triadic connection. Note that learning in the triadic weights only occurs in the presence of all
three activities: past, inverted present and “suppression”.
Change: The fourth transition category is “change,” or
detecting a change of instance within the same category. An
example is detecting that color changes from “red” to
“yellow” when “banana” follows “apple”.
All weights in a “change” triad, shown in Fig. 10G,
observe a differential Hebbian learning law. That is, their
rate of change is proportional to the product of the differential
activities of two different nodes. Further, the differential
activity of the source node is multiplied by its past activity,
while the differential activity of the target is multiplied by
its present activity. This ensures that while the activity in
source node is turning off, the activity in the target is turning
on:
\frac{dw^{vt}_x}{dt} \propto \hat{x}_x\,w^v_x\,(\hat{\bar{x}}_A\,\dot{x}_A)(\hat{x}_B\,\dot{x}_B) \tag{16}

\frac{dw^v_x}{dt} \propto (\hat{\bar{x}}_A\,\dot{x}_A)(\hat{x}_B\,\dot{x}_B) + w^{vt}_x\,\hat{x}_x \tag{17}
In Fig. 10H, the first three rows, respectively, show
source node A’s past, present, and differential activities.
The next three rows show similar activities for target node
B. The seventh row shows the “change” relation node activity. The last row shows the pulse-like learning in the triadic
connection. Learning in the triadic weights happens only
when there is activity in the “change” relation node.
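A compact sketch of what each of the four rules detects is given below; it collapses Eqs. (11)-(17) to their dyadic cores over binary past/present activities, omitting the learning rates and the triadic gating terms for brevity, so it illustrates which events each rule responds to rather than the full dynamics.

def transition_signals(past, present):
    """Dyadic cores of the first three transition rules for one node.

    `past` and `present` are quenched/saturated activities (0 or 1).
    Returns the learning signal each working-memory weight would receive.
    """
    inverted_past, inverted_present = 1 - past, 1 - present
    return {
        "maintain": past * present,            # delayed Hebbian, Eq. (12)
        "activate": inverted_past * present,   # delayed anti-Hebbian, Eq. (14)
        "suppress": past * inverted_present,   # delayed anti-Hebbian, Eq. (15)
    }

def change_signal(past_a, present_a, past_b, present_b):
    """Differential-Hebbian core of the 'change' rule, Eq. (17): source A
    must be turning off while target B is turning on."""
    diff_a, diff_b = abs(past_a - present_a), abs(past_b - present_b)
    return (past_a * diff_a) * (present_b * diff_b)

# Transition apple -> banana on the color cluster:
print(transition_signals(past=1, present=0))   # red: suppress fires
print(transition_signals(past=0, present=1))   # yellow: activate fires
print(change_signal(1, 0, 0, 1))               # red -> yellow: change fires (1)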
4.3.3. Task specific rule learning and application
Generic transition rules are remembered in working
memory weights only for the duration of the current item.
That is, they capture the transition from the previous to the
current item only while the current item is present.
Task-specific nodes such as “1–2” and “3–4” prevent
forgetting of the transition after items are removed, and
still allow single-trial transition rule learning. These nodes
and their connections remember item to item transitions
beyond single steps. For example, node “1–2” remembers
the transition from item 1 to 2, even after item 2 is removed.
As illustrated in Fig. 11A, task-specific nodes have triadic
weights to all working memory temporal connections which
include the triadic weights to relation nodes such as “maintain,” “activate,” and so on. These weights obey a multiplicative learning law with an active decay. That is, even
after the working memory weights are forgotten, these triadic weights continue to remember particular transition rules
to be applied during a later time. Although these weights
learn in a similar manner to long-term memory weights,
they unlearn more rapidly by using higher decay rates.
That is, at a later time when there is activity in “1–2” but
no corresponding activity in the original target assembly,
triadic weights from “1–2” quickly forget earlier associations and learn newer ones. This makes it possible for a
network not to be entrenched in the past.
Input to these nodes is task-specific and only during
specific item presentations. For example, node “1–2”
receives input when item 2 is presented and again when item 4 is expected, but at no other time. In contrast, node “3–4” receives input only in the latter case.
The triadic weight $w_{1\text{–}2,\{A,BC\}}$ from “1–2” to another working memory triadic weight $w_{A,BC}$ is shown in Fig. 11A. Learning in this weight is only enabled when “1–2” is on but not “3–4.” That is, when both “1–2” and “3–4” are on, this weight only facilitates rule application but no learning:

$$\frac{dw^{vt}_{1\text{–}2}}{dt} \propto \tilde{x}_{3\text{–}4}\,\hat{x}_{1\text{–}2}\,(\hat{x}_{A}\,w^{vt})(\hat{x}_{B}\,\hat{x}_{C}\,w^{v}) \tag{18}$$

In Fig. 11B, the first row shows quenched and saturated pulse activities in the “1–2” node. The next row shows similar activity for the “3–4” node. The third row shows pulse-like learning in the working memory triadic weight. The fourth row shows the learning in the “1–2” triadic connection.

4.4. Types of weight interactions

The analogy-making network makes use of two types of weight interactions: weight transport and competition between weights. Competition in turn takes two forms: a non-specific kind between the three self-weights and a specific kind in which the “change” weight competes with the “activation” and “suppression” self-weights.
4.4.1. Weight transport
Weight transport is suggested here as a mechanism for effecting analogical transfer in analogy-making. That is, in some analogies the transition from item 1 to item 2 must not be applied literally to item 3 but generalized.
Weight transport is controversial in neural network
research because of the difficulty involved in implementing
such a mechanism in networks in a biologically plausible
manner. However, like the node-to-weight and weight-to-node connections discussed in Section 4.3, weight transport
might be approximated through suitable local networks that
include “matching nodes” that multiplicatively gate inputs
from different sources (Guigon et al., 1995; Levine, 1996;
see Fig. 8). Also this kind of weight interaction is distantly
analogous to what occurs in back-propagation networks
(Rumelhart & McClelland, 1986). Levine (1996) has speculated about the utility of weight transport in explaining the
role of prefrontal cortex in drawing inferences about relationships among abstract concepts.
Motivation for weight transport: To understand why
weight transport is needed in analogy-making and how it
works, consider the typical learning and recognition
episodes between parents and verbal children. During one
episode the parent shows the child a set of objects and while
pointing at an individual object, utters the word describing
that object. For example, on pointing to an apple the parent
utters the word “apple.” The child as expected imitates the
parent by repeating the same word after the parent. The
underlying rule here is “to verbalize the object descriptor.”
Now consider a later episode in which the parent is showing the same set of objects but utters words corresponding to
the objects’ colors. For example, on showing an apple, the
parent utters the word “red.” As before the child imitates the
parent and says the word “red.” Following this, the parent is
now pointing to a banana. Suppose that at this moment the
child spontaneously generates the response word “yellow”
(even before the parent utters it), and not the word “banana.”
That is, the child had figured out the new rule for the current
episode without any more repetitions: “verbalize that
object’s color.”
Such episodes are also common even in preverbal children and may occur spontaneously without intentional
parental interaction or supervision. This phenomenon is
known as “deferred imitation” in developmental psychology
(Mandler, 1990).
To understand how such rapid rule generalization and application becomes possible, break the “verbalize the object's color” rule in two: first “find the object's color” and then “verbalize it.” The first rule, finding the color, can be reinterpreted as “maintaining the same color” from the time the
object is shown till the time its color is actually found.
In the case of “apple” followed by the word “red,” this can be stated as “maintaining (or keeping) color red alive” at least prior to the verbalization. Now consider the application of this rule, which is a “transport” to other color instances, for example to yellow in the case of the banana. That is, the maintenance of red has spread
banana. That is, the maintenance of red has spread
across the color cluster, resulting in the maintenance
of any color instance that is experienced during subsequent
presentations.
Fig. 12. (A) Weight transport. (B) Time course for weight transport. Here $\hat{x}_{z}$ represents activity in the task-specific weight-transport node, which is turned on during 3–4 transitions and kept off otherwise; that is, this node enables weight transport when item 4 is expected. Weight transport is restricted to temporal working memory self-weights, specifically only to maintenance and suppression weights.

Weight transport mechanism: Weight transport is
implemented in our network by transporting one weight
to another, that is, by quickly “pulling” all analogous
weights to a common value under task-specific modulation. Weight transport is restricted to temporal working
memory self-weights as shown in Fig. 12; specifically
only to “maintenance” and “suppression” weights. This
is because transporting “activation” to all nodes within
a cluster lights up the entire cluster. Working memory
connectivity dictates that the transport of a “maintenance” weight is only to other “maintenance” weights,
and a “suppression” weight only to other “suppression”
weights.
To understand the weight transport formulation, suppose that all weights are initially zero except one. Consider the fixed-point asymptotic behavior, in which the rate of change of each individual weight is zero in the steady state: a weight is pulled toward the average of the rest of the weights only if it is less than that average; otherwise it remains unchanged. Hence weights with zero initial value quickly approach the non-zero weight, achieving the stated intent of weight transport, that is, to “pull” all weights to the same value as the non-zero weight:
$$\frac{dw^{v}_{i}}{dt} \propto \hat{x}_{z}\left[\left(\frac{\sum_{j \ne i} w^{v}_{j}}{N-1}\right) - w^{v}_{i}\right]^{+} \tag{19}$$
In Fig. 12B, the first row shows quenched and saturated
pulse activities in the “1–2” node. The next row shows
similar activity in the task specific “transport” node. Input
to this modulation node enables weight transport when item
4 is expected. The third row shows pulse-like learning in the
ith working memory weight (either “maintenance” or
“suppression”). The fourth row shows “transport” from
the ith to the jth working memory weight.
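The fixed-point argument above is easy to verify numerically. The following Python fragment is a minimal sketch of Eq. (19) under our own naming and an illustrative Euler step: each weight is pulled up toward the average of the other weights in its cluster whenever the transport modulator is on, and the positive-part bracket prevents the seed weight from being pulled down.

```python
import numpy as np

# Sketch of weight transport, Eq. (19). w holds one cluster's
# "maintenance" (or "suppression") self-weights; x_z gates the update.
def transport_step(w, x_z, rate=0.5, dt=0.1):
    w = np.asarray(w, dtype=float)
    n = len(w)
    for i in range(n):
        avg_others = (w.sum() - w[i]) / (n - 1)
        w[i] += dt * rate * x_z * max(avg_others - w[i], 0.0)
    return w

w = np.array([1.0, 0.0, 0.0, 0.0])  # e.g. "maintain red" learned; others zero
for _ in range(200):
    w = transport_step(w, x_z=1.0)
print(w.round(3))  # all entries climb toward the seed value 1.0
```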
4.4.2. Competition between weights
An important feature of working memory temporal
weights is the competition between them. Other researchers
(e.g. von der Malsburg, 1973; Nigrin, 1993) have used similar constructs. Specific competition between “change” and “activation” weights and between “change” and “suppression” weights is examined here (see Fig. 13):
$$\frac{dw^{vs}}{dt} \propto -w^{vx} \tag{20}$$

$$\frac{dw^{vo}}{dt} \propto -w^{vx} \tag{21}$$
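A minimal Python illustration of Eqs. (20) and (21) follows (names, rates, and the zero floor are our assumptions): a learned “change” weight steadily erodes the accompanying “suppression” and “activation” self-weights, as motivated in the next paragraph.

```python
# Sketch of the specific competition between weights, Eqs. (20)-(21).
def compete_step(w_s, w_o, w_x, rate=0.2, dt=0.1):
    w_s = max(w_s - dt * rate * w_x, 0.0)  # dw_s/dt prop. to -w_x, Eq. (20)
    w_o = max(w_o - dt * rate * w_x, 0.0)  # dw_o/dt prop. to -w_x, Eq. (21)
    return w_s, w_o

w_s, w_o, w_x = 0.3, 0.3, 1.0  # small spurious gains beside a strong "change"
for _ in range(50):
    w_s, w_o = compete_step(w_s, w_o, w_x)
print(w_s, w_o)  # both decay toward zero while w_x persists
```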
Motivation for competition between weights: Observe
that a “change” is always accompanied by a corresponding
“suppression” in the source activity and an “activation” in
the target. For example, the change “red to yellow” in steps
1 and 2 is accompanied by suppression of red and activation
of yellow. If all three transition weights are learned, this could lead to an undesirable effect. For example, it could lead to absurd analogies such as “apple is to banana as square is to yellow-square” or “apple is to banana as red-square is to square.” To prevent such occurrences, we introduce competition between change weights and activation or suppression weights. That is, the rate of change in node-to-node temporal weights representing “suppression” and
“activation” is negatively proportional to the node-to-node
temporal weight representing “change.”
Note that a node can serve as source to many “change” weights within its modality cluster; that is, potentially a node can “change” to any other node in the cluster. Conversely, a node can also serve as target of multiple “change” weights. The connectivity in working memory dictates that inhibition from a “change” weight is only applied to the source's “suppression” and the target's “activation,” but to no other temporal weights.

Fig. 13. Specific competition between weights: the change weight ($w^{vx}$) inhibits the activation ($w^{vo}$) and suppression ($w^{vs}$) weights.
4.5. Intralayer and interlayer connectivity
Figs. 14 and 15 summarize the types of connections both
within and between layers. The competitive and cooperative
interactions in short-term memory are strictly spatial and
intralayer. The connections in working and long-term
memory are both spatial and temporal (i.e. time-delayed),
and both interlayer and intralayer. Fig. 14A shows interlayer
working and long-term spatial connectivity. Note that there
are no spatial learnable connections between other layers and F5, or between layers F1 and F4. As shown in Fig. 14B, node-to-node LTM temporal connections are mediated by triadic
weights from layer F4. Similar connections in WM in Fig.
14C are mediated by triadic weights from generic transition
categories in layer F4 and task-specific nodes in layer F5.
Fig. 15 gives examples of three kinds of spatial and
temporal connections.
Fig. 14. Types of layer connectivity: (A) Interlayer spatial LTM and WM connectivity. (B) Interlayer temporal LTM connectivity. (C) Interlayer temporal WM
connectivity. Here l represents long-term memory weights.
Fig. 15. Examples of network connectivity: (A) Long-term memory weights. (B) Working memory weights. (C) Task-specific modulator weights. Connections within one time-step are spatial while those across time-steps are temporal. Temporal weights are mediated by triadic weights from relationship nodes (for example, IS-A or Change), or a task-specific node (for example, 1–2 or 3–4). Here $mqj$ represents a node at the previous time-step; $\{\,\}$ represents any weight.
5. Simulations
This section presents the results of three analogy-making
experiments. Simulations of the network equations (given in
Appendix A) were carried out in MATLAB using its ode45
function. This subroutine numerically solves a system of
ordinary differential equations, based on an explicit
Runge–Kutta (4, 5) formula (Dormand & Prince, 1980). It
is a “one-step” solver. That is, to compute values at the
current time it needs only the solution at the immediately
preceding time point.
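For readers reproducing the simulations outside MATLAB, the same integrator family is available elsewhere; for example, SciPy's “RK45” method is also an explicit Dormand–Prince (4, 5) pair. The fragment below is a minimal sketch of such a set-up, with a toy one-node right-hand side standing in for the full network equations of Appendix A.

```python
from scipy.integrate import solve_ivp

# Toy stand-in for the shunting node dynamics of Eq. (22):
# decay at rate A plus input I shunted by the distance to the ceiling B.
def rhs(t, x, A=1.5, B=1.0, I=0.5):
    return -A * x + (B - x) * I

# "RK45" is an explicit Runge-Kutta (4,5) Dormand-Prince pair, the same
# family that underlies MATLAB's ode45 one-step solver.
sol = solve_ivp(rhs, t_span=(0.0, 25.0), y0=[0.0], method="RK45")
print(sol.y[0, -1])  # settles near I*B/(A + I) = 0.25
```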
The primary objective of the simulations here is to show
that the network can solve proportional analogies. That is,
the simulations here individually neither demonstrate ART-like long-term learning of perceptual objects such as
“apple” and “banana,” nor triadic long-term learning of
generalizations such as “color” and “fruit,” and relations
such as “has” and “is.” A corollary of this is that the
long-term memory connections are assumed to exist a
priori, and their values are “caricatures” of what the
“real” weights would be, if they had been acquired in
real-time using ART-like and triadic learning. The initial
values of all long-term memory weights here are set to 1.
(It should be noted that triadic learning in working memory
is still demonstrated, for example in generalizing “apple has red” to “fruit has color” in the analogy “apple:red :: banana:?.”)
The analogy-making task in each simulation is carried out
in two different sessions: rule learning and application. The
first session includes presentations of item 1 and 2, while the
second session includes the presentation of item 3 followed
by an expectation of item 4, which is typically the answer of
a proportional analogy. The working memory rules learnt
during the first session are made available during the
second.
The simulations presented here explicitly implement the
network constructs given in Section 4, except for a few
simplifications for ease of computation. The task-specific
nodes in layer F5, namely, “1–2,” “3–4,” and “transport,”
and the nodes in F4 representing the four generic transition
categories “change,” “maintenance,” “activation,” and
“suppression,” are not represented as literal nodes in the
simulations. Rather, they are represented as direct modulations of corresponding weights, modulations that occur at
appropriate time steps in the analogy task.
5.1. Red-square:red-circle :: yellow-square:?

The first proportional analogy experiment highlights the utility of weight transport and competition between weights in producing the answer “yellow-circle” to the analogy “red-square:red-circle :: yellow-square:?” (to be described by its abbreviation, SCRY).
As shown in Fig. 16, only layers F1 and F3 are implemented here. Layer F1 is made up of two clusters: “form” and “color.” The form cluster consists of “square” and “circle” nodes while the color cluster consists of “red” and “yellow.” Layer F3 is made up of one cluster, which has the abstract category (or generalization) nodes “form” and “color.”

Fig. 16. SCRY network: LTM spatial weights.
The rest of the diagrams in this subsection depict the
actual simulation runs. Similar figures are presented for
each succeeding experiment. The y-axis displays activation
in individual nodes, while the x-axis shows time increments.
Fig. 17A shows node activities during presentation of
items 1 and 2. In this experiment “red-square” is followed
by “red-circle.” As can be seen during the first time step
there is activity in “red” as well as “square” representing
“red-square.” There is simultaneous activity in the nodes for the abstractions “form” and “color.” Note that the pulse shapes during the second time step are slightly different. This is because of learning in working memory weights, which starts to contribute to node activities.

Fig. 17. SCRY node activation. (A) Step 1 and 2. (B) Step 3 and 4.
Fig. 17B shows node activities during the presentation of
item 3 and an expectation of item 4. As can be seen,
“yellow-square” is presented during the third time step.
Weights learned previously are transferred during the
expectation of item 4. Due to weight transport of “maintenance” from “red” to “yellow” (see Section 4.4.1), and
“change” from “square” to “circle,” the network produces
the answer “yellow-circle.” Note the different pulse shapes
during time step 4, which is due to the “ready” availability
of working memory weights.
Fig. 18 shows learning during presentation of item 2 in
temporal node-to-node working-memory self-weights. As
seen here “red,” “form,” and “color” have learned to
“maintain.” That is, these nodes had significant activity
during presentations of both items 1 and 2. The only working memory “change” weight that gained strength during the presentation of item 2 was the one from “square” to “circle.”

Fig. 18. SCRY weight vectors: “Maintain,” “Activate,” and “Suppress.”
5.2. Square-form:square-word :: circle-form:?

The second proportional analogy experiment highlights the utility of rule abstraction, weight transport and competition between weights in producing the answer “circle-word” to the analogy “square-form:square-word :: circle-form:?” (abbreviated SCFW).
In addition to layers F1 and F3 similar to the ones used in the previous experiment, Fig. 19 shows the use of layer F2 in this experiment, with nodes “square-bind” and “circle-bind.”

Fig. 19. SCFW network: LTM spatial weights.
Fig. 20A shows node activities of “square-form” followed by “square-word.” As can be seen, during the first time step there is not only activity in “square-form” but also (almost) simultaneous activity in “square-bind” and “form.”

Fig. 20. SCFW node activation: (A) Step 1 and 2. (B) Step 3 and 4.
Fig. 20B shows node activities during the presentation of
item 3 “circle-form,” and an expectation of item 4 during
which weights learned previously are transferred. Due to
combined effects of weight transport of “maintenance” to
“circle-bind,” and “form-to-word” change, “circle-word” is
produced as the answer to this analogy. It should be noted
that during step 4 activity is not only confined to “circle-word” but is also present in “circle-bind” and “word.” This
is due to the long-term memory spatial connections.
Fig. 21 shows learning during presentation of item 2 in
temporal node-to-node working-memory self-weights. The
network learns here that “square-bind” is “maintained.”
Note the slight gain in “activate” self-weights of “square-word” and “word.” Similarly, a small gain occurs in the “suppress” self-weights of “square-form” and “form.” These gains are in the “suppression” and “activation” self-weights that accompany the “change” weights. The network
not only learned the change weight from “square-form” to “square-word,” but also from the abstract node representing “form” to the one representing “word.” It should be noted that without this generalized rule, the network cannot make the analogy under consideration.

Fig. 21. SCFW weight vectors: “Maintain,” “Activate,” and “Suppress.”
5.3. Apple:red :: banana:?

The third proportional analogy experiment highlights the utility of triadic learning, rule abstraction, weight transport and competition between weights in producing the answer “yellow” to the analogy “apple:red :: banana:?” (ARBY).

As shown in Fig. 22, layer F3 has two clusters (Edible Objects and Abstract Senses). Temporal (triadic) long-term memory connections to and from the relation node “has” are shown in Fig. 23.

Fig. 22. ARBY network: LTM spatial weights.

Fig. 23. ARBY network: “Has” LTM temporal weights.
Fig. 24 shows node activities of “apple” followed by “red.” Note the simultaneous significant activity in “apple,” “red,” “round,” and “fruit” during time step 1. During time step 2, in addition to activity in “red” and “color,” there is also significant activity in the relation node “has.” This is due to the triadic contribution of the temporal long-term memory weight “apple has red.”

Fig. 24. ARBY node activation: (A) Step 1 and 2. (B) Step 3 and 4.
Fig. 25 shows learning in working memory self-weights.
Note that “red” is maintained, while “apple” and “round”
are suppressed.
The only change weight learnt in this experiment is from
“fruit” to “color.” Note that a direct weight from “apple” to
“red” is not learnt here because of working memory connectivity which restricts learning of “change” only within a
layer but not across layers. The node-to-node weight “fruit
to color” is mediated by a triadic weight from the relation
node “has.” Along with the weights described previously, triadic weights to and from “has” and from “fruit” to “color” are also learnt during the presentation of item 2. This is as if the network has inferred that the same relation as in “apple to red” applies to the generalization “fruit to color.”

Fig. 25. ARBY weight vectors: “Maintain,” “Activate,” and “Suppress.”
During the expectation of item 4, maintenance is spread
to “yellow” and suppression to “banana” and “cylinder” in
the color, fruit, and shape clusters, respectively. This weight
transport combined with the generic rule “fruit has color”
activates the “yellow” node following the presentation of
“banana.”
5.4. Additional experiments
We did two more proportional analogy experiments, whose descriptions are given next. The architectures and simulation results are not shown here for brevity, but reflect the same general principles of network organization as do the architectures and results of our first three experiments.
The fourth proportional analogy experiment highlights the utility of relation nodes, weight transport and competition between weights in producing the answer “shape” to the analogy “red:color :: round:?” First, “red” is presented to the network. This activates “apple” and “color.” Activation of “apple” is relatively high here because the network does not know of many “red-colored things,” which would have otherwise competed with the apple node and kept its activity low. In the absence of other red-colored things, the threshold for layer F2 is chosen so that the activity in apple is regarded as “off.” The “IS-A” relation node becomes active when “color” follows “red.” This is due to the long-term triadic weight contribution to “IS-A” from “apple” and “color.” The network produces the answer “shape” when presented with “round” as item 3, the learned working memory triadic weight then activating “IS-A.”
Among the working memory node-to-node self-weights,
color is “maintained,” “IS-A” is activated, and “red” is
suppressed. No change weights were learnt during the
presentations of items 1 and 2.
The last proportional analogy-making experiment highlights the utility of relation and transition learning in generic
categories, in producing the answer “D” to the classical
analogy “A:B :: C:?.” The network here is made up of
layers F2 and F4 only. Layer F2 consists of alphabet nodes
such as “A,” “B,” “C,” and so on. Although the sensory
layer F1 is not modeled in this experiment, the node “A”
in layer F2 can be thought of as a binding node that holds
together features such as “shape-A” and “word-A” in layer
F1.
The network does not have any spatial weights. Instead
there are two sets of temporal (triadic) long-term memory
weights from relation nodes: “forward” and “reverse.”
Activation of the node representing item 1, “A,” is
followed by activation of item 2, “B.” Through triadic
connections, the relation node “forward” is also activated
during step 2. The application of previously learned working memory rules to item 3, “C,” produces “D” during step 4
as the answer to the analogy under consideration. In this experiment the only self-weight learned is “activate forward,” and the only node-to-node working memory weight that encodes “change” is “change A to B.”
6. Discussion
6.1. Issues in analogy-making
Modeling proportional analogies of the type “A:B :: C:D” poses several challenges. Some but not all of these challenges have been met herein.

The first challenge is that the path connecting A to B needs to be generalized. This requires nodes symbolizing generalizations of A and B. For example, the abstract concepts of “color” and “fruit” are respective generalizations of the sensory feature “red” and the perceptual object “apple.” The analogy-making network proposed here directly addresses this issue, and thus can solve the analogy “apple:red :: banana:?” by generalizing the transformation “apple:red” to “fruit:color.”
The second challenge is to not only remember the literal
temporal path from A to B, but also to capture the nature of
this transition in terms of relations, such as “has,” “is,”
“forward” and “reverse.” The model here resolves this
issue by use of relation learning and conceptual triads. It
can solve, for example, the analogies “red:color :: round:shape” and “a:b :: c:d,” by abstracting the relation
“is” in the former case and “forward” in the latter.
The third challenge is that the network may not have any
prior knowledge of entities A and B. It may have knowledge
about the individual features of these entities but no direct
experience of them as such. In such cases, it is required to
interpret the transformation A to B not in terms of their
generalizations but in terms of their components. The
network proposed here provides such a mechanism in
terms of encoding generic working memory transitions:
activation, suppression, maintenance and change. This
makes it possible to make the analogy “red-square:red-circle :: yellow-square:yellow-circle.”
The fourth challenge, for analogies more complex than
those studied herein, is that there may exist more than one
direct path from A to B. Because the nature of C cannot be
anticipated before its presentation, it cannot be decided a
priori which path will be the most effective in producing D.
Sometimes this is context driven, as in “red:green :: yellow:red” (rotation of colors in a traffic signal light) versus “red:green :: tomato:cucumber.” The third item, “yellow” or “tomato,” determines which of the transformations embodying “red:green” is most relevant. To some extent the proposed analogy network provides this capability (also called “conceptual slippage” by Mitchell, 1993). It can make analogies such as “square-form:square-word :: circle-form:circle-word” and “square-form:square-word :: red:red-word.” Because the third item is a shape in the first
case versus a color in the second, the network has to relax
the initial rule “verbalize the shape” into just “verbalize.”
The fifth challenge arises when no direct connection
between A and B exists. This requires tracing a path
comprising more than one link. This sort of multi-link
traversal (or search) becomes particularly challenging in
networks where nodes have only local but no global visibility. This is because at every node along a given path there
could be several possible links to explore. Without some
global visibility or guidance a local search may be futile.
The network proposed here does not provide a resolution for
this issue. For this reason the network here would not be able to make the analogy “apple:spoken-word-red :: banana:spoken-word-yellow,” which requires a sequence
of two transformations: the first to “produce the color of
the fruit,” and the next to “verbalize that color.”
6.2. Limitations of this model
The simulation of analogy-making herein is in some
sense a contrived version of how it is presumably done in
real life. Arguably, humans do not perform analogies the way the simulation goes about this task, which is to serially proceed from the presentation of item 1 to item 2 to item 3 without being able to revisit any previous items. A more plausible scenario is one where items are presented to the subject all
at once and left there for some definite period of time, or
presented one-by-one but re-presented on demand. In most
real-life situations subjects benefit from the facility of attention, and also from what can be referred to as “mental
tagging.” Although our model employs lateral inhibition
to “attend” to the active item at the current time and also can
“remember” what item was presented in which order, it does
not “consciously” go back and forth between items.
In our network, the analogy-making task is hard-coded.
That is, although the network can learn on its own the individual transitions between items 1 and 2, it is explicitly told
when exactly to apply them to item 3. Learning how to self-organize this task-specific behavior would require extending
the network to include generalization across multiple
analogy-making episodes (Burns, 1996).
As suggested earlier, analogy-making is considered to be
a working memory process that may lead to long-term
memory depending on the utility of the results. Our network
does not address the consolidation of working memory into
long-term memory; this is considered in other neural
network models (e.g. Banquet et al., 1998).
Another limitation of the current stage of our network is
its inability to model simple analogies with repetitions (for
example “aa:bb :: cc:?”). Also, it cannot model analogies
with transformations traversing more than one link (see the
“fifth challenge” in Section 6.1).
6.3. Conclusions
A neural network theory has been introduced that leads to
solutions of several commonsense proportional analogies
among elementary concepts. The network introduced herein
follows an established tradition within the neural network
community of breaking a complex cognitive task into its
constituent operations and seeking to model those operations. The operations involved in analogy-making include
reasoning about relations and combining and splitting
concepts, and both of these have been modeled here.
The network implementations of relations or mappings
between concepts have led to the introduction of several
types of connections that are unconventional in the current
neural network literature. These include connections that
represent activation or suppression of a particular feature,
change from one feature to a related one, maintenance of a
feature, and transport or generalization of maintenance or
suppression weights.
Is there a possible way that such unconventional connections might be represented in the actual brain? We have
suggested that they might be based on the matching-node
implementation shown in Fig. 8, which has been postulated
to occur in various areas of sensory, motor, and association
cortex (Guigon et al., 1995). The ability of even young children to learn, and to reason based on, some simple mappings of these types suggests that aspects of some of these processes in the association cortex might be hard-wired instead of learned
from experience. Or it may be that there are hard-wired circuits
for general operations like “activation,” but that learning via
long-term memory is necessary to make these circuits represent specific operations like “add yellow.” Further ideas about
how this works might be obtained from brain imaging (PET or
fMRI) studies that investigate which brain areas are active
while people are thinking about particular abstract concepts.
There have been some preliminary imaging studies of cognitive tasks, but few that have dealt with thinking about high-level abstractions.
Further extensions of our analogy network can be
suggested that deal with related but different cognitive
tasks. Instead of proportional analogies, a similar network
might be constructed that deals with geometric analogies, as
studied by Blank (1997), or analogies from one narrative
domain to another, as studied by Vosniadou and Ortony
(1989). Also, the network could be varied to deal with
similes or metaphors. Finally, the relational and mapping
aspects of the network study can be brought to bear on
potential neural network analyses of a variety of problems
that have traditionally been part of symbolic artificial intelligence. One of these is property inheritance: how do we infer
that a general category possesses the properties of a more
specific subcategory, or vice versa? Furl (1999) describes a
neural network model of property inheritance based on ART
(see Section 3.4), which could be integrated with a later
stage of our analogy model. Another problem that can possibly be addressed (with explicit representations of abstract
categories) is how to implement the “axiom of choice,” that
is, to be able to answer queries such as “give me all colors,”
“give me a color,” or “give me a different color.”
Hence, our network does not solve all problems in
analogy learning and solving, nor does it yet point to a
testable theory of how the human brain performs these
tasks. It is, however, an advance in the direction of forming
a plausible connectionist network model of these tasks,
based on non-linear dynamics. We believe that our model
captures better than previous network models the qualitative
essence of results from infants (Goswami, 1998; Vosniadou
& Ortony, 1989) and non-human primates (Thompson et al.,
1997) suggesting that analogical processes occur earlier in
cognitive development than was previously supposed. Other
models (e.g. Barnden & Srinivas, 1992; Hummel &
Holyoak, 1997) have tended more than ours to base their
learned analogical relationships on the complexity of
English semantic structure.
Moreover, many of the building blocks of our network
model have previously been used in models of simpler
mental processes such as pattern classification and conditioning. It is thus a step toward the dynamic multilevel
unification of our understanding of the mechanistic basis
of human cognition.
Appendix A. Network equations
A.1. Glossary of mathematical symbols used
Variables
$t$: time
$x$: node activation
$\hat{x}$: quenched and saturated node activation
$\tilde{x}$: inverted node activation
$\bar{x}$: integrated node activation
$\hat{\bar{x}}$: quenched integrated node activation
$\tilde{\bar{x}}$: inverted integrated node activation
$\dot{x}$: differential node activation
$w$: learnable weight
$I$: input

Qualifiers
$v$: working memory
$l$: long-term memory
$t$: triadic connection
$x$: change
$m$: maintenance
$o$: activation
$s$: suppression
1–2: transition from step 1 to step 2
3–4: transition from step 3 to step 4
$z$: weight transport modulator

Indexes
$i, j, k, A, B, C$: node
$l, m, n$: layer
$p, q, r$: cluster
$\overline{mqj}$: node at previous time-step
$\{\,\}$: weight
$\{mqj, lpi\}$, $\{AB\}$: node-to-node connection
$\{nrk, \{mqj, lpi\}\}$, $\{C, \{AB\}\}$: node-to-weight connection
$\{\{mqj, lpi\}, nrk\}$, $\{\{AB\}, C\}$: weight-to-node connection

Functions
$f$: sigmoid
$r$: quench
$j$: saturate
$h$: invert
A.1.1. Node parameters
$A$: node activity decay
$B$: maximum node activity
$C$: minimum node activity
$u^{r}_{l}$: quenching threshold for nodes in layer $l$, used during learning of working memory weights
$u^{j}_{l}$: saturation threshold for nodes in layer $l$, used during learning of working memory weights
$\bar{u}^{r}_{l}$: quenching threshold for integrated node activities in layer $l$
$\bar{u}^{j}_{l}$: saturation threshold for integrated node activities in layer $l$
$c^{r}$: quench node activations down to this value
$c^{j}$: saturate node activations up to this value
$c^{h}$: invert node activations from this value, used during learning of self working memory weights
A.1.2. Weight parameters
$D$: working memory weight decay, $D < A$
$E$: minimum working memory weight
$F$: maximum working memory weight
A.1.3. Interaction and coupling parameters
$a_{mqj,lpi}$: excitatory interaction coefficient from node $mqj$ to $lpi$, where $lpi = mqj$
$b_{mqj,lpi}$: inhibitory interaction coefficient from node $mqj$ to $lpi$, where $l = m$, $pi \ne qj$
$g_{mqj,lpi}$: coupling coefficient from node $mqj$ to $lpi$, where $l = m$, $pi \ne qj$
$d_{nrk,mqj,lpi}$: coupling coefficient from node $nrk$ to weight $w^{v}_{mqj,lpi}$, where $l, m \ne n$
$e_{mqj,lpi}$: inhibitory interaction coefficient from weight $w^{vx}_{mqj,lpi}$ to weights $w^{vo}_{lpi,lpi}$ and $w^{vs}_{lpi,lpi}$, where $l = m$, $pi \ne qj$
$f_{1\text{–}2,nrk,mqj,lpi}$: coupling coefficient from node “1–2” to weight $w^{vt}_{nrk,\{mqj,lpi\}}$, where $l, m \ne n$
A.1.4. Temporal parameters
$t_{on}$: on time of the input pulses
$t_{avg}$: time interval over which past node activity is integrated, $t_{avg} > t_{on} > 0$
$t_{step_i}$: time duration of the $i$th step input, $t_{step_i} > t_{on} > 0$
$I_{lpi,step(t)}$: input value to node $lpi$ during $step(t)$
A.1.5. Network parameters
$L$: total number of layers
$G_{l}$: number of clusters in layer $l$
$N_{lp}$: number of nodes in cluster $p$ of layer $l$
A.1.6. Functions
$f(x)$: node activation function (e.g. sigmoid, linear, faster-than-linear)
$(x)^{+}$: zero if $x$ is negative, otherwise $x$
$r(x, u^{r}, c^{r})$: quench $x$ down to $c^{r}$ if $x < u^{r}$, otherwise $x$
$j(x, u^{j}, c^{j})$: saturate $x$ up to $c^{j}$ if $x > u^{j}$, otherwise $x$
$h(x, c^{h})$: invert $x$ to $|x - c^{h}|$
$step(t)$: item present at time $t$
$onflag(t)$: 1 if the input pulse is on at time $t$, otherwise 0
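These pointwise operations translate directly into code. The following Python helpers are our own minimal transcription of A.1.6 (argument names follow the glossary; the sample values at the end are the Appendix B settings):

```python
# Pointwise operations of A.1.6.
def positive_part(x):
    return x if x > 0.0 else 0.0        # (x)^+

def quench(x, u_r, c_r):
    return x if x >= u_r else c_r       # r(x, u^r, c^r)

def saturate(x, u_j, c_j):
    return c_j if x > u_j else x        # j(x, u^j, c^j)

def invert(x, c_h):
    return abs(x - c_h)                 # h(x, c^h)

# Quenched-and-saturated activation, Eq. (36), with u = 0.05, c^r = 0, c^j = 1:
x_hat = saturate(quench(0.6, 0.05, 0.0), 0.05, 1.0)   # -> 1.0
x_til = invert(x_hat, 1.0)                            # inverted: -> 0.0
```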
A.1.7. Activation variables
$x_{lpi}$: activation of the $i$th node in cluster $p$ of layer $l$ at time $t$
$x^{r}_{lpi}$: quenched activation of the node $lpi$
$x^{j}_{lpi}$: saturated activation of the node $lpi$
$\hat{x}_{lpi}$: quenched and saturated activation of the node $lpi$
$\tilde{x}_{lpi}$: inverted activation of the node $lpi$
$\bar{x}_{lpi}$: moving window average of the node $lpi$
$\hat{\bar{x}}_{lpi}$: quenched and saturated moving window average of the node $lpi$
$\tilde{\bar{x}}_{lpi}$: inverted moving window average of the node $lpi$
$\dot{x}_{lpi}$: differential activation of the node $lpi$
$I_{lpi}$: input to the node $lpi$
$x_{x}$: activation of the “change” relation node in layer F4
$x_{m}$: activation of the “maintenance” relation node in layer F4
$x_{o}$: activation of the “turn on” relation node in layer F4
$x_{s}$: activation of the “suppress” relation node in layer F4
$x_{1\text{–}2}$: activation of the “1–2” task-specific node in layer F5
$x_{3\text{–}4}$: activation of the “3–4” task-specific node in layer F5
$x_{z}$: activation of the “transport” task-specific node in layer F5
$y^{l}_{lpi}$: contribution of the spatial long-term memory weights to the node $lpi$
$y^{lt}_{lpi}$: contribution of the triadic long-term memory weights to the node $lpi$
$y^{v}_{lpi}$: contribution of the spatial working memory weights to the node $lpi$
$y^{vx}_{lpi}$: contribution of the triadic working memory weights encoding “change” to the node $lpi$
$y^{vm}_{lpi}$: contribution of the triadic working memory weights encoding “maintenance” to the node $lpi$
$y^{vo}_{lpi}$: contribution of the triadic working memory weights encoding “turning on” to the node $lpi$
$y^{vs}_{lpi}$: contribution of the triadic working memory weights encoding “suppression” to the node $lpi$
A.1.8. Long-term memory weight variables
$w^{l}_{mqj,lpi}$: long-term memory weight from node $mqj$ to node $lpi$, where $pi \ne qj$
$w^{l}_{\overline{mqj},lpi}$: long-term memory weight from node $mqj$ at the previous time-step to node $lpi$ at the current time-step
$w^{lt}_{nrk,\{mqj,lpi\}}$: triadic long-term memory weight from node $nrk$ at the current time-step to $w^{l}_{mqj,lpi}$, where $l, m \ne n$
$w^{lt}_{\{mqj,nrk\},lpi}$: triadic long-term memory weight from weight $w^{l}_{mqj,nrk}$ to node $lpi$ at the current time-step, where $m, n \ne l$
A.1.9. Working memory weight variables
$w^{v}_{mqj,lpi}$: spatial working memory weight from node $mqj$ to node $lpi$, where $l \ne m$, $pi \ne qj$
$w^{v}_{\overline{mqj},lpi}$: temporal working memory weight from node $mqj$ at the previous time-step to node $lpi$ at the current time-step (could stand for any of $w^{vx}_{mqj,lpi}$, $w^{vm}_{lpi,lpi}$, $w^{vo}_{lpi,lpi}$, $w^{vs}_{lpi,lpi}$)
$w^{vx}_{mqj,lpi}$: working memory weight, encoding “change” from node $mqj$ at the previous time-step to node $lpi$ at the current time-step, where $l = m$, $pi \ne qj$
$w^{vo}_{lpi,lpi}$: working memory weight, encoding “turning on” of node $lpi$ from the previous to the current time-step
$w^{vm}_{lpi,lpi}$: working memory weight, encoding “maintenance” of node $lpi$ from the previous to the current time-step
$w^{vs}_{lpi,lpi}$: working memory weight, encoding “suppression” of node $lpi$ from the previous to the current time-step
$w^{4}_{lpi,lpi}$: summation of the working memory weights encoding “maintenance,” “turning on” and “suppression” of node $lpi$ from the previous to the current time-step
$w^{vt}_{nrk,\{mqj,lpi\}}$: triadic working memory weight from node $nrk$ at the current time-step to weight $w^{v}_{mqj,lpi}$, where $l, m \ne n$
$w^{vt}_{\{mqj,nrk\},lpi}$: triadic working memory weight from weight $w^{v}_{mqj,nrk}$ to node $lpi$ at the current time-step, where $m, n \ne l$
$w^{vt}_{1\text{–}2,\{nrk,\{mqj,lpi\}\}}$: triadic working memory weight from node “1–2” at the current time-step to weight $w^{vt}_{nrk,\{mqj,lpi\}}$
$w^{vt}_{\{nrk,\{mqj,lpi\}\},1\text{–}2}$: triadic working memory weight from weight $w^{vt}_{nrk,\{mqj,lpi\}}$ to node “1–2” at the current time-step
A.2. Node activity equations
Node activities in the analogy-network behave according
to Eq. (22). The first term on the right-hand side represents
passive exponential decay. The second term represents
external input. The third term represents excitatory influences (in the first square bracket) that are shunted by how
far the node activity is from its maximum. The fourth term
represents inhibitory influences (in the second square
bracket) that are shunted by how far the node activity is
from its minimum:

$$\begin{aligned} \frac{dx_{lpi}}{dt} = -A x_{lpi} + I_{lpi} &+ (B - x_{lpi}) \Bigg[ y^{l}_{lpi} + y^{lt}_{lpi} + y^{v}_{lpi} + y^{vx}_{lpi} + y^{vm}_{lpi} + y^{vo}_{lpi} + \sum_{m=1}^{L} \sum_{q=1}^{G_m} \sum_{j=1}^{N_{mq}} a_{mqj,lpi}\, f(x_{mqj}) \Bigg] \\ &- (x_{lpi} - C) \Bigg[ y^{vs}_{lpi} + \sum_{m=1}^{L} \sum_{q=1}^{G_m} \sum_{j=1}^{N_{mq}} b_{mqj,lpi}\, f(x_{mqj}) \Bigg] \end{aligned} \tag{22}$$
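As a simple sanity check of the shunting form, the fragment below (our own sketch, with the excitatory and inhibitory bracket contents lumped into two scalars) integrates Eq. (22) for one node and shows the activity settling strictly between the bounds $C$ and $B$:

```python
# One-node Euler integration of the shunting Eq. (22); exc and inh lump
# together the excitatory and inhibitory square brackets, respectively.
def dx_dt(x, I, exc, inh, A=1.5, B=1.0, C=0.0):
    return -A * x + I + (B - x) * exc - (x - C) * inh

x, dt = 0.0, 0.01
for _ in range(2000):
    x += dt * dx_dt(x, I=0.5, exc=1.0, inh=0.5)
print(x)  # approx 0.5 here; always confined to the interval [C, B]
```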
For the immediate purposes of the analogy-making
network, the external input can be regarded as a time-varying user-specified quantity. But for the sake of complete
description of the network, Eqs. (23)–(25) show the specific
computations for converting an initial scalar input parameter into a time-varying quantity:

$$step(t) = k \quad \text{where } S_k \le t < S_{k+1},\; S_k = \sum_{i=0}^{k} t_{step_i} \tag{23}$$

$$onflag(t) = \begin{cases} 1 & \text{if } S_k \le t < (S_k + t_{on}) \\ 0 & \text{otherwise} \end{cases} \tag{24}$$

$$I_{lpi}(t) = onflag(t) \times I_{lpi,step(t)} \tag{25}$$
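Under the simplifying assumption of equal step durations (as in the simulations, where $t_{step} = 25$ and $t_{on} = 1$), Eqs. (23)–(25) reduce to the short Python sketch below; names mirror the glossary and the per-step input values are illustrative:

```python
t_on, t_step = 1.0, 25.0   # Appendix B values; equal-duration steps assumed

def step(t):               # Eq. (23): index of the item present at time t
    return int(t // t_step)

def onflag(t):             # Eq. (24): 1 during the leading pulse, else 0
    return 1.0 if (t - step(t) * t_step) < t_on else 0.0

def I(t, I_per_step):      # Eq. (25): gate the per-step scalar input
    return onflag(t) * I_per_step[step(t)]

I_per_step = [1.0, 1.0, 1.0, 0.0]  # e.g. items 1-3 presented, item 4 expected
values = [I(t, I_per_step) for t in (0.5, 10.0, 25.5, 80.0)]
```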
Seven excitatory influences in Eq. (22), respectively,
represent contributions from the following weights: spatial
LTM, temporal (triadic) LTM, spatial WM, temporal (triadic) “change” WM, temporal (triadic) “maintenance” WM,
temporal (triadic) “activation” WM, and fixed spatial STM
cooperation. Two inhibitory influences represent contributions from temporal (triadic) “suppression” WM weights
and fixed spatial STM competition. Each one of these
contributions (except fixed STM interactions) is individually explained next.
Contribution of spatial LTM weights to the current node is defined in Eq. (26):

$$y^{l}_{lpi} = \sum_{m=1}^{L} \sum_{q=1}^{G_m} \sum_{j=1}^{N_{mq}} w^{l}_{mqj,lpi}\, f(x_{mqj}) \tag{26}$$

where $f$ is the sigmoid function

$$f(x) = \frac{1}{1 + e^{-qx + w}} \tag{27}$$

Contribution of temporal (triadic) LTM weights to the current node is defined in Eq. (28) as an addition of two terms. The first term considers the current node to play the role of “to-node” in a triad (see Section 4.3.1); the second term considers the current node as the “relation node” of a triad:

$$y^{lt}_{lpi} = \sum_{m=1}^{L} \sum_{q=1}^{G_m} \sum_{j=1}^{N_{mq}} \left[ \sum_{n=1}^{L} \sum_{r=1}^{G_n} \sum_{k=1}^{N_{nr}} w^{lt}_{nrk,\{mqj,lpi\}}\, f(x_{nrk}) \right] w^{l}_{\overline{mqj},lpi}\, f(\hat{\bar{x}}_{mqj}) + \sum_{m=1}^{L} \sum_{q=1}^{G_m} \sum_{j=1}^{N_{mq}} \sum_{n=1}^{L} \sum_{r=1}^{G_n} \sum_{k=1}^{N_{nr}} \left( w^{lt}_{\{mqj,nrk\},lpi}\, w^{l}_{\overline{mqj},nrk} \right) f(x_{nrk}) \tag{28}$$

Contribution of spatial WM weights to the current node is defined in Eq. (29):

$$y^{v}_{lpi} = \sum_{m=1}^{L} \sum_{q=1}^{G_m} \sum_{j=1}^{N_{mq}} w^{v}_{mqj,lpi}\, f(x_{mqj}) \tag{29}$$

Contribution of temporal (triadic) “change” WM weights is defined in Eq. (30) as an addition of two terms. The second term reflects the current node as the “relation node” of a triad. The first term reflects the current node considered to play the role of “to-node,” but in three different kinds of triads, shown as the three additive terms inside the square bracket. The first of these considers the current node as the “to-node” of a triad that has “change” as its “relation node”; the second concerns a triad that has any other “relation node” such as “has,” “is,” “forward,” and so on; the third is from a triad that has the task-specific node “1–2” as its “relation node.” Note the use of past activity in Eq. (30):

$$y^{vx}_{lpi} = \sum_{m=1}^{L} \sum_{q=1}^{G_m} \sum_{j=1}^{N_{mq}} \left[ w^{vt}_{x,\{mqj,lpi\}} f(x_{x}) + \sum_{n=1}^{L} \sum_{r=1}^{G_n} \sum_{k=1}^{N_{nr}} w^{vt}_{nrk,\{mqj,lpi\}} f(x_{nrk}) + w^{vt}_{1\text{–}2,\{mqj,lpi\}} f(x_{1\text{–}2}) \right] w^{vx}_{mqj,lpi}\, f(\hat{\bar{x}}_{mqj}) + \sum_{m=1}^{L} \sum_{q=1}^{G_m} \sum_{j=1}^{N_{mq}} \sum_{n=1}^{L} \sum_{r=1}^{G_n} \sum_{k=1}^{N_{nr}} \left( w^{vt}_{\{mqj,nrk\},lpi}\, w^{vx}_{mqj,nrk} \right) f(x_{nrk}) \tag{30}$$

Contribution of temporal (triadic) “maintenance” WM weights is defined in Eq. (31) as an addition of two terms. The first term reflects the current node considered to be the “to-node” in two different kinds of triads, whose “relation nodes” are, respectively, “maintenance” and “1–2.” The second term reflects considering the current node as the “relation node” of a triad. Note again the use of past activity:

$$y^{vm}_{lpi} = \left( w^{vt}_{m,m\{lpi,lpi\}} f(x_{m}) + w^{vt}_{1\text{–}2,m\{lpi,lpi\}} f(x_{1\text{–}2}) \right) w^{vm}_{lpi,lpi}\, f(\hat{\bar{x}}_{lpi}) + \sum_{m=1}^{L} \sum_{q=1}^{G_m} \sum_{j=1}^{N_{mq}} w^{vt}_{m\{mqj,mqj\},lpi}\, w^{vm}_{mqj,mqj}\, f(x_{mqj})\, f(\hat{\bar{x}}_{mqj}) \tag{31}$$

Contribution of temporal (triadic) “activation” WM weights is defined in Eq. (32). It can be explained as in Eq. (31) by replacing “maintenance” by “activation.” Note the use of inverted past activity (see Section 4.3.2):

$$y^{vo}_{lpi} = \left( w^{vt}_{o,o\{lpi,lpi\}} f(x_{o}) + w^{vt}_{1\text{–}2,o\{lpi,lpi\}} f(x_{1\text{–}2}) \right) w^{vo}_{lpi,lpi}\, f(\tilde{\bar{x}}_{lpi}) + \sum_{m=1}^{L} \sum_{q=1}^{G_m} \sum_{j=1}^{N_{mq}} w^{vt}_{o\{mqj,mqj\},lpi}\, w^{vo}_{mqj,mqj}\, f(x_{mqj})\, f(\tilde{\bar{x}}_{mqj}) \tag{32}$$

Contribution of temporal (triadic) “suppression” WM weights is defined in Eq. (33). It can be explained as in Eq. (31) by replacing “maintenance” by “suppression”:

$$y^{vs}_{lpi} = \left( w^{vt}_{s,s\{lpi,lpi\}} f(x_{s}) + w^{vt}_{1\text{–}2,s\{lpi,lpi\}} f(x_{1\text{–}2}) \right) w^{vs}_{lpi,lpi}\, f(\hat{\bar{x}}_{lpi}) + \sum_{m=1}^{L} \sum_{q=1}^{G_m} \sum_{j=1}^{N_{mq}} w^{vt}_{s\{mqj,mqj\},lpi}\, w^{vs}_{mqj,mqj}\, f(x_{mqj})\, f(\hat{\bar{x}}_{mqj}) \tag{33}$$

The various node activity operations given in Eqs. (34)–(41) are similar to one another but modified to reflect the individual node indexes pertinent to different clusters and layers of a network. They also reflect additional constraints on the different thresholds, and the saturation, quenching and inverting values:

$$x^{r}_{lpi} = r(x_{lpi}, u^{r}_{l}, c^{r}) = \begin{cases} x_{lpi} & \text{if } x_{lpi} \ge u^{r}_{l} \\ c^{r} & \text{otherwise} \end{cases} \qquad \text{where } c^{r} = C \tag{34}$$

$$x^{j}_{lpi} = j(x_{lpi}, u^{j}_{l}, c^{j}) = \begin{cases} x_{lpi} & \text{if } x_{lpi} < u^{j}_{l} \\ c^{j} & \text{otherwise} \end{cases} \qquad \text{where } c^{j} = B \tag{35}$$

$$\hat{x}_{lpi} = j\!\left( r(x_{lpi}, u^{r}_{l}, c^{r}), u^{j}_{l}, c^{j} \right) \qquad \text{where } u^{r}_{l} = u^{j}_{l} \tag{36}$$

$$\tilde{x}_{lpi} = h(\hat{x}_{lpi}, c^{h}) = |\hat{x}_{lpi} - c^{h}| \qquad \text{where } c^{h} = c^{j} \tag{37}$$

$$\bar{x}_{lpi} = \frac{1}{t_{avg}} \int_{t - (t_{on} + t_{avg})}^{t - t_{on}} \hat{x}_{lpi}\, dt \tag{38}$$

$$\hat{\bar{x}}_{lpi} = j\!\left( r(\bar{x}_{lpi}, \bar{u}^{r}_{l}, c^{r}), \bar{u}^{j}_{l}, c^{j} \right) \qquad \text{where } \bar{u}^{r}_{l} = \bar{u}^{j}_{l} \tag{39}$$

$$\tilde{\bar{x}}_{lpi} = h(\hat{\bar{x}}_{lpi}, c^{h}) = |\hat{\bar{x}}_{lpi} - c^{h}| \qquad \text{where } c^{h} = c^{j} \tag{40}$$

$$\dot{x}_{lpi} = |\hat{\bar{x}}_{lpi} - \hat{x}_{lpi}| \tag{41}$$

A.3. Long-term memory equations

No LTM weights are changed in this network. That is, the rate of change is set to zero for all four kinds of LTM weights: spatial $w^{l}_{mqj,lpi}$, temporal node-to-node $w^{l}_{\overline{mqj},lpi}$, temporal (triadic) node-to-weight $w^{lt}_{nrk,\{mqj,lpi\}}$, and temporal (triadic) weight-to-node $w^{lt}_{\{mqj,nrk\},lpi}$.

A.4. Working memory equations

The rate of change of spatial WM weights is defined in Eq. (42). It is similar to the standard Hebbian law, with the modifications that the negative influence is shunted by how far the weight is from its minimum, while the positive influence is shunted by how far the weight is from its maximum:

$$\frac{dw^{v}_{mqj,lpi}}{dt} = -D\left( w^{v}_{mqj,lpi} - E \right)^{+} w^{v}_{mqj,lpi} + g_{mqj,lpi}\left( F - w^{v}_{mqj,lpi} \right)^{+} \hat{\bar{x}}_{mqj}\, \hat{x}_{lpi} \tag{42}$$

The rate of change in temporal triadic WM weights is defined in Eqs. (43) and (44), which, respectively, describe the behavior of node-to-weight and weight-to-node triadic weights. In addition to the positive influence due to triadic learning (shown as the first term in the curly bracket), they are also influenced by the triadic weight from the task-specific “1–2” node. This is shown as the second term in the curly bracket and is analogous to a regular triadic weight's influence on its node-to-node weight (see Section 4.3.1):

$$\frac{dw^{vt}_{nrk,\{mqj,lpi\}}}{dt} = -D\left( w^{vt}_{nrk,\{mqj,lpi\}} - E \right)^{+} w^{vt}_{nrk,\{mqj,lpi\}} + \left( F - w^{vt}_{nrk,\{mqj,lpi\}} \right)^{+} \left\{ d_{nrk,mqj,lpi}\, \hat{x}_{nrk}\, w^{v}_{mqj,lpi}\, \hat{\bar{x}}_{mqj}\, \hat{x}_{lpi} + f_{1\text{–}2,nrk,mqj,lpi}\, w^{vt}_{1\text{–}2,\{nrk,\{mqj,lpi\}\}}\, \hat{x}_{1\text{–}2} \right\} \tag{43}$$

$$\frac{dw^{vt}_{\{mqj,lpi\},nrk}}{dt} = -D\left( w^{vt}_{\{mqj,lpi\},nrk} - E \right)^{+} w^{vt}_{\{mqj,lpi\},nrk} + \left( F - w^{vt}_{\{mqj,lpi\},nrk} \right)^{+} \left\{ d_{mqj,lpi,nrk}\, w^{v}_{mqj,lpi}\, \hat{\bar{x}}_{mqj}\, \hat{x}_{lpi}\, \hat{x}_{nrk} + f_{1\text{–}2,mqj,lpi,nrk}\, w^{vt}_{1\text{–}2,\{\{mqj,lpi\},nrk\}}\, \hat{x}_{1\text{–}2} \right\} \tag{44}$$

The triadic weights in Eqs. (43) and (44) are learning “changes,” that is, they are connected to weights that have different source and target nodes. For brevity, triadic weights from “maintenance,” “activation,” and “suppression” relation nodes are not given here, but their rates of change are similar to Eqs. (43) and (44). For the temporal node-to-node WM weight encoding “change” as defined in Eq. (45), the decay and the product of negative and positive influences are similar to Eq. (43). Positive influences are from three sources. The first two terms in the curly bracket depict the influence of differential Hebbian learning in a triad made up of source, target, and “change” nodes. The third term in the curly bracket represents the influence of triadic weights from relation nodes such as “has” or “is.” The fourth term is due to the triadic weight from the task-specific node “1–2”:

$$\frac{dw^{vx}_{mqj,lpi}}{dt} = -D\left( w^{vx}_{mqj,lpi} - E \right)^{+} w^{vx}_{mqj,lpi} + \left( F - w^{vx}_{mqj,lpi} \right)^{+} \Big\{ g_{mqj,lpi} (\hat{\bar{x}}_{mqj}\, \dot{x}_{mqj})(\hat{x}_{lpi}\, \dot{x}_{lpi}) + d_{x,mqj,lpi}\, w^{vt}_{x,\{mqj,lpi\}}\, \hat{x}_{x} + \sum_{n=1}^{L} \sum_{r=1}^{G_n} \sum_{k=1}^{N_{nr}} d_{nrk,mqj,lpi}\, w^{vt}_{nrk,\{mqj,lpi\}}\, \hat{x}_{nrk} + d_{1\text{–}2,mqj,lpi}\, w^{vt}_{1\text{–}2,\{mqj,lpi\}}\, \hat{x}_{1\text{–}2} \Big\} \tag{45}$$

For convenience, the summation of the three self-WM weights is defined in Eq. (46) and is used in Eqs. (47)–(49). This quantity exerts an inhibitory influence in Eq. (47), and thereby represents non-specific competition between self-weights:

$$w^{4}_{lpi,lpi} = w^{vm}_{lpi,lpi} + w^{vo}_{lpi,lpi} + w^{vs}_{lpi,lpi} \tag{46}$$

The rate of change in the temporal node-to-node WM self-weight encoding “maintenance” is defined in Eq. (47). Decay and multiplication of negative and positive influences are similar to Eq. (43), except that the negative influence is from all self-weights (46) instead of just the “maintenance” self-weight. The first two terms in the curly bracket depict the influence of delayed Hebbian learning in a triad made up of source, target and “maintenance” nodes (see Section 4.2). The third term is due to the triadic weight from the task-specific node “1–2.” The fourth term represents the influence of “weight transport” from other “maintenance” self-weights within the same cluster (see Section 4.4.1):

$$\frac{dw^{vm}_{lpi,lpi}}{dt} = -D\left( w^{vm}_{lpi,lpi} - E \right)^{+} w^{4}_{lpi,lpi} + \left( F - w^{vm}_{lpi,lpi} \right)^{+} \Bigg\{ g_{lpi,lpi}\, \hat{\bar{x}}_{lpi}\, \hat{x}_{lpi} + d_{m,lpi,lpi}\, w^{vt}_{m,\{lpi,lpi\}}\, \hat{x}_{m} + d_{1\text{–}2,lpi,lpi}\, w^{vt}_{1\text{–}2,\{lpi,lpi\}}\, \hat{x}_{1\text{–}2} + d_{z,lpi,lpi}\, \hat{x}_{z} \left[ \left( \frac{\sum_{j=1, j \ne i}^{N_{lp}} w^{vm}_{lpj,lpj}}{N_{lp} - 1} \right) - w^{vm}_{lpi,lpi} \right]^{+} \Bigg\} \tag{47}$$

The rate of change in the temporal node-to-node WM self-weight encoding “activation” is defined in Eq. (48). This equation is similar to Eq. (47) with three modifications: an additional negative influence from the “change” weight (see Section 4.3.2), delayed anti-Hebbian learning (see Section 4.3.2), and the absence of “weight transport.” The rate of change in the temporal node-to-node WM self-weight encoding “suppression” is defined in Eq. (49), which is similar to Eq. (47) with two changes: competition from “change” weights and delayed anti-Hebbian learning:

$$\frac{dw^{vo}_{lpi,lpi}}{dt} = -D\left( w^{vo}_{lpi,lpi} - E \right)^{+} \left( w^{4}_{lpi,lpi} + \sum_{m=1}^{L} \sum_{q=1}^{G_m} \sum_{j=1}^{N_{mq}} e_{mqj,lpi}\, w^{vx}_{mqj,lpi} \right) + \left( F - w^{vo}_{lpi,lpi} \right)^{+} \left\{ g_{lpi,lpi}\, \tilde{\bar{x}}_{lpi}\, \hat{x}_{lpi} + d_{o,lpi,lpi}\, w^{vt}_{o,\{lpi,lpi\}}\, \hat{x}_{o} + d_{1\text{–}2,lpi,lpi}\, w^{vt}_{1\text{–}2,\{lpi,lpi\}}\, \hat{x}_{1\text{–}2} \right\} \tag{48}$$

$$\frac{dw^{vs}_{lpi,lpi}}{dt} = -D\left( w^{vs}_{lpi,lpi} - E \right)^{+} \left( w^{4}_{lpi,lpi} + \sum_{m=1}^{L} \sum_{q=1}^{G_m} \sum_{j=1}^{N_{mq}} e_{lpi,mqj}\, w^{vx}_{lpi,mqj} \right) + \left( F - w^{vs}_{lpi,lpi} \right)^{+} \Bigg\{ g_{lpi,lpi}\, \hat{\bar{x}}_{lpi}\, \tilde{x}_{lpi} + d_{s,lpi,lpi}\, w^{vt}_{s,\{lpi,lpi\}}\, \hat{x}_{s} + d_{1\text{–}2,lpi,lpi}\, w^{vt}_{1\text{–}2,\{lpi,lpi\}}\, \hat{x}_{1\text{–}2} + d_{z,lpi,lpi}\, \hat{x}_{z} \left[ \left( \frac{\sum_{j=1, j \ne i}^{N_{lp}} w^{vs}_{lpj,lpj}}{N_{lp} - 1} \right) - w^{vs}_{lpi,lpi} \right]^{+} \Bigg\} \tag{49}$$

A.5. Task-specific working memory equations

The primary difference between the task-specific WM weights described here and the temporal WM weights described in the last subsection is that the former use active decay (see Section 4.3) whereas the latter use passive decay. Task-specific weights are only triadic; that is, they are not from a node to another node, but rather to an “assembly” and vice versa. The definitions here are given for connections to and from the “1–2” node. Definitions for connections from other nodes such as “3–4” are similar but not shown here for brevity. Eq. (50) describes the rate of change of a weight between the presynaptic “1–2” node and a postsynaptic “assembly” which is another triadic (not task-specific) weight. Eq. (51) similarly describes the rate of change of a task-specific weight in the reverse direction, that is, with switched presynaptic and postsynaptic roles:

$$\frac{dw^{vt}_{1\text{–}2,\{nrk,\{mqj,lpi\}\}}}{dt} = \tilde{x}_{3\text{–}4}\, \hat{x}_{1\text{–}2} \left\{ -D\left( w^{vt}_{1\text{–}2,\{nrk,\{mqj,lpi\}\}} - E \right)^{+} w^{vt}_{1\text{–}2,\{nrk,\{mqj,lpi\}\}} + \left( F - w^{vt}_{1\text{–}2,\{nrk,\{mqj,lpi\}\}} \right)^{+} f_{1\text{–}2,nrk,mqj,lpi} \left( w^{vt}_{nrk,\{mqj,lpi\}}\, \hat{x}_{nrk}\, w^{v}_{mqj,lpi}\, \hat{\bar{x}}_{mqj}\, \hat{x}_{lpi} \right) \right\} \tag{50}$$

$$\frac{dw^{vt}_{\{nrk,\{mqj,lpi\}\},1\text{–}2}}{dt} = \tilde{x}_{3\text{–}4} \left( w^{vt}_{nrk,\{mqj,lpi\}}\, \hat{x}_{nrk}\, w^{v}_{mqj,lpi}\, \hat{\bar{x}}_{mqj}\, \hat{x}_{lpi} \right) \left\{ -D\left( w^{vt}_{\{nrk,\{mqj,lpi\}\},1\text{–}2} - E \right)^{+} w^{vt}_{\{nrk,\{mqj,lpi\}\},1\text{–}2} + \left( F - w^{vt}_{\{nrk,\{mqj,lpi\}\},1\text{–}2} \right)^{+} f_{nrk,mqj,lpi,1\text{–}2}\, \hat{x}_{1\text{–}2} \right\} \tag{51}$$

The definitions in Eqs. (52) and (53) are the same as in Eqs. (50) and (51) except that the “assembly” here represents a node-to-node weight instead of a triadic weight as in the previous case:

$$\frac{dw^{vt}_{1\text{–}2,\{mqj,lpi\}}}{dt} = \tilde{x}_{3\text{–}4}\, \hat{x}_{1\text{–}2} \left\{ -D\left( w^{vt}_{1\text{–}2,\{mqj,lpi\}} - E \right)^{+} w^{vt}_{1\text{–}2,\{mqj,lpi\}} + \left( F - w^{vt}_{1\text{–}2,\{mqj,lpi\}} \right)^{+} d_{1\text{–}2,mqj,lpi}\, w^{v}_{mqj,lpi}\, \hat{\bar{x}}_{mqj}\, \hat{x}_{lpi} \right\} \tag{52}$$

$$\frac{dw^{vt}_{\{mqj,lpi\},1\text{–}2}}{dt} = \tilde{x}_{3\text{–}4} \left( w^{v}_{mqj,lpi}\, \hat{\bar{x}}_{mqj}\, \hat{x}_{lpi} \right) \left\{ -D\left( w^{vt}_{\{mqj,lpi\},1\text{–}2} - E \right)^{+} w^{vt}_{\{mqj,lpi\},1\text{–}2} + \left( F - w^{vt}_{\{mqj,lpi\},1\text{–}2} \right)^{+} d_{mqj,lpi,1\text{–}2}\, \hat{x}_{1\text{–}2} \right\} \tag{53}$$
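The gating pattern shared by Eqs. (50)–(53) can be seen in the following minimal Python sketch (our naming; decay and bounds simplified): the whole update, decay included, is multiplied by $\tilde{x}_{3\text{–}4}\hat{x}_{1\text{–}2}$, so the task-specific weight both learns and actively decays only while “1–2” is on and “3–4” is off, and is frozen otherwise.

```python
# Gated task-specific update in the style of Eqs. (50)-(53).
def task_weight_step(w, x_hat_12, x_tilde_34, hebb, D=1.0, E=0.0, F=1.0,
                     dt=0.01):
    gate = x_tilde_34 * x_hat_12               # open only during 1-2 learning
    dw = gate * (-D * max(w - E, 0.0) * w + max(F - w, 0.0) * hebb)
    return w + dt * dw

w = 0.0
for _ in range(500):   # gate open, presynaptic co-activity present
    w = task_weight_step(w, x_hat_12=1.0, x_tilde_34=1.0, hebb=1.0)
print(w)   # grows while the gate is open; a closed gate (0) freezes w
```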
Appendix B. Parameters and initial conditions
Parameters can be classified into five categories: node,
weight, interaction and coupling, temporal and network.
Most of these parameters have values that are common
across all experiments. These values are $A = 1.5$, $B = 1.0$, $C = 0.0$, $D = 1.0$, $E = 1.0$, $F = 0.0$, $u = 20.0$, $f = 10.0$, $u^{r}_{l} = 0.05$, $u^{j}_{l} = 0.05$, $c^{r} = 0.0$, $c^{j} = 1.0$, $c^{h} = 1.0$, $a = 1.0$, $b = 2.0$, $e = 5.0$, $g = 1.0$, $d = 1.0$, $f = 5.0$, $t_{on} = 1.0$, $t_{step} = 25$, $t_{avg} = 25$, $L = 4$. Input values are 1 (on) during
on time and 0 (off) otherwise. Maximum and minimum
values for both node activities and weights are 1 and 0.
Choices for other parameters such as short-term and working memory decay and interaction and coupling coefficients
are made by considering the steady state values of the
system variables. Short-term decay is typically higher than
working memory decay. Saturation and quenching values
are 1 and 0, respectively. Inversion value is chosen as 1.
Quenching and saturation thresholds for temporally integrated activities are kept the same, with value typically
lower than the threshold for regular node activities. These
quenching and saturation values for the different analogy
experiments are as follows: for SCRY, 0.4 in Layer 1 and
0.2 in Layer 3; for SCFW, 0.5 in Layer 1, 0.3 in Layer 2, and
0.4 in Layer 3; for ARBY, 0.4 in Layers 1, 3, and 4, and 0.5
in Layer 2; for RCRS, 0.5 in Layers 1 and 2 and 0.4 in
Layers 3 and 4; for ABCD, 0.5 in Layer 2 and 0.4 in
Layer 4. Triadic coupling coefficients are higher than regular weight coupling coefficients, because they involve
products of four rather than two variables (all valued
between 0 and 1), which needs a higher boost to produce
similar effects. Inhibitory interaction strengths are higher
than excitatory ones.
All variables are initially set to zero except for spatial and
temporal long-term memory weights, which are a mixture of
0s and 1s varying with each experiment as described in each
of the network diagrams.
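For convenience in re-implementation, the common values above can be collected in one place. The Python mapping below is our transliteration; note that Appendix B lists $u = 20.0$ and $f = 10.0$ without further qualification, and our reading of them as the sigmoid's parameters (Eq. (27)) is an assumption.

```python
# Common parameter values from Appendix B, with transliterated names.
PARAMS = {
    "A": 1.5, "B": 1.0, "C": 0.0,          # node decay, max and min activity
    "D": 1.0, "E": 1.0, "F": 0.0,          # WM weight decay and bounds, as printed
    "sigmoid_u": 20.0, "sigmoid_f": 10.0,  # assumed sigmoid parameters, Eq. (27)
    "u_r": 0.05, "u_j": 0.05,              # quenching / saturation thresholds
    "c_r": 0.0, "c_j": 1.0, "c_h": 1.0,    # quench, saturate, invert values
    "a": 1.0, "b": 2.0, "e": 5.0,          # interaction coefficients
    "g": 1.0, "d": 1.0, "f": 5.0,          # coupling coefficients
    "t_on": 1.0, "t_step": 25, "t_avg": 25, "L": 4,
}
```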
References

Anderson, J. A., Silverstein, J. W., Ritz, S. A., & Jones, R. S. (1977).
Distinctive features, categorical perception, and probability learning:
some applications of a neural model. Psychological Review, 84, 413–
451.
Baddeley, A. (1986). Working memory. Oxford, UK: Oxford University
Press.
Banquet, J. -P., Gaussier, P., Contreras-Vidal, J. L., Gissler, A., Burnod, Y.,
& Long, D. (1998). A neural network model of memory, amnesia, and
cortico–hippocampal interactions. In R. W. Parks, D. S. Levine & D. L.
Long, Fundamentals of neural network modeling neuropsychology and
cognitive neuroscience (pp. 77–119). Cambridge, MA: MIT Press.
Bapi, R. S., & Levine, D. S. (1994). Modeling the role of the frontal lobes in
sequential task performance. I. Basic structure and primacy effects.
Neural Networks, 7, 1167–1180.
Bapi, R. S., & Levine, D. S. (1997). Modeling the role of the frontal lobes in
sequential task performance. II. Classification of sequences. Neural
Network World, 1, 3–28.
Barnden, J. A. & Holyoak, K. J. (1994). Analogy, metaphor, and reminding
Advances in connectionist and neural computation theory. Vol. 3.
Norwood, NJ: Ablex.
Barnden, J. A., & Srinivas, K. (1992). Overcoming rule-based rigidity and
connectionist limitations through massively-parallel case-based reasoning. International Journal of Man–Machine Studies, 36, 221–246.
Blank, D.S. (1997). Learning to see analogies: a connectionist exploration.
Unpublished doctoral dissertation, Indiana University.
Bullock, D., & Grossberg, S. (1988). Neural dynamics of planned arm
movements: emergent invariants and speed–accuracy properties during
trajectory formation. Psychological Review, 95, 49–90.
Burnod, Y., Grandguillaume, P., Otto, I., Ferraina, S., Johnson, P. B., &
Caminiti, R. (1992). Visuo-motor transformations underlying arm
movements toward visual targets: a neural network model of cerebral
cortical operations. Journal of Neuroscience, 12, 1435–1453.
Burns, B. D. (1996). Meta-analogical transfer: transfer between episodes of
analogical reasoning. Journal of Experimental Psychology: Learning,
Memory and Cognition, 22, 1032–1048.
Carpenter, G. A., & Grossberg, S. (1987). ART 2: Self-organization of
stable category recognition codes for analog input patterns. Applied
Optics, 26, 4919–4930.
Cohen, M. A., & Grossberg, S. (1987). Masking fields: a massively parallel
architecture for learning, recognizing, and predicting multiple groupings of data. Applied Optics, 26, 1866–1891.
Cook, D. J. (1994). Defining the limits of analogical planning. In S. J.
Hanson, G. A. Drastal & R. L. Rivest, Computational theory and
natural learning systems (pp. 65–80). Vol. 1. Cambridge, MA: MIT
Press.
Dehaene, S., Changeux, J., & Nadal, J. (1987). Neural networks that learn
temporal sequences by selection. Proceedings of the National Academy
of Sciences, 84, 2727–2731.
Diamond, A. (1991). Frontal lobe involvement in cognitive changes during
the first year of life. In K. Gibson, M. Konner & A. Patterson, Brain and
behavioral development. Chicago, IL: Aldine Press.
Dormand, J. R., & Prince, P. J. (1980). A family of embedded Runge–Kutta formulae. Journal of Computational and Applied Mathematics, 6, 19–26.
Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive
architecture: a critical analysis. In S. Pinker & J. Mehler, Connections
and symbols (pp. 3–71). Cambridge, MA: MIT Press.
Furl, N.O. (1999). Category induction and exception learning. Unpublished
master’s thesis, University of Texas at Arlington.
Fuster, J. M. (1997). The prefrontal cortex, (3rd Ed.). New York: Raven.
Gentner, D., Rattermann, M. J., & Forbus, K. D. (1993). The roles of
similarity in transfer: separating retrievability and inferential soundness. Cognitive Psychology, 25, 524–575.
Goldman-Rakic, P. S. (1987). Circuitry of primate prefrontal cortex and
regulation of behavior by representational memory. Handbook of
Physiology, 5, 373–417.
Goodall, J. (1990). Through a window: my thirty years with the
chimpanzees of Gombe. Boston, MA: Houghton Mifflin.
Goswami, U. (1991). Analogical reasoning: what develops? A review of
research and theory. Child Development, 62, 1–22.
Goswami, U. (1992). Analogical reasoning in children. Hove, UK: Lawrence Erlbaum Associates.
Goswami, U. (1998). Cognition in children. Hove, UK: Psychology Press/
Erlbaum.
Grossberg, S. (1982). Studies of mind and brain: neural principles of learning, perception, development, cognition and motor control. Dordrecht:
Reidel.
Grossberg, S. (1987). Competitive learning: from interactive activation to
adaptive resonance. Cognitive Science, 11, 23–63.
Grossberg, S. (1988). Neural networks and natural intelligence.
Cambridge, MA: MIT Press.
Grossberg, S., & Mingolla, E. (1985). Neural dynamics of form perception:
boundary completion, illusory figures and neon color spreading.
Psychological Review, 92, 173–211.
Gruber, H. E. & Vonèche, J. J. (1995). The essential Piaget. Northvale, NJ:
Jason Aronson.
Guigon, E., Dorizzi, B., Burnod, Y., & Schultz, W. (1995). Neural correlates of learning in the prefrontal cortex of the monkey: a predictive
model. Cerebral Cortex, 5, 135–147.
Hammond, K. J. (1989). Case-based planning: viewing planning as a
memory task. Boston, MA: Academic Press.
Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the theory of
neural computation. Reading, MA: Addison-Wesley.
Hestenes, D. (1998). Modulatory mechanisms in mental disorders. In D. J.
Stein & J. Ludik, Neural networks and psychopathology (pp. 132–164).
New York: Cambridge University Press.
Hofstadter, D. R., & the Fluid Analogies Research Group (1995). Fluid concepts and creative analogies: computer models of the fundamental mechanisms of thought. New York: Basic Books.
Holyoak, K. J. & Barnden, J. A. (1994). Analogical connections. Advances
in connectionist and neural computation theory, Vol. 2. Norwood, NJ:
Ablex.
Holyoak, K. J., & Thagard, P. (1995). Mental leaps: analogy in creative
thought. Cambridge, MA: MIT Press.
Hummel, J. E., & Holyoak, K. J. (1997). Distributed representations of
structure: a theory of analogical access and mapping. Psychological
Review, 104, 427–466.
Indurkhya, B. (1991). Metaphor and cognition. Cambridge, MA: MIT
Press.
Jani, N.G. (1991). Through the eyes of metaphor: analogy based learning.
Unpublished master’s thesis, University of Texas at Arlington.
Kant, J.-D. (1995). Categ_ART: a neural network for automatic extraction
of human categorization rules. In ICANN’95 Proceedings, Vol. 2, (pp.
479–484).
Kant, J.-D., & Levine, D. S. (1998). RALF: A simplified neural network model
of rule formation in the prefrontal cortex. Presentation at the 3rd International Conference on Computational Intelligence and Neuroscience,
Research Triangle Park, NC.
Klopf, A. H. (1988). A neuronal model of classical conditioning. Psychobiology, 16, 85–125.
Kohonen, T. (1984). Self-organization and associative memory. Berlin:
Springer.
Kosko, B. (1986). Differential Hebbian learning. In J. S. Denker, Neural networks for computing, AIP Conference Proceedings, Vol. 151 (pp. 265–270). New York: American Institute of Physics.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago, IL:
University of Chicago Press.
Levine, D. S. (1991). Introduction to neural and cognitive modeling. Hillsdale, NJ: Lawrence Erlbaum Associates. (Second edition to appear in 2000.)
Levine, D. S. (1996). Modeling dysfunction of the prefrontal executive
system. In J. A. Reggia, E. Ruppin & R. S. Berndt, Neural modeling
of brain and cognitive disorders (pp. 413–439). Singapore: World
Scientific.
Long, D., & Garigliano, R. (1994). Reasoning by analogy and causality: a
model and application. London: Routledge/Chapman and Hall.
von der Malsburg, C. (1973). Self-organization of orientation sensitive cells
in the striate cortex. Kybernetik, 14, 85–100.
Mandler, J. (1990). Recall of events by preverbal children. Annals of the
New York Academy of Sciences, 85, 485–516.
Meltzoff, A. (1990). Towards a developmental cognitive science: the implications of cross-modal matching and imitation for the development of
representations and memory in infancy. Annals of the New York Academy of Sciences, 85, 1–37.
Minai, A. A., & Levy, W. B. (1993). Sequence learning in a single trial. In INNS World Congress on Neural Networks, Vol. 2 (pp. 505–508). Hillsdale, NJ: Lawrence Erlbaum Associates.
Mitchell, M. (1993). Analogy-making as perception: a computer model.
Cambridge, MA: MIT Press.
Nigrin, A. (1993). Neural networks for pattern recognition. Cambridge,
MA: MIT Press.
Plate, T. (1998). Analogy retrieval and processing with distributed representations. In K. J. Holyoak, D. Gentner & B. Kokinov, Advances in
analogy research: integration of theory and data from the cognitive,
computational, and neural sciences. NBU Series in Cognitive Science.
Rumelhart, D. E., McClelland, J. L., & the PDP Research Group (1986). Parallel
distributed processing: explorations in the microstructure of cognition,
Vols. 1 and 2. Cambridge, MA: MIT Press.
Shastri, L., & Ajjanagadde, V. (1993). From simple associations to
systematic reasoning: a connectionist representation of rules, variables
and dynamic bindings using temporal synchrony. Behavioral and Brain
Sciences, 16, 417–494.
Simon, T., & Klahr, D. (1995). A computational theory of children’s learning about number conservation. In T. Simon & G. S. Halford, Developing cognitive competence: new approaches to process modeling.
Mahwah, NJ: Lawrence Erlbaum Associates.
Simon, T., Hespos, S. J., & Rochat, P. (1995). Do infants understand simple
arithmetic, or only physics? Cognitive Development, 10, 253–269.
Sun, R. (1991). Integrating rules and connectionism for robust reasoning: a
connectionist architecture with dual representation. Unpublished
doctoral dissertation, Brandeis University.
Thompson, R. K. R., Oden, D. L., & Boysen, S. T. (1997). Language-naive
chimpanzees (Pan troglodytes) judge relations between relations in a
conceptual matching-to-sample task. Journal of Experimental Psychology: Animal Behavior Processes, 23, 31–43.
Vosniadou, S. & Ortony, A. (1989). Similarity and analogical reasoning.
New York: Cambridge University Press.
Werbos, P. J. (1993). The roots of backpropagation: from ordered derivatives to neural networks and political forecasting. New York: Wiley.
Wharton, C. M., & Grafman, J. G. (1998). Deductive reasoning and the
brain. Trends in Cognitive Science, 2, 54–59.
Wharton, C. M., Grafman, J. G., Flitman, S. K., Hansen, E. K., Brauner, J.,
Marks, A. R., & Honda, M. (1998). The neuroanatomy of analogical
reasoning. In K. J. Holyoak, D. Gentner & B. Kokinov, Advances in
analogy research: integration of theory and data from the cognitive,
computational, and neural sciences, Proceedings of Analogy’98 Workshop, Sofia, Bulgaria (pp. 260–269).