A neural network theory of proportional analogy-making

Nilendu G. Jani (a), Daniel S. Levine (b,*)

(a) Iconoci, Inc., 2528 Oak Brook Drive, Bedford, TX 76021-7223, USA
(b) Department of Psychology, University of Texas at Arlington, 501 South Nedderman Drive, Arlington, TX 76019-0528, USA

Received 28 May 1999; received in revised form 29 November 1999; accepted 29 November 1999

Abstract

A neural network model that can simulate the learning of some simple proportional analogies is presented. These analogies include, for example, (a) red-square:red-circle :: yellow-square:?, (b) apple:red :: banana:?, (c) a:b :: c:?. Underlying the development of this network is a theory of how the brain learns the nature of the association between pairs of concepts. Traditional Hebbian learning of associations is necessary for this process but not sufficient, because it says only, for example, that the concepts "apple" and "red" have been associated; it says nothing about the nature of this relationship. The types of context-dependent interlevel connections in the network suggest a semilocal type of learning that in some manner involves association among more than two nodes or neurons at once. Such connections have been called synaptic triads, and have been related to potential cell responses in the prefrontal cortex. Some additional types of connections are suggested by the problem of modeling analogies. These types of connections have not yet been verified by brain imaging, but the work herein suggests that they may occur and, possibly, be made and broken quickly in the course of working memory encoding. These working memory connections are referred to as differential, delayed and anti-Hebbian connections. With these connections, one can learn transitions such as "keep red the same," "change red to yellow," "turn off red," "turn on yellow," and so forth. Also included in the network is a kind of weight transport so that, for example, red-to-red can be transported to a different instance of color, such as yellow-to-yellow. The network instantiation developed here, based on common connectionist building blocks such as associative learning, competition, and adaptive resonance, along with additional principles suggested by analogy data, is a step toward a theory of interactions among several brain areas that develop and learn meaningful relationships between concepts. © 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Analogy; Synaptic triads; Working memory; Associative memory; Adaptive resonance theory; Concept learning; Weight transport

1. Introduction

1.1. Analogy-making

The capacity for making and learning analogies is clearly at the heart of advanced human cognitive capabilities and of creativity (Holyoak & Thagard, 1995; Indurkhya, 1991; Lakoff & Johnson, 1980). The ability to perform analogical reasoning seems to arise earlier in child development than was previously supposed, in some cases as early as three years of age (Goswami, 1991). Analogy simply means sameness or resemblance of two objects or processes at some level of abstraction. Analogy-making is the ability to see familiar things in a different manner than usual, thus enabling some unfamiliar things to look similar to some familiar ones. This process depends on the ability to combine, split and rearrange existing concepts, and on the ability to reason about relationships.

* Corresponding author. Tel.: +1-817-272-3598; fax: +1-817-272-2364.
E-mail address: [email protected] (D.S. Levine).

Analogical reasoning can take an incredible variety of forms, all of which have different developmental histories. First, there is the classic type of analogy which dates back to Aristotle and is familiar from college entrance examinations: the proportional analogy of the type "A is to B as C is to ?". Second, there are simple statements that something is "analogous" to something else, without specifying the nature of the analogy; this type of analogy is akin to a metaphor. Finally, there is the drawing of conclusions in one domain based on results in another domain to which it is analogous. If, for example, an analogy is drawn between an atom (electrons surrounding a nucleus) and a solar system (planets surrounding a sun), it may be possible to conclude details about the atom from corresponding details about the solar system. In an attempt to circumscribe this problem, attention here is restricted to the classical, or proportional, type of analogy. Solving a proportional analogy "A:B :: C:D" requires that the organism, or network, characterize the relationship between entities A and B, and in many cases also the relationship between entities A and C.

This poses a challenge for traditional connectionist theories (for example, Grossberg, 1988; Hertz, Krogh & Palmer, 1991; Levine, 1991; Rumelhart & McClelland, 1986), because many connectionist networks have used some variety of a Hebbian learning rule, whereby an association between two entities is strengthened by co-occurrence. Strengthening or weakening of an association typically gives no clue as to the nature of the association, or the relationship between the two entities. For example, a strong Hebbian connection between nodes representing "Paris" and "France" tells us only that Paris and France are somehow related: it does not tell us that Paris is a part of France, that it is "in" France, or that it is the capital of France. This shortcoming of networks based on simple Hebbian learning rules is one cause for the criticism leveled at connectionist networks by some cognitive scientists such as Fodor and Pylyshyn (1988), who claim that such networks lack "compositionality" and "systematicity." Hence, this is one reason there have been few neural networks so far that have simulated complex reasoning and inference processes. Yet our brains constitute an "existence proof" that a network constructed using non-linear dynamics, rather than heuristic programming, can perform complex reasoning, including analogical reasoning. This has inspired us to search for a way to model analogy learning and solving by means of a network incorporating principles that have already proved successful in modeling a wide variety of other cognitive and mental processes, including sensory perception (Grossberg, 1987; Grossberg & Mingolla, 1985), pattern categorization (Carpenter & Grossberg, 1987; Cohen & Grossberg, 1987), sequence learning (Bapi & Levine, 1994, 1997; Dehaene, Changeux & Nadal, 1987), and motor control (Bullock & Grossberg, 1988).
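To make the Hebbian point above concrete, the following is a minimal sketch (not from this paper) of co-occurrence learning; the node names and learning rate are illustrative assumptions. After training, all the rule yields is a single scalar strength between "Paris" and "France," with no slot for the kind of relationship.

# Minimal sketch of plain Hebbian co-occurrence learning (illustrative only;
# node names and the learning rate are assumptions, not from the paper).
weights = {}  # (pre, post) -> scalar association strength

def hebbian_step(active, rate=0.1):
    """Strengthen the weight between every pair of co-active nodes."""
    for pre in active:
        for post in active:
            if pre != post:
                key = (pre, post)
                # dw/dt proportional to x_pre * x_post (both 1 when co-active)
                weights[key] = weights.get(key, 0.0) + rate

# "Paris" and "France" co-occur repeatedly.
for _ in range(5):
    hebbian_step({"Paris", "France"})

print(weights[("Paris", "France")])  # 0.5 -- a bare strength...
# ...but nothing in `weights` encodes WHICH relation holds (part-of, in,
# capital-of): that is the gap the triadic structures of Section 4 address.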
Some preliminary connectionist models of analogy have already appeared. Mitchell (1993) developed a system called COPYCAT, a hybrid of connectionist and symbolic systems, to model proportional analogies involving strings of letters of the alphabet. Hummel and Holyoak (1997) developed a fully connectionist model for some semantic analogies, based on a previous neural network model of property binding (Shastri & Ajjanagadde, 1993). Both of these networks have yielded valuable insights into the organization of the cognitive processes involved in analogy-making. The premise here, though, is that further advances in connectionist modeling are needed to better approximate the process by which humans actually perform analogy-related tasks.

2. Experimental data

Today there are more psychological (Vosniadou & Ortony, 1989) and developmental (Diamond, 1991; Goswami, 1991, 1992, 1998; Mandler, 1990; Meltzoff, 1990; Simon & Klahr, 1995; Simon, Hespos & Rochat, 1995) data available than neurophysiological data (Wharton & Grafman, 1998; Wharton et al., 1998) on analogy-making and related cognitive skills. This is partly because of the difficulty involved in studying the brain in vivo and localizing the representation of concepts.

2.1. Psychological data

A review of data on the development of the analogy-making capacity in young children is given by Goswami (1991). These data hint that the capacity for learning analogies arises sooner in development than is often thought, frequently between the ages of 2 and 3 years. The theories of Piaget (see, for example, Gruber & Vonèche, 1995) suggest that analogical reasoning should not become well established until the stage of formal operational thought, which Piaget believed did not start until about 11 years of age. Yet the data Goswami reviews show that while analogical reasoning ability increases steadily with age, some proportional analogies that are sufficiently natural can be solved in both semantic and pictorial domains by many 4-year-olds. The data that Goswami reviewed hint that the formal operational development Piaget described facilitates analogical reasoning but is not necessary for all cases of it. Yet there are a variety of other factors mediating how early children learn a particular type of analogy (either a proportional analogy or an analogy between problem domains). For example, if the types of associations being mapped are relatively simple ones (e.g. "functional" ones like shoes:feet, or antonyms like cold:hot), or if the knowledge domain is one with which the children are familiar, the analogies are easier to learn.

Other data reviewed by Vosniadou and Ortony (1989) hinted that young children do less well than older children or adults on analogies mainly because of a lack of domain knowledge, not because of an inability to reason relationally. In transferring properties from a source to a target domain, children are as apt to transfer relational properties as descriptive ones. For example, when told that white blood cells are like soldiers, they will not infer that these cells wear uniforms, but might infer that the cells can die from an infection.

Analogical thinking has also been found to occur in chimpanzees. Thompson, Oden and Boysen (1997) tested chimpanzees on a matching-to-sample task that required the animals to learn relations among relations. Their results did not support a previous conjecture that such relations could only be learned by animals previously trained in language. Rather, chimpanzees who had some experience with the test apparatus and with abstract thinking, but not with language per se, could learn such relations readily. Yet there are also interesting analogies that language-trained chimpanzees make when defining words.
For example, one language-trained chimp described a cucumber as a "green banana," and another referred to an Alka-Seltzer as a "listen drink" (Goodall, 1990).

All these developmental results lead to a meta-suggestion that analogical reasoning is not as sharp a break from lower-order processes like inter-concept association as is often thought. This suggests further that modeling of analogy-making may be able to utilize some of the same network architectures involved in modeling lower-level thought, with greater complexity and a few additional specialized mechanisms.

2.2. Neurophysiological data

Preliminary results from physiological (PET) studies (Wharton et al., 1998) implicate inferior frontal cortex and inferior parietal cortex, in the left hemisphere, as brain regions mediating analogy-making. This is consistent with the suggested role of prefrontal cortex and related circuitry in working memory, reasoning, and generating and searching through alternatives (e.g. Fuster, 1997; Goldman-Rakic, 1987). These PET results do not, however, yet provide enough data to suggest brain pathways for analogical reasoning processes. This means that models such as ours could provide suggestions for cognitively significant pathways involving parts of the prefrontal and parietal cortices. Perhaps, as suggested elsewhere for other reasoning processes (Levine, 1996), these pathways are hard-wired for general analogical capabilities, but the specific content of analogies is learned through modifiable connections with other parts of the cortex.

3. Previous models

A variety of computational models has been proposed for different aspects of analogy learning and formation. Some of them have used a full neural network structure, whereas others have combined partial neural network (connectionist) realizations with elements of symbolic programming from traditional artificial intelligence (Barnden & Holyoak, 1994; Blank, 1997; Burns, 1996; Hofstadter et al., 1995; Holyoak & Barnden, 1994; Hummel & Holyoak, 1997; Long & Garigliano, 1994; Mitchell, 1993; Plate, 1998). There is considerable literature on analogy models based on symbolic programming alone, but in the interests of brevity, approaches to analogy that do not include a neural network component are not discussed here: for example, models due to Vosniadou and Ortony (1989), Hammond (1989), Jani (1991), Sun (1991), Gentner, Ratterman and Forbus (1993), and Cook (1994) are not covered. These models have also differed widely both in the general cognitive problems they were attempting to solve and in the specific problem domains they were investigating.

3.1. Connectionist models of analogy

Several of the analogy models in the network literature, starting with Barnden and Srinivas (1992), have been based in problem domains that involve elaborate semantic structures and relationships between sentences. In particular, the LISA model of Hummel and Holyoak (1997) relied on previous models of the process whereby particular entities were bound to particular roles in a sentence (Shastri & Ajjanagadde, 1993). The LISA model was designed to account for the two analogical processes of "access," that is, how potential analogs in both the source and target domains are retrieved from memory, and "mapping," the working memory process by which relationships between source and target elements are discerned.
Hummel and Holyoak's model can account for various psychological data on the differential factors influencing access and mapping. For this reason it can reproduce many characteristic human patterns of analogical inference, such as learning close and natural analogies better than logically consistent but contrived ones. The limitations of this model are that it relies heavily on the assumed previous learning of very high-level abstract concepts, and that its structure does not appear to be based in any way on biologically realistic models of simpler mental processes.

A variation of this type of semantic model is due to Plate (1998). Plate used a form of holographic vector representation to map parts of sentences in a semantic source domain to their closest analogs in the target domain. The changes from one to the other are then treated as a mapping and applied to other elements in the source domain.

Blank (1997) developed a model called Analogator that can learn analogies between visual scenes containing different geometric objects that could be light or dark. These were not proportional analogies but questions such as "what is the analog of Figure A in Domain B?" In other words, Analogator had to learn to distinguish figure from ground in a novel visual situation, given a prespecified figure–ground distinction in the source scene.

3.2. Hybrid models of analogy

Mitchell (1993) utilized COPYCAT, an architecture that is part connectionist and part symbolic, to perform analogies in a circumscribed domain—strings of letters of the alphabet. Characteristic transitions such as successor and predecessor mappings could be learned within this domain. Some of the analogies COPYCAT was designed for were obvious, such as "abc:abd :: rst:?". Others involved generalizing mappings from one concept to a related one (a process the author called "conceptual slippage"), such as "abc:abd :: kji:?", or even "abc:abd :: abbccc:?". This network, while severely domain-restricted, captured some mappings that are characteristic of human analogical inference processes. Also, it found the same analogies difficult or ambiguous that human solvers do.

3.3. Summary of analogy models

Previous analogy models have differed enough in their objectives that it is hard to compare them either with each other or with the model presented here. All of them possess detailed capabilities in particular problem domains that the proposed model, which is not yet designed to deal with those domains, lacks. On the other hand, none of these models does what the model presented here is designed to do: learn some low-level analogies between simple, natural concepts using generic architectural principles that have previously been applied to lower-level mental processes and to other aspects of concept formation, as discussed next.

3.4. Adaptive resonance models of concept learning

Neural network classifiers, such as Adaptive Resonance Theory (ART) (Carpenter & Grossberg, 1987), Back Propagation (Rumelhart & McClelland, 1986; Werbos, 1993), Self-Organizing Maps (Kohonen, 1984), and Brain-State-in-a-Box (BSB) (Anderson, Silverstein, Ritz & Jones, 1977), accept exemplar vectors or patterns as input defined over different sensory modality fields and learn to put them into an existing perceptual category, or into a new category if they do not belong to any existing one.

Fig. 1. Adaptive Resonance Theory (ART) network.
A network that has learned to classify fruits using their color, taste, and shape would classify an input vector representing red, sweet, and round as an apple, or yellow, sweet, and cylindrical as a banana. Conversely, when asked what an apple is, the network can readily describe it as red, sweet and round.

Our network is based partly on ART, with some additional processes that we believe facilitate analogy-making. We utilize ART because it is one of the few neural network architectures today that explicitly attempt to model all three basic cognitive processes: sensation, perception, and attention. ART and its extensions are also better equipped than other networks to function in a realistic environment, as they make minimal assumptions about their inputs and require minimal preprocessing. Their input patterns can be either spatial, with different intensities and orientations, or temporal, with asynchronous elements having different durations.

ART (Fig. 1) consists of two kinds of nodes in two different layers: one representing sensory features such as red, yellow, sweet, round, and cylindrical in the feature layer, and the other depicting classes of perceptual objects such as apple and banana in the category layer. Nodes in the category layer hold together their corresponding features through self-organized, learnable reciprocal connections (also referred to as weights) between the two layers.

In ART, as in most other neural networks, short-term and long-term memory involve different methods of pattern storage. Short-term memory is achieved by transiently increasing the activation of certain nodes in the feature layer on receiving the input. Bottom–up signals from the feature layer in turn transiently activate nodes at the category layer. These category nodes then compete via recurrent lateral inhibition, and the input is tentatively interpreted as being in the category coded by the node with the largest activation. Other processes, not essential to the current discussion, are then utilized to compare the input with a previously stored category prototype to see if there is sufficient match to make this a permanent classification. Long-term memory is achieved via changes in connections. A pattern is learned, or stored in long-term memory, by gradually adjusting the strengths of the mutual connections between feature and category layers over repeated exposures to various exemplars. Pattern formation takes place during each presentation of every exemplar, according to an associative learning rule that detects the co-occurrence of specific pattern elements in terms of presence and absence, or different proportions, of these elements, and gains strength with the number of repetitions. Patterns are considered classified when input through bottom–up (feature-to-category) connections resonates with top–down (category-to-feature) expectancies. For example, after substantial exposure to various kinds of apples, the network would unequivocally activate the node for apple on receiving sensation in the feature nodes "red," "sweet," and "round." Conversely, on exciting the apple node it would activate the corresponding features in anticipation of what the input should be at the feature layer. If the input does not resonate with the top–down expectations, a mismatch occurs, which in turn triggers either the search for a new category node or the modification of an existing category through further learning.
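As a concrete illustration of the resonance-and-mismatch cycle just described, here is a minimal sketch of an ART-style classification step. It is a simplification under stated assumptions (binary feature vectors, a simple overlap choice function, and a vigilance-style match rule), not the authors' implementation.

import numpy as np

# Minimal ART-1-style classification sketch (binary features). Assumptions:
# overlap choice function, vigilance test |I & w| / |I| >= rho, and fast
# learning by intersection. Illustrates resonance vs. mismatch only.
def art_classify(inputs, rho=0.7):
    categories = []                      # learned prototype vectors (F2 nodes)
    for I in inputs:
        I = np.asarray(I, dtype=bool)
        # Bottom-up: order category nodes by overlap with the input (competition).
        order = sorted(range(len(categories)),
                       key=lambda j: -(categories[j] & I).sum())
        for j in order:
            match = (categories[j] & I).sum() / max(I.sum(), 1)
            if match >= rho:             # top-down expectation resonates
                categories[j] &= I       # refine the prototype (learning)
                break
            # otherwise: mismatch -> reset this node, search the next one
        else:
            categories.append(I.copy())  # no resonance anywhere: new category
    return categories

# Features: [red, yellow, sweet, round, cylindrical]
apples  = [[1, 0, 1, 1, 0], [1, 0, 1, 1, 0]]
bananas = [[0, 1, 1, 0, 1]]
print(len(art_classify(apples + bananas)))  # 2 categories: apple-like, banana-like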
There have been several noteworthy attempts at extending ART to model higher-level cognition. For example, Nigrin's (1993) model called SONNET simulates learning of synonyms, while Kant's (1995) model called Categ_ART and Kant and Levine's (1998) model called RALF simulate rule formation. Yet analogy-making relies on a variety of cognitive processes that are not captured by these recent extensions of ART: the processes involved in discerning and mapping relationships and transitions between pairs of concepts. Hence our network, described in the next section, combines the basic building blocks of ART (including associative learning and lateral inhibition) with additional neural structures that are specifically designed to model these types of inter-concept relationships.

4. Network architecture

Now consider proportional analogies, which are of the form A:B :: C:D, where A, B, C, and D can represent any concepts. Typically this kind of analogy is posed as the question: A is to B as C is to what? The answer, D, is arrived at by applying or transferring the relationship defined over a set of attributes of A and B to the relevant attributes of C.

The first step in producing the right answer to an analogy question is finding and remembering the relationship defined over a set of attributes of A and B. Rephrased, this means finding what transformed A into B. The second step in analogy-making is applying or transferring the relationship to the relevant attributes of C. This can be rephrased as asking what the outcome would be if the same transformation (the one that took A to B) were applied to C. A more detailed high-level description, including other steps in the proportional analogy-making task, is given in Fig. 2.

Fig. 2. Proportional analogy-making: task description.

The network proposed here accomplishes these steps in analogy-making by superimposing on the vector classification structure of ART some additional structure that incorporates relationships between the attribute vectors that represent concepts. A description of this additional structure is now given.

4.1. Building blocks

The analogy-making neural network developed here is based on the recurrent use of a few fundamental "building blocks." This design philosophy in cognitive and neural modeling is also described in other books and articles (e.g. Grossberg, 1982; Hestenes, 1998; Levine, 1991; Nigrin, 1993). The first two building blocks are competitive–cooperative interactions (also known as lateral inhibition) and associative learning. These are already in extensive use in modeling and are based on well-established psychological and neurophysiological findings.
The rest of the blocks—relation learning, transition encoding, and rule application—are proposed here for the purposes of the analogy-making network, but could also potentially explain some other cognitive and neural phenomena. These are discussed in more detail, with generic equations, in the next few subsections. The complete detailed analogy-making network equations are given in Appendix A. Table 1 summarizes the utility of the different building blocks in modeling different cognitive phenomena. It lists simpler, more basic cognitive functions at the bottom and climbs to more complex functions such as analogy-making at the top.

Table 1
Cognitive functions and building blocks

Cognitive function: Analogy-making and semantic memory (relation learning, transformation rule learning in WM, and rule application)
Building blocks: Synaptic/conceptual triads; non-Hebbian connections; contextual modulation of weights; weight transport

Cognitive function: Perception (object recognition in STM, applying categorization in LTM, perceptual binding)
Building blocks: Competitive–cooperative networks; associative (Hebbian) learning; adaptive resonance; opponent processing

Cognitive function: Sensation and attention (feature and boundary detection, figure–ground separation in STM)
Building blocks: Competitive–cooperative networks

4.2. Nodes, clusters, and layers

The analogy-making network is organized in five layers: feature layer F1, category (or perceptual binding, or HAS-A) layer F2, abstract category (generalizations, or IS-A) layer F3, relation layer F4, and modulator (context or task-specific) layer F5. Each layer in turn is organized in clusters, and each cluster consists of individual nodes representing concepts or percepts of similar significance. The characteristic structure of each layer is illustrated in Fig. 3. The first two layers are carried over from standard ART networks; the other three layers are additions made here and are not present in ART.

The sensory feature layer F1 is divided into clusters of nodes that detect specific classes of features. For example, each node in the form cluster represents a distinct shape such as round, cylindrical, or square. Each node in the color cluster represents a distinct color such as red, yellow, or orange. Each node in the taste cluster represents a distinct taste such as sweet, sour, or tart. Each node in the word cluster represents a distinct label such as "square-word" or "apple-word."

The category layer F2 is divided into clusters of nodes that bind together separate attributes of coherent recognizable objects. This includes a cluster for fruits (binding form, color, taste, and word) and one for geometric figures (binding just form and word).

The abstract category layer F3 consists of nodes representing abstract generalizations such as color, taste, shape, and fruit. Clusters represent types of edibles (fruit or vegetable) or of senses (color, form, taste, or word). Our network reflects the fact that the nature of the relations between layers F2 and F3 (IS-A) is not the same as between layers F1 and F2 (HAS-A). For example, while it is appropriate to think of an apple as made up of "red," "sweet," and "round," it is not appropriate to think of a fruit as made up of "apple," "banana," and "orange."

The relation layer F4 has one cluster that consists of relation nodes such as "has," "is," "forward" and "reverse." These nodes encode the nature of relationships between F3 and F2 and between F2 and F1.
For example, the connection between "apple" and "fruit" is mediated by the node "is," and the connection between "apple" and "red" by the node "has." (In ART, by contrast, all relations between nodes in consecutive layers are implicitly of the HAS-A kind.) Another cluster in the F4 layer consists of relation nodes representing generic transition categories: "activate," "suppress," "maintain," and "change." These nodes describe one role for working memory in proportional analogy-making, namely, to temporarily remember the changes in specific features in going from one input pattern to the next. For example, in the transition from apple to banana, "yellow" is activated, "red" is suppressed, "fruit" and "color" are maintained, and "red" is changed to "yellow."

Finally, the modulator layer F5 has one cluster that consists of nodes representing transition stages in the analogy task. For example, the node that represents the transition "1–2" between the first and second items modulates working memory connections between attributes of item 1 and corresponding attributes of item 2. Another cluster of F5 consists of nodes that encode other types of markers for particular contexts within the task, which also modulate specific inter-item connections. An example is the marker for a situation that involves weight transport (see below).

Fig. 3. Types of nodes, clusters and layers: (A) Sensory feature layer F1, and clusters. (B) Category or binding layer F2, and clusters. (C) Abstract category layer F3, and clusters. (D) Relation layer F4, and clusters. (E) Modulator (context or task-specific) layer F5, and clusters. Input to the feature detector nodes in layer F1 represents an external pattern, whereas direct input to nodes in other layers represents an internally generated pattern. All layers allow working and long-term memory modifiable weights within and across clusters. Short-term memory interactions are allowed only within individual clusters, except in specific cases, for example between the form and word clusters in F1.

Input patterns: As shown in Fig. 4, input pulses to the network are represented as square blocks (indicating the duration for which a particular item is presented to the network). Combined with exponential decay, they become "hat-shaped" pulse activities in the nodes. The duration of each input pulse (on time) and the length of an individual presentation step are defined as in the bottom part of Fig. 4. For simplicity, all input pulses are assumed to be of the same duration. The following sections rely on caricatures such as this one to explain the time courses of several variables on one diagram. These are idealized depictions intended to help the reader better understand the concepts discussed here, and should not be interpreted as actual network outputs.

Fig. 4. Input pulses and corresponding node activation.
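The pulse-plus-decay activity just described can be sketched in a few lines; this is an illustrative reconstruction (the time constant, time step, and pulse timing are assumptions, not values from the paper).

import numpy as np

# Sketch of a node activity trace: a square input pulse combined with
# exponential decay yields the "hat-shaped" activity described above.
dt, tau = 0.01, 0.2            # integration step and decay time constant (assumed)
t = np.arange(0.0, 2.0, dt)
pulse = ((t >= 0.5) & (t < 1.0)).astype(float)   # input on for 0.5 <= t < 1.0

x = np.zeros_like(t)
for k in range(1, len(t)):
    # dx/dt = -x / tau + input: activity rises while the pulse is on, decays after.
    x[k] = x[k-1] + dt * (-x[k-1] / tau + pulse[k-1])

print(round(x[np.searchsorted(t, 1.0)], 3))  # near its plateau at pulse offset
print(round(x[-1], 3))                       # decayed back toward zero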
4.3. Node interactions (weights)

The analogy-making network requires not only short-term and long-term memory as modeled in ART-like networks, but also semantic spatio-temporal pattern processing in both working and long-term memory. The tree diagrams in Fig. 5 summarize all the different types of weights used in the analogy-making network. Non-specific working (Fig. 5C) and long-term (Fig. 5B) memory connections are either spatial (that is, between node activities at the same time) or temporal (that is, between node activities at different times). Short-term memory (Fig. 5A) connections are only spatial, whereas task-specific working memory (Fig. 5D) connections are only temporal. Spatial connections are node-to-node and obey Hebbian associative learning laws. Temporal connections are either triadic (weight-to-node/node-to-weight) or dyadic (node-to-node). Triadic connections follow triadic learning laws (explained below). Dyadic connections follow various learning laws that are non-Hebbian in that they do not strictly detect simultaneity in node activities. Klopf (1988) and Kosko (1986) have proposed similar non-Hebbian laws in their models. Decay rates run from highest to lowest in going from short-term to working to long-term memory. Working memory connections use both passive and active decay, whereas long-term memory uses only active decay. Specifically, working memory triadic weights follow additive learning laws: they gain strength during the presentation of an item but passively decay as soon as it is removed. Minai and Levy (1993) have studied similarly rapid single-trial generalization in the hippocampus. Interactions between the weights (described in Section 4.4) are summarized along with the types of weights in Fig. 5E.

Fig. 5. Types of node connections: (A) STM interactions. (B) LTM weights. (C) WM weights. (D) WM task-specific modulator weights. (E) Weight interactions.

4.3.1. Triadic temporal weights and relation learning

Concept triplets such as "apple, has, red" or "red, is, color," along with their corresponding connections, are described here as conceptual triads. Conceptual triads are useful in learning the nature of relations. These structures are similar in spirit to the synaptic triads used in modeling the learning of temporal sequences in bird song by Dehaene et al. (1987).

Fig. 6 illustrates a possible use of triads in modeling long-term semantic memory. Clockwise from the top-left quadrant, these triads can be read as "apple has red," "red is a color," "color has instance red," and "red is part of apple." Alternatively, they can be read as responses to the following queries: (1) what color does an apple have? (red); (2) what is red? (a color); (3) what is the relation between red and color? (is); (4) what is the relation between apple and red? (has).

Fig. 6. Example of relation nodes.

Motivation for conceptual triads: To understand the specific connections and learning within a triad, consider the triad "red, is, color." The first pair, "red and is," can be thought of as the query "red is ?" The network is expected to produce "color" as the answer in this case. The second pair, "red and color," can be thought of as being asked "how is red related to color?" In this case "red" followed by "color" should produce the answer "is." The third pair, "is and color," is not uniquely relevant to this triad, in that "is" followed by "color" should not produce "red," except in a specific context. The relevant relationships are temporal and directional. That is, simultaneous occurrence of "red" and "color" should not produce "is," and "color" followed by "is" should not produce "red."

Conceptual triads in our network follow a semilocal associative rule that facilitates learning the behavior just described. As shown in Fig. 7, a triad consists of three nodes and three connections (node-to-node, node-to-weight, and weight-to-node).
The first three subsections to follow explain the behavior of each component in a partial triad (that is, a triad without the weight-to-node connection). The last two subsections explain another partial triad (that is, a triad without the node-to-weight connection). In the following discussion, the general roles attributed to the nodes B, A, and C (refer to Fig. 7) are, respectively, source, relation and target. The three connections w_{BC}, w_{A,BC}, and w_{BC,A} respectively depict the weights source-to-target, relation-to-{source-to-target}, and {source-to-target}-to-relation. The latter two weights obey triadic learning rules.

Node-to-weight connection: The first step in understanding the triadic construction is to see how the node-to-weight connection is created. Imagine an associative connection w_{AB'} between nodes A and B' as shown in Fig. 7A. The rate of change of w_{AB'} is proportional to the product of the activities of A and B'. Now substitute for the activation of node B' the activation of the entire associative node-to-node assembly consisting of nodes B and C and weight w_{BC}, as shown in the dotted bubble. This substitution is defined as the product of the activities of B and C and the weight w_{BC}. One way to think of this definition is as the "energy level" of the entire node-to-node assembly. (This sort of substitution can be carried out recursively to build a network of triads, which leads to representation of relations of relations.) The weight w_{AB'} is renamed w_{A,BC} and increases in the presence of A, B, C and w_{BC}, but not otherwise:

dw_{AB'}/dt \propto x_A x_{B'}, where x_{B'} \equiv w_{BC} x_B x_C and w_{AB'} \equiv w_{A,BC}    (1)

dw_{A,BC}/dt \propto x_A w_{BC} x_B x_C    (2)

This type of learnable connection from one node to a weight between two nodes is rarely seen in neural network models, because there does not seem to be a biological basis for such a connection at the neuronal level. However, we can suggest a more biologically plausible mechanism whose mathematical dynamics approximate those of Eq. (2) for our node-to-weight connection. As shown in Fig. 8, our suggested mechanism involves adding to the network of nodes A, B, and C an axon collateral to an interneuron and some gating interactions between nodes. Guigon, Dorizzi, Burnod and Schultz (1995) included in their model of sequence learning in the prefrontal cortex some matching neurons (we rename them matching nodes) that combine inputs from two sources via multiplicative gating. Guigon et al. review evidence for such multiplicative combinations occurring in various higher-order sensory and motor areas of cortex, such as multiplication between arm position and visual trajectory in the motor and premotor cortex (Burnod, Grandguillaume, Otto, Ferraina, Johnson & Caminiti, 1992). As explained in the caption of Fig. 8, the node x_{M2} has dynamics that approximate those of the node-to-weight connection. The weight-to-node connection (see later in this section), weight transport (see Section 4.4), and inter-weight competition (also Section 4.4) can also be approximated using suitable combinations of collateral pathways and matching nodes. For ease of exposition, though, we use the shorthand of representing such complex networks by direct interactions between weights.

As discussed earlier, a triad typically involves a temporal connection from node B to node C; specifically, the activity in node B is at a previous time. Substituting "red" for B, "is" for A and "color" for C, the node-to-weight connection w_{A,BC} represents the event "red" followed by "is" and "color." w_{A,BC} gains strength only when this precise event is repeated.
That is, it will not gain strength when "red" is followed by either "color" or "is" alone. Nor will it strengthen when both "is" and "color" are simultaneously present but without "red" in the previous time step.

Node-to-node connection: To understand the behavior of the node-to-node connection in a triad, imagine an associative connection w_{BC} between nodes B and C as shown in Fig. 7B. Note in particular the placement of node A' beside the weight w_{BC}. The rate of change of w_{BC} is in this case proportional to the product of the activities of B and C plus the activity of A'. Now replace the activity of node A' with the activity of the assembly consisting of node A and weight w_{A,BC}, as shown in Fig. 7B in the larger dotted bubble on the right, defined as the product of the node activation x_A and the weight w_{A,BC}. The assembly activation can again be thought of as representing the assembly's "energy level." The weight w_{BC} can now be seen as receiving a contribution from node A via w_{A,BC}, as well as from the simultaneous activities in B and C:

dw_{BC}/dt \propto x_B x_C + x_{A'}, where x_{A'} \equiv w_{A,BC} x_A    (3)

dw_{BC}/dt \propto x_B x_C + w_{A,BC} x_A    (4)

The additive contribution of the assembly of node A and weight w_{A,BC} is important in working memory encoding. When the node-to-node connection w_{BC} follows an additive learning law while the node-to-weight connection w_{A,BC} observes a multiplicative learning law, w_{BC} is learnt and forgotten as soon as the inputs to B and C are removed, whereas w_{A,BC} is remembered for a longer time. The utility of this becomes clear in the following working memory scenario: replace B with "red" at the previous time step, A with "1–2," and C with "yellow." The function of "1–2" (A) and its triadic connection w_{A,BC} is to remember for future use what happened in moving from item 1—"red" (B) at the previous time step—to item 2—"yellow" (C)—while the actual connection from "red" to "yellow" (w_{BC}) is forgotten after "yellow" is removed. Suppose that the triadic weight from "1–2" has learnt this transition and at some later time the node "1–2" becomes active again. At this point the node "1–2" starts contributing to the weight "red-to-yellow" via its triadic connection, helping that weight gain strength even when neither "red" nor "yellow" is present. The effect of this is to prepare the weight "red-to-yellow" in anticipation of "red" becoming active soon. If that happens, the weight "red-to-yellow" will immediately follow "red" up with "yellow." In doing so the triad would have successfully reproduced at a later time what it had experienced previously in going from item 1 to item 2.

Fig. 7. Conceptual triads: (A) Node-to-weight connection, w_{A,BC}. (B) Node-to-node connection, w_{BC}. (C) To-node activation, x_C. (D) Weight-to-node connection, w_{BC,A}. (E) Relation node activation, x_A. (F) Summary.

Fig. 8. One possible way to approximate the effects of a node-to-weight connection by the activity of a node, x_{M2} in the figure. x_{C'} is the terminus of an axon collateral and has activity proportional to x_B times the weight w_{BC}. x_{M1} is a matching node (see text) that multiplicatively gates inputs from x_{C'} and x_A. x_{M2} is another matching node that in turn gates inputs from x_{M1} and x_C; hence its activity is proportional to x_{C'} x_{M1} \propto x_{C'} x_A x_C \propto x_A w_{BC} x_B x_C (with suitable time delays), as in Eq. (2) of the text. This node modulates the weight w_{BC}.

To-node activation: To understand the behavior of the "to-node" activation, imagine an associative connection w'_{BC} between nodes B and C, as shown on the left in Fig. 7C. The rate of change of the node activity C is proportional to the product of w'_{BC} and the sigmoid of node activity B. Now replace the weight w'_{BC} with the assembly consisting of node A, node-to-weight weight w_{A,BC} and node-to-node weight w_{BC}, as shown in the dotted bubble in Fig. 7C, defined as the product of the weight w_{A,BC}, the sigmoid of node activity A, and the weight w_{BC}. Node C can now be seen as receiving a contribution from node B via this assembly. This means that the rate of change in node activity C is proportional to the product of the sigmoidal node activities of A and B and the weights w_{A,BC} and w_{BC}:

dx_C/dt \propto w'_{BC} f(x_B), where w'_{BC} \equiv w_{A,BC} f(x_A) w_{BC}    (5)

dx_C/dt \propto w_{A,BC} f(x_A) w_{BC} f(x_B)    (6)

Substituting "red" for B, "is" for A and "color" for C, and considering the temporal nature of the connection w_{BC} as discussed earlier, the behavior of the "to-node" just described leads to the desirable activation of "color" when "red" is followed by "is."

Weight-to-node connection: The construction of the triadic weight-to-node connection in Fig. 7D is similar to the construction of the node-to-weight connection described earlier. The behavior of this connection is the same as the other, except for the change in direction: the assembly here is presynaptic, as opposed to postsynaptic in the previous case. Learning in this weight can be interpreted as the detection of simultaneity in the activation of the assembly (the dotted bubble in Fig. 7D) and the node A. The motivation behind connections such as w_{BC,A} is to learn events such as "red" (B) followed by "color" (C) and "is" (A):

dw_{B'A}/dt \propto x_{B'} x_A, where x_{B'} \equiv w_{BC} x_B x_C and w_{B'A} \equiv w_{BC,A}    (7)

dw_{BC,A}/dt \propto w_{BC} x_B x_C x_A    (8)

Relation node activation: Activation of the relation node A is best understood by replacing B' in Fig. 7E with the assembly in the dotted bubble. Node activity A is proportional to the product of the presynaptic assembly activity and the weight w_{BC,A}. The behavior results in the activation of "is" when "red" is followed by "color":

dx_A/dt \propto w_{B'A} f(x_{B'}), where f(x_{B'}) \equiv w_{BC} f(x_B) f(x_C) and w_{B'A} \equiv w_{BC,A}    (9)

dx_A/dt \propto w_{BC,A} w_{BC} f(x_B) f(x_C)    (10)

All these relations are summarized in Fig. 7F.
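A minimal numerical sketch of the triadic laws above may help fix ideas. This is an illustrative reconstruction, not the authors' simulation code: the learning rates, sigmoid, and discrete two-step presentation scheme are all assumptions.

# Sketch of one conceptual triad ("red" -> "is" -> "color"), Eqs. (2), (8), (10).
# Learning rates and the two-step presentation scheme are assumptions.
import math

f = lambda x: 1.0 / (1.0 + math.exp(-4 * (x - 0.5)))   # sigmoid signal function

w_BC = 0.5          # source-to-target weight ("red" at t-1 to "color" at t)
w_A_BC = 0.0        # node-to-weight (relation-to-{source-to-target}), Eq. (2)
w_BC_A = 0.0        # weight-to-node ({source-to-target}-to-relation), Eq. (8)
rate = 0.5

# Repeated presentations of the event: "red" at the previous step (x_B_prev),
# then "is" (x_A) and "color" (x_C) at the current step.
for _ in range(10):
    x_B_prev, x_A, x_C = 1.0, 1.0, 1.0
    w_A_BC += rate * x_A * w_BC * x_B_prev * x_C        # Eq. (2)
    w_BC_A += rate * w_BC * x_B_prev * x_C * x_A        # Eq. (8)

# Readout 1: "red" followed by "is" should now activate "color" (Eq. (6)).
x_C_drive = w_A_BC * f(1.0) * w_BC * f(1.0)
# Readout 2: "red" followed by "color" should activate "is" (Eq. (10)).
x_A_drive = w_BC_A * w_BC * f(1.0) * f(1.0)
print(round(x_C_drive, 2), round(x_A_drive, 2))         # both well above zero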
4.3.2. Transition learning and dyadic temporal weights

The analogy network makes use of various types of dyadic (node-to-node) temporal connections, which employ non-Hebbian associative rapid learning laws that detect and compare the activation of nodes across different time steps. These laws make use of temporal integration (a moving average) to represent past activity and to accommodate asynchronous inputs. The integration operation spreads the node activities over a time window; a significant value for a temporally integrated node activity indicates that the particular node was active sometime in the recent past. The comparison of past and present is facilitated by quenching and saturating the node activities with respect to certain thresholds. The following paragraphs define the operations of quenching, saturation, inversion, temporal integration, and differential activation. We then introduce each non-Hebbian learning law that encodes a particular kind of transition: activation, suppression, maintenance, or change.

Quenching, saturation, and inversion: Quenching sets the node activity to a pre-specified value if it is below a quenching threshold; otherwise, the node activity is not changed. Saturation sets the node activity to a pre-specified value if it is above a saturation threshold; otherwise, the node activity is not changed. Inversion is defined by quenching and saturating the node activity, then subtracting the result from a pre-specified value. In Fig. 9A, the first row shows two "regular" consecutive node activity pulses. The second row shows the effects of applying both quenching and saturation to the node activity pulses in the first row. The third row shows the effects of applying inversion to the pulses in the second row.

Temporal integration: Temporal integration (a moving-window average) is shown in Fig. 9B. It is defined as a definite integral over the window starting at the current time minus the on time of the pulse minus the length of the averaging window, and ending at the current time minus the on time of the pulse. That is, the only activities being averaged are those before the start of the current pulse. This prevents the current pulse from being considered as past activity. The duration of the averaging window determines how much of the past is considered relevant. By averaging the quenched and saturated node activity, it becomes easier to compare the past activity with the present. In Fig. 9B, the first row shows two quenched and saturated consecutive node activity pulses. The second row shows the effects of applying temporal integration to the node activity pulses in the first row. The third row shows the effects of applying quenching and saturation to the pulses in the second row. The fourth row shows the effects of inversion on the pulses in the third row.

Differential activation: Differential node activation is defined as the absolute difference between the quenched, saturated and temporally integrated node activity and the quenched and saturated "regular" node activity. This essentially represents the comparison of the past node activity with the present. In Fig. 9C, the second row shows two quenched and saturated consecutive node activity pulses. The first row shows the effects of applying temporal integration, quenching and saturation to the node activity pulses in the second row. The third row shows the absolute difference between the first and second rows.

Fig. 9. (A) Quenching: \rho(x; \theta_\rho, c_\rho) = x if x \geq \theta_\rho, and c_\rho otherwise. Saturation: \sigma(x; \theta_\sigma, c_\sigma) = x if x < \theta_\sigma, and c_\sigma otherwise. Quenching and saturation: \hat{x} = \sigma(\rho(x; \theta_\rho, c_\rho); \theta_\sigma, c_\sigma), where \theta_\rho \ll \theta_\sigma and c_\rho < c_\sigma. Inversion: \tilde{x} = |\hat{x} - c_\sigma|. Here the \theta's represent thresholds and the c's represent node activity levels. (B) Temporal integration: \bar{x}(t) = (1/t_{avg}) \int_{t - t_{on} - t_{avg}}^{t - t_{on}} \hat{x}(\tau) \, d\tau, subsequently quenched and saturated, where t_{avg} > t_{on}; \tilde{\bar{x}} denotes inversion applied to \bar{x}. (C) Differential activation: \Delta x = |\bar{x} - \hat{x}|.
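The following sketch implements these five operations for a discretely sampled activity trace. The thresholds, levels, and window lengths are illustrative assumptions consistent with Fig. 9, not values from the paper.

import numpy as np

# Quenching, saturation, inversion, temporal integration, and differential
# activation for a sampled activity trace x(t). All parameters are assumptions.
THETA_Q, C_Q = 0.2, 0.0     # quenching threshold and floor level
THETA_S, C_S = 0.8, 1.0     # saturation threshold and ceiling level

def quench_saturate(x):
    """x-hat: floor small activities to C_Q, cap large ones at C_S."""
    x = np.where(x < THETA_Q, C_Q, x)
    return np.where(x >= THETA_S, C_S, x)

def invert(x_hat):
    """x-tilde: high activity maps to low and vice versa."""
    return np.abs(x_hat - C_S)

def integrate(x_hat, t_on, t_avg):
    """x-bar: average of x-hat over the window ending t_on samples ago."""
    out = np.zeros_like(x_hat)
    for t in range(len(x_hat)):
        lo, hi = max(0, t - t_on - t_avg), max(0, t - t_on)
        out[t] = x_hat[lo:hi].mean() if hi > lo else 0.0
    return quench_saturate(out)

def differential(x_bar, x_hat):
    """Delta-x: compares past (integrated) activity with the present."""
    return np.abs(x_bar - x_hat)

# Two consecutive pulses, as in the first row of Fig. 9A.
x = np.concatenate([np.zeros(10), np.ones(10), np.zeros(10), np.ones(10)])
x_hat = quench_saturate(x)
x_bar = integrate(x_hat, t_on=10, t_avg=10)
print(differential(x_bar, x_hat)[25])  # large where past and present differ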
Motivation for non-Hebbian associative rapid learning laws: Consider a network that has built-in sensors for red, yellow, round and cylindrical, has no prior experience with apples or bananas, and has learned generalizations such as color and shape (but obviously not fruit). Suppose that network is exposed for the first time ever to an apple and, after some time, to a banana. Even if the network has not learned the individual concepts of either apple or banana, it should still be able to interpret the transition from apple to banana in terms of temporal changes in individual features. That is, the network should be able to learn local individual transitions in each modality cluster, such as "red is suppressed," "yellow is activated," "color-instance is changed from red to yellow," "round is suppressed," "cylindrical is activated," and "shape-instance is changed from round to cylindrical." This kind of learning is characteristic of working memory, which includes both retrieval of semantic stores and encoding of new relations that are most pertinent to the present context (Baddeley, 1986; Banquet, Gaussier, Contreras-Vidal, Gissler, Burnod & Long, 1998).

Transition categories: The analogy-making network models four kinds of generic working memory transitions: maintenance, activation, suppression, and change. Each of these transitions is encoded in a separate working memory dyadic temporal connection. The first three transition types are self-weights; that is, they encode transitions for the same node. The last type, change, is encoded in weights between different nodes within the same layer. Relation nodes (including the four generic transition category nodes) in layer F4 mediate dyadic temporal connections through triadic learning laws. Learning in these triadic weights is enabled during every transition by activating the generic transition category nodes.

Fig. 10. Transition categories and corresponding conceptual triads: (A) Connections for sameness (maintenance). (B) Time course for sameness. (C) Connections for activation (turning on). (D) Time course for activation. (E) Connections for suppression (turning off). (F) Time course for suppression. (G) Connections for change. (H) Time course for change. Here the superscript v denotes working memory weights and vt triadic weights; the subscripts m, o, s and x denote sameness, activation, suppression and change, respectively. The time courses of learning in node-to-node temporal connections are similar but not shown here.

Maintenance: The first transition category is "maintenance," or detection of what remained the same from the previous to the current presentation. An example is detecting that "red" is on during the presentations of both "apple" and "red" in the event "apple followed by red." Fig. 10A shows a conceptual triad consisting of node activity at the previous presentation (source), node activity at present (target) and the "maintenance" node (relation). It also shows a dyadic temporal weight between the source and target nodes, which is mediated by two temporal triadic weights connecting it to the relation node. All three weights follow a delayed Hebbian learning law: the rate of change in the weights is proportional to the product of the previous node activity with the present. The weights gain strength only when a node is on during both its past and its present, but not otherwise. The formulations are based on the generic triadic formulations introduced earlier (see Figs. 9 and 10 and their captions for definitions of the following terms):

dw^{vt}_m/dt \propto \hat{x}_m w^v_m \bar{x} \hat{x}    (11)

dw^v_m/dt \propto \bar{x} \hat{x} + w^{vt}_m \hat{x}_m    (12)
In Fig. 10B, the second row shows two quenched and saturated consecutive node activity pulses. The first row shows the effects of applying temporal integration, quenching and saturation to the node activity pulses in the second row. The third row shows the quenched and saturated pulse activities in the "maintenance" relation node. The fourth row shows the pulse-like learning in the triadic connections (node-to-weight and weight-to-node), which occurs only when all three are present: past, present and "maintenance" node activity.

Activation: The second transition category is "activation," or detecting what turned on during the current presentation that was off before. An example is detecting that "circle" turns on when "red circle" follows "red square." All weights in an "activation" triad, shown in Fig. 10C, observe a delayed anti-Hebbian learning law: their rate of change is proportional to the product of the inverted past activity and the quenched and saturated current activity:

dw^{vt}_o/dt \propto \hat{x}_o w^v_o \tilde{\bar{x}} \hat{x}    (13)

dw^v_o/dt \propto \tilde{\bar{x}} \hat{x} + w^{vt}_o \hat{x}_o    (14)

In Fig. 10D, the first row shows the inverted past activity. The second row shows the present node activity. The third row shows the quenched and saturated pulse activities in the "activation" relation node. The fourth row shows the pulse-like learning in the triadic connection.

Suppression: The third transition category is "suppression," or detecting what turned off during the current presentation that was on before. An example is detecting that "round" turns off when "red" follows "apple." All weights in a "suppression" triad, shown in Fig. 10E, observe a different form of delayed anti-Hebbian learning law: their rate of change is proportional to the product of the past activity and the inverted current activity:

dw^{vt}_s/dt \propto \hat{x}_s w^v_s \bar{x} \tilde{x}    (15)

In Fig. 10F, the second row shows the inverted present activity. The first row shows the past activity. The third row shows the quenched and saturated pulse activities in the "suppression" relation node. The fourth row shows the pulse-like learning in the triadic connection. Note that learning in the triadic weights occurs only in the presence of all three activities: past, inverted present, and "suppression."

Change: The fourth transition category is "change," or detecting a change of instance within the same category. An example is detecting that color changes from "red" to "yellow" when "banana" follows "apple." All weights in a "change" triad, shown in Fig. 10G, observe a differential Hebbian learning law: their rate of change is proportional to the product of the differential activities of two different nodes. Further, the differential activity of the source node is multiplied by its past activity, while the differential activity of the target is multiplied by its present activity. This ensures that while the activity in the source node is turning off, the activity in the target is turning on:

dw^{vt}_x/dt \propto \hat{x}_x w^v_x (\bar{x}_A \Delta x_A)(\hat{x}_B \Delta x_B)    (16)

dw^v_x/dt \propto (\bar{x}_A \Delta x_A)(\hat{x}_B \Delta x_B) + w^{vt}_x \hat{x}_x    (17)

In Fig. 10H, the first three rows respectively show source node A's past, present, and differential activities. The next three rows show similar activities for target node B. The seventh row shows the "change" relation node activity. The last row shows the pulse-like learning in the triadic connection. Learning in the triadic weights happens only when there is activity in the "change" relation node.
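To see how the four transition categories pick out different events, here is a schematic sketch of the cores of Eqs. (11)–(17) operating on one apple-to-banana step. It is an illustrative reconstruction: the single time step, unit rates, and binary activities are all assumptions.

# Schematic sketch of the four transition detectors (Eqs. (11)-(17)) for one
# apple -> banana step. Binary activities and unit rates are assumptions.
features = ["red", "yellow", "round", "cylindrical"]
past    = {"red": 1.0, "yellow": 0.0, "round": 1.0, "cylindrical": 0.0}  # x-bar
present = {"red": 0.0, "yellow": 1.0, "round": 0.0, "cylindrical": 1.0}  # x-hat
inv = lambda v: 1.0 - v                      # inversion for binary activities
diff = lambda f: abs(past[f] - present[f])   # differential activation

maintain = {f: past[f] * present[f] for f in features}        # Eq. (12) core
activate = {f: inv(past[f]) * present[f] for f in features}   # Eq. (14) core
suppress = {f: past[f] * inv(present[f]) for f in features}   # Eq. (15) core
# "Change" weights link a source to a target in the same modality cluster,
# gating each differential by past (source) or present (target): Eq. (17) core.
change = {(a, b): past[a] * diff(a) * present[b] * diff(b)
          for a, b in [("red", "yellow"), ("round", "cylindrical")]}

print({f: v for f, v in activate.items() if v})   # yellow, cylindrical turn on
print({f: v for f, v in suppress.items() if v})   # red, round turn off
print(change)                                     # red->yellow, round->cylindrical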
Levine / Neural Networks 13 (2000) 149–183 The next three rows show similar activities for target node B. The seventh row shows the “change” relation node activity. The last row shows the pulse-like learning in the triadic connection. Learning in the triadic weights happen only when there is activity in the “change” relation node. weight interactions: weight transport and competition between weights. Competition in turn takes two forms: a non-specific kind between three self-weights and a definite kind in which “change” weight competes with “activation” and “suppression” self-weights. 4.3.3. Task specific rule learning and application Generic transition rules are remembered in working memory weights only for the duration of the current item. That is, they capture the transition from the previous to the current item only while the current item is present. Task-specific nodes such as “1–2” and “3–4” prevent forgetting of the transition after items are removed, and still allow single trial transition rule learning. These nodes and their connections remember item to item transitions beyond single steps. For example, node “1–2” remembers the transition from item 1 to 2, even after item 2 is removed. As illustrated in Fig. 11A, task-specific nodes have triadic weights to all working memory temporal connections which include the triadic weights to relation nodes such as “maintain,” “activate,” and so on. These weights obey a multiplicative learning law with an active decay. That is, even after the working memory weights are forgotten, these triadic weights continue to remember particular transition rules to be applied during a later time. Although these weights learn in a similar manner to long-term memory weights, they unlearn more rapidly by using higher decay rates. That is, at a later time when there is activity in “1–2” but no corresponding activity in the original target assembly, triadic weights from “1–2” quickly forget earlier associations and learn newer ones. This makes it possible for a network not to be entrenched in the past. Input to these nodes is task-specific and only during specific item presentations. For example, node “1–2” receives input when item 2 is presented and item 4 is expected, but not any other time. In contrast, node “3–4” receives input only in the latter case. The triadic weight w1–2,{A,BC} from “1–2” to another working memory triadic weight wA,BC is shown in Fig. 11A. Learning in this weight is only enabled when “1–2” is on but not “3–4.” That is, when both “1–2” and “3–4” are on, this weight only facilitates rule application but no learning: 4.4.1. Weight transport Weight transport is suggested here as a mechanism for the effect analogical transfer in analogy-making. That is, in some analogies the transition from item 1 to item 2 be not applied literally to item 3 but generalized. Weight transport is controversial in neural network research because of the difficulty involved in implementing such a mechanism in networks in a biologically plausible manner. However, like the node-to-weight and weight-tonode connections discussed in Section 4.3, weight transport might be approximated through suitable local networks that include “matching nodes” that multiplicatively gate inputs from different sources (Guigon et al., 1995; Levine, 1996; see Fig. 8). Also this kind of weight interaction is distantly analogous to what occurs in back-propagation networks (Rumelhart & McClelland, 1986). 
4.4. Types of weight interactions

The analogy-making network makes use of two types of weight interactions: weight transport and competition between weights. Competition in turn takes two forms: a non-specific kind among the three self-weights, and a specific kind in which the "change" weight competes with the "activation" and "suppression" self-weights.

4.4.1. Weight transport

Weight transport is suggested here as a mechanism for the effect of analogical transfer in analogy-making. That is, in some analogies the transition from item 1 to item 2 must not be applied literally to item 3 but generalized. Weight transport is controversial in neural network research because of the difficulty involved in implementing such a mechanism in networks in a biologically plausible manner. However, like the node-to-weight and weight-to-node connections discussed in Section 4.3, weight transport might be approximated through suitable local networks that include "matching nodes" that multiplicatively gate inputs from different sources (Guigon et al., 1995; Levine, 1996; see Fig. 8). This kind of weight interaction is also distantly analogous to what occurs in back-propagation networks (Rumelhart & McClelland, 1986). Levine (1996) has speculated about the utility of weight transport in explaining the role of the prefrontal cortex in drawing inferences about relationships among abstract concepts.

Motivation for weight transport: To understand why weight transport is needed in analogy-making and how it works, consider the typical learning and recognition episodes between parents and verbal children. During one episode the parent shows the child a set of objects and, while pointing at an individual object, utters the word describing that object. For example, on pointing to an apple the parent utters the word "apple." The child, as expected, imitates the parent by repeating the same word after the parent. The underlying rule here is "verbalize the object descriptor." Now consider a later episode in which the parent shows the same set of objects but utters words corresponding to the objects' colors. For example, on showing an apple, the parent utters the word "red." As before, the child imitates the parent and says the word "red." Following this, the parent points to a banana. Suppose that at this moment the child spontaneously generates the response word "yellow" (even before the parent utters it), and not the word "banana." That is, the child has figured out the new rule for the current episode without any further repetitions: "verbalize that object's color." Such episodes are common even in preverbal children and may occur spontaneously without intentional parental interaction or supervision. This phenomenon is known as "deferred imitation" in developmental psychology (Mandler, 1990).

To understand how such rapid rule generalization and application becomes possible, break the "verbalize the object's color" rule in two: first "find the object's color" and then "verbalize it." The first rule, finding the color, can be reinterpreted as "maintaining the same color" from the time the object is shown till the time its color is actually found. In the case of "apple" followed by the word "red," this can be stated as "maintaining (or keeping) color red alive" at least prior to the verbalization. Now consider the application of this rule, which is a "transport" to other color instances, for example to yellow in the case of the banana. That is, the maintenance of red has spread across the color cluster, resulting in the maintenance of any color instance that is experienced during subsequent presentations.

Fig. 12. (A) Weight transport. (B) Time course for weight transport. Here $\hat{x}_{z}$ represents activity in the task-specific weight-transport node, which is turned on during 3–4 transitions and kept off otherwise. That is, this node enables weight transport when item 4 is expected. Weight transport is restricted to temporal working memory self-weights, specifically only to "maintenance" and "suppression" weights.

Weight transport mechanism: Weight transport is implemented in our network by transporting one weight to another, that is, by quickly "pulling" all analogous weights to a common value under task-specific modulation.
Weight transport is restricted to temporal working memory self-weights, as shown in Fig. 12; specifically, only to "maintenance" and "suppression" weights. This is because transporting "activation" to all nodes within a cluster would light up the entire cluster. Working memory connectivity dictates that a "maintenance" weight is transported only to other "maintenance" weights, and a "suppression" weight only to other "suppression" weights.

To understand the weight transport formulation, suppose that all weights are initially zero except one, and consider the fixed-point asymptotic behavior, that is, the steady state in which the rate of change of each individual weight is zero. Under Eq. (19), a weight moves toward the average of the rest of the weights only if it is less than that average; otherwise it remains unchanged. Hence weights with zero initial value quickly approach the non-zero weight, achieving the stated intent of weight transport, which is to "pull" all weights to the same value as the non-zero weight:

\[ \frac{dw^{v}_{i}}{dt} \propto \hat{x}_{z}\left[\frac{\sum_{j \neq i}^{N} w^{v}_{j}}{N-1} - w^{v}_{i}\right]^{+} \tag{19} \]

In Fig. 12B, the first row shows quenched and saturated pulse activities in the "1–2" node. The next row shows similar activity in the task-specific "transport" node. Input to this modulation node enables weight transport when item 4 is expected. The third row shows pulse-like learning in the ith working memory weight (either "maintenance" or "suppression"). The fourth row shows "transport" from the ith to the jth working memory weight.

4.4.2. Competition between weights

An important feature of working memory temporal weights is the competition between them. Other researchers (e.g. von der Malsburg, 1973; Nigrin, 1993) have used similar constructs. Specific competition between "change" and "activation" weights and between "change" and "suppression" weights is examined here (see Fig. 13):

\[ \frac{dw^{vs}}{dt} \propto -\,w^{vx} \tag{20} \]

\[ \frac{dw^{vo}}{dt} \propto -\,w^{vx} \tag{21} \]

Motivation for competition between weights: Observe that a "change" is always accompanied by a corresponding "suppression" in the source activity and an "activation" in the target. For example, the change "red to yellow" in steps 1 and 2 is accompanied by suppression of red and activation of yellow. If all three transition weights were learned, this could lead to undesirable effects; for example, it could lead to absurd analogies such as "apple is to banana as square is to yellow-square" or "apple is to banana as red-square is to square." To prevent such occurrences, we introduce competition between change weights and activation or suppression weights. That is, the rate of change in the node-to-node temporal weights representing "suppression" and "activation" is negatively proportional to the node-to-node temporal weight representing "change."

Note that a node can serve as source to many "change" weights within its modality cluster; that is, a node can potentially "change" to any other node in the cluster. Conversely, a node can also serve as target of multiple "change" weights. The connectivity in working memory dictates that inhibition from a "change" weight is applied only to the source's "suppression" and the target's "activation," but to no other temporal weights.

Fig. 13. Specific competition between weights: the change weight $w^{vx}$ inhibits the activation $w^{vo}$ and suppression $w^{vs}$ weights.
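Both interactions reduce to a few lines in a numerical sketch (ours, using simple Euler steps and illustrative names; in the network they run as coupled differential equations):

```python
import numpy as np

# Sketch of the two weight interactions: transport, Eq. (19), pulls each
# "maintenance" (or "suppression") self-weight up toward the mean of the
# others in its cluster; competition, Eqs. (20)-(21), lets a learned
# "change" weight erode the matching "activation"/"suppression" weights.

def transport_step(w, x_z=1.0, rate=0.5):
    """One Euler step of Eq. (19) over a cluster's self-weights w."""
    w = w.copy()
    n = len(w)
    for i in range(n):
        others = (w.sum() - w[i]) / (n - 1)            # mean of the others
        w[i] += rate * x_z * max(others - w[i], 0.0)   # rectified: only pull up
    return w

def compete_step(w_act, w_sup, w_change, rate=0.5):
    """Eqs. (20)-(21): the change weight inhibits activation/suppression."""
    return (max(w_act - rate * w_change, 0.0),
            max(w_sup - rate * w_change, 0.0))

# color cluster: "maintain red" was learned; transport spreads it to yellow
w_maintain = np.array([1.0, 0.0])                      # [red, yellow]
print(transport_step(w_maintain))                      # [1.0, 0.5]: yellow rises

# a square->circle change weight erodes the accompanying self-weights
print(compete_step(w_act=0.2, w_sup=0.2, w_change=1.0))
```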
4.5. Intralayer and interlayer connectivity

Figs. 14 and 15 summarize the types of connections both within and between layers. The competitive and cooperative interactions in short-term memory are strictly spatial and intralayer. The connections in working and long-term memory are both spatial and temporal (i.e. time-delayed), and both interlayer and intralayer. Fig. 14A shows interlayer working and long-term spatial connectivity. Note that there are no learnable spatial connections between other layers and F5, or between layers F1 and F4. As shown in Fig. 14B, node-to-node LTM temporal connections are mediated by triadic weights from layer F4. Similar connections in WM, shown in Fig. 14C, are mediated by triadic weights from the generic transition categories in layer F4 and from the task-specific nodes in layer F5. Fig. 15 gives examples of three kinds of spatial and temporal connections.

Fig. 14. Types of layer connectivity: (A) Interlayer spatial LTM and WM connectivity. (B) Interlayer temporal LTM connectivity. (C) Interlayer temporal WM connectivity. Here $l$ denotes long-term memory weights.

Fig. 15. Examples of network connectivity: (A) Long-term memory weights. (B) Working memory weights. (C) Task-specific modulator weights. Connections within one time-step are spatial, while those across time-steps are temporal. Temporal weights are mediated by triadic weights from relation nodes (for example, IS-A or Change) or task-specific nodes (for example, "1–2" or "3–4"). Here $\overline{mqj}$ represents a node at the previous time-step, and { } represents any weight.

5. Simulations

This section presents the results of three analogy-making experiments. Simulations of the network equations (given in Appendix A) were carried out in MATLAB using its ode45 function. This subroutine numerically solves a system of ordinary differential equations based on an explicit Runge–Kutta (4, 5) formula (Dormand & Prince, 1980). It is a "one-step" solver; that is, to compute the solution at the current time it needs only the solution at the immediately preceding time point.

The primary objective of the simulations here is to show that the network can solve proportional analogies. Accordingly, the simulations do not individually demonstrate either ART-like long-term learning of perceptual objects such as "apple" and "banana," or triadic long-term learning of generalizations such as "color" and "fruit" and relations such as "has" and "is." A corollary of this is that the long-term memory connections are assumed to exist a priori, and their values are "caricatures" of what the "real" weights would be if they had been acquired in real time using ART-like and triadic learning. The initial values of all long-term memory weights here are set to 1. (It should be noted that triadic learning in working memory is still demonstrated, for example in generalizing "apple has red" to "fruit has color" in the analogy "apple:red < banana:?.")

The analogy-making task in each simulation is carried out in two different sessions: rule learning and application. The first session includes the presentations of items 1 and 2, while the second session includes the presentation of item 3 followed by an expectation of item 4, which is typically the answer of a proportional analogy.
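For readers who want to reproduce the setup, MATLAB's ode45 corresponds to the "RK45" method of SciPy's solve_ivp. The sketch below (ours; the names and the bare node equation are simplifications) encodes the four-step, two-session input schedule using the pulse construction of Eqs. (23)–(25) in Appendix A, for the SCRY experiment described in Section 5.1 below:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Minimal stand-in (ours) for the simulation setup: each item is a pulse
# of width t_on at the start of a step of width t_step, per Appendix A.
# Network interactions are omitted; only decay plus shunted input remains.

T_STEP, T_ON = 25.0, 1.0
# columns = steps 1..4 (item1, item2 | item3, expected item4);
# rows = [red, yellow, square, circle] for the SCRY experiment
schedule = np.array([[1.0, 1.0, 0.0, 0.0],    # red: red-square, red-circle
                     [0.0, 0.0, 1.0, 0.0],    # yellow: yellow-square
                     [1.0, 0.0, 1.0, 0.0],    # square
                     [0.0, 1.0, 0.0, 0.0]])   # circle

def inputs(t):
    k = min(int(t // T_STEP), schedule.shape[1] - 1)   # step(t), Eq. (23)
    on = 1.0 if (t - k * T_STEP) < T_ON else 0.0       # onflag(t), Eq. (24)
    return on * schedule[:, k]                         # I(t), Eq. (25)

def node_dynamics(t, x, A=1.5, B=1.0):
    return -A * x + (B - x) * inputs(t)

sol = solve_ivp(node_dynamics, (0.0, 4 * T_STEP), np.zeros(4),
                method="RK45", max_step=0.1)   # small steps so pulses are seen
print(sol.y[:, -1])
```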
The working memory rules learnt during the first session are made available during the second.

The simulations presented here explicitly implement the network constructs given in Section 4, except for a few simplifications for ease of computation. The task-specific nodes in layer F5, namely "1–2," "3–4," and "transport," and the nodes in F4 representing the four generic transition categories "change," "maintenance," "activation," and "suppression," are not represented as literal nodes in the simulations. Rather, they are represented as direct modulations of the corresponding weights, modulations that occur at the appropriate time steps in the analogy task.

5.1. Red-square:red-circle < yellow-square:?

The first proportional analogy experiment highlights the utility of weight transport and competition between weights in producing the answer "yellow-circle" to the analogy "red-square:red-circle < yellow-square:?" (to be described by its abbreviation, SCRY). As shown in Fig. 16, only layers F1 and F3 are implemented here. Layer F1 is made up of two clusters: "form" and "color." The form cluster consists of "square" and "circle" nodes, while the color cluster consists of "red" and "yellow." Layer F3 is made up of one cluster, which has the abstract category (or generalization) nodes "form" and "color."

Fig. 16. SCRY network: LTM spatial weights.

The rest of the diagrams in this subsection depict the actual simulation runs. Similar figures are presented for each succeeding experiment. The y-axis displays activation in individual nodes, while the x-axis shows time increments. Fig. 17A shows node activities during the presentation of items 1 and 2. In this experiment "red-square" is followed by "red-circle." As can be seen, during the first time step there is activity in "red" as well as "square," representing "red-square." There is simultaneous activity in the nodes for the abstractions "form" and "color." Note that the pulse shapes during the second time step are slightly different. This is because of learning in working memory weights, which starts to contribute to node activities.

Fig. 17. SCRY node activation. (A) Steps 1 and 2. (B) Steps 3 and 4.

Fig. 17B shows node activities during the presentation of item 3 and an expectation of item 4. As can be seen, "yellow-square" is presented during the third time step. Weights learned previously are transferred during the expectation of item 4. Due to weight transport of "maintenance" from "red" to "yellow" (see Section 4.4.1), together with the "change" weight from "square" to "circle," the network produces the answer "yellow-circle." Note the different pulse shapes during time step 4, which are due to the "ready" availability of working memory weights.

Fig. 18 shows learning, during the presentation of item 2, in the temporal node-to-node working-memory self-weights. As seen here, "red," "form," and "color" have learned to "maintain." That is, these nodes had significant activity during the presentations of both items 1 and 2. The only working memory "change" weight that gained strength during the presentation of item 2 was the one from "square" to "circle."

Fig. 18. SCRY weight vectors: "Maintain," "Activate," and "Suppress."
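The interplay just described can be condensed into a toy readout (ours; the dictionaries below are hand-coded stand-ins for the learned weights of Fig. 18, not simulation output):

```python
# Toy reconstruction (ours) of how the learned SCRY rules produce
# "yellow-circle" at step 4.

maintain = {"red": 1.0, "form": 1.0, "color": 1.0}   # "maintain" self-weights
change   = {("square", "circle"): 1.0}               # the one "change" weight
item3    = {"yellow", "square"}                      # step 3: yellow-square

# weight transport (Section 4.4.1) spreads "maintain" across the color cluster
for color in ("red", "yellow"):
    maintain[color] = max(maintain.get(color, 0.0), maintain["red"])

answer = set()
for feature in item3:
    if maintain.get(feature, 0.0) > 0.5:        # feature kept the same
        answer.add(feature)
    for (src, dst), w in change.items():        # feature changed to another
        if feature == src and w > 0.5:
            answer.add(dst)

print(answer)   # {'yellow', 'circle'} (set order may vary)
```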
5.2. Square-form:square-word < circle-form:?

The second proportional analogy experiment highlights the utility of rule abstraction, weight transport and competition between weights in producing the answer "circle-word" to the analogy "square-form:square-word < circle-form:?" (abbreviated SCFW). In addition to layers F1 and F3, similar to the ones used in the previous experiment, this experiment uses layer F2, with the nodes "square-bind" and "circle-bind," as shown in Fig. 19.

Fig. 19. SCFW network: LTM spatial weights.

Fig. 20A shows node activities of "square-form" followed by "square-word." As can be seen, during the first time step there is not only activity in "square-form" but also (almost) simultaneous activity in "square-bind" and "form." Fig. 20B shows node activities during the presentation of item 3, "circle-form," and an expectation of item 4, during which the weights learned previously are transferred. Due to the combined effects of the weight transport of "maintenance" to "circle-bind" and of the "form-to-word" change, "circle-word" is produced as the answer to this analogy. It should be noted that during step 4 activity is not confined only to "circle-word" but is also present in "circle-bind" and "word." This is due to the long-term memory spatial connections.

Fig. 20. SCFW node activation: (A) Steps 1 and 2. (B) Steps 3 and 4.

Fig. 21 shows learning, during the presentation of item 2, in the temporal node-to-node working-memory self-weights. The network learns here that "square-bind" is "maintained." Note the slight gain in the "activate" self-weights of "square-word" and "word." Similarly, a small gain occurs in the "suppress" self-weights of "square-form" and "form." These gains are in the "suppression" and "activation" self-weights that accompany the "change" weights. The network not only learned the change weight from "square-form" to "square-word," but also the one from the abstract node representing "form" to the one representing "word." It should be noted that without this generalized rule, the network could not make the analogy under consideration.

Fig. 21. SCFW weight vectors: "Maintain," "Activate," and "Suppress."

5.3. Apple:red < banana:?

The third proportional analogy experiment highlights the utility of triadic learning, rule abstraction, weight transport and competition between weights in producing the answer "yellow" to the analogy "apple:red < banana:?" (ARBY). As shown in Fig. 22, layer F3 has two clusters (Edible Objects and Abstract Senses). Temporal (triadic) long-term memory connections to and from the relation node "has" are shown in Fig. 23.

Fig. 22. ARBY network: LTM spatial weights.

Fig. 23. ARBY network: "Has" LTM temporal weights.

Fig. 24 shows node activities of "apple" followed by "red." Note the simultaneous significant activity in "apple," "red," "round," and "fruit" during time step 1. During time step 2, in addition to activity in "red" and "color," there is also significant activity in the relation node "has." This is due to the triadic contribution of the temporal long-term memory weight "apple has red."

Fig. 24. ARBY node activation: (A) Steps 1 and 2. (B) Steps 3 and 4.

Fig. 25 shows learning in the working memory self-weights. Note that "red" is maintained, while "apple" and "round" are suppressed. The only change weight learnt in this experiment is from "fruit" to "color." Note that a direct weight from "apple" to "red" is not learnt here because of the working memory connectivity, which restricts the learning of "change" to within a layer but not across layers. The node-to-node weight "fruit to color" is mediated by a triadic weight from the relation node "has."
Fig. 25. ARBY weight vectors: "Maintain," "Activate," and "Suppress."

Along with the weights described previously, triadic weights to and from "has," and from "fruit" to "color," are also learnt during the presentation of item 2. This is as if the network has inferred that the same relation as in "apple to red" applies to the generalization "fruit to color." During the expectation of item 4, maintenance is spread to "yellow," and suppression to "banana" and "cylinder," in the color, fruit, and shape clusters, respectively. This weight transport, combined with the generic rule "fruit has color," activates the "yellow" node following the presentation of "banana."

5.4. Additional experiments

We carried out two more proportional analogy experiments, whose descriptions are given next. The architectures and simulation results are not shown here for brevity, but they reflect the same general principles of network organization as do the architectures and results of our first three experiments.

The fourth proportional analogy experiment highlights the utility of relation nodes, weight transport and competition between weights in producing the answer "shape" to the analogy "red:color < round:?" (abbreviated RCRS). First, "red" is presented to the network. This activates "apple" and "color." Activation of "apple" is relatively high here because the network does not know of many "red-colored things," which would otherwise have competed with the apple node and kept its activity low. In the absence of other red-colored things, the threshold for layer F2 is chosen so that the activity in "apple" is regarded as "off." The "IS-A" relation node becomes active when "color" follows "red." This is due to the long-term triadic weight contribution to "IS-A" from "apple" and "color." The network produces the answer "shape" when presented "round" as item 3, with the learned working memory triadic weight then activating "IS-A." Among the working memory node-to-node self-weights, color is "maintained," "IS-A" is activated, and "red" is suppressed. No change weights were learnt during the presentations of items 1 and 2.

The last proportional analogy-making experiment highlights the utility of relation and transition learning in generic categories in producing the answer "D" to the classical analogy "A:B < C:?" (abbreviated ABCD). The network here is made up of layers F2 and F4 only. Layer F2 consists of alphabet nodes such as "A," "B," "C," and so on. Although the sensory layer F1 is not modeled in this experiment, the node "A" in layer F2 can be thought of as a binding node that holds together features such as "shape-A" and "word-A" in layer F1. The network does not have any spatial weights. Instead there are two sets of temporal (triadic) long-term memory weights from relation nodes: "forward" and "reverse." Activation of the node representing item 1, "A," is followed by activation of item 2, "B." Through triadic connections, the relation node "forward" is also activated during step 2. The application of the previously learned working memory rules to item 3, "C," produces "D" during step 4 as the answer to the analogy under consideration. The only self-weight learned in this experiment is "activate forward," and the only node-to-node working memory weight encoding "change" is "change A to B."

6. Discussion

6.1. Issues in analogy-making

Modeling proportional analogies of the type "A:B < C:D" poses several challenges.
Some but not all of these challenges have been met herein. The first challenge is that the path connecting A to B needs to be generalized. This requires nodes symbolizing generalizations of A and B. For example, the abstract concepts of "color" and "fruit" are respective generalizations of the sensory feature "red" and the perceptual object "apple." The analogy-making network proposed here directly addresses this issue, and thus can solve the analogy "apple:red < banana:?" by generalizing the transformation "apple:red" to "fruit:color."

The second challenge is not only to remember the literal temporal path from A to B, but also to capture the nature of this transition in terms of relations, such as "has," "is," "forward" and "reverse." The model here resolves this issue by the use of relation learning and conceptual triads. It can solve, for example, the analogies "red:color < round:shape" and "a:b < c:d," by abstracting the relation "is" in the former case and "forward" in the latter.

The third challenge is that the network may not have any prior knowledge of the entities A and B. It may have knowledge about the individual features of these entities but no direct experience of them as such. In such cases, it is required to interpret the transformation from A to B not in terms of their generalizations but in terms of their components. The network proposed here provides such a mechanism in terms of the encoding of generic working memory transitions: activation, suppression, maintenance and change. This makes it possible to make the analogy "red-square:red-circle < yellow-square:yellow-circle."

The fourth challenge, for analogies more complex than those studied herein, is that there may exist more than one direct path from A to B. Because the nature of C cannot be anticipated before its presentation, it cannot be decided a priori which path will be the most effective in producing D. Sometimes this is context driven, as in "red:green < yellow:red" (rotation of colors in a traffic signal light) versus "red:green < tomato:cucumber." The third item, "yellow" or "tomato," determines which of the transformations embodying "red:green" is most relevant. To some extent the proposed analogy network provides this capability (also called "conceptual slippage" by Mitchell, 1993). It can make analogies such as "square-form:square-word < circle-form:circle-word" and "square-form:square-word < red:red-word." Because the third item is a shape in the first case versus a color in the second, the network has to relax the initial rule "verbalize the shape" into just "verbalize."

The fifth challenge arises when no direct connection between A and B exists. This requires tracing a path comprising more than one link. This sort of multi-link traversal (or search) becomes particularly challenging in networks where nodes have only local but no global visibility, because at every node along a given path there could be several possible links to explore. Without some global visibility or guidance, a local search may be futile. The network proposed here does not provide a resolution for this issue. For this reason the network here would not be able to make the analogy "apple:spoken-word-red < banana:spoken-word-yellow," which requires a sequence of two transformations: the first to "produce the color of the fruit," and the next to "verbalize that color."

6.2. Limitations of this model

The simulation of analogy-making herein is in some sense a contrived version of how it is presumably done in real life.
Arguably, humans do not perform analogies the way the simulation goes about this task, which is to proceed serially from the presentation of item 1 to item 2 to item 3 without being able to revisit any previous item. A more plausible scenario is one where the items are presented to the subject all at once and left there for some definite period of time, or presented one by one but re-presented on demand. In most real-life situations subjects benefit from the facility of attention, and also from what can be referred to as "mental tagging." Although our model employs lateral inhibition to "attend" to the item active at the current time, and can also "remember" which item was presented in which order, it does not "consciously" go back and forth between items.

In our network, the analogy-making task is hard-coded. That is, although the network can learn on its own the individual transitions between items 1 and 2, it is explicitly told when exactly to apply them to item 3. Learning how to self-organize this task-specific behavior would require extending the network to include generalization across multiple analogy-making episodes (Burns, 1996). As suggested earlier, analogy-making is considered to be a working memory process that may lead to long-term memory depending on the utility of the results. Our network does not address the consolidation of working memory into long-term memory; this is considered in other neural network models (e.g. Banquet et al., 1998). Another limitation of the current stage of our network is its inability to model simple analogies with repetitions (for example "aa:bb < cc:?"). Also, it cannot model analogies with transformations traversing more than one link (see the "fifth challenge" in Section 6.1).

6.3. Conclusions

A neural network theory has been introduced that leads to solutions of several commonsense proportional analogies among elementary concepts. The network introduced herein follows an established tradition within the neural network community of breaking a complex cognitive task into its constituent operations and seeking to model those operations. The operations involved in analogy-making include reasoning about relations and combining and splitting concepts, and both of these have been modeled here. The network implementations of relations or mappings between concepts have led to the introduction of several types of connections that are unconventional in the current neural network literature. These include connections that represent activation or suppression of a particular feature, change from one feature to a related one, maintenance of a feature, and transport or generalization of maintenance or suppression weights.

Is there a possible way that such unconventional connections might be represented in the actual brain? We have suggested that they might be based on the matching-node implementation shown in Fig. 8, which has been postulated to occur in various areas of sensory, motor, and association cortex (Guigon et al., 1995). The ability of even young children to learn, and to reason based on, some simple mappings of these types suggests that aspects of some of these processes in the association cortex might be hard-wired instead of learned from experience.
Or it may be that there are hard-wired circuits for general operations like "activation," but that learning via long-term memory is necessary to make these circuits represent specific operations like "add yellow." Further ideas about how this works might be obtained from brain imaging (PET or fMRI) studies that investigate which brain areas are active while people are thinking about particular abstract concepts. There have been some preliminary imaging studies of cognitive tasks, but few that have dealt with thinking about high-level abstractions.

Further extensions of our analogy network can be suggested that deal with related but different cognitive tasks. Instead of proportional analogies, a similar network might be constructed that deals with geometric analogies, as studied by Blank (1997), or analogies from one narrative domain to another, as studied by Vosniadou and Ortony (1989). Also, the network could be varied to deal with similes or metaphors. Finally, the relational and mapping aspects of the network study can be brought to bear on potential neural network analyses of a variety of problems that have traditionally been part of symbolic artificial intelligence. One of these is property inheritance: how do we infer that a general category possesses the properties of a more specific subcategory, or vice versa? Furl (1999) describes a neural network model of property inheritance based on ART (see Section 3.4), which could be integrated with a later stage of our analogy model. Another problem that can possibly be addressed (with explicit representations of abstract categories) is how to implement the "axiom of choice," that is, to be able to answer queries such as "give me all colors," "give me a color," or "give me a different color."

Hence, our network does not solve all problems in analogy learning and solving, nor does it yet point to a testable theory of how the human brain performs these tasks. It is, however, an advance in the direction of forming a plausible connectionist network model of these tasks, based on non-linear dynamics. We believe that our model captures better than previous network models the qualitative essence of results from infants (Goswami, 1998; Vosniadou & Ortony, 1989) and non-human primates (Thompson et al., 1997) suggesting that analogical processes occur earlier in cognitive development than was previously supposed. Other models (e.g. Barnden & Srinivas, 1992; Hummel & Holyoak, 1997) have tended more than ours to base their learned analogical relationships on the complexity of English semantic structure. Moreover, many of the building blocks of our network model have previously been used in models of simpler mental processes such as pattern classification and conditioning. It is thus a step toward the dynamic multilevel unification of our understanding of the mechanistic basis of human cognition.

Appendix A. Network equations

A.1. Glossary of mathematical symbols used

Variables
$t$: time
$x$: node activation
$\hat{x}$: quenched and saturated node activation
$\tilde{x}$: inverted node activation
$\bar{x}$: integrated node activation
$\hat{\bar{x}}$: quenched integrated node activation
$\tilde{\bar{x}}$: inverted integrated node activation
$\dot{x}$: differential node activation
$w$: learnable weight
$I$: input

Qualifiers
$v$: working memory
$l$: long-term memory
$t$: triadic connection
$x$: change
$m$: maintenance
$o$: activation
$s$: suppression
$1\text{–}2$: transition from step 1 to step 2
$3\text{–}4$: transition from step 3 to step 4
$z$: weight transport modulator

Indexes
$i, j, k, A, B, C$: node
$l, m, n$: layer
$p, q, r$: cluster
$\overline{mqj}$: node at the previous time-step
$\{\,\}$: weight
$\{mqj, lpi\}$, $\{AB\}$: node-to-node connection
$\{nrk, \{mqj, lpi\}\}$, $\{C, \{AB\}\}$: node-to-weight connection
$\{\{mqj, lpi\}, nrk\}$, $\{\{AB\}, C\}$: weight-to-node connection

Functions
$f$: sigmoid
$\rho$: quench
$\sigma$: saturate
$\eta$: invert

A.1.1. Node parameters
$A$: node activity decay
$B$: maximum node activity
$C$: minimum node activity
$\theta^{r}_{l}$: quenching threshold for nodes in layer $l$, used during learning of working memory weights
$\theta^{j}_{l}$: saturation threshold for nodes in layer $l$, used during learning of working memory weights
$\bar{\theta}^{r}_{l}$: quenching threshold for integrated node activities in layer $l$
$\bar{\theta}^{j}_{l}$: saturation threshold for integrated node activities in layer $l$
$c^{r}$: quench node activations down to this value
$c^{j}$: saturate node activations up to this value
$c^{h}$: invert node activations from this value, used during learning of self working memory weights

A.1.2. Weight parameters
$D$: working memory weight decay, $D < A$
$E$: minimum working memory weight
$F$: maximum working memory weight

A.1.3. Interaction and coupling parameters
$a_{mqj,lpi}$: excitatory interaction coefficient from node $mqj$ to $lpi$
$b_{mqj,lpi}$: inhibitory interaction coefficient from node $mqj$ to $lpi$, where $l = m$, $pi \neq qj$
$g_{mqj,lpi}$: coupling coefficient from node $mqj$ to $lpi$
$d_{nrk,mqj,lpi}$: coupling coefficient from node $nrk$ to weight $w^{v}_{mqj,lpi}$, where $l, m \neq n$
$e_{mqj,lpi}$: inhibitory interaction coefficient from the weight $w^{vx}_{mqj,lpi}$ to the weights $w^{vo}_{lpi,lpi}$ and $w^{vs}_{mqj,mqj}$, where $l = m$, $pi \neq qj$
$f_{1\text{–}2,nrk,mqj,lpi}$: coupling coefficient from node "1–2" to weight $w^{vt}_{nrk,\{mqj,lpi\}}$, where $l, m \neq n$

A.1.4. Temporal parameters
$t_{on}$: on time of the input pulses
$t_{avg}$: time interval over which past node activity is integrated, $t_{avg} > t_{on} > 0$
$t^{step}_{i}$: time duration of the $i$th step input, $t_{step} > t_{on} > 0$
$I_{lpi,step(t)}$: input value to node $lpi$ during $step(t)$

A.1.5. Network parameters
$L$: total number of layers
$G_{l}$: number of clusters in layer $l$
$N_{lp}$: number of nodes in cluster $p$ of layer $l$

A.1.6. Functions
$f(x)$: node activation function (e.g. sigmoid, linear, faster-than-linear)
$[x]^{+}$: zero if $x$ is negative, otherwise $x$
$\rho(x; \theta^{r}, c^{r})$: quench $x$ down to $c^{r}$ if $x < \theta^{r}$, otherwise $x$
$\sigma(x; \theta^{j}, c^{j})$: saturate $x$ up to $c^{j}$ if $x > \theta^{j}$, otherwise $x$
$\eta(x; c^{h})$: invert $x$ to $|x - c^{h}|$
$step(t)$: item present at time $t$
$onflag(t)$: 1 if the input pulse is on at time $t$, otherwise 0

A.1.7. Activation variables
$x_{lpi}$: activation of the $i$th node in cluster $p$ of layer $l$ at time $t$
$x^{r}_{lpi}$: quenched activation of the node $lpi$
$x^{j}_{lpi}$: saturated activation of the node $lpi$
$\hat{x}_{lpi}$: quenched and saturated activation of the node $lpi$
$\tilde{x}_{lpi}$: inverted activation of the node $lpi$
$\bar{x}_{lpi}$: moving window average of the node $lpi$
$\hat{\bar{x}}_{lpi}$: quenched and saturated moving window average of the node $lpi$
$\tilde{\bar{x}}_{lpi}$: inverted moving window average of the node $lpi$
$\dot{x}_{lpi}$: differential activation of the node $lpi$
$I_{lpi}$: input to the node $lpi$
$x_{x}$: activation of the "change" relation node in layer F4
$x_{m}$: activation of the "maintenance" relation node in layer F4
$x_{o}$: activation of the "turn on" relation node in layer F4
$x_{s}$: activation of the "suppress" relation node in layer F4
$x_{1\text{–}2}$: activation of the "1–2" task-specific node in layer F5
$x_{3\text{–}4}$: activation of the "3–4" task-specific node in layer F5
$x_{z}$: activation of the "transport" task-specific node in layer F5
$y^{l}_{lpi}$: contribution of the spatial long-term memory weights to the node $lpi$
$y^{lt}_{lpi}$: contribution of the triadic long-term memory weights to the node $lpi$
$y^{v}_{lpi}$: contribution of the spatial working memory weights to the node $lpi$
$y^{vx}_{lpi}$: contribution of the triadic working memory weights encoding "change" to the node $lpi$
$y^{vm}_{lpi}$: contribution of the triadic working memory weights encoding "maintenance" to the node $lpi$
$y^{vo}_{lpi}$: contribution of the triadic working memory weights encoding "turning on" to the node $lpi$
$y^{vs}_{lpi}$: contribution of the triadic working memory weights encoding "suppression" to the node $lpi$

A.1.8. Long-term memory weight variables
$w^{l}_{mqj,lpi}$: long-term memory weight from node $mqj$ to node $lpi$, where $pi \neq qj$
$w^{l}_{\overline{mqj},lpi}$: long-term memory weight from node $mqj$ at the previous time-step to node $lpi$ at the current time-step
$w^{lt}_{nrk,\{\overline{mqj},lpi\}}$: triadic long-term memory weight from node $nrk$ at the current time-step to $w^{l}_{\overline{mqj},lpi}$, where $l, m \neq n$
$w^{lt}_{\{\overline{mqj},nrk\},lpi}$: triadic long-term memory weight from weight $w^{l}_{\overline{mqj},nrk}$ to node $lpi$ at the current time-step, where $m, n \neq l$

A.1.9. Working memory weight variables
$w^{v}_{mqj,lpi}$: spatial working memory weight from node $mqj$ to node $lpi$, where $l \neq m$, $pi \neq qj$
$w^{v}_{\overline{mqj},lpi}$: temporal working memory weight from node $mqj$ at the previous time-step to node $lpi$ at the current time-step (can stand for any of $w^{vx}_{\overline{mqj},lpi}$, $w^{vm}_{lpi,lpi}$, $w^{vo}_{lpi,lpi}$, $w^{vs}_{lpi,lpi}$)
$w^{vx}_{\overline{mqj},lpi}$: working memory weight encoding "change" from node $mqj$ at the previous time-step to node $lpi$ at the current time-step, where $l = m$, $pi \neq qj$
$w^{vo}_{lpi,lpi}$: working memory weight encoding "turning on" of node $lpi$ from the previous to the current time-step
$w^{vm}_{lpi,lpi}$: working memory weight encoding "maintenance" of node $lpi$ from the previous to the current time-step
$w^{vs}_{lpi,lpi}$: working memory weight encoding "suppression" of node $lpi$ from the previous to the current time-step
$w^{\Sigma}_{lpi,lpi}$: summation of the working memory weights encoding "maintenance," "turning on" and "suppression" of node $lpi$ from the previous to the current time-step
$w^{vt}_{nrk,\{\overline{mqj},lpi\}}$: triadic working memory weight from node $nrk$ at the current time-step to weight $w^{v}_{\overline{mqj},lpi}$, where $l, m \neq n$
$w^{vt}_{\{\overline{mqj},nrk\},lpi}$: triadic working memory weight from weight $w^{v}_{\overline{mqj},nrk}$ to node $lpi$ at the current time-step, where $m, n \neq l$
$w^{vt}_{1\text{–}2,\{nrk,\{\overline{mqj},lpi\}\}}$: triadic working memory weight from node "1–2" at the current time-step to weight $w^{vt}_{nrk,\{\overline{mqj},lpi\}}$
$w^{vt}_{\{nrk,\{\overline{mqj},lpi\}\},1\text{–}2}$: triadic working memory weight from weight $w^{vt}_{nrk,\{\overline{mqj},lpi\}}$ to node "1–2" at the current time-step
A.2. Node activity equations

Node activities in the analogy network behave according to Eq. (22). The first term on the right-hand side represents passive exponential decay. The second term represents external input. The third term represents excitatory influences (in the first square bracket) that are shunted by how far the node activity is from its maximum. The fourth term represents inhibitory influences (in the second square bracket) that are shunted by how far the node activity is from its minimum:

\[ \frac{dx_{lpi}}{dt} = -A x_{lpi} + I_{lpi} + (B - x_{lpi})\left[ y^{l}_{lpi} + y^{lt}_{lpi} + y^{v}_{lpi} + y^{vx}_{lpi} + y^{vm}_{lpi} + y^{vo}_{lpi} + \sum_{m=1}^{L}\sum_{q=1}^{G_m}\sum_{j=1}^{N_{mq}} a_{mqj,lpi}\, f(x_{mqj}) \right] - (x_{lpi} - C)\left[ y^{vs}_{lpi} + \sum_{m=1}^{L}\sum_{q=1}^{G_m}\sum_{j=1}^{N_{mq}} b_{mqj,lpi}\, f(x_{mqj}) \right] \tag{22} \]

For the immediate purposes of the analogy-making network, the external input can be regarded as a time-varying user-specified quantity. But for the sake of a complete description of the network, Eqs. (23)–(25) show the specific computations for converting an initial scalar input parameter into a time-varying quantity:

\[ step(t) = k \quad \text{where } S_k \leq t < S_{k+1}, \quad S_k = \sum_{i=0}^{k} t^{step}_{i} \tag{23} \]

\[ onflag(t) = \begin{cases} 1 & \text{if } S_k \leq t < S_k + t_{on} \\ 0 & \text{otherwise} \end{cases} \tag{24} \]

\[ I_{lpi}(t) = onflag(t) \times I_{lpi,step(t)} \tag{25} \]

The seven excitatory influences in Eq. (22) respectively represent contributions from the following weights: spatial LTM, temporal (triadic) LTM, spatial WM, temporal (triadic) "change" WM, temporal (triadic) "maintenance" WM, temporal (triadic) "activation" WM, and fixed spatial STM cooperation. The two inhibitory influences represent contributions from the temporal (triadic) "suppression" WM weights and from fixed spatial STM competition. Each of these contributions (except the fixed STM interactions) is explained individually next.

The contribution of the spatial LTM weights to the current node is defined in Eq. (26):

\[ y^{l}_{lpi} = \sum_{m=1}^{L}\sum_{q=1}^{G_m}\sum_{j=1}^{N_{mq}} w^{l}_{mqj,lpi}\, f(x_{mqj}) \tag{26} \]

where $f$ is the sigmoid function

\[ f(x) = \frac{1}{1 + e^{-\theta x + \phi}} \tag{27} \]

The contribution of the temporal (triadic) LTM weights to the current node is defined in Eq. (28) as an addition of two terms. In the first term, the current node is considered to play the role of "to-node" in a triad (see Section 4.3.1). The second term considers the current node as the "relation node" of a triad:

\[ y^{lt}_{lpi} = \sum_{m=1}^{L}\sum_{q=1}^{G_m}\sum_{j=1}^{N_{mq}} w^{l}_{\overline{mqj},lpi}\, f(\bar{x}_{mqj}) \left[ 1 + \sum_{n=1}^{L}\sum_{r=1}^{G_n}\sum_{k=1}^{N_{nr}} w^{lt}_{nrk,\{\overline{mqj},lpi\}}\, f(x_{nrk}) \right] + \sum_{m=1}^{L}\sum_{q=1}^{G_m}\sum_{j=1}^{N_{mq}} \sum_{n=1}^{L}\sum_{r=1}^{G_n}\sum_{k=1}^{N_{nr}} w^{lt}_{\{\overline{mqj},nrk\},lpi}\, w^{l}_{\overline{mqj},nrk}\, f(\hat{\bar{x}}_{mqj})\, f(x_{nrk}) \tag{28} \]

The contribution of the spatial WM weights to the current node is defined in Eq. (29):

\[ y^{v}_{lpi} = \sum_{m=1}^{L}\sum_{q=1}^{G_m}\sum_{j=1}^{N_{mq}} w^{v}_{mqj,lpi}\, f(x_{mqj}) \tag{29} \]
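As a concreteness check, the right-hand side of Eq. (22) for a single node can be written as below (our sketch; the interaction sums are collapsed into two precomputed totals, and the parameter values follow Appendix B):

```python
import numpy as np

# Sketch (ours) of the right-hand side of Eq. (22) for one node, with the
# sigmoid of Eq. (27). Parameter values follow Appendix B.

A, B, C = 1.5, 1.0, 0.0
THETA, PHI = 20.0, 10.0          # sigmoid steepness and offset, Eq. (27)

def f(x):
    return 1.0 / (1.0 + np.exp(-THETA * x + PHI))

def dx_dt(x, I, excite, inhibit):
    """Eq. (22): decay, input, shunted excitation, shunted inhibition."""
    return -A * x + I + (B - x) * excite - (x - C) * inhibit

# a node receiving input plus net excitation from f-transformed neighbors
neighbors = np.array([0.9, 0.1, 0.7])
weights   = np.array([1.0, 1.0, 0.0])      # spatial LTM weights, Eq. (26)
excite = float(weights @ f(neighbors))     # the y^l contribution
print(dx_dt(x=0.2, I=1.0, excite=excite, inhibit=0.5))
```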
The contribution of the temporal (triadic) "change" WM weights is defined in Eq. (30) as an addition of two terms. The second term reflects the current node acting as the "relation node" of a triad. The first term reflects the current node playing the role of "to-node," but in three different kinds of triads; this is shown as the three additive terms inside the first square bracket. The first of these considers the current node as the "to-node" of a triad that has "change" as its "relation node." The second concerns a triad that has any other "relation node," such as "has," "is," "forward," and so on. The third is from a triad that has the task-specific node "1–2" as its "relation node." Note the use of past activity in Eq. (30):

\[ y^{vx}_{lpi} = \sum_{m=1}^{L}\sum_{q=1}^{G_m}\sum_{j=1}^{N_{mq}} \left[ w^{vt}_{x,\{\overline{mqj},lpi\}}\, f(x_{x}) + \sum_{n=1}^{L}\sum_{r=1}^{G_n}\sum_{k=1}^{N_{nr}} w^{vt}_{nrk,\{\overline{mqj},lpi\}}\, f(x_{nrk}) + w^{vt}_{1\text{–}2,\{\overline{mqj},lpi\}}\, f(x_{1\text{–}2}) \right] w^{vx}_{\overline{mqj},lpi}\, f(\hat{\bar{x}}_{mqj}) + \sum_{m=1}^{L}\sum_{q=1}^{G_m}\sum_{j=1}^{N_{mq}} \sum_{n=1}^{L}\sum_{r=1}^{G_n}\sum_{k=1}^{N_{nr}} w^{vt}_{\{\overline{mqj},nrk\},lpi}\, w^{vx}_{\overline{mqj},nrk}\, f(\hat{\bar{x}}_{mqj})\, f(x_{nrk}) \tag{30} \]

The contribution of the temporal (triadic) "maintenance" WM weights is defined in Eq. (31) as an addition of two terms. The first term reflects the current node considered as the "to-node" in two different kinds of triads, whose "relation nodes" are, respectively, "maintenance" and "1–2." The second term reflects the current node considered as the "relation node" of a triad. Note again the use of past activity:

\[ y^{vm}_{lpi} = \left[ w^{vt}_{m,m\{lpi,lpi\}}\, f(x_{m}) + w^{vt}_{1\text{–}2,m\{lpi,lpi\}}\, f(x_{1\text{–}2}) \right] w^{vm}_{lpi,lpi}\, f(\hat{\bar{x}}_{lpi}) + \sum_{m=1}^{L}\sum_{q=1}^{G_m}\sum_{j=1}^{N_{mq}} w^{vt}_{m\{mqj,mqj\},lpi}\, w^{vm}_{mqj,mqj}\, f(x_{mqj})\, f(\hat{\bar{x}}_{mqj}) \tag{31} \]

The contribution of the temporal (triadic) "activation" WM weights is defined in Eq. (32). It can be explained as Eq. (31) with "maintenance" replaced by "activation." Note the use of inverted past activity (see Section 4.3.2):

\[ y^{vo}_{lpi} = \left[ w^{vt}_{o,o\{lpi,lpi\}}\, f(x_{o}) + w^{vt}_{1\text{–}2,o\{lpi,lpi\}}\, f(x_{1\text{–}2}) \right] w^{vo}_{lpi,lpi}\, f(\tilde{\bar{x}}_{lpi}) + \sum_{m=1}^{L}\sum_{q=1}^{G_m}\sum_{j=1}^{N_{mq}} w^{vt}_{o\{mqj,mqj\},lpi}\, w^{vo}_{mqj,mqj}\, f(x_{mqj})\, f(\tilde{\bar{x}}_{mqj}) \tag{32} \]

The contribution of the temporal (triadic) "suppression" WM weights is defined in Eq. (33). It can be explained as Eq. (31) with "maintenance" replaced by "suppression":

\[ y^{vs}_{lpi} = \left[ w^{vt}_{s,s\{lpi,lpi\}}\, f(x_{s}) + w^{vt}_{1\text{–}2,s\{lpi,lpi\}}\, f(x_{1\text{–}2}) \right] w^{vs}_{lpi,lpi}\, f(\hat{\bar{x}}_{lpi}) + \sum_{m=1}^{L}\sum_{q=1}^{G_m}\sum_{j=1}^{N_{mq}} w^{vt}_{s\{mqj,mqj\},lpi}\, w^{vs}_{mqj,mqj}\, f(x_{mqj})\, f(\hat{\bar{x}}_{mqj}) \tag{33} \]

The various node activity operations given in Eqs. (34)–(41) are similar to one another but modified to reflect the individual node indexes pertinent to the different clusters and layers of a network. They also reflect additional constraints on the different thresholds and on the saturation, quenching and inverting values:

\[ x^{r}_{lpi} = \rho(x_{lpi}; \theta^{r}_{l}, c^{r}) = \begin{cases} x_{lpi} & \text{if } x_{lpi} \geq \theta^{r}_{l} \\ c^{r} & \text{otherwise} \end{cases}, \quad \text{where } c^{r} = C \tag{34} \]

\[ x^{j}_{lpi} = \sigma(x_{lpi}; \theta^{j}_{l}, c^{j}) = \begin{cases} x_{lpi} & \text{if } x_{lpi} < \theta^{j}_{l} \\ c^{j} & \text{otherwise} \end{cases}, \quad \text{where } c^{j} = B \tag{35} \]

\[ \hat{x}_{lpi} = \sigma(\rho(x_{lpi}; \theta^{r}_{l}, c^{r}); \theta^{j}_{l}, c^{j}), \quad \text{where } \theta^{r}_{l} = \theta^{j}_{l} \tag{36} \]

\[ \tilde{x}_{lpi} = \eta(\hat{x}_{lpi}; c^{h}) = |\hat{x}_{lpi} - c^{h}|, \quad \text{where } c^{h} = c^{j} \tag{37} \]

\[ \bar{x}_{lpi}(t) = \frac{1}{t_{avg}} \int_{t - t_{on} - t_{avg}}^{\,t - t_{on}} x_{lpi}(\tau)\, d\tau \tag{38} \]

\[ \hat{\bar{x}}_{lpi} = \sigma(\rho(\bar{x}_{lpi}; \bar{\theta}^{r}_{l}, c^{r}); \bar{\theta}^{j}_{l}, c^{j}), \quad \text{where } \bar{\theta}^{r}_{l} = \bar{\theta}^{j}_{l} \tag{39} \]

\[ \tilde{\bar{x}}_{lpi} = \eta(\hat{\bar{x}}_{lpi}; c^{h}) = |\hat{\bar{x}}_{lpi} - c^{h}|, \quad \text{where } c^{h} = c^{j} \tag{40} \]

\[ \dot{x}_{lpi} = |\hat{x}_{lpi} - \hat{\bar{x}}_{lpi}| \tag{41} \]
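These operations are simple pointwise transformations; the following sketch (ours) implements them with the Appendix B values $c^{r} = 0$, $c^{j} = 1$, $c^{h} = 1$:

```python
import numpy as np

# Illustrative implementations (ours) of the activity operations in
# Eqs. (34)-(41), with c_r = 0, c_j = 1, c_h = 1 as in Appendix B.

C_R, C_J, C_H = 0.0, 1.0, 1.0

def quench(x, theta_r):                    # Eq. (34): quench down to c_r
    return np.where(x >= theta_r, x, C_R)

def saturate(x, theta_j):                  # Eq. (35): saturate up to c_j
    return np.where(x < theta_j, x, C_J)

def hat(x, theta_r, theta_j):              # Eq. (36): quench, then saturate
    return saturate(quench(x, theta_r), theta_j)

def invert(x_hat):                         # Eqs. (37)/(40): |x_hat - c_h|
    return np.abs(x_hat - C_H)

def differential(x_hat_now, x_hat_past):   # Eq. (41)
    return np.abs(x_hat_now - x_hat_past)

x = np.array([0.02, 0.3, 0.8])
print(hat(x, theta_r=0.05, theta_j=0.05))  # -> [0. 1. 1.]
```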
A.3. Long-term memory equations

No LTM weights are changed in this network. That is, the rate of change is set to zero for all four kinds of LTM weights: spatial $w^{l}_{mqj,lpi}$; temporal node-to-node $w^{l}_{\overline{mqj},lpi}$; temporal (triadic) node-to-weight $w^{lt}_{nrk,\{\overline{mqj},lpi\}}$; and temporal (triadic) weight-to-node $w^{lt}_{\{\overline{mqj},nrk\},lpi}$.

A.4. Working memory equations

The rate of change of the spatial WM weights is defined in Eq. (42). It is similar to the standard Hebbian law, with the modifications that the negative influence is shunted by how far the weight is from its minimum, while the positive influence is shunted by how far the weight is from its maximum:

\[ \frac{dw^{v}_{mqj,lpi}}{dt} = -D\left(w^{v}_{mqj,lpi} - E\right) + g_{mqj,lpi}\left(F - w^{v}_{mqj,lpi}\right)\hat{x}_{mqj}\,\hat{x}_{lpi} \tag{42} \]

The rate of change of the temporal triadic WM weights is defined in Eqs. (43) and (44), which respectively describe the behavior of node-to-weight and weight-to-node triadic weights. In addition to the positive influence due to triadic learning (shown as the first term in the curly bracket), they are also influenced by the triadic weight from the task-specific "1–2" node. This is shown as the second term in the curly bracket and is analogous to a regular triadic weight's influence on its node-to-node weight (see Section 4.3.1):

\[ \frac{dw^{vt}_{nrk,\{\overline{mqj},lpi\}}}{dt} = -D\left(w^{vt}_{nrk,\{\overline{mqj},lpi\}} - E\right) + \left(F - w^{vt}_{nrk,\{\overline{mqj},lpi\}}\right)\left\{ d_{nrk,mqj,lpi}\,\hat{x}_{nrk}\, w^{v}_{\overline{mqj},lpi}\,\hat{x}_{mqj}\,\hat{x}_{lpi} + f_{1\text{–}2,nrk,mqj,lpi}\, w^{vt}_{1\text{–}2,\{nrk,\{\overline{mqj},lpi\}\}}\,\hat{x}_{1\text{–}2} \right\} \tag{43} \]

\[ \frac{dw^{vt}_{\{\overline{mqj},lpi\},nrk}}{dt} = -D\left(w^{vt}_{\{\overline{mqj},lpi\},nrk} - E\right) + \left(F - w^{vt}_{\{\overline{mqj},lpi\},nrk}\right)\left\{ d_{mqj,lpi,nrk}\, w^{v}_{\overline{mqj},lpi}\,\hat{x}_{mqj}\,\hat{x}_{lpi}\,\hat{x}_{nrk} + f_{1\text{–}2,mqj,lpi,nrk}\, w^{vt}_{1\text{–}2,\{\{\overline{mqj},lpi\},nrk\}}\,\hat{x}_{1\text{–}2} \right\} \tag{44} \]

The triadic weights in Eqs. (43) and (44) are learning "changes"; that is, they are connected to weights that have different source and target nodes. For brevity, the triadic weights from the "maintenance," "activation," and "suppression" relation nodes are not given here, but their rates of change are similar to Eqs. (43) and (44).

For the temporal node-to-node WM weight encoding "change," defined in Eq. (45), the decay and the product of negative and positive influences are similar to Eq. (43). The positive influences come from three sources. The first two terms in the curly bracket depict the influence of differential Hebbian learning in a triad made up of the source, target, and "change" nodes. The third term represents the influence of triadic weights from relation nodes such as "has" or "is." The fourth term is due to the triadic weight from the task-specific node "1–2":

\[ \frac{dw^{vx}_{\overline{mqj},lpi}}{dt} = -D\left(w^{vx}_{\overline{mqj},lpi} - E\right) + \left(F - w^{vx}_{\overline{mqj},lpi}\right)\left\{ g_{mqj,lpi}\,\hat{\bar{x}}_{mqj}\dot{x}_{mqj}\,\hat{x}_{lpi}\dot{x}_{lpi} + d_{x,mqj,lpi}\, w^{vt}_{x,\{\overline{mqj},lpi\}}\,\hat{x}_{x} + \sum_{n=1}^{L}\sum_{r=1}^{G_n}\sum_{k=1}^{N_{nr}} d_{nrk,mqj,lpi}\, w^{vt}_{nrk,\{\overline{mqj},lpi\}}\,\hat{x}_{nrk} + d_{1\text{–}2,mqj,lpi}\, w^{vt}_{1\text{–}2,\{\overline{mqj},lpi\}}\,\hat{x}_{1\text{–}2} \right\} \tag{45} \]

For convenience, the summation of the three self-WM weights is defined in Eq. (46) and used in Eqs. (47)–(49). This quantity exerts an inhibitory influence in Eq. (47), and thereby represents non-specific competition between self-weights:

\[ w^{\Sigma}_{lpi,lpi} = w^{vm}_{lpi,lpi} + w^{vo}_{lpi,lpi} + w^{vs}_{lpi,lpi} \tag{46} \]

The rate of change of the temporal node-to-node WM self-weight encoding "maintenance" is defined in Eq. (47). The decay and the multiplication of negative and positive influences are similar to Eq. (43), except that the negative influence comes from all self-weights (46) instead of just the "maintenance" self-weight. The first two terms in the curly bracket depict the influence of delayed Hebbian learning in a triad made up of the source, target and "maintenance" nodes (see Section 4.2). The third term is due to the triadic weight from the task-specific node "1–2." The fourth term represents the influence of "weight transport" from the other "maintenance" self-weights within the same cluster (see Section 4.4.1):

\[ \frac{dw^{vm}_{lpi,lpi}}{dt} = -D\left(w^{vm}_{lpi,lpi} - E\right) w^{\Sigma}_{lpi,lpi} + \left(F - w^{vm}_{lpi,lpi}\right)\left\{ g_{lpi,lpi}\,\hat{\bar{x}}_{lpi}\,\hat{x}_{lpi} + d_{m,lpi,lpi}\, w^{vt}_{m,\{lpi,lpi\}}\,\hat{x}_{m} + d_{1\text{–}2,lpi,lpi}\, w^{vt}_{1\text{–}2,\{lpi,lpi\}}\,\hat{x}_{1\text{–}2} + d_{z,lpi,lpi}\,\hat{x}_{z}\left[ \frac{\sum_{j=1, j\neq i}^{N_{lp}} w^{vm}_{lpj,lpj}}{N_{lp} - 1} - w^{vm}_{lpi,lpi} \right]^{+} \right\} \tag{47} \]

The rate of change of the temporal node-to-node WM self-weight encoding "activation" is defined in Eq. (48). This equation is similar to Eq. (47) with three modifications: an additional negative influence from the "change" weights (see Section 4.4.2), delayed anti-Hebbian learning (see Section 4.3.2), and the absence of "weight transport":

\[ \frac{dw^{vo}_{lpi,lpi}}{dt} = -D\left(w^{vo}_{lpi,lpi} - E\right)\left( w^{\Sigma}_{lpi,lpi} + \sum_{m=1}^{L}\sum_{q=1}^{G_m}\sum_{j=1}^{N_{mq}} e_{mqj,lpi}\, w^{vx}_{\overline{mqj},lpi} \right) + \left(F - w^{vo}_{lpi,lpi}\right)\left\{ g_{lpi,lpi}\,\tilde{\bar{x}}_{lpi}\,\hat{x}_{lpi} + d_{o,lpi,lpi}\, w^{vt}_{o,\{lpi,lpi\}}\,\hat{x}_{o} + d_{1\text{–}2,lpi,lpi}\, w^{vt}_{1\text{–}2,\{lpi,lpi\}}\,\hat{x}_{1\text{–}2} \right\} \tag{48} \]

The rate of change of the temporal node-to-node WM self-weight encoding "suppression" is defined in Eq. (49), which is similar to Eq. (47) with two changes: competition from the "change" weights and delayed anti-Hebbian learning:

\[ \frac{dw^{vs}_{lpi,lpi}}{dt} = -D\left(w^{vs}_{lpi,lpi} - E\right)\left( w^{\Sigma}_{lpi,lpi} + \sum_{m=1}^{L}\sum_{q=1}^{G_m}\sum_{j=1}^{N_{mq}} e_{lpi,mqj}\, w^{vx}_{\overline{lpi},mqj} \right) + \left(F - w^{vs}_{lpi,lpi}\right)\left\{ g_{lpi,lpi}\,\hat{\bar{x}}_{lpi}\,\tilde{x}_{lpi} + d_{s,lpi,lpi}\, w^{vt}_{s,\{lpi,lpi\}}\,\hat{x}_{s} + d_{1\text{–}2,lpi,lpi}\, w^{vt}_{1\text{–}2,\{lpi,lpi\}}\,\hat{x}_{1\text{–}2} + d_{z,lpi,lpi}\,\hat{x}_{z}\left[ \frac{\sum_{j=1, j\neq i}^{N_{lp}} w^{vs}_{lpj,lpj}}{N_{lp} - 1} - w^{vs}_{lpi,lpi} \right]^{+} \right\} \tag{49} \]
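The shunted form shared by Eqs. (42)–(49) can be reduced to a scalar sketch of Eq. (42). One caveat: Appendix B prints E = 1.0 and F = 0.0, which sits oddly with its own statement that weights range between 0 and 1; the sketch below assumes E = 0 (minimum) and F = 1 (maximum), following the glossary:

```python
# Scalar sketch (ours) of the shunted working-memory Hebbian law, Eq. (42),
# taking the weight range as [E, F] = [0, 1] (an assumption; see the
# lead-in note about the printed Appendix B values).

def dw_dt(w, x_pre, x_post, D=1.0, E=0.0, F=1.0, g=1.0):
    # decay shunted by the distance from the minimum, learning by the
    # distance from the maximum, so w stays inside [E, F]
    return -D * (w - E) + g * (F - w) * x_pre * x_post

w, dt = 0.0, 0.01
for _ in range(500):                 # co-active nodes: w climbs toward the
    w += dt * dw_dt(w, 1.0, 1.0)     # balance point g*F/(D + g) = 0.5
print(round(w, 3))                   # ~0.5
for _ in range(500):                 # input gone: passive decay back to E
    w += dt * dw_dt(w, 0.0, 0.0)
print(round(w, 3))                   # ~0.003
```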
A.5. Task-specific working memory equations

The primary difference between the task-specific WM weights described here and the temporal WM weights described in the last subsection is that the former use active decay (see Section 4.3) whereas the latter use passive decay. Task-specific weights are only triadic; that is, they run not from a node to another node but rather from a node to an "assembly" and vice versa. The definitions here are given for the connections to and from the "1–2" node. Definitions for connections from other nodes, such as "3–4," are similar but not shown here for brevity. Eq. (50) describes the rate of change of a weight between the presynaptic "1–2" node and a postsynaptic "assembly" that is another triadic (not task-specific) weight. Eq. (51) similarly describes the rate of change of a task-specific weight in the reverse direction, that is, with the presynaptic and postsynaptic roles switched:

\[ \frac{dw^{vt}_{1\text{–}2,\{nrk,\{\overline{mqj},lpi\}\}}}{dt} = \tilde{x}_{3\text{–}4}\,\hat{x}_{1\text{–}2} \left\{ -D\left(w^{vt}_{1\text{–}2,\{nrk,\{\overline{mqj},lpi\}\}} - E\right) + \left(F - w^{vt}_{1\text{–}2,\{nrk,\{\overline{mqj},lpi\}\}}\right) f_{1\text{–}2,nrk,mqj,lpi}\, w^{vt}_{nrk,\{\overline{mqj},lpi\}}\,\hat{x}_{nrk}\, w^{v}_{\overline{mqj},lpi}\,\hat{x}_{mqj}\,\hat{x}_{lpi} \right\} \tag{50} \]

\[ \frac{dw^{vt}_{\{nrk,\{\overline{mqj},lpi\}\},1\text{–}2}}{dt} = \tilde{x}_{3\text{–}4}\, w^{vt}_{nrk,\{\overline{mqj},lpi\}}\,\hat{x}_{nrk}\, w^{v}_{\overline{mqj},lpi}\,\hat{x}_{mqj}\,\hat{x}_{lpi} \left\{ -D\left(w^{vt}_{\{nrk,\{\overline{mqj},lpi\}\},1\text{–}2} - E\right) + \left(F - w^{vt}_{\{nrk,\{\overline{mqj},lpi\}\},1\text{–}2}\right) f_{nrk,mqj,lpi,1\text{–}2}\,\hat{x}_{1\text{–}2} \right\} \tag{51} \]

The definitions in Eqs. (52) and (53) are the same as in Eqs. (50) and (51) except that the "assembly" here represents a node-to-node weight instead of a triadic weight as in the previous case:

\[ \frac{dw^{vt}_{1\text{–}2,\{\overline{mqj},lpi\}}}{dt} = \tilde{x}_{3\text{–}4}\,\hat{x}_{1\text{–}2} \left\{ -D\left(w^{vt}_{1\text{–}2,\{\overline{mqj},lpi\}} - E\right) + \left(F - w^{vt}_{1\text{–}2,\{\overline{mqj},lpi\}}\right) d_{1\text{–}2,mqj,lpi}\, w^{v}_{\overline{mqj},lpi}\,\hat{x}_{mqj}\,\hat{x}_{lpi} \right\} \tag{52} \]

\[ \frac{dw^{vt}_{\{\overline{mqj},lpi\},1\text{–}2}}{dt} = \tilde{x}_{3\text{–}4}\, w^{v}_{\overline{mqj},lpi}\,\hat{x}_{mqj}\,\hat{x}_{lpi} \left\{ -D\left(w^{vt}_{\{\overline{mqj},lpi\},1\text{–}2} - E\right) + \left(F - w^{vt}_{\{\overline{mqj},lpi\},1\text{–}2}\right) d_{mqj,lpi,1\text{–}2}\,\hat{x}_{1\text{–}2} \right\} \tag{53} \]

Appendix B. Parameters and initial conditions

Parameters can be classified into five categories: node, weight, interaction and coupling, temporal, and network. Most of these parameters have values that are common across all experiments. These values are: $A = 1.5$; $B = 1.0$; $C = 0.0$; $D = 1.0$; $E = 1.0$; $F = 0.0$; $\theta = 20.0$; $\phi = 10.0$; $\theta^{r}_{l} = 0.05$; $\theta^{j}_{l} = 0.05$; $c^{r} = 0.0$; $c^{j} = 1.0$; $c^{h} = 1.0$; $a = 1.0$; $b = 2.0$; $e = 5.0$; $g = 1.0$; $d = 1.0$; $f = 5.0$; $t_{on} = 1.0$; $t_{step} = 25$; $t_{avg} = 25$; $L = 4$.

Input values are 1 (on) during the on time and 0 (off) otherwise. The maximum and minimum values for both node activities and weights are 1 and 0. Choices for the other parameters, such as the short-term and working memory decays and the interaction and coupling coefficients, are made by considering the steady-state values of the system variables. Short-term decay is typically higher than working memory decay. The saturation and quenching values are 1 and 0, respectively. The inversion value is chosen as 1. The quenching and saturation thresholds for temporally integrated activities are kept the same, with a value typically lower than the threshold for regular node activities. These thresholds for the different analogy experiments are as follows: for SCRY, 0.4 in Layer 1 and 0.2 in Layer 3; for SCFW, 0.5 in Layer 1, 0.3 in Layer 2, and 0.4 in Layer 3; for ARBY, 0.4 in Layers 1, 3, and 4, and 0.5 in Layer 2; for RCRS, 0.5 in Layers 1 and 2 and 0.4 in Layers 3 and 4; for ABCD, 0.5 in Layer 2 and 0.4 in Layer 4.
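For convenience, the common values above can be collected into a single configuration block (the key names are ours; the values are as printed):

```python
# Common parameter values of Appendix B gathered into one place
# (dictionary keys are ours; the paper lists them as bare symbols).

PARAMS = {
    "A": 1.5,   # node activity decay
    "B": 1.0,   # maximum node activity
    "C": 0.0,   # minimum node activity
    "D": 1.0,   # working memory weight decay (D < A)
    "E": 1.0, "F": 0.0,                  # weight bounds as printed
    "theta": 20.0, "phi": 10.0,          # sigmoid parameters, Eq. (27)
    "theta_r": 0.05, "theta_j": 0.05,    # quench/saturation thresholds
    "c_r": 0.0, "c_j": 1.0, "c_h": 1.0,  # quench/saturate/invert values
    "a": 1.0, "b": 2.0, "e": 5.0,        # excitation/inhibition/competition
    "g": 1.0, "d": 1.0, "f": 5.0,        # coupling coefficients
    "t_on": 1.0, "t_step": 25.0, "t_avg": 25.0,
    "L": 4,     # number of layers
}
```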
The triadic coupling coefficients are higher than the regular weight coupling coefficients because they involve products of four rather than two variables (all valued between 0 and 1), which need a higher boost to produce similar effects. Inhibitory interaction strengths are higher than excitatory ones. All variables are initially set to zero, except for the spatial and temporal long-term memory weights, which are a mixture of 0s and 1s varying with each experiment, as described in each of the network diagrams.

References

Anderson, J. A., Silverstein, J. W., Ritz, S. A., & Jones, R. S. (1977). Distinctive features, categorical perception, and probability learning: some applications of a neural model. Psychological Review, 84, 413–451.
Baddeley, A. (1986). Working memory. Oxford, UK: Oxford University Press.
Banquet, J.-P., Gaussier, P., Contreras-Vidal, J. L., Gissler, A., Burnod, Y., & Long, D. (1998). A neural network model of memory, amnesia, and cortico–hippocampal interactions. In R. W. Parks, D. S. Levine & D. L. Long (Eds.), Fundamentals of neural network modeling: neuropsychology and cognitive neuroscience (pp. 77–119). Cambridge, MA: MIT Press.
Bapi, R. S., & Levine, D. S. (1994). Modeling the role of the frontal lobes in sequential task performance. I. Basic structure and primacy effects. Neural Networks, 7, 1167–1180.
Bapi, R. S., & Levine, D. S. (1997). Modeling the role of the frontal lobes in sequential task performance. II. Classification of sequences. Neural Network World, 1, 3–28.
Barnden, J. A., & Holyoak, K. J. (1994). Analogy, metaphor, and reminding. Advances in connectionist and neural computation theory, Vol. 3. Norwood, NJ: Ablex.
Barnden, J. A., & Srinivas, K. (1992). Overcoming rule-based rigidity and connectionist limitations through massively-parallel case-based reasoning. International Journal of Man–Machine Studies, 36, 221–246.
Blank, D. S. (1997). Learning to see analogies: a connectionist exploration. Unpublished doctoral dissertation, Indiana University.
Bullock, D., & Grossberg, S. (1988). Neural dynamics of planned arm movements: emergent invariants and speed–accuracy properties during trajectory formation. Psychological Review, 95, 49–90.
Burnod, Y., Grandguillaume, P., Otto, I., Ferraina, S., Johnson, P. B., & Caminiti, R. (1992). Visuo-motor transformations underlying arm movements toward visual targets: a neural network model of cerebral cortical operations. Journal of Neuroscience, 12, 1435–1453.
Burns, B. D. (1996). Meta-analogical transfer: transfer between episodes of analogical reasoning. Journal of Experimental Psychology: Learning, Memory and Cognition, 22, 1032–1048.
Carpenter, G. A., & Grossberg, S. (1987). ART 2: Self-organization of stable category recognition codes for analog input patterns. Applied Optics, 26, 4919–4930.
Cohen, M. A., & Grossberg, S. (1987). Masking fields: a massively parallel architecture for learning, recognizing, and predicting multiple groupings of data. Applied Optics, 26, 1866–1891.
Cook, D. J. (1994). Defining the limits of analogical planning. In S. J. Hanson, G. A. Drastal & R. L. Rivest (Eds.), Computational theory and natural learning systems (Vol. 1, pp. 65–80). Cambridge, MA: MIT Press.
Dehaene, S., Changeux, J., & Nadal, J. (1987). Neural networks that learn temporal sequences by selection. Proceedings of the National Academy of Sciences, 84, 2727–2731.
Diamond, A. (1991). Frontal lobe involvement in cognitive changes during the first year of life. In K. Gibson, M. Konner & A. Patterson (Eds.), Brain and behavioral development.
Chicago, IL: Aldine Press.
Dormand, J. R., & Prince, P. J. (1980). A family of embedded Runge–Kutta formulae. Journal of Computational and Applied Mathematics, 6, 19–26.
Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: a critical analysis. In S. Pinker & J. Mehler (Eds.), Connections and symbols (pp. 3–71). Cambridge, MA: MIT Press.
Furl, N. O. (1999). Category induction and exception learning. Unpublished master's thesis, University of Texas at Arlington.
Fuster, J. M. (1997). The prefrontal cortex (3rd ed.). New York: Raven.
Gentner, D., Rattermann, M. J., & Forbus, K. D. (1993). The roles of similarity in transfer: separating retrievability and inferential soundness. Cognitive Psychology, 25, 524–575.
Goldman-Rakic, P. S. (1987). Circuitry of primate prefrontal cortex and regulation of behavior by representational memory. Handbook of Physiology, 5, 373–417.
Goodall, J. (1990). Through a window: my thirty years with the chimpanzees of Gombe. Boston, MA: Houghton Mifflin.
Goswami, U. (1991). Analogical reasoning: what develops? A review of research and theory. Child Development, 62, 1–22.
Goswami, U. (1992). Analogical reasoning in children. Hove, UK: Lawrence Erlbaum Associates.
Goswami, U. (1998). Cognition in children. Hove, UK: Psychology Press/Erlbaum.
Grossberg, S. (1982). Studies of mind and brain: neural principles of learning, perception, development, cognition and motor control. Dordrecht: Reidel.
Grossberg, S. (1987). Competitive learning: from interactive activation to adaptive resonance. Cognitive Science, 11, 23–63.
Grossberg, S. (1988). Neural networks and natural intelligence. Cambridge, MA: MIT Press.
Grossberg, S., & Mingolla, E. (1985). Neural dynamics of form perception: boundary completion, illusory figures and neon color spreading. Psychological Review, 92, 173–211.
Gruber, H. E., & Vonèche, J. J. (1995). The essential Piaget. Northvale, NJ: Jason Aronson.
Guigon, E., Dorizzi, B., Burnod, Y., & Schultz, W. (1995). Neural correlates of learning in the prefrontal cortex of the monkey: a predictive model. Cerebral Cortex, 5, 135–147.
Hammond, K. J. (1989). Case-based planning: viewing planning as a memory task. Boston, MA: Academic Press.
Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the theory of neural computation. Reading, MA: Addison-Wesley.
Hestenes, D. (1998). Modulatory mechanisms in mental disorders. In D. J. Stein & J. Ludik (Eds.), Neural networks and psychopathology (pp. 132–164). New York: Cambridge University Press.
Hofstadter, D. R., & the Fluid Analogies Research Group (1995). Fluid concepts and creative analogies: computer models of the fundamental mechanisms of thought. New York: Basic Books.
Holyoak, K. J., & Barnden, J. A. (1994). Analogical connections. Advances in connectionist and neural computation theory, Vol. 2. Norwood, NJ: Ablex.
Holyoak, K. J., & Thagard, P. (1995). Mental leaps: analogy in creative thought. Cambridge, MA: MIT Press.
Hummel, J. E., & Holyoak, K. J. (1997). Distributed representations of structure: a theory of analogical access and mapping. Psychological Review, 104, 427–466.
Indurkhya, B. (1991). Metaphor and cognition. Cambridge, MA: MIT Press.
Jani, N. G. (1991). Through the eyes of metaphor: analogy based learning. Unpublished master's thesis, University of Texas at Arlington.
Kant, J.-D. (1995). Categ_ART: a neural network for automatic extraction of human categorization rules. In ICANN'95 Proceedings, Vol. 2 (pp. 479–484).
Kant, J.-D., & Levine, D. S. (1998). RALF: a simplified neural network model of rule formation in the prefrontal cortex. Presentation at the 3rd International Conference on Computational Intelligence and Neuroscience, Research Triangle Park, NC.
Klopf, A. H. (1988). A neuronal model of classical conditioning. Psychobiology, 16, 85–125.
Kohonen, T. (1984). Self-organization and associative memory. Berlin: Springer.
Kosko, B. (1986). Differential Hebbian learning. In J. S. Denker, Neural networks for computing, AIP Conference Proceedings, Vol. 151 (pp. 265–270). New York: American Institute of Physics.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago, IL: University of Chicago Press.
Levine, D. S. (1991). Introduction to neural and cognitive modeling. Hillsdale, NJ: Lawrence Erlbaum Associates (second edition to appear in 2000).
Levine, D. S. (1996). Modeling dysfunction of the prefrontal executive system. In J. A. Reggia, E. Ruppin & R. S. Berndt, Neural modeling of brain and cognitive disorders (pp. 413–439). Singapore: World Scientific.
Long, D., & Garigliano, R. (1994). Reasoning by analogy and causality: a model and application. London: Routledge/Chapman and Hall.
von der Malsburg, C. (1973). Self-organization of orientation sensitive cells in the striate cortex. Kybernetik, 14, 85–100.
Mandler, J. (1990). Recall of events by preverbal children. Annals of the New York Academy of Sciences, 85, 485–516.
Meltzoff, A. (1990). Towards a developmental cognitive science: the implications of cross-modal matching and imitation for the development of representations and memory in infancy. Annals of the New York Academy of Sciences, 85, 1–37.
Minai, A. A., & Levy, W. B. (1993). Sequence learning in a single trial. In INNS World Congress on Neural Networks, Vol. 2 (pp. 505–508). Hillsdale, NJ: Lawrence Erlbaum Associates.
Mitchell, M. (1993). Analogy-making as perception: a computer model. Cambridge, MA: MIT Press.
Nigrin, A. (1993). Neural networks for pattern recognition. Cambridge, MA: MIT Press.
Plate, T. (1998). Analogy retrieval and processing with distributed representations. In K. J. Holyoak, D. Gentner & B. Kokinov, Advances in analogy research: integration of theory and data from the cognitive, computational, and neural sciences. NBU Series in Cognitive Science.
Rumelhart, D. E., McClelland, J. L., & the PDP Research Group (1986). Parallel distributed processing: explorations in the microstructure of cognition, Vols. 1 and 2. Cambridge, MA: MIT Press.
Shastri, L., & Ajjanagadde, V. (1993). From simple associations to systematic reasoning: a connectionist representation of rules, variables and dynamic bindings using temporal synchrony. Behavioral and Brain Sciences, 16, 417–494.
Simon, T., & Klahr, D. (1995). A computational theory of children's learning about number conservation. In T. Simon & G. S. Halford, Developing cognitive competence: new approaches to process modeling. Mahwah, NJ: Lawrence Erlbaum Associates.
Simon, T., Hespos, S. J., & Rochat, P. (1995). Do infants understand simple arithmetic, or only physics? Cognitive Development, 10, 253–269.
Sun, R. (1991). Integrating rules and connectionism for robust reasoning: a connectionist architecture with dual representation. Unpublished doctoral dissertation, Brandeis University.
Thompson, R. K. R., Oden, D. L., & Boysen, S. T. (1997). Language-naive chimpanzees (Pan troglodytes) judge relations between relations in a conceptual matching-to-sample task. Journal of Experimental Psychology: Animal Behavior Processes, 23, 31–43.
Vosniadou, S., & Ortony, A. (1989). Similarity and analogical reasoning. New York: Cambridge University Press.
Werbos, P. J. (1993). The roots of backpropagation: from ordered derivatives to neural networks and political forecasting. New York: Wiley.
Wharton, C. M., & Grafman, J. G. (1998). Deductive reasoning and the brain. Trends in Cognitive Sciences, 2, 54–59.
Wharton, C. M., Grafman, J. G., Flitman, S. K., Hansen, E. K., Brauner, J., Marks, A. R., & Honda, M. (1998). The neuroanatomy of analogical reasoning. In K. J. Holyoak, D. Gentner & B. Kokinov, Advances in analogy research: integration of theory and data from the cognitive, computational, and neural sciences. Proceedings of Analogy'98 Workshop, Sofia, Bulgaria (pp. 260–269).