Evol. Intel. DOI 10.1007/s12065-014-0121-7

SPECIAL ISSUE

An evolutionary cognitive architecture made of a bag of networks

Alexander W. Churchill · Chrisantha Fernando

Received: 15 September 2014 / Revised: 1 November 2014 / Accepted: 4 November 2014
© Springer-Verlag Berlin Heidelberg 2014

A. W. Churchill (✉) · C. Fernando
School of Electronic Engineering and Computer Science, Queen Mary, University of London, London, UK
e-mail: [email protected]

Abstract A cognitive architecture is presented for modelling some properties of sensorimotor learning in infants, namely the ability to accumulate adaptations and skills over multiple tasks in a manner that allows recombination and re-use of task-specific competences. The control architecture consists of a population of compartments (units of neuroevolution), each containing networks capable of controlling a robot with many degrees of freedom. The nodes of each network undergo internal mutations, and the networks undergo stochastic structural modifications, constrained by a mutational and recombinational grammar. The nodes consist of dynamical systems such as dynamic movement primitives, continuous time recurrent neural networks, and high-level supervised and unsupervised learning algorithms. Edges in the network represent the passing of information from a sending node to a receiving node. The networks in a compartment operate in parallel and encode a space of possible subsumption-like architectures that are used to successfully evolve a variety of behaviours for a NAO H25 humanoid robot.

Keywords Cognitive architecture · Darwinian neurodynamics · Open-ended evolution · Robotics

1 Introduction

Learning a controller for a complex multiple-degree-of-freedom device such as a child or a robot is a non-trivial problem, and the principles by which development achieves goals with such high-dimensional systems have been the subject of considerable investigation, from the early work of Bernstein [54] and the dynamical systems approach of Thelen and Smith [64] to the robotics experiments of Schaal's group on dynamic movement primitives [50]. A fascinating and critical property of developmental systems is their ability to accumulate adaptations without catastrophic forgetting, a topic recently highlighted by [39] as being highly suggestive of a role for structural plasticity in the brain along constructivist principles [8]. A holy grail in developmental robotics would be to achieve open-ended adaptation over long periods of time, with the same cognitive architecture able to bootstrap prior knowledge and competences to achieve more and more complex tasks.

In this paper we present a cognitive architecture formed of parallel cyclic networks of potentially complex nodes. These nodes can range from simply outputting a constant value to encapsulating a voice recognition unit. The composition of these networks is produced using an evolutionary approach, a practice first introduced by Husbands et al. in 1993 [28]. A key motivation behind evolutionary robotics is to produce complex, robust and adaptable behaviour through feedback from the environment. Using evolution, it is possible for complex behaviour to emerge from interactions among several parts, which could have extremely complicated or chaotic dynamics. For example, Koos et al. produced a highly robust robot that can adapt to unforeseen circumstances, such as damage to a leg [31]. In this paper experiments are performed on a humanoid robot, which is challenging because of its many degrees of freedom.
Evolutionary techniques have been used to produce reaching (e.g. [41, 48]), dancing (e.g. [16]) and bipedal locomotion (e.g. [17, 18]) behaviour in humanoid robots.

Various attempts have been made to "scale up" the behaviour of robotic controllers to produce a growing library of learned competencies, for example through subsumption [4] and incremental evolution [26]. In the former case, independent behaviours are encoded in separate modules, which are activated by sensory conditions. Koza used genetic programming (GP) to evolve a subsumption architecture with controllers based on rule-sets [32]. This was adapted to use modules made of neural network controllers in [66], but scaling up to high-dimensional robots has not been attempted. Incremental evolution has been demonstrated to be successful for one or two steps, e.g. in the system of Gomez and Miikkulainen [21], and recent work by Mouret and Doncieux has produced promising results by bootstrapping controllers based on behavioural diversity [43]. However, an open-ended accumulation of adaptation has not been achieved.

Beyond subsumption and incremental evolution, there have been attempts to evolve neural network controllers that can switch between behaviours in different contexts. Calabretta et al. presented an early example, evolving two different behaviours in two individual neural networks [7]. More recent work by Durr et al. also explicitly evolves two neural networks for a robotic task whose goal changes depending on the current context [15]. The modular approach, i.e. using multiple networks, was found to greatly outperform a single network. In the world of videogames, Schrum et al. evolved modular neural networks that produced multiple behaviours in Ms. Pac-Man [52].

The system presented here is influenced by the research outlined above and can implement a subsumption-like architecture through the interactions of disjoint graphs. It deviates from the reported work in a number of ways. The nodes are complex: a node could consist of a neural network, as in many of the examples above, but could also be any type of processing unit, such as a finite state machine. This is similar to [67], which evolves a subsumption architecture with several modules with predefined behaviours (e.g. wandering, recharging), where each module can consist of any structure capable of producing motor commands. The architecture presented in this paper differs in a key respect: in [67] only one module can be active at a time, while here many modules can operate in parallel. This enables complex control of a high-dimensional robot, and also allows modules to be active that do not directly affect the environment. For example, the ability to build models could be a key component of open-ended adaptation [24].

Many researchers have used models for solving robotic problems. Typically these are divided into forward models (predicting the next state given the current state and an action) and inverse models (predicting the action required to reach a desired state from the current state) [30]. These are generally learnt through machine learning techniques such as Gaussian process regression (e.g. [60]), Bayesian networks (e.g. [12]), symbolic regression (e.g. [42]), learning classifier systems (e.g. [6]) and reinforcement learning (e.g. [2]). Recent work by Bellas et al. builds models with evolutionary techniques [3] and shows that this is especially useful in non-stationary environments.
Many researchers have used models as a driver for intrinsic motivation or artificial curiosity, where a learning machine autonomously creates its own goals, i.e. there is no explicit external task set by a human experimenter [1]. One method is to set up reward functions that maximise learning progress, for example Schmidhuber's function that maximises the change in compressibility over time [51] and Oudeyer et al.'s maximisation of prediction error [44]. These functions drive the learning agent to discover something new about the world, while refraining from attempting to model regions with no or very little information, such as noise. Although not implemented in the experiments in this paper, we describe below how the presented system could be used to evolve its own tasks.

There are also similarities between the architecture presented in this paper and learning classifier systems (LCS). While there are many different varieties of classifier system, the canonical "Michigan-style" LCS evolves a set of classifiers consisting of a condition, an action and a prediction (a detailed introduction to LCS can be found in [5]). LCS have been applied to learning agents both in simulation, e.g. the early implementation of Wilson's ZCS in [9] and Butz and Herbort's application of XCSF to a robotic arm [6], and on real robots, such as Hurst and Bull's work using ZCS and neural classifiers [27] and Studley and Bull's work using an extension of XCS [59]. The work on LCS is very relevant to the system presented here. The controller graphs evolved here have an input condition that, if satisfied, activates the graph (allows input to flow downstream) and produces an output that can be interpreted as an action, similar to the rules in an LCS such as ZCS. The literature most closely related to the work presented here is Dorigo and Colombetti's robot shaping using hierarchical LCS modules [14]. In their influential book, which experiments on a number of robotic problems, the hierarchies and architectures are designed by hand, and modules are incrementally trained and then frozen, in a similar manner to incremental evolution. In the work presented here, the architectures themselves can evolve. Hence, differently from most learning classifier systems, here the unit of evolution is a group of "classifiers" rather than a single rule. This group contains many graphs that can operate in parallel, meaning that there is both parallel message passing between graphs and parallel sensorimotor contingencies. This allows the system to control a real-world, high degree-of-freedom humanoid robot with continuous sensor and motor commands, while Dorigo and Colombetti concentrate on a discretised version of the world.

In this work we use a directed-graph-based control architecture capable of parallel processing, in which cyclic networks are allowed. The network is analogous to a neural network controller; however, the nodes can be much more complex than simple neurons. There is a long history of evolving parallel, potentially cyclic, directed graph representations. For example, Teller and Veloso used parallel evolution for data mining (the PADO algorithm) to solve supervised learning problems [62]. The same authors also introduced the first GP method for producing neural networks, called "Neural Programming" [63]. Poli used a parallel genetic programming system called PDGP (parallel distributed genetic programming) to evolve feedforward parallel programs to solve the XOR problem, amongst others [46].
Cartesian genetic programming [40] also evolves feedforward programs. The dominant paradigm for structural evolution of networks is now NEAT [56], a powerful system that maintains diversity, protects innovations, and carefully controls the dimensionality of the genotype. The HyperNEAT extension moves to an indirect encoding, which produces much better performance on large networks [55]. The system presented here differs from the above approaches in several respects. First, it is intended to be a cognitive architecture for the real-time control of robots; second, the nodes are intended to be complex machine learning algorithms, not arithmetic primitives or simple mathematical functions; third, the system is intended to be capable of controlling a humanoid robot. As in the existing systems, our nodes have multiple functions and transmit information between each other. Our motivation for using the graph representation is to eventually achieve compositionality, systematicity and productivity, because it is possible for specific node types to be physical symbol systems [20] if the rules for operating on them are grammatical. Previous attempts have been made to achieve grammar-based genetic programming [38] in tree structures, but not in cyclic, distributed, parallel graph structures of the types described above. The challenge is to discover an evolvable grammar that generates fit variants with high probability, a task not unlike that faced in language learning by fluid construction grammar [57]. The system could also be used to evolve fitness functions, providing a method for intrinsic motivation, a crucial component of open-ended cognition in robotic agents.

The structure of the paper is as follows. The network representation is described, followed by the stochastic grammatical variation operators and the overall evolutionary algorithm that controls the robot in real time by generating and selecting active units. Some preliminary adaptation results are presented, and the network structures responsible for the behaviour are described. We have not yet achieved the accumulation of adaptation, but we have shown that a variety of distinct tasks can be evolved with disjoint network representations that are suitable for recombination. Several critical problems are encountered and discussed, and an approach for making progress is sketched.

2 Methods

2.1 Nodes

The basic building block of the model is a node. It receives vector inputs, performs a processing operation, and sends vector messages to other nodes with some delay. When activated, nodes transmit this activity state to downstream nodes and remain active for a fixed period. Nodes can have internal states and are connected to create potentially disjoint directed graphs. A node is defined by the general tuple {t, i.d., I, O, P, f}, where t is the node type; i.d. is a unique identification number; I is a vector that represents the inputs to the node: the i.d.'s and/or raw sensory locations from which the inputs originate, and the delay associated with transmission from each i.d. listed; O is a vector that represents the outputs of the node: the i.d. and activation state of this node, and the outputs of the node function; P is a vector that represents the internal states of the node (all nodes have an activity state, and a time to remain active after turning on); and f is the transformation function or learning algorithm, converting inputs to outputs, possibly using and modifying the internal states.
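To make the tuple concrete, a minimal sketch in Python is given below. The class and field names (NodeType, uid, inputs) are our own illustrative choices, not taken from the authors' implementation; it is only a rendering of the {t, i.d., I, O, P, f} formalism.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Tuple

class NodeType(Enum):
    # illustrative categories drawn from Sect. 2.1
    SENSOR = auto()
    MOTOR = auto()
    PROCESSING = auto()
    LEARNING = auto()

@dataclass
class Node:
    t: NodeType                    # node type t
    uid: int                       # unique identification number (i.d.)
    inputs: List[Tuple[int, int]]  # I: (source i.d. or raw sensor location, delay) pairs
    f: Callable                    # transformation function or learning algorithm
    P: dict = field(default_factory=lambda: {"active": False, "time_active": 5})

    def step(self, x):
        """O: emit (own i.d., activation state, output of f); f may read
        and modify the internal states P."""
        return (self.uid, self.P["active"], self.f(self, x))
```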
We define several classes of nodes in order to make the above formalism more concrete. Each class contains various sub-classes of nodes. All nodes share the generic "activation" property: when a node becomes activated, it transmits this information in its output message.

Sensory nodes receive raw sensory input, or act as internal pattern generators that encode the amplitudes and frequencies of a vector of sine waves. Sensor nodes are the only initiator nodes for a graph, i.e. the only nodes that can initiate activation.

Motor nodes contain a vector of motors that they control. Their input vector is used to set the angles of these motors at each time step that the node is active, and a re-afferent copy or corollary discharge of the command angles is always output, as in [11]. If there is no input, a default motor angle for each motor is stored.

Processing nodes receive inputs only from other nodes. They include Euclidean distance nodes, which take a vector from their input nodes and calculate the distance to an internally stored parameter vector; a linear transform node, which takes the dot product of the input vector and an internally stored weight matrix to produce a vector of outputs; and an Izhikevich-neuron-based liquid state machine node [29].

A reinforcement learning node has a two-part input vector: a data part and a reward part. The data part consists of inputs from a set of other nodes, typically transform nodes or sensory nodes, e.g. signalling joint angles. The reward part must be a single scalar value obtained from a transform node; for example, this could signal the prediction error or the proximity to a desired state. The output of a learning node is a vector that will typically encode parameters for downstream motor nodes to use, or it might be a temporal difference error signal. Types of reinforcement learning node include a stochastic hill-climbing node, an actor-critic node, a simulated annealing node and a genetic algorithm node.

Supervised learning nodes undertake online supervised learning based on an input training vector and an input target vector. They perform function approximation (regression) or classification. Such a node is essential for efficiently learning models of the world, e.g. forward models or inverse models [23, 34, 53]. The type of model depends purely on the identity of the transform node that sends it the training and target vectors during training.

Unsupervised learning nodes take an input vector and compress it. For example, a K-means clustering node performs online clustering and outputs the class of each novel data point input to it; a principal component analysis node takes an input vector, has a parameter n, which is the number of principal components to output, and outputs the intensity of the first n principal components.

In the NAO setup there are also high-level nodes: a face recognition node, i.e. a sensor node that receives input from a camera and outputs a label and the (x, y) retinal coordinates of a face; and a speech recognition node, a sensor node that receives input from a microphone and outputs either the text or a label of the speech. Other high-level nodes in the NAO include walk nodes, motor nodes that control the legs to walk to a particular location and take high-level Cartesian instructions as input. To summarise, the repertoire of nodes is essentially unbounded and is limited only by their encoding into the framework.
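As one worked example of a processing node, the sketch below builds the transformation function f of a Euclidean distance node. The closure-based style and the message format are assumptions for illustration only, matching the illustrative Node class above.

```python
import math

def make_euclidean_distance_f(stored):
    """Build an f for a Euclidean distance node: the node outputs the
    distance between its input vector and an internally stored vector."""
    def f(node, x):
        return [math.sqrt(sum((a - b) ** 2 for a, b in zip(x, stored)))]
    return f

# e.g. measuring how far the current joint angles are from a target posture
dist_f = make_euclidean_distance_f([0.0, 1.2, -0.4])
print(dist_f(None, [0.1, 1.0, -0.5]))  # -> [0.2449...]
```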
In this manner, any combination of such nodes can form, enabling an open-ended evolution of complex controllers. This advantage is also a disadvantage: the number of node types is huge, and the search space of possible networks is enormous. This makes evolution in such a space extremely difficult, which is discussed in detail later in this paper.

2.2 Bags of networks

The unit of neuroevolution consists of a set of disjoint graphs of connected nodes in a compartment, or bag of networks, to which fitness is assigned as a unit. Activation of the networks in a compartment always starts at the sensory nodes; these activate in parallel the downstream processing nodes with the appropriate delays, which in turn are typically connected to motor nodes. There may be recurrent connections. The motivation for allowing disjoint graphs is that it enables a robot to be subject to potentially multiple parallel independent behavioural policies that together control the whole robot in parallel. By allowing graphs to be disjoint, a bag of potentially useful network motifs can be preserved silently and become utilised by the system in later variation events during evolution. This unusual nature of compartment representations will become clearer when the initialisation of networks and the variation operators acting on them are described below.

2.3 Evolvable action grammar

We now describe the variation operations, or graph rewrite rules [47].

De novo node constructor operators make new networks at initialisation, giving the robot a primary sensorimotor repertoire of reflexes. For example, such a network may consist of a single sensory node with a random number of sensory inputs connected to a linear transform node with a delay. The linear transform unit may have a random initial weight matrix sending output with random delays to three motor nodes. Each motor node may control one to four motors, randomly chosen. A complex constructor is also used throughout evolution as a macro-mutation device for inserting functional motifs that consist of random sensory-(linear transform)-motor triplets in a chain of activation.

Graph replicator operators produce perfect copies of graphs. These are required for replication and evolution of compartments (bags of networks).

Graph connectivity operators influence only the connectivity between already existing nodes within a single graph, i.e. the messages that nodes receive from other nodes within the graph, the delays with which the messages influence activation, and the time period for which a node is active after activation. For example, the connection deletor removes a message i.d. from the input vector of a node. A connection adder adds a new message i.d. of a node in the graph to the input vector of another node in the graph. For now, all nodes are allowed to connect to/from any other node type, but evolvability in future work will demand that this be constrained. A critical feature of these operators is that deletion of an edge may create two disjoint graphs that remain in the same compartment.
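A rough sketch of these two connection operators, acting on the input vectors of the illustrative Node objects sketched in Sect. 2.1 (again our own rendering, not the authors' code):

```python
import random

def connection_deletor(node):
    """Remove a randomly chosen message i.d. (and its delay) from a node's
    input vector; this may leave two disjoint graphs in the compartment."""
    if node.inputs:
        node.inputs.pop(random.randrange(len(node.inputs)))

def connection_adder(node, graph_nodes, max_delay=10):
    """Add the message i.d. of another node in the graph, with a random
    transmission delay, to this node's input vector."""
    source = random.choice(graph_nodes)
    node.inputs.append((source.uid, random.randint(1, max_delay)))
```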
Node duplication operators make copies of nodes within a single graph. The parallel node replicator takes a node and copies it entirely, along with its input and output links, such that the copy exists in parallel with its parent node, with the same inputs and outputs. The serial node replicator also copies a single node, but instead of copying the inputs and outputs of the parent it makes the output of the parent the input of the offspring node, and connects the offspring to those nodes to which the parent was connected, i.e. it inserts a copy of the parent node downstream of the parent. A node deletor operator removes a node along with its input and output edges, thus potentially disconnecting a graph into two (or more) components. All these operators act on the message i.d. list in the input vector of a node; connectivity is thus encoded at the input end of nodes, not at the output end.

Intra-node operators modify the sensory inputs, the motors controlled, the transfer functions and the internal states of nodes. Intra-node mutations depend on the specific node type. Intra-node operators do not mutate the type of the node. This greatly restricts the mutability of the genome, but increases its evolvability.

Multi-graph recombinators take N networks and construct a network derived from properties of the N parent networks. In analogy with genetic programming [33], a standard operator is the branch crossover operator, which is defined only on trees and not on recurrent graphs. This chooses two nodes and crosses them over along with any downstream nodes to which they are connected. Motif crossover operators allow crossover only between functionally isomorphic digraph motifs, in order to prevent crossover from being destructive due to the competing conventions problem.

2.4 Intrinsic motivation

Robot goals are usually set by an external user. At one extreme, "dumb" robots' goals and actions are hand-designed, allowing no flexibility at all. Optimal control algorithms that enable the automatic computation of (approximate) solutions that achieve a given task are a more robust and efficient approach [36, 65]. Machine learning algorithms, such as reinforcement learning, enable the robot to learn what actions to take in order to solve a specific game or goal [37, 61]. Finally, at the other extreme, curious robots learn to act and move in order to explore their environment, i.e. they learn to map and categorise it autonomously [2, 13, 22, 45]. However, in all these examples the goals or tasks are set by the external user; even curious robots' goals are set, i.e. go and learn. As described above, this can be accomplished by rewarding learning progress, such as by maximising prediction error. This could encourage a learning agent to explore unseen parts of the environment.

A goal node is a special class of node that receives input vectors and outputs a scalar. Using a goal node as the sink node in a network consisting of the nodes described above (with the exception of motor nodes), fitness functions can be encoded. In all of the experiments below, the fitness functions have been encoded as goal graphs. The co-evolution of goal networks and action networks has the potential to produce open-ended behaviour by allowing the agent to produce its own tasks to develop skills and competencies. This is the next step for the architecture and is not presented in the results below.

2.5 Evolutionary algorithm

The evolutionary algorithm consists of a population of 10–100 units of evolution (bags of networks). A binary-tournament microbial steady-state genetic algorithm is used to select pairs of units, which are evaluated sequentially on the robot [25]. The fitness of the two units is compared, and the winner overwrites the loser, with mutation and recombination as described in the operator section above. This continues until the behaviour is interesting, or the robot breaks or overheats.
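A minimal sketch of this selection scheme, assuming evaluate runs a compartment on the robot and returns its fitness, and vary applies the mutation and recombination operators of Sect. 2.3 (both hypothetical helpers here):

```python
import copy
import random

def microbial_ga(population, evaluate, vary, n_tournaments):
    """Steady-state binary-tournament microbial GA [25]: the loser of each
    tournament is overwritten by a varied copy of the winner."""
    for _ in range(n_tournaments):
        a, b = random.sample(range(len(population)), 2)
        fa, fb = evaluate(population[a]), evaluate(population[b])
        winner, loser = (a, b) if fa >= fb else (b, a)
        population[loser] = vary(copy.deepcopy(population[winner]))
    return population
```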
3 Results

This section should be read while referring to the accompanying videos in the Supplementary Material (available at http://www.alexchurchill.com/papers/evin2014). Simulations are conducted in Webots for NAO, and real-world experiments are conducted on the NAO robot itself.

3.1 Maximising distance travelled by a humanoid robot

In the first experiment, the control networks are evolved to maximise the distance travelled by the NAO during a single episode of 100 time steps. The fitness function is

f(x) = \sqrt{(x_{t=1} - x_{t=100})^2 + (y_{t=1} - y_{t=100})^2}    (1)
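Assuming the robot's planar position is logged at each of the 100 time steps, Eq. (1) amounts to the straight-line displacement between the first and last positions, e.g.:

```python
import math

def distance_fitness(xs, ys):
    """Eq. (1): Euclidean distance between the robot's positions
    at t = 1 and t = 100."""
    return math.sqrt((xs[0] - xs[-1]) ** 2 + (ys[0] - ys[-1]) ** 2)
```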
In order to test the efficacy of the intra-node mutations, an initial graph de novo constructor was used. An example initial graph can be seen in Fig. 1; it is formed of three nodes: a sensor that is used for activation only, i.e. with no output, and two motor nodes, which operate the same motors. The activation and delay times were initialised to create an oscillating effect by alternating between motor angles set in opposite directions in each motor node.

Fig. 1 Showing a hand-designed controller graph, where sensor nodes (pentagrams) provide activation signals only. Edge numbers indicate delay times, and the integer within a motor node (rectangle) denotes the number of steps it remains active for. Motors are labelled in the motor node (red). Motor key: HY: Head Yaw, HP: Head Pitch, LSP: L Shoulder Pitch, LSR: L Shoulder Roll, LEY: L Elbow Yaw, LER: L Elbow Roll, LWY: L Wrist Yaw, LH: L Hand, LHYP: L Hip Yaw Pitch, LHR: L Hip Roll, LHP: L Hip Pitch, LKP: L Knee Pitch, LAP: L Ankle Pitch, LAR: L Ankle Roll, RHYP: R Hip Yaw Pitch, RHR: R Hip Roll, RHP: R Hip Pitch, RKP: R Knee Pitch, RAP: R Ankle Pitch, RAR: R Ankle Roll, RSP: R Shoulder Pitch, RSR: R Shoulder Roll, REY: R Elbow Yaw, RER: R Elbow Roll, RWY: R Wrist Yaw, RH: R Hand

In the first evolutionary run of this system, only the motor angles and the delay and activation times of the action graphs were mutable, not the graph topology. This meant that fitness could only be improved by increasing the speed and strength of oscillations during the fixed time period. Each trial lasted for 50 time steps of 0.1 s, with the simulated NAO robot beginning from a resting supine position, as illustrated in Fig. 2.

Fig. 2 Showing the resting position of the NAO, to which it was reset at the start of each trial. Images show the same scene from different angles. a Front. b Side

Evolution was performed for 250 generations, each consisting of 10 binary tournaments and 20 evaluations. The fitness of the individuals at every 10 generations is shown in Fig. 4a. There is a clear, although not dramatic, increase in fitness (distance travelled) over the course of evolution. Fitness improvements occur after 100 and 120 generations, where the best solutions move from a distance of around 0.3 m to close to 0.4 m and then close to 0.5 m. The initial setup conditions create a largely symmetrical rocking motion that moves the robot slowly towards its left. The best behaviour after 50 generations rocks the robot with much greater force, achieved by raising the legs higher, which causes it to move further to the left over the course of a time trial. At 100 generations, the legs move higher and are raised for more time, which causes the robot to rotate further and thus increases the distance travelled. At 150 generations, it employs the same behaviour, moving to its right, but with the legs oscillating much more quickly. This final behaviour appears to be a local optimum for the robot.

Fig. 4 Showing the fitness of action bags of networks over 250 generations of evolution on the maximise distance task. a Activation and delay time and motor angle mutations only. b Activation and delay time, motor angle and motor mutations

In a second evolutionary run, the complexity of node mutations was increased to make motor identities mutable, but again with graph topology held fixed. The same starting conditions, i.e. the node structure and parameters, were used as in the first run. The fitness of individuals over the course of evolution is shown in Fig. 4b. Here, a similar performance to the first run is achieved after only 70 generations, and by 100 evaluations the system exceeds the best results found in the first run. After 50 generations the behaviour looks similar to that exhibited by the control structures found in the first run. The robot rotates from side to side, biased towards movements on its left side. The action graphs in Fig. 3 show that after 50 generations, five left-sided motors and three right-sided motors are employed. After 100 generations, the behaviour of the best compartment has changed. The left leg now extends quite far and then drags the robot towards its left side. Looking at the node structure, the second node is now composed entirely of left-sided motors. At 200 generations the right leg also extends and a stepping motion emerges. The behaviour of the best controller after 100 and 5,000 evaluations is shown in Video V1 in the Supplementary Material, both in simulation and on the real robot.

Fig. 3 Showing the graphical structure of the best controller graphs after different numbers of generations on the second evolutionary run of the maximise distance task. Sensor nodes (pentagrams) provide activation signals only. Edge numbers indicate delay times, and the integer within a motor node (rectangle) denotes the number of steps it remains active for. Refer to Fig. 1 for the motor key. a Gen 50. b Gen 100. c Gen 150. d Gen 200. e Gen 250

3.2 Evolving headstands in a humanoid robot

In the second experiment, bags of networks were evolved for 3,000 binary tournaments (6,000 evaluations), with all of the variation operators described in Sect. 2.3 applied in the evolutionary process. Fitness was obtained by maximising the sum of the z-accelerometer sensor in the torso of the NAO over the course of 150 time steps,

f(x) = \sum_{t=1}^{150} z_t    (2)

Maximal fitness would be attained by the robot standing on its head.

Fig. 5 Showing the fitness of controller graphs over 300 generations of evolution on the maximise z-accelerometer task

Figure 5 shows the fitness of each evaluated member of the population at each pseudo-generation of 10 binary tournaments. The bags of networks clearly increase in fitness over the course of evolution. There is an especially large jump over the first 50 generations, and then a more gradual but positive ascent throughout the rest of evolution. After 32 generations, however, there is a sudden drop in fitness.
This occurred because the NAO moved into a position on its side where it was balanced and unable to return to the rest position on its back; in this situation, compartments are effectively selected in a different environment. After 36 generations, the NAO moves out of this position and can reset normally again.

The bag of networks representing the fittest individual found at the end of evolution can be seen in Fig. 6. The compartment is made up of three disconnected graphs that get activated and produce movement over the course of a trial, labelled as Groups A–C, together with a small number of nodes that lie dormant (never activated). Among the dormant nodes there is a sensor node connected to a linear transform node that will be activated but will have no effect on the behaviour of the robot.

Fig. 6 Showing the compartment that represents the best individual found on the maximise z-accelerometer task. There are three disjoint connected graphs that get activated during a run, and these have been highlighted and labelled A, B and C. Pentagrams show sensor nodes with key denoted below, triangles represent linear transform nodes with a maximum of five outputs, diamonds Euclidean distance nodes, stars K-means nodes, and rectangles motor nodes with motors denoted inside (refer to Fig. 1 for the motor key). Sensor key: HY Head Yaw, LKP L Knee Pitch, RER R Elbow Roll, RHP R Hip Pitch, GY Gyroscope Y, ZA Accelerometer Z, DZ Distance Z

Table 1 Fitnesses of the individual and combined disjoint graphs from the best compartment found on the maximise z-accelerometer task, labelled as shown in Fig. 6

Group                                                          Fitness
A                                                              575
B                                                              459
C                                                              272
A+B                                                            772
A+C                                                            634
B+C                                                            590
A+B+C                                                          807
A+B+C (Z-accelerometer sensors changed to X-accelerometer)     467

The fitnesses obtained by the three separate active groups can be seen in Table 1, and the behaviour of each entry in the table can be seen in Video V2. The compartment containing all three groups performs best, and its fitness is considerably greater than that of any one group on its own, showing that the disjoint graphs interact to create a more adaptive behaviour than any single disjoint component of the bag of networks. The final robot stance obtained by this compartment can be seen in Fig. 7a. Group A, the largest disjoint graph, has the highest fitness when considered on its own. It causes both of NAO's legs to move, as well as its right arm, which causes it to move backwards into an arch-like position, raising its torso and so increasing its z-accelerometer reading. Group B also scores well by itself, mainly moving NAO's legs, with a larger emphasis on the right one. This leads it to move its torso backwards. Group C does not produce much movement when starting from a resting position. The combination of Groups A and B moves both legs far back underneath the torso, arching NAO backwards at a sharp angle, as can be seen in Fig. 7b. With all three groups together, the robot moves its legs even further backwards, drawing the torso even closer to a vertical position. The two highest-scoring groups (A and B) both make use of the z-accelerometer sensor (143 in Fig. 6). To test the impact of this sensor, the action graphs were modified to replace all occurrences of the z-accelerometer sensor with the x-accelerometer, which had the effect of oscillating the robot's torso from side to side, leading to a much reduced fitness score. The final stance produced by the modified compartment can be seen in Fig. 7c.
This demonstrates that a true closed-loop solution has been found.

An interesting variant of the above task involves initialising the robot from the prone position rather than the supine position. Figure 8 shows that evolving to maximise z from a prone position is much faster and more effective than from a supine position. Video V3 shows the best behaviour evolved after 1,400 evaluations. The solution is quite different, exploits the initial position of the NAO, and is elegantly simple.

Fig. 7 Showing the final position of the NAO after different disjoint graphs are activated together to control the robot, labelled as shown in Fig. 6. In a, graphs A, B and C are activated together; in b, graphs A and B are activated; in c, graphs A, B and C are activated with the z-accelerometer sensor substituted by the x-accelerometer. a A, B and C. b A and B. c A, B and C (X-accel substituted for Z-accel)

Fig. 8 Evolution to maximise the z-accelerometer from the prone position. a Evolutionary fitness dynamics, showing a punctuated increase in fitness. b The final pose obtained, which is of high fitness

3.3 Evolving bouncing behaviour in a humanoid robot on a Jolly Jumper

In the third set of experiments, a physical NAO robot is placed in a Jolly Jumper infant bouncer (see Fig. 9), which consists of a harness attached to a stand by a spring. This is influenced by Thelen's work [64], which describes how an infant placed in a bouncer discovers configurations of movement dynamics that provide acceptable solutions and then tunes these to produce efficient bouncing. The robot is placed in the harness and suspended with the tips of its feet firmly touching a cushion on the ground in the first experiment, and just touching the floor in the second. The fitness function maximises the first derivative of the z-accelerometer reading (z) at each time step (t),

f(x) = \sum_{t=1}^{T} |z_{t+1} - z_t|    (3)

Each trial lasts for T = 100 time steps of 100 ms each. At the end of the trial, the robot's joints are relaxed and it rests for 3 s. It may not return to the same position at the start of each trial, and the spring will not be completely damped from the previous evaluation, which introduces noise into the fitness evaluations.

Fig. 9 The NAO robot in a Jolly Jumper infant bouncer
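Given the z-accelerometer readings of a trial as a list, Eq. (3) can be computed as below (a sketch under the assumption that readings are sampled once per 100 ms time step):

```python
def bounce_fitness(z):
    """Eq. (3): sum of absolute first differences of the z-accelerometer,
    rewarding rapid vertical oscillation of the torso."""
    return sum(abs(z[t + 1] - z[t]) for t in range(len(z) - 1))
```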
3.3.1 Fixed topology

In the first experiment, a microbial GA [25] with a population of 10 is applied for 800 evaluations to a fixed graphical structure, evolving only the parameters. The compartment can be seen in Fig. 10 and consists of four dynamic movement primitive (DMP) [49] nodes, connected to the knee and ankle motors and receiving afferent signals from the motors as input. Additionally, two separate graphs take the force sensitive resistors (FSR) of each foot as input and output to the ankle and knee motors for their respective sides of the body. These graphs take control if the FSR signal is above an evolvable threshold.

Fig. 10 The fixed-topology compartment used in the first Jolly Jumper experiment. Pentagrams denote sensor nodes (LFSR left force sensitive resistor, RFSR right force sensitive resistor), triangles denote processing nodes (DMP dynamic movement primitive, LT linear transform), rectangles denote motor nodes. Disjoint graphs are separated by broken lines

Figure 11 shows the fitness of the population over the course of evolution. The run proceeds in several stages, which are shown in Video V4. Initially a kicking motion develops that enables fitness to increase quickly. After 120 evaluations, the feet get stuck on the cushion, and the robot is unable to use its previous motion. A new strategy emerges, moving the knee as far back as possible before kicking. After 200 evaluations, with the feet now unstuck, a new solution is found, moving the left knee far back and relying on a fast kicking motion in the right leg to bounce. After 350 evaluations, the cushion is removed, and the robot is now suspended above the ground at the start of each trial. Fitness falls sharply, as previous strategies relying on contact with the ground are much less successful. Fitness quickly improves, and solutions adapt to this new condition, with the knees travelling much greater distances during each oscillation. After 500 evaluations fitness returns to the previous level and is then quickly surpassed, with the best solutions achieving a fitness of over 9. This shows that the presented architecture is able to adapt quickly in a dynamic environment.

Fig. 11 Showing the fitness of solutions at every 10 evaluations, over 800 evaluations of the first Jolly Jumper experiment

3.3.2 Evolved topology

A second experiment explores a similar task but with an evolved topology. A more complex evolutionary process, inspired by NEAT [56], is employed. A population is seeded with 20 identical copies of a small graph consisting of a DMP node connected to the left knee and receiving an afferent signal as input. This is initialised to perform a basic oscillatory motion. As in NEAT, each node is assigned a unique id. An individual, i, is sequentially compared to each member, j, of each species using the similarity measure

s(i, j) = \frac{1}{4} \left[ s_s(i, j) + s_p(i, j) + s_m(i, j) + s_{shc}(i, j) \right]    (4)

where s_s(i, j), s_m(i, j) and s_p(i, j) are functions that return the number of shared sensors, motors and processing node functions respectively, and s_{shc}(i, j) returns the number of shared connections between nodes with the same unique id. If s(i, j) is below a threshold, s_thresh (0.5 in this experiment), i is assigned to the species of j and the search is halted. Otherwise, i is assigned to a new species. The fitness score of each individual is divided by the number of individuals in its species. A mutation count, mut_c, initialised at 0, is also assigned to each new individual and incremented whenever a mutation event occurs. Structural mutations are only permitted if mut_c > 3. In this way, innovations are protected and behavioural diversity is encouraged in the population. Additional variation operators pertaining to the DMP nodes are also included in this experiment: disjoint graphs containing a DMP have an explicit probability of duplicating, and a DMP in one graph can be replicated to replace an existing DMP in another.
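A sketch of this species-assignment rule and fitness sharing, with the similarity function s of Eq. (4) assumed to be supplied:

```python
def assign_species(individual, species, s, s_thresh=0.5):
    """Compare individual i sequentially to each member j of each species;
    join the first species where s(i, j) < s_thresh, else found a new one."""
    for members in species:
        if any(s(individual, j) < s_thresh for j in members):
            members.append(individual)
            return
    species.append([individual])

def shared_fitness(raw_fitness, species_size):
    """Fitness sharing: each individual's score is divided by the size of
    its species, protecting innovations in small species."""
    return raw_fitness / species_size
```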
Evolution was again run directly on a real NAO robot, over the course of 11 h. Figure 12 shows the fitness of each individual in the population after each evaluation. There is a clear improvement in competence on this task from the individuals in the initial population to those at the end of the run. The initial behaviour with which the population was seeded provided an acceptable fitness of between 3 and 5. After around 500 evaluations, structural mutations begin to have a noticeable effect on fitness. There is a climb in fitness from evaluations 500 to 1,000, where the best individuals improve from a fitness of 5 to 6.5, and again from evaluations 1,100 to 1,500, where the fitness of the best increases to around 8.5.

Fig. 12 Showing the fitness of all solutions over 11 h of evaluations on the second Jolly Jumper experiment

Figure 13 shows the compartment of the best individual from the final population. As with the compartment shown in Fig. 6, it contains redundancy, with several graphs lacking sensors or motors. The main driving force is a single DMP, which is used to control the left hip, right hip, left shoulder and right knee. A second graph also controls the right knee; it can reset actions sent from the DMP and create additional oscillatory movements. This solution produces rapid kicking motions, enabling the robot to move quickly along the longitudinal axis and thus attain fast changes in its z-accelerometer sensor.

Fig. 13 Showing the best compartment found at the end of evolution on the second Jolly Jumper experiment. Pentagrams denote sensor nodes, triangles denote processing nodes (DMP dynamic movement primitive, LT linear transform), rectangles denote motor nodes. Disjoint graphs are separated by broken lines

4 Discussion

An outline and some preliminary investigations of a modular and evolvable control architecture for high-dimensional robotic systems have been described. The results show that, using this architecture, interesting behaviour can emerge on a number of tasks using only a small number of evaluations. The majority of the paper describes the generative operators we have chosen as a first attempt to study ontogenic cognitive evolvability. We have only skimmed the surface of the node types and variability operators that are possible within this general formulation. This architecture opens an entire research programme on what kinds of operator and node allow lifetime evolvability of adaptive behaviour, i.e. on the grammar of thought and the grammar of action. Our work can be seen as a fluid action grammar, akin to the work of Luc Steels on fluid construction grammar for language but in the domain of behaviour [19]. A related approach is the "options framework" [58]; however, no algorithm has so far been proposed that is capable of effective stochastic search over option space. Our framework is richer than the options framework and does show how stochastic search over an unlimited behaviour space could take place.

The unlimited behaviour space is both a blessing and a curse for the architecture in its present form. As seen in the experiments that allowed free topologies for the headstand and Jolly Jumper tasks, the evolved networks can suffer from redundant nodes and bloated topologies. The methods inspired by NEAT described in Sect. 3.3.2 encourage diversity and the protection of innovations that must be given time to develop, and they place constraints that encourage slow growth. However, this alone does not seem to be enough to prevent bloat, and it is clear that extra constraints must be added to the system. Several constraints should be explored further. For example, one could impose restrictions on which nodes can be connected together, or weight the probabilities of new connecting nodes, to make certain node combinations more likely to occur. Putting a cost on adding connections and nodes could go a long way towards preventing bloated structures and redundant nodes. This regularisation could also produce more modular networks [10]. Learning classifier systems such as XCS [68] have been designed to produce specialist classifiers.
Although they do not deal with bloat and redundancy in terms of the structure of graphs, they can prevent several graphs from responding to the same sensory input and producing similar actions. An interesting extension to the current work would be to use the nodes and variation operators described in this paper to evolve graphical classifiers in an XCS-style system. While this could create complex controllers, a system would have to be designed to allow several classifiers to operate in parallel, perhaps by extending Dorigo and Colombetti's work [14].

Other constraints on node types and variation operators can be imposed from the child development literature, especially the work under the umbrella of Esther Thelen's dynamical systems approach [64]. For example, we know how rhythmic movements are assembled from component oscillators in models that fit child motor control. Motor synergies of various kinds limit the degrees of freedom in a control problem, for example fixed joint velocity ratios between shoulder and elbow. There is also progressive uncoupling of joint rotations, i.e. freezing and unfreezing of couplings, which could be introduced into the system.

As well as improving the system by making it more efficient and evolvable and extending the library of node types, there are two main areas that should be explored with the architecture described in this paper: (1) the accumulation of adaptive behaviours and (2) the co-evolution of behaviours and tasks. For the first issue, we have been exploring the use of an archive to store learnt behaviours. Both individual graphs and full bags of networks could be added to a solution from the archive over the course of evolution. In this way learnt skills can be remembered without catastrophic forgetting and integrated into new solutions. As described in Sect. 2.4 above, tasks can be evolved in the framework through the use of special goal nodes, and all fitness functions in the experiments above have been encoded in this way. An idea currently being explored is the co-evolution of controller and task compartments. A population (or island) of controllers is assessed based on its performance on a particular task. By having several islands, each with a different goal network, we are able to assess and evolve the goal networks themselves. For example, a binary tournament could take place between two islands, with the goal network and/or population of the winner replacing the loser, with mutation of the goal network. The difficulty is finding an appropriate function with which to evaluate the goal networks. Initially we have tried competence progress: the improvement in fitness of controllers from the start to the end of a fixed number of evolutionary events. This was found to lead to trivial tasks, as goal networks could be trivially modified to enable easy adaptation by the controllers. Additional measures are certainly needed, with novelty being a promising candidate [35].

5 Conclusion

A system has been introduced that allows complex control motifs to be evolved, producing interesting behaviour on a number of tasks, demonstrated both in simulation and on a physical robot. It remains to be seen whether the architecture is capable of the accumulation of adaptation and transfer learning across multiple tasks through the use of an archive of control motifs. A challenge for the utility of the architecture is simultaneously its strength, i.e. in its present form it is too rich.
A number of methods for constraining the system have been outlined, and a detailed investigation of their integration into the architecture is needed. One aspect currently missing from the architecture is gating and channelling at the inputs and outputs. Basic switching has been implemented, but evolvable control flow operations are also needed. This adds an even greater complexity to the evolvability problem. The system supports the evolution of tasks (fitness functions) as well as control structures, and so it is possible for it to be applied as an evolutionary method for intrinsic motivation. If the issues surrounding bloated networks can be controlled and an evaluation method introduced to evolve non-trivial goal tasks, the architecture described in this paper could be used to produce open-ended cognition.

Acknowledgments The work is funded by the FQEB Templeton grant "Bayes, Darwin and Hebb" and the FP-7 FET OPEN grant INSIGHT.

References

1. Baldassarre G, Mirolli M (2013) Intrinsically motivated learning in natural and artificial systems. Springer, New York
2. Baranes A, Oudeyer PY (2013) Active learning of inverse models with intrinsically motivated goal exploration in robots. Robot Auton Syst 61(1):49–73
3. Bellas F, Duro RJ, Faiña A, Souto D (2010) Multilevel Darwinist brain (MDB): artificial evolution in a cognitive architecture for real robots. IEEE Trans Auton Mental Dev 2(4):340–354
4. Brooks RA (1990) Elephants don't play chess. Robot Auton Syst 6(1):3–15
5. Bull L, Kovacs T (2005) Foundations of learning classifier systems, vol 183. Springer, New York
6. Butz MV, Herbort O (2008) Context-dependent predictions and cognitive arm control with XCSF. In: Proceedings of the 10th annual conference on genetic and evolutionary computation, ACM, New York, pp 1357–1364
7. Calabretta R, Nolfi S, Parisi D, Wagner GP (2000) Duplication of modules facilitates the evolution of functional specialization. Artif Life 6(1):69–84
8. Chklovskii D, Mel B, Svoboda K (2004) Cortical rewiring and information storage. Nature 431:782–788
9. Cliff D, Ross S (1994) Adding temporary memory to ZCS. Adapt Behav 3(2):101–150
10. Clune J, Mouret JB, Lipson H (2013) The evolutionary origins of modularity. Proc R Soc B Biol Sci 280(1755):20122863
11. Crapse TB, Sommer MA (2008) Corollary discharge across the animal kingdom. Nat Rev Neurosci 9(8):587–600
12. Dearden A, Demiris Y (2005) Learning forward models for robots. In: IJCAI, vol 5, p 1440
13. Der R, Martius G (2012) The playful machine. Cognitive systems monographs. Springer, New York
14. Dorigo M, Colombetti M (1998) Robot shaping: an experiment in behavior engineering. MIT Press, Cambridge
15. Durr P, Mattiussi C, Floreano D (2010) Genetic representation and evolvability of modular neural controllers. IEEE Comput Intell Mag 5(3):10–19
16. Eaton M (2013) An approach to the synthesis of humanoid robot dance using non-interactive evolutionary techniques. In: IEEE international conference on systems, man, and cybernetics (SMC), pp 3305–3309. IEEE
17. Eaton M, Davitt TJ (2007) Evolutionary control of bipedal locomotion in a high degree-of-freedom humanoid robot: first steps. Artif Life Robot 11(1):112–115
18. Farchy A, Barrett S, MacAlpine P, Stone P (2013) Humanoid robots learning to walk faster: from the real world to simulation and back. In: Proceedings of the 2013 international conference on autonomous agents and multi-agent systems, pp 39–46. International Foundation for Autonomous Agents and Multiagent Systems
19. Fernando C, Zachar I, Szathmáry E (2010) Linguistic constructions as neuronal replicators. In: Steels L (ed) Fluid construction grammar. John Benjamins, Oxford
20. Fodor J, Pylyshyn Z (1988) Connectionism and cognitive architecture: a critical analysis. Cognition 28:3–71
21. Gomez F, Miikkulainen R (1997) Incremental evolution of complex general behavior. Adapt Behav 5(3–4):317–342
22. Gordon G, Ahissar E (2012) A curious emergence of reaching. Lecture notes in computer science, vol 7429, chap 1. Springer, Berlin, pp 1–12
23. Gordon G, Ahissar E (2012) Hierarchical curiosity loops and active sensing. Neural Netw 32:119–129
24. Haruno M, Wolpert D, Kawato M (2001) MOSAIC model for sensorimotor learning and control. Neural Comput 13:2201–2220
25. Harvey I (2011) The microbial genetic algorithm. In: Advances in artificial life. Darwin meets von Neumann. Springer, New York, pp 126–133
26. Harvey I, Husbands P, Cliff D (1994) Seeing the light: artificial evolution, real vision. School of Cognitive and Computing Sciences, University of Sussex, Falmer
27. Hurst J, Bull L (2006) A neural learning classifier system with self-adaptive constructivism for mobile robot control. Artif Life 12(3):353–380
28. Husbands P, Harvey I, Cliff D (1993) An evolutionary approach to situated AI. In: Proceedings of the 9th bi-annual conference of the society for the study of artificial intelligence and the simulation of behaviour (AISB 93), pp 61–70
29. Izhikevich EM (2006) Polychronization: computation with spikes. Neural Comput 18(2):245–282
30. Jordan MI, Rumelhart DE (1992) Forward models: supervised learning with a distal teacher. Cogn Sci 16(3):307–354
31. Koos S, Cully A, Mouret JB (2013) Fast damage recovery in robotics with the T-resilience algorithm. Int J Robot Res 32(14):1700–1723. doi:10.1177/0278364913499192
32. Koza JR (1992) Evolution of subsumption using genetic programming. In: Proceedings of the 1st European conference on artificial life, pp 110–119
33. Koza JR (1999) Genetic programming III: Darwinian invention and problem solving. Morgan Kaufmann, San Francisco
34. Lalazar H, Vaadia E (2008) Neural basis of sensorimotor learning: modifying internal models. Curr Opin Neurobiol 18(6):573–581
35. Lehman J, Stanley KO (2011) Abandoning objectives: evolution through the search for novelty alone. Evol Comput 19(2):189–223
36. Lewis FL, Vrabie D, Syrmos VL (2012) Optimal control, 3rd edn. Wiley, Hoboken
37. Li H, Liao X, Carin L (2009) Multi-task reinforcement learning in partially observable stochastic environments. J Mach Learn Res 10:1131–1186
38. McKay RI, Hoai NX, Whigham PA, Shan Y, O'Neill M (2010) Grammar-based genetic programming: a survey. Genet Program Evol Mach 11(3–4):365–396
39. Miller JF, Khan GM (2011) Where is the brain inside the brain? Memet Comput 3(3):217–228
40. Miller JF, Thomson P (2000) Cartesian genetic programming. In: Genetic programming. Springer, New York, pp 121–132
41. Mohamed Z, Kitani M, Kaneko S, Capi G (2014) Humanoid robot arm performance optimization using multi objective evolutionary algorithm. Int J Control Autom Syst 12(4):870–877
42. Moriguchi H, Lipson H (2011) Learning symbolic forward models for robotic motion planning and control. In: Proceedings of the European conference on artificial life (ECAL 2011), pp 558–564
43. Mouret JB, Doncieux S (2009) Overcoming the bootstrap problem in evolutionary robotics using behavioral diversity. In: IEEE congress on evolutionary computation, 2009. CEC'09, pp 1161–1168. IEEE
44. Oudeyer PY, Kaplan F, Hafner VV (2007) Intrinsic motivation systems for autonomous mental development. IEEE Trans Evol Comput 11(2):265–286
45. Pape L, Oddo CM, Controzzi M, Cipriani C, Förster A, Carrozza MC, Schmidhuber J (2012) Learning tactile skills through curious exploration. Front Neurorobot 6
46. Poli R (1996) Parallel distributed genetic programming. Citeseer
47. Rozenberg G (ed) (1997) Handbook of graph grammars and computing by graph transformation, vol 1: Foundations. World Scientific, Singapore
48. Savastano P, Nolfi S (2013) A robotic model of reaching and grasping development
49. Schaal S (2006) Dynamic movement primitives: a framework for motor control in humans and humanoid robotics. In: Adaptive motion of animals and machines. Springer, New York, pp 261–280
50. Schaal S, Peters J, Nakanishi J, Ijspeert A (2005) Learning movement primitives. In: Robotics research. Springer, New York, pp 561–572
51. Schmidhuber J (2009) Driven by compression progress: a simple principle explains essential aspects of subjective beauty, novelty, surprise, interestingness, attention, curiosity, creativity, art, science, music, jokes. In: Anticipatory behavior in adaptive learning systems. Springer, New York, pp 48–76
52. Schrum J, Miikkulainen R (2014) Evolving multimodal behavior with modular neural networks in Ms. Pac-Man. In: Proceedings of the genetic and evolutionary computation conference (GECCO 2014), Vancouver, BC, Canada, pp 325–332. http://nn.cs.utexas.edu/?schrum:gecco2014. Best Paper: Digital Entertainment and Arts
53. Shadmehr R (2004) Generalization as a behavioral window to the neural mechanisms of learning internal models. Hum Mov Sci 23:543–568
54. Sporns O, Edelman GM (1993) Solving Bernstein's problem: a proposal for the development of coordinated movement by selection. Child Dev 64:960–981
55. Stanley KO, D'Ambrosio DB, Gauci J (2009) A hypercube-based encoding for evolving large-scale neural networks. Artif Life 15(2):185–212
56. Stanley KO, Miikkulainen R (2002) Evolving neural networks through augmenting topologies. Evol Comput 10(2):99–127
57. Steels L, De Beule J (2006) Unify and merge in fluid construction grammar
58. Stolle M, Precup D (2002) Learning options in reinforcement learning. Lecture notes in computer science, vol 2371, chap 16. Springer, Berlin, pp 212–223
59. Studley M, Bull L (2005) X-TCS: accuracy-based learning classifier system robotics. In: The 2005 IEEE congress on evolutionary computation, vol 3, pp 2099–2106. IEEE
60. Sturm J, Plagemann C, Burgard W (2008) Unsupervised body scheme learning through self-perception. In: IEEE international conference on robotics and automation, 2008. ICRA 2008, pp 3328–3333. IEEE
61. Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. Adaptive computation and machine learning. MIT Press, Cambridge, MA
62. Teller A, Veloso M (1995) Program evolution for data mining. Int J Exp Syst Res Appl 8:213–236
63. Teller A, Veloso M (1996) Neural programming and an internal reinforcement policy. In: Late breaking papers at the genetic programming 1996 conference, pp 186–192. Citeseer, New Jersey
64. Thelen E (1995) Motor development: a new synthesis. Am Psychol 50(2):79
65. Todorov E, Jordan MI (2002) Optimal feedback control as a theory of motor coordination. Nat Neurosci 5(11):1226–1235. doi:10.1038/nn963
66. Togelius J (2004) Evolution of a subsumption architecture neurocontroller. J Intell Fuzzy Syst 15(1):15–20
67. Urzelai J, Floreano D, Dorigo M, Colombetti M (1998) Incremental robot shaping. Connect Sci 10(3–4):341–360
68. Wilson SW (2000) Get real! XCS with continuous-valued inputs. In: Learning classifier systems. Springer, New York, pp 209–219