Hierarchical Neural Networks for Behavior-Based Decision Making

Undergraduate Honors Thesis
David Robson
Department of Computer Sciences
University of Texas at Austin
[email protected]
Supervising Professor: Risto Miikkulainen
May 10, 2010

Abstract

This paper introduces the concept of Hierarchical Neural Networks as a viable strategy for behavior-based decision making in certain contexts. The term Hierarchical Neural Network, or HNN, refers in this case to a system in which multiple neural networks are connected in a manner similar to an acyclic graph. In this way, responsibility can be divided among the neural networks in each layer, simplifying the vector of inputs, the vector of outputs, and the overall complexity of each network, resulting in improved performance. This approach is shown to outperform a single neural network when both systems are tasked with learning a survival strategy incorporating several behaviors in a real-time environment.

Contents

1 Introduction
2 Background
3 Experiment
  3.1 NEAT
  3.2 Application
    3.2.1 Single Network Agents
    3.2.2 Hierarchical Neural Network Agents
    3.2.3 Gatherers and Evaders
4 Results
  4.1 Experiment
  4.2 Single Network Controller
  4.3 Decision Network with Behavioral Algorithms
  4.4 Decision Network with Behavioral Networks
5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work

1 Introduction

Artificial Intelligence is a rapidly growing field of computer science whose techniques are being used to solve a wide range of problems. We are constantly learning how to get computers to do things that only humans used to be able to do, and redefining what is "impossible" for a computer. For years, the pole-balancing experiment, in which an agent applies force to a cart in order to keep a pole upright on top of the cart, was used to test new evolutionary algorithms. New techniques have made old benchmarks like the beloved pole-balancing experiment too easy [7] and called for new ones to be created. This shows that our understanding of artificial intelligence is constantly growing and allowing us to solve more difficult tasks.

One area has consistently been the focus of hundreds of researchers who have driven a great number of the advances in the field. Game-playing agents have been around for a long time. Ever since Pong, developers have tried to create more challenging artificial agents, and often good artificial intelligence can be the selling point for a game. From Pong to checkers and chess, artificial agents continue to get better, and after some time humans are no longer competitive with the best algorithms.
However, today's games are significantly more complex and require agents to process much more information in real time.

Figure 1: The centuries-old game of chess (left, http://www.filetransit.com/screenshot.php?id=45365), in which on average about 30 moves are possible per board configuration and agents are not required to perform in real time, and a screenshot showing some pathing from Halo ODST (right, http://aigamedev.com/insider/reviews/halo3odst-squad-patrol/), in which agents need to maneuver through a 3D environment in real time while facing multiple human opponents.

Like the old games, these new ones typically offer artificial agents for players who either have no other people available to play with or are looking for a more casual game. Unfortunately, the technology is not quite to the point where artificial agents can convincingly simulate the behavior of human players. The computer opponents can rarely challenge even a novice player and are forced to rely on unfair resource or damage bonuses to compensate for a lack of intelligence. Add to that the predictability of a scripted algorithm, and serious players are forced to play human opponents for a challenging game. This is undesirable in an industry with so much potential for profit. What makes these games difficult is the increased complexity: the agent is forced to use much more information, make decisions in real time, and carry out behaviors that are both complex and unpredictable.

Neural networks have been applied to a wide range of problems and are very good at function approximation and pattern recognition. However, in the domain of game intelligence, several problems arise. First, agents in this domain are typically required to perform several, sometimes unrelated, tasks. Agents typically learn behaviors like move, hide, and attack, all of which require different information and whose success is measured differently. If the neural network cannot leverage its knowledge about one behavior when learning another, each behavior will constantly pull the network in a different direction, resulting in longer training times. Second, the agents typically have a much larger volume of information, which is much more varied than in other applications. To perform a wide range of behaviors, it is usually necessary that the agent have access to a wide range of information on which to base its decisions. This results in a need for much larger and more complex networks to solve the task. Additionally, the variety of the data means that not every piece of information will be necessary or useful for each task, and irrelevant inputs will therefore only serve to confuse the learning algorithm. Finally, when an agent is responsible for carrying out multiple behaviors in a simulation, it becomes more difficult to assign blame through traditional learning algorithms. How does a neural network know whether it should move better, hide better, hide more, or hide less if all it knows is that it lost?

In our approach, Hierarchical Neural Networks were used to address these problems. In an HNN system, several neural networks are used to break the decision-making process into smaller steps. At the lowest level, this process consists of two steps: decide which behavior to pursue, and then act according to that behavior.

Figure 2: The decision network acts as a switch between the behavioral networks.

In this way, knowledge is compartmentalized and each neural network can be finely tuned to perform a single action, in the sense that unnecessary inputs can be removed from the network.
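To make this structure concrete, the following minimal C++ sketch shows how such a two-level controller might be wired together. The Network class, the argmax interpretation of the decision outputs, and the helper names are illustrative assumptions, not the actual implementation used in this work.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a trained neural network: a single fully
// connected layer mapping an input vector to an output vector. In the
// real system this would be an evolved NEAT network.
struct Network {
    std::vector<std::vector<double>> weights;  // weights[output][input]

    std::vector<double> activate(const std::vector<double>& in) const {
        std::vector<double> out(weights.size(), 0.0);
        for (std::size_t o = 0; o < weights.size(); ++o)
            for (std::size_t i = 0; i < in.size() && i < weights[o].size(); ++i)
                out[o] += weights[o][i] * in[i];
        return out;
    }
};

struct Move { int dx, dy; };  // each component in {-1, 0, 1}

class HNNController {
public:
    HNNController(Network decision, std::vector<Network> behaviors)
        : decision_(std::move(decision)), behaviors_(std::move(behaviors)) {}

    // One decision step: the decision network selects a behavioral network,
    // and only that network is activated to produce the actual move.
    Move step(const std::vector<double>& decisionInputs,
              const std::vector<std::vector<double>>& behaviorInputs) const {
        // Assumed encoding: one decision output per behavior; the behavior
        // with the highest activation wins.
        std::vector<double> choice = decision_.activate(decisionInputs);
        std::size_t best = 0;
        for (std::size_t i = 1; i < choice.size(); ++i)
            if (choice[i] > choice[best]) best = i;

        // The chosen behavioral network sees only the inputs relevant to it.
        std::vector<double> move = behaviors_[best].activate(behaviorInputs[best]);
        return Move{static_cast<int>(move[0]), static_cast<int>(move[1])};
    }

private:
    Network decision_;
    std::vector<Network> behaviors_;
};
```

The important property is that only the selected behavioral network is activated each step, and every network receives only its own, smaller vector of inputs.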
Removing these unnecessary inputs reduces network complexity, which allows for faster learning and more predictable behavior. Additionally, if several different teams of agents need to perform overlapping sets of actions, the behavioral networks can be reused between agents. If each agent were controlled by a single network, the networks would need to be recreated every time.

2 Background

This work focuses on two main principles:

• Homogeneous teams of agents learning to perform multiple complex behaviors
• Using Hierarchical Neural Networks to divide the learning task

The work discussed in this paper is mainly concerned with finding a better way for a team of agents to learn multiple behaviors. The method exploits the fact that, for certain problems, the set of tasks the agents are required to perform is easily divisible and therefore lends itself to the creation of behavioral networks. It has been shown that homogeneous teams of agents can learn to perform multiple complex behaviors [1]. In the case of [1], agents each capable of two separate behaviors divided the work across the entire team on the fly. This is essentially the end goal of our work, but we set out to do it in a more modular, more intuitive, and hopefully faster way.

The main focus of this work is leveraging the hierarchical structure to improve performance. Juell and Marsh [2] used a hierarchical neural network to learn the task of facial recognition. In their experiment, the parent network was tasked with recognizing the entire face. To assist it, the child networks were trained to recognize various facial features such as the eyes, nose, and mouth. This allowed the algorithm to generalize better, in the sense that the division of responsibility allowed the networks to recognize multiple faces in images larger than the set of inputs to the network. This approach differs from ours in that it used the lower-level networks to provide advice to the high-level network in order to reach a final decision [2]. All the same advantages of the hierarchical neural network apply, however: the lower-level networks are reusable if similar features need to be recognized, each lower-level network can ignore information that is only relevant to the other networks, and breaking a face into identifiable features is every bit as intuitive as deciding which behavior to pursue before pursuing it.

Whiteson et al. tested a very similar approach in their work [6]. The switch network they describe as a means of evolving soccer keepaway players is essentially a Hierarchical Neural Network with a decision network choosing between behaviors implemented as algorithms rather than neural networks, which we describe later. We take this approach one step further by training the high-level network and allowing evolution to determine when certain behaviors should be used. Their approach of task decomposition corresponds to the identification of sub-behaviors for the behavioral networks presented in this paper.

3 Experiment

This section describes the NEAT evolutionary algorithm (Section 3.1), as well as the application which served as the environment for our research (Section 3.2).

3.1 NEAT

The agents in this experiment learn using the NeuroEvolution of Augmenting Topologies (NEAT) algorithm developed by Kenneth Stanley at The University of Texas.
When neural networks are trained using NEAT, the topology is constantly changed by adding both new nodes and new links to the existing network, in addition to the traditional method of changing the network's weights [7]. In this way, networks trained using NEAT are more adaptable during training and need less human design. Several additional innovations introduced by NEAT include:

• Historical marking of genes
• Speciation
• Minimizing dimensionality

Historical marking of genes refers to the association of each gene with a unique global innovation number, which the gene keeps for the rest of the run. Whenever crossover occurs, the innovation numbers are used to identify identical structures in the parent genes, and the offspring can receive all unique structures from the more fit parent. In this way, the competing conventions problem [3, 4, 5], in which multiple equivalent symmetric solutions can evolve, is avoided.

Speciation is an attempt to avoid local maxima in the solution space by diverging the population after a certain amount of time has passed. In order to protect innovation, new individuals are evaluated within their own species so that they are not squeezed out by the mature individuals in the original species. In this way, new network topologies can be investigated while protecting both the original species and the innovative species.

Trivially, the more inputs a neural network has, the more dimensions of the solution space need to be searched before the best solution can be found. Since NEAT is based on the process of augmenting the topology by adding new nodes and links, dimensionality can quickly become a serious problem for the networks. NEAT minimizes this problem through incremental growth from minimal structure. Any network trained by NEAT begins life as a minimal network with the necessary input nodes, output nodes, and links, but no hidden units. When hidden units and links are added, the network's fitness must improve or the structural complexification will be removed from the population. In this way, additional structure in networks trained by NEAT is guaranteed to contribute to the increased fitness of the network. Because of these advantages and the experimental results published in [7], NEAT was the evolutionary algorithm of choice for this project.

NEAT begins by creating an initial population according to a starting genome file. This population is then evaluated for fitness in some way, usually with a simple function. In our application, agents use their neural networks to move in the environment, gathering food along the way and trying to stay alive as long as possible. After a set number of turns, the round ends and each agent is evaluated according to its fitness function. In most cases, the fitness of agents in our application is directly proportional to the quantity of food collected. NEAT then produces the next generation of individuals through crossover and mutation of the previous generation. In crossover, two parent genomes are combined to produce the child. In mutation, the parent genome is modified by adding a new node, adding a new link, or mutating the weights of the network. The probability of an individual's genome being chosen for the next generation is directly proportional to the individual's fitness relative to the population; for this reason, any fitness awarded equally to every individual in the population is essentially negated. A minimal sketch of this generational loop is given below.
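This sketch is not NEAT's actual API; evaluateFitness and reproduce are hypothetical placeholders for the simulation rounds and for NEAT's fitness-proportional selection, crossover, and mutation operators.

```cpp
#include <vector>

struct Genome { /* nodes, links, weights, innovation numbers */ };

// Placeholder for running one round of the simulation: the agent moves for a
// fixed number of turns and fitness is (in most of our experiments)
// proportional to the food it collects.
double evaluateFitness(const Genome& /*g*/, int /*turnsPerRound*/) { return 0.0; }

// Placeholder for NEAT reproduction: parents are chosen with probability
// proportional to relative fitness, then combined by crossover and modified
// by mutation (add node, add link, or perturb weights).
std::vector<Genome> reproduce(const std::vector<Genome>& parents,
                              const std::vector<double>& /*fitness*/) {
    return parents;
}

void evolve(std::vector<Genome> population, int generations, int turnsPerRound) {
    for (int gen = 0; gen < generations; ++gen) {
        // Evaluate every individual over one round of the environment.
        std::vector<double> fitness;
        for (const Genome& g : population)
            fitness.push_back(evaluateFitness(g, turnsPerRound));

        // Produce the next generation from the current one.
        population = reproduce(population, fitness);
    }
}
```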
Once the next generation is produced, it is evaluated exactly like the generation before it in order to produce another generation. The process is repeated until the generation limit is reached.

3.2 Application

Figure 3: Our application populated with Ants (learning agents with white background), Spiders (scripted predators with red background), and Food (stationary items collected by ants).

For our research, we developed a C++ application (affectionately referred to as SuperAnts!) to serve as the learning environment in which to test our hypothesis. The environment is an n × n rectangular grid world populated by the learning agents, predators, and food. Every entity in the environment is spawned at a random unoccupied location on the board. Movement is turn-based, meaning every agent is allowed a single move for every step of the environment. The predators are scripted to always move directly towards the nearest learning agent, while food particles do not move and are removed from the environment once consumed by a learning agent. If an agent is consumed by a predator, that agent is removed from the environment just like the food and can no longer gain fitness for the rest of the round.

In the Hierarchical Neural Network setup, each learning agent has a master network and several behavioral networks with which to make its decisions. In our case, the agents' behavioral networks consist of one network for evading predators and one network for gathering food. The master network is responsible for analyzing the immediate environment surrounding the agent and deciding which behavioral network to activate. Once the decision has been made, the behavioral network is given its inputs and determines the appropriate move for the agent to execute the desired behavior. The key aspect of this process is that each network in the system is allowed its own vector of inputs. In other words, a network whose goal is an evasive behavior does not need to be concerned with the location of food particles. Were the agent's decision-making structure a single network, the same network would be responsible for merging all behaviors and also for determining which inputs are important to which behaviors and irrelevant to others.

3.2.1 Single Network Agents

The Single Network Agents utilized the single network controller and were intended as a control, both to test learning in our environment and to provide a baseline against which to measure the Hierarchical Neural Network Agents. The neural networks for these agents were given five inputs. The first was a bias input, required by NEAT, which was always 0. The next two were the direction of the nearest food particle on the x and y axes. The value of each input could be -1, 0, or 1 depending on whether the food particle's position had a greater, equal, or smaller x or y value than the agent's. Similarly, the final two inputs were the direction of the nearest predator on the x and y axes. Since the inputs to each agent's neural network are egocentric, the networks can generalize their knowledge to any position in the environment and to any size of environment. These sensors give the Single Network Agents enough information to learn both food gathering and predator evasion. The agent's controller network has two outputs, one for movement in the x direction and one for movement in the y direction. As with the sensors, the values of the movement outputs could be -1, 0, or 1 depending on whether the agent wants to move left, stay still, or move right, and similarly for up and down movement.
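As a concrete illustration of this egocentric encoding, a minimal sketch of the five-element sensor vector is given below; the type and function names are hypothetical, but the sign-based encoding follows the description above.

```cpp
#include <vector>

struct Position { int x, y; };

// Egocentric direction sensor as described in the text: -1, 0, or 1
// depending on whether the target coordinate is greater than, equal to, or
// smaller than the agent's. The exact sign convention only matters in that
// the movement outputs are interpreted consistently with it.
int direction(int agentCoord, int targetCoord) {
    if (targetCoord > agentCoord) return -1;
    if (targetCoord < agentCoord) return 1;
    return 0;
}

// Builds the single-network agent's input vector: a bias input followed by
// the egocentric direction of the nearest food and of the nearest predator.
std::vector<double> buildSensors(const Position& agent,
                                 const Position& nearestFood,
                                 const Position& nearestPredator) {
    return {
        0.0,  // bias input (always 0 in our setup)
        static_cast<double>(direction(agent.x, nearestFood.x)),
        static_cast<double>(direction(agent.y, nearestFood.y)),
        static_cast<double>(direction(agent.x, nearestPredator.x)),
        static_cast<double>(direction(agent.y, nearestPredator.y)),
    };
}
```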
The fitness function for these agents, which defines how they learn, is based solely on food gathered. Because of this, agents which gather more food than the others each round will have a greater influence on the next generation. Proficiency at predator evasion, however, has only an indirect effect on agent fitness. No fitness bonus is given to agents who succeed at evading the predators and no fitness is taken away if an agent fails. Any agent that fails to evade the predators incurs an indirect fitness penalty in the form of lost time which could have been spent gathering food. We chose not to include a direct reward or penalty for predator evasion mostly because it is difficult to determine how to assign blame for failure or success. If an agent lives, the predators may simply have chosen to hunt the others, while if an agent dies, it may have been surrounded despite having above-average evasion skill. By leaving the penalty indirect we hoped to minimize the effect of these corner cases.

3.2.2 Hierarchical Neural Network Agents

The Hierarchical Neural Network Agents were controlled by the HNN controller and formed the main focus of our experiment. The HNN controller was tested in two parts. It was first tested with the decision network's behaviors implemented as algorithms. The purpose of this was to test whether the hierarchical structure was capable of outperforming the single network controller under the most advantageous circumstances. In both cases, the decision network remains the same. The decision network is aware of the nearest food particle and the nearest predator and must learn to use this information to activate the right behavioral network. Unlike the previously described single network controller, the output of the decision network is not the next move of the agent but rather which of the behavioral networks to activate. Once the decision network has made its decision, control is passed off to the behavioral network, which decides the agent's next move. During evolution, all fitness is attributed to the decision network; the behavioral networks are no longer being trained.

In the first test, the decision network had two behaviors at its disposal. The first was a gathering behavior which would cause the agent to move directly towards the nearest food while disregarding predators. The second was a predator evasion behavior which would cause the agent to move directly away from the nearest predator. While these were not the most intelligent behaviors, they were sufficient for testing the HNN controllers (a sketch of these two scripted behaviors is given at the end of this subsection). It should be noted that this approach is essentially the same as the switch network discussed in [6].

Finally, the Hierarchical Neural Network Agents were tested with all behaviors implemented as neural networks. In this experiment the agents' decision networks again had two behaviors at their disposal, one food gathering behavior and one predator evasion behavior. Since the behaviors were implemented by neural networks in this experiment, new populations had to be created and trained in order to produce the behavioral networks for the decision network to utilize. To do this, the Gatherer and Evader populations were created.
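For reference, the two scripted behaviors used in the first test are simple enough to state as a short sketch; the type and function names are hypothetical, but the logic (step directly toward the nearest food, or directly away from the nearest predator) is as described above.

```cpp
struct Position { int x, y; };
struct Move { int dx, dy; };  // each component in {-1, 0, 1}

// One-step sign toward a target coordinate.
int stepToward(int from, int to) {
    if (to > from) return 1;
    if (to < from) return -1;
    return 0;
}

// Gathering behavior: head directly for the nearest food, ignoring predators.
Move gatherBehavior(const Position& agent, const Position& nearestFood) {
    return Move{stepToward(agent.x, nearestFood.x),
                stepToward(agent.y, nearestFood.y)};
}

// Evasion behavior: head directly away from the nearest predator.
Move evadeBehavior(const Position& agent, const Position& nearestPredator) {
    return Move{-stepToward(agent.x, nearestPredator.x),
                -stepToward(agent.y, nearestPredator.y)};
}
```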
3.2.3 Gatherers and Evaders

The Gatherers and Evaders were both created in order to build the behavioral networks which the Hierarchical Neural Network Agents would use to do their learning. For this reason it was critical that both populations learn their tasks well, or the Hierarchical Neural Network Agents would be unable to learn anything. Both groups were trained in their own specific environments until a sufficiently fit network had been evolved, at which point the network was saved for future use by the Hierarchical Neural Network Agents.

The Gatherers were created to develop the gathering behavioral network for the Hierarchical Neural Network Agents. The Gatherers were controlled by a single neural network which was trained on the food gathering task in the absence of predators. The goal was to create a behavioral network whose sole concern was gathering food, so the only inputs are the direction of the nearest food particle along the x and y axes. The outputs are the same as for all agents: desired movement along the x and y axes. Ten Gatherers were trained for 200 generations of 100 turns each to gather 10 food particles. At the end of training, the Gatherer network with the highest fitness was saved for use by the Hierarchical Neural Network Agents.

The Evaders were created to learn the predator evasion task in much the same way as the Gatherers. The Evaders were again controlled by a single neural network which would become one of the behavioral networks for the Hierarchical Neural Network Agents. The Evaders' inputs were the directions to the nearest two predators along the x and y axes. The rationale behind this was to exploit the benefit of HNNs in being able to better tailor the vector of inputs to each situation. The outputs were the same as for all learning agents. Ten Evaders were trained for 200 generations of 100 turns to evade 5 predators. Since the Evaders are not concerned with gathering food, the fitness function had to be changed: instead of scoring individuals on the amount of food collected, the number of turns survived was used. This fitness function was sufficient for the agents to learn to avoid the predators.

4 Results

4.1 Experiment

We planned to test the Hierarchical Neural Network strategy in several steps:

1. Train agents with the single neural network controller
2. Train agents using an HNN consisting of a decision network and static behavioral algorithms
3. Train agents using an HNN consisting of a decision network and behavioral networks

The single network controller would serve as a baseline against which to test the Hierarchical Neural Network strategy, the decision network with behavioral algorithms would provide a proof of concept, and the decision network with behavioral networks would serve as the full implementation of our strategy. In all experiments, the board is 25×25 cells and each population is run for 200 generations of 100 turns each. For all charts, the x axis is the generation number and the y axis is the average fitness per individual. Due to the random spawning of agents, food, and predators each round, there is a significant amount of noise in the fitness from generation to generation. For this reason, a linear trend line has been added to each graph to show the rise in average fitness over time.

4.2 Single Network Controller

Figure 4: Average fitness of a Worker population learning to gather food in the presence of 0 predators using a single neural network controller.

As the first test of the single controller setup, the agents were trained to gather food without the presence of predators. In this experiment the agents' only task was to gather as much food as possible.
Since the agents did not have to worry about predators, this experiment tested our application at the most basic level. It was critical for the agents to be able to learn in this experiment before we could hope for them to learn in more complicated situations. The environment was populated with 10 workers and 10 food particles for training. This experiment showed that our agents were successfully able to learn the food gathering task using NEAT. Figure 4 shows the average fitness across generations of one run of the experiment. The agents are able to improve their average fitness to over three times the initial average.

Figure 5: Average fitness of a Worker population learning to gather food in the presence of 2 predators using a single neural network controller.

As another test of the single controller setup, the agents were again trained to gather food. This time, however, two predators were added to the environment to chase and destroy the learning agents. This would show whether the single network controller was able to learn to gather food in the presence of predators. In this experiment the agents had to learn to balance the task of food gathering with the task of predator evasion. This was essential for our work because the hierarchical neural networks were created to balance multiple behaviors, in this case food gathering and predator evasion, and we needed another successful implementation against which to test the results of the HNN system. With the more complicated task, an expected drop in fitness is observed in the population; however, the agents are still able to learn the tasks with NEAT and improve over time. The environment was populated with 10 workers, 10 food particles, and 2 spiders. Our experiment showed that the single neural network system was able to learn the food gathering + evasion task in our application, which meant we were ready to test the hierarchical neural network systems. It should again be noted that the single neural network system learns the skill of predator evasion indirectly, with no benefit given for success or penalty incurred for failure except the lost fitness of would-be-collected food.

4.3 Decision Network with Behavioral Algorithms

Figure 6: Average fitness of a Worker population learning to gather food in the presence of 2 predators using a hierarchical neural network controller choosing between pre-specified behaviors.

In the first test of the hierarchical neural network controller setup, the agents were again trained to gather food in the presence of predators. The goal of this test was to expand on the first experiments, which found that a single neural network controller could learn the food gathering + evasion task in this environment. Once we knew that learning was possible in our application, the next step was to test a simple version of the HNN concept. The environment was populated with 10 workers, 10 food particles, and 2 spiders for the experiment. This test demonstrates the validity of the Hierarchical Neural Network approach: given useful behavioral algorithms, the decision network can learn to utilize them. The most important result of this experiment is that the hierarchical neural network approach can be a viable option given sufficiently sophisticated behaviors. Also significant is that the hierarchical neural network system in this case outperforms the single controller network, which was trained under identical circumstances.
While this is likely due in part to the fact that the hierarchical neural network in this case was given hand-crafted behavioral algorithms as its behaviors, the result still demonstrates that it is possible for HNNs to surpass single neural networks in certain situations.

4.4 Decision Network with Behavioral Networks

Figure 7: Average fitness of a Worker population learning to gather food in the presence of 2 predators using a hierarchical neural network controller choosing between behaviors implemented as neural networks.

For the final test of the hierarchical neural network controller setup, two new species of agents were created to learn the behaviors which the decision network would utilize. Evaders were created to learn a basic evasion behavior; they were trained with a population size of 10 in the presence of 5 predators. Gatherers were created to collect food as efficiently as possible; they were trained with a population size of 10 to gather the 10 food particles on the board, with no predators present during training. Once each group finished training, the neural network with the highest fitness across all 200 generations was saved to serve as the corresponding behavioral network. In the last experiment, the agents were given the full hierarchical neural network setup, in which the decision network was trained to effectively utilize the saved behavioral networks. The environment was populated with 10 workers, 10 food particles, and 2 spiders.

This test demonstrates the validity of the Hierarchical Neural Network approach in its full implementation. The behavioral neural networks, when trained properly, can be used as substitutes for the behavioral algorithms and can be used effectively to train the decision network on the food gathering + evasion task. It is important to note that this implementation both improves sufficiently at the task and outperforms the single network controller. This indicates that in certain circumstances, when the behaviors are easy enough to produce and the task is sufficiently complex for a single network controller, the hierarchical neural network approach is a useful alternative.

5 Conclusions and Future Work

5.1 Conclusions

The results demonstrate that the Hierarchical Neural Network approach can produce stronger solutions more quickly than the single neural network under certain circumstances. While it incurs additional up-front cost for the creation of the hierarchical structure, numerous benefits are gained, such as the reusability of behavioral networks and the reduced fan-in of inputs to each individual network. One interesting point of analysis is the fact that the HNN agents with behavioral algorithms consistently held a higher level of fitness than the HNN agents with behavioral networks over the course of their respective experiments. This could be due to the simplistic nature of the food gathering and predator evasion tasks: since the problems are so simple, the behavioral algorithms, though very simple themselves, may have been a good enough approximation of an optimal solution. A worthwhile piece of future work may be increasing the complexity of the tasks to see if this advantage can be reduced. As was mentioned before, a significant amount of noise is present in the data in the form of large fluctuations in fitness between generations, due to the random spawning of agents each round.
While this did not prevent learning for any of the three controllers, it certainly affected the data, and the future work below discusses ways to reduce the noise. Finally, the three data sets were tested for statistical significance. A paired t-test was applied pairwise between the three data sets to determine the statistical significance of the results. Between all three data sets, the two-tailed P value was less than 0.0001, indicating extremely statistically significant results.

5.2 Future Work

The largest task for future work will be reducing the amount of noise present in the data due to random spawning. The way we plan to do this is by evaluating the population each generation not once, but ten or thirty times, and using the sum of the fitness from each evaluation as that agent's fitness. This way, every agent in the population will spawn in a favorable location in some rounds and in unfavorable locations in others, and a single unfortunate spawn will not be able to hurt an agent's total fitness so strongly. With these changes, the data should appear more linear with fewer fluctuations. Also, we hypothesize that with a more accurate picture of each agent's true fitness every generation, learning will accelerate.

Figure 8: Examples of a favorable spawn (left), where the agent is close to food and far from predators, and an unfavorable spawn (right), where the agent is close to a predator who blocks the agent's path to food.

Another test which would demonstrate the versatility and robustness of Hierarchical Neural Networks would be to train the networks using other evolutionary algorithms. The beauty of the Hierarchical Neural Network approach is that it is so general: it allows any topology of the hierarchical network and should allow any training algorithm to be used on both the behavioral networks and the decision network. Similarly, a hybrid network could be tested consisting of both behavioral networks and behavioral algorithms. This approach could be used to compare the behavioral algorithms and behavioral networks used in this work. Both the algorithms and the networks could be given to a single decision network in the same Hierarchical Neural Network, trained, and then analyzed to see whether the algorithms or the networks were favored. Since the HNN with behavioral algorithms performed better than the HNN with behavioral networks, we would expect to see the algorithms used more frequently by the decision network. Also important will be testing this method on more complicated tasks. Complicating the task will consist of both adding additional behaviors that the decision network must utilize and increasing the complexity of the individual behaviors.

This work arose from a similar project in which we set out to test whether neuroevolution was capable of learning through what we called symbiotic coevolution. In symbiotic coevolution, two groups of agents capable of different actions are dependent on the other group for their survival and must learn to support the other group so that they may in turn be supported. Our plan was to simulate worker ants tasked with gathering food for soldier ants to eat, who would then kill the workers' predators so the workers could continue to gather food. If the two populations do not learn to work together, then the workers are consumed and the soldiers starve.
Initial testing showed that this problem was more difficult for the single neural networks than we had thought, and so the idea of Hierarchical Neural Networks was created in the hope of solving this problem where the single neural network failed. This is considered a lower priority in the short term and will only be pursued after the other work mentioned above.

Acknowledgements

Thanks to my supervisor, Risto Miikkulainen, for all his guidance and support which made this work possible. A great deal of the coding and planning for this project was contributed by fellow Turing Scholar Matthew deWett. Matt was essential to the conception and early development of this work. Thanks to Matt for getting the project off the ground and helping shape its direction. This research was begun as part of a project for CS 378i in the Computational Intelligence in Game Design stream in the Freshman Research Initiative.

References

[1] Bryant, B., and Miikkulainen, R. Neuroevolution for adaptive teams. In Proceedings of the 2003 Congress on Evolutionary Computation (CEC-2003) (2003).

[2] Juell, P., and Marsh, R. A hierarchical neural network for human face detection. Pattern Recognition 29, 5 (1995).

[3] Montana, D. J., and Davis, L. Training feedforward neural networks using genetic algorithms. In Proceedings of the 11th International Joint Conference on Artificial Intelligence (1989).

[4] Radcliffe, N. J. Genetic set recombination and its application to neural network topology optimization. Neural Computing and Applications (1993).

[5] Schaffer, J. D., Whitley, D., and Eshelman, L. J. Combinations of genetic algorithms and neural networks: A survey of the state of the art. In Proceedings of the International Workshop on Combinations of Genetic Algorithms and Neural Networks (COGANN-92) (1992).

[6] Whiteson, S., Kohl, N., Miikkulainen, R., and Stone, P. Evolving soccer keepaway players through task decomposition. Machine Learning 59, 1 (2005).

[7] Stanley, K., and Miikkulainen, R. Efficient reinforcement learning through evolving neural network topologies. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2002) (2002).