Hierarchical Neural Networks for Behavior-Based
Decision Making
Undergraduate Honors Thesis
David Robson
Department of Computer Sciences
University of Texas at Austin
[email protected]
Supervising Professor: Risto Miikkulainen
May 10, 2010
Abstract
This paper introduces Hierarchical Neural Networks (HNNs) as a viable strategy for behavior-based decision making in certain contexts. An HNN, in this case, is a system in which multiple neural networks are connected in a manner similar to an acyclic graph. Responsibility can then be divided among the networks in each layer, simplifying the input vector, the output vector, and the overall complexity of each network, which results in improved performance. This approach is shown to outperform a single neural network when both systems are tasked with learning a survival strategy incorporating several behaviors in a real-time environment.
Contents

1 Introduction
2 Background
3 Experiment
  3.1 NEAT
  3.2 Application
    3.2.1 Single Network Agents
    3.2.2 Hierarchical Neural Network Agents
    3.2.3 Gatherers and Evaders
4 Results
  4.1 Experiment
  4.2 Single Network Controller
  4.3 Decision Network with Behavioral Algorithms
  4.4 Decision Network with Behavioral Networks
5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
1 Introduction
Artificial Intelligence is a rapidly growing field of computer science whose techniques are being used to solve a wide range of problems. We are constantly learning how to get computers to do things that only humans used to be able to do, redefining what is "impossible" for a computer. For years, the pole-balancing experiment, in which an agent applies force to a cart in order to keep a pole upright on top of the cart, was used to test new evolutionary algorithms. New techniques have made old benchmarks like pole balancing too easy [7] and called for new ones to be created. This shows that our understanding of artificial intelligence is constantly growing and allowing us to solve more difficult tasks.
One area has consistently been the focus of hundreds of researchers who have driven a great number of the advances in the field: game-playing agents. Ever since Pong, developers have tried to create more challenging artificial agents, and good artificial intelligence can often be a selling point for a game. From Pong to checkers and chess, artificial agents continue to improve until, after some time, humans are no longer competitive with the best algorithms. However, today's games are significantly more complex and require agents to process much more information in real time.
Figure 1: The centuries-old game of chess (left, http://www.filetransit.com/screenshot.php?id=45365), where on average about 30 moves are possible per board configuration and agents are not required to act in real time, and a screenshot showing pathing from Halo 3: ODST (right, http://aigamedev.com/insider/reviews/halo3odst-squad-patrol/), where agents must maneuver a 3D environment in real time while facing multiple human opponents.
Like the old games, these new ones typically offer artificial opponents for players who have no one else available to play with or who are looking for a more casual game. Unfortunately, the technology is not quite to the point where artificial agents can convincingly simulate the behavior of human players. The computer opponents can rarely challenge even a novice player and are forced to rely on unfair resource or damage bonuses to compensate for a lack of intelligence. Add to that the predictability of a scripted algorithm, and serious players are forced to play human opponents for a challenging game. This is undesirable in an industry with so much potential for profit. What makes these games difficult is the increased complexity: the agent is forced to use much more information, make decisions in real time, and carry out behaviors that are both complex and unpredictable.
Neural networks have been applied to a wide range of problems and are very good at function approximation and pattern recognition. However, in the domain of game intelligence, several problems arise. First, agents in this domain are typically required to perform several, sometimes unrelated, tasks. Agents typically learn behaviors like move, hide, and attack, all of which require different information and whose success is measured differently. If the neural network cannot leverage its knowledge about one behavior when learning another, each behavior will constantly pull the network in a different direction, resulting in longer training times. Second, the agents typically have a much larger volume of information, which is much more varied than in other applications. To perform a wide range of behaviors, the agent usually needs access to a wide range of information on which to base its decisions. This results in a need for much larger and more complex networks to solve the task. Additionally, the variety of the data means that not every piece of information will be necessary or useful for each task, and irrelevant inputs serve only to confuse the learning algorithm. Finally, when an agent is responsible for carrying out multiple behaviors in a simulation, it becomes more difficult to assign blame through traditional learning algorithms. How does a neural network know whether it should move better, hide better, hide more, or hide less if all it knows is that it lost? In our approach, Hierarchical Neural Networks were used to address these problems.
In an HNN system, several neural networks are used to break the decision-making process into smaller
steps. At the lowest level, this process consists of two steps: decide which behavior to pursue and then act
according to that behavior.
Figure 2: The decision network acts as a switch between the behavioral networks.
In this way, knowledge is compartmentalized and each neural network can be finely tuned to perform its action, in the sense that unnecessary inputs can be removed from the network. This removes unnecessary network complication, which allows for faster learning and more easily predictable behavior. Additionally, if several different teams of agents are needed to perform overlapping sets of actions, the behavioral networks can be reused between agents. If each agent were controlled by a single network, the networks would need to be recreated every time.
2 Background
This work focuses on two main principles:
• Homogeneous teams of agents learning to perform multiple complex behaviors
• Using Hierarchical Neural Networks to divide the learning task
The work discussed in this paper is mainly concerned with finding a better way for a team of agents to learn multiple behaviors. The method exploits the fact that, for certain problems, the set of tasks the agents are required to perform is easily divisible and therefore lends itself to the creation of behavioral networks. It has been shown that homogeneous teams of agents can learn to perform multiple complex behaviors [1]. In the case of [1], agents, each with the ability to perform two separate behaviors, divided the work across the entire team on the fly. This is essentially the end goal of our work, but we set out to do it in a more modular, intuitive, and hopefully faster way.
The main focus of this work is leveraging the hierarchical structure to improve performance. Juell and Marsh [2] used a hierarchical neural network to learn the task of facial recognition. In their experiment, the parent network was tasked with recognizing the entire face. To assist it, the child networks were trained to recognize various facial features such as the eyes, nose, and mouth. This allowed the algorithm to generalize better, in the sense that the division of responsibility allowed the networks to recognize multiple faces in images larger than the set of inputs to the network. Their approach differs from ours in that it used the lower-level networks to provide advice to the high-level network in order to reach a final decision [2]. All the same advantages of the hierarchical neural network apply, however: the lower-level networks are reusable if similar features need to be recognized, each lower-level network can ignore information that is irrelevant to its task but necessary to the other networks, and the approach of breaking a face into identifiable features is every bit as intuitive as deciding which behavior to pursue before pursuing it.
Whiteson et al. have tested a very similar approach in their work [6]. The switch network they describe as a means of evolving soccer keepaway players is essentially a Hierarchical Neural Network with a decision network choosing between behaviors implemented as algorithms rather than neural networks, an arrangement we describe later. We take this approach one step further by training the high-level network and allowing evolution to determine when certain behaviors should be used. Their approach of task decomposition equates to the identification of sub-behaviors for the behavioral networks presented in this paper.
3 Experiment
This section describes the NEAT evolutionary algorithm (Section 3.1), as well as the application which served
as the environment for our research (Section 3.2).
3.1 NEAT
The agents in this experiment learn using the NeuroEvolution of Augmenting Topologies (NEAT) algorithm developed by Kenneth Stanley at The University of Texas. When neural networks are trained using NEAT, the topology is constantly changed by adding both new nodes and new links to the existing network, in addition to the traditional method of changing the network's weights [7]. In this way, networks trained using NEAT are more adaptable during training and need less human design. Several additional innovations added by NEAT include:
• Historical marking of genes
• Speciation
• Minimizing dimensionality
Historical marking of genes refers to the association of each gene with a unique global innovation number.
From then on, each gene will always keep its unique global innovation number. Whenever crossover occurs,
the innovation numbers are used to identify identical structures in the parent genes and the offspring can
receive all unique structures from the more fit parent. In this way, the competing convention problem [3, 4, 5]
where multiple equivalent symmetric solutions can evolve is avoided.
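To make the mechanism concrete, the following is a minimal C++ sketch of innovation-based crossover. The LinkGene structure and the crossover function are hypothetical names of our own; this is not NEAT's actual implementation, only an illustration of aligning genes by innovation number and inheriting unmatched structure from the fitter parent.

#include <map>
#include <vector>

// Hypothetical connection gene: globally identified by its innovation number.
struct LinkGene {
    int innovation;    // unique global innovation number
    int from, to;      // endpoint node ids
    double weight;
};

// Align the two parents by innovation number. Genes with matching innovation
// numbers describe the same structure and can be inherited from either parent;
// disjoint and excess genes are taken from the fitter parent.
std::vector<LinkGene> crossover(const std::vector<LinkGene>& fitter,
                                const std::vector<LinkGene>& weaker) {
    std::map<int, LinkGene> weakerByInnovation;
    for (const LinkGene& g : weaker)
        weakerByInnovation[g.innovation] = g;

    std::vector<LinkGene> child;
    for (const LinkGene& g : fitter) {
        auto match = weakerByInnovation.find(g.innovation);
        // Matching gene: take either parent's copy (chosen here by parity in
        // place of a coin flip). Unmatched gene: keep the fitter parent's copy.
        if (match != weakerByInnovation.end() && (g.innovation % 2 == 0))
            child.push_back(match->second);
        else
            child.push_back(g);
    }
    return child;
}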
Speciation is the attempt to avoid local maxima in the solution space by diverging the population after
a certain amount of time has passed. In order to protect innovation, new individuals are evaluated within
their species so as to not be squeezed out by the mature individuals in the original species. In this way,
new network topologies can be investigated for innovation while protecting both the original species and
innovative species.
Trivially, the more inputs to a neural network, the more dimensions of the solution space that need to
be searched before the best solution can be found. Since NEAT is based on the process of augmenting
the topology by adding new nodes and weights, dimensionality can quickly become a serious problem for
the networks. NEAT minimizes the problem of dimensionality through incremental growth from minimal
structure. Any network trained by NEAT begins life as a minimal network with the necessary input nodes,
output nodes, and links, but no hidden units. When hidden units and links are added, the network's fitness must improve or the structural complexification will be removed from the population. In this way, additional structure in networks trained by NEAT is guaranteed to contribute to the increased fitness of the network.
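As an illustration of this minimal starting structure, the sketch below (hypothetical names, not NEAT's code) builds a genome in which every input is linked directly to every output and no hidden nodes exist until mutation adds them.

#include <vector>

// Hypothetical link between two node ids in a starting genome.
struct Link {
    int from, to;
    double weight;
};

// Build the minimal starting topology: input nodes 0..numInputs-1, output nodes
// numInputs..numInputs+numOutputs-1, every input connected to every output,
// and no hidden nodes at all. Weights are changed later by mutation.
std::vector<Link> minimalTopology(int numInputs, int numOutputs) {
    std::vector<Link> links;
    for (int i = 0; i < numInputs; ++i)
        for (int o = 0; o < numOutputs; ++o)
            links.push_back({i, numInputs + o, 0.0});
    return links;
}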
Because of these advantages and the experimental results published [7], NEAT was the evolutionary
algorithm of choice for this project. NEAT begins by creating an initial population according to a starting
genome file. This population is then evaluated in some way, usually a simple function, for fitness. In our
application, agents use their neural networks to move in the environment, gathering food along the way
and trying to stay alive as long as possible. After a set number of turns, the round ends and each agent is
evaluated according to its fitness function. In most cases, the fitness of agents in our application is directly
proportional to the quantity of food collected. NEAT then produces the next generation of individuals
through crossover and mutation of the previous generation. In crossover, two parent genomes are combined
to produce the child. In mutation, the parent genome is modified by either adding a new node, adding a new
link, or mutating the weights of the network. The probability of an individual’s genome being chosen for the
next generation is directly proportional to the individual’s relative fitness with respect to the population. For
this reason, any fitness given to every individual in the population at the same time is essentially negated.
Once the next generation is produced, it is again evaluated exactly like the generation before it in order to
produce another generation. The process is repeated until the generation limit is reached.
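The loop described above can be sketched roughly as follows. The names evaluateRound and crossoverAndMutate are placeholders of our own, and real NEAT additionally handles speciation and structural mutation; this is only an outline of the evaluate-select-reproduce cycle with fitness-proportional selection.

#include <cstddef>
#include <cstdlib>
#include <vector>

struct Individual {
    // Genome omitted for brevity; only fitness matters for selection here.
    double fitness = 0.0;
};

// Placeholder evaluation: in the real application, the agent's network controls
// it for one round and fitness is (usually) the amount of food it collects.
double evaluateRound(const Individual&) { return std::rand() % 10; }

// Placeholder reproduction: in the real application this is NEAT crossover
// plus add-node / add-link / weight mutations.
Individual crossoverAndMutate(const Individual&, const Individual&) { return Individual{}; }

// Roulette-wheel selection: probability of being chosen as a parent is
// proportional to relative fitness within the population.
const Individual& selectParent(const std::vector<Individual>& pop) {
    double total = 0.0;
    for (const Individual& ind : pop) total += ind.fitness;
    double r = total * (std::rand() / (RAND_MAX + 1.0));
    for (const Individual& ind : pop) {
        r -= ind.fitness;
        if (r <= 0.0) return ind;
    }
    return pop.back();
}

// Repeat evaluate -> select -> reproduce until the generation limit is reached.
void evolve(std::vector<Individual>& population, int generations) {
    for (int gen = 0; gen < generations; ++gen) {
        for (Individual& ind : population)
            ind.fitness = evaluateRound(ind);

        std::vector<Individual> next;
        for (std::size_t i = 0; i < population.size(); ++i)
            next.push_back(crossoverAndMutate(selectParent(population),
                                              selectParent(population)));
        population = next;
    }
}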
3.2 Application
Figure 3: Our application populated with Ants (learning agents with white background), Spiders (scripted predators with red background), and Food (stationary items collected by ants).
For our research, we developed a C++ application (affectionately referred to as SuperAnts!) to serve as the learning environment for testing our hypothesis. The environment is an n × n rectangular grid world populated by the learning agents, predators, and food. Every entity in the environment is spawned at a random unoccupied location on the board. Movement is turn based, meaning every agent is allowed a single move for every step of the environment. The predators are scripted to always move directly towards the nearest learning agent, while food particles do not move and are removed from the environment once consumed by a learning agent. If an agent is consumed by a predator, that agent is removed from the environment, as with the food, and can no longer gain fitness for the rest of the round.
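The following sketch outlines one turn of such an environment under these rules. The types and the random placeholder move are assumptions for illustration (board-boundary clamping is omitted); the actual SuperAnts! code is not reproduced here.

#include <cmath>
#include <cstdlib>
#include <vector>

struct Cell   { int x, y; };
struct Agent  { Cell pos; bool alive = true; int food = 0; };
struct Spider { Cell pos; };
struct Food   { Cell pos; bool eaten = false; };

// One unit step toward a target along one axis (the same -1/0/1 move space the agents use).
int stepToward(int from, int to) { return (to > from) - (to < from); }

// Advance the world by one turn under the rules described above.
void stepWorld(std::vector<Agent>& agents, std::vector<Spider>& spiders, std::vector<Food>& food) {
    // 1. Each living agent makes one move (here a random legal move stands in for
    //    the network-controlled move), then consumes any food it lands on.
    for (Agent& a : agents) {
        if (!a.alive) continue;
        a.pos.x += std::rand() % 3 - 1;
        a.pos.y += std::rand() % 3 - 1;
        for (Food& f : food)
            if (!f.eaten && f.pos.x == a.pos.x && f.pos.y == a.pos.y) {
                f.eaten = true;
                ++a.food;                          // fitness is proportional to food collected
            }
    }
    // 2. Each predator moves directly toward the nearest living agent and eats it on contact.
    for (Spider& s : spiders) {
        Agent* nearest = nullptr;
        int best = 1 << 30;
        for (Agent& a : agents) {
            if (!a.alive) continue;
            int d = std::abs(a.pos.x - s.pos.x) + std::abs(a.pos.y - s.pos.y);
            if (d < best) { best = d; nearest = &a; }
        }
        if (!nearest) continue;
        s.pos.x += stepToward(s.pos.x, nearest->pos.x);
        s.pos.y += stepToward(s.pos.y, nearest->pos.y);
        if (s.pos.x == nearest->pos.x && s.pos.y == nearest->pos.y)
            nearest->alive = false;                // caught agents stop earning fitness
    }
}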
In the Hierarchical Neural Network setup, each learning agent has a master network and several behavioral networks with which to make its decision. In our case, the agents' behavioral networks consist of one network for evading predators and one network for gathering food. The master network is responsible for analysing the immediate environment surrounding the agent and deciding which behavioral network to activate. Once the decision has been made, the behavioral network is given its inputs and determines the appropriate move for the agent to execute the desired behavior. The key aspect of this process is that each network in the system is allowed its own vector of inputs. In other words, a network whose goal is an evasive behavior does not need to be concerned with the location of food particles. Were the agent's decision-making structure a single network, that same network would be responsible for merging all behaviors and also for determining which inputs are important to which behaviors and irrelevant to others.
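A minimal sketch of this two-step controller is shown below. The Network interface and the single gather-versus-evade output are assumptions for illustration; the point is that the decision network sees both food and predator directions while each behavioral network receives only the inputs relevant to its own behavior.

#include <vector>

// Hypothetical interface for any evolved network: maps an input vector to an output vector.
struct Network {
    virtual std::vector<double> activate(const std::vector<double>& inputs) const = 0;
    virtual ~Network() = default;
};

struct Move { int dx, dy; };

// Hierarchical controller: the decision network picks a behavior, and the chosen
// behavioral network computes the actual move from its own, smaller input vector.
struct HNNController {
    const Network* decision;   // sees food and predator directions
    const Network* gatherer;   // sees only the nearest food direction
    const Network* evader;     // sees only the nearest predator direction

    Move chooseMove(double foodDx, double foodDy, double predDx, double predDy) const {
        // Step 1: decide which behavior to pursue.
        std::vector<double> choice = decision->activate({foodDx, foodDy, predDx, predDy});
        bool evade = choice[0] > 0.5;   // single output interpreted as gather-vs-evade

        // Step 2: act according to that behavior, using only the relevant inputs.
        std::vector<double> out = evade ? evader->activate({predDx, predDy})
                                        : gatherer->activate({foodDx, foodDy});
        return { static_cast<int>(out[0]), static_cast<int>(out[1]) };
    }
};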
3.2.1 Single Network Agents
The Single Network Agents utilized the single network controller and were intended as a control with which to both test learning in our environment and measure the Hierarchical Neural Network Agents against. The neural networks for these agents were given five inputs. The first was a bias input, required by NEAT, which would always be 0. The next two were the direction of the nearest food particle along the x and y axes. The value of each input could be -1, 0, or 1 depending on whether the food particle's position had a greater, equal, or smaller x or y value. Similarly, the final two inputs were the direction of the nearest predator along the x and y axes. Since the inputs to each agent's neural network are egocentric, the networks can generalize their knowledge to any position in the environment and to any size of environment. These sensors give the Single Network Agents enough information to learn both food gathering and predator evasion. The agent's controller network has two outputs, one for movement in the x direction and one for movement in the y direction. As with the sensors, the values of the movement outputs could be -1, 0, or 1 depending on whether the agent wants to move left, stay still, or move right, and similarly for up and down movement.
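Concretely, each sensor reduces to the sign of the offset between the agent and the nearest food or predator. The helper below is a hypothetical sketch; the exact mapping of greater/smaller positions to +1/-1 is an illustrative convention.

#include <array>

// Sign of an offset: -1, 0, or 1, matching the three possible sensor values.
int sign(int delta) { return (delta > 0) - (delta < 0); }

// Build the five egocentric inputs for a Single Network Agent:
// bias, nearest-food direction (x, y), nearest-predator direction (x, y).
// Because only relative directions are used, the same network generalizes to
// any position on the board and to any board size.
std::array<double, 5> buildInputs(int agentX, int agentY,
                                  int foodX, int foodY,
                                  int predX, int predY) {
    std::array<double, 5> inputs = {
        0.0,                                  // bias input (always 0 in our setup)
        static_cast<double>(sign(foodX - agentX)),
        static_cast<double>(sign(foodY - agentY)),
        static_cast<double>(sign(predX - agentX)),
        static_cast<double>(sign(predY - agentY)),
    };
    return inputs;
}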
The fitness function for these agents, which defines how they learn, is based solely on food gathered. Because of this, agents which gather more food than the others each round will have a greater influence on the next generation. Proficiency at predator evasion, however, will only have an indirect effect on agent fitness. No fitness bonus is given to agents who succeed at evading the predators, and no fitness is taken away if an agent fails. Any agent that fails to evade the predators will incur an indirect fitness penalty in the form of lost time which could have been spent gathering food. We chose not to include a direct reward or penalty for predator evasion mostly because it is difficult to determine how to assign blame for failure or success. If an agent lives, the predators may simply have chosen to hunt the others, while if an agent dies it could have been the case that it was surrounded but otherwise had above-average evasion skill. By leaving the penalty indirect, we hoped to minimize the effect of these corner cases.
3.2.2 Hierarchical Neural Network Agents
The Hierarchical Neural Network Agents were controlled by the HNN controller and formed the main focus of
our experiment. The HNN controller was tested in two parts. It was first tested with the decision network’s
behaviors implemented as algorithms. The purpose of this was to test whether the hierarchical structure was
capable of outperforming the single network controller under the most advantageous circumstances. In both
cases, the decision network remains the same. The decision network is aware of the nearest food particle
and the nearest predator and must learn to use this information to activate the right behavioral network.
Unlike the previously described single network controller, the output of the decision network is not the next
move of the agent but which of the behavioral networks to activate. Once the decision network has made
its decision, control is passed off to the behavioral network which will decide the agent’s next move. During
evolution, all fitness is attributed to the decision network; the behavioral networks are no longer being trained.
In the first test, the decision network had two behaviors at its disposal. The first was a gathering behavior which would cause the agent to move directly towards the nearest food while disregarding predators. The second was a predator evasion behavior which would cause the agent to move directly away from the nearest predator. While these were not the most intelligent behaviors, they were sufficient for testing the HNN controllers. It should be noted that this approach is essentially the same as the switch network discussed in [6].
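A sketch of these two hand-coded behaviors might look as follows (hypothetical helper names, and the sign convention is illustrative): step one cell directly toward the nearest food, or one cell directly away from the nearest predator.

struct Move { int dx, dy; };

// One-step sign helper shared by both behaviors.
static int sign(int d) { return (d > 0) - (d < 0); }

// Gathering behavior: step directly toward the nearest food, ignoring predators.
Move gatherBehavior(int agentX, int agentY, int foodX, int foodY) {
    return { sign(foodX - agentX), sign(foodY - agentY) };
}

// Evasion behavior: step directly away from the nearest predator, ignoring food.
Move evadeBehavior(int agentX, int agentY, int predX, int predY) {
    return { -sign(predX - agentX), -sign(predY - agentY) };
}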
Finally, the Hierarchical Neural Network Agents were tested with all behaviors implemented as neural
networks. In this experiment the agents’ decision networks would also have two behaviors at their disposal,
one food gathering behavior and one predator evasion behavior. Since the behaviors would be implemented
by neural networks in this experiment, new populations had to be created and trained in order to produce
the behavioral networks for the decision network to utilize. To do this, the Gatherer and Evader populations
were created.
3.2.3 Gatherers and Evaders
The Gatherers and Evaders were both created in order to build the behavioral networks which the Hierarchical
Neural Network Agents would use to do their learning. For this reason it was critical that both populations
be able to learn their task well or the Hierarchical Neural Network Agents would be unable to learn anything.
Both groups were trained in their own specific environments until a sufficiently fit network had been evolved
at which point the network was saved for future use by the Hierarchical Neural Network Agents.
The Gatherers were created to develop the gathering behavioral network for the Hierarchical Neural
Network Agents. The Gatherers were controlled by a single neural network which was trained on the food
gathering task in the absence of predators. The goal was to create a behavioral network whose sole concern
was gathering food, so the only inputs are the direction of the nearest food particle along the x and y axes.
The outputs are the same as for all agents, desired movement along the x and y axes. 10 Gatherers were
trained for 200 generations of 100 turns each to gather 10 food particles. At the end of training, the gatherer
network with the highest fitness was saved for use by the Hierarchical Neural Network Agents.
The Evaders were created to learn the predator evasion task in much the same way as the Gatherers. The Evaders were again controlled by a single neural network which would become one of the behavioral networks for the Hierarchical Neural Network Agents. The Evaders' inputs were the directions to the nearest two predators along the x and y axes. The rationale behind this was to exploit the benefit of HNNs in being able to better tailor the vector of inputs to each situation. The outputs were the same as for all learning agents. 10 Evaders were trained for 200 generations of 100 turns to evade 5 predators. Since the Evaders are not concerned with gathering food, the fitness function had to be changed. Instead of scoring individuals on the amount of food collected, the number of turns survived was used. This fitness function was sufficient for the agents to learn to avoid the predators.
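The only substantive difference between the two training setups is the fitness function, which a sketch like the following makes explicit (AgentStats and its field names are assumptions of our own):

// Hypothetical per-round statistics recorded for each agent.
struct AgentStats {
    int foodCollected = 0;   // food particles consumed this round
    int turnsSurvived = 0;   // turns before being caught (or the round length)
};

// Gatherers (and the main experiments) are scored on food collected.
double gathererFitness(const AgentStats& s) {
    return static_cast<double>(s.foodCollected);
}

// Evaders are scored on how long they stay alive, since food is irrelevant to them.
double evaderFitness(const AgentStats& s) {
    return static_cast<double>(s.turnsSurvived);
}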
4 Results
4.1 Experiment
We planned to test the Hierarchical Neural Network strategy in several steps.
1. Train agents with single neural network controller
2. Train agents using HNN consisting of decision network and static behavioral algorithms
3. Train agents using HNN consisting of decision network and behavioral networks
The single network controller would serve as a baseline to test the Hierarchical Neural Network strategy
against, the decision network with behavioral algorithms would provide a proof of concept, and the decision
network with behavioral networks would serve as the full implementation of our strategy. In all experiments,
the board is 25×25 cells and each population is run for 200 generations of 100 turns each. For all charts,
the x axis is the population number while the y axis is average fitness per individual. Due to the random
spawning of agents, food, and predators each round, there is a significant amount of noise in the fitness from
generation to generation. For this reason, a linear trend line has been added to each graph to show the rise
in average fitness over time.
4.2 Single Network Controller
Figure 4: Average fitness of a Worker population learning to gather food in the presence of 0 predators using
a single neural network controller.
As the first test of the single controller setup, the agents were trained to gather food without the presence of predators. In this experiment the agents' only task was to gather as much food as possible. Since the agents did not have to worry about predators, this experiment would test our application at the most basic level. It was critical for the agents to be able to learn in this experiment before we could hope for them to learn in more complicated situations. The environment was populated with 10 workers and 10 food particles for training. This experiment showed that our agents were successfully able to learn the food gathering task using NEAT. Figure 4 shows the average fitness across generations of one run of the experiment. The agents are able to improve their average fitness to over three times the initial average.
Figure 5: Average fitness of a Worker population learning to gather food in the presence of 2 predators using
a single neural network controller.
As another test for the single controller setup, the agents were again trained to gather food. This time,
however, two predators were added to the environment to chase and destroy the learning agents. This would
show if the single network controller was able to learn to gather food in the presence of predators. In this
experiment the agents had to learn to balance the task of food gathering with the task of predator evasion.
This was essential for our work because the hierarchical neural networks were created to balance multiple
behaviors, in this case food gathering and predator evasion, and we needed another successful implementation
to test the results of the HNN system against. By complicating the task for the agents, an expected drop in
fitness is observed in the population. However, the agents are still able to learn the tasks with NEAT and
improve over time. The environment was populated with 10 workers, 10 food particles, and 2 spiders. Our
experiment showed that the single neural network system was able to learn the food gathering + evasion
task in our application, which meant we were ready to test the hierarchical neural network systems. It should again be noted that the single neural network system learns the skill of predator evasion indirectly, with no benefit given for success or penalty incurred for failure except the lost fitness of would-be-collected food.
4.3 Decision Network with Behavioral Algorithms
Figure 6: Average fitness of a Worker population learning to gather food in the presence of 2 predators using
a hierarchical neural network controller choosing between pre-specified behaviors.
In the first test of the hierarchical neural network controller setup, the agents were again trained to gather food in the presence of predators. The goal of this test was to expand on the first experiment, which found that a single neural network controller could learn the food gathering + evasion task in this environment. Once we knew that learning was possible in our application, the next step was to test a simple version of the HNN concept. The environment was populated with 10 workers, 10 food particles, and 2 spiders for the experiment. This test demonstrates the validity of the Hierarchical Neural Network approach: given useful behavioral algorithms, the decision network can learn to utilize them. The most important result of this experiment is that the hierarchical neural network approach could be a viable option given sufficiently sophisticated behaviors. Also significant is that the hierarchical neural network system in this case outperforms the single controller network, which was trained under identical circumstances. While this is likely due in some part to the fact that the hierarchical neural network was given hand-crafted behavioral algorithms as its behaviors, the result still demonstrates that it is possible for HNNs to surpass single neural networks in certain situations.
4.4 Decision Network with Behavioral Networks
Figure 7: Average fitness of a Worker population learning to gather food in the presence of 2 predators using
a hierarchical neural network controller choosing between behaviors implemented as neural networks.
The final test of the hierarchical neural network controller setup consisted of two new species of agents being created to learn the behaviors which the decision network would utilize. Evaders were created to learn a basic evasion behavior; they were trained with a population size of 10 in the presence of 5 predators. The Gatherers were created to collect food as efficiently as possible; they were trained with a population size of 10 to gather the 10 food particles on the board, with no predators present during training. Once each group finished training, the neural network with the highest fitness across all 200 generations was saved to serve as the behavioral network. In the last experiment, the agents were given the full hierarchical neural network setup, where the decision network was trained to effectively utilize the saved behavioral networks. The environment was populated with 10 workers, 10 food particles, and 2 spiders. This test demonstrates the validity of the Hierarchical Neural Network approach in its full implementation. The behavioral neural networks, when trained properly, can be used as substitutes for the behavioral algorithms and can be used effectively to train the decision network on the food gathering + evasion task. It is important to note that this implementation both improves sufficiently at the task and outperforms the single network controller. This indicates that, in circumstances where the behaviors are easy enough to produce and the task is sufficiently complex for a single network controller, the hierarchical neural network approach is a useful alternative.
5 Conclusions and Future Work
5.1 Conclusions
The results demonstrate that the Hierarchical Neural Network approach can produce stronger solutions more quickly than a single neural network under certain circumstances. While the approach incurs additional up-front cost for the creation of the hierarchical structure, it gains numerous benefits, such as the reusability of the behavioral networks and the reduced fan-in of inputs to each individual network.
One interesting point of analysis is the fact that the HNN agents with behavioral algorithms consistently held a higher level of fitness than the HNN agents with behavioral networks over the course of their respective experiments. This could be due to the simplistic nature of the food gathering and predator evasion tasks. Since the problems are so simple, the behavioral algorithms, though very simple in nature themselves, may have been a good enough approximation of an optimal solution. A worthwhile piece of future work may be increasing the complexity of the tasks to see whether this advantage can be decreased.
As was mentioned before, a significant amount of noise is present in the data in the form of large
fluctuations in fitness between generations due to the random spawning of agents each round. While this
did not prevent learning for any of the three controllers, it certainly affected the data, and the future work section discusses ways to reduce the noise.
Finally, the three data sets were tested for statistical significance. A paired t-test was used pairwise between the three data sets to determine the statistical significance of the results. For all three pairings, the two-tailed P value was less than 0.0001, indicating that the differences are highly statistically significant.
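For reference, the paired t-test reduces to a one-sample t-test on the per-generation differences between two fitness series; a minimal sketch of the statistic is shown below (the two-tailed P value is then read from the t distribution with n-1 degrees of freedom, which is omitted here).

#include <cmath>
#include <cstddef>
#include <vector>

// Paired t statistic for two equally long fitness series (e.g., average fitness
// per generation for two controllers): t = mean(d) / (sd(d) / sqrt(n)),
// where d is the per-generation difference. Assumes a.size() == b.size() > 1.
double pairedTStatistic(const std::vector<double>& a, const std::vector<double>& b) {
    const std::size_t n = a.size();
    double meanDiff = 0.0;
    for (std::size_t i = 0; i < n; ++i) meanDiff += a[i] - b[i];
    meanDiff /= n;

    double var = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double d = (a[i] - b[i]) - meanDiff;
        var += d * d;
    }
    var /= (n - 1);                        // sample variance of the differences

    return meanDiff / std::sqrt(var / n);  // compare against t distribution with n-1 dof
}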
5.2 Future Work
The largest task for future work will be reducing the amount of noise present in the data due to random spawning. We plan to do this by evaluating the population each generation not once, but ten or thirty times, and using the sum of the fitness of each evaluation as that agent's fitness. This way, every agent in the population will spawn in a favorable location some rounds and unfavorable locations other rounds, and a single unfortunate spawn will not be able to negatively impact an agent's total fitness so strongly. With these changes, the data should appear more linear with fewer fluctuations. We also hypothesize that, with a more accurate picture of each agent's true fitness every generation, learning will accelerate.
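A sketch of this planned change is shown below; evaluateRound is a placeholder for one full round of the application with fresh random spawn locations.

#include <vector>

struct Individual { double fitness = 0.0; };

// Placeholder for one round of evaluation with fresh random spawns; in the real
// application this runs the agent in the environment for 100 turns.
double evaluateRound(const Individual&) { return 0.0; }

// Evaluate each individual over several rounds and sum the results, so that a
// single unlucky spawn cannot dominate the measured fitness.
void evaluateWithRepeats(std::vector<Individual>& population, int rounds) {
    for (Individual& ind : population) {
        ind.fitness = 0.0;
        for (int r = 0; r < rounds; ++r)
            ind.fitness += evaluateRound(ind);   // e.g., rounds = 10 or 30
    }
}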
Figure 8: Examples of a favorable spawn (left) where the agent is close to food and far from predators and
an unfavorable spawn (right) where the agent is close to a predator who blocks the agent’s path to food.
Another test which will demonstrate the versatility and robustness of Hierarchical Neural Networks would
be to train the networks using other evolutionary algorithms. The beauty of the Hierarchical Neural Network
approach is that it is so general. It allows any topology of the hierarchical network and should allow any
training algorithm to be used on both the behavioral networks and decision network.
Similarly, a hybrid controller could be tested, consisting of both behavioral networks and behavioral algorithms. This approach could be used to compare the behavioral algorithms and behavioral networks used in this work. Both the algorithms and the networks could be given to a single decision network in the same Hierarchical Neural Network, trained, and then analyzed to see whether the algorithms or the networks were favored. Since the HNN with behavioral algorithms performed better than the HNN with behavioral networks, we would expect to see the algorithms used more frequently by the decision network.
Also important will be testing this method on more complicated tasks. Complicating the task will involve both adding more behaviors that the decision network must utilize and increasing the complexity of the individual behaviors. This work arose from a similar project in which we set out to test whether neuroevolution was capable of learning through what we called symbiotic coevolution. In symbiotic coevolution, two groups of agents capable of different actions are dependent on the other group for their survival and must learn to support the other group so that they may in turn be supported. Our plan was to simulate worker ants which would be tasked with gathering food for soldier ants to eat, who would then kill the workers' predators so the workers could continue to gather food. If the two populations do not learn to work together, then the workers are consumed and the soldiers starve. Initial testing showed that this problem was more difficult for the single neural networks than we thought, and so the idea of Hierarchical Neural Networks was created in the hope of solving this problem where the single neural network failed. This is considered a lower priority in the short term and will only be pursued after the other work mentioned.
Acknowledgements
Thanks to my supervisor, Risto Miikkulainen, for all his guidance and support which made this work possible.
A great deal of the coding and planning for this project was contributed by fellow Turing Scholar Matthew
deWett. Matt was essential for the conception and early development of this work. Thanks to Matt for getting
the project off the ground and helping shape its direction.
This research was begun as part of a project for CS 378i in the Computational Intelligence in Game
Design stream in the Freshman Research Initiative.
References
[1] Bryant, B., and Miikkulainen, R. Neuroevolution for adaptive teams. In Proceedings of the 2003
Congress on Evolutionary Computation (CEC-2003) (2003).
[2] Juell, P., and Marsh, R. A hierarchical neural network for human face detection. Pattern Recognition
29, 5 (1995).
[3] Montana, D. J., and Davis, L. Training feedforward neural networks using genetic algorithms. In
Proceedings of the 11th International Joint Conference on Artificial Intelligence (1989).
[4] Radcliffe, N. J. Genetic set recombination and its application to neural network topology optimization.
Neural computing and applications (1993).
[5] Schaffer, J. D., Whitley, D., and Eshelman, L. J. Combinations of genetic algorithms and neural networks: A survey of the state of the art. In Proceedings of the International Workshop on Combinations of Genetic Algorithms and Neural Networks (COGANN-92) (1992).
[6] Whiteson, S., Kohl, N., Miikkulainen, R., and Stone, P. Evolving soccer keepaway players through task decomposition. Machine Learning 59, 1 (2005).
[7] Stanley, K., and Miikkulainen, R. Efficient reinforcement learning through evolving neural network
topologies. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2002)
(2002).