
Evolving robust and specialized car racing skills
Julian Togelius and Simon M. Lucas
Department of Computer Science
University of Essex, UK
[email protected], [email protected]
Abstract— Neural network-based controllers are evolved for
racing simulated R/C cars around several tracks of varying
difficulty. The transferability of driving skills acquired when
evolving for a single track is evaluated, and different ways of
evolving controllers able to perform well on many different
tracks are investigated. It is further shown that such generally
proficient controllers can reliably be developed into specialized
controllers for individual tracks. Evolution of sensor parameters
together with network weights is shown to lead to higher final fitness, but only if turned on after a general controller has been developed; otherwise it hinders evolution. It is argued that
simulated car racing is a scalable and relevant testbed for
evolutionary robotics research, and that the results of this
research can be useful for commercial computer games.
Keywords: Evolutionary robotics, games, car racing,
driving, incremental evolution
I. INTRODUCTION
Car racing is a remarkably popular preoccupation - both to
watch and to participate in - be it in a computer simulation
or in the “real world”. But it is not only popular, it is
also challenging: racing well requires fast and accurate
reactions, knowledge of the car’s behaviour in different
environments, and various forms of real-time planning, such
as path planning and deciding when to overtake a competitor.
In other words, it requires many of the core components
of intelligence researched within computational intelligence and robotics. The success of the recent DARPA Grand Challenge [1], where completely autonomous real cars raced in a demanding desert environment, may be taken as a measure of the interest in car racing within these research communities.
This paper deals with using evolutionary algorithms to
create neural network controllers for simulated car racing.
Specifically, we evolve controllers that have robust performance over different tracks, and can be specialized to work
better on particular tracks.
Evolutionary robotics (the use of evolutionary algorithms
for embodied control problems) and simulated car racing
are in many ways ideal companions. The benefit for the
development of racing games and simulations is clear: evolutionary robotics offers a way to automatically develop
controllers, possibly specialized for specific tracks or types
of tracks, driving styles, skill levels, competitors etc. One
could envision a racing simulator where the user is allowed to
construct his own tracks and cars, and the game automatically
develops a set of controllers to drive these tracks. The game
could also automatically adapt to the user’s driving style,
or learn from other drivers (humans or machines) on the
Internet.
The benefits for evolutionary robotics might require some
explanation. While evolutionary robotics has successfully
been used for various interdisciplinary investigations (e.g. of
memory mechanisms, neural architectures and evolutionary
dynamics), and for parameter tuning of some more complex
controllers, its approximately 15 years of development have
not seen much scaling up [2]. That is, we have yet to see the
evolution of robot controllers (as opposed to just parameters
of such) for any really complex problem - problems where
artificial evolution becomes a superior alternative to manual
design of controllers.
We believe that part of the reason for this lack of progress lies in the limited environments, sensor data, embodiments, and tasks used in most evolutionary robotics experiments. A typical experiment of this kind uses a semi-holonomic robot operating in
an impoverished environment (in many ways resembling
a “Skinner box”, the simplistic boxes pioneered by B. F.
Skinner for studying operant conditioning [3]), using simple,
low-bandwidth sensor input, and performing a task that is hard to scale up incrementally. The car racing task uses a more complex and interesting robot morphology: a car is harder to control than a semi-holonomic robot, but at the same time it has more capabilities. While a simple racing track might be as impoverished an environment as any Skinner box, it can be scaled up. A controller might be evolved to race a simple track, which can then be progressively complexified (by adding competitors, gears, crossroads, blind alleys, bridges, jumps etc.) up to and above the level of the DARPA Grand Challenge, without ever changing the nature of the fitness function, thus ensuring smooth scaling up.
This solution to the problems of environment and task scalability does come at a cost: the car will probably need ever more sophisticated sensors, including high-bandwidth visual input, to navigate more complex tracks. But such input can be supplied if we use one of today's graphically sophisticated racing games as the experimental environment. This shifts the problem to one of finding controller encodings that can handle such complex input.
A. Prior research
1) Evolutionary car racing: A few investigations into evolutionary car racing can be found in the recent literature. Togelius and Lucas [4] investigated various controller architectures and sensor input representations for simulated car racing. It was concluded that, of the combinations studied, the only one that allows evolution to reliably produce good racing controllers uses neural networks informed by egocentric information from range-finder sensors. The best performance was achieved by making the ranges and angles of the rangefinders evolvable, and by providing the network with a further sensor indicating the angle to the next waypoint. The controllers were only tried on one track, but some noise was introduced into the environment and the track was surrounded by impenetrable walls. The best of the evolved controllers outperformed all of a small sample of human competitors.
Stanley et al. [5] used a similar setup - neural networks informed by range-finders - in an experiment aimed at evolving
both controllers and crash-warning systems for subsequently
impaired controllers. The experiment was conducted on a
single track in simulation (using the RARS simulator [6]),
and the track was not surrounded by walls, so the car was
allowed (at a fitness penalty) to venture outside the track.
In another interesting experiment, Floreano et al. evolved neural networks for simulated car racing using first-person visual input from the driving simulator Carworld [7][8]. However, only 5 x 5 pixels of the visual field were used as inputs to the network; the positions of these pixels were dynamically selected by the network, in a process known as active vision.
A different approach to evolutionary car racing was taken by Tanev et al., who evolved parameters for a hand-coded racing car controller, using anticipatory modeling of the car's position [9]. While the amount of human input into the controller design process is arguably higher in this case, this approach allowed evolution of controllers for real radio-controlled cars without an intermediary simulation step.
Also related is the work of Wloch and Bentley, who used
a human-designed controller built into a high quality racing
simulator, but used artificial evolution to optimize all physical
and mechanical parameters of the car [10]. Evolution here
managed to come up with car configurations that performed
better than any of the stock cars in the simulator.
2) Supervised learning and real-world applications: Machine learning techniques have also been used in real-world car driving and racing applications, though these techniques have been forms of supervised learning rather than of evolutionary learning. Perhaps the best known of these is Pomerleau's use of backpropagation to train neural networks to associate pre-processed video input with a human driver's actions, leading to a controller able to keep a real car on the road and take curves appropriately [11]. More recently,
the winning team in the DARPA Grand Challenge made
extensive use of machine learning in developing their car
controller.
Going from physical reality to virtual reality, Microsoft's Xbox video game Forza Motorsport is worthy of mention, as all the opponent car controllers have been trained by supervised learning on human player data, instead of the usual racing game technique of blindly following precalculated racing lines [12]. Players can even train their own "drivatars" to race tracks in their place, once the drivatars have acquired their individual driving style.
Supervised learning, however, ultimately suffers from requiring good training data. Sometimes such training data
is simply not available, at other times it is prohibitively
expensive to obtain, and at yet other times imitating human
drivers is simply not what we want.
B. Motivations for this paper
While the research referred to above has shown the usefulness of evolutionary robotics techniques for car racing, the
controllers have in all those cases only been tested on a single
track, and sometimes with severe simplifying assumptions,
such as being able to drive through walls. Thus, the first
objective of the research reported in this paper is to evolve
neural network controllers each capable of competitively and
reliably navigating a variety of different tracks, including
tracks they have not been trained on. Based on the range-finding and aimpoint sensors proposed in [4], we investigate which sensor setup and evolutionary sequence allow us to create such controllers.
A second objective is to investigate whether evolution of
a specialized controller, i.e. one performing very well on a
particular track, can be sped up by starting from an already
evolved “general” controller. Such a process could be useful
for example in a racing game, where users are allowed to
design tracks and a controller providing good performance
on such tracks needs to be created on the fly.
The concrete questions we pose and try to answer are
the following: How robust is the evolutionary algorithm,
that is, how certain can we be that a given evolutionary
run will produce a proficient controller for a given track?
Is the layout of the racing track directly influencing the fitness landscape, so that some tracks are much harder to evolve controllers for than others, while still not being impossible to drive? What
is the transferability of knowledge gained in evolving for
one track in terms of performance on other tracks? Can we
evolve controllers that can proficiently race all tracks in our
training set? How? Can such generally proficient controllers
be used to reliably create specialized controllers that perform
well, but only on particular tracks? Finally, can this be done
even for tracks for which it is not possible to evolve a good
controller from scratch?
While this investigation primarily addresses the scalability
of the problem domain (and to some extent of the sensor/network combination), it may also be of use for practical
applications such as racing games to find out the most
reliable ways to evolve proficient controllers.
C. Overview of the paper
The paper is laid out as follows: first, we describe the
characteristics of the car racing simulation we will be using,
including sensor models, tracks, and how this model differs
from the problem of racing real radio-controlled cars. The
next section details the neural networks and evolutionary algorithm we employ. We then proceed to describe experiments
on evolving controllers optimized for the individual tracks
from scratch, followed by a section where we investigate
how to evolve controllers that provide robust performance
over several tracks. These controllers are then validated on
tracks for which they have not been evolved. Finally, these
controllers are further evolved to provide better fitness on
specific tracks, conclusions are drawn, and further research
is suggested.
II. THE CAR RACING MODEL
The experiments in this article were performed in a 2-dimensional simulator, intended to qualitatively, if not quantitatively, model a standard radio-controlled (R/C) toy car (approximately 17 centimeters long) in an arena with dimensions of approximately 3 x 2 meters, where the track is delimited by solid walls. The simulation has dimensions of 400 x 300 pixels, and the car measures 20 x 10 pixels.

R/C toy car racing differs from racing full-sized cars in several ways. One is the simplified controls; many R/C cars have only three possible drive modes (forward, backward, and neutral) and three possible steering modes (left, right and center). Other differences are that many toy cars have bad grip on many surfaces, leading to easy skidding, and that damaging such cars in collisions is harder due to their low weight.

The dynamics of the car are based on a reasonably detailed mechanical model, taking into account the small size of the car and its bad grip on the surface, but are not based on any actual measurements [13][14]. The model is similar to that used in [4], and differs mainly in its improved collision handling; after more experience with the physical R/C cars, the collision response system was reimplemented to make collisions more realistic (and, as an effect, more undesirable). Now, a collision may cause the car to get stuck if the wall is struck at an unfortunate angle, something often seen in experiments with physical cars.

A track consists of a set of walls, a chain of waypoints, and a set of starting positions and directions. When a car is added to a track, it is placed in one of the starting positions with the corresponding starting direction, both position and angle being subject to random alterations. The waypoints are used for fitness calculations.

For the experiments we have designed eight different tracks, presented in figure 1. The tracks are designed to vary in difficulty, from easy to hard. Three of the tracks are versions of three other tracks with all the waypoints in reverse order, and the directions of the starting positions reversed.

Fig. 1. The eight tracks. Notice how tracks 1 and 2 (at the top), 3 and 4, and 5 and 6 differ in the clockwise/anti-clockwise layout of waypoints and associated starting points. Tracks 7 and 8 have no relation to each other apart from both being difficult.

The main differences between our simulation and the real R/C car racing problem have to do with sensing. As reported by Tanev et al. [9] as well as in [4], there is a small but not unimportant lag in the communication between camera, computer and car, leading to the controller acting on outdated perceptions. Apart from that, there is often some error in the estimation of the car's position and velocity from an overhead camera. In contrast, the simulation allows instant and accurate information to be fed to the controller.

III. EVOLVABLE INTELLIGENCE
A. Sensors
The car experiences its environment through two types
of sensors: the waypoint sensor, and the wall sensors. The
waypoint sensor gives the difference between the car’s current orientation and the angle to the next waypoint (but not
the distance to the waypoint). When pointing straight at a waypoint, this sensor outputs 0; when the waypoint is to the left of the car it outputs a positive value, and vice versa.
As for the wall sensors, each sensor has an angle (relative to
the orientation of the car) and a range, between 0 and 200
pixels. The output of the wall sensor is zero if no wall is
encountered along a line with the specified angle and range
from the centre of the car, otherwise it is a fraction of one,
depending on how close to the car the sensed wall is. A small
amount of noise is applied to all sensor readings, as it is to
starting positions and orientations.
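As a concrete illustration, the two sensor types could be computed as in the following minimal Python sketch. The ray-stepping granularity, the linear closeness measure, and the is_wall environment lookup are our own assumptions for illustration, not details taken from the simulator.

import math

def waypoint_sensor(car_x, car_y, car_angle, wp_x, wp_y):
    # Signed difference between the car's heading and the bearing to the next
    # waypoint, normalized to [-pi, pi]; the distance itself is not reported.
    bearing = math.atan2(wp_y - car_y, wp_x - car_x)
    diff = bearing - car_angle
    return math.atan2(math.sin(diff), math.cos(diff))

def wall_sensor(car_x, car_y, car_angle, sensor_angle, sensor_range, is_wall, step=2.0):
    # Cast a ray from the car centre at the sensor's angle; return 0 if no wall
    # lies within range, otherwise a fraction of one that grows as the wall
    # gets closer (a linear falloff is assumed here).
    ray = car_angle + sensor_angle
    d = step
    while d <= sensor_range:
        x = car_x + d * math.cos(ray)
        y = car_y + d * math.sin(ray)
        if is_wall(x, y):
            return 1.0 - d / sensor_range
        d += step
    return 0.0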
In some of the experiments the sensor parameters are mutated by the evolutionary algorithm, but in all experiments they start from the following setup: one sensor points straight forward (0 radians) in the direction of the car and has a range of 200 pixels, as do the three sensors pointing forward-left, forward-right and backward, respectively. The two other sensors, which point left and right, have a range of 100 pixels; this is illustrated in figure 2.

Fig. 2. The initial sensor setup, which is kept throughout the evolutionary run for those runs where sensor parameters are not evolvable. Here, the car is seen in close-up moving upward-leftward. At this particular position, the front-right sensor returns a positive number very close to 0, as it detects a wall near the limit of its range; the front-left sensor returns a number close to 0.5, and the back sensor a slightly larger number. The front, left and right sensors do not detect any walls at all and thus return 0.
B. Neural networks
The controllers in the experiments below are based on
neural networks. More precisely, we are using multilayer
perceptrons with three neuronal layers (two adaptive layers)
and tanh activation functions. A network has at least three
inputs: one fixed input with the value 1, one speed input
in the approximate range [0..3], and one input from the
waypoint sensor, in the range [-π..π]. In addition to this,
it might have any number of inputs from wall sensors, in
the range [0..1]. All networks have two outputs, which are
interpreted as driving commands for the car.
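For concreteness, a forward pass of such a network might look like the following sketch; the interpretation of the two outputs as discrete driving commands is our assumption, since the exact mapping is not spelled out here.

import math

def forward(inputs, w1, w2):
    # Two adaptive layers with tanh activation functions.
    # inputs = [1.0 (constant), speed, waypoint_angle, wall_1, ..., wall_n]
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in w1]
    return [math.tanh(sum(w * h for w, h in zip(row, hidden))) for row in w2]

# Thresholding one output into forward/neutral/backward and the other into
# left/center/right would match the three-mode R/C controls described in
# section II, but that mapping is an assumption.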
C. Evolutionary algorithm
The genome is an array of floating point numbers, of
variable or fixed length depending on the experimental setup.
Apart from information on the number of wall sensors and
hidden neurons, it encodes the orientation and range of the
wall sensors, and weights of the connections in the neural
network.
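One way such a genome could be laid out and decoded is sketched below; the field ordering (sensor parameters first, then connection weights) is an assumption made for concreteness, as the paper does not specify it.

def decode(genome, n_sensors, n_hidden):
    # Split a flat list of floats into sensor parameters and network weights.
    # The network has three fixed inputs (bias, speed, waypoint angle) plus
    # one input per wall sensor.
    n_inputs = 3 + n_sensors
    sensors = [(genome[2 * i], genome[2 * i + 1])        # (angle, range) pairs
               for i in range(n_sensors)]
    weights = genome[2 * n_sensors:]
    w1 = [weights[i * n_inputs:(i + 1) * n_inputs]       # first adaptive layer
          for i in range(n_hidden)]
    rest = weights[n_hidden * n_inputs:]
    w2 = [rest[i * n_hidden:(i + 1) * n_hidden]          # second layer, two outputs
          for i in range(2)]
    return sensors, w1, w2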
The evolutionary algorithm used is a kind of evolutionary strategy, with µ = 50 and λ = 50. In other words, 50 genomes (the elite) are created at the start of evolution. At each generation, one copy is made of each genome in the elite, and all copies are mutated. After that, a fitness value is calculated for each genome, and the 50 best individuals of all 100 form the new elite.
There are two mutation operators: Gaussian mutation
of all weight values, and Gaussian mutation of all sensor
parameters (angles and lengths), which might be turned on
or off. In both cases, the standard deviation of the Gaussian
distribution was set to 0.3.
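Taken together, the algorithm amounts to the following sketch of a (50+50) evolution strategy; the helper names and the flat genome layout (matching the decoding sketch above, with sensor parameters in the first positions) are our own.

import random

MU = 50       # elite size; one mutated copy per elite member gives lambda = 50
SIGMA = 0.3   # standard deviation of the Gaussian mutations

def mutate(genome, n_sensor_params, mutate_sensors=False):
    # Gaussian mutation of all weight values and, when turned on, of all
    # sensor parameters (assumed to occupy the first n_sensor_params slots).
    g = list(genome)
    start = 0 if mutate_sensors else n_sensor_params
    for i in range(start, len(g)):
        g[i] += random.gauss(0.0, SIGMA)
    return g

def evolve(elite, evaluate, generations, n_sensor_params, mutate_sensors=False):
    # (50+50) selection: copy and mutate each elite genome, then keep the
    # best 50 of the resulting 100, ranked by (noisy) fitness.
    for _ in range(generations):
        offspring = [mutate(g, n_sensor_params, mutate_sensors) for g in elite]
        pool = elite + offspring
        pool.sort(key=evaluate, reverse=True)
        elite = pool[:MU]
    return elite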
Last but not least: the fitness function. The fitness of a controller is calculated as the number of waypoints it has passed, divided by the number of waypoints in the track, plus an intermediate term representing how far it is on its way to the next waypoint, calculated from the relative distances between the car and the previous and next waypoints. A fitness of 1.0 thus means having completed one full track within the allotted time. Waypoints can only be passed in the correct order, and a waypoint is counted as passed when the centre of the car is within 30 pixels of the waypoint. In the evolutionary experiments reported below, each car was allowed 700 timesteps (enough to do two to three laps on most tracks in the test set) and fitness was averaged over three trials.
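A sketch of this computation follows; the exact form of the intermediate progress term is our reading of the description above, not a formula given in the paper.

import math

def fitness(waypoints_passed, n_waypoints, car_pos, prev_wp, next_wp):
    # Waypoints passed as a fraction of the track, plus progress towards the
    # next waypoint. A waypoint counts as passed only in the correct order and
    # when the car centre comes within 30 pixels of it, so a value of 1.0
    # corresponds to one full track.
    d_prev = math.dist(car_pos, prev_wp)
    d_next = math.dist(car_pos, next_wp)
    progress = d_prev / (d_prev + d_next)    # assumed form, in [0, 1]
    return (waypoints_passed + progress) / n_waypoints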
IV. EVOLVING TRACK-SPECIFIC CONTROLLERS
The first experiments consisted of evolving controllers for the eight tracks separately, in order to test the software in general and to rank the difficulty of the tracks.
For each of the tracks, the evolutionary algorithm was run
10 times, each time starting from a population of “clean”
controllers, with all connection weights set to zero and sensor
parameters as explained above. Only weight mutation was
allowed. The evolutionary runs were for 200 generations
each.
A. Fixed sensor parameters
1) Evolving from scratch: The results are listed in table I, which is read as follows: each row represents the results for one particular track. The first column gives the mean of the fitnesses of the best controller of each of the evolutionary runs at generation 10, and (in parentheses) the standard deviation of the fitnesses of the same controllers. The next three columns present the same calculations at generations 50, 100 and 200, respectively. The "Pr." column gives the number of proficient best controllers for each track. An evolutionary run is deemed to have produced a proficient controller if its best controller at generation 200 has a fitness (averaged, as always, over three trials) of at least 1.5, meaning that it completes at least one and a half laps within the allowed time.

TABLE I
THE FITNESS OF THE BEST CONTROLLER AT VARIOUS GENERATIONS ON THE DIFFERENT TRACKS, AND THE NUMBER OF RUNS PRODUCING PROFICIENT CONTROLLERS. FITNESS AVERAGED OVER 10 SEPARATE EVOLUTIONARY RUNS; STANDARD DEVIATION IN PARENTHESES.

Track | 10          | 50          | 100         | 200         | Pr.
1     | 0.32 (0.07) | 0.54 (0.2)  | 0.7 (0.38)  | 0.81 (0.5)  | 2
2     | 0.38 (0.24) | 0.49 (0.38) | 0.56 (0.36) | 0.71 (0.5)  | 2
3     | 0.32 (0.09) | 0.97 (0.5)  | 1.47 (0.63) | 1.98 (0.66) | 7
4     | 0.53 (0.17) | 1.3 (0.48)  | 1.5 (0.54)  | 2.33 (0.59) | 9
5     | 0.45 (0.08) | 0.95 (0.6)  | 0.95 (0.58) | 1.65 (0.45) | 8
6     | 0.4 (0.08)  | 0.68 (0.27) | 1.02 (0.74) | 1.29 (0.76) | 5
7     | 0.3 (0.07)  | 0.35 (0.05) | 0.39 (0.09) | 0.46 (0.13) | 0
8     | 0.16 (0.02) | 0.19 (0.03) | 0.2 (0.01)  | 0.2 (0.01)  | 0
For the first two tracks, proficient controllers were produced by the evolutionary process within 200 generations,
but only in two out of ten runs. This means that while it is
possible to evolve neural networks that can be relied on to
race around one of these tracks without getting stuck or taking an excessively long time, the evolutionary process in itself is not reliable. In fact, most of the evolutionary runs are false starts. For tracks 3, 4, 5 and 6 the situation is different, as at least half of all evolutionary runs produce proficient controllers. The best evolved controllers for these tracks get around the track fairly fast without colliding with walls. For tracks 7 and 8, however, we have not been able to evolve proficient controllers from scratch at all. The "best" (least bad) controllers evolved for track 7 might get halfway around the track before getting stuck on a wall, or lose orientation and start to move back along the track.
2) Generality of evolved controllers: Next, we examined the generality of these controllers by testing the performance of the best controller for each track on each of the eight tracks. The results are presented in table II, and clearly show that the generality is very low. No controller performed very well on any track it had not been evolved on, with the interesting exception of the controller evolved for track 1, which actually performed better on track 3 than on the track for which it had been evolved, and on which it had a rather mediocre performance. It should be noted that both track 1 and track 3 (like all odd-numbered tracks) run counter-clockwise, and there indeed seems to be a slight bias for the other controllers to get higher fitness on tracks running in the same direction as the track for which they were evolved. We have not analysed this further.

TABLE II
THE FITNESS OF EACH CONTROLLER ON EACH TRACK. EACH ROW REPRESENTS THE PERFORMANCE OF THE BEST CONTROLLER OF ONE EVOLUTIONARY RUN WITH FIXED SENSORS, EVOLVED FOR THE TRACK WITH THE SAME NUMBER AS THE ROW. EACH COLUMN REPRESENTS THE PERFORMANCE OF THE CONTROLLERS ON THE TRACK WITH THE SAME NUMBER AS THE COLUMN. EACH CELL CONTAINS THE MEAN FITNESS OF 50 TRIALS OF THE CONTROLLER GIVEN BY THE ROW ON THE TRACK GIVEN BY THE COLUMN. CELLS MARKED * INDICATE THE TRACK ON WHICH A CERTAIN CONTROLLER PERFORMED BEST.

Evo/Test | 1            | 2            | 3            | 4            | 5            | 6            | 7            | 8
1        | 1.02 (0.14)  | 0.87 (0.1)   | 1.45 (0.18)* | 0.52 (0)     | 1.26 (0.17)  | 0.03 (0)     | 0.2 (0.18)   | 0.13 (0)
2        | 0.28 (0.06)  | 1.13 (0.35)* | 0.18 (0.1)   | 0.75 (0.26)  | 0.5 (0.13)   | 0.66 (0.19)  | 0.18 (0.15)  | 0.14 (0.02)
3        | 0.58 (0.16)  | 0.6 (0.22)   | 2.1 (0.48)*  | 1.45 (0.66)  | 0.62 (0.13)  | 0.04 (0.1)   | 0.03 (0.09)  | 0.14 (0.02)
4        | 0.15 (0.01)  | 0.32 (0.02)  | 0.06 (0.05)  | 1.77 (0.52)* | 0.22 (0.1)   | 0.13 (0.13)  | 0.07 (0.09)  | 0.13 (0.02)
5        | 0.07 (0.02)  | -0.02 (0)    | 0.05 (0)     | 0.2 (0.11)   | 2.37 (0.28)* | 0.1 (0.04)   | 0.03 (0.05)  | 0.13 (0.01)
6        | 1.33 (0.18)  | 0.43 (0.07)  | 0.4 (0.2)    | 0.67 (0.22)  | 1.39 (0.42)  | 2.34 (0.05)* | 0.13 (0.13)  | 0.14 (0.11)
7        | 0.45 (0.11)  | 0 (0.07)     | 0.6 (0.18)*  | 0.03 (0.04)  | 0.36 (0.08)  | 0.07 (0.03)  | 0.22 (0.15)  | 0.08 (0)
8        | 0.16 (0.03)  | 0.28 (0.04)  | 0.09 (0.07)  | 0.29 (0.18)* | 0.21 (0.03)  | 0.08 (0.1)   | 0.1 (0.09)   | 0.13 (0)
B. Evolved sensor parameters
1) Evolving from scratch: Evolving controllers from scratch with sensor parameter mutations turned on resulted in somewhat lower average fitnesses and numbers of proficient controllers, as can be seen in table III. The controllers that reached proficiency seemed to be roughly as fit as those evolved with fixed sensors, but more evolutionary runs got stuck in some local optimum and never produced proficient controllers when sensor parameters were evolvable. It is not known whether this is simply because of the increase in search space dimensionality caused by the addition of sensor parameters, or whether they complicate the evolutionary process in some other way.

TABLE III
EVOLVING CONTROLLERS FOR INDIVIDUAL TRACKS FROM SCRATCH WITH SENSOR MUTATION TURNED ON; FORMAT AS IN TABLE I.

Track | 10          | 50          | 100         | 200         | Pr.
1     | 0.3 (0.05)  | 0.58 (0.17) | 0.65 (0.18) | 0.89 (0.4)  | 1
2     | 0.32 (0.09) | 0.72 (0.4)  | 0.81 (0.49) | 0.91 (0.6)  | 3
3     | 0.53 (0.22) | 1.39 (0.51) | 2.77 (0.66) | 1.99 (0.7)  | 7
4     | 1.37 (0.89) | 2.25 (0.34) | 2.42 (0.37) | 2.41 (0.36) | 10
5     | 0.4 (0.07)  | 0.64 (0.35) | 0.95 (0.55) | 1.31 (0.66) | 4
6     | 0.48 (0.12) | 0.7 (0.29)  | 0.83 (0.39) | 0.99 (0.65) | 2
7     | 0.33 (0.11) | 0.43 (0.08) | 0.44 (0.08) | 0.5 (0.15)  | 0
8     | 0.16 (0.02) | 0.21 (0)    | 0.21 (0)    | 0.21 (0)    | 0
2) Generality of evolved controllers: Controllers evolved with evolvable sensor parameters turn out to generalize very badly, almost as badly as the controllers evolved with fixed sensors. However, there are some interesting differences: the controllers evolved for tracks 1, 2 and 6 (but not the others) actually perform better on tracks for which they were not evolved. It is quite hard to see any logic in which controllers will do well on which tracks, apart from those they were evolved for, and more data would be needed to resolve this.
V. EVOLVING ROBUST DRIVING SKILLS
The next suite of experiments was on evolving robust controllers, i.e. controllers that can drive proficiently on a large set of tracks.
A. Simultaneous evolution
Our first attempt consisted of evolving controllers on all tracks at once. For this purpose, each controller was tested on each of the first six tracks, for three trials per track, with fitness averaged over all these trials. We ran several evolutionary runs with this setup, with both evolvable and fixed sensor parameters, for long periods of time, but found very little progress: no controller reached an average fitness above 1.
B. Incremental evolution
Abandoning this method, we tried incremental evolution. The idea here was to evolve a controller on one track and, when it reached proficiency (mean fitness above 1.5), add another track to the training set - so that controllers are now evaluated on both tracks, with fitness averaged - and continue evolving. This procedure is then repeated, with a new track added to the fitness function each time the best controller of the population reaches an average fitness of 1.5 or above, until we have a controller that races all of the first six tracks proficiently. The order of the tracks was 5, 6, 3, 4, 1 and finally 2, the rationale being that the balance between clockwise and anti-clockwise tracks should be kept as even as possible in order to prevent lopsided controllers, and that easier tracks should be added to the mix before harder ones.
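In outline, the procedure might look as follows, reusing the evolution strategy sketched in section III-C; evaluate_on (mean fitness over the active tracks, three trials per track) is an assumed helper, not code from the paper.

TRACK_ORDER = [5, 6, 3, 4, 1, 2]   # easier tracks first, directions balanced
PROFICIENT = 1.5                   # mean-fitness threshold for adding a track

def incremental_evolution(elite, n_sensor_params):
    # Evolve on a growing set of tracks, adding the next track whenever the
    # best controller of the population is proficient on the current set.
    active = [TRACK_ORDER[0]]
    while True:
        evaluate = lambda g: evaluate_on(g, tracks=list(active))
        elite = evolve(elite, evaluate, generations=1,
                       n_sensor_params=n_sensor_params, mutate_sensors=False)
        if max(evaluate(g) for g in elite) >= PROFICIENT:
            if len(active) == len(TRACK_ORDER):
                return elite           # proficient on all six training tracks
            active.append(TRACK_ORDER[len(active)])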
This approach turned out to work much better than simultaneous evolution. Several runs were performed, and while some of them failed to produce generally proficient controllers, others fared better. A successful run usually takes a long time, on the order of several hundred generations, but once a run has come up with a controller that is proficient on the first three or four tracks, it almost always proceeds to produce a generally proficient controller. One of the successful runs is depicted in figure 3, and the mean fitness of the best controller of that run, when tested on all eight tracks separately, is shown in table IV. As can be seen from this table, the controller does a good job on the six tracks for which it was evolved, except that it occasionally gets stuck on a wall in track 2. It never makes its way around track 7 or 8.

Fig. 3. A successful incremental run, producing a generally proficient controller. New tracks were added to the fitness function when the fitness of the best controller reached 1.5; this happened at generations 53, 240, 253, 394 and 536. Maximum fitness continued to increase for approximately 50 generations after that. The graph shows the fitness of the best controller (dark line) and the mean fitness of the population.

TABLE IV
FITNESS OF AN INCREMENTALLY EVOLVED GENERAL CONTROLLER WITH FIXED SENSOR PARAMETERS ON THE DIFFERENT TRACKS. COMPOUND FITNESS OVER ALL 8 TRACKS IS 2.01 (0.11).

Track | Fitness (sd)
1     | 1.66 (0.08)
2     | 1.48 (0.25)
3     | 2.56 (0.2)
4     | 2.49 (0.15)
5     | 2 (0.25)
6     | 2.02 (0.42)
7     | 0.4 (0.21)
8     | 0.16 (0.07)
The successful runs were all made with sensor mutation
turned off. Some runs of incremental evolution were made
with sensor mutation allowed; however, they failed to produce any proficient controllers. We speculate that this is
because these runs suffer from "premature specialization": after evolving a good controller for the first track, the sensor setup might not be suited to good driving on the second track, and changing the parameters would diminish fitness on the first track, thus creating a local optimum. That the first two tracks are, from the point of view of the car, mirror images of each other adds plausibility to this hypothesis.
C. Further evolution
Evolving sensor parameters can be beneficial, however,
when this is done for a controller that has already reached
general proficiency. We used one of the generally proficient controllers evolved using the incremental method as the seed for a new evolutionary run, with sensor mutation turned on and controllers tested on all six tracks simultaneously. The result was an increase in mean fitness, as can be seen in table V. Although the mean fitness does not increase on every single track, the best controller of the last generation races all the tracks more reliably, and is very rarely observed to crash into a wall in such a way that the car gets stuck. The evolved sensors of this controller showed little similarity to the original sensor setup described above - see figure 4 for an example.

Fig. 4. Sensor setup of the further evolved general controller analysed in table V. Only three sensors seem to be long enough to be of any use, and all of those point to the right or front-right. The asymmetry and "waste" is somewhat surprising, as the controller performs well on all of the first six tracks (but it does do slightly better on clockwise than on anti-clockwise tracks).

TABLE V
FITNESS OF A FURTHER EVOLVED GENERAL CONTROLLER WITH EVOLVABLE SENSOR PARAMETERS ON THE DIFFERENT TRACKS. COMPOUND FITNESS 2.22 (0.09).

Track | Fitness (sd)
1     | 1.66 (0.12)
2     | 1.86 (0.02)
3     | 2.27 (0.45)
4     | 2.66 (0.3)
5     | 2.19 (0.23)
6     | 2.47 (0.18)
7     | 0.22 (0.15)
8     | 0.15 (0.01)
VI. EVOLVING SPECIALIZED CONTROLLERS
In order to see whether we could create even better controllers, we used one of the further evolved controllers (with evolved sensor parameters) as the basis for specializing
controllers. For each track, 10 evolutionary runs were made, where the initial population was seeded with the general controller and evolution was allowed to continue for 200 generations. Results are shown in table VI. The mean fitness improved significantly on all of the first six tracks, and much of the fitness increase occurred early in the evolutionary run, as can be seen from a comparison with table V. Further, the variability in mean fitness of the specialized controllers from different evolutionary runs is very low, meaning that the reliability of the evolutionary process is very high. Perhaps most surprising, however, is that all 10 evolutionary runs produced proficient controllers for track 7, on which the general controller had not been trained (and indeed had very low fitness) and for which it had previously been found to be impossible to evolve a proficient controller from scratch.

TABLE VI
FITNESS OF BEST CONTROLLERS, EVOLVING CONTROLLERS SPECIALISED FOR EACH TRACK, STARTING FROM A FURTHER EVOLVED GENERAL CONTROLLER WITH EVOLVED SENSOR PARAMETERS.

Track | 10          | 50          | 100         | 200         | Pr.
1     | 1.9 (0.1)   | 1.99 (0.06) | 2.02 (0.01) | 2.04 (0.02) | 10
2     | 2.06 (0.1)  | 2.12 (0.04) | 2.14 (0)    | 2.15 (0.01) | 10
3     | 3.25 (0.08) | 3.4 (0.1)   | 3.45 (0.12) | 3.57 (0.1)  | 10
4     | 3.35 (0.11) | 3.58 (0.11) | 3.61 (0.1)  | 3.67 (0.1)  | 10
5     | 2.66 (0.13) | 2.84 (0.02) | 2.88 (0.06) | 2.88 (0.06) | 10
6     | 2.64 (0)    | 2.71 (0.08) | 2.72 (0.08) | 2.82 (0.1)  | 10
7     | 1.53 (0.29) | 1.84 (0.13) | 1.88 (0.12) | 1.9 (0.09)  | 10
8     | 0.59 (0.15) | 0.73 (0.22) | 0.85 (0.21) | 0.93 (0.25) | 0
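The seeding step itself is simple. A sketch, reusing the evolution strategy from section III-C and the assumed evaluate_on helper:

def specialize(general_genome, track, n_sensor_params, generations=200):
    # Seed the entire elite with copies of the generally proficient controller,
    # then continue evolution with fitness computed on the single target track
    # and sensor mutation turned on.
    elite = [list(general_genome) for _ in range(MU)]
    return evolve(elite,
                  evaluate=lambda g: evaluate_on(g, tracks=[track]),
                  generations=generations,
                  n_sensor_params=n_sensor_params,
                  mutate_sensors=True)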
Analysis of the evolved sensor parameters of the specialized controllers shows a remarkable diversity, even among controllers specialized for the same track, as is evident in figures 5, 6 and 7. Sometimes, no similarity can be found between the evolved configuration and either the original sensor parameters or those of the further evolved general controller the specialization was based on.

Fig. 5. Sensor setup of a controller specialized for track 5. While more or less retaining the two longest-range sensors from the further evolved general controller it is based on, it has added medium-range sensors at the front and back, and a very short-range sensor to the left.
Fig. 6. Sensor setup of a controller specialized for, and able to consistently
reach good fitness on, track 7. Presumably the use of all but one sensor and
their angular spread reflects the large variety of different situations the car
has to handle in order to navigate this more difficult track.
Fig. 7. Sensor setup of another controller specialized for track 7, like the
one in figure 6 seemingly using all its sensors, but in a quite different way.
VII. OBSERVATIONS ON EVOLVED DRIVING BEHAVIOUR
It has previously been found that the evolutionary approach
used in this paper can produce controllers that outperform
human drivers [4]. To corroborate this result, one of the authors measured his own performance on the various tracks, driving the car using keyboard inputs and a suitable delay of 50 ms between timesteps. Averaged over 10 attempts, the author's fitness was 1.89 on track 2, 2.65 on track 5, and 1.83 on track 7, numbers which compare rather unfavourably with those found in table VI. The responsible author would like to believe that this says more about the capabilities of the evolved controllers than about those of the author.
Traces of steering and driving commands from the evolved controllers show that they often use a PWM-like technique, in that they frequently - sometimes almost every timestep - change what commands they issue. For example, the general controller used as the base for the specializations above employs the tactic of constantly alternating between steering left and right when driving parallel to a wall, giving the appearance that the car is shaking. Frequently alternating between neutral and forward drive is also used as a way of keeping a certain speed; an approach many engineers would use when designing a controller for a vehicle that can only be controlled with discrete inputs. Doing so is, however, practically impossible for a human driver, and analysis of action traces for human drivers shows far fewer changes to the commands given; as an example, one of the authors changes drive command only four times per lap when racing on track 3.
VIII. CONCLUSIONS AND FUTURE WORK
We believe that the results presented above answer, at least
in part, the several questions posed in section I-B. Different
tracks can indeed be constructed that have various difficulty
levels, in terms of the probability that an evolutionary run
starting from scratch will produce a proficient controller
within a given number of generations, and the mean fitness of
evolved controllers. Difficulty levels range from very easy,
where the evolutionary algorithm almost always succeeds,
to very hard, for which no successful controllers have been
found, and agree with intuitive human difficulty ratings of
the same tracks. These skills are, however, not transferable:
a controller evolved from scratch to perform well on a
given track usually performs very poorly on all other tracks.
Evolving sensor parameters along with network weights makes for fewer proficient controllers (probably because of more local optima), a result which is not inconsistent with the good controllers that do emerge being slightly superior to fixed-sensor ones, as found in [4].
As for the question of whether we can automatically create controllers with driving skills so general that they can proficiently race all tracks in our training set: this can be done
by using incremental evolution, going from simpler to more
complex tracks, with sensor mutation turned off. Attempts
to evolve general controllers with sensor mutation turned
on failed, as did attempts to evolve controllers on all tracks
simultaneously. Once a general controller has been created,
its fitness can be increased through continued evolution with
sensor mutation turned on. Specialized controllers can be
created by further evolving a general controller, using only
one track in the fitness function. These specialized controllers
invariably have very high fitness. Much to our surprise, this
was true even for one hard track which the general controller
had not been evolved on and which it had very low fitness
on, and for which we have not been able to evolve proficient
controllers from scratch. Apparently, the general controller
is somehow closer in search space to a proficient controller
for that track, even though it has no proficiency itself on that
track. Exactly how this works remains to be determined.
We hope that our results are relevant both to game AI development, as they suggest a way of using already evolved solutions as bases for further evolution to quickly and reliably produce specialized solutions for particular tasks, and to evolutionary robotics, as they go some way towards demonstrating the scalability and generality of car racing as an ER testbed.
Currently, our efforts are focused on extending the model
to allow competitive coevolution between several vehicles on
the same track. Further, we are planning to use this model to
investigate controller architectures permitting internal state,
such as plastic networks [15] or recurrent networks. We are
also planning to compare evolutionary learning to other
forms of reinforcement learning, such as TD-learning, which
has been shown to be considerably faster in some domains.
However, to be able to handle really complex environments, the controller will need high-bandwidth sensor data of some kind, without which complex object recognition and resolution of perceptual aliasing is impossible. We are therefore working towards incorporating visual or visual-like input, using either the current 2D simulation or some other, 3D-enabled simulator. We have previously tried the "naive" approach of connecting high-bandwidth (on the order of 10,000 inputs) 2D vision directly to evolvable single- or multi-layer perceptrons, but had only limited success. An integral part of the ongoing project is therefore to develop a modular architecture which can reuse weights so as to reduce the dimensionality of the space in which to search for such controllers.
REFERENCES
[1] DARPA, “Grand challenge web site,” http://www.grandchallenge.org/,
2005.
[2] S. Nolfi and D. Floreano, Evolutionary robotics. Cambridge, MA:
MIT Press, 2000.
[3] B. F. Skinner, Science and Human Behavior. The Free Press, 1953,
ch. Behaviorism.
[4] J. Togelius and S. M. Lucas, “Evolving controllers for simulated car
racing,” in Proceedings of the Congress on Evolutionary Computation,
2005, pp. 1906–1913.
[5] K. O. Stanley, N. Kohl, R. Sherony, and R. Miikkulainen, “Neuroevolution of an automobile crash warning system,” in Proceedings of the
Genetic and Evolutionary Computation Conference (GECCO-2005),
2005.
[6] “RARS (Robot Auto Racing Simulator) homepage,” http://rars.sourceforge.net/, 1995.
[7] D. Floreano, T. Kato, D. Marocco, and E. Sauser, “Coevolution of
active vision and feature selection,” Biological Cybernetics, vol. 90,
pp. 218–228, 2004.
[8] M. Hewat, “Carworld driving simulator,” http://carworld.sourceforge.net/, 2000.
[9] I. Tanev, M. Joachimczak, H. Hemmi, and K. Shimohara, “Evolution
of the driving styles of anticipatory agent remotely operating a scaled
model of racing car,” in Proceedings of the 2005 IEEE Congress on
Evolutionary Computation (CEC-2005), 2005, pp. 1891–1898.
[10] K. Wloch and P. J. Bentley, “Optimising the performance of a
formula one car using a genetic algorithm,” in Proceedings of Eighth
International Conference on Parallel Problem Solving From Nature,
2004, pp. 702–711.
[11] D. A. Pomerleau, “Neural network vision for robot driving,” in The
Handbook of Brain Theory and Neural Networks, 1995.
[12] “Forza motorsport drivatars,” http://research.microsoft.com/mlp/forza/,
2005.
[13] D. M. Bourg, Physics for Game Developers. O’Reilly, 2002.
[14] M. Monster, “Car physics for games,” http://home.planet.nl/~monstrous/tutcar.html, 2003.
[15] D. Floreano and F. Mondada, “Evolution of plastic neurocontrollers for
situated agents,” in From Animals to Animats IV: Proceedings of the
Fourth International Conference on Simulation of Adaptive Behavior,
P. Maes, M. Mataric, J. Meyer, J. Pollack, H. Roitblat, and S. Wilson,
Eds. Cambridge, MA: MIT Press-Bradford Books, 1996.