Real-time neuroevolution in the NERO video game

1
Overview






Background on video games
Neural networks
NE
NEAT
rtNEAT
NERO
2
Background

According to analysts, the global video game market will increase from
$25.4 billion in revenue in 2004 to almost $55 billion in 2009,
larger even than Hollywood's.

Techniques in artificial intelligence can
potentially both increase the longevity of video
games and decrease their production costs.
3
Background

In most games, character behavior is scripted: no
matter how many times the player exploits a
weakness, that weakness is never repaired.

So what to do?
4
The solution



Machine learning can keep video games
interesting by allowing agents to change and
adapt.
The agents can be trained to perform complex
behaviors.
However, if the agents were trained offline (out-of-game
learning) and then frozen before being put in the game,
they would not adapt and change in response to
the players.
5
The solution



That is why agents should adapt and change in
real time, and a powerful and reliable machine
learning method is needed.
Prior examples in the machine-learning game (MLG) genre
include the Tamagotchi virtual pet
and the "God game" Black & White.
6
The solution


Here I introduce the machine learning method rtNEAT
(real-time NeuroEvolution of Augmenting
Topologies),
and the game that implements rtNEAT:
NERO (NeuroEvolving Robotic Operatives).
7
Neural Networks


An attempt to mimic, in a simplified way, the
neural network of a human being.
The network is a collection of simple units that
mimic a neuron.
[Diagram: inputs scaled by weights feed a computing unit that sums them (Σ) and applies a threshold to produce the output.]
8
Neural Networks

The network adapts as follows: each weight is changed
by an amount proportional to the difference
between the desired output and the actual output:

ΔWi = η · (D − Y) · Ii

where η is the learning rate, D is the desired
output, Y is the actual output, and Ii is the i-th input.
9
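As an illustrative sketch, the update rule above can be implemented as a classic perceptron trainer; the step activation, learning rate, and the AND task below are assumptions for demonstration, not from the slides:

```python
def step(weighted_sum, threshold=0.5):
    # Threshold unit: fire (1) only if the weighted input sum exceeds the threshold.
    return 1 if weighted_sum > threshold else 0

def train(samples, eta=0.1, epochs=50):
    # Delta rule: W_i += eta * (D - Y) * I_i for every training pair.
    n_inputs = len(samples[0][0])
    w = [0.0] * n_inputs
    for _ in range(epochs):
        for inputs, desired in samples:
            y = step(sum(wi * xi for wi, xi in zip(w, inputs)))
            for i, xi in enumerate(inputs):
                w[i] += eta * (desired - y) * xi
    return w

# Illustrative task: learn logical AND from its truth table.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights = train(and_samples)
```

The weights stop changing once the network classifies every sample correctly, since (D − Y) is then zero for all pairs.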
NE – Neuroevolution

The artificial evolution of neural networks using
an evolutionary algorithm.

Such networks can compute arbitrarily complex
functions, can both learn and perform in the
presence of noisy inputs, and can generalize their
behavior to previously unseen inputs.
10
Demands of video game AI
1) Large state/action space: a high-dimensional
space, and having to check the value of every
possible action on every game tick for every
agent in the game.
- In NE, agents are evolved to output only a single
requested action per game tick.
11
Demands of video games AI
2) Diverse behaviors: agents should not all
converge to the same behavior; a
homogeneous population would make the
game boring.
- In NE, diverse populations can be explicitly
maintained through speciation.
12
Demands of video games AI
3) Consistent individual behaviors: characters
may take random actions in order to
explore new behaviors, but should not periodically
make odd moves.
- In NE, the behavior of an agent stays consistent
because it always chooses actions from
the same network.
13
Demands of video games AI
4) Fast adaptation and sophisticated
behaviors: not making the player wait hours for agents to
adapt requires a simple representation, but a simple
representation results in simple behaviors.
- In NE, the representation of the solution can itself be
evolved, allowing simple behaviors in the
beginning that are complexified later.
14
Demands of video games AI
5) Memory of past states: reacting convincingly
to the present situation requires
keeping track of more than the current state.
- In NE, recurrent neural networks can be
evolved, giving agents a memory of past
situations.
15
NEAT



A technique for evolving neural networks using
an evolutionary algorithm.
The correct topology need not be known
prior to evolution.
NEAT is unique in that it begins with minimal networks
and adds connections and nodes to them over
generations, allowing complex problems to be
solved by building on simple ones.
16
Genetic encoding - Genome
17
Genetic encoding - Mutation
18
Genetic encoding - Mutation
19
Genetic encoding - results


Splitting a connection inserts a nonlinearity
into the system.
Old behaviors present in the pre-existing
network do not vanish and their quality stays
almost the same, with the opportunity to refine
those behaviors.
20
Cross over – historical markings



When a structural mutation occurs, a global innovation
number is incremented and assigned to the new gene.
When crossover happens, the offspring
inherits the parents' innovation numbers.
Thus the historical origin of every gene is
known throughout evolution.
21
Crossover – historical markings


With innovation numbers, no topological analysis
is needed to align two genomes.
Two genes with the same historical origin
represent the same structure, since they were
both derived from the same ancestral gene at
some point in the past.
22
Crossover – historical markings
23
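As an illustrative sketch, innovation numbers turn genome alignment into simple set operations; representing each gene as an `(innovation, weight)` pair is an assumption for demonstration:

```python
def align(genome_a, genome_b):
    # Each gene is an (innovation_number, weight) pair; line the genomes up
    # by innovation number instead of analyzing their topology.
    a, b = dict(genome_a), dict(genome_b)
    cutoff = min(max(a), max(b))          # end of the shorter genome's history
    mismatched = set(a) ^ set(b)
    matching = sorted(set(a) & set(b))    # same ancestral gene in both parents
    disjoint = sorted(i for i in mismatched if i <= cutoff)
    excess = sorted(i for i in mismatched if i > cutoff)
    return matching, disjoint, excess

parent1 = [(1, 0.5), (2, -0.3), (4, 0.8)]
parent2 = [(1, 0.1), (3, 0.9), (4, -0.2), (6, 0.7)]
matching, disjoint, excess = align(parent1, parent2)
# matching -> [1, 4], disjoint -> [2, 3], excess -> [6]
```

An offspring can then inherit matching genes from either parent, while disjoint and excess genes come from the fitter parent.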
Speciation



NEAT divides the population into species.
Individuals compete primarily within their
own niches.
This way, topological innovations are protected
and have time to optimize their structure before
competing with other niches, like new ideas that
are given time to reach their potential before being eliminated.
24
Speciation

Another advantage is that species with smaller
genomes survive as long as their fitness is
competitive, ensuring that small networks are
not replaced by larger ones unnecessarily.
25
Speciation
26
Speciation

To examine the similarity of two networks, a distance
δ is formulated:

δ = (c1·E)/N + (c2·D)/N + c3·W̄

where E is the number of excess genes, D is the number of disjoint genes,
W̄ is the average weight difference of the matching
genes, and N is the number of genes in the larger genome
(a normalizing factor).
c1, c2 and c3 define the importance of the three
factors.
27
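A minimal sketch of the distance computation, assuming genomes are lists of `(innovation, weight)` pairs with at least one matching gene; the coefficient values are illustrative:

```python
def compatibility(genome_a, genome_b, c1=1.0, c2=1.0, c3=0.4):
    # delta = c1*E/N + c2*D/N + c3*W_bar
    a, b = dict(genome_a), dict(genome_b)
    matching = set(a) & set(b)
    mismatched = set(a) ^ set(b)
    cutoff = min(max(a), max(b))
    E = sum(1 for i in mismatched if i > cutoff)    # excess genes
    D = len(mismatched) - E                         # disjoint genes
    # Average weight difference of matching genes (assumes matching is non-empty).
    W = sum(abs(a[i] - b[i]) for i in matching) / len(matching)
    N = max(len(a), len(b))                         # normalizing factor
    return c1 * E / N + c2 * D / N + c3 * W

d = compatibility([(1, 0.5), (2, -0.3), (4, 0.8)],
                  [(1, 0.1), (3, 0.9), (4, -0.2), (6, 0.7)])
# E=1, D=2, W=0.7, N=4 -> 0.25 + 0.5 + 0.28 = 1.03
```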
Speciation



If a genome's distance to a randomly chosen
member of a species is less than δt, a
compatibility threshold, the genome is placed
into that species.
δt can be dynamic: raised if there are too
many species and lowered if there are too few.
If a genome is not compatible with any existing
species, a new species is created.
28
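A sketch of the assignment loop under simplifying assumptions: each species is represented here by a single stored representative rather than a randomly chosen member, and the 1-D "genomes" with absolute-difference distance are purely illustrative:

```python
def assign_species(genome, representatives, distance, delta_t):
    # Join the first species whose representative is within delta_t;
    # otherwise found a new species with this genome as its representative.
    for species_id, rep in representatives.items():
        if distance(genome, rep) < delta_t:
            return species_id
    new_id = max(representatives, default=0) + 1
    representatives[new_id] = genome
    return new_id

dist = lambda x, y: abs(x - y)    # toy distance on 1-D genomes
reps = {1: 0.0}
same = assign_species(0.3, reps, dist, delta_t=1.0)   # close: joins species 1
new = assign_species(5.0, reps, dist, delta_t=1.0)    # far: founds species 2
```

Raising or lowering `delta_t` between generations directly controls how many species this loop produces.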
Speciation – adjusted fitness

The fitness of an organism is also defined by its
niche:

f′i = fi / Σj=1..n sh(δ(i, j))

where

sh(δ) = 0 if δ > δt, 1 otherwise.

Summing sh over the distances to all organisms
in the population therefore yields the
number of organisms in the same species as
organism i.
29
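A sketch of explicit fitness sharing as defined above; the toy population, positions, and fitness values are illustrative assumptions:

```python
def sh(distance, delta_t):
    # sh(delta) = 0 if delta > delta_t, else 1
    return 0 if distance > delta_t else 1

def adjusted_fitness(i, population, fitness, distance, delta_t):
    # f'_i = f_i / sum_j sh(delta(i, j)); the sum counts i's species-mates.
    niche_size = sum(sh(distance(i, j), delta_t) for j in population)
    return fitness[i] / niche_size

# Toy example: 1-D "genomes" with absolute-difference distance.
position = {0: 0.0, 1: 0.5, 2: 5.0}
fitness = {0: 6.0, 1: 4.0, 2: 3.0}
dist = lambda i, j: abs(position[i] - position[j])
f0 = adjusted_fitness(0, [0, 1, 2], fitness, dist, delta_t=1.0)  # 6.0 / 2
f2 = adjusted_fitness(2, [0, 1, 2], fitness, dist, delta_t=1.0)  # 3.0 / 1
```

Organism 0 shares its niche with organism 1, so its fitness is halved, while the lone organism 2 keeps its full fitness: crowded niches are penalized.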
Speciation - offspring

The number of offspring distributed to each
species k is:

nk = (F̄k / F̄tot) · |P|

where F̄k is the average fitness of species k,
F̄tot = Σk F̄k is the total of all species' average
fitnesses, and |P| is the size of the population.
30
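A sketch of the allocation; the species labels and fitness values are illustrative, and a real implementation must also redistribute rounding remainders so the counts sum exactly to |P|:

```python
def offspring_counts(avg_fitness, pop_size):
    # n_k = (F_k / F_tot) * |P|, rounded to whole offspring.
    f_tot = sum(avg_fitness.values())
    return {k: round(f_k / f_tot * pop_size) for k, f_k in avg_fitness.items()}

counts = offspring_counts({"A": 6.0, "B": 3.0, "C": 1.0}, pop_size=10)
# fitter species get proportionally more offspring
```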
Speciation - offspring


First, we eliminate the lowest-performing
members of the population.
The whole generation is then replaced by the
new offspring.
31
NEAT – minimization and
complexification




NEAT begins with uniform, simple networks
with no hidden nodes, differing only in their
initial random weights.
It adds connections and nodes incrementally.
Only the fittest structures remain.
NEAT searches for the optimal topology by
complexifying existing networks.
32
NEAT - performance


Experiments have shown that all three
main components of NEAT are essential for it
to work: the historical markings, speciation,
and starting from minimal structure.
It was also shown that NEAT outperforms other
NE methods, especially fixed-topology ones,
because it finds complex solutions faster
by starting with simple networks and
expanding them only when beneficial.
33
rtNEAT


In order for players to interact with evolving
agents in real time, NEAT was extended to
rtNEAT.
Fitness statistics are now collected constantly as
the game is played, and the agents evolve
continuously as well.
34
rtNEAT – replacement cycle

Every few game ticks the worst individual is
replaced by an offspring of parents chosen from
among the best.
35
rtNEAT - algorithm

Because rtNEAT performs in real time it cannot
produce the whole generation at once like
NEAT, and so the algorithm loop must change.
36
rtNEAT - algorithm
37
rtNEAT - algorithm
1) Calculating adjusted fitness:

f′i = fi / |S|

where |S| is the number of individuals in the
species.
38
rtNEAT - algorithm
2) Removing the worst agent:
a) If we removed the agent with the worst unadjusted
fitness, innovation preservation would be
damaged, because new small species would be
eliminated as soon as they appeared.
That is why the agent with the worst adjusted fitness
is removed instead.
39
rtNEAT - algorithm
2) Removing the worst agent:
b) Agents must also have had time to be evaluated
sufficiently before being removed: unlike
in NEAT, where each individual lives for the same
amount of time, in rtNEAT different agents
have been around for different amounts of
time.
That is why rtNEAT only removes agents that
have played for more than a minimum
amount of time m.
41
rtNEAT - algorithm
2) Removing the worst agent:
c) We must re-estimate the average fitness F̄ of the
affected species, because the species now has one less
member and its average has most likely changed.
42
rtNEAT - algorithm
3) Creating offspring:
The probability of choosing a parent species is
proportional to its average fitness compared
with the total of all species' average fitnesses:

Pr(Sk) = F̄k / F̄tot

A single new offspring is then created by
recombining two individuals from the parent
species.
43
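A roulette-wheel sketch of that selection probability; the two-species fitness table and the injectable random source are illustrative assumptions:

```python
import random

def pick_parent_species(avg_fitness, rng=random.random):
    # Pr(S_k) = F_k / F_tot: spin a roulette wheel weighted by average fitness.
    f_tot = sum(avg_fitness.values())
    spin = rng() * f_tot
    running = 0.0
    for species, f_k in avg_fitness.items():
        running += f_k
        if spin <= running:
            return species
    return species  # guard against a floating-point edge at spin == f_tot
```

Passing a deterministic `rng` stand-in makes the behavior easy to check: a spin of 0 lands in the first species' share of the wheel, a spin past that share lands in the next.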
rtNEAT - algorithm
4) Reassigning agents to species:
Because minimizing the number of species is
important in a real-time environment, where
not all CPU time is allocated to evolution, the
threshold δt must be dynamically adjusted.
The entire population must then be
reassigned to species.
44
rtNEAT - algorithm
5) Replacing the old agent with the new one:
How to replace depends on the game.
You may simply replace the neural network in the
body of the removed agent, or you may kill the agent
and spawn a new one instead.
45
rtNEAT – loop interval

The loop should run every n ticks. To choose n,
a "law of eligibility" is formulated:

I = m / (|P|·n)

where m is the minimum time alive, n is the
number of ticks between replacements (the loop
interval), |P| is the population size, and I is the
fraction of the population that is too young to
be evaluated.
46
rtNEAT – loop interval
Solving the law of eligibility for n:

I = m / (|P|·n)   ⇒   n = m / (|P|·I)

I is left for the user to determine, because it is
the most critical parameter for performance.
47
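A one-line sketch of the rearranged law; the values of m, |P|, and I below are illustrative assumptions, not values from the slides:

```python
def loop_interval(m, pop_size, ineligible_fraction):
    # n = m / (|P| * I), from rearranging the law of eligibility.
    return m / (pop_size * ineligible_fraction)

# Example: minimum lifetime of 500 ticks, 50 agents, at most half the
# population too young to evaluate.
n = loop_interval(m=500, pop_size=50, ineligible_fraction=0.5)
# 500 / (50 * 0.5) = 20.0 ticks between replacements
```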


NERO

The idea of the game is to train a team of agents
by designing a training program.
For the agents to learn, the learning
algorithm is of course rtNEAT.
48
NERO - fitness

In training, there are sliders that let you
configure what attributes you want your agents
to have.
49
NERO - fitness

Fitness is the sum of all the components ci
multiplied by the slider values vi:

st = Σi∈components ci·vi

To also put the rate of forgetting r into the
fitness, this update is given:

ft+1 = ft + (st − ft) / r

where ft is the current fitness.
50
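A sketch of both formulas, with the forgetting update read as a running average that decays old scores at rate r; the component names and slider values are hypothetical:

```python
def score(components, sliders):
    # s_t = sum over components of c_i * v_i
    return sum(c * sliders[name] for name, c in components.items())

def update_fitness(f_t, s_t, r):
    # f_{t+1} = f_t + (s_t - f_t) / r: old fitness is forgotten at rate r.
    return f_t + (s_t - f_t) / r

components = {"approach_enemy": 2.0, "avoid_fire": 1.0}   # hypothetical c_i
sliders = {"approach_enemy": 1.0, "avoid_fire": 0.5}      # hypothetical v_i
s = score(components, sliders)        # 2.0*1.0 + 1.0*0.5 = 2.5
f_next = update_fitness(0.0, s, r=5)  # 0.0 + (2.5 - 0.0)/5 = 0.5
```

Larger r means slower forgetting: each tick's score moves the fitness only a fraction 1/r of the way toward the current score.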
NERO - sensors


Agents have several types of sensors.
The network's outputs are the direction of movement
and whether or not to fire.
51
NERO - sensors

Enemy radar:
Divides the 360° area around the agent into
slices. Each slice activates a sensor in proportion
to how close the nearest enemy in that slice is.
52
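A geometric sketch of such a radar; the slice count, sensor range, and the agent's pose are illustrative assumptions:

```python
import math

def radar(agent, heading, enemies, n_slices=4, max_range=100.0):
    # Divide the full circle around the agent into n_slices slices; each
    # slice's sensor activates in proportion to how close its nearest enemy is.
    activations = [0.0] * n_slices
    ax, ay = agent
    for ex, ey in enemies:
        d = math.hypot(ex - ax, ey - ay)
        if d > max_range:
            continue  # enemy out of sensor range
        # Angle to the enemy, relative to the agent's heading, in [0, 2*pi).
        angle = (math.atan2(ey - ay, ex - ax) - heading) % (2 * math.pi)
        s = min(int(angle / (2 * math.pi) * n_slices), n_slices - 1)
        activations[s] = max(activations[s], 1.0 - d / max_range)  # closer -> stronger
    return activations

acts = radar(agent=(0.0, 0.0), heading=0.0, enemies=[(10.0, 0.0)])
# enemy dead ahead at distance 10: slice 0 activates at about 0.9
```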
NERO - sensors

Range finder:
Projects rays at several angles; the distance
a ray travels before it hits something is
returned as the sensor value.
53
NERO - sensors

On-target sensor:
Returns full activation only if a ray projected
along the front heading of the agent hits an
enemy. This sensor tells the agent whether it
should attempt to shoot.
54
NERO - sensors

Line-of-fire sensors:
Detect where a bullet from the closest enemy is
heading, and thus can be used to avoid fire. They
work by computing where the line of fire
intersects rays projecting from the agent.
55
NERO – evolved networks

Seekers vs. Wall-fighters
56