
DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING,
SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2017
Creating Human-like AI Movement
in Games Using Imitation Learning
CASPER RENMAN
KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION
Creating Human-like AI Movement in Games
Using Imitation Learning
May 31, 2017
CASPER RENMAN
Master’s Thesis in Computer Science
School of Computer Science and Communication (CSC)
Royal Institute of Technology, Stockholm
Swedish Title: Imitation Learning som verktyg för att skapa människolik rörelse för
AI-karaktärer i spel
Principal: Kristoffer Benjaminsson, Fast Travel Games
Supervisor: Christopher Peters
Examiner: Olov Engwall
Abstract
The way characters move and behave in computer and video games is an important factor in their believability, which has an impact on the player's experience. This project explores Imitation Learning using limited amounts of data as an approach to creating human-like AI behaviour in games, and through a user study investigates what factors determine whether a character is human-like when observed through the character's first-person perspective. The idea is to create or shape AI behaviour by recording one's own actions. The implemented framework uses a Nearest Neighbour algorithm with a KD-tree as the policy which maps a state to an action. Results showed that the chosen approach was able to create human-like AI behaviour while respecting the performance constraints of a modern 3D game.
Sammanfattning
The way characters move and behave in computer and video games is an important factor in their believability, which in turn has an impact on the player's experience. This project explores Imitation Learning with limited amounts of data as an approach to creating human-like movement for AI characters in games, and investigates through a user study which factors determine whether a character is human-like when the character is observed through its first-person perspective. The idea is to create or shape AI behaviour by recording one's own actions. The implemented framework uses a Nearest Neighbour algorithm with a KD-tree as the policy that maps a state to an action. The results showed that the chosen approach succeeded in creating human-like AI behaviour while respecting the computational complexity constraints of a modern 3D game.
Contents

1 Introduction
   1.1 Artificial Intelligence in games
      1.1.1 Imitation Learning
      1.1.2 Human-likeness
   1.2 Objective
   1.3 Limitations
   1.4 Report outline
2 Background
   2.1 Imitation Learning
      2.1.1 Policy
      2.1.2 Demonstration
      2.1.3 State representation
      2.1.4 Policy creation
      2.1.5 Data collection
      2.1.6 Demonstration dataset limitations
   2.2 Related work
      2.2.1 Summary and state of the art
   2.3 Performance in games
   2.4 Measuring believability of AI
      2.4.1 Turing test-approach
      2.4.2 Automated similarity test
   2.5 Conclusion
3 Implementation
   3.1 Setting
   3.2 Method motivation
   3.3 Implementation
      3.3.1 Summary
      3.3.2 Recording movement and state representation
      3.3.3 Playing back movement
      3.3.4 Policy
      3.3.5 Feature extraction
      3.3.6 Avoiding static obstacles
      3.3.7 Avoiding dynamic obstacles
      3.3.8 KD-tree
      3.3.9 Discretizing the environment
      3.3.10 Additional details
      3.3.11 Storing data
      3.3.12 Optimization and measuring performance
   3.4 Overall implementation
4 Evaluation
   4.1 User study
      4.1.1 The set-up
      4.1.2 Participants
      4.1.3 Stimuli
      4.1.4 Procedure
      4.1.5 Hypothesis
   4.2 Results
      4.2.1 User study
      4.2.2 Imitation agent performance
   4.3 Discussion
      4.3.1 The imitation agent
      4.3.2 The user study
      4.3.3 Creating non-human-like behaviour
      4.3.4 Performance in relation to games
      4.3.5 Ethical aspects
5 Conclusions
   5.1 Future work
      5.1.1 Use outside of games
Bibliography
Chapter 1
Introduction
This chapter gives a brief overview of Artificial Intelligence in games, Imitation
Learning and human-likeness. It also presents the objective, limitations and the
outline of the project.
1.1 Artificial Intelligence in games
Computer and video games produce more and more complex virtual worlds. This introduces new challenges for the characters controlled by Artificial Intelligence (AI), also known as agents [20] or NPCs (Non-Player Characters), meaning characters that are not controlled by a human player. The way characters move and behave in computer and video games is an important factor in their believability, which has an impact on the player's experience. Being able to interact with NPCs in meaningful ways and feel that they belong in the world is important [4]. In
Virtual Reality (VR) this is even more important, as the gaming experience is even
more immersive. The goal of many games’ AI is more or less the same as attempts
to beat the Turing test - to create believable intelligence [12].
A popular genre in computer and video games is First-person shooter (FPS). In an
FPS game the player experiences the game through the eyes of the character the
player is controlling, also known as a first-person perspective. Typically a player is
at most able to see the hands and arms of the character the player is controlling. The player can, however, see the whole bodies of other players' characters or NPCs. This is visualized in Figure 1.1.
Figure 1.1: An example first-person perspective game scenario,
seen from the eyes of the character that the player controls. The
blue and red characters are NPCs.
AI in games is traditionally based on Finite State Machines (FSM), Behaviour Trees
(BT) or other hand-coded techniques [27]. In these techniques, a programmer needs
to explicitly define rules for what an agent should do in different situations. An
example of such a rule could be: "if the character’s health is low and the character
sees a hostile character, the character should flee". These techniques work in the
sense that the agent is able to execute tasks and adapt its behaviour to its situation,
but the result is predictable and static [11]. For example, if a player sees an NPC react to a situation the same way it did in an earlier similar situation, the player can be quite sure that the NPC will react that way again in similar situations. In 2006, Orkin [17] said: “in the early generations of shooters, such as
Shogo (1998) players were happy if the A.I. noticed them at all and started attacking.
. . . Today, players expect more realism, to complement the realism of the physics
and lighting in the environments”. To achieve more realism and unpredictability, and thereby increase the entertainment for the player, it would perhaps be a good approach for agents to imitate human behaviour.
1.1.1 Imitation Learning
Imitation Learning (IL) is a technique where the agent learns from examples, or
demonstrations, provided by a teacher [1]. IL is a form of Machine Learning (ML).
ML has been defined as the “field of study that gives computers the ability to
learn without being explicitly programmed” [14]. Unlike Reinforcement Learning
algorithms, IL does not require a reward function to be specified. Instead, an IL
algorithm observes a teacher perform a task and learns a policy that imitates the
teacher, with the purpose of generalizing to unseen data [28]. IL is regarded as a
promising technique for creating human-like artificial agents [3]. Some approaches
have been shown to be able to develop agents with good performance in non-trivial tasks
using limited amounts of data and computational resources [3]. It is a technique
which also can be used to dynamically change game play to adapt to different players
based on their play style and skill [7].
1.1.2 Human-likeness
Shaker et al. [24] describe character believability, which means that an agent is believable if someone who observes it believes that the agent is a human being. Player believability, on the other hand, means that the agent is believable if someone observing the agent believes that a human is controlling it. It is player believability that is meant by human-like in this project.
1.2 Objective
The primary goal of this project is to describe a method for creating human-like
agent movement using IL with limited amounts of data. The idea is to create an
agent by recording one’s own actions, shaping it with desired behaviours. Most
related works in the field of IL in games want to create competitive AI, meaning
AI that is good at beating the game. This is not the case in this project. The
goal is to create AI that lets an agent imitate a demonstrating human well, while
respecting the performance requirements of a modern 3D game. A hope is that
this will lead to a more unpredictable and human-like agent which in turn could
lead to better entertainment for a player playing the game. Lee et al. [9] say that
human-like agent behaviour leads to a raised emotional involvement of the player,
which increases the player's immersion in the game. Whether it is more fun or not
to play with a human-like agent will not be explored.
This project aims to answer the following question:
– Q1: How can IL be used to create human-like agent behaviour, using limited
amounts of data?
This question is further split up into two sub-questions:
– Q1.1: How to create an agent that imitates demonstrated behaviour,
using IL with limited amounts of data?
– Q1.2: What determines if a character is human-like, when observed
through the character’s first-person perspective?
The human-likeness of the agent will depend on how human-like the human is when
recording itself. This means that behaviour that is non-human-like will also be
possible to create. Suppose that it is desired to create a behaviour for a dog in
a game. A human would then record itself playing the game, role-playing a dog
and behaving like it wants the dog to behave. If the intended behaviour is that
the dog should flee when it sees something hostile, then so should the human when
recording itself. The outcome should then be an agent that behaves like a dog.
1.3 Limitations
By agent movement it is meant that the actions the agent can execute are limited to movement, including rotation, i.e. moving from one position to another. In contrast, actions that are not considered movement in this project could for example
be shooting, jumping or picking up items. The simulations will be done in a 3D
environment but the movement of the implemented agent will be limited to a 2D
plane. This means that the agent will not be able to walk up a ramp or climb stairs
for example. The movement behaviour of the agent will be limited by the feature
extractors implemented, as described in the implementation chapter. In theory, any
behaviour which only requires the agent to be able to move could be implemented,
like path-finding and obstacle avoidance for example.
The project will use limited amounts of data, meaning that it should be possible
to create agent behaviour using the framework created in this project, by recording
one’s own actions for a couple of minutes. The motivation for this is that if game
developers should be able to design their own agent behaviour for a game, there
will not exist data for them to use. Some works listed in the related works section
perform their experiments in big games such as Quake III1 , where there is a lot of
saved data available. Quake is a first-person shooter video game. This allows them
to use complex algorithms which perform better with more data. Not requiring a
lot of data is also thought to make the contributions of this work more attractive
to the gaming industry, as it will require less time and effort to be able to utilize.
1.4 Report outline
The report starts with presenting background information about the areas of Imitation Learning and measuring believability of AI, and related work. Following is the
implementation chapter which motivates the choice of methods and describes the
implementation process. The evaluation chapter describes the user study which was
conducted in order to evaluate the human-likeness of the resulting imitation agent.
It also presents the results of the user study and a brief performance measurement
of the imitation agent, as well as summarizes what was done in the project and
discusses the results. Finally, conclusions are drawn in the concluding chapter.
1 https://en.wikipedia.org/wiki/Quake_III_Arena/
Chapter 2
Background
This chapter presents background knowledge and related works about Imitation
Learning and measuring believability of AI controlled characters. It also presents
why heavy computations with long computational times are particularly bad in
games.
2.1 Imitation Learning
The work by Argall et al. [1] is frequently cited and is a comprehensive survey of IL.
The survey is the biggest source of background knowledge in the area of IL in this
project. They describe IL as a subset of Supervised Learning, where an agent learns
an approximation of the function that produced the provided labeled training data,
called a policy. The dataset is made up of demonstrations of a given task.
2.1.1 Policy
A policy π is a function that maps a state x to an action u. A policy allows an
agent to select an action based on its current state. Developing a policy by hand
is often difficult. Therefore machine learning algorithms have been used for policy
development [1].
2.1.2 Demonstration
A demonstration is a sequence of state-action pairs that are recorded at the time of
the demonstration of the desired behaviour [1]. This way of learning a policy through
examples differs from learning it based on data collected through exploration, such as in Reinforcement Learning [25]. A feature of IL is that it focuses the dataset on areas of the state space that are actually encountered during the execution of the
behaviour [1]. This is a good thing in games where computation time is very limited,
as the search space of appropriate solutions is reduced.
2.1.3 State representation
A state can be represented as either discrete (e.g. can see enemy or cannot see enemy) or continuous (e.g. the 3D position and rotation of the agent).
2.1.4 Policy creation
Creating a policy can be done in different ways. A mapping function uses the
demonstrated data to directly approximate the function mapping from the agent’s
state observations to actions (f : Z → A) [1]. This can be done using either
classification where the output is class labels, or regression where the output consists
of continuous values. A system model uses the demonstrated data to create a
model. A policy is then derived from that model [1]. Plans use the demonstrated
data together with user intention information to learn rules that associate pre- and
post-conditions with each action. A sequence of actions is then planned using that
information [1].
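As an illustrative sketch of the mapping-function approach (not taken from the survey; the feature layout, action labels and helper name are assumptions), a classification-based mapping from demonstrated states to actions could look roughly like this in Python:

import numpy as np

def nearest_neighbour_policy(dataset_states, dataset_actions):
    """Return a mapping function f : Z -> A learned directly from demonstrations:
    the action whose recorded state is closest to the observed state is chosen."""
    states = np.asarray(dataset_states, dtype=float)  # one row per demonstrated state
    def policy(observation):
        dists = np.sum((states - np.asarray(observation, dtype=float)) ** 2, axis=1)
        return dataset_actions[int(np.argmin(dists))]
    return policy

# Hypothetical demonstration data: 2D state features mapped to discrete action labels.
policy = nearest_neighbour_policy([[0.0, 1.0], [3.0, 0.5], [5.0, 5.0]],
                                  ["turn_left", "go_forward", "turn_right"])
print(policy([2.8, 0.4]))   # -> "go_forward"

A regression-based mapping would look the same except that the outputs would be continuous values (for example a target velocity) instead of class labels.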
2.1.5 Data collection
The correspondence problem [16] has to do with the mapping between the teacher
and the learner (see Figure 2.1). For example, a player playing an FPS game using
a mouse and keyboard sends inputs which are processed by the game and translated
into actions. An NPC in the same game, is controlled by AI which sends commands
to control the character several times per second, which is not directly equivalent
to keystrokes and mouse movements of a human player.
Figure 2.1: Visualization of the record mapping and embodiment
mapping.
The record mapping is the extent to which the exact states/actions experienced by
the teacher during demonstration are recorded in the dataset [1]. If there is no record
mapping or a direct record mapping, the exact states/actions are recorded in the
dataset. Otherwise some encoding function is applied to the data before storing the
data. The embodiment mapping is the extent to which the states/actions recorded
within the dataset are exactly those that the learner would observe/execute [1]. If
there is no embodiment mapping or a direct embodiment mapping, the recorded
states/actions are exactly those that the learner will observe/execute. Otherwise
there is a function which maps the recorded states/actions to actions to be executed.
Two data collection approaches are demonstration and imitation [1]. In demonstration, the teacher can operate the learner through teleoperation where the record
mapping is direct. There is also shadowing where the agent tries to mimic the
teacher's motions by using its own sensors. Here the record mapping is non-direct.
Within imitation the embodiment mapping is non-direct, and the teacher execution
can be recorded either with sensors on the teacher where the record mapping is
direct, or external observation where the record mapping is non-direct.
2.1.6 Demonstration dataset limitations
In IL, the performance of an agent is heavily dependent on the demonstration
dataset. Low learner performance can be due to areas of the state space that
have not been demonstrated. This can be solved by either improving upon the
existing demonstrations by generalizing them or through acquisition of new demonstrations [1]. As mentioned, low performance can also be caused by low quality
of the demonstration dataset [1]. Dealing with this involves eliminating parts of
the teacher’s executions that are suboptimal. Another solution is to let the learner
learn from experience. If feedback is provided on the learner's actions, this can be
used to update the policy [1]. The demonstration dataset limitations are not dealt
with in this project, as it is considered out of scope. It is however mentioned as a
possible extension in the Future work chapter.
2.2 Related work
This section gives an overview of the related work in the field of Imitation Learning
in games in chronological order.
Thurau et al. [26] in "Imitation In All Levels of Game AI" create bots for the
game Quake II1 . Different algorithms are presented that learn from human generated
data. They create behaviours on different levels: strategic behaviour used to achieve
long-term goals, tactical behaviour used for localized situation handling such as
anticipating enemy movement, and reactive behaviour like jumping, aiming and
shooting.
The generated bots are compared to the existing Quake II bots. It is shown that
Machine Learning can be applied on different behavioural layers. It is concluded
that Imitation Learning is well suited for generating behaviour for artificial game
characters. The bots created with Imitation Learning outperformed the Quake II
bots. It should however be taken into consideration that these results are thirteen
years old at the time of writing this report.
Priesterjahn et al. [20] in "Evolution of Reactive Rules in Multi Player Computer
Games Based on Imitation" propose a system in which the behaviour of artificial
opponents is created through learning rules by observing human players. The rules
are selected using an evolutionary algorithm with the goal of choosing the best and
most important rules and optimizing the behaviour of the agent.
1 https://en.wikipedia.org/wiki/Quake_II/
The paper shows that limited learning effort is needed to create behaviour which is
competitive in reactive situations in the game Quake III. After a few generations of
the algorithm, the agent was able to behave in the same way as the original players.
In the conducted experiments, the generated agent outperformed the built in game
agents.
The world is simplified to a single plane. The plane is divided into cells in a grid,
with the agent centered in the grid. The grid moves relative to the agent. Each
frame, the agent checks each cell if it is empty or not and scores it accordingly.
They limit the commands to moving and attacking or not attacking. A rule is
a mapping from a grid to a command. Human players are recorded and a basic
rule set is generated by recording the grid-to-command matches every frame of the
game. An evolutionary algorithm is then used to learn the best rules and thus the
best competitive behaviour.
Saunders et al. [21] in "Teaching Robots by Moulding Behavior and Scaffolding
the Environment" teaches behaviour to robots by moulding their actions within
a scaffolded environment. A scaffolded environment is an environment which is
modified to make it easier for the robot to complete a task, when the robot is at a
developmental stage. Robot behaviour is created by teaching state-action memory
maps in a hierarchical manner, which during execution are polled using a k-Nearest
Neighbour based algorithm. Their goal was to reproduce all observable movement
behaviours. Their results show that the Bayesian framework leads to human-like
behaviour.
Priesterjahn [19] in "Imitation-Based Evolution of Artificial Players in Modern
Computer Game" which is based on the paper by [20], proposes the usage of imitation techniques to generate more human-like behaviours in an action game. Players
are recorded, and the recordings are used as the basis of an evolutionary learning approach. The approach is motivated by stating that to behave human-like, an agent
should base its behaviour on how human players play the game and try to imitate
them. This as opposed to a pure learning approach based on the optimization of
behaviour, which only optimizes the raw performance of the game agent.
The authors present the result of the conducted experiments and explain that the
imitation-based initialization has a big effect on the performance and behaviour of
the evolved agents. The generated agents showed a much higher level of sophistication in their behaviour and appeared much more human-like than the agents
evolved using plain evolution, though performing worse.
Cardamone et al. [3] in "Learning Drivers for TORCS through Imitation Using
Supervised Methods" develop drivers for The Open Racing Car Simulator (TORCS)
using a direct method, meaning the method uses supervised learning to learn driving
behaviour from data collected from other drivers. They show that by using high-level information about the environment and high-level actions to be performed, the
developed drivers can achieve good performance. High-level actions mean that they
learn trajectories and speeds along the track, and let controllers achieve the target
values. This is as opposed to predicting/learning low-level actions such as pressing the gas pedal a certain amount, or rotating the wheel a certain number of degrees.
It is also stated that the performance can be achieved with limited amounts of
data and limited computational power. The learning methods used are k-Nearest
Neighbour and Neural Networks with Neuroevolution. The performance is measured
in how fast a driver completes a race, which means they want to create an AI that
is good at playing the game. It is compared to the best AI driver.
Munoz et al. [15] in "Controller for TORCS Created by Imitation" create a controller for the game TORCS using Imitation Learning. They use three types of
drivers to imitate: a human player, an AI controller created with Machine Learning and one hand-coded controller which performs a complete lap. The imitation
is done on each of the drivers separately and then a mix of the data is combined
into new controllers. The aim of the work is to create competitive NPCs that imitate human behaviour. The learning method is feed-forward Neural Networks with
Backpropagation.
The performance of the driver is measured by how fast a driver completes a race.
It is compared to other AI and human drivers. They conclude that it is difficult to
learn from human behaviour, as humans do not always perform the same actions
given the same situation. Humans also make mistakes, which is not good behaviour
to learn if the goal is to create a driver that is good at playing the game.
The work by Mehta et al. [13], "Authoring Behaviors for Games using Learning from Demonstration", is similar to [21] in that behaviour is taught by demonstrating actions and
annotating the actions with a goal. Here, the learning involves four steps:
– Demonstration: Playing the game.
– Annotation: Specifying the goals the teacher was pursuing for each action.
– Behaviour learning: Using a temporal reasoning framework.
– Behaviour execution: Done through a case-based reasoning (CBR) technique,
case-based planning.
The goal of their work was to create a framework in which people without programming skills can create game AI behaviour by demonstration.
The authors conclude that by using case-based planning techniques, concrete behaviours demonstrated in concrete game situations can be reused by the system in a
range of other game situations, providing an easy way to author general behaviours.
Karpov et al. [8] in "UT2 : Believable Bot Navigation via Playback of Human
Traces" create the UT2 bot for the BotPrize competition2 , a Turing-like test where
2 http://botprize.org/
computer game bots compete by attempting to fool human judges into thinking
they are just another human player. UT2 broke the 50% humanness threshold and
won the grand prize in 2012. The bot has a component called the Human Trace
Controller, which is inspired by the idea of direct imitation. The controller uses a
database of recorded human games in order to retrieve and play back segments of
human behaviour.
The results show that using direct imitation allows the bot to solve navigation
problems while moving in a human-like fashion.
Two types of data are recorded, pose data and event data. The pose includes
position, orientation, velocity and acceleration. An event is for example switching
weapons, firing weapons or jumping. All of the pose and event data for a player
in a particular game form a sequence. Sequences are stored so that preceding and
succeeding event and pose data can be retrieved from any given pose or event.
In order to be able to quickly retrieve the relevant human traces, they implemented
an efficient indexing scheme of the data. The two most effective indexing schemes
used were Octree-based indexing and Navigation Graph-based indexing using a KD-tree.
Ortega et al. [18] in "Imitating Human Playing Styles in Super Mario Bros"
describe and compare different methods for generating game AI based on Imitation
Learning. Three different methods for imitating human behaviour are compared:
Backpropagation, Neuroevolution and Dynamic scripting. The game is in 2D.
Similarity in playing style is measured through comparing the play trace of one
or several human players with the play trace of an AI player. The methods compared are hand-coded, direct (based on supervised learning) or indirect (based on
maximizing a similarity measure). The conclusion is that a method based on Neuroevolution performs best both when evaluated by the similarity measure and by
human spectators. Inputs were the game state, e.g. enemies, obstacles and distance to gaps, and outputs were actions.
2.2.1 Summary and state of the art
In 2006, Gorman et al. [6] stated that every game is different from the others, and claimed that it is thus probably impossible to suggest an ultimate approach. They said that "Currently, there are no generally preferred knowledge representation data structures and machine learning algorithms for the task of creating
believable behaviour". They claim that believable characters should possess certain
features, that hardly can be achieved without observing and/or simulating human
behaviour. Imitation Learning is listed as a proven human behaviour acquisition
method.
Few of the works listed here have the sole aim of creating an agent that imitates demonstrated behaviour as well as possible; in fact, no works with that as their sole aim could be found. Most
have another aim, such as performing as well as a human, or performing well after
being inspired by human behaviour. The most popular and successful approach
in these works are using Neural Networks with Neuroevolution, which is a form of
Machine Learning that uses evolutionary algorithms to train Neural Networks [18].
The Human Trace Controller in the work by Karpov et al. [8], however, is the most recent and successful work found that aims to imitate demonstrated behaviour, without doing it in a "beating the game" manner.
2.3 Performance in games
In games it is important to keep computation times low and to maintain a high and stable frame rate, usually measured in frames per second (FPS). The frame rate is the
frequency at which frames (images) in a game (or video) are displayed. A high
frame rate typically means about 60 FPS for normal computer games and about 90
FPS for VR games, in order to have objects on the screen appear to move smoothly.
Games usually contain a function called the update or tick function, which runs
once every frame. The game will wait for the update function to finish before
processing the next frame. If the calculations made in the update function take
longer than the time slot for one frame (to keep 90 FPS, one frame has 1/90 s ≈ 11.1 ms to run its calculations), the game will not be able to stay at its target FPS and will not run as smoothly.
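As a concrete illustration (not from the thesis text), the per-frame time budget follows directly from the target frame rate:

def frame_budget_ms(target_fps: float) -> float:
    """Milliseconds available per frame at the given target frame rate."""
    return 1000.0 / target_fps

print(frame_budget_ms(60))   # ~16.7 ms per frame (typical PC game)
print(frame_budget_ms(90))   # ~11.1 ms per frame (typical VR game)

Whatever the AI does in the update function must fit inside this budget together with rendering, physics and everything else the game computes each frame.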
2.4 Measuring believability of AI
Umarov and Mozgovoy [27] study current approaches to believability and effectiveness of AI behaviour in virtual worlds and give a good overview of different approaches. They discuss both measuring believability and various implementations for achieving it in games.
It is stated that believability is not the only feature that makes AI-controlled characters fun to play with. A game should be challenging, so the agent should also
be skilled or effective. However, they explain that the goals of believability and
effectiveness are not always the same. A skilled agent is not necessarily believable,
and a believable agent might be a weak opponent.
2.4.1 Turing test-approach
To evaluate the believability of an AI controlled character, Umarov and Mozgovoy [27]
refer to a Turing test-approach, where a human player (judge) plays a game against
two opponents, where one opponent is controlled by a human and one is controlled
by an AI. The judge’s task is to determine which one is human. A simplification of
this test is also mentioned, where the judge instead watches a game between two
players which both can be controlled either by a human or an AI. The judge’s task is
to identify game participants. Lee et al. [9] learn human-like behaviour via Markov
decision processes in the 2D game Super Mario. They evaluate the human-likeness
by performing a modified Turing test [22] as well.
Gorman et al. [6] performed an experiment which [27] refers to. Quake II agents were
evaluated by showing a number of people a series of video clips as seen by the character's first-person camera. The task was to identify whether the active character is
human. The different characters were controlled by a real human player, a Quake
agent and a specifically designed imitation agent that tried to reproduce human
behaviour using Bayesian motion modeling. The imitation agent was misidentified
as a human 69% of the time and the Quake agent was mistaken as a human 36% of
the time. "Sample evaluators’ comments, quoted in (Gorman et al., 2006), indicate
that quite simple clues were used to guess human players (’fires gun for no reason,
so must be human’, ’stand and wait, AI wouldn’t do this’, ’unnecessary jumping’)".
2.4.2 Automated similarity test
One way to compare human player actions and agent actions is by comparing velocity direction angle changes and frequencies of angles between player direction
and velocity direction. Another is to compare pre-recorded trajectories of human
players with those of agents [27].
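A small sketch of the first measure mentioned above (comparing distributions of velocity direction angle changes) is given below; it is illustrative only, and the choice of histogram intersection as the similarity score is an assumption, not something prescribed in [27]:

import numpy as np

def heading_angle_changes(positions):
    """Angle change (radians) between consecutive velocity directions
    along a recorded 2D trajectory given as an (N, 2) array of positions."""
    positions = np.asarray(positions, dtype=float)
    velocities = np.diff(positions, axis=0)
    angles = np.arctan2(velocities[:, 1], velocities[:, 0])
    d = np.diff(angles)
    # Wrap angle differences into (-pi, pi].
    return (d + np.pi) % (2 * np.pi) - np.pi

def similarity(human_positions, agent_positions, bins=36):
    """Histogram intersection of heading-change distributions; 1.0 means identical."""
    edges = np.linspace(-np.pi, np.pi, bins + 1)
    h, _ = np.histogram(heading_angle_changes(human_positions), bins=edges)
    a, _ = np.histogram(heading_angle_changes(agent_positions), bins=edges)
    h = h / max(h.sum(), 1)
    a = a / max(a.sum(), 1)
    return float(np.minimum(h, a).sum())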
2.5 Conclusion
This chapter presented Imitation Learning and the different challenges that it involves. Then related works were listed and the state of the art was determined. It
seems like a direct imitation method is a good approach as used by Karpov et al. [8].
Since no learning is done the approach should give a lot of control, which is good as
the computational performance of AI in games is important. The choice of method
is described in detail in the next chapter. In order to evaluate the believability
of the agent, a Turing test-approach is described as an option. The evaluation is
described in Chapter 4.
Chapter 3
Implementation
This chapter describes the implementation of the Imitation Learning framework
and thereby aims to answer Q1.1. Section 3.3.1 provides a summary of what was
implemented.
Throughout an iterative implementation process it was determined what to implement, in order to create an agent with behaviour which can be evaluated. The agent
created in this process will be referred to as the agent when no other type of AI
controlled character is in the same context. Otherwise it will be referred to as the
imitation agent.
3.1 Setting
The implementation was carried out in the Unity® Pro game engine1. Unity is a
cross-platform game engine developed by Unity Technologies and used to develop
video games for PC, consoles, mobile devices and websites.
3.2 Method motivation
To keep the complexity of the framework low, and to allow for quick evaluation
and iteration, it was decided to go with a Nearest Neighbour (NN) classification
approach as used by Cardamone et al. [3]. Policy creation is thus done through
a mapping function. No learning is done, and the collected data represents the
model. Argall et al. [1] state that regardless of learning technique "minimal parameter tuning and fast learning times requiring few training examples are desirable".
This speaks against more sophisticated algorithms such as Neural Networks, which
require a lot of data to perform well. Cardamone et al. [3] claim that it is desirable
to have the output of the agent be high-level actions, such as a target position and
velocity as opposed to low-level actions such as a certain key press for a certain
1 https://unity3d.com/
amount of time. Other classification techniques may perform as well or better than
Nearest Neighbour algorithms, but the focus of the thesis is not to compare or find
the best classification algorithm. It is however important that the algorithm is fast,
as there is not much time for heavy calculations in a game. Karpov et al. [8] show
that using direct imitation, i.e. playing back recorded segments of human gameplay
as they were recorded, allows the bot to solve navigation problems while moving
in a human-like fashion. Their work passes the test of a structured and recognized
competition aimed at measuring human-likeness, which gives the work high credibility. It is also one of the most recent works. This project was therefore inspired
by their solution.
The implementation used imitation as the data collection approach, where the
record mapping is direct and the embodiment mapping is non-direct. This is described
in more detail in the next section.
3.3 Implementation
3.3.1 Summary
An Imitation Learning framework was created which allows a human to create
human-like agent behaviour by recording their own actions. Below is a summary of
the implementation of the imitation agent. Details are described in the subsections
following this summary.
– Recording movement: The human is in control of the agent and the agent’s
state is continuously recorded.
– Playing back movement: The agent moves by executing actions. An action
is a set of states. An action is chosen by classifying the agent’s state and
weighing actions. Classification is done using a Nearest Neighbour algorithm.
– Feature extraction: The agent uses sensors to sense the environment. Reading the sensors results in a feature vector that is a representation of the environment.
– Avoiding static obstacles: If there is recorded data which corresponds to
the agent’s current state, the agent will be able to avoid obstacles by executing
the nearest neighbour action. If that is not the case, static obstacles are
avoided by checking if an action goes through a static obstacle or not in the
Nearest Neighbour algorithm. If it does the action is not considered a near
neighbour and is not chosen.
– Avoiding dynamic obstacles: Dynamic obstacles are avoided like static
obstacles, but a different feature extractor is utilized which extracts different
features. The dynamic obstacle avoidance was the last part of the implementation process.
– KD-tree: A KD-tree is used to speed up the Nearest Neighbour algorithm.
– Grid: The environment is discretized into a grid of cells. The grid is used in
weighing actions. An action is weighted with the score of the cell it ends up in. The grid can
be manipulated to make the agent move to a destination.
3.3.2 Recording movement and state representation
Figure 3.1: Flowchart visualizing the record mode.
The agent can be in either Record or Playback mode. During recording, a human
is in control of the agent from the agent’s first-person perspective using a mouse and
keyboard. The record mapping was direct, meaning that the exact states/actions
were recorded in the dataset. Data was recorded when the direction vector of the
agent changed, and the distance between the agent’s current position and the last
recorded position was bigger than a set threshold. The policy that an IL algorithm is
meant to learn, maps a state x to an action u. Adopting this terminology, one record
of data was structured as a state. Several states make up an action. A state consists
of two parts. The first part is the agent’s position, rotation and direction (i.e.
the agent’s forward vector), called the pose state. The state representation is thus
continuous. The pose state also contains the time passed between the previous state
and the current state. The second part is a feature vector of floats, corresponding to
a representation of the environment at the current pose state. This second part is
called the sensor state. How the sensor state is created is explained in further detail
in the section Avoiding static obstacles. Karpov et al. [8] similarly use sequences of
states for representing the stored human traces, separating them into a pose state
and an event state. The data is stored by writing all recorded states as binary data
to a file. When more data is recorded, the data is appended to the existing file.
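As an illustrative sketch of the recorded data (the thesis implementation is written inside Unity; the field names, types and the threshold value below are assumptions), a state and the recording condition could look roughly like this:

from dataclasses import dataclass
from typing import List

@dataclass
class PoseState:
    position: tuple        # (x, y, z) world position
    rotation: tuple        # orientation of the agent
    direction: tuple       # the agent's forward vector
    dt: float              # time passed since the previous recorded state

@dataclass
class State:
    pose: PoseState
    sensor: List[float]    # feature vector describing the environment

def should_record(prev: PoseState, curr: PoseState, min_distance: float = 0.25) -> bool:
    """Record a new state when the direction vector has changed and the agent
    has moved further than a set threshold since the last recorded state."""
    moved = sum((a - b) ** 2 for a, b in zip(curr.position, prev.position)) ** 0.5
    return curr.direction != prev.direction and moved > min_distance

Several consecutive states then make up one action, and the full list of states is written to a binary file as described above.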
Figure 3.2 shows one environment, or scene, used during development at an early
stage of the implementation process. The aim here was to play back recorded data
by having the agent move to the closest position in the recorded data.
Figure 3.2: The scene. Recorded trajectory data in black and the
agent’s trajectory in blue.
3.3.3 Playing back movement
During Playback the agent moves on its own by executing actions. Executing an
action means moving from one recorded pose state to the next, interpolating between
states to achieve a position and rotation that approximate the recorded data. This
interpolation/approximation is a form of embodiment mapping, as the agent maps
the recorded data into movement. The embodiment mapping was therefore non-direct, meaning that the recorded states/actions were not exactly those that the
agent would execute. To find an action to execute, the agent’s sensor state is
classified using a NN algorithm. The algorithm returns the nearest recorded action
to the agent’s current sensor state. This action is then applied relative to the agent’s
current pose state so that the action’s first state is the same as the agent’s current
rotation. To create smooth rotation between states the following was done: Suppose
that the agent is at the first state a where it has the correct rotation r1, and the next pose state is b containing rotation r2. When moving from a to b, the rotation of the agent is set to the interpolation between r1 and r2, weighted by the distance traveled from a to b. Upon reaching b the rotation is therefore r2. Slight errors in the imitation occur here, since the human most likely did not rotate at a
constant speed when demonstrating. However, making the distance between states
short made it hard to tell a difference when observing the agent. When the agent
has finished executing an action, meaning it has reached the final pose state position
of an action, the process is repeated by classifying the sensor state again.
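A minimal sketch of the rotation interpolation described above (illustrative only; the thesis code runs inside Unity, and representing rotation as a single yaw angle in degrees is an assumption):

def lerp_angle(a: float, b: float, t: float) -> float:
    """Interpolate between two yaw angles (degrees) along the shortest arc."""
    diff = (b - a + 180.0) % 360.0 - 180.0
    return a + diff * t

def distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5

def playback_rotation(current_pos, state_a, state_b):
    """Rotation to use while moving from recorded state a to recorded state b,
    weighted by how far along the segment the agent currently is."""
    seg_len = distance(state_a.position, state_b.position)
    traveled = distance(state_a.position, current_pos)
    t = min(traveled / seg_len, 1.0) if seg_len > 0 else 1.0
    return lerp_angle(state_a.rotation, state_b.rotation, t)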
3.3.4 Policy
A policy is a function that maps a state to an action. The NN algorithm receives a
state as input, efficiently finds the best action with the KD-tree data structure and
returns it. Thus the NN algorithm with the KD-tree can be said to be the policy.
3.3.5 Feature extraction
An IL algorithm learns a policy that imitates the teacher, with the purpose of
generalizing to unseen data. In order to generalize, the agent had to sense its
environment and represent it in a way which allows for recognizing similar states.
The feature extraction process uses sensors on the teacher to sense the environment
and represents it as a vector of floats, called the feature vector or simply the features.
When recording a state or classifying a state, the sensor state is created by extracting
features for the agent’s current pose state.
3.3.6 Avoiding static obstacles
In many games, a desirable skill for an agent to have is to be able to avoid obstacles,
so called obstacle avoidance. In order to be able to avoid static (non-moving)
obstacles, such as walls, sensors were implemented similar to the ones used by the
authors of [8] in [23]. They show a figure similar to Figure 3.3a which represents the
sensors they use on their Quake III bot. Their motivation was that there are more
sensors near the front so that the agent can better distinguish locations in front of
it.
Figure 3.3: Sensors similar to those used by Schrum et al. [23] (a) were added to the agent (b).
The feature extractor creates the sensor state by ray casting in all sensor directions
using Unity’s function Physics.Raycast. The function returns information about
what was hit, including the distance to the hit obstacle/collider. This results in
a feature vector v containing the distances x1, ..., x6 to obstacles in the different
directions.
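A sketch of this idea follows (the actual implementation uses Unity's Physics.Raycast; the raycast helper, the specific sensor angles and the maximum range below are hypothetical stand-ins):

import math

# Sensor directions relative to the agent's forward vector, denser near the front.
SENSOR_ANGLES_DEG = [-90.0, -45.0, -15.0, 15.0, 45.0, 90.0]

def extract_static_features(agent_pos, agent_heading_deg, raycast, max_range=20.0):
    """Return the feature vector [x1, ..., x6]: distance to the nearest static
    obstacle along each sensor direction, capped at max_range.

    raycast(origin, direction, max_range) stands in for Physics.Raycast and
    should return the hit distance, or None if nothing was hit."""
    features = []
    for angle in SENSOR_ANGLES_DEG:
        theta = math.radians(agent_heading_deg + angle)
        direction = (math.cos(theta), math.sin(theta))
        hit = raycast(agent_pos, direction, max_range)
        features.append(hit if hit is not None else max_range)
    return features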
Figure 3.4 shows how data could be recorded in one environment (Figure 3.4a) and
played back in another (Figure 3.4b), thus showing that the approach generalizes
to new environments.
Figure 3.4: Recorded traces in black, the chosen action in green, the chosen action applied to the agent in blue and sensors in white.
Figure 3.4b shows how the agent currently is in the top right corner. When classifying its state, it is determined that an action should be chosen as if it currently
was in the lower left corner (the action is highlighted in green). This makes sense,
as it is a similar situation.
If there is recorded data which corresponds to the agent’s current state, the agent
will be able to avoid obstacles by executing the nearest neighbour action. However,
that may not always be the case, as there probably will not be recorded data for
every possible state. Therefore the NN algorithm checks if actions go through a
static obstacle, and if so does not consider them near neighbours and they will not
be chosen.
3.3.7 Avoiding dynamic obstacles
Another common task for game AI is to be able to avoid moving (dynamic) obstacles.
A new feature extractor was created which sensed the environment in a different
way. The area within a certain radius around the agent was sensed with the purpose
of sensing moving obstacles, visualized in Figure 3.5a.
To be able to recognize a state correctly, it was necessary to be able to differentiate
between obstacles moving in different directions. For example if an obstacle is
close and headed straight towards the agent, the agent should probably dodge the
obstacle somehow. If the obstacle is headed away from the agent however, no
particular action needs to be taken. Intuitively when an agent should avoid an
obstacle, it would be important to know:
• How close is the obstacle to the agent?
• Is the obstacle moving towards or away from the agent?
• Will the obstacle hit the agent if the agent does not move?
What is important is to be able to distinguish one state from another. The resulting extractor extracts three features per moving obstacle within the sensor. This is described in Algorithm 1 and visualized in Figure 3.5b. The features are explained after the figure.
Algorithm 1 Dynamic obstacle extractor

function ExtractFeatures(agent)
    Sort obstacles in sensor by distance
    for each moving obstacle obstacle at index i in sensor do
        velocitySimilarity ← dot(agent.velocity, obstacle.velocity)
        sqrDist ← sqrDist(agent, obstacle)
        diffVector ← obstacle.position - agent.position
        velPosSimilarity ← dot(diffVector, agent.velocity)
        features[3 * i] ← velocitySimilarity
        features[3 * i + 1] ← sqrDist
        features[3 * i + 2] ← velPosSimilarity
Figure 3.5: The new sensor (a) and visualization of the vectors used in calculating features for the dynamic obstacle extractor (b).
The velocitySimilarity is the dot product of the agent velocity and the obstacle
velocity. It will tell whether an obstacle is heading in the same direction as the
agent or not. velPosSimilarity is the dot product between the diffVector and
the agent’s velocity. This value says whether the obstacle lies in the agent’s current
path or not. If this value is 1 it means that the two vectors are in the same direction.
This means that the agent is headed straight towards the obstacle. sqrDist could
act as a weight for how crucial the situation is.
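A runnable sketch of Algorithm 1 in Python is given below; it is illustrative only, and details such as the data layout of the obstacles and the omission of vector normalization are assumptions not spelled out in the pseudocode:

import numpy as np

def extract_dynamic_features(agent_pos, agent_vel, obstacles):
    """Three features per moving obstacle inside the sensor radius, as in Algorithm 1.

    obstacles is a list of (position, velocity) pairs; they are sorted by
    distance to the agent so that feature slots stay consistent."""
    agent_pos = np.asarray(agent_pos, dtype=float)
    agent_vel = np.asarray(agent_vel, dtype=float)
    obstacles = sorted(obstacles,
                       key=lambda o: float(np.sum((np.asarray(o[0]) - agent_pos) ** 2)))
    features = []
    for obs_pos, obs_vel in obstacles:
        obs_pos = np.asarray(obs_pos, dtype=float)
        obs_vel = np.asarray(obs_vel, dtype=float)
        diff = obs_pos - agent_pos
        features.append(float(np.dot(agent_vel, obs_vel)))   # velocitySimilarity
        features.append(float(np.sum(diff ** 2)))            # sqrDist
        features.append(float(np.dot(diff, agent_vel)))      # velPosSimilarity
    return features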
The proposed approach is by no means the correct or the best solution. Different
approaches similar to the above were tried, but these values were able to distinguish
the agent’s state the best out of the tried values. Using this with recorded data
containing around 100 actions demonstrating how to avoid a single obstacle, the
agent was able to avoid a single obstacle efficiently. Attempts were also made with
more obstacles at the same time. In many situations, the agent would avoid obstacles well, but in some it would not. In theory, like with static obstacle avoidance, if
there is data for every situation, the feature extractor separates different situations
well and the quality of the data is good, then the agent should be able to always
avoid obstacles. Good data is meant in the sense of the current goal behaviour. If
the goal behaviour is obstacle avoidance, the data is good if the recorded human
performed good/avoiding actions and did not walk into an obstacle while recording.
Figure 3.6: The agent avoiding an obstacle (blue square) moving in the opposite direction. The blue curve is the agent's chosen action trajectory that it chose at t = 1 when it sensed the obstacle. At t = 2 the agent has moved further along the trajectory and the obstacle has moved further to the right. (a) t = 1; (b) t = 2.
3.3.8 KD-tree
It was decided to implement a data structure to make the NN algorithm more efficient. Karpov et al. [8] use a KD-tree as one of their approaches to efficiently
retrieve recorded data. A KD-tree is a common approach to making NN algorithms more efficient. Weber et al. [29] showed that if a nearest neighbour approach is used in a space with more than about ten dimensions, it is better to use a naive
exhaustive search. The reason is that the work of partitioning the space becomes
more expensive than the similarity measure. The number of features was six (distance to walls in six directions), which is less than ten, so a KD-tree should speed
up the NN algorithm.
A KD-tree is a space-partitioning data structure for organizing points in k-dimensional
space. During construction, as one moves down the tree, one cycles through the
axes used to select the splitting planes that divide the space. In the case of a two-dimensional space, this could be the x and y coordinates (Figure 3.7). Points are
inserted by selecting the median point from the list of points being inserted, with
Figure 3.7: The points (7, 2), (5, 4), (2, 3), (4, 7), (9, 6), (8, 1), (2, 7) inserted in the KD-tree.
respect to the coordinates in the axis being used. If one starts with the x axis, the
points would be divided into the median point with respect to the x coordinate and
two sets: the points with an x coordinate less than the median and the points with
an x coordinate bigger than the median. The same procedure is then applied recursively to the two sets, cycling on to the next axis (y).
This would correspond to cycling through the features representing distances to
walls in different directions. Algorithm 2 describes the construction of the KD-tree.
Algorithm 2 Construction of the KD-tree

function BuildTree(actions, depth = 0)
    dimensions ← numFeatures(actions)
    axis ← depth % dimensions
    sort(actions) by comparing feature[axis] for actions
    median ← median element in sorted actions
    if median is the only element then
        return TreeNode(median, null, null, axis)
    a ← actionsBeforeMedian
    b ← actionsAfterMedian
    return TreeNode(median, BuildTree(a, depth + 1), BuildTree(b, depth + 1), axis)
The nearest neighbour algorithm using the KD-tree is described in Algorithm 3.
The search time is on average O(log n).
Algorithm 3 The Nearest Neighbour algorithm

function NN(node, inputState, ref nearestNeighbour, ref nearestDist)
    if node is null then
        return
    searchPointAxisValue ← inputState[node.axis]
    dist ← ∞
    nodeAxisValue, index ← 0
    // Determine how near the current action is to the input
    for state s at index i in node.action do
        if dist(inputState, s) < dist then
            dist ← dist(inputState, s)
            nodeAxisValue ← node.action.state(i)[node.axis]
            index ← i
    if node.leftChild is null && node.rightChild is null then
        return
    // Apply the action on the current state
    appliedAction ← applyActionOnState(inputState, node.action)
    // Let the calling model weigh the action (it may e.g. go through an obstacle)
    weight ← weighAction(callingModel, appliedAction)
    dist ← weight
    // Determine the nearest side to search first
    nearestSide, furthestSide ← null
    if searchPointAxisValue < nodeAxisValue then
        nearestSide ← node.leftChild
        furthestSide ← node.rightChild
    else
        nearestSide ← node.rightChild
        furthestSide ← node.leftChild
    NN(nearestSide, inputState, nearestNeighbour, nearestDist)
    if dist < nearestDist then
        // Update nearest neighbour as recursion unwinds
        nearestNeighbour ← node.action
        nearestDist ← dist
    // Check if it is worth searching on the other side
    nearestAxisValue ← nearestNeighbour.state(index)[node.axis]
    splittingPlaneDist ← dist(inputState, splittingPlane)
    nearestNeighbourDist ← dist(inputState, nearestNeighbour)
    if splittingPlaneDist < nearestNeighbourDist then
        NN(furthestSide, inputState, nearestNeighbour, nearestDist)
Following is a short and slightly simplified explanation of the algorithm. An extended description of how the algorithm works can for example be found in the
Wikipedia article2 .
The algorithm recursively moves down the tree, starting from the root. When it
reaches a leaf, that leaf is set as the current best. As the recursion unwinds, each
node compares its distance to the input to the current best. If the distance is
smaller than the current best, then the node is set to the current best. It also
checks whether it is possible that a nearer neighbour can be on the other side of
a node. If the distance between the current best node and the input search point
is bigger than the distance from the input search point to the splitting plane of the current node, then
there might be a nearer neighbour on the other side of the current node, so that
side is searched. When the search reaches the root node, the search is done.
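For illustration, a minimal point-based KD-tree with median splits and nearest-neighbour search is sketched below in Python. It is simplified compared to the thesis version, which stores actions and lets the calling model weigh them, but it shows the same construction and search logic:

from math import inf

class Node:
    def __init__(self, point, left, right, axis):
        self.point, self.left, self.right, self.axis = point, left, right, axis

def build_tree(points, depth=0):
    """Build a KD-tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid],
                build_tree(points[:mid], depth + 1),
                build_tree(points[mid + 1:], depth + 1),
                axis)

def nearest(node, query, best=None, best_dist=inf):
    """Return (best_point, best_squared_distance) for the query point."""
    if node is None:
        return best, best_dist
    d = sum((a - b) ** 2 for a, b in zip(node.point, query))
    if d < best_dist:
        best, best_dist = node.point, d
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best, best_dist = nearest(near, query, best, best_dist)
    # Only search the far side if the splitting plane is closer than the best hit.
    if diff ** 2 < best_dist:
        best, best_dist = nearest(far, query, best, best_dist)
    return best, best_dist

tree = build_tree([(7, 2), (5, 4), (2, 3), (4, 7), (9, 6), (8, 1), (2, 7)])
print(nearest(tree, (6, 3)))   # nearest stored point and its squared distance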
3.3.9 Discretizing the environment
In games, it is desirable to be able to tell an AI to go to a position. This diverges
from the Imitation Learning approach, as the sensor state is not used to decide what action
to execute. Instead an external input says what position to go to. It was decided
to implement it however, for the sake of practical usability. One could argue that
the agent still moves in a human-like fashion, as it executes actions the same way
the actions were recorded, and the only way for the agent to move is by executing
actions.
A first approach in making the agent go to a goal was to weigh the actions by how
close an action would take the agent towards the goal. This worked to some extent,
but the agent did not register where it had been or if it walked into a dead end.
This resulted in it sometimes walking around in the same area for a long time,
without realising that it did not get closer to the goal. The phenomenon is shown
in Figure 3.8. It was therefore concluded that some sort of path finding was needed
and that it would help to be able to say if a position on a map was good or bad, or
close to the goal or not.
2 https://en.wikipedia.org/wiki/K-d_tree/
Figure 3.8: Problem with getting stuck. The blue lines show traces
of the agent trying to get to the white goal.
Priesterjahn et al. [20] used a grid to represent a state in their Neuroevolution
approach. Inspired by them, the map was discretized into a grid of cells where
each cell had a score which represented the distance from the cell to the goal.
Actions were then weighted by the score of the cell that the action ended up in. A
lower score means closer to the goal (greener in Figure 3.9a). As the agent moved
around the map, the scores of the nine cells adjacent to the agent were increased,
thus decreasing the chance of picking an action which ended up in one of those
cells again. Spending time in a corner would result in those cells getting a higher
score, which would lead to the agent not going there again. This is visualized in
Figure 3.9.
Figure 3.9: As cells are visited, their scores are increased. (a) The grid; (b) t = 1; (c) t = 2; (d) t = 3.
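As a rough sketch of this exploration-style scoring, a cell penalty could be maintained as below. The grid representation and the names are illustrative assumptions, not the framework's actual code; the idea is only that actions ending up in low-score cells are preferred.

class ScoreGrid:
    """Grid of cells whose scores penalise recently visited areas.

    Lower score = more attractive; visiting an area raises the scores of the
    cells around the agent, so actions leading back there are picked less often.
    """

    def __init__(self, width, height, cell_size, visit_penalty=1.0):
        self.width, self.height = width, height
        self.cell_size = cell_size
        self.visit_penalty = visit_penalty
        self.scores = [[0.0] * width for _ in range(height)]

    def cell_of(self, position):
        x, z = position  # 2D position on the map (assumed)
        return int(x / self.cell_size), int(z / self.cell_size)

    def mark_visited(self, position):
        """Increase the scores of the 3x3 block of cells around the agent."""
        cx, cz = self.cell_of(position)
        for dz in (-1, 0, 1):
            for dx in (-1, 0, 1):
                x, z = cx + dx, cz + dz
                if 0 <= x < self.width and 0 <= z < self.height:
                    self.scores[z][x] += self.visit_penalty

    def score_at(self, position):
        cx, cz = self.cell_of(position)
        return self.scores[cz][cx]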
This approach solved the problem of the agent getting stuck in corners or close to
the goal but on the wrong side of a wall. This was however more of an exploring
approach, which could be used if the agent does not know where the goal is. Unless
the agent is meant to be blind, this strategy would need to be improved by scoring
cells which the agent can see. Telling the agent to go to a position means that
the agent knows where the goal is. Therefore a better path finding strategy was
implemented. Using the classic A* algorithm3, the grid would calculate the shortest path from the agent to the goal, and score each cell that the shortest path touches with its path distance to the goal. All other cells were given a deliberately bad score. This is
visualized in Figure 3.10. The grid is the tool a programmer/user would use to
influence what the NN algorithm should consider a good action to be. In the NN
algorithm, actions are weighted according to the cell score at the action’s last pose
state position.
3 https://en.wikipedia.org/wiki/A*_search_algorithm/
Figure 3.10: Cells that touch the A* path from the agent to the
goal are scored with a low score (green).
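Building on the ScoreGrid sketch above, one way the A*-based scoring and the resulting action weighting could look is sketched below. The helper names (path_cells as the cell sequence returned by A*, the final position of a candidate action, the additive weighting) are hypothetical and only illustrate the idea of preferring actions whose last pose lands on a cheaply scored cell.

BAD_SCORE = 1000.0  # cells the A* path does not touch get a deliberately bad score

def score_cells_along_path(grid, path_cells):
    """Score cells touched by the A* path with their path distance to the goal.

    path_cells is assumed to be ordered from the agent's cell to the goal cell,
    so the score of a cell is simply the number of remaining steps to the goal.
    """
    for z in range(grid.height):
        for x in range(grid.width):
            grid.scores[z][x] = BAD_SCORE
    last = len(path_cells) - 1
    for i, (x, z) in enumerate(path_cells):
        grid.scores[z][x] = float(last - i)

def weight_action(grid, action_final_position, nn_distance, score_weight=1.0):
    """One plausible way to weight a candidate action: combine how similar its
    start state is to the agent's state (nn_distance) with the cell score at
    the action's last pose state position. Lower is better for both terms."""
    return nn_distance + score_weight * grid.score_at(action_final_position)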
3.3.10 Additional details
The length of an action could be chosen, which would split up the recorded data into actions of the given length. The states within an action were recorded consecutively, so while executing an action the agent moves just like the human who recorded it. Choosing a large action length results in long actions, and thus longer continuous segments of the agent behaving human-like. The downside of long actions is that they might not be able to get the agent out of certain situations without hitting an obstacle, and they may also take the agent to worse locations. If there is no recorded data similar to the agent's current state, the returned action probably does not suit the situation well; a longer action then represents a larger wasted investment, whereas a shorter action lets the state be re-classified sooner and hopefully yields a more suitable action. Short actions, however, result in shorter continuous segments of human-like behaviour. They also require the state to be classified more often, which has an impact on performance, although classifying often increases the chance of choosing a correct action for the situation. An action length somewhere in between long and short was chosen at first. Later, support for splitting up the data into several action lengths at the same time was implemented, making long actions available for areas without obstacles and short actions available for trickier situations.
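A minimal sketch of how one recorded sequence of states could be split into actions of several lengths at once follows; the chunk sizes and the representation of an action as a plain slice of consecutive states are assumptions made for illustration.

def split_into_actions(recorded_states, action_lengths=(10, 50)):
    """Split one recorded state sequence into actions of several lengths.

    Returns a dict mapping each action length to a list of actions, where an
    action is simply a consecutive slice of the recorded states.
    """
    actions_by_length = {}
    for length in action_lengths:
        actions = [
            recorded_states[i:i + length]
            for i in range(0, len(recorded_states) - length + 1, length)
        ]
        actions_by_length[length] = actions
    return actions_by_length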
In practice, for an AI to be useful in a game, it should be possible to define different
types of behaviour and be able to switch between them depending on the situation.
The implementation was structured to allow for several types of actions and models,
resulting in a loop described in Algorithm 4. Data was recorded separately for each
behaviour.
Algorithm 4 The agent loop
function Update
    if recording then
        // Recording
        features ← featureExtractor.ExtractFeatures(agent)
        recorder.Record(agent, features)
    else
        // Playback
        if action is done executing or was aborted then
            features ← featureExtractor.ExtractFeatures(agent)
            action ← model.Classify(agent, features)
        else
            action.Execute(agent, destination)
The agent used a controller for deciding which feature extractor to use. When a
dynamic obstacle would come within a certain distance, the agent would switch
to the feature extractor for dynamic obstacle avoidance with the corresponding
recorded actions. Otherwise it would use the static obstacle avoidance model.
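A hedged sketch of such a controller is shown below, assuming hypothetical behaviour objects (a feature extractor paired with its model) and an agent that can report its distance to an obstacle; none of these names come from the framework itself.

class BehaviourController:
    """Chooses which behaviour (feature extractor + model) the agent uses."""

    def __init__(self, static_behaviour, dynamic_behaviour, switch_distance):
        self.static_behaviour = static_behaviour    # (feature_extractor, model)
        self.dynamic_behaviour = dynamic_behaviour  # (feature_extractor, model)
        self.switch_distance = switch_distance

    def select(self, agent, obstacles):
        """Use the dynamic-avoidance behaviour when an obstacle is close."""
        nearest = min(
            (agent.distance_to(o) for o in obstacles), default=float("inf")
        )
        if nearest < self.switch_distance:
            return self.dynamic_behaviour
        return self.static_behaviour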
3.3.11 Storing data
The recorded data was stored as raw binary data. A file containing data for 1000 recorded actions with 50 states per action, corresponding to about 25 minutes of recording, has a size of approximately 3 MB. The data stored per state (pose + sensor state) is described in Table 3.1.
Pose state: Vector3 position (float posx, float posy, float posz); Quaternion4 rotation (float rotx, float roty, float rotz, float rotw)
Sensor state: Vector3 direction (float dirx, float diry, float dirz); Delta time5 (float time)
Feature vector: float n0, ..., float n_numFeatures

Table 3.1: The data stored for one state.

4 https://en.wikipedia.org/wiki/Quaternion
5 The time between the previous state and this state.
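As a rough illustration of the storage cost, the field layout of Table 3.1 can be packed as raw 32-bit floats; the struct format string and the byte arithmetic below are my own assumptions for the sketch, not the framework's actual serialization code.

import struct

def pack_state(pos, rot, direction, delta_time, features):
    """Pack one state (pose + sensor state + features) as raw 32-bit floats."""
    values = list(pos) + list(rot) + list(direction) + [delta_time] + list(features)
    return struct.pack(f"<{len(values)}f", *values)

# 3 (position) + 4 (rotation) + 3 (direction) + 1 (delta time) floats = 44 bytes,
# plus 4 bytes per feature. 1000 actions x 50 states = 50 000 states, which with
# a handful of features ends up in the low single-digit megabyte range,
# consistent with the ~3 MB figure above.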
3.3.12 Optimization and measuring performance
For usage in a proper game, the computational time of the AI should be as low as possible. The bottleneck in the NN algorithm was applying an action to the agent's current state, since every traversed action was checked for whether it would pass through an obstacle if applied to the agent's current state. Instead of checking for collisions between every pair of consecutive states in an action, the check was approximated by only testing for collision between the first and the middle state, and between the middle and the last state of the action. To ensure the agent did not get stuck by picking an invalid action, it was forced to update its current action at a certain time interval.
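A sketch of the approximated check follows; segment_hits_obstacle is a stand-in for whatever raycast or overlap test the engine provides, and the state objects with a position field are assumptions for illustration.

def action_is_valid(action_states, agent_position, segment_hits_obstacle):
    """Approximate collision check: only test first->middle and middle->last.

    action_states are the recorded states of a candidate action, transformed
    into the agent's current frame; segment_hits_obstacle(a, b) is assumed to
    test a straight segment between two positions against the environment.
    """
    first = agent_position
    middle = action_states[len(action_states) // 2].position
    last = action_states[-1].position
    return not (
        segment_hits_obstacle(first, middle) or segment_hits_obstacle(middle, last)
    )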
The performance of the imitation agent was measured as the average computational time per game frame for different amounts of data: 100, 200, 500 and 1000 recorded actions with an action length of 50. 1000 recorded actions correspond to about 25 minutes of recording.
3.4 Overall implementation
The framework allows a user to create an agent which imitates demonstrated movement behaviour. To create a behaviour, the user writes a feature extractor which defines what environmental features should be classified, and chooses when the behaviour should be activated. The user then collects data for the behaviour by recording their own play. Finally, the behaviour can be played back. An agent can possess several behaviours at once, and it is up to the user to define which behaviour should be active in which situation.
This chapter described how IL can be used to create an agent that imitates human
demonstrations using a direct imitation approach and limited amounts of data. In
the next chapter, the evaluation of the imitation agent is described.
Chapter 4
Evaluation
This chapter presents the user study that was conducted in order to answer the
project’s stated questions. The results of the study are presented thereafter along
with a performance measure of the imitation agent. Following that is a discussion
section which presents and discusses what was done in the project, what the study
found to be important in looking human-like and the performance of the imitation
agent in relation to games. Finally some ethical aspects are discussed.
4.1 User study
Recall that the objective of the project (see Section 1.2) is to answer the following:
– Q1.1: How to create an agent that imitates demonstrated behaviour, using
IL with limited amounts of data?
– Q1.2: What determines if a character is human-like, when observed through
the character’s first-person perspective?
A user study was conducted in order to answer Q1.2 and to contribute to the answer to Q1.1 by asking humans how well the imitation agent imitates the demonstrations. The method chapter describes how IL can be used to create behaviour by imitating recorded human behaviour, but provides no evaluation of whether that behaviour is human-like or not. The user study therefore aimed to evaluate the human-likeness of the agent and to evaluate in a qualitative manner how well the agent imitates the recorded human. As a reminder, an agent is said to be human-like if it looks like it is being controlled by a human. The layout of the study was inspired by [27], which, as presented in the background chapter, describes a simplification of a Turing test approach. It was also inspired by [9], which gave users statements to agree or disagree with.
4.1.1 The set-up
The study consisted of videos of three different character controllers: the imitation agent, a human and Unity's built-in NavMeshAgent. These controllers will be labeled Imitation Controller (IC), Human Controller (HC) and NavMesh Controller (NC) respectively. The human provided the demonstrations for the imitation agent to imitate. The NC was intended to act as a sanity check: a person with a lot of gaming experience would easily be able to tell that the NC was not being controlled by a human, as it moves very statically, makes no unexpected movements and turns with a set speed. Three different settings were set up:
Setting 1
– A simple environment like during development (Figure 4.1). When the character reaches the goal, the goal gets randomly positioned somewhere on the
map.
Figure 4.1: Setting 1.
Setting 2
– An even simpler environment but with a single moving obstacle (Figure 4.2).
Figure 4.2: Setting 2 with a moving obstacle (blue) and the goal
(white).
Setting 3
– Same concept as Setting 1, but different map (Figure 4.3). Here, the goal
positions were deterministic, meaning that when the character reaches the
goal, the goal gets positioned at the next index in the goal positions list. This
means that all characters take the same path.
Figure 4.3: Setting 3 from a top-down view (a) with the corresponding first-person perspective (b).
One video was recorded for each setting and for each character controller, resulting in a total of nine videos. The videos were recordings of the controllers moving around in the three different settings, from a first-person perspective (Figure 4.3b). In most games, a player would observe an NPC from a third-person perspective. Using a third-person perspective requires the observed character to be modeled and potentially animated, and whether a user wants it to or not, these things will most likely affect the user's thoughts on how the character should behave. It is also more difficult to spot detailed movement and rotation from a third-person perspective. In a first-person perspective, however, a user does not need to know or see what the character looks like, and it is easier to register the character's exact movement and rotation. Most importantly, it is easier to spot differences between the different controllers.
Figure 4.4 illustrates one trajectory of the IC from a top-down view. This trajectory
does not correspond to the one it took in the video in the study.
Figure 4.4: The (black) trajectory of the IC in the third setting.
Visited goals in red, current goal in white.
4.1.2 Participants
The user study had 32 participants of varying age and with varying gaming and AI experience. The majority were between 20 and 40 years old, with high gaming experience and a moderately high understanding of what game AI is. 62.5% considered themselves to have a lot of gaming experience, and 46% considered themselves to have a lot of experience with how AI-controlled characters move in games.
4.1.3 Stimuli
The users watched nine 25-second video clips of the three different controllers in the three different settings, from a first-person perspective. The IC used data consisting of about 150 recorded actions, which corresponds to a couple of minutes of recording. The data was recorded by the author of the project. The clips were shown in a Latin square order1.
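For reference, a cyclic Latin square is one standard way to generate such a balanced presentation order; the sketch below is only illustrative and not necessarily the exact ordering scheme used in the study.

def latin_square(n):
    """Return an n x n cyclic Latin square.

    Row i gives one presentation order; each of the n items appears exactly
    once in every row and exactly once in every column.
    """
    return [[(row + col) % n for col in range(n)] for row in range(n)]

# Example: possible presentation orders for the nine video clips (indices 0-8).
orders = latin_square(9)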
4.1.4 Procedure
In the first part, the aim was to understand what the factors are that determine
whether the controller looks like it is being controlled by a human or not and thus
to answer Q1.2. In this part, the users were not told which controller they were
watching. They were told the following about the characters in the video clips:
1 https://en.wikipedia.org/wiki/Latin_square/
– It can either be controlled by AI or by a human.
– There is no requirement of getting to the goal as fast as possible or taking the
shortest path.
After each video clip, the users agreed or disagreed with six different statements. The statements were presented as five-point Likert scales2, shown in Table 4.1. The users were asked if anything seemed unclear.
Statement (response scale: 1 = Disagree completely, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Agree completely)
– Its movement is human-like
– It rotates in a human-like fashion
– It looks around in a human-like fashion
– Its pathing is human-like
– It avoids walls in a human-like fashion
– Overall, the behaviour of the agent is ... (scale anchored between Human-like and Artificial)

Table 4.1: The questionnaire to be filled in by the participants.
The statements will be labeled MOVE, ROTATE, LOOKS, PATH, WALLS respectively.
Movement means the forwards, backwards and sideways movement typically using
the keys WASD on a keyboard. Rotation is done using a mouse. Looking around
means that a character could perhaps look up to the sky while walking, or quickly
turn to look at a wall behind it. Pathing means the path a character takes from
one point to another. The NC for example, always takes the fastest path.
In the second part, the aim was to determine how well the IC imitated the human that had recorded it, and thus to contribute to the answer to Q1.1. The users were told which character was controlled by which controller. They were then shown the video clips of the HC and the IC and agreed or disagreed with similar statements to those before, shown in Table 4.2. In these statements, "it" refers to the IC and "the trainer" refers to the HC.
2 https://en.wikipedia.org/wiki/Likert_scale/
Statement (response scale: 1 = Disagree completely, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Agree completely)
– Its movement is like its trainer's
– It rotates like its trainer
– It looks around like its trainer
– Its pathing is like its trainer's
– It avoids walls like its trainer
– Overall, the behaviour of the agent is like its trainer's (scale anchored between Disagree and Agree)

Table 4.2: The questionnaire in the comparison.
4.1.5 Hypothesis
The NC makes no effort to look human-like. It takes the shortest path and makes no unnecessary movements or rotations. It was therefore expected to be rated as not looking human-like. The HC was an actual human using a mouse and keyboard, which causes jitter in the rotation. Gorman et al. [6] found that simple clues such as standing and waiting were used to guess human players, and the HC and the IC are more likely than the NC to show such clues. The transitions from one of the human's movement actions to the next are seamless and the actions come in a natural sequence, since a human is in control. This is in contrast to the IC, which picks an action according to its current situation, although the IC does imitate the human while executing an action. With this reasoning it was believed that the HC would be rated as human-like and that the IC would be rated somewhere in between the NC and the HC.
4.2 Results
This section presents the results of the user study and a brief performance measurement of the imitation agent. The user study results are averaged over all settings for each question, and the standard deviation describes how the number of votes differed between the settings. Figure 4.5 shows the results of the final question, whether the behaviour of the characters is human-like or artificial. The chart is split up into all people, people who considered themselves to be experts, and people who considered themselves to be non-experts. These three groups consisted of 32, 20 and 8 people respectively.
4.2.1 User study
[Bar chart "Overall, the behaviour of the agent is human-like": percentage of human-like ratings for the IC, NC and HC, grouped by All, Experts and Non-experts.]
Figure 4.5: Average human-likeness ratings from the three groups.
Experts are people who rated themselves as having a lot of gaming
experience, 4 and 5 on the scale 1-5. Non-experts are people who
rated themselves as having very little gaming experience, 1 and 2
on the scale 1-5. The vertical lines/bars are the standard deviation
which describe how the number of votes differed for the different
settings.
The results show that averaged over the three different settings, overall 48% of the
people who participated in the user study found the IC to be human-like, 73%
found the HC to be human-like and 16.5% thought that the NC was human-like
(see Figure 4.5).
Out of the experts, 68.3% thought that the HC was human-like with a standard
deviation of 5.7. Examples of what users commented are "Good mix of rotation/strafing", "Felt like it was me playing the game".
On the IC, users commented things like "Stared at the floor for some reason, but
humans do that", "Some things made it feel like human, some like artificial. Overall
perception is that it was more human than artificial", "Hard to say either. Looked
like an inexperienced player but could also be an AI...". Gorman et al. [6] presented
similar comments on their imitation agent, like "fires gun for no reason, so must be
human" and "stand and wait, AI wouldn’t do this". 40% of the experts found the
IC to be human-like with a rather high standard deviation of 20.0, which perhaps
reflects the insecurity in the comments.
Some comments on the NC were "Too efficient", "Constant rotation", "It feels too
precise and robot-like". 10% of the experts found it to be human-like, with a
standard deviation of 5.0.
[Bar chart "Its pathing is human-like": number of votes per response (1 = Disagree completely to 5 = Agree completely) for the IC, NC and HC.]
Figure 4.6: Pathing. The standard deviation describes how the
votes varied for the different settings.
In the detailed questions, the IC achieved its highest human-likeness score on its
pathing (PATH in Figure 4.7), i.e. the path it takes from one point in the environment to another, where 51% (4 and 5 in Figure 4.6) thought it to be human-like.
[Bar chart: mean Likert response (1-5) for the IC, NC and HC on the MOVE, ROTATE, LOOKS, PATH and WALLS statements.]
Figure 4.7: Mean points for the three different controllers on the
five-point Likert scale, summed over the three settings. The standard deviation shows how the answers differed from the mean value.
Figure 4.7 shows the results of the detailed questions. The variations in the scores were all similar to PATH, with slight differences. Figure 4.6 illustrates why the standard deviation in this figure is quite high.
[Bar chart "Comparison: it behaves like its trainer": number of votes per response (1 = Disagree completely to 5 = Agree completely) for the IC.]
Figure 4.8: The second part of the user study, where it was asked
how well the IC imitates the demonstrating HC.
On average 75% (4 and 5 in Figure 4.8) of all people agreed that the IC behaves
like the demonstrating human.
4.2.2 Imitation agent performance
The Unity engine has a built-in profiler which shows the exact amount of time each function takes each frame. That data is however only available during runtime and cannot easily be extracted. Upon manual inspection of the profiler over time, using 100, 200, 500 and 1000 recorded actions with an action length of 50, and using up to five agents at the same time, the computational time stayed below 0.1 ms per frame on average. In the cases where multiple agents classified their state on the same frame, the computational time could go up to 1.5 ms for that frame.
The experiments were made on a computer with 64-bit Windows 10, Intel Core
i5-6500 @ 3.20 GHz (4 CPUs), 8 GB of RAM and a GeForce GTX 1060 6GB.
4.3 Discussion
Research has shown Imitation Learning to be a successful technique for creating
agent behaviour [26][3][15] and also human-like agent behaviour [20][19][8][18]. This
project described a method for how Imitation Learning can be used to create agent
behaviour using limited amounts of data. This was demonstrated through the
created framework described in the method chapter. The framework allows for
recording human demonstrations and playing back agent behaviour which imitates
the demonstrations. A user study was conducted in order to evaluate the humanlikeness of the imitation agent and in a qualitative manner determine how well it
imitates the demonstrating human.
4.3.1 The imitation agent
The framework, similarly to [13], specifies the target behaviour before recording. It is heavily inspired by Karpov et al. [8], who use their Human Trace Controller to play back recorded sequences of human traces in order to get unstuck. They were able to create an agent that solves navigation problems while moving in a human-like fashion, and were able to fool 50% of the judges in their human-likeness test. Their approach does not generalize to new environments, however, as they store positions of specific environments; when replaying a sequence they look for sequences with stored positions close to the agent's current position.
This project ended up achieving something similar to [8]’s controller, but doing it in
custom arbitrary environments with no pre-existing data. About 40% of the users
in the user study believed that the imitation agent was controlled by a human.
The framework would be well suited for people who for example develop games
in Unity and would like to quickly have an NPC with more interesting behaviour
than the standard NavMeshAgent, without having to spend time and resources on
programming it. The agent's Playback loop can easily be paused if the imitating behaviour is not required at all times.
4.3.2 The user study
4.3.2.1 Reliability of the user study results
The user study was not a big study and could be improved. Some users said that
it is hard to say if something moves human-like or like they would move without
having tried to control an agent or "felt" the controls themselves. The HC was used
to and comfortable with the controls and the project. Perhaps the study should
have let several different humans record themselves and create their own imitation
agents, that they then could watch and evaluate. However, the purpose of the
created framework is to be used by an expert who knows how they want the agent to behave. In that sense the videos were recorded correctly. Munoz et al. [15] imitate
a human, a ML AI and a hand-coded AI, but their goal is different. They want to
create competitive behaviour by imitating three different good controllers. When
measuring what is human-like and what is not in a more general fashion though,
more variation in human behaviour would probably have been better.
With the current length of the video clips, the average time to complete the study
was around twenty minutes. It was determined that the study should not take
longer, in order to keep a user’s focus and to make more people willing to participate.
Longer videos would perhaps have given better results though, as the users would
have been able to observe the characters for longer periods of time.
4.3.2.2 What is important in looking human-like?
The question of what is human-like could perhaps be partly answered by determining what is not human-like. Comments from users in the study were the
most consistent regarding the NC not being human-like, frequently stating that it
was too efficient, rotated with a constant speed, was predictable and did nothing
unexpected.
The HC was rated as being human-like by 70% of the users. A guess, without evidence, is that this is a result of users being unsure about the IC and not wanting to be fooled, so they "play it safe" and say that it is controlled by AI.
It is also interesting to investigate what it is in the IC that makes it not look as
human as the HC does. Many of the comments describe feelings, as in it felt like AI
behaviour, rather than specifying exactly what they mean. The short length of the
video clips may have been a reason for this, as the impression of a character becomes
more of a summarized feeling of what had been observed, rather than exact details
about different scenarios. Some people noted on the IC that it would sometimes
get too close to a wall. This is a result of the agent executing an action whose last
state ended up being close to a wall. When the state gets classified again the agent
will pick an action which probably makes a sharp turn or in some way avoids the
wall. A human would perhaps have seen the wall coming "several actions" away
and planned a different path. Planning a longer path or trajectory is an extension
that would potentially make the agent appear more human-like. Other comments similarly pointed out single events as breaking the illusion of the character being human-like, such as a single too-fast rotation.
Actions without a clear purpose seem to correspond to human behaviour according
to the study, like looking down at the floor for no reason. The reason could be
that these kinds of behaviour are not typically implemented by an AI programmer.
There were also comments on strafing (sideways movement), noting that doing it in the right situations was human-like behaviour. Strafing around a corner to gain vision of what is behind the corner is one such example. When using
the grid with the IC to specify a destination, an action which involved strafing could
sometimes be weighted as the best action, because it got the agent closer to the
destination. However, a strafing motion may not have been a human-like thing to
do in that situation. This is something that users gave as a reason why the IC sometimes did not look human-like. How actions are weighted is something that can be tweaked, and it is a trade-off between getting the agent to a specified destination faster, thereby ignoring the "learning", and behaving similarly to the demonstration given the current state.
4.3.3 Creating non-human-like behaviour
As mentioned in the introduction, there is nothing stopping a user of this framework from creating agent behaviour that is not human-like. The framework is made to imitate the demonstrated behaviour.
4.3.4 Performance in relation to games
The performance of the IC presented in the results chapter shows that it can operate in about 0.1 ms per frame. This does however depend on several factors, such as what
action length is chosen, how many simultaneous agents are used and how often it
should check for moving obstacles if using the obstacle avoidance behaviour. There
are probably ways of optimizing it further, but the framework created in this project
is a prototype which shows that it is possible to use it in a modern game.
4.3.5 Ethical aspects
There is a debate about the increased use of computer and video games3, and whether the content of the games changes the behaviour and attitude of a player or not. For
example there are theories that violent games could influence aggression in players.
Making agents in games more human-like is not likely to affect this. It is up to the
creator of the agent to decide whether the agent should act violently or not. A more
human-like agent would perhaps appear more real though, making the violence more
real. Either way, the human-like agents will not be more human-like than actual
human players, so if this is a problem it already existed before human-like agents.
If more human-like agents in games make the games more interesting and fun to play, one could argue that the agents contribute to people's
potential addiction to games, which could be considered negative from a societal
or social sustainability point of view. It could also lead to decreasing the need for
multiplayer support in games, since the human-like agents to some extent could
replace other humans, thus decreasing an important social aspect of gaming. It
could however add an extra social aspect to games that otherwise would have had
none. A bigger interest in the games could also lead to better sales for the game
making companies, leading to more and better games.
3 https://en.wikipedia.org/wiki/Video_game_controversies/
Chapter 5
Conclusions
This chapter presents the conclusions of the project and future work.
The method chapter describes the implementation of the imitation agent using IL
and limited amounts of data. The only way for the agent to move is by executing
actions. During the execution of an action, the agent imitates the recorded behaviour. Therefore, if the recorded behaviour is human-like, the agent behaviour
will be human-like. Additionally, the comparison made in the user study shows that
the majority of the people who participated think that the agent behaves like the
human who recorded it did. IL proves to be a good technique for creating humanlike behaviour, like previous research has stated. This project shows one way of
doing it, using a Nearest Neighbour algorithm with a KD-tree as the policy that
maps a state to an action. Q1.1: How to create an agent that imitates demonstrated
behaviour, using IL with limited amounts of data? can thus be considered answered.
Whether a character appears to be human-like when observed through the character's first-person perspective (Q1.2: What determines if a character is human-like, when observed through the character's first-person perspective?) seems to be determined by several factors. According to the user study, human-like traits are actions without
a clear purpose, such as looking down at the floor for no particular reason. Also
moving with correct usage of timing and motion, like strafing around corners appears
to be important. Another thing is to have varied but consistent movement. If
movement is too static, with for example a constant rotation speed or a too straight
and precise path, it does not look human-like. On the other hand, if the rotation
speed and timing is consistent but occasionally twitches or breaks the pattern, it
also does not look human-like.
Combining the answers to Q1.1 and Q1.2 answers Q1: How can IL be used to
create human-like agent behaviour, using limited amounts of data?.
5.1 Future work
The question of what makes a character human-like would be better answered if the study had contained recordings from more different humans. Given more time and resources
it would have been interesting to do a bigger user study. As mentioned in the
discussion, letting several different humans record themselves and create their own
imitation agents that they then could watch and evaluate would be one interesting
experiment. A question could then be whether the humans feel that they recognize their own play when watching the agent. It would also be interesting to dig deeper into why a
person thinks that a character looks human-like or not, and to what extent a person
thinks a character looks human-like or not because it behaves or does not behave
like the person itself would. Some of the users’ comments in the user study motivate
an agent looking human-like because it moves like they would have moved.
The study mostly investigates what low-level actions make a character look human-like, such as rotation and movement. One extension would be to have a more realistic
and complex game scenario with a dynamic environment and more sophisticated
behaviour like in [8], such as shooting, jumping and avoiding enemies. Then a more
general behaviour and the agent’s decision making could be evaluated.
Since it seems that the imitation agent shows some human-like traits, it would be
interesting to investigate whether human players think that it is more entertaining
to play with an agent that imitates human behaviour than with an agent that does
not. Especially in VR games this could be of interest, as the player is the virtual
character as opposed to controlling the character with a mouse and keyboard.
The framework could be extended to plan the path further ahead, to prevent the agent from ending up too close to a wall, and to plan according to what is visible to the agent, in order to make it appear more human-like. The efficiency of the action selection could be improved by improving the data. Common actions could be identified and reduced to fewer actions to shrink the data set. A possible extension is also to modify demonstrations and improve their efficiency, in terms of moving the agent a longer distance in less time. Argall et al. [1] mention eliminating parts of the teacher's
executions that are suboptimal as an approach to dealing with low quality of the
demonstration dataset. Chen and Zelinsky [5] present a method for identifying and
eliminating noise in a demonstration which could be a useful technique. This would
however possibly remove some of the human-likeness and make the agent become
more similar to traditional AI.
It would be interesting to see if implementing actual learning, with for example a
Neural Network, would yield the same results in a similar user study. If the results
are similar or better with a learning approach, it would be interesting to compare the
complexity of the framework and amount of control in shaping behaviour of the two
approaches (learning/not learning). Perhaps the framework created in this project
could act as a good tool for prototyping agent behaviours, or to get a working agent
with human-like behaviour up and running with little effort.
5.1.1 Use outside of games
Imitation Learning has been most widely used in the area of robotics [1]. It has been discussed that since robots may one day exist and work together with humans, the interaction would be facilitated if the movements of the robot are human-like and look natural [2], where Imitation Learning is one approach to achieve
that. There is also an interest in the area of human crowd simulation, to make the
individual humans in a crowd behave in a human-like or natural fashion [10].
Bibliography
[1] Brenna D Argall, Sonia Chernova, Manuela Veloso, and Brett Browning. A survey of robot learning from demonstration. Robotics and autonomous systems,
57(5):469–483, 2009.
[2] Tamim Asfour, Pedram Azad, Florian Gyarfas, and Rüdiger Dillmann. Imitation learning of dual-arm manipulation tasks in humanoid robots. International
Journal of Humanoid Robotics, 5(02):183–202, 2008.
[3] Luigi Cardamone, Daniele Loiacono, and Pier Luca Lanzi. Learning drivers
for TORCS through imitation using supervised methods. In Computational
Intelligence and Games, 2009. CIG 2009. IEEE Symposium on, pages 148–
155. IEEE, 2009.
[4] Yu-Han Chang, Rajiv T Maheswaran, Tomer Levinboim, and Vasudev Rajan.
Learning and evaluating human-like NPC behaviors in dynamic games. In
AIIDE, 2011.
[5] Jason Chen and Alex Zelinsky. Programing by demonstration: Coping with
suboptimal teaching actions. The International Journal of Robotics Research,
22(5):299–319, 2003.
[6] Bernard Gorman, Christian Thurau, Christian Bauckhage, and Mark
Humphrys. Believability testing and bayesian imitation in interactive computer games. In International Conference on Simulation of Adaptive Behavior,
pages 655–666. Springer, 2006.
[7] Robin Hunicke. The case for dynamic difficulty adjustment in games. In
Proceedings of the 2005 ACM SIGCHI International Conference on Advances
in computer entertainment technology, pages 429–433. ACM, 2005.
[8] Igor V. Karpov, Jacob Schrum, and Risto Miikkulainen. Believable Bot Navigation via Playback of Human Traces, pages 151–170. Springer Berlin Heidelberg,
2012. URL http://nn.cs.utexas.edu/?karpov:believablebots12.
[9] Geoffrey Lee, Min Luo, Fabio Zambetta, and Xiaodong Li. Learning a super
mario controller from examples of human play. In Evolutionary Computation
(CEC), 2014 IEEE Congress on, pages 1–8. IEEE, 2014.
[10] Alon Lerner, Yiorgos Chrysanthou, Ariel Shamir, and Daniel Cohen-Or. Data
driven evaluation of crowds. In International Workshop on Motion in Games,
pages 75–83. Springer, 2009.
[11] Mei Yii Lim, João Dias, Ruth Aylett, and Ana Paiva. Creating adaptive affective autonomous NPCs. Autonomous Agents and Multi-Agent Systems, 24(2):
287–311, 2012.
[12] Daniel Livingstone. Turing’s test and believable AI in games. Computers in
Entertainment (CIE), 4(1):6, 2006.
[13] Manish Mehta, Santiago Ontanón, Tom Amundsen, and Ashwin Ram. Authoring behaviors for games using learning from demonstration. In Proceedings
of the Workshop on Case-Based Reasoning for Computer Games, 8th International Conference on Case-Based Reasoning (ICCBR 2009), L. Lamontagne
and PG Calero, Eds. AAAI Press, Menlo Park, California, USA, pages 107–
116, 2009.
[14] Andres Munoz. Machine learning and optimization. URL: https://www.cims.nyu.edu/~munoz/files/ml_optimization.pdf [accessed 2016-03-02] [WebCite Cache ID 6fiLfZvnG], 2014.
[15] Jorge Munoz, German Gutierrez, and Araceli Sanchis. Controller for TORCS
created by imitation. In Computational Intelligence and Games, 2009. CIG
2009. IEEE Symposium on, pages 271–278. IEEE, 2009.
[16] Chrystopher L Nehaniv, Kerstin Dautenhahn, et al. The correspondence problem. Imitation in animals and artifacts, 41, 2002.
[17] Jeff Orkin. Three states and a plan: the AI of FEAR. In Game Developers
Conference, volume 2006, page 4, 2006.
[18] Juan Ortega, Noor Shaker, Julian Togelius, and Georgios N Yannakakis. Imitating human playing styles in super mario bros. Entertainment Computing, 4
(2):93–104, 2013.
[19] Steffen Priesterjahn. Imitation-based evolution of artificial players in modern
computer games. In Proceedings of the 10th annual conference on Genetic and
evolutionary computation, pages 1429–1430. ACM, 2008.
[20] Steffen Priesterjahn, Oliver Kramer, Alexander Weimer, and Andreas Goebels.
Evolution of reactive rules in multi player computer games based on imitation.
In International Conference on Natural Computation, pages 744–755. Springer,
2005.
[21] Joe Saunders, Chrystopher L Nehaniv, and Kerstin Dautenhahn. Teaching
robots by moulding behavior and scaffolding the environment. In Proceedings of
the 1st ACM SIGCHI/SIGART conference on Human-robot interaction, pages
118–125. ACM, 2006.
[22] Ayse Pinar Saygin and Ilyas Cicekli. Pragmatics in human-computer conversations. Journal of Pragmatics, 34(3):227–258, 2002.
[23] Jacob Schrum, Igor V Karpov, and Risto Miikkulainen. Human-like combat
behaviour via multiobjective neuroevolution. In Believable bots, pages 119–150.
Springer, 2013.
[24] Noor Shaker, Julian Togelius, Georgios N Yannakakis, Likith Poovanna,
Vinay S Ethiraj, Stefan J Johansson, Robert G Reynolds, Leonard K Heether,
Tom Schumann, and Marcus Gallagher. The turing test track of the 2012
mario AI championship: entries and evaluation. In Computational Intelligence
in Games (CIG), 2013 IEEE Conference on, pages 1–8. IEEE, 2013.
[25] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction, volume 1. MIT press Cambridge, 1998.
[26] Christian Thurau, Christian Bauckhage, and Gerhard Sagerer. Imitation learning at all levels of game-AI. In Proceedings of the international conference on
computer games, artificial intelligence, design and education, volume 5, 2004.
[27] Iskander Umarov and Maxim Mozgovoy. Believable and effective AI agents in
virtual worlds: Current state and future perspectives. International Journal of
Gaming and Computer-Mediated Simulations (IJGCMS), 4(2):37–59, 2012.
[28] Andreas Vlachos. An investigation of imitation learning algorithms for structured prediction. In EWRL, pages 143–154, 2012.
[29] Roger Weber, Hans-Jörg Schek, and Stephen Blott. A quantitative analysis and
performance study for similarity-search methods in high-dimensional spaces. In
VLDB, volume 98, pages 194–205, 1998.