Charles University in Prague
Faculty of Mathematics and Physics
MASTER THESIS
Csaba Tóth
Planning systems in game level design
and agent control
Department of Software and Computer Science Education
Supervisor of the master thesis: Mgr. Rudolf Kadlec
Study programme: Informatics
Specialization: Software Systems
Prague 2012
I would like to thank my supervisor Rudolf Kadlec for his always available
guidance during the preparation of this thesis. I would also like to thank my
family and close friends, whose advice greatly assisted my work. Finally, I would
like to thank my testers, who were kind enough to try my game.
I declare that I carried out this master thesis independently, and only with the
cited sources, literature and other professional sources.
I understand that my work relates to the rights and obligations under the Act
No. 121/2000 Coll., the Copyright Act, as amended, in particular the fact that
the Charles University in Prague has the right to conclude a license agreement
on the use of this work as a school work pursuant to Section 60 paragraph 1 of
the Copyright Act.
............
signature of the author
Title: Planning systems in game level design and agent control
Author: Csaba Tóth
Department: Department of Software and Computer Science Education
Supervisor: Mgr. Rudolf Kadlec, Department of Software and Computer Science Education
Abstract: Planning systems are mature tools of computer science, but their use in
computer games has so far been limited. We decided to create a game built on these
tools. If we manage to prove that freely available, general purpose planners are
suitable replacements for the custom solutions of modern games, we would show a
new direction to small development companies and popularize a new, planner-driven
type of game. We attempted to use planners in two roles: in development, to
partially automate level design, and during the actual game, to create agent
behavior. Our program is a puzzle in which the player's task is to uncover the
plans and intentions of all agents, recognize the dangers, and then, using the
acquired knowledge, modify the game world so as to force the agents to reach the
desired outcome. Another part of the program is a generator of game levels. We
came to the conclusion that with certain restrictions we can use a planner to
create believable agent behavior. Even though the planner cannot make certain
decisions for us, it eases the work of the game level designer.
Keywords: planning, game, level design, artificial being, believable behavior
Title: Planning systems in game level design and agent control
Author: Csaba Tóth
Department: Department of Software and Computer Science Education
Supervisor: Mgr. Rudolf Kadlec, Department of Software and Computer Science
Education
Abstract: Planners are well developed tools in computer science, but their role
in games is rather limited. We decided to test the possibility of building a
game around them. Proving that freely available, general purpose planning
systems are worthy alternatives to custom made solutions could open a new path
for small developers and popularize a new kind of gameplay powered by these tools.
We attempted to exploit planners in two roles: in development, to partially
automate the level design process; and in gameplay, as a decision making
tool for the acting agents. Our program is a puzzle game, more precisely an
anticipation game, where the player has to foresee the future actions of the acting
agents and discover the pitfalls toward which they are heading. Using this knowledge
he can modify their environment and force them to a desired outcome. A part
of our program is a generator of such game challenges. We came to the conclusion
that, within some limitations, planners are capable of creating believable agent
behavior, and while not all decisions can be handed over to the planning systems,
they can lighten the task of the level designer.
Keywords: planning, game, level design, artificial being, believable behavior
Contents
1 Introduction
  1.1 Related work
    1.1.1 Randomly generated levels
    1.1.2 Off-line planning
    1.1.3 On-line planning
  1.2 Overview of the following chapters
2 Game world
  2.1 Backstory
  2.2 The protagonist
  2.3 Dangers
    2.3.1 Security cameras
    2.3.2 Guards
  2.4 Surroundings
    2.4.1 Vending machine
    2.4.2 Container
    2.4.3 Door
  2.5 Items
  2.6 The player
3 Gameplay
  3.1 Running the game
  3.2 Game mechanics
    3.2.1 Game turns and environment
    3.2.2 Victory condition
    3.2.3 Losing a level
    3.2.4 Intent visualization
  3.3 Player penalties
  3.4 Tutorial
  3.5 Gameplay example
4 Problem definition
5 Agent control
  5.1 Goals
  5.2 Knowledge Base
  5.3 Self awareness
  5.4 Sensors
  5.5 Effectors
  5.6 Environment
  5.7 Decision making
  5.8 Translation to planning problems
    5.8.1 Planning for multiple agents
    5.8.2 When to change plans
    5.8.3 Plan execution and replanning
    5.8.4 Time requirements
  5.9 Characteristics of our method
    5.9.1 No planning in time
    5.9.2 Suboptimal planning
    5.9.3 Planning granularity
6 Creating game levels
  6.1 Generator algorithms
    6.1.1 Leading the agent into traps
    6.1.2 Placing traps on the agent's path
    6.1.3 Trap rooms to game objects
  6.2 Replanning behavior and level creation
  6.3 Conclusion
  6.4 Running the level generator
7 Implementation
  7.1 Program structure
  7.2 Availability
8 Planners
  8.1 Requirements
  8.2 Domain and problem sizes
  8.3 Tested planners
    8.3.1 Blackbox
    8.3.2 FF
    8.3.3 HSP
    8.3.4 LPG
    8.3.5 LPRPG
    8.3.6 Marvin
    8.3.7 MaxPlan
    8.3.8 MetricFF
    8.3.9 MIPS-XXL
    8.3.10 SGPlan
  8.4 Comparison
    8.4.1 Test conditions
    8.4.2 Results
  8.5 Conclusion
9 Player responses
  9.1 Tester A
  9.2 Tester B
  9.3 Tester C
  9.4 Tester D
  9.5 Tester E
  9.6 Tester F
  9.7 Player performance
10 Conclusion
  10.1 Future works
    10.1.1 Alternative problem definition
    10.1.2 Feature requests
Bibliography
List of Tables
List of Abbreviations
A Attachments
B User documentation – game
  B.1 Running the game
  B.2 Adding new levels to the game
  B.3 Status line
  B.4 Buttons and shortcuts
  B.5 Context menu
C User documentation – level generator
  C.1 Running the generator
  C.2 Usage example
  C.3 Level files
D PDDL domain
E Levels in planner tests
1. Introduction
Planning is the process of constructing action sequences that need to be executed
in order to reach a selected goal from a given premise. Software systems capable
of accomplishing this task are called planners or planning systems. The field
of artificial intelligence that works on designing and implementing such tools is
called automated planning. As Vrakas and Vlahavas [1] write, this well developed
area is nearly as old as informatics itself. Its roots are considered to lie in the
1960s, with the General Problem Solver program of Newell, Simon, et al. [2],
but there are still unexplored areas and new places of application.1
Planning is an important task in nearly every computer game; however, it
has never been taken as a fundamental concept. The general approach is to create
a custom built planner for the specific tasks that occur in the game under
development. Such a narrow focus is both a strength and a weakness. The small
domain implies that these planners are fast, and in addition, their relatively low
complexity makes them easy to write. On the other hand, they are hard to extend
and require maintenance.
Our theory was that, just as with engines, a game programmer does not need
a custom tool; off-the-shelf alternatives are capable of providing comparable
results in virtually zero development time. As opposed to their customized
counterparts, general purpose tools are inherently slower, but also more flexible, and
most importantly, they are written and maintained by third party professionals.
We decided to write a whole game around the use of planners, and to
exploit them in two distinct ways: to generate the game levels, and in the
gameplay to act as the decision making tool of the acting agents.
A typical chore when making a game is level design. In modern games it
is one of the most time consuming parts of development; however, several
parts of the process do not require actual creativity, only computing power.
In the developer community there have already been some attempts to automate level
creation, for example by generating levels randomly (we will return to these projects
in section 1.1, and again in section 10.1).
We were looking for a way to pass the task of level creation on to external
planners. While we did not want to completely replace the human hand, we tried
to provide a tool that generates varied player challenges on a single layout provided
by the developer.
Another place where planning may come in handy is agent control. It is a well
studied field with many different approaches; for a good overview, we recommend [3]
as a starting point. In computer games the usual approach is to go
with finite-state machines or decision trees. Both are relatively simple and
straightforward to use. In comparison with planners, their weakness is their
rigidity; every possibility must be accounted for by the programmer, while a planning
system may create seemingly rational sequences of actions without the developer
ever considering them.
We chose the program's genre to benefit the most from the problem solving
capabilities of our selected tools. Our program is basically a logic
1 For a less brief introduction to the principles of planning, read Ghallab, Nau, and Traverso's
Automated Planning: Theory and Practice [4].
game, more precisely an anticipation game. In our previous report Rudolf Kadlec
summarized it as follows:
“Imagine you play a game where the main agent has a mission that he
must accomplish. He creates a plan for this mission but due to incomplete
knowledge of the environment there will be some pitfalls in his plan.
The human player has more complete knowledge of the environment,
thus when observing execution of the agent’s plan, he can anticipate
these pitfalls. Once he identifies a pitfall, he can modify the environment
so that the agent has to replan and the new plan avoids this pitfall.
The player influences the agent only indirectly through changes of the
environment.”[5]
In our implementation the protagonist is a burglar in a foreign building, and
the pitfalls are represented by the security system. The player slips into the role
of a god-like being, disabling cameras and locking or opening doors to avoid the
otherwise inevitable capture of the burglar.
1.1 Related work
To our knowledge, so far no game has set planning as its core principle;
nevertheless there are plenty of projects and experiments that are from some point of
view similar to ours. They either try to make the level designer's work
easier or they bring planning to the game industry.
1.1.1 Randomly generated levels
Random generation is a widely used and explored2 concept, from the simplest
card games to sophisticated dungeon crawlers, for example the Diablo series3 or
Torchlight 2 4. In such games the program generates each level of the dungeon by
assembling modular “chunks” of the game environment. Each part is designed
by hand; the chunks can contain scripted events and interactive objects. This approach
is intended to create dungeons with always differing, but purposeful design.
Our method differs in its basic principles: while these games chain the challenges
together into a new world layout, we take the layout as given and fill it with
challenges.
1.1.2 Off-line planning
There are a few projects that explored the off-line use of planners before us, but
in a different manner. While we were concentrating on graph problems in our
level generator, the following papers approached the problem from the perspective
of story creation.
2 Mostly documented only on developer community sites, like http://roguebasin.roguelikedevelopment.org, (10.03.2012)
3 Home page: http://us.battle.net/d3/en, (10.03.2012)
4 Home page: http://www.torchlight2game.com, (10.03.2012)
Li and Riedl [6] used planning methods to generate backstories for role playing
game characters. Extended work was done in the field of interactive
storytelling [7, 8] by Porteous, Cavazza, et al.; they concentrate on translating the
task of good story composition into the task of good plan creation. Probably the most
similar to our work is the automatic storyboard generation of Pizzi, Cavazza,
et al. [9], which produces Hitman 2 5 missions in a comic strip format. Here the
designer uses planning to explore the possible solutions; if he finds some
unsatisfactory ones, he changes the initial level setup to fix the problem.
1.1.3 On-line planning
Just as in development, we tried to use off-the-shelf, PDDL [10] (Planning
Domain Definition Language) compatible planners in our gameplay; we have found
no commercial use of such tools. The planning systems preferred by the game
industry are Goal-Oriented Action Planning (GOAP) and Hierarchical Task
Network (HTN) planning.
GOAP [11, 12] is a simplified STRIPS-like [18] planning architecture specifically
designed for real-time control of autonomous characters. Its success is illustrated
by the list of commercial games using this planner6. HTN organizes tasks into
hierarchical layers; its use in game environments was demonstrated by Hoang,
Muñoz-Avila, et al. [13, 14].
We found very few uses of real STRIPS-like, PDDL compatible planners directly
in gameplay. The works of Bartheye and Jacopin [15, 16] are centered around
real time generation of plans in situations where fluent gameplay is essential.
In accordance with the International Planning Competition challenges, benchmark
tasks motivated by the needs of First Person Shooter (FPS) games have recently
been proposed by Vassos and Papakonstantinou [17]. As opposed to the above
mentioned works, our proposed game genre has a more relaxed pace, and true real
time planning was not a priority for us. We built our program with logical challenges
in mind and gave up some of the premises that current FPS games maintain, like
the agent's full knowledge of the level layout.
1.2 Overview of the following chapters
The rest of the thesis continues as follows. In chapter 2: Game world we describe
the characters and the environment where our game takes place. The next chapter,
titled Gameplay (chapter 3), shows the program from the player's perspective.
In chapter 4: Problem definition we detail the formal definition of the program's
concept. Chapter 5: Agent control describes how we integrated the planners into
the gameplay. The following chapter 6: Creating game levels contains our approach
to level design. In chapter 7: Implementation we briefly describe the
third party libraries we used throughout the development and the basic structure of
the program. The planners we used and the tests we conducted on them can be found
in chapter 8: Planners. We finish the thesis with chapter 9: Player responses
about users' experience, and chapter 10: Conclusion.
5 No official site found at the time of writing (10.03.2012)
6 For more details, see http://web.media.mit.edu/~jorkin/goap.html (23.11.2011)
2. Game world
In this chapter we describe the imaginary world that we created to serve as the
background of our game.
2.1 Backstory
Once upon a time there was a slightly challenged burglar with a grand idea. He
wanted to collect the most beautiful treasures of the world. The only flaw in his
plan was his lack of money, so he set out on a great tour to sneak into museums
all around the world and steal the artifacts he craved. Each time he hatches a
plan to break into a museum and take the valuable item. The most interesting part
of his story is that he was never caught; some say that the burglar is watched over
by some kind of an invisible guardian.
2.2 The protagonist
He is an ordinary man, shown in figure 2.1, with no special abilities except moving
silently in dark rooms. Stealing a sleeping guard's hat (figure 2.1b) allows the
burglar to slip through the watch of security cameras unnoticed. He has a basic
idea of the building's layout and the whereabouts of the museum's most precious
artifact. Sometimes he may even know some of the security measures, which he
avoids at all cost.
Figure 2.1: a) burglar, b) burglar in a stolen uniform
2.3 Dangers
In our world a building's protection is accomplished in a passive manner by locked
doors, and in an active manner by patrolling guards and security cameras.
2.3.1 Security cameras
Cameras, as shown in figure 2.2, are automated security systems; they have no
need to sleep, but unfortunately they can be disabled by unknown forces. When
a camera is active it generates its own light beam that fills the single room it
protects. When a camera detects movement it tries to identify it as a
guard, based on the characteristic hat; if the identification process fails, the camera
sounds the alarm to trap the intruder, preventing any further foreign activity.
Figure 2.2: a) turned off camera, b) active camera
2.3.2 Guards
A typical member of the night shift has a well recognizable blue hat and uniform,
as displayed in figure 2.3. The guards patrol dedicated territories, using their
flashlights to look into every corner. If they see any unrecognized face, even one
wearing a familiar guard hat, they sound the alarm and catch the intruder. If
a guard bumps into a locked door on his path, he either finds an alternative
route or a key to open it. Their only weakness is fatigue; sometimes they can be
seen sleeping on the job (figure 2.3b). In that case they do not wake up until the
next morning.
Figure 2.3: a) patrolling guard, b) dazed guard, c) guard after his uniform has
been taken
2.4 Surroundings
The environment of the game is set in a museum. It consists of a maze of rooms
containing several static objects, like cameras, vending machines and containers.
2.4.1 Vending machine
Vending machines, shown in figure 2.4, hold sweet beverages that every museum
visitor craves. If a drink happens to drop out of the machine (figure 2.4b), no
person is able to resist the urge to pick it up and drink it before continuing on his
journey.
Figure 2.4: a) powered down vending machine, b) working vending machine
2.4.2 Container
The exhibits of the museum are stored in various containers, as shown in figure 2.5.
In one of these chests is hidden the artifact the burglar came for. Other
containers may hold keys that open locked containers (figure 2.5a) or
doors (figure 2.6a).
Figure 2.5: a) locked container, b) closed container, c) opened container
2.4.3 Door
Doors, like the one in figure 2.6, connect the rooms of the building. They
may be locked (figure 2.6a), in which case the guards and the burglar require a
special key to get through them.
Figure 2.6: a) locked door, b) closed door, c) opened door
2.5 Items
There are three types of items, hidden in the inventories of our burglar and the
guards, or in the containers: keys, treasures and uniforms.
The keys can be used to open locked doors (figure 2.6a) or containers (figure 2.5a).
One of the treasures hidden in the containers is the one the burglar is looking for.
The uniforms can usually be found on guards, but the burglar can also make
good use of them.
2.6 The player
The player enters the role of a mysterious ghost entity. This creature sees
and knows everything in the building; it is also able to change the world to its
liking by manipulating physical objects. The ghost's only weakness is that
it cannot directly influence the minds of the world's inhabitants.
3. Gameplay
The program is a puzzle game where the player's task is to foresee and respond
to the actions of planner controlled agents, and to lead them, without
their knowledge or any direct control, towards a selected aim.
3.1 Running the game
Our program requires a Java execution environment; it can be started with a shell
script. For the list of accepted parameters see section B.1 in the user
documentation (Appendix B).
3.2 Game mechanics
The basic situation in all game levels is that the burglar – a computer controlled
agent – tries to steal a valuable artifact from a museum secured by an unspecified
number of guards and cameras, which, if turned on, detect the burglar. The
complication is that the burglar knows only some of these dangers. The levels
are designed in a way that without the help of the player the burglar will surely
be caught. The burglar is captured only if he gets into the same room with a
patrolling guard or with an active camera. In the rest of the thesis such places
will be called trap rooms. The only agents in the game are the guards and the
burglar.
The human player observes the game world from a bird's eye view. His role is
to change small details of the environment and prevent the burglar from getting
caught on his mission. The player wins when the computer controlled burglar
manages to run away with the artifact; he loses if the protagonist gets caught.
3.2.1 Game turns and environment
We define a game turn as the shortest perceivable time quantum in our world. In
a single game turn each acting agent executes a single atomic action (for a list of
these see section 5.5).
The game area is the place where agents act out their plans, interacting with
the objects of the environment. The layout is split into rooms that are
connected to each other through doors. The rooms can be further divided into
rectangular tiles, which we will call game positions. Each game object and agent
is situated on one of these positions. A single tile may contain an operable object
or a walkable one (for example a floor). On walkable objects agents may stand,
and characters may move between immediately neighboring positions. Each such
step requires a single game turn.
The objects we may find in the rooms are cameras that catch intruders;
containers that can hold keys; an artifact that is the target of the burglar; vending
machines that distract the agents for a while; and finally doors between rooms.
If the burglar or a guard has a proper key, he can lock a door or a container to
prevent its usage by other agents. The player has no such limitation on any
object; however, each interaction costs him a price in penalty points, which he
should keep minimal (see the Context menu and Player penalties sections below).
3.2.2 Victory condition
Winning a level we define as enabling the burglar to complete his goal. In other
words, a level is won when the burglar successfully returns to the door of his
starting room with the desired artifact in his possession.
3.2.3 Losing a level
The burglar is caught if he gets into the same room with an active camera or a
patrolling guard.
Losing a level can happen either if the burglar is actually caught, or if he runs
out of ideas and gives up his attempts to complete the theft. Note that there is no
necessary connection between the burglar knowing of no available path towards
his desired treasure and the actual nonexistence of such an action sequence. Part
of the player's task is to manipulate the character into discovering alternative
solutions.
3.2.4 Intent visualization
While looking at the game area the player always sees the true state of the world.
We use color filters and plan visualization drawn over the world layout to let the
player see an agent’s beliefs.
Figure 3.1: a) a known vending machine, b) an active vending machine with
mistaken details, c) an unknown active vending machine
Color filters: We have three different shades to represent the knowledge of
the selected agent; to represent the agent's beliefs about a given object we apply
these filters to it. Objects that match in all their relevant details the beliefs
of the agent are drawn without discoloring (figure 3.1a). Objects known by the
agent, but differing in some details from the agent's remembered image, are drawn
in a slight gray tone (figure 3.1b). Finally, objects whose existence is not known
to the agent are drawn with a dark gray shade (figure 3.1c). Objects that only
exist in the mind of an agent, but not in the game world, are not expected to
occur in the game; if they did, they would be invisible to the player.
Figure 3.2: Burglar’s intent to approach a vending machine
Plan visualization: A series of black arrows (figure 3.2) represents the planned
movement actions of the agent; other intents (e.g. opening or closing something)
are marked with the respective word near the object. The length of the visualized
plan can be increased or decreased by the player.
3.3 Player penalties
Each of the user actions described in sections B.4 and B.5 that affects a game
object (e.g. activating/deactivating an object), and not the gameplay itself (e.g.
changing game speed), has a penalty value. These penalties are displayed next
to the action's name in parentheses. After winning or losing a level they are
summed up to help differentiate between alternative solutions. We consider one
victory to be better than another if it contains fewer, or more subtle, actions.
This is reflected in smaller penalty values.
3.4 Tutorial
We made five introductory levels that show the main aspects and mechanisms
of the game. These levels can be found in the program's default map directory,
named "tutorial 01", . . . , "tutorial 05".
1. Opening doors – Teaches the players how to manipulate the game
environment. Solving the level requires unlocking at least two doors. It
also demonstrates the importance of agent beliefs; if the agent does not see
the newly opened door, it will not use the new path.
2. Avoiding locations – On this level the players are introduced to disabling
cameras where they are unavoidable, and to locking doors to change the
burglar's path where the cameras can be avoided. It also shows
the importance of the different object shadings, and hints to the player how to
increase the visible plan distance.
3. Avoiding moving guards – On this level the players learn how to switch
between the plans of multiple agents. They must plan ahead and anticipate
the agents' future movements in order to let the burglar slip through
unnoticed. We also introduce the vending machines.
4. A guard needs some sleep – This level brings together both cameras and
patrolling guards. The players see how a guard can be dazed, and that the
burglar is able to slip through camera protected rooms in a guard uniform.
5. Hidden keys – The last tutorial level illustrates the use of keys, and how
agents can open doors for themselves if the player manages to guide them
to those keys. The layout is the second largest we have created so far.
3.5 Gameplay example
In this section we provide a step by step demonstration of the gameplay on a very
simple level: "map - paper.xml". It is one of our smallest layouts, but it contains
many of our gameplay objects: a stationary guard, an active camera, a locked
door with the appropriate key hidden in a container, a burglar, and of course a
treasure to find. The trap positions are placed knowing the order of the paths
preferred by the agent, so the player has to make at least one expensive
or two more subtle changes in the environment to successfully solve the level.
The burglar starts out at the entrance of the building, which is also the place he
needs to return to at the end of the level. It is a good idea to look up every trap
position; to help the player, such rooms are well lit, while the rest of them are
dark. On figure 3.3a we switch on intent visualization for the burglar through his
context menu.
Figure 3.3b shows the intent of the burglar as a series of black arrows and his
beliefs as color filters. From this screenshot we may see that the protagonist knows
about neither the guard on the left side path nor the camera on the middle one. A
lighter, but still visible shade on the right side path's locked door suggests that he
knows about the door's existence, but is mistaken about some details. The coloring
does not give away the exact nature of the burglar's mistake, but knowing a door
object's possible properties we may conclude that he thinks the door is either
already opened or simply just unlocked.
Looking at the intent line we may see that the agent plans to reach the container
using the middle path, leading him through a camera protected room.
(a) The burglar's context menu at the beginning of the level
(b) The burglar's beliefs and plan visualised
Figure 3.3: example solution (continued on the next page)
Figure 3.4a shows the level after we have locked the first door on the middle path
to prevent the burglar from entering an observed room. Note that the color of the
door darkened, while the agent's predicted path did not change yet. He has to
reach the door first to discover that his original plan is not executable any more.
Figure 3.4b shows the updated situation after the burglar reached the door
locked on figure 3.3b. Now he already knows the middle path is inaccessible, so
the agent has generated an alternative solution through the left side room that
unfortunately leads him to meet a guard.
Figure 3.4c was taken several game turns after the previous one. In between
we locked the door leading to the room on the left and the agent had to reconsider
his actions. He chose the only remaining path, leading through the rooms on the
right.
(a) Door locked to force the burglar to an alternative path
(b) New path selected through the guard protected room
(c) The burglar going through the only remaining path to the treasure
Figure 3.4: example solution (continued from the previous page)
Figure 3.5a shows the burglar as he returns for a key, after discovering that the
door on the right was locked all along with the key he happens to know is in the
upper right container.
Figure 3.5b shows the burglar returning to the starting position after he has
successfully picked up the treasure he was looking for.
Figure 3.5c shows the successfully solved level with the summed up penalty
points.
(a) The burglar returning for a key
(b) The burglar on its way back to the level entrance
(c) The level is successfully solved
Figure 3.5: example solution (continued from the previous page)
4. Problem definition
The following chapter is strongly based on a technical report titled “BurglarGame:
Planning Used In A Level Design And During A Gameplay” [5], written
in cooperation with Rudolf Kadlec.
In order to formally define the planning problem solved in our game, we will
use the following notation:
$M = \langle Rooms, Doors \rangle$ is a planar graph representing the map of the level,
where $Rooms$ is the set of all rooms and $Doors$ is the set of all door objects in
the layout. The state of each $door \in Doors$ is given by a function
$Ds : Doors \to \{locked, unlocked, opened\}$.
$Objects$ is a set of operable objects; each object $o \in Objects$ can also have
an internal state, given by a function $Os : Objects \to$ propositions about objects.
In the implementation of our game, for example, we have containers that can be
closed or opened; sleeping guards that may or may not have uniforms; or cameras
and vending machines that may be active or disabled.
The set of active agents we define as $Agents = Guards \cup \{burglar\}$, where
$Guards$ is the set of patrolling guards and $burglar$ marks the single burglar agent.
Each agent in $Agents$ is specified by its starting room, planning goal and
knowledge of objects' states, which we call the agent's belief base.
$E = Objects \cup Agents$ stands for the set of all entities in the level; function
$EtoR : E \to Rooms$ assigns each entity to one room; $A$ stands for the set of
possible instantiated player's actions on a member of $E$, like deactivating a camera
object or locking a door. The full list of such action types is described in section B.5.
Now we can define a world state on map $M$ as $S^M = \langle E, Os, EtoR, Ds \rangle \in \mathcal{S}^M$,
where $\mathcal{S}^M$ denotes the set of all possible world states on the given layout $M$.
Based on the above definition we will set $S^M_{burglar} \in \mathcal{S}^M$ as the state of the
world believed to be true by the burglar, and $S^M_{real} \in \mathcal{S}^M$ as the real state of the
world perceivable by the human player.
A game level $L$ we define as a triplet $\langle M, S^M_{real}, S^M_{burglar} \rangle$. The predicate
$flawed(P, S^M)$ is true if execution of a plan $P$ in a state of the world $S^M$ leads to
a situation where the burglar is caught. In the game such a situation occurs when
the burglar and a patrolling guard or an active camera occupy the same room;
formally:

$$flawed(P, S^M) = \begin{cases} true & \text{if } P \text{ results in } \exists camera : (Os(camera) = active \,\wedge\, EtoR(camera) = EtoR(burglar)) \\ & \quad \vee\; \exists guard : EtoR(guard) = EtoR(burglar) \\ false & \text{otherwise} \end{cases}$$

Such rooms where the burglar gets caught we will call trap rooms.
From now on we omit the upper index in $S^M$, since we take $M$ as fixed.
To capture the effect of the player's actions on the burglar's knowledge, we
define an auxiliary function $H(\bar{A}, S_{burglar}) = S'_{burglar}$. It models a situation where
the player executes actions $\bar{A} \subseteq A$ and due to their effect the burglar updates his
belief to the new state $S'_{burglar}$.
Now we can define the problem that has to be solved at design time as finding
at least some solutions of a function $F : \mathcal{S} \times \mathbb{N} \to 2^{\mathcal{S}}$. $F$ gets an initial world state
$S_{init}$ and a number of pitfalls $n$, both provided by the designer, and outputs the set
of world states that can be known to the burglar at the beginning of the game.
We will now discuss the definition of function $F$ that is implemented by our
prototype:

$$\begin{aligned} S_{burglar} \in F(S_{real}, n) \equiv\; & \exists P : \neg flawed(P, S_{burglar}) \wedge flawed(P, S_{real}) \;\wedge & (4.1)\\ & \exists \bar{A} \subseteq A : H(\bar{A}, S_{burglar}) = S'_{burglar} \;\wedge & (4.2)\\ & |trapRoomsPresentInPlan(P)| = n \;\wedge & (4.3)\\ & \exists P' : \neg flawed(P', S'_{burglar}) \wedge \neg flawed(P', S_{real}) & (4.4) \end{aligned}$$
We require that there is a plan that seems to solve the task given the
burglar's initial knowledge, but contains pitfalls in reality (Condition 4.1);
further, there must be user actions that make the burglar change his belief
base to the new state $S'_{burglar}$ (Condition 4.2); the number of trap rooms in the
plan must be $n$ (Condition 4.3); and there must be a new plan $P'$ that is without
trap rooms both in $S'_{burglar}$ and $S_{real}$ (Condition 4.4).
It must be noted that our definition guarantees only the creation of traps,
not the number of required user actions. This can lead to levels where the
burglar's initial plan contains $n$ pitfalls, but which can be solved with just one
user action, leading the burglar to a plan $P'$ that correctly solves the problem
in $S'_{burglar}$. For the details of our implementation see chapter 6.
5. Agent control
Agents are autonomous entities acting in the game environment. Their decisions
are determined by their perception of the environment, and by their decision
making mechanism.
The definition of an ideal rational agent by Russell and Norvig [29] states:
“For each possible percept sequence, an ideal rational agent should do
whatever action is expected to maximize its performance measure, on
the basis of the evidence provided by the percept sequence and whatever
built-in knowledge the agent has.”
In our case we defined the performance measure in the traditional way: the best
plan is the one containing the fewest actions (for more about the performance
measure and results, see section 5.9.2).
Agents can be classified into three categories based on their decision making
behavior: reactive agents, deliberative agents, and hybrid agents. A reactive agent
is the simplest type; it has no foresight, in fact not even memory or an explicit goal
set is strictly necessary. It processes environmental input from its sensors and
produces a direct effect in reaction. Deliberative agents on the other hand
necessarily require their own internal world state representation, and clearly defined
goals to achieve. They use this knowledge to construct multiple step plans to fulfill
their goals. A hybrid agent is a mixture of the previous two types; it follows its own
action sequence, but directly reacts to some external events without deliberation.
As will be visible from section 5.8.2, our agents are hybrid ones, primarily
working as deliberative agents, with occasional direct reactions for the sake of
playability.
Figure 5.1: Agent model used in our program
Our agents lack learning. Their decision making is limited to the hard coded
set of possible actions; but the combination of these effectors is limited only by
the capabilities of the used planning system and the available knowledge of the
environment.
In the following we detail the workings of our agent model displayed in figure 5.1.
5.1 Goals
Our agents have primary and secondary desires.
The primary goal is to visit any active vending machine as soon as they see it.
The secondary goal sets were in our case implemented in a very straightforward
way; they are virtually the only difference between our two agent types (the burglar
and the guards). The burglar's goal is to gather a selected item (a treasure) and to
return to a predetermined room. The guards have a set of places to oversee;
their goal is to visit each of these rooms once, then start the patrol again.
It is worth noting that in the final version guards have no desire to catch or
follow the burglar. Interaction between agents is managed completely by the
game environment.
5.2 Knowledge Base
“A system is autonomous to the extent that its behavior is determined by its
own experience.” [29] Our agents are not completely autonomous. They start
each level with some predetermined knowledge of the environment, defined in the
map file. The flaws in this belief guarantee that the burglar will need the player's
assistance to complete the level. However, as they explore the game world, they
extend and update their initial beliefs, and become more and more autonomous.
In our implementation the knowledge base contains information about the
locations of other known agents and objects. It also notes which entities hold the
items the agent has seen so far.
The known objects are grouped into two categories: the ones examined closely,
and the ones seen from a distance (explained in section 5.4). The importance of
this lies in the fact that this way the agent has an idea of how credible certain
details of his beliefs are.
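As an illustration only, such a belief base could be organized along the following
lines in Java; the class and field names here are hypothetical sketches, not taken
from our actual implementation:

import java.util.HashMap;
import java.util.Map;

/** Sketch of a single agent's belief base. */
class BeliefBase {
    /** How a detail was perceived determines how credible it is. */
    enum Credibility { EXAMINED_CLOSELY, SEEN_FROM_DISTANCE }

    /** One remembered object: where it was seen, in what state, how reliably. */
    static class ObjectBelief {
        String roomId;            // believed location of the object or agent
        String state;             // e.g. "locked", "opened", "active"
        Credibility credibility;  // close examination vs. looking around
    }

    /** Believed locations and states of known objects and agents, by id. */
    final Map<String, ObjectBelief> known = new HashMap<>();

    /** For every item seen so far, the entity believed to hold it. */
    final Map<String, String> itemHolder = new HashMap<>();
}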
5.3 Self awareness
Our agents have very little self awareness. Their knowledge is limited to their
position in the world layout and the contents of their inventory. In our kind of
game this is perfectly sufficient, and if necessary, it could easily have been extended.
5.4 Sensors
Our agents have two ways of perceiving their environment. The senses
involuntarily update their knowledge base in each game turn (see section 3.2.1).
One perception, which we shall call close examination, grants the agent complete
and correct knowledge of every detail of the object he is currently operating with,
while the other one, which we shall call looking around, lets the agent notice an
incomplete set of details of the surrounding objects. An object has to be in the
same room as the agent to be perceived. There is no way for an agent to see
anything beyond the borders of the currently occupied room.
5.5 Effectors
Interaction with the agent’s environment consists of one of the following actions:
• enter – enters a door leading into another room
• approach – moves near to a selected object, or agent
• lock – locks a door, or a container with a key
• unlock – unlocks a door, or a container with a key
• open – opens a door, or a container
• close – closes a door, or a container
• pick up – picks up an item from an opened container, or from another agent
• use – uses a vending machine
In each game turn exactly one of these actions is accomplished, with the exception
of approach, which is always broken into sub-actions before execution, based on
the layout of the current room. This means that the rooms are divided into
game positions and the agents move between neighboring positions
(see section 3.2.1).
Actions directly connected to an object can be successfully completed only
if the agent is standing next to or on the position that the particular object
occupies.
It must be noted that there is no explicit examination action; sensors are
invoked automatically after each step of the agent.
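A minimal sketch of how this action repertoire could be modeled in Java follows;
the names are illustrative assumptions, not the code of our program:

/** The eight atomic action types, one executed per game turn. */
enum ActionType { ENTER, APPROACH, LOCK, UNLOCK, OPEN, CLOSE, PICK_UP, USE }

/** A planned step: an action type applied to a target door, object, or agent. */
record PlannedAction(ActionType type, String targetId) {
    /** approach is the only non-atomic action: before execution it is expanded
     *  into single-tile moves between neighboring game positions of the room. */
    boolean needsExpansion() { return type == ActionType.APPROACH; }
}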
5.6 Environment
In this section we classify our environment from the agent's perspective, using
the world definitions found in [29].
To our agents, our game world is inaccessible, meaning their sensors do not
provide full access to all the relevant data at any given moment. The agents need
an internal representation of it, and memory to remember previously accessed
details. The environment is also nondeterministic and dynamic; in other words,
actions may fail and the world state may change without the contribution of the
observing agent, for example by acts of other agents or the player. Finally, the
world is discrete, which means that the game is broken into turns and each agent
has a fixed set of possible actions in each of these turns until the level is won or
lost.
As opposed to our program, most other games provide implicit layout
knowledge to their agents. In our case the agents start out with the basic knowledge
that some rooms make up the world, but they have no information about their
position or content. They start with a belief large enough to find some path (in
the burglar's case a flawed one) to their intended destination, and the rest may
be revealed only through exploration.
5.7 Decision making
As we mentioned before, we set out to write a game with extensive use of planners.
This choice determined the type of decision making process we ended up using
in our program. Planning systems are capable of creating seemingly rational
sequences of actions without the developer ever considering them. On the other
hand, on-line use of planning during the actual gameplay is limited by real-time
requirements (to read more about this problem and our proposed solutions see
section 5.8.4).
In the rest of the chapter we go into detailed descriptions of the problems
we faced while trying to match the needs of our agents with the problem resolving
capabilities our planners provide.
5.8 Translation to planning problems
There are some differences between the game world perceived by the agent and
the one forwarded to the planner. Figuring out this translation was a major
challenge in the development process. Giving more than absolutely required
floods the problem file with irrelevant data that slows down the planning systems
and reveals their shortcomings (see sections 5.8.4 and 5.9.2).
In order to avoid, or at least postpone, these effects we made three major
simplifications.
We decided to plan on the level of significant positions (doors, containers,
. . . ); this means that the planner does not perceive distances as they are in the
real world and may generate seemingly illogical behavior (see section 5.9.3).
A similar simplification is connected to the agent's senses. We decided to
omit viewing angles and viewing distances, and to limit the agent's sensors to the
borders of the room he occupies.
To avoid the task of planning in an environment that supports durative actions
we made another simplification. The planner sees the world as a set of static
entities. If the agent finds an inconsistency between his beliefs and the
nondeterministic world, he simply replans as described in section 5.8.2. We favored
this method because each of the planning problems is relatively simple and fast to
resolve; in addition, in our type of program agents are expected to frequently
reconsider their plans, so the quality of the resulting action sequences is still
acceptable.
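To make the translation concrete, the following Java sketch serializes a belief base
into a PDDL problem restricted to significant positions. It is our own illustration
under these assumptions; the predicate and object names are hypothetical and
need not match the domain given in Appendix D:

import java.util.List;

/** Illustrative generator of a PDDL problem file over significant positions. */
class PddlProblemWriter {
    /** doorFacts entries: {doorId, roomA, roomB, state}, e.g. {"d1","r1","r2","locked"}. */
    static String write(String agent, String startRoom, String goalRoom,
                        List<String[]> doorFacts) {
        StringBuilder sb = new StringBuilder()
            .append("(define (problem steal-artifact)\n")
            .append("  (:domain burglar-game)\n")
            .append("  (:init\n")
            .append("    (in ").append(agent).append(' ').append(startRoom).append(")\n");
        // Only doors, containers and similar significant positions are exported;
        // tile-level distances inside rooms are deliberately left out.
        for (String[] d : doorFacts) {
            sb.append("    (connects ").append(d[0]).append(' ').append(d[1])
              .append(' ').append(d[2]).append(")\n");
            sb.append("    (").append(d[3]).append(' ').append(d[0]).append(")\n");
        }
        return sb.append("  )\n")
                 .append("  (:goal (in ").append(agent).append(' ')
                 .append(goalRoom).append(")))\n").toString();
    }
}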
5.8.1 Planning for multiple agents
At the beginning of our experiments with behavior generation it seemed to be
a great idea to let the planner create action sequences for multiple agents in a
single run. In the gameplay each agent is a different resource acting
simultaneously, and we also needed to plan for agents whose goals are in conflict
with each other. However, the classic planning systems could not cope with these
requirements.
The planner was requested to create action sequences for our agents with plans
meeting in a selected room, but our trials resulted in unconvincing behavior. With
different strategies we received different, but equally flawed action sequences; for
example a burglar patiently waiting for a guard in order to get caught, or a guard
who appears in the game area with perfect timing to surprise the burglar, or a
burglar who for seemingly no reason turns around to walk to the nearest guard.
From these experiments we arrived at the conclusion that at least the two
opposing agent types must be planned for separately. That way we can generate
a seemingly rational action sequence for the burglar, and in a separate run we
can plan for the guards with the knowledge of the burglar's course.
For the sake of simplicity, in the final version we completely abandoned the idea
of simultaneous planning, and we plan for a single agent at a time. Cooperative
guard planning may be an interesting future improvement to the program, but
that would take us to the field of multi-agent systems.
5.8.2 When to change plans
The question of when the agents change their plans was crucial in the
development. There are two causes that may trigger a planning event: the agent's
primary and secondary goals.
The primary goal is triggered when the agent discovers an active vending
machine. This event forces him to immediately approach the object and deactivate
it. In order to ensure this behavior we had to invalidate the agent's previous plan
and generate a new one containing only the primary goal. Simply extending the
original goal set frequently leads to unacceptable results, like the agent leaving
the room and only later returning to the vending machine. After the primary desire
is satisfied, agents return to the secondary ones.
While striving to reach his secondary goals the agent may choose to change
plans in two situations: when the plan failed, or when there is a better solution
available.
When the plan failed: First we define a plan failure as an instruction in the
agent's action sequence that he is unable to execute (the executing function returns
failure). It happens when the expected results of the action differ from the actual
results. For example, opening an already opened door would not be a plan failure,
while opening a locked one would. Presuming an optimal action sequence, such a
failure renders continuing plan execution impossible; all following actions would
return failure. In practice our plans might be suboptimal and the failed action
might have been unnecessary, but we have no simple way to tell whether that is so.
In these cases we have no other choice but to terminate the old plan and generate
a new one.
When there is a better solution available: Such a situation may occur for three
reasons: the original plan was not optimal, or the agent gathered some additional
knowledge, or the world changed in a positive manner.
Finding a better solution could be achieved by repeating the planning
process with an additional parameter requiring the plan length to be shorter
than the original one. Unfortunately not all of our planning systems support such
parameters, so we did not implement this behavior.
From the agent's point of view, gathering additional data and a favorable
external change in the world state are the same. Both manifest as an update in
the belief base. Comparing new information with the premises of our existing
plan may reveal shortcuts; for example, finding an unlocked door might mean
that we can skip a plan section that goes to retrieve the key that would unlock
it. On the other hand, there may be shortcuts that can't be identified without
replanning. Such a situation would be finding an opened door leading the agent
directly to a room that he expected to reach by crossing several others. In this
case there is no part of the original action sequence that we can simply skip; we
need completely new actions, namely approach the door and enter it, which would
replace the walk-around section.
Our agents implement two of the above mentioned behaviors. They
update their plans when failing to execute an action, and they are capable of
replanning when new information is available to them. For reasons explained in
section 6.2 their default behavior is to follow their original plan until a failure
occurs. For activating the alternative planning rules see section B.1.
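The failure rule described above can be restated schematically; the following
lines are only an illustration of the rule, with hypothetical names:

/** Sketch of the plan-failure rule: an action fails exactly when the expected
 *  effect cannot be produced in the actual world state. */
enum DoorState { LOCKED, CLOSED, OPENED }

class OpenDoorStep {
    /** Returns false (a plan failure, triggering replanning) only for a locked
     *  door; opening an already opened door is a harmless no-op. */
    static boolean execute(DoorState actual) {
        return actual != DoorState.LOCKED;
    }
}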
5.8.3 Plan execution and replanning
An agent simply follows the actions of the computed plan and checks in every
game turn whether it should generate a new one. The method used to decide
whether the agent has to replan is described in Algorithm 1.
Algorithm 1 One game turn of a single agent
Require: $P$ — plan that is being executed
Require: $S_{agent}$ — agent's prior knowledge of the level
Require: $replanOnNewKnowledge$ — flag to replan if the agent receives new information
Ensure: the agent executes a step and replans if necessary
1: $actionResult \leftarrow$ executeNextActionFromPlan($P$)
2: $S'_{agent} \leftarrow S_{agent}$
3: $S_{agent} \leftarrow$ updateBelief($S_{agent}$)
4: $activatePrimaryGoal \leftarrow$ seesNewVendingMachine($S_{agent}$)
5: if $activatePrimaryGoal$ then
6:   $P \leftarrow$ replanPrimaryGoal($S_{agent}$)
7:   return
8: end if
9: if $actionResult == failed \;\vee\; S'_{agent} \subset S_{agent}$ then
10:   $P \leftarrow$ replanSecondaryGoal($S_{agent}$)
11: end if
In Algorithm 1, the agent first attempts to execute an instruction from its
list. If the plan is empty or the action fails, the agent will have to generate a new
plan. After each executed step the agent explores the surrounding environment,
and if he spots an active vending machine he drops the action sequence and replans
with the primary goal. Finally, if the user requires it, the agent replans upon
finding new information.
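For readers who prefer code, a direct Java transcription of Algorithm 1 could look
as follows; the interface and method names are our own hypothetical sketches:

/** Operations Algorithm 1 assumes an agent to support. */
interface PlanningAgent {
    boolean executeNextActionFromPlan();  // false if plan is empty or action failed
    boolean updateBeliefGrewStrictly();   // run senses; true if new facts were learned
    boolean seesNewVendingMachine();
    void replanPrimaryGoal();             // plan only for visiting the vending machine
    void replanSecondaryGoal();           // plan for the regular goal set
}

class GameTurn {
    /** One game turn of a single agent, mirroring Algorithm 1. */
    static void run(PlanningAgent agent, boolean replanOnNewKnowledge) {
        boolean succeeded = agent.executeNextActionFromPlan();
        boolean learnedSomething = agent.updateBeliefGrewStrictly();
        if (agent.seesNewVendingMachine()) {  // primary goal preempts everything else
            agent.replanPrimaryGoal();
            return;
        }
        if (!succeeded || (replanOnNewKnowledge && learnedSomething)) {
            agent.replanSecondaryGoal();      // plan failure or fresh data
        }
    }
}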
5.8.4 Time requirements
The length of time required to determine the next action is an important factor
when selecting a controlling mechanism for the agents. What is probably even
more important is the consistency of the time requirements.
From this point of view planning is unreliable. In our case an average planning
run requires about 150 ms, but if the task is particularly difficult, planning
continues way beyond that period. Sometimes it takes so long that, for the sake
of the player, the planning process has to be canceled by the main program (we
chose this terminating limit to be 8 s). In these worst case scenarios, where
we have to cut the planner short, we do not receive any action sequence to guide
the agent. We cannot even tell the difference between a problem that has no
solution and a problem whose solution requires more time to be calculated. In these
situations we presume that no solution exists and the agent gives up trying. If
it is a guard, he becomes immobile for the rest of the level, while if it is the
burglar, the level is declared lost.
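This cancellation policy could be realized with the standard ProcessBuilder API
roughly as in the following sketch. The code is our assumption of how such an
invocation might look, not the thesis implementation, and the output file name
is hypothetical; the 150 ms and 8 s figures come from the text above:

import java.io.File;
import java.util.Optional;
import java.util.concurrent.TimeUnit;

/** Illustrative launch of an external planner with the 8 s cut-off. */
class PlannerRunner {
    static Optional<File> plan(String plannerBinary, File domain, File problem)
            throws Exception {
        Process p = new ProcessBuilder(plannerBinary, domain.getPath(),
                                       problem.getPath()).start();
        // Average runs finish in about 150 ms; pathological ones are cut at 8 s.
        if (!p.waitFor(8, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            // No plan and no proof of unsolvability: we must presume "no solution".
            return Optional.empty();
        }
        File planFile = new File("plan.out");  // hypothetical planner output file
        return planFile.exists() ? Optional.of(planFile) : Optional.empty();
    }
}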
On the other hand, planning is not a regularly repeating task, and each
successful planning run produces a full sequence of actions to the end of the level. If
nothing goes wrong in the execution and no new information is received, there
is no further need to replan, only to execute the produced sequence.
In the rest of this section we will show four methods we considered for coping
with the time requirements of planning. All the described approaches are
intended to cope with the replanning that occurs at plan failures.
Our idea to hide, or at least shorten, the waiting period was to complete the
replanning while the game was still executing the previous action sequence, up
until the failure point (where the agent will need the new plan).
As a heuristic solution it would be possible to compare the agent's freshly
constructed plan (based on his current beliefs) to the current world state, and
find the first failure point in it. With that knowledge we could initiate an early
replanning process, so when the command execution finally reaches the failure,
we would simply insert the new action sequence.
In practice there are two problems with this method that impede its usability.
First, the agents are continuously gathering information throughout their journey.
Premature planning might fail to take into consideration important data that the
agent has not yet gathered.
Second, the environment is changing dynamically; other agents and the player
himself are actively updating the world state. In the most extreme cases even the
not yet executed actions of the current agent may trigger replanning in different
game characters, whose new action series will in turn affect the world state,
and through that our agent’s beliefs. This dynamic behavior may, and often
does postpone or expedite the failure in our agent’s plan or changes the action
sequence the planner would have generated with the updated knowledge.
A more dependable solution would be to update our early plans as the environment changes. A relatively simple approach would be to apply the above described checking algorithm on each agent after every relevant change in the game world. We define a relevant change as one where an operable object updates some property or an agent switches rooms. The weakness of this method is the relatively high frequency of such changes; for example, a given agent moves to a different room roughly every 5 s on an average level, and with more agents this frequency is even higher.
An improvement on the previous solutions would be to execute a whole world
simulation with all the agents in the background of the program. Using this
method we could eliminate all the non-deterministic factors of the environment
(changes caused by other agents), except for the player himself. We would have
to repeat the simulation process only on the occasions when the player executes
an action.
While implementing level generation features (see subsection 6.1.1 in chapter 6) we implemented such a background simulation, but we ended up abandoning it. It proved to be too expensive an overhead in our architecture. The root of the problem was that while a single replanning operation for a single agent needs to construct a PDDL file, run the external planner, and interpret its result once, a full-blown forward simulation contains multiple replannings for each of the agents on the map.
In general cases, when our planning pauses take about 150 ms, they easily fit into our game turn refresh rate, thus trying to hide them is unnecessary; the waiting is imperceptible anyway. On relatively complex levels (for example tutorial05) waiting for a new plan may be unpleasant. In the worst cases, where a single planning run approaches 8 s, it is questionable whether we are able to calculate far enough ahead in an environment where the primary source of replanning is the player himself.
With a world representation optimized to support such simulations, the last method would probably provide the most seamless planner integration into a real time game environment. Unfortunately, however, our game was not designed with that in mind.
In the current version, while the planner is working we simply pause the game, and no early planning is done. Waiting until the last possible moment with replanning has the additional benefit that we can be sure we have to plan only for a single agent, and that the knowledge base we are using is identical to the one the agent will have at the moment of failure.
5.9 Characteristics of our method
5.9.1 No planning in time
In our department there have been experiments (for example [30]) that included planning with durative events and timed literals. These requirements greatly burdened their planning process, and knowing that in our semi-realtime environment long waiting periods are not acceptable, we avoided any use of such planning structures.
5.9.2 Suboptimal planning
As we mentioned before, we define the performance measure as the length of the action sequence. By this measure the planners we are using do not produce optimal sequences. While the generated plan does solve the given problem, it may contain irrelevant actions and/or inefficient orderings of relevant actions. When solving increasingly complex problems, the algorithm-characteristic "flaws" of each produced plan also become more and more visible. A linear increase in the complexity of the world state raises the number of possible solutions to explore exponentially. This is a fully anticipated experience; with improved planning algorithms and increasing processing power the "flaws" may become less severe, but without the guarantee of optimal planning they will never completely disappear.
On the other hand, if we define the performance measure in a way that "plans should be human-like", these "flaws" could be called features. They appear to the user as if the agent was actively exploring its surroundings; however, no such thing is programmed into its behavior.
For example, in some levels agents open doors that they do not intend to go through, or in more alarming cases they even enter rooms just to return and continue in some other direction. In our rule-set the first scenario has no dangers: the guards cannot look through opened doors to discover the burglar; however, the second action may obviously result in a lost level. In practice the burglar has never entered a trap room in this manner, not even on complex levels; nevertheless the possibility is there and the players should be aware of it.
As future work, specific cases like the burglar's "explorations" may be eliminated by iterating through the planner output looking for loops in the agent's path where inside the loop no "pick up" action occurs.
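A possible sketch of this post-processing step in Java follows. It is a hypothetical filter, not part of the current program; the action and room names are illustrative only:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed loop elimination: scan the planner
// output for segments that leave a room and later return to it without any
// "pick up" action in between, and splice those segments out.
class LoopPruner {

    static class Step {
        final String action;   // e.g. "move", "pick-up", "open-door"
        final String room;     // room the agent occupies after the step
        Step(String action, String room) { this.action = action; this.room = room; }
    }

    static List<Step> pruneLoops(List<Step> plan) {
        List<Step> result = new ArrayList<>(plan);
        boolean changed = true;
        while (changed) {
            changed = false;
            outer:
            for (int i = 0; i < result.size(); i++) {
                for (int j = i + 1; j < result.size(); j++) {
                    if (result.get(j).room.equals(result.get(i).room)
                            && leavesRoom(result, i, j)
                            && noPickUp(result, i + 1, j)) {
                        // Steps i+1..j form a pointless loop: remove them.
                        result.subList(i + 1, j + 1).clear();
                        changed = true;
                        break outer;
                    }
                }
            }
        }
        return result;
    }

    // True if the agent actually left the room somewhere between i and j.
    private static boolean leavesRoom(List<Step> plan, int i, int j) {
        for (int k = i + 1; k <= j; k++) {
            if (!plan.get(k).room.equals(plan.get(i).room)) return true;
        }
        return false;
    }

    private static boolean noPickUp(List<Step> plan, int from, int to) {
        for (int k = from; k <= to; k++) {
            if (plan.get(k).action.equals("pick-up")) return false;
        }
        return true;
    }
}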
5.9.3 Planning granularity
In our program we used single level planning that operates directly with objects, with underlying A* pathfinding between those objects. To keep the problem files relatively small we had to allow some imperfections in the agents' movement behavior.
The current implementation has two undesirable features: one on the object level and another on the room level.
On the object level: To reduce the number of predicates in the planning problems we do not require the planner to generate exact paths between objects in the same room. We have a single action — approach — to reach any position. The planner has no knowledge of the room layout, so it generates the visiting order at random.
For this reason the players might sometimes notice that an agent that needs to visit multiple objects in the same room chooses an obviously inefficient visiting order. Instead of always moving to the closest object, it moves back and forth between several locations.
By separating the planning process into map level and room level planning we probably could have eliminated this behavior, but its occurrence is rare.
We have a seemingly similar problem with the sizes of rooms. The planner does not know their dimensions, and it always considers a path through a single room to be superior to one through multiple rooms, even if the actual path is longer. This issue could only be addressed by extending the planner domain or by breaking the rooms into smaller sections, but as visible from the example of the map-chess-board level, that would greatly affect the planner performance.
6. Creating game levels
One of our aims was to ease the task of the level designers and generate some of the player challenges automatically. In this section we describe our methods of finding solutions of function F from chapter 4.
In our design process, the designer specifies the initial state of the world Sinit and the number of pitfalls n ∈ N. Sinit contains a fully specified map of the level M. M remains fixed in the design process; however, the set of entities E, their assignment to rooms EtoR, and the doors' resp. objects' states (Ds resp. Os) can be modified. The burglar does not have to know about all these objects and guards. The output of the design process is the real state of the world observable by the player, Sreal, the state known to the burglar, Sburglar, and the number of successfully assigned pitfalls k ≤ n. k is less than n if the algorithm fails to add all the required pitfalls.
6.1 Generator algorithms
The key to our automated level design is to correctly select the trap rooms. Once that is accomplished, the exact trap types can easily be selected; we describe that algorithm in section 6.1.3, and the rest of this section describes our room selection methods.
Through the development we tried several different approaches, but most of them proved to be inappropriate for our purposes. We can categorize our methods into two groups: one that leads the burglar agent into a trap, and one that places the traps in the agent's path.
6.1.1 Leading the agent into traps
In one type of experiment we tried to randomly generate a trap layout, then through the gameplay lead the burglar through it.
There were two problems with this approach: one from the point of view of gameplay and a technical one.
Laying out the traps and forcing the burglar towards them may result in unrealistic paths. Sometimes an agent made an obviously useless loop just to stumble into a dangerous situation. These situations may be avoided by limiting either the possible layouts or the places in them that can be used as trap rooms. For example, we cannot allow the agent to discover shortcuts; the level between each perilous situation must be linear.
The second problem with this method is what we called counting traps. As the burglar moves through the level, at each replanning we had to know which trap was still ahead to be visited and which one had already been avoided. Deciding whether a trap was successfully avoided while the agent has still not reached its final goal is a surprisingly hard task. Traps can be disabled in two basic ways – by disabling the object that would catch the burglar, which can obviously be checked, or by evading the room containing it, which is significantly harder to spot. We attempted to solve this problem using three different approaches: comparing plan lengths, directly searching for unnecessary room entries, and checking the existence of a plan without being caught.
The basic idea behind all three methods is to discover whether the burglar performed some actions whose only point is to get caught. In our game such actions manifest as entering the trap room and then returning to the previous room without executing any other significant action (one changing the world state SM as defined in chapter 4).
Comparing plan lengths: if we had optimal planners (as always in this thesis, performance is meant as action sequence length), then we could request a plan through the trap to the agent's goal and another one without it. If the second sequence was shorter, we would know that the burglar has already avoided the trap, and therefore we should not include it in future planning runs. Unfortunately, we have no assurance that the generated plans are optimal, thus the method cannot guarantee correct trap creation.
Directly searching for the unnecessary room entries: in this method we would have looked for the actual entering and leaving action sequence in the selected trap room. In practice this method was not implemented, because we abandoned the whole principle of leading the burglar into a trap.
Checking the existence of a plan without being caught: the third and last method, which we implemented in the early versions of the level generator. We compared a plan generated with the explicit prohibition of the trap room to one that contained neither a prohibition nor a requirement to enter the trap. If the resulting plans matched, we concluded that the observed trap was avoided.
In the final version we completely abandoned this solution. Our methods of checking the avoidance of traps frequently failed when applied to more than a single trap, and the resulting agent behavior was far from believable.
6.1.2 Placing traps on the agent's path
The other approach we tried, which proved to be reliable, was to organize the traps around the burglar's path instead of changing it.
In the following we describe two algorithms we used in the generator to construct game levels, and a further improvement we proposed.
The default design algorithm, described in Algorithm 2, generates trap rooms that can be avoided by the player without truly disabling them. It uses the planner to determine the expected action sequence P of the burglar in Sinit, which reveals the set of rooms the agent plans to visit. In the next step it filters out the rooms that the burglar is unable to avoid, like graph choke points or rooms that hold objects necessary for the completion of the burglar's mission; these rooms will not contain traps, and the rest of the rooms we call possible trap rooms. From this reduced set it generates all subsets of size smaller than or equal to the requested number of traps. Finally, it uses the planner to find the largest of these subsets that can be removed from Sinit while there still exists a plan P′ solving the reduced map Sreduced. If such a subset is found, it can be returned as the set of trap rooms.
The reason why such a design method works is based on the deterministic behavior of the used planner. After diverting from his original plan and thus avoiding a single trap, the agent will try to return to his original path. This of course presumes that the planner used at design time is the same as the one used throughout the gameplay itself.
Algorithm 2 Level design using combinations
Require: Sinit — initial world state by the designer
Require: n — number of required pitfalls in the level
Ensure: trap rooms are assigned to the world state
 1: P ← createPlan(Sinit)
 2: rooms ← allRoomsIn(P)
 3: possibleTrapRooms ← rooms \ unavoidableRoomsIn(P)
 4: for k = n to 1 do
 5:     for all trapRooms ⊆ possibleTrapRooms ∧ |trapRooms| = k do
 6:         Sreduced ← Sinit without trapRooms in layout
 7:         if ∃P′ : P′ = createPlan(Sreduced) then
 8:             Sreal ← Sinit with traps in trapRooms
 9:             Sburglar ← Sinit without traps in trapRooms
10:             return ⟨Sreal, Sburglar, k⟩
11:         end if
12:     end for
13: end for
14: return failure
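The subset search of Algorithm 2 could look roughly as follows in Java; the planner calls and world state classes are hypothetical stand-ins for the generator's real interface:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Algorithm 2; createPlan(), allRoomsIn() and
// unavoidableRoomsIn() stand in for the real planner interface.
class CombinationDesigner {

    static class Room {}
    static class Plan {}
    static class WorldState {
        WorldState withoutRooms(List<Room> rooms) { return this; }
        WorldState withTrapsIn(List<Room> rooms) { return this; }
        WorldState withoutTrapsIn(List<Room> rooms) { return this; }
    }
    static class Result {
        final WorldState sReal, sBurglar;
        final int traps;
        Result(WorldState sReal, WorldState sBurglar, int traps) {
            this.sReal = sReal; this.sBurglar = sBurglar; this.traps = traps;
        }
    }

    Result design(WorldState sInit, int n) {
        Plan p = createPlan(sInit);
        List<Room> candidates = new ArrayList<>(allRoomsIn(p));
        candidates.removeAll(unavoidableRoomsIn(p));    // possible trap rooms

        for (int k = n; k >= 1; k--) {                  // prefer more traps
            for (List<Room> trapRooms : subsetsOfSize(candidates, k)) {
                WorldState sReduced = sInit.withoutRooms(trapRooms);
                if (createPlan(sReduced) != null) {     // a plan P' still exists
                    return new Result(sInit.withTrapsIn(trapRooms),
                                      sInit.withoutTrapsIn(trapRooms), k);
                }
            }
        }
        return null;                                    // failure
    }

    // All k-element subsets of the candidates (the exponential part).
    static List<List<Room>> subsetsOfSize(List<Room> rooms, int k) {
        List<List<Room>> out = new ArrayList<>();
        collect(rooms, k, 0, new ArrayList<Room>(), out);
        return out;
    }

    private static void collect(List<Room> rooms, int k, int from,
                                List<Room> acc, List<List<Room>> out) {
        if (acc.size() == k) { out.add(new ArrayList<>(acc)); return; }
        for (int i = from; i <= rooms.size() - (k - acc.size()); i++) {
            acc.add(rooms.get(i));
            collect(rooms, k, i + 1, acc, out);
            acc.remove(acc.size() - 1);
        }
    }

    // Planner stubs: the real program writes a PDDL problem file, runs the
    // external planner and parses its output.
    Plan createPlan(WorldState s) { return new Plan(); }
    List<Room> allRoomsIn(Plan p) { return new ArrayList<Room>(); }
    List<Room> unavoidableRoomsIn(Plan p) { return new ArrayList<Room>(); }
}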
Algorithm 3 Level design using incremental trap creation
Require: Sinit — initial world state by the designer
Require: n — number of required pitfalls in the level
Ensure: trap rooms are assigned to the world state
 1: Sworld ← Sinit
 2: trapRooms ← emptySet
 3: for k = 1 to n do
 4:     P ← createPlan(Sworld)
 5:     rooms ← allRoomsIn(P)
 6:     possibleTrapRooms ← rooms \ unavoidableRoomsIn(P)
 7:     repeat
 8:         trapRoom ← selectOneFromTheBeginning(possibleTrapRooms)
 9:         Sreduced ← Sinit without trapRoom in layout
10:     until not ∃P′ : P′ = createPlan(Sreduced)
11:     trapRooms ← insert(trapRooms, trapRoom)
12:     followingRoom ← followingRoom(possibleTrapRooms, trapRoom)
13:     Sworld ← Sworld without (possibleTrapRooms \ {followingRoom})
14:     Sworld ← moveBurglarTo(Sworld, followingRoom)
15: end for
16: Sreal ← Sinit with traps in trapRooms
17: Sburglar ← Sinit without traps in trapRooms
18: return ⟨Sreal, Sburglar, n⟩
Our improved attempt to generate trap rooms is visualized in Algorithm 3. In a single step of trap selection the method generates a room sequence and chooses a single avoidable room, which it appends to the list of traps; then it places the burglar after the generated trap and prevents him from continuing on the same path that was planned before. This step is repeated until all the requested traps are successfully placed.
For the sake of simplicity, the algorithm we presented presumes that it is possible to insert n traps into the layout, and furthermore that there are n distinct but interconnected paths leading to the burglar's aim; these assumptions are not necessarily true. The one used in practice must cope with these possibilities, but it is based on the same principle.
The main differences between the two algorithms manifest in performance and in gameplay experience. The first one has exponential time complexity in the length of the burglar's path (its execution takes about 5-10 minutes on the test machine); it goes through all the possible room combinations until a sufficient one is reached. The other one must cope with the possibility that selecting the first several rooms in an inefficient way may leave no place to put the rest of the traps.
The second important difference is in the user impression. The first method results in traps chained in a distinct line while the rest of the world is safe. The second algorithm places traps in a more even and less predictable fashion. On the other hand, Algorithm 2 requires the game itself to be aware of the fact that it must invalidate the burglar's future plans whenever replanning is necessary.
In practice both algorithms – the second one to a greater extent – suffer from the possibility that the player may change the environment in an unexpected way, avoiding all the placed traps in a single clever stroke. Whether allowing this possibility is a bug or a feature depends on the reader.
6.1.3 Trap rooms to game objects
We solved the translation of trap rooms to game objects in a very simple fashion. We have two types of objects that are perilous to the burglar: cameras and guards.
Cameras are static; they can be placed into any room to turn it into a trap.
Guards, on the other hand, are mobile, so a single one may cover multiple neighboring rooms. Their weakness is planning in time; their decision making (a planning system) acts as if the world was a static environment. To avoid leaving rooms temporarily unprotected and letting the burglar slip through without the player's intervention, we defined guard territories in a way that every room in them would lie on the path of the burglar, as visualized in figure 6.1.
We examine each designated trap room: if it has no neighboring dangers we place a single camera there, otherwise we add it to the closest guard's territory, as sketched below.
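The following Java fragment illustrates that rule; the level model classes (Room, Camera, GuardTerritory) are invented for the example:

import java.util.List;

// Hypothetical sketch of the trap-to-object assignment of section 6.1.3.
class TrapAssigner {

    static void assign(List<Room> trapRooms) {
        for (Room room : trapRooms) {
            GuardTerritory territory = closestNeighboringTerritory(room);
            if (territory == null) {
                room.place(new Camera());   // static trap: a camera suffices
            } else {
                territory.add(room);        // a mobile guard covers the room
            }
        }
    }

    // Stub: returns the closest guard territory neighboring the room, if any.
    static GuardTerritory closestNeighboringTerritory(Room r) { return null; }

    static class Room { void place(Camera c) {} }
    static class Camera {}
    static class GuardTerritory { void add(Room r) {} }
}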
6.2 Replanning behavior and level creation
Our agents' planning behavior was described in section 5.8.2. The reason to prefer the strategy of pure failure replanning lies in the method of our level creation.
Figure 6.1: Selecting guard territory:
a) rooms hard to slip through unnoticed – ideal for guard territory
b) rooms easy to slip through unnoticed
c) the burglar’s path
d) a single room
As we described above, we examined the future path of the burglar using a single planning process and built our pitfalls around it.
Giving the agent more freedom to reconsider its actions than absolutely necessary makes the trap placement harder. To maintain correctness we had to simulate the whole agent path, including the discovery of new pieces of information, with the same replanning strategy as the one used in the gameplay; however, it still was not visibly better than the single planning method. The problem was that we failed to cope with all the possible actions of the player and the ways they may directly or indirectly change the behavior of our agents (for example locking different combinations of doors, or influencing other agents that in turn influence each other, . . . ). Instead of a single world simulation we would have needed a full tree of world states SM, branching by the possible user actions from each state SM.
Such an extensive state space search was beyond the reach of our thesis, so in the final version we use the simplest of the above methods, the single planning run. To reinforce the validity of the resulting levels in the gameplay, previous versions of the program even prohibited the agents from looking around and discovering their environment, so they would not find unknown shortcuts; later, however, this feature was returned to the gameplay because it greatly enriches the player's possibilities.
6.3 Conclusion
In the development process we soon realized that our tested planning systems do not produce optimal action sequences; the deviation from the minimal number of required actions varies with the current problem and the tool used. Furthermore, some details in our agent guidance (detailed in sections 5.9.2 and 5.9.3) make our level creation vulnerable to seemingly insignificant implementation details of the planners (for example choosing between equivalent paths or action orderings, . . . ).
At first we tried to request multiple solutions from the planners and at least select the best one from the results, but we found this process time consuming and superfluous. We do not need the planning systems to produce optimal action sequences, only that their new plans are consistent with the previous ones. This can be achieved by using the same tool both in the level generation and throughout the gameplay.
Finally, we must note that all the above described algorithms produce traps that are in principle avoidable. In the later stages of the development we introduced user actions that can simply disable them, so the burglar needs no route around them. Our program is a puzzle game; there is very little player challenge in disabling unavoidable traps, so the creation of such traps had no priority in our work and ultimately it was left to the human designer.
6.4 Running the level generator
Our program requires a Java runtime environment; it can be started via a shell script. For the list of accepted parameters see Appendix C.
7. Implementation
The game has been implemented as a Java¹ application. It uses an external game engine, Slick², which is a 2D game library based on the LWJGL³ OpenGL⁴ binding for Java.
In the interface we used a modified version of the Sticky⁵ button library. For the required path finding tasks we used a Java implementation of the A*⁶ algorithm.
The planning systems used by our game can be found in chapter 8. The default one is SGPlan [28], which can be replaced with any STRIPS-like planner capable of solving problems in PDDL 2.2 [10] with a few modifications in the source code. In fact there is already an interface that can work with several different planners.
The planners were used through a PDDL domain and a problem file. The domain is centralized for the whole game, and the problem file is dynamically generated based on the current game situation. The output of the planning system is parsed in a planner-specific way and translated into agent actions.
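As an illustration, a problem file could be emitted along these lines; the object and predicate names below are invented for the example and do not reproduce the real burglar.pddl:

import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical sketch of emitting a PDDL problem file from the game state;
// the predicates and objects are illustrative only.
class ProblemWriter {

    static void writeBurglarProblem(String path) throws IOException {
        try (PrintWriter out = new PrintWriter(path)) {
            out.println("(define (problem burglar-turn)");
            out.println("  (:domain agent)");
            out.println("  (:objects burglar1 - t_agent");
            out.println("            room1 room2 - t_room");
            out.println("            door12 - t_door)");
            out.println("  (:init (in burglar1 room1)");
            out.println("         (connects door12 room1 room2)");
            out.println("         (opened door12))");
            out.println("  (:goal (in burglar1 room2)))");
        }
    }
}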
Our program is also capable of using Planning4J⁷, a Java API for connecting various AI planners; this tool was recently developed at the AMIS group⁸, Faculty of Mathematics and Physics⁹, Charles University in Prague.
As a build automation tool we used Maven¹⁰.
The program was written to be system independent; however, the majority of the tested planners support only UNIX-like systems, so the use of our program is limited to this platform family.
¹ See more at: http://www.java.com/en (10.02.2012)
² See more at: http://slick.cokeandcode.com (10.02.2012)
³ See more at: http://lwjgl.org (10.02.2012)
⁴ See more at: http://www.opengl.org (10.02.2012)
⁵ See more at: http://www.anotherearlymorning.com/2009/01/slick-gui-with-stickybuttons (10.02.2012)
⁶ See more at: http://robotacid.com/PBeta/AILibrary/AStar (10.02.2012)
⁷ See more at: http://code.google.com/p/planning4j (10.02.2012)
⁸ See more at: http://amis.mff.cuni.cz (10.02.2012)
⁹ See more at: http://www.mff.cuni.cz (10.02.2012)
¹⁰ See more at: http://maven.apache.org (10.02.2012)
7.1 Program structure
Figure 7.1: The simplified diagram of our program’s architecture
As represented in figure 7.1, the user interacts with the program through a graphic interface that connects him with the Player object implementing the actions he is capable of executing in the game world. The GameMap is a container holding the world layout and the acting agents. Planning is completed in a dedicated thread (PlanningThread) that connects to the external planning system through several possible interfaces.
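The planner call itself can be sketched as follows; the command line shown ("sgplan -o domain -f problem") is illustrative, not necessarily the exact arguments the game passes, and the sketch assumes a Java 8+ runtime:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of calling the external planner with the 8 s cut-off
// described in section 5.8.4; the command line is illustrative only.
class PlannerRunner {

    /** Returns the planner's raw output, or null when it had to be cut short. */
    static String plan(String domainFile, String problemFile)
            throws IOException, InterruptedException {
        Process planner = new ProcessBuilder(
                "sgplan", "-o", domainFile, "-f", problemFile)
                .redirectErrorStream(true)
                .start();

        // Plans in our domain are short, so the output comfortably fits the
        // pipe buffer and may safely be read after the process exits.
        if (!planner.waitFor(8, TimeUnit.SECONDS)) {
            // With no action sequence we cannot distinguish an unsolvable
            // problem from one that merely needs more time; give up.
            planner.destroy();
            return null;
        }

        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(planner.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        return output.toString();
    }
}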
7.2 Availability
The game prototype is open-source and it is available for download on its homepage: http://code.google.com/p/burglar-game.
8. Planners
In the following we briefly describe the planners we used in our experiments and in our program. In most cases we used these tools as black boxes and tested them on a single problem type, namely the one found in our game. In all cases they were accessed through two PDDL input files, and their output was also parsed as a text file.
8.1 Requirements
As the way to communicate with the planning systems we chose PDDL 2.2 [10] (Planning Domain Definition Language). Writing to a file and passing it to a different program is slower than direct memory sharing, but choosing differently would have greatly limited our alternatives. We were looking for tools supporting the PDDL requirements STRIPS and typing.
STRIPS (Stanford Research Institute Problem Solver) is the most basic and most widely implemented type of domain definition in PDDL.
Typing enabled us to group PDDL objects into types, which improves readability and decreases the number of required predicates.
In several cases, like generating the required number of problem situations, it would be useful to be able to use numerical quantities, but in order to keep the planning streamlined we constructed our domain without them.
Unfortunately, we found that most of the planners available to us that also supported our requirements were only available on UNIX-type systems. For this reason we conducted our tests on Linux.
8.2 Domain and problem sizes
Except for the early stages of the program's development we used a single unified planning domain – agent.pddl – which can be found in the appendix as figure D.1.
As mentioned above, it uses two requirements: STRIPS and typing. It has 20 types, 12 predicates and 10 actions.
While our program is running we use two problem files – burglar.pddl to describe problem definitions for burglar typed agents and guard.pddl to contain problems for guard agents. These files are updated as required to reflect the game situation in which we need a new plan. An average planning problem has around 70 PDDL objects and 170 initial facts. The number of goals depends on the agent type: in case of a burglar it is always 3, for a guard it is usually about 4, and in level creation it depends on the number of required traps.
8.3 Tested planners
The planners we have tested on our problem domain so far are the following:
• Blackbox [19]
• FF [20] (Fast-Forward)
• HSP [22] (Heuristic Search Planner)
• LPG [23] (Local search for Planning Graphs)
• LPRPG [24] (Linear Program Relaxed Planning Graph)
• Marvin [21]
• MaxPlan [25]
• Metric-FF [26]
• MIPS-XXL [27] (Model Checking Integrated Planning System)
• SGPlan [28] (Subgoal Partitioning and Resolution in Planning)
8.3.1 Blackbox
A planning system based on converting problems specified in STRIPS notation into Boolean satisfiability problems and solving them with a variety of satisfiability engines.
It is available for both Windows and Linux. In our tests we used Linux version 43.
The planner failed to parse the domain due to actions containing an "or" precondition. After breaking the problematic actions into multiple ones, the tool terminated on our simplest test problem with no solution in 19 milliseconds.
8.3.2 FF
It is a forward chaining heuristic state space planner. It can handle classical STRIPS as well as full scale ADL planning tasks.
In our tests we used version 2.3.
The planner was capable of solving our problems. It proved to be the fastest in our tests.
8.3.3 HSP
It is a planner based on the ideas of heuristic search. It uses forward search with
heuristics and plan length estimates.
We used version 2.0 for Linux.
The program reported that it had found no solution to our test problems.
8.3.4 LPG
It is a planning system based on local search and planning graphs, capable of handling numerical quantities and durations.
In our tests we used the LPG-td version.
The planner managed to process both the domain and the problem files; however, it failed to terminate on our simplest test problem by itself in any reasonable time (10 minutes), constantly reporting:
".......... search limit exceeded. Restart."
8.3.5 LPRPG
It is a heuristic forward search planner similar to Metric-FF, but with a different handling of numbers. Numeric effects of actions are translated into constraints in a linear program while constructing the reachability graph.
In our tests we used the version seen at the 2011 International Planning Competition (IPC-2011).
The tool reported a type error; most probably LPRPG does not support entities having multiple types simultaneously.
8.3.6 Marvin
It is a planner with action-sequence-memoization (avoiding repeated calculations)
to generate macro-actions, which are then used during the search for a solution.
In our case it failed to produce a valid solution to our test problems.
8.3.7 MaxPlan
A SAT solving planner that decomposes the original problem into a series of
subproblems.
The tool failed to parse our domain file. It terminated with the following
output:
"unknown type ’T_AGENT’"
8.3.8 Metric-FF
It is an extension of the FF planner to numerical state variables. It is PDDL 2.1 compatible.
We tested the Linux version, which successfully solved our test problems. In the game itself there is a working Metric-FF interface.
8.3.9 MIPS-XXL
It is a planner with PDDL 3 support. It supports the exploration of state spaces that are much larger than the available RAM.
We tested the 2006 and the 2008 versions; both managed to solve our test problems.
8.3.10 SGPlan
SGPlan is a planning system utilizing subgoal partitioning and resolution. This means splitting a large planning problem into subproblems, each with its own subgoal, and resolving inconsistent solutions of subgoals using extended saddle-point conditions [28].
The program is PDDL 3 compatible. On the downside, its codebase is not open and the program is only available for UNIX-type systems.
We tested both version 5.22 and version 6. They were the most successful planners during the early development, so in the later stages SGPlan became the one that our program uses by default. One of the useful features of the SGPlan implementation is not giving up "too quickly" on complicated problems. This, however, also means that sometimes during the gameplay we had to terminate the planning process for the sake of the player.
We have found that the PDDL processing in SGPlan has some unusual characteristics. Declaring the ADL (action description language) and Preferences (soft goal definition) requirements helps it find shorter plans, even without ever using these abilities in the domain or the problem file. A less important fact is that its PDDL parser seems to be less strict than most of its counterparts. It allows the existence of undefined types in predicates if those predicates are never used in actions or in problem definitions.
8.4 Comparison
8.4.1 Test conditions
We have done multiple tests gradually through the development as our domain changed; however, these tests no longer correspond to the current state of the program, so here we describe the ones done with the final domain version.
At first we took one of our slightly larger levels (map-complex.xml), which has 28 rooms, 32 doors and a single agent with full knowledge of the environment, and repeatedly tested a full burglar planning problem on it. The level layout and the plan generated by the default planner can be found at the end of the thesis in figure E.1. Translated to a PDDL burglar problem, the level contained 74 PDDL objects, 167 initial facts and 3 goals. In the following we will call this Test 1 and list the results in table 8.1.
In the next series of tests we used the same level as previously, but without an important room (room 15) in the domain, so the agent needed to use a longer path. The required alternative path generated by the default planner can be found at the end of the thesis in figure E.2. The simplified problem contained 71 PDDL objects, 157 initial facts and 3 goals. The test results can be seen in table 8.2 for Test 2.
The third test series was conducted on the same level as the previous ones, but instead of completely removing a choke-point room (room 15) as in the above series, we only defined a negative goal (like one we often use in the game) requiring the agent not to visit the selected room. The room to avoid is shown in figure E.3. The third problem had 74 PDDL objects, 167 initial facts and 4 goals. We will describe the results as Test 3.
In the fourth series we used a level that looked like a 10x10 chessboard (map-chess-board.xml). The interesting property of this level is its size: it is the largest layout on which we have so far managed to operate our agents. It contains 101 rooms and 181 doors. Its layout is shown in figure E.4. The last test problem had 290 PDDL objects, 912 initial facts and 3 goals. This test series we will name Test 4 and list the results in table 8.3.
Each planner was run 20 times by a script on a given problem; we measured the number of generated actions and the execution time.
The test environment was Ubuntu 11.10 Linux (64 bit version) in a virtual machine (VirtualBox) with the virtual parameters of 2 GB of memory and a single Intel Core i7 X 920 2 GHz CPU.
8.4.2 Results
Legend to the tables:
• Avg. Time – average processing time in seconds.
• Std. Deviation – standard deviation of processing times as a percentage of the average processing time.
• Actions – number of resulting actions.
Test 1

Planner         Avg. Time   Std. Deviation   Actions
FF 2.3          0.052 s     12.916 %         78
Metric-FF       0.143 s      5.931 %         78
SGPlan 5.22     0.173 s      3.560 %         78
SGPlan 6        0.207 s     12.418 %         78
MIPS-XXL 2008   0.301 s      6.208 %         82
MIPS-XXL        0.455 s      7.046 %         78
Blackbox        –            –               –
HSP             –            –               –
LPG             –            –               –
LPRPG           –            –               –
Marvin          –            –               –
MaxPlan         –            –               –

Table 8.1: Planner results of test series 1
In our first test series, listed in table 8.1, an interesting fact appeared: on the simple test problem the MIPS-XXL 2008 version generated a solution that was four steps longer than the plans produced by the other systems.
Fast-Forward implementations seem to dominate our problem domain.
Test 2

Planner         Avg. Time   Std. Deviation   Actions
FF 2.3          0.069 s     25.741 %         98
SGPlan 5.22     0.162 s     24.285 %         98
Metric-FF       0.214 s     21.069 %         98
SGPlan 6        0.277 s     15.784 %         98
MIPS-XXL 2008   0.599 s      9.929 %         98
MIPS-XXL        0.817 s     23.044 %         98
Blackbox        –            –               –
HSP             –            –               –
LPG             –            –               –
LPRPG           –            –               –
Marvin          –            –               –
MaxPlan         –            –               –

Table 8.2: Planner results of test series 2
As can be seen from table 8.2, in the second test the planning systems, with the exception of the MIPS-XXL planners, showed no significant increase in execution times.
All our planners failed in the third test (Test 3). With the exception of the MIPS-XXL 2008 version, no system produced any positive result within 6 minutes. However, not even MIPS-XXL was capable of generating a truly valid plan; it ignored one of our goal conditions, and in the returned action sequence it let the burglar be seen (the observed fact in the PDDL got satisfied despite the negative goal).
Test 4

Planner         Avg. Time   Std. Deviation   Actions
FF 2.3           1.446 s     6.052 %         96
SGPlan 5.22      7.085 s     2.330 %         96
Metric-FF        7.848 s     5.162 %         98
SGPlan 6         9.413 s     2.837 %         98
MIPS-XXL        49.170 s     5.078 %         96
MIPS-XXL 2008   56.178 s     8.271 %         96
Blackbox         –           –               –
HSP              –           –               –
LPG              –           –               –
LPRPG            –           –               –
Marvin           –           –               –
MaxPlan          –           –               –

Table 8.3: Planner results of test series 4
The fourth test series, listed in table 8.3, was accomplished on a level that was built to test the limits of our game. The idea behind creating this level was to find the borders within which our default planner (SGPlan 5.22) was still able to operate.
As visible from the table, the extensive number of predicates significantly slowed down all planners. It had the most visible impact on our slowest planners, the MIPS-XXL pair. It is worth noting that SGPlan 5.22 is barely inside our arbitrarily selected planning time limit of 8 s that we use in the game. Any further increase in the number of predicates may render planning unstable on this level using the default planner.
We omitted it from table 8.3, but it is an interesting fact that without the above mentioned requirements (section 8.3.10) SGPlan version 5.22 produces a path with a length of 98 actions. Version 6 has no such behavior; it produces plans of length 98 consistently.
8.5 Conclusion
Using the above tests we discovered that there is an important difference between unfulfilled preconditions and negative goals – translated to our specific case, between a locked door (an unfulfilled precondition) and a trapped room (a negative goal requirement). For a human it seems obvious that in our domain there is no way to negate an already set "observed" predicate, so if the final goals require the falseness of such a predicate, the plan should never use an action producing such a result. Finding irrefutable effects is not even a hard algorithmic task; however, the creators of our tested planners did not implement it. We had to be aware of this curious fact while developing our game.
Another important point clearly visible from our tests is that the tested planning systems do not produce an optimal plan (based on the number of actions). Each one of them works in a slightly different way, and we have to be aware of which one we used at design time and in online planning.
From the tests we also conclude that algorithms implementing heuristic best-first search are the most fitting for our problem type.
9. Player responses
Several testers were asked to play through at least the first 6 tutorial levels, then continue with other challenges of their choice. They had to tell whether they found the game enjoyable and what they would add or change to make it better. Each tester saw an updated version of the program, changed according to the quickly implementable suggestions of previous testers.
All of them reported that the game was in principle enjoyable. Their detailed remarks can be read below, and at the end of the chapter we provide a brief overview of their performance.
9.1 Tester A
Our first tester was a 24 year old male, a casual gamer with no developer background. He said that he enjoyed the game and suggested the following improvements:
1. There should be a penalty system for actions made by the burglar, so a shorter but (in player penalties) more expensive path could also be appreciated.
2. A high score list would be a great addition to the game.
3. Depending on the level difficulty, guards should sometimes possess keys to the lockable rooms.
4. While the game is paused to replan, it should show a notification counting down from the maximal possible planning time towards zero.
5. After leaving a level, then returning to it, the game should not restore the previous state of the level, especially if the level was already lost.
Requests 1 and 3 are under further consideration; they are interesting, but they require further user testing.
Request 5 has been implemented.
Features 2 and 4 were added to the future works list.
9.2 Tester B
The second subject was a 28 year old male working as a professional game programmer. He enjoyed the game and gave the following remarks:
1. The notification window should be more visible against the background.
2. The cursor on the game area should disappear when the cursor enters a button.
3. There was a bug on level 3 that made it impossible to solve.
4. The locked doors should be marked visually.
5. The levels should have a title instead of a filename.
6. The "Show Intent" button should work even if the agent is moving.
Change requests that we have implemented are: 1, 3, 4, 5 (partially implemented; file extensions are hidden in the updated version) and 6.
Feature 2 goes onto the future works list.
9.3 Tester C
The third subject was a 25 year old female with little game experience. She gave the following remarks:
1. When the game is in motion, the viewpoint should center on the burglar and follow him.
2. There should be no need to explicitly click on the Cancel button in the context menu.
3. There should be more interesting graphics and backgrounds.
Requests 1 and 2 were added to the future works list; improved graphics would require a larger development team and members with graphics skills.
9.4 Tester D
The fourth subject was a 27 year old male casual gamer. He gave the following remarks:
1. There should be a way to zoom in and out.
2. Extra player actions would be interesting, for example throwing a banana peel, which is cheaper than dazing the guard but requires foresight.
3. It should be visible what the agents are carrying with them.
4. A random map generator would be useful.
5. Agents should have viewing angles instead of room level detection.
6. There should be a way to temporarily immobilize the burglar while the guards move along.
Features 1, 2, 3 and 4 go onto the future works list. Request 2 is particularly interesting; it would greatly improve the gameplay. Request 4 was already on our future works list. The fifth request, to implement viewing angles, is hard to fulfill using planners; we would require a complementary technology to accomplish it.
9.5 Tester E
The fifth subject was a 25 year old male casual gamer with a programming background. He gave the following remarks:
1. The red activity indicator on cameras should be more visible, even if the image is darkened to symbolize that the burglar does not know about it.
2. The notification sequences should be skippable.
3. The speed settings of the last level should not carry over to the next one, nor should any opened context menu.
4. It should be mentioned in the tutorial that if a guard is dazed, he won't awaken for the rest of the level.
Features 3 and 4 have been implemented.
9.6 Tester F
The sixth subject was a 26 year old female with little gaming experience. She gave the following remarks:
1. There is a bug causing a plan failure if the agent changes its intentions in a door to return to the previous room.
2. It would improve the game experience if there were an always visible control panel for the agent and object interaction.
3. Entering a new level, the screen does not jump to the center of the new map; it stays where we left it on the previous level.
Fixes 1 and 3 have been implemented; feature 2 did not seem to be necessary.
9.7 Player performance
All of our testers managed to solve the tutorial levels. The primary reasons for level restarts were learning the controls and mistakes caused by lapses of attention. After they had completed the tutorials and understood the available user actions, they managed to minimize the restart count. This is partially caused by the character of the game and of the levels we created, both by hand and with the level generator tool.
In our program, if discovered in time, nearly every dangerous situation can be reverted: guards and cameras can be turned off, even if with great point penalties. With the only exception of the agent giving up, every trap in our game can be neutralized. This feature and the levels' modular build allow the players to instinctively break a level into subproblems and solve them separately.
In our view, the only major difference between the players' performances was in the preferred method of solving a level.
Players B, C and F chose a systematic planning approach. They examined the whole map and set up all their changes on the level before letting the agents perform their plans.
Players A, D and E selected a more interactive approach. They started the command execution and made their changes a few steps ahead of the burglar, just as he closed in on a dangerous situation. The difference between the two strategies was most visible on the third tutorial level. Players planning ahead locked the guards into rooms with 3 doors, preventing them from interfering with the burglar's plans in any way. This cost them 1200 penalty points. In theory the players delaying their decisions could have solved the level with fewer changes, but in the end they closed more than 6 doors to avoid a collision.
10. Conclusion
We set out to test our hypothesis that it is possible to write a computer game in
which the agents’ decision making is purely based on readily available planning
systems. We also tried to integrate the same tools into the development process
in the form of level design.
While using planners in agent control we had to realize how completely different an approach these tools require compared to other, more traditional agent systems; this, however, does not mean that they are incapable of filling the role. Their strength is in the rational decision making they were built to perform. As opposed to finite state machines or decision trees, a planner requires only a goal and a set of possible actions. They are not dependent on the capabilities of the programmer who wrote the agent, and they seamlessly adapt even to changing environments.
On the other hand, compared to the above mentioned tools, planners also have a characteristic weakness: poor scalability. With increasing layout sizes the length of the planning process increases exponentially and quickly reaches a point where letting the player wait any longer is not a viable option. Another manifestation of poor scalability is the increasing visibility of suboptimal action sequences. The developer has to be aware of these conditions (our largest game levels are included to demonstrate them) and, if required, take action by either introducing a multi-layered planning system or trying to reduce the set of details that is sent to the planner.
Finally, we have to conclude that while planning certainly has its own pitfalls, we managed to apply it nearly seamlessly into the gameplay (see chapter 5).
In our type of game, designing levels is strongly connected to the field of agent control, so we met all the advantages and pitfalls of planners mentioned above. The ability to produce, in a single planning run, a full action sequence for the burglar agent containing all the actions he intends to execute greatly simplified our work. On the other hand, generating traps for a larger layout easily takes several minutes. In that time interval a human designer with a good editor tool can manage the same job, probably producing an even better, more convincing result.
We also have to note that inherent characteristics of our tested planning systems prevent us from passing several important decisions directly to the planner. A large part of our work on the level creator tool was finding translations of such decisions into planner problems. For example, these systems cannot solve two mutually exclusive goals simultaneously in a Minimax fashion. This simple fact in our early experiments resulted in burglars running into the hands of guards.
To summarize, our expectations were partially fulfilled. We found it possible to use planners in the design process; however, our results are not on an equal level with the work of a human.
The list of considered planning tools that truly fulfilled our requirements is only 4 entries long, and to our knowledge they are available only on UNIX-type systems.
The user response we received was positive; all of the players reported that they enjoyed exploring their possibilities in the game environment. Unfortunately we had no long term testers, so we do not know how long their interest would be sustained. We should not forget that this program is only a demonstration of a concept. To be truly successful it would need many further improvements, some of which (advised by the testers) can be found in the Feature requests section.
10.1 Future works
Here we mention some features that we would have liked to implement, but limitations in time and human resources prevented us from doing so.
While working on the agent behavior, the most interesting questions we met were connected to the replanning rules of the game characters. It would be a great addition to the program if we could improve the plan optimizing methods of our agents (described in sections 5.8.2 and 5.8.4). We implemented and experimented with only a well selected group of methods that we considered to be the most promising, but there are still others to test (for example a more direct integration of PDDL actions into our game world, with a direct representation of their preconditions and results).
In the gameplay we used the performance measure of traditional planning systems to measure the quality of our action sequences. It would be interesting to define a performance measure for human-like agents where, for example, exploring an area and finding alternative solutions to a problem would be a virtue, as opposed to our measurements.
While developing agent control we also stumbled upon the problem of controlling multiple agents in cooperation. Managing a group of guards to effectively surround the burglar, who is supported by the player, would be a great challenge, but unfortunately we touched the problem only marginally and ended up using agents that were hardly aware of each other. The task, however, is still interesting and could probably be a good ground for a future thesis.
Using planners to partially generate a game level is a nice feature, but we could have followed automation a step further. Several commercial games use randomly generated levels. It is a long standing and well developed practice, and for this very reason it was not interesting to us. But used in cooperation with our level creation methods, it would leave hardly any work for the human designer but some final touches. The developer could select the general properties of the layout, then let it be generated and forwarded to our program that places the challenges on it.
10.1.1 Alternative problem definition
The following section is strongly based on a technical report titled “Burglar-Game: Planning Used In A Level Design And During A Gameplay” [5], written in cooperation with Rudolf Kadlec.
In our problem definition a player typically needs several actions to neutralize a trap, but on the other side a whole series of traps might be solved by a single clever move.
We could propose an alternative, more restricted definition of F′ than we did in chapter 4.
First we define an alternative to H. H′(a, P, Sburglar) = S′burglar models a situation where the player executes an action a ∈ A and, due to the effects of a, the burglar can no longer follow its current plan P; hence he stops and updates its world state with the previously unknown effects of a, which results in the new state S′burglar.
Let for n ∈ N, n ≥ 1, F′ fulfill the following condition:

Sburglar ∈ F′(Sreal, n) ≡ (4.1) ∧
    ∃a ∈ A : H′(a, P, Sburglar) = S′burglar ∧    (10.1)
    S′burglar ∈ F′(Sreal, n − 1) ∧               (10.2)
    ∀k < n − 1 : S′burglar ∉ F′(Sreal, k)        (10.3)

For n = 0 we define F′ as:

Sburglar ∈ F′(Sreal, 0) ⇐⇒ ∃P : ¬flawed(P, Sburglar) ∧ ¬flawed(P, Sreal)    (10.4)
This definition shares Condition 4.1 with the definition of F. Then it requires that there is a single user action that makes the burglar stop and update its world state to S′burglar (Condition 10.1); given S′burglar, the burglar can choose a plan with n − 1 pitfalls (Condition 10.2), but not fewer (Condition 10.3). When the recursion reaches its end, we require that the burglar can choose a plan that solves the goal not only in his belief base, but also in the real state of the world (Condition 10.4).
If we compare F and F′, F requires the player to make at least one action, whereas F′ requires a full sequence of n player actions. On the other hand, F is easier to compute and leaves more freedom for alternative solutions. In the future we want to at least attempt implementing F′, but so far our generator uses the definition F.
10.1.2 Feature requests
The features requested by the testers (in chapter 9) for possible future implementation fall into three categories: interface, gameplay experience, and additional features.
On the interface it would be useful to use a full featured game engine that implements sub-windows in a standard manner. This would enable us to make the cursor change seamlessly between the windows, and the cursor on the game area would disappear when moved over a notification. With such an environment there would be no need to explicitly click on the Cancel button in the context menu.
The following features in the field of game experience seemed nice, and would be strongly advised in a full fledged commercial game, but in our proof of concept implementation they seemed less relevant, so they were left for future improvements. When the game is in motion, the viewpoint should center on the burglar and follow him, so the player could always follow the most relevant part of the game area; the possibility to zoom in and out would also allow a better overview of the game events. In the current version of the game some details (for example the contents of the agents' inventory or the keys that unlock certain objects) are only visible through the status line (section B.3). It would be nice to introduce visible clues representing such details. Finally, waiting for the replanning process might be nerve wracking to the user; it would be nice to at least hint at the maximal time they need to wait for the game to continue.
Into the last category fall improvements that would make the game more interesting and longer lasting fun for the player; for example, an increased set of actions the user is capable of executing would add variability. The most interesting idea was throwing a banana peel that would be cheaper than dazing a guard. It would work as a kind of landmine that is "dangerous" to both the burglar and the guards, and would carry the additional challenge of predicting which exact tile an agent would use to get to a certain place. Finally, adding a high score list to the menu would help to compare game results between multiple players.
Bibliography
[1] Vrakas, Dimitris and Ioannis Vlahavas. Artificial intelligence for advanced problem solving techniques. Hershey, New York, 2008, 388 pages. ISBN 1-599-04705-5. Chapter 6.
[2] Newell, Allen, John C. Shaw and Herbert A. Simon. Report on a general
problem-solving program. Proceedings of the International Conference on
Information Processing, 1959. pages 256 – 264.
[3] Franklin, Stan. Artificial minds. Fifth printing. Cambridge: MIT Press,
Massachusetts, 1995, 449 pages. ISBN 0-262-56109-3.
[4] Ghallab, Malik, Dana S Nau and Paolo Traverso. Automated planning:
theory and practice. San Francisco: Morgan Kaufmann, 2004, 635 pages.
ISBN 15-586-0856-7.
[5] Kadlec R., Cs. Tóth, D. Toropila and C. Brom. Burglar-Game: Planning Used In A Level Design And During A Gameplay, unpublished technical
report written for ICAPS 2012
[6] Li, Boyang and Mark O. Riedl. An Offline Planning Approach to Game Plotline Adaptation. Proceedings of the Sixth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE 2010, Stanford, California, USA, October 11 – 13, 2010. pages 96 – 101.
[7] Porteous, Julie and Mark Cavazza. Controlling Narrative Generation
with Planning Trajectories: The Role of Constraints. Interactive storytelling: second joint international conference on interactive digital storytelling, ICIDS 2009, Guimaraes, Portugal, December 9 – 11, 2009, proceedings. New York: Springer, 2009, ISBN 978-3-642-10642-2. pages 234 – 245.
[8] Porteous, Julie, Jonathan Teutenberg, David Pizzi and Marc Cavazza. Visual Programming of Plan Dynamics Using Constraints and Landmarks. Proceedings of the 21st International Conference on Automated Planning and Scheduling, ICAPS 2011, Freiburg, Germany June 11-16, 2011,
pages 186 – 193.
[9] Pizzi, David, Marc Cavazza, Alex Whittaker and Jean-Luc Lugrin.
Automatic generation of game level solutions as storyboards. Proceedings of
4th Artificial Intelligence and Interactive Digital Entertainment Conference.
Vol. 8, 2008. pages 96 – 101.
[10] Edelkamp, Stefan and Jörg Hoffmann. PDDL2.2: The Language for the
Classical Part of the 4th International Planning Competition. Technical Report No. 195, Institut für Informatik, 2004.
[11] Orkin, Jeff. Applying Goal-Oriented Action Planning to Games. AI Game
Programming Wisdom, Charles River Media, Vol. 2, 2003, ISBN 1-58450-289-4. pages 217 – 227.
[12] Orkin, Jeff. Three States and a Plan: The A.I. of F.E.A.R. In Proceedings of the Game Developers Conference (GDC), 2006.
[13] Hoang, Hai, Stephen Lee-Urban and Héctor Muñoz-Avila. Hierarchical Plan Representations for Encoding Strategic Game AI. Proceedings of the 1st Artificial Intelligence and Interactive Digital Entertainment Conference, Department of Computer Science & Engineering, Lehigh University, Bethlehem, PA, USA, 2005.
[14] Muñoz-Avila, Héctor and Hai Hoang. Coordinating Teams of Bots with Hierarchical Task Network Planning. AI Game Programming Wisdom, Vol. 3, 2006.
[15] Bartheye, Olivier and Éric Jacopin. A Real-Time PDDL-Based Planning Component for Video Games. Proceedings of the 5th AIIDE, AAAI Press, Menlo Park, 2009.
[16] Bartheye, Olivier and Éric Jacopin. Connecting PDDL-Based Off-the-Shelf Planners to an Arcade Game. Workshop on AI in Games, ECAI, Vol. 8, 2008.
[17] Vassos, Stavros and Michail Papakonstantinou. The SimpleFPS Planning Domain: A PDDL Benchmark for Proactive NPCs. Workshops at the Seventh Artificial Intelligence and Interactive Digital Entertainment Conference, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Greece, 2011.
[18] Fikes, Richard E. and Nils J. Nilsson. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence, Vol. 2, North-Holland Publishing Company, 1971. pages 189–208.
[19] Kautz, Henry and Bart Selman. Unifying SAT-based and Graph-based Planning. Proceedings of the International Joint Conference on Artificial Intelligence, Stockholm, 1999.
[20] Hoffmann, Jörg. FF: The Fast-Forward Planning System. AI Magazine, Vol. 22, No. 3, Institute for Computer Science, Albert Ludwigs University, Freiburg, Germany, 2001. pages 57–62.
[21] Coles, Andrew and Amanda Smith. Marvin: A Heuristic Search Planner with Online Macro-Action Learning. Journal of Artificial Intelligence Research, Vol. 28, Department of Computer and Information Sciences, University of Strathclyde, Glasgow, UK, 2007. pages 119–156.
[22] Bonet, Blai and Héctor Geffner. HSP: Heuristic Search Planner, 1998.
[23] Gerevini, Alfonso, Alessandro Saetti and Ivan Serina. An Approach to Temporal Planning and Scheduling in Domains with Predictable Exogenous Events. Journal of Artificial Intelligence Research, Vol. 25, Dipartimento di Elettronica per l'Automazione, Università degli Studi di Brescia, Italy, 2006. pages 187–231.
[24] Coles, A., M. Fox, D. Long and A. Smith. A Hybrid Relaxed Planning Graph–LP Heuristic for Numeric Planning Domains. Proceedings of the 18th International Conference on Automated Planning and Scheduling, Department of Computer and Information Sciences, University of Strathclyde, Glasgow, UK, 2008.
[25] Xing, Zhao, Yixin Chen and Weixiong Zhang. MaxPlan: Optimal Planning by Decomposed Satisfiability and Backward Reduction. Proceedings of the 5th International Planning Competition, International Conference on Automated Planning and Scheduling, Department of Computer Science and Engineering, Washington University in St. Louis, USA, 2006. pages 53–56.
[26] Hoffmann, Jörg. The Metric-FF Planning System: Translating "Ignoring Delete Lists" to Numeric State Variables. Journal of Artificial Intelligence Research, Vol. 20, Institut für Informatik, Freiburg, Germany, 2003. pages 291–341.
[27] Edelkamp, Stefan and Shahid Jabbar. MIPS-XXL: Featuring External Shortest Path Search for Sequential Optimal Plans and External Branch-and-Bound for Optimal Net Benefit. Proceedings of the 6th International Planning Competition, Sydney, Australia. Faculty of Computer Science, TU Dortmund, Germany, 2008.
[28] Chen, Yixin, Chih-Wei Hsu and Benjamin W. Wah. SGPlan: Subgoal Partitioning and Resolution in Planning. In Edelkamp et al. (eds.), booklet of the 4th International Planning Competition, Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois, Urbana-Champaign, IL, USA, 2004. pages 30–32.
[29] Russell, Stuart J. and Peter Norvig. Artificial Intelligence: A Modern Approach. 2nd ed. Upper Saddle River, New Jersey: Prentice Hall, 2003, 1081 pages. ISBN 0-13-790395-2. Chapter 2.
[30] Kučerová, L., Brom, C. and Kadlec, R. Towards Planning the History of a Virtual Agent. In Proceedings of the ICAPS'10 Workshop on Planning in Games, 2010.
List of Tables
table 8.1 Planner results of test series 1
table 8.2 Planner results of test series 2
table 8.3 Planner results of test series 4
List of Abbreviations
ADL Action Description Language (a formal language for automated planning
systems)
API application programming interface
FF Fast-Forward
FPS First Person Shooter
GOAP Goal-Oriented Action Planning
HSP Heuristic Search Planner
HTN Hierarchical Task Network
IPC International Planning Competition
Londex Long-Distance Mutual Exclusion
LPG Local search for Planning Graphs
LPRPG Linear Program Relaxed Planning Graph
MIPS Model Checking Integrated Planning System
PDDL Planning Domain Definition Language
SGPlan Subgoal Partitioning and Resolution in Planning
STRIPS Stanford Research Institute Problem Solver
A. Attachments
The thesis should be accompanied by a CD with the following contents:
• README.TXT – content guide
• thesis.pdf – the text of the master thesis
• burglar-game/ – source code of the game, together with the documentation
and the thesis sources
• burglar-game-1.0-SNAPSHOT/ – the compiled version of the program
• planner tests/ – planning systems, test domain, test problems, brief test
results
B. User documentation – game
In this appendix we describe the buttons and methods the player may use to
interact with the game.
B.1 Running the game
Our program requires a Java runtime environment; it can be started with the
following command:
run.sh
It has five possible command line parameters:
-width <number> Sets the screen width in pixels. It can be omitted.
-height <number> Sets the screen height in pixels. It can be omitted.
-rate Displays the current frame rate of the program. It can be omitted.
-replan_on_knowledge Instructs the agents to replan after gathering new knowledge.
-resources <string> Sets the resource folder path, where the program looks for
the maps, image resources, sounds, and planning tools. It can be omitted;
the default value is “./resources/”.
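For example, the following invocation (the concrete values are only illustrative)
starts the game in a 1024 × 768 pixel window with the frame rate counter
displayed, using the default resource folder:

run.sh -width 1024 -height 768 -rate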
B.2 Adding new levels to the game
For the game to find them, the level files must be placed into the map directory
on the resources path.
B.3 Status line
The lower left corner contains a constantly updating status line that provides
important additional information about the object the cursor is hovering over.
Some of the details displayed here are not available anywhere else in the graphical
environment (for example the identifier of a door’s key, or the contents of a container).
Its format is:
[<planning state>, <running state>] | [penalties: <penalties>] [x, y] |
objectType(<object id>) [inventory or significant properties], ...
Where
• planning state indicates whether the gameplay is running or is paused to
replan agent behavior.
• running state indicates whether the game is running or has been paused by
the player.
• penalties is the number of penalty points gathered so far on the level.
• [x, y] are the cursor coordinates on the layout.
• objectType(<object id>) gives the type and the program-wide unique identifier of the object.
• [inventory or significant properties] lists the contents of the object’s inventory or the relevant details of the object referred to by the object id.
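As an illustration, a status line such as the following (all values and identifiers
here are made up for the example):

[planning, running] | [penalties: 150] [12, 7] | door(door_12) [key_3]

would mean that the cursor rests at coordinates [12, 7] over the door door_12,
which is unlocked by the key key_3, while 150 penalty points have been gathered
on the level so far.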
B.4 Buttons and shortcuts
The lower right corner of the screen contains what we call the button panel, with
the following control buttons:
• Inc Distance – Increases the length of the visualized section of the currently selected agent’s plan. It costs 20 penalty points to use this button.
• Dec Distance – Decreases the length of the visualized section of the currently selected agent’s plan.
• Speed up – Increases the game speed. Its keyboard shortcut is +.
• Slow down – Decreases the game speed. Its keyboard shortcut is -.
• Pause/Continue – Pauses or continues the execution of agent plans. Its
keyboard shortcut is {Space Bar}.
• Restart – Restores the current level to its starting state.
• Leave – Returns the player to the main menu.
• Exit – Quits the whole program. Its keyboard shortcut is {Escape}.
• {Direction Keys} – Moves around on the game area.
B.5 Context menu
Clicking on a game tile that contains one or more objects the player can interact
with brings up the context menu with the list of available actions.
• Open – If the selected position contains a door, or a container, opens it.
It costs 100 penalty points to use this button.
• Close – If the selected position contains a door, or a container, closes it.
It costs 100 penalty points to use this button.
• Lock – If the selected position contains a door, or a container, locks it. It
costs 100 penalty points to use this button.
• Unlock – If the selected position contains a door, or a container, unlocks
it. It costs 100 penalty points to use this button.
• Close + Lock – If the selected position contains a door, or a container,
closes, and locks it. It costs 200 penalty points to use this button.
• Activate – If the selected position contains a camera or a vending machine,
it gets activated. Activating a camera costs 500 penalty points, while activating a vending machine costs 50.
• Deactivate – If the selected position contains a camera, it gets deactivated.
It costs 500 penalty points to use this button.
• Show Intent – If the selected position contains an agent, visualizes its
intent line. It costs 100 penalty points to use this button.
• Hide Intent – If the selected position contains an agent, it hides its intent
line. It costs 0 penalty points to use this button.
• Daze – If the selected position contains a guard, it gets stunned, and won’t
recover until the end of the level. It costs 1000 penalty points to use this
button.
• Cancel – Closes the context menu.
C. User documentation – level
generator
C.1 Running the generator
The generator is a purely console application without graphical output; it can be
started with the following command:
run.sh -generator -m <STRING> -l <STRING> -t <NUMBER>
It has four possible command line parameters:
-generator Informs the program that the user wants to run the level generator,
not the game itself.
-m <string> Sets the path to the input map file where the starting layout can
be found. If no input is defined, tries to receive it from standard input.
-l <string> Sets the path to the output level file where the resulted layout
should be written. If no output is defined, sends the result to standard
output.
-t <number> Sets the required number of pitfalls to be added.
C.2 Usage example
run.sh -generator -m input_map.xml -l output_level.xml -t 3
This opens the input_map.xml file and attempts to add 3 pitfalls to the burglar
agent’s path; finally it writes the resulting map to the output_level.xml file.
C.3 Level files
Technically, a layout file and a level file look the same; the only difference is
the presence of traps. In addition, a human developer can easily modify the
output of the level generator to add player hints, traps, or any other
objects.
The levels and maps are stored as XML files in the preset map directory; their
exact format is described in the “map.dtd” file.
To work correctly, these files have to fulfill the following additional requirements:
the world layout must contain at least one room, a single burglar agent,
and a treasure to collect; every other object is optional. No two objects may
share the same unique identifier, except for objects in the real world and their
counterparts in the agents’ beliefs; conversely, these beliefs cannot contain game
objects that do not exist in the true layout. A sketch of such a file follows below.
For the generator to find them, the level files must be placed into the map
directory on the resources path.
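Since we cannot reproduce “map.dtd” here, the following fragment is only a
hypothetical sketch of how a minimal level file could be organized; the real
element and attribute names are the ones defined in “map.dtd”:

<!-- hypothetical sketch, not the authoritative syntax from map.dtd -->
<map>
  <room id="room_1">
    <burglar id="burglar_1" x="2" y="3"/>  <!-- the mandatory burglar agent -->
    <container id="chest_1" x="5" y="3">
      <treasure id="treasure_1"/>          <!-- the mandatory treasure -->
    </container>
  </room>
</map>

Note how even this sketch respects the requirements above: at least one room, a
single burglar, and a treasure, each with a unique identifier.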
D. PDDL domain
The final unified domain (agent.pddl) used in planning:
Listing D.1: domain file agent.pddl
; agent domain:
; Csaba Toth
(define (domain agent)
  (:requirements :strips :typing :adl :preferences) ; for sgplan

  (:types
    t_burglar t_dog t_guard    - t_agent
    t_key t_treasure t_uniform - t_item
    t_agent t_container        - t_inventory
    t_container t_door         - t_lockable
    t_camera t_container
    t_phone t_switch
    t_vender t_door
    t_floor t_passiveagent
    t_room                     - t_position
  )

  (:predicates
    (closed ?position - t_lockable)
      ; a lockable position is closed
    (locked ?position - t_lockable)
      ; a lockable position is locked
    (needs_key ?lockable - t_lockable ?key - t_key)
      ; the key needed to unlock the lockable position
    (agent_near ?agent - t_agent ?position - t_position)
      ; the agent stands next to the position
    (agent_in ?agent - t_agent ?room - t_room)
      ; the agent is in the room
    (position_in ?position - t_position ?room - t_room)
      ; the position is in the room
    (door_connects ?door - t_door ?room1 - t_room ?room2 - t_room)
      ; the door connects the two rooms together
    (contains ?inventory - t_inventory ?item - t_item)
      ; the container contains the item
    (visited ?agent - t_agent ?room - t_room)
      ; the agent has already visited the room
    (observed ?agent - t_agent)
      ; the agent has been observed by a guard or a camera
    (room_observed ?room - t_room)
      ; the room is being observed by a guard or a camera
    (position_used ?agent - t_agent ?vender - t_vender)
      ; the agent used a vender
  )

  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
  ;; actions:
  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

  ;;;; move - leaves a position and goes to another one in the same room
  (:action move
    :parameters (
      ?agent - t_agent
      ?start - t_position
      ?aim   - t_position
      ?room  - t_room
    )
    :precondition (and
      (agent_in ?agent ?room)
      (position_in ?aim ?room)
      (agent_near ?agent ?start)
      (not (agent_near ?agent ?aim))
    )
    :effect (and
      (agent_near ?agent ?aim)
      (not (agent_near ?agent ?start))
    )
  )

  ;;;; enter_observed - crosses the door to an observed room without disguise
  (:action enter_observed
    :parameters (
      ?agent     - t_agent
      ?door      - t_door
      ?room_from - t_room
      ?room_to   - t_room
    )
    :precondition (and
      (agent_in ?agent ?room_from)
      (door_connects ?door ?room_from ?room_to)
      (agent_near ?agent ?door)
      (not (closed ?door))
    )
    :effect (and
      (agent_in ?agent ?room_to)
      (not (agent_in ?agent ?room_from))
      (visited ?agent ?room_to)
      (visited ?agent ?room_from)
      (observed ?agent)
    )
  )

  ;;;; enter_unobserved - crosses the door to an unobserved room or in disguise
  (:action enter_unobserved
    :parameters (
      ?agent     - t_agent
      ?door      - t_door
      ?room_from - t_room
      ?room_to   - t_room
      ?uniform   - t_uniform
    )
    :precondition (and
      (agent_in ?agent ?room_from)
      (door_connects ?door ?room_from ?room_to)
      (agent_near ?agent ?door)
      (not (closed ?door))
      (or
        (contains ?agent ?uniform)
        (not (room_observed ?room_to))
      )
    )
    :effect (and
      (agent_in ?agent ?room_to)
      (not (agent_in ?agent ?room_from))
      (visited ?agent ?room_to)
      (visited ?agent ?room_from)
    )
  )

  ;;;; open - opens a lockable position
  (:action open
    :parameters (
      ?agent    - t_agent
      ?lockable - t_lockable
    )
    :precondition (and
      (agent_near ?agent ?lockable)
      (closed ?lockable)
      (not (locked ?lockable))
    )
    :effect (and
      (not (closed ?lockable))
    )
  )

  ;;;; close - closes a lockable position
  (:action close
    :parameters (
      ?agent    - t_agent
      ?lockable - t_lockable
    )
    :precondition (and
      (agent_near ?agent ?lockable)
      (not (closed ?lockable))
    )
    :effect (and
      (closed ?lockable)
    )
  )

  ;;;; unlock - unlocks a lockable position
  (:action unlock
    :parameters (
      ?agent    - t_agent
      ?lockable - t_lockable
      ?key      - t_key
    )
    :precondition (and
      (agent_near ?agent ?lockable)
      (locked ?lockable)
      (contains ?agent ?key)
      (needs_key ?lockable ?key)
    )
    :effect (and
      (not (locked ?lockable))
    )
  )

  ;;;; lock - locks a lockable position
  (:action lock
    :parameters (
      ?agent    - t_agent
      ?lockable - t_lockable
      ?key      - t_key
    )
    :precondition (and
      (agent_near ?agent ?lockable)
      (not (locked ?lockable))
      (contains ?agent ?key)
      (needs_key ?lockable ?key)
    )
    :effect (and
      (locked ?lockable)
    )
  )

  ;;;; pick_up - picks an item from a container
  (:action pick_up
    :parameters (
      ?agent     - t_agent
      ?container - t_container
      ?item      - t_item
    )
    :precondition (and
      (agent_near ?agent ?container)
      (not (closed ?container))
      (contains ?container ?item)
    )
    :effect (and
      (not (contains ?container ?item))
      (contains ?agent ?item)
    )
  )

  ;;;; take_clothes - takes clothes from a passive agent in the same room
  (:action take_clothes
    :parameters (
      ?agent        - t_agent
      ?passiveagent - t_passiveagent
      ?uniform      - t_uniform
    )
    :precondition (and
      (agent_near ?agent ?passiveagent)
    )
    :effect (and
      (contains ?agent ?uniform)
    )
  )

  ;;;; use - uses a position
  (:action use
    :parameters (
      ?agent  - t_agent
      ?vender - t_vender
    )
    :precondition (and
      (agent_near ?agent ?vender)
    )
    :effect (and
      (position_used ?agent ?vender)
    )
  )
)
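To illustrate how this domain is instantiated, the following is a minimal,
hypothetical problem file compatible with it; the object names (hall, vault,
chest_1, and so on) are invented for this sketch and do not come from an actual
level of the game:

; hypothetical example problem for the agent domain
; (illustrative only; all object names are invented)
(define (problem example_level)
  (:domain agent)
  (:objects
    burglar_1       - t_burglar
    hall vault      - t_room
    door_1          - t_door
    start_pos       - t_floor
    chest_1 chest_2 - t_container
    key_1           - t_key
    treasure_1      - t_treasure
  )
  (:init
    ; the burglar starts in the hall
    (agent_in burglar_1 hall)
    (agent_near burglar_1 start_pos)
    (position_in start_pos hall)
    (position_in chest_1 hall)
    ; a locked door between the hall and the vault
    (position_in door_1 hall)
    (position_in door_1 vault)
    (door_connects door_1 hall vault)
    (closed door_1)
    (locked door_1)
    (needs_key door_1 key_1)
    ; the key lies in a chest in the hall, the treasure in the vault
    (contains chest_1 key_1)
    (position_in chest_2 vault)
    (contains chest_2 treasure_1)
  )
  (:goal (contains burglar_1 treasure_1))
)

A planner can solve this instance, for example, with the plan: move to chest_1,
pick_up key_1, move to door_1, unlock and open door_1, enter_observed into the
vault, move to chest_2, and pick_up treasure_1.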
E. Levels in planner tests
Figure E.1: Level “complex.xml” used in planning system test 1 (results listed in
table 8.1) and the solution generated by SGPlan 5.22
Figure E.2: Level “complex.xml” used in planning system test 2 (results listed in
table 8.2) and the solution generated by SGPlan 5.22
Figure E.3: Level “complex.xml” used in planning system test 3 where we marked
the room to avoid with a red X
Figure E.4: Level “map-chess-board.xml” used in planning system test 4 (results
listed in table 8.3)