UNIVERSITATEA BABEŞ-BOLYAI CLUJ-NAPOCA
FACULTATEA DE MATEMATICǍ ŞI INFORMATICǍ
SPECIALIZATION: COMPUTER SCIENCE (ROMANIAN LINE)
DIPLOMA THESIS
Machine Learning-based gameplay
Scientific supervisor
Prof. univ. dr. Czibula Gabriela
Graduate
Cîmpean Alexandru
2012
Table of contents
Introduction ................................................................................................................................... 3
I. Game perspective in Artificial Intelligence ............................................................................. 5
1.1 A general game description................................................................................................. 5
1.2 The context of video games ................................................................................................. 5
1.3 Artificial intelligence in video games ............................................................................... 13
II. Learning to play ..................................................................................................................... 17
2.1 Software agents: autonomous programs ......................................................................... 17
2.2 Machine learning: making computers act ....................................................................... 22
2.3 Embedding machine learning in computer games ......................................................... 23
2.3.1 The history of machine learning and games............................................................. 23
2.3.2 What do the games have to learn? ............................................................................ 23
2.3.3 Creating a Learning Agent ........................................................................................ 25
2.3.4 Problems with Learning ............................................................................. 28
III. Relevant applications of machine learning in games ........................................................ 30
3.1 TD-Gammon ...................................................................................................................... 30
3.2 Samuel's checkers program .............................................................................................. 31
3.3 NeuroChess......................................................................................................................... 32
3.4 SAL...................................................................................................................................... 32
IV. “A Tale of Two Catapults” – machine learning techniques applied in games................ 34
4.1 Problem statement ............................................................................................. 34
4.2 Motivation .......................................................................................................................... 34
4.3 Analysis and design............................................................................................................ 35
4.3.1 Problem analysis ......................................................................................... 35
4.3.2 Solution’s design ......................................................................................................... 38
4.4 Implementation .................................................................................................................. 42
4.5 User manual ....................................................................................................................... 49
4.6 Results obtained ................................................................................................................. 51
4.7 Future work........................................................................................................................ 53
Conclusions .................................................................................................................................. 54
Bibliography ................................................................................................................................ 55
Introduction
Since the beginning of computer games, whether they were inspired by traditional games such
as chess, backgammon, and checkers, or whether they exploited the new opportunities of virtual
worlds to create new and innovative genres, the problem of an artificial player has been present.
The players of these video games wanted a program that knows how to play the game and will
play it with or against them, using skills similar to their own.
The first games had a component that would play against the human player, but it proved to be
very poor and limited compared to the players' demands. Later, in order to beat the game, some
games added a collaborative component that helped the player in situations that mostly could not
be handled alone. To build such programs, developers drew inspiration from a classical branch of
artificial intelligence, namely game theory. With this approach the programs used different
algorithms to choose the best moves based on various heuristics and classical moves. The
game-theoretic approach proved to be a good one until its limits were reached. It requires a
well-established strategy, and when the state space is large such programs run very slowly.
Another shortcoming of this method is its inability to adapt to new situations, which makes a
radical change of the game parameters difficult or even impossible.
Later, the concept of an agent, which we will explain in this work, introduced new possibilities
for programs that play games. Artificial intelligence in video games has thus grown according to
the need for better non-human players and has split into multiple types: traditional game playing,
mimicking intelligence and human player behavior, and approaches that adapt methods from
another branch of artificial intelligence – machine learning. Machine learning is a concept we
will explain further in this work; applied to games, it means that agents try to learn a game
without knowing its rules and without considering a strategy, based only on the feedback
received after playing repeated games against themselves.
In the first chapter of this work we will present theoretical aspects of video games in general,
artificial intelligence in video games, as well as different traditional techniques for building a
game-playing program.
In the second chapter we will present other techniques of artificial intelligence. First we will
describe agents and their characteristics; we will see what an agent needs in order to be
considered intelligent and what features are required; we will classify agents according to
different criteria and give examples of some of them. Further in this chapter we will present the
concept of machine learning; we will describe its subtypes and characteristics; we will present
the steps needed to build a learning algorithm; we will also describe the concept of learning to
play, give examples of learning methods used in game play, and see how they work.
In the third chapter of this work we will talk about the recent work in this domain and analyze
some of the most relevant learning games together with their characteristics.
The fourth chapter of this work is a practical application of machine learning in games. The
game is called "A Tale of Two Catapults": the player controls a catapult and, in order to win,
has to hit the opponent's catapult. He can choose the angle and initial velocity with which to
shoot a projectile at the opponent, who is situated at some distance. Furthermore, the wind can
blow from either direction and influence the trajectory of the projectile. To win the game the
player has to hit the opponent three times. The opponent is in fact a program designed to learn to
shoot after playing many games against itself. In the fourth chapter we will present all the steps
in developing such a program, including the design, the algorithms and methods used, the
difficulties met, and the results obtained.
I. Game perspective in Artificial Intelligence
1.1 A general game description
Below we have some definitions of games from different theoreticians and specialists in games:
"A game is a system in which players engage in an artificial conflict, defined by rules,
that results in a quantifiable outcome." (Katie Salen and Eric Zimmerman)[1]
"A game is a form of art in which participants, termed players, make decisions in order to
manage resources through game tokens in the pursuit of a goal." (Greg Costikyan)[2]
"A game is an activity among two or more independent decision- makers seeking to
achieve their objectives in some limiting context." (Clark C. Abt)[3]
"At its most elementary level then we can define game as an exercise of voluntary control
systems in which there is an opposition between forces, confined by a procedure and
rules in order to produce a disequilibrial outcome." (Elliot Avedon and Brian Sutton-Smith)[4]
"A game is a form of play with goals and structure." (Kevin J. Maroney)[5]
1.2 The context of video games
There are two principal reasons to continue to do research on games ... First, human fascination
with game playing is long-standing and pervasive. Anthropologists have catalogued popular
games in almost every culture....Games intrigue us because they address important cognitive
functions....The second reason...is that some difficult games remain to be won, games that people
play very well but computers do not. These games clarify what our current approach lacks. They
set challenges for us to meet, and they promise ample rewards [6]. - Susan L. Epstein, Game
Playing: The Next Moves
Computer game designer Chris Crawford attempted to define the term game using a series of
dichotomies:
• Creative expression is art if made for its own beauty, and entertainment if made for money.
• A piece of entertainment is a plaything if it is interactive. Movies and books are cited as
examples of non-interactive entertainment.
• If no goals are associated with a plaything, it is a toy. (Crawford notes that by his
definition, (a) a toy can become a game element if the player makes up rules, and (b) The
Sims and SimCity are toys, not games.) If it has goals, a plaything is a challenge.
• If a challenge has no "active agent against whom you compete," it is a puzzle; if there is
one, it is a conflict. (Crawford admits that this is a subjective test. Video games with
noticeably algorithmic artificial intelligence can be played as puzzles; these include the
patterns used to evade ghosts in Pac-Man.)
• Finally, if the player can only outperform the opponent, but not attack them to interfere
with their performance, the conflict is a competition. (Competitions include racing and
figure skating.) However, if attacks are allowed, then the conflict qualifies as a game.[7]
Video Games encompass a range of electronic games, including coin-operated arcade games,
console and cartridge games, various handheld electronics, and games on diskette and CD-ROM
ranging in sophistication from simple geometric shapes to virtual reality programs with
movielike qualities. [8]
Most games fall within a particular category. Some incorporate different gaming styles and,
thus, could appear under more than one category simultaneously. Others pioneer new approaches
to electronic entertainment and often fall outside of any pre-conceived genre. If such a title
becomes popular enough that others try to duplicate the experience, a new genre comes into
being.[9]
Below is a list of some of the most common video game genres with a brief description and some
examples of games that fit within that category. This list is not comprehensive and is limited to
larger classifications. It is easy to classify particular titles much more narrowly and, thus, create
dozens of genres and/or sub-genres. This is an attempt to give a broader perspective of types of
video games.[9]
Shooter:
One of the oldest genres of video game is the classic shooter. It has roots in the early 60s with
Steve Russell's Spacewar! Shooters are games that require the player to blow away enemies or
objects in order to survive and continue gameplay. They usually fall into one of two categories:
1) horizontal, or 2) vertical. However, like Spacewar!, Star Castle, and Asteroids, there are
shooters that are neither horizontal nor vertical. These involve moving around the screen and
shooting in whatever direction necessary to keep from being destroyed. Other classic examples
include Defender, Galaga, R-Type, Phoenix, Space Invaders, Tempest, Xevious, and Zaxxon.
First-Person-Shooter (or FPS):
This is an example of a sub-genre that has grown enough to become its own genre. In fact,
because of the prevalence of these games, many people use the term "shooter" to refer to first-person-shooters. These games are real-time, fast-paced action games in which the player navigates
an environment from a first-person perspective and, usually, blows everything and everyone
away whenever possible. Though Wolfenstein 3D is regarded as the first successful example of
this genre, it wasn't until the release of Doom that people began to recognize the true potential of
this type of gaming. Doom enabled multiple game players to share in the same game
simultaneously via modem and LAN. This would become the standard of this genre, opening the
game format up to multi-player deathmatches that would become so important to the format that
some put little effort into story and the single-player experience in general (i.e., Unreal
Tournament and Quake III). Though this is a relatively new genre (since the early 1990s), it has
grown in popularity. Examples of first-person-shooter franchises include Wolfenstein 3D, Doom,
Duke Nukem 3D, Descent, Marathon, GoldenEye, Halo, Quake, and Time Splitters.
Adventure:
Another of the first video game genres, especially from the computer platforms, was the
adventure game. These were initially text-based games like Will Crowther's Colossal Cave and
the original Zork games. However, as the power of the gaming systems grew, developers tried to
tap into the visual capabilities of each consecutive platform. The Atari VCS offered a game
entitled Adventure. Roberta Williams began developing the King's Quest series for Sierra Online
in an attempt to add interactive graphics and point-and-click functionality to the more puzzle-oriented traditional text-based adventure. There has always been a strong following for this genre
because of the challenge of puzzle-solving and the general lack of violence. This has also made it
popular for many non-traditional gaming demographics. In recent years, LucasArts and Cyan
have been known for their contributions to the adventure genre. Other examples of adventure
franchises include Gabriel Knight, Indiana Jones, Maniac Mansion, Monkey Island, Myst, Police
Quest, and Syberia.
Platform:
It is believed that the platform genre began in 1981 with the release of the games Donkey Kong
and Space Panic. Games within this genre are usually identified by navigating environments that
require timing and jumping in order to reach a destination while avoiding and/or disposing of
enemies. Many of these, like Donkey Kong, have a series of screens, each with its own
individual pattern of challenges. As companies began to develop platform games for home
consoles and computers instead of arcade machines (i.e. Super Mario Bros for the Famicom and
Nintendo Entertainment System), they took advantage of the evolving processors and greater
memory capacity by transcending individual screens and utilizing actively side-scrolling worlds.
This evolutionary step in platform games moved them closer to immersive stories rather than
challenging puzzles. Platform video games continued to evolve as gaming became more 3D. One
of the greatest 3D platform games was introduced with the launch of the Nintendo 64 and was
called Super Mario 64. Examples of 2D screen-based platform franchises include Bubble
Bobble, Burgertime, Donkey Kong, Lode Runner, Mario Bros., and Space Panic. Examples of
2D scrolling platform franchises include Bonk, Donkey Kong Country, Sonic the Hedgehog,
Super Mario Bros., and Vectorman. Examples of 3D platform franchises include Crash
Bandicoot, Pac-Man World, Spyro the Dragon, and the aforementioned Super Mario 64.
Role-Playing Games (RPGs):
Evolving from pen-and-paper games like Dungeons and Dragons, RPGs are a special type of
adventure game that usually incorporate three major elements: 1) a specific quest, 2) a process
for evolving a character through experience to improve his/her ability to handle deadlier foes, 3)
the careful acquisition and management of inventory items for the quest (i.e., weapons, armor,
healing items, food, and tools). Having said that, these games still have many variations and
appearances.
Puzzle:
In many ways, puzzle video games are not dissimilar from traditional puzzles. What they offer
are unique environments that are not as easily introduced in one's living room. For example,
Wetrix enables the player to build up a series of walls that would be able to contain a deluge of
water when it falls. Successful completion of a level involves capturing enough water. Other
examples include Tetris, Intelligent Qube, Puzzle Bobble, Puyo Puyo, Devil Dice, and Mercury.
Simulations:
By their nature, simulations are attempts to accurately re-create an experience. These can be in
the form of management simulations like SimCity and Theme Hospital, or more hands-on
simulations like Microsoft Flight Simulator or Gran Turismo.
Strategy/Tactics:
Like simulations, strategy/tactics games attempt to capture a sense of realism for the game player
to experience. However, these titles are often turn-based as opposed to realtime and they give the
player a greater sense of specific control over a situation. Franchises that fall into this genre
include Ogre Tactics, Command and Conquer, Final Fantasy Tactics, and Worms.
Sports:
As you can imagine, sports games are those that simulate the playing of sports. Many of these
have incorporated novel aspects beyond the games themselves. For example, most football video
games like the Madden series enable the player to create and customize teams and play them for
an entire season. Furthermore, many sports games include management elements beyond the
games themselves. There is quite a bit of variety in this genre for fans of the games, the players,
and the behind the scenes responsibilities of owning a team.
Fighting:
These titles pit player against player (usually 2 players head-to-head) and involve one triumphing
over the other. Many of these games include a single player mode, but the real draw to this genre
is the ability to demonstrate one's gaming prowess against a friend. Examples of franchises in
this genre include Street Fighter, Soul Calibur, Mortal Kombat, Tekken, Virtua Fighter, Dead or
Alive, King of Fighters, and Bloody Roar.
Dance/Rhythm:
Dance Dance Revolution is probably the single largest franchise in this genre. Of the rest, many
require a specialized controller like DDR, but several don't. This grouping of games is
differentiated by the timed elements usually synched to music somehow. Other good examples of
this form include Parappa the Rapper, Bust a Groove, Gitaroo Man, Space Channel 5,
Frequency, Beatmania, Para Para Paradise, Donkey Konga, and Eyetoy Groove.
Survival Horror:
As the name suggests, these titles are an interactive evolutionary step of the horror genre. The
main gameplay mechanic in these is to "survive" the environment that includes fantastic or
supernatural elements that are very frightening and often disturbing. Many of these titles are
rated mature because they are not intended for younger audiences and often include graphic
scenes.
Hybrids:
It's important to recognize that many games are not limited to a single genre. Some are the
combination of two or more game types. In fact, as gaming evolves, we see lines blurred
between genres more frequently than not. Since the introduction of 3D gaming, the
action/adventure genre has grown dramatically. It is practically a catch-all category that
incorporates 3D games with real-time combat and puzzle-solving in a fairly cohesive storyline.
Many of these games are also first-person-shooters. Some are 3D platform titles. And most
survival horror titles qualify as Action/Adventure games too. Another example of a hybrid is
Myst. It is both an adventure game and a puzzle game. However, it is most certainly not an
Action/Adventure game. [9]
Below we have three examples of games that show the evolution of video games throughout the
decades:
Figure 1 – OXO (Noughts And Crosses) – made in 1952 by Alexander S. Douglas
Figure 2 - Super Mario Bros. – made in 1985 by Nintendo Co., Ltd
Figure 3 – Assassin’s Creed – made in 2007 by Ubisoft Entertainment S.A.
1.3 Artificial intelligence in video games
Artificial intelligence in video games can be implemented using two types of algorithms:
machine learning, which we will discuss in the next chapter, and search algorithms. Next we will
give a classification of search algorithms and describe them.
Search in artificial intelligence
Search plays a major role in solving many Artificial Intelligence (AI) problems. Search is a
universal problem-solving mechanism in AI. In many problems, the sequence of steps required to
solve them is not known in advance but must be determined by systematic trial-and-error
exploration of alternatives.
The problems that are addressed by AI search algorithms fall into three general classes:
single-agent path-finding problems, two-player games, and constraint-satisfaction problems [10].
Path-finding problems
Classic examples in the AI literature of path-finding problems are sliding-tile puzzles, Rubik's
Cube and theorem proving. The sliding-tile puzzles are common test beds for research in AI
search algorithms as they are very simple to represent and manipulate. Real-world problems
include the traveling salesman problem, vehicle navigation, and the wiring of VLSI circuits. In
each case, the task is to find a sequence of operations that map an initial state to a goal state.
Two-player games
These are two-player games of perfect information. Chess, checkers, and Othello are some
examples of such games.
Constraint-satisfaction problems
The Eight Queens problem is a classic example. The task is to place eight queens on an 8×8
chessboard such that no two queens are on the same row, column or diagonal. Real-world
examples of constraint-satisfaction problems are planning and scheduling applications.
Problem space
A problem space is the set of states, together with the connections between them, used to
represent the problem. A problem-space graph is used to represent a problem space. In the graph,
the states are represented by nodes, and the operators by edges between nodes. Although most
problem spaces correspond to graphs with more than one path between a pair of nodes, for
simplicity they are often represented as trees, where the initial state is the root of the tree. The
cost of this simplification is that any state that can be reached by two different paths will be
represented by duplicate nodes, thereby increasing the tree size. The benefit of using a tree is
that the absence of cycles greatly simplifies many search algorithms.
One feature that distinguishes AI search algorithms from other graph-searching algorithms is the
size of the graph involved. For example, the entire chess graph is estimated to contain over
10^40 nodes. Even a simple problem like the twenty-four puzzle contains almost 10^25 nodes. As a
result, the problem-space graphs of AI problems are never represented explicitly by listing each
state, but rather are implicitly represented by specifying an initial state and a set of operators to
generate new states from existing states. Moreover, the size of an AI problem is rarely expressed
as the number of nodes in its problem-space graph. The two parameters of a search tree that
determine the efficiency of various search algorithms are its branching factor and its solution
depth. The branching factor is simply the average number of children of a given node. For
example, in the eight-puzzle problem the average branching factor is √3, or about 1.732.
The solution depth of a problem instance is the length of a shortest path from the initial state to a
goal state or the length of a shortest sequence of operators that solve the problem. [10]
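As an illustration of an implicitly represented problem space, the Python sketch below defines the eight-puzzle through an initial state and a successor-generating operator; the function name successors is ours, and the branching-factor figures in the comments are only a rough check against the value quoted above.

def successors(state):
    """Yield every state reachable from `state` by sliding one tile.
    A state is a tuple of 9 tiles in row-major order, 0 marking the blank."""
    board = list(state)
    blank = board.index(0)
    row, col = divmod(blank, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:          # stay on the 3x3 board
            child = board[:]
            child[blank], child[r * 3 + c] = child[r * 3 + c], child[blank]
            yield tuple(child)

start = (1, 2, 3, 4, 0, 5, 6, 7, 8)            # blank in the centre
print(len(list(successors(start))))            # -> 4 children

# Averaging over the nine possible blank positions gives
# (4*2 + 4*3 + 1*4) / 9 ≈ 2.67 children per node; the finer analysis that
# ignores the move undoing the previous one yields the asymptotic value of
# roughly sqrt(3) ≈ 1.732 quoted above.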
Classes of search algorithms
For virtual search spaces
Algorithms for searching virtual spaces are used in constraint-satisfaction problems, where the
goal is to find a set of value assignments to certain variables that will satisfy specific
mathematical equations and inequations. They are also used when the goal is to find a variable
assignment that will maximize or minimize a certain function of those variables. Algorithms for
these problems include the basic brute-force search (also called "naïve" or "uninformed" search),
and a variety of heuristics that try to exploit partial knowledge about structure of the space, such
as linear relaxation, constraint generation, and constraint propagation.
An important subclass is the local search methods, which view the elements of the search space
as the vertices of a graph, with edges defined by a set of heuristics applicable to the case, and
scan the space by moving from item to item along the edges, for example according to the
steepest-descent or best-first criterion, or in a stochastic search. This category includes a great variety of
general metaheuristic methods, such as simulated annealing, tabu search, A-teams, and genetic
programming, that combine arbitrary heuristics in specific ways.
This class also includes various tree search algorithms, which view the elements as vertices of a
tree and traverse that tree in some special order. Examples of the latter include exhaustive
methods such as depth-first search and breadth-first search, as well as various heuristic-based
search-tree pruning methods such as backtracking and branch and bound. Unlike general
metaheuristics, which at best work only in a probabilistic sense, many of these tree-search
methods are guaranteed to find the exact or optimal solution, if given enough time.
Another important sub-class consists of algorithms for exploring the game tree of multiple-player
games, such as chess or backgammon, whose nodes consist of all possible game situations that
could result from the current situation. The goal in these problems is to find the move that
provides the best chance of a win, taking into account all possible moves of the opponent(s).
Similar problems occur when humans or machines have to make successive decisions whose
outcomes are not entirely under one's control, such as in robot guidance or in marketing,
financial or military strategy planning. This kind of problem - combinatorial search - has been
extensively studied in the context of artificial intelligence. Examples of algorithms for this class
are the minimax algorithm, alpha-beta pruning, and the A* algorithm.[11]
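The following Python sketch shows the minimax idea with alpha-beta pruning on a toy game tree given as nested lists; it is a generic illustration of the algorithm named above, not code from any particular game program.

import math

def alphabeta(node, alpha, beta, maximizing):
    """Minimax with alpha-beta pruning over a game tree given as nested lists,
    where a leaf is a number (the heuristic evaluation of that position)."""
    if not isinstance(node, list):       # leaf: return its evaluation
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:            # the opponent would avoid this branch,
                break                    # so the remaining children are pruned
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

# A two-ply toy tree: the maximizing player chooses among three replies,
# each answered by the minimizing opponent.
tree = [[3, 5], [6, 9], [1, 2]]
print(alphabeta(tree, -math.inf, math.inf, True))   # -> 6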
For sub-structures of a given structure
The name combinatorial search is generally used for algorithms that look for a specific sub-structure of a given discrete structure, such as a graph, a string, a finite group, and so on. The
term combinatorial optimization is typically used when the goal is to find a sub-structure with a
maximum (or minimum) value of some parameter. (Since the sub-structure is usually represented
in the computer by a set of integer variables with constraints, these problems can be viewed as
special cases of constraint satisfaction or discrete optimization; but they are usually formulated
and solved in a more abstract setting where the internal representation is not explicitly
mentioned.)
An important and extensively studied subclass is the graph algorithms, in particular graph
traversal algorithms, for finding specific sub-structures in a given graph — such as subgraphs,
paths, circuits, and so on. Examples include Dijkstra's algorithm, Kruskal's algorithm, the nearest
neighbour algorithm, and Prim's algorithm.
Another important subclass of this category is the string-searching algorithms, which search for
patterns within strings. Two famous examples are the Boyer–Moore and Knuth–Morris–Pratt
algorithms, and several algorithms based on the suffix tree data structure. [11]
II. Learning to play
2.1 Software agents: autonomous programs
There have been many debates about how a software agent should be defined, and a universally
accepted definition has not been given. Below are some definitions of software agents from the
point of view of theoreticians and other people working in this domain.
Software agents are computational systems that inhabit some complex dynamic environment,
sense and act autonomously in this environment, and by doing so realize a set of goals
or tasks for which they are designed (P.Maes) [12].
Russell and Norvig point out: “the notion of an agent is meant to be a tool for analyzing systems,
not an absolute characterization that divides the world into agents and non-agents” [14]. We can
say that ‘software agent’ or ‘intelligent agent’ can best be seen as an umbrella term for programs
that to some extent display attributes commonly associated with agency. (Nwana H.) [13]
Franklin and Graesser (1996) give their own definition of what an agent is supposed to be: “an
autonomous agent is a system situated within and part of an environment that senses that
environment and acts on it, over time, in pursuit of its own agenda and so as to effect what it
senses in the future.”[18]
Figure 4 - Franklin and Graesser’s (1996) agent taxonomy
Agent characteristics
There are some characteristics that define a software agent. In the section below we will
enumerate and describe each characteristic as defined and accepted in literature.
Reactive
In order for an agent to function autonomously in any given environment it must be able to
perceive its environment and act in a timely fashion upon changes that occur in it. A software
agent may employ any type and number of sensors to sense its environment. The agent can react
to sensory input using its actuators. We can differentiate between various degrees of reactivity,
ranging from purely reactive software agents on the one hand, to software agents that deliberate
extensively before reacting on the other hand. [15]
Pro-active and goal-oriented
Pro-activity is a more specialized form of autonomy. When an agent is said to be pro-active, it
does not simply act in response to its environment, but it will exhibit goal-directed behavior and
take initiative to attain its goals or design objectives [15].
Deliberative
More sophisticated agents are not merely reactive (i.e., operating in a stimulus response manner)
but are able to reason about their actions. The ability to reason enables agents to act pro-actively
and perform more difficult tasks in complex environments. [16]
Continual
In order for a software agent to accomplish its goals, it must have temporal continuity. The agent
must be able to function over a certain period of time with persistence of identity and state.
Agents that have an episodic memory are able to learn from previous experiences [17]
Adaptive
Making agents adaptive is one way of attaining flexibility. Adaptivity can range from (1) being
able to adapt flexibly to short-term, smaller changes in the environment, to (2) dealing with more
significant and long-term (lasting) changes in the environment [12]. Software agents that are
able to deal with long-term changes are able to improve themselves and their performance over
time by storing the knowledge of past experiences within their internal state and taking this
knowledge into account when executing (similar) actions in the future. [16]
Communicative
Agents should be able to communicate with other software agents and even humans in order to
complete their tasks and help other agents complete theirs. [15] Especially in multi-agent
systems the ability to communicate with other agents is important. Agents communicate with
other agents using a common agent language, such as FIPA ACL or KQML. When agents
communicate with humans they must communicate using natural language. [16]
Mobile
Although mobility is neither a necessary nor a sufficient condition for agency, many scholars
(Gilbert et al. 1995; Nwana 1996; Brazier et al. 2003) include mobility when describing agent
characteristics. It is oftentimes better for an agent to interact with a remote system at the location
of the remote system than to do it over a distance. Several reasons for this preferred form of
interaction can be specified. A first reason is efficiency. Network traffic can be reduced when the
agent and the remote system are at the same location. For instance, when an agent queries a
remote database, data has to be sent back and forth between the remote database and the agent.
This communication can be kept local when the agent and the remote system are at the same
location, thereby reducing the strain on external networks such as the Internet. A second reason
is that data need not be exchanged over (public) networks but can be handled locally. It also
means that agents can operate more securely. A third reason is that the remote system only allows
for agents to operate locally, thereby forcing the agent to migrate to the remote location.[16]
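To make the sense-act loop behind these characteristics concrete, the Python sketch below outlines a minimal agent interface; the class and method names (perceive, act, run, environment.apply) are illustrative assumptions of ours and are not taken from any agent framework.

from abc import ABC, abstractmethod

class Agent(ABC):
    """A minimal software agent: it repeatedly senses its environment,
    chooses an action, and acts upon the environment."""

    @abstractmethod
    def perceive(self, environment):
        """Turn raw sensory input from the environment into a percept."""

    @abstractmethod
    def act(self, percept):
        """Choose an action based on the current percept (and internal state)."""

    def run(self, environment, steps):
        # Temporal continuity: the same agent instance persists across steps,
        # so deliberative or adaptive subclasses can keep state between them.
        for _ in range(steps):
            percept = self.perceive(environment)
            action = self.act(percept)
            environment.apply(action)    # assumed hook of the environment object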
Agent typologies
In the next section we will investigate different agent typologies and try to place different agents
into classes. A typology refers to the study of types of entities. There are several dimensions
along which to classify existing software agents.
Firstly, agents may be classified by their mobility, i.e. by their ability to move around some
network. This yields the classes of static or mobile agents. [13]
Secondly, they may be classed as either deliberative or reactive. Deliberative agents derive from
the deliberative thinking paradigm: the agents possess an internal symbolic, reasoning model and
they engage in planning and negotiation in order to achieve coordination with other agents. Work
on reactive agents originates from research carried out by Brooks (1986) and Agre & Chapman
(1987). These agents, on the contrary, do not have any internal, symbolic models of their
environment, and they act using a stimulus/response type of behavior by responding to the
present state of the environment in which they are embedded (Ferber, 1994). Indeed, Brooks has
argued that intelligent behavior can be realized without the sort of explicit, symbolic
representations of traditional AI (Brooks, 1991b). [13]
Figure 5 - Typology based on Nwana’s (Nwana 1996) primary attribute dimension
Thirdly, agents may be classified along several ideal and primary attributes which agents should
exhibit. We have identified a minimal list of three: autonomy, learning and cooperation. We
appreciate that any such list is contentious, but it is no more or no less so than any other
proposal. Hence, we are not claiming that this is a necessary or sufficient set. Autonomy refers to
the principle that agents can operate on their own without the need for human guidance, even
though this would sometimes be invaluable. Hence agents have individual internal states and
goals, and they act in such a manner as to meet their goals on behalf of their users. A key element of
their autonomy is their proactiveness, i.e. their ability to take the initiative rather than acting
simply in response to their environment (Wooldridge & Jennings, 1995a). Cooperation with
other agents is paramount: it is the raison d’être for having multiple agents in the first place in
contrast to having just one. In order to cooperate, agents need to possess a social ability, i.e. the
ability to interact with other agents and possibly humans via some communication language
(Wooldridge & Jennings, 1995a). Having said this, it is possible for agents to coordinate their
actions without cooperation (Nwana et al., 1996). Lastly, for agent systems to be truly smart,
they would have to learn as they react and/or interact with their external environment. In our
view, agents are (or should be) disembodied bits of intelligence. Though we will not attempt to
define what intelligence is, we maintain that a key attribute of any intelligent being is its ability
to learn. The learning may also take the form of increased performance over time. [13]
2.2 Machine learning: making computers act
Tom M. Mitchell is one of the most prominent theoreticians in the field of machine learning, and
he gave a widely quoted definition of the term: “A computer program is said to learn from
experience E with respect to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with experience E.”[19]
Machine learning algorithms are organized into a taxonomy, based on the desired outcome of
the algorithm. Common algorithm types include:
• Supervised learning - where the algorithm generates a function that maps inputs
to desired outputs. One standard formulation of the supervised learning task is the
classification problem: the learner is required to learn (to approximate the behavior
of) a function which maps a vector into one of several classes by looking at several
input-output examples of the function.
• Unsupervised learning - which models a set of inputs: labeled examples are not
available.
• Semi-supervised learning - which combines both labeled and unlabeled examples
to generate an appropriate function or classifier.
• Reinforcement learning - where the algorithm learns a policy of how to act given
an observation of the world. Every action has some impact in the environment, and
the environment provides feedback that guides the learning algorithm.
• Transduction - similar to supervised learning, but does not explicitly construct a
function: instead, tries to predict new outputs based on training inputs, training
outputs, and new inputs.
• Learning to learn - where the algorithm learns its own inductive bias based on
previous experience. [19]
Of course the list above can be enlarged with other types of machine learning and all kinds of
hybrid algorithms.
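The toy Python sketch below makes Mitchell's definition concrete on a deliberately trivial problem of our own choosing: the task T is to predict the outcome of a biased coin, the performance measure P is the fraction of correct predictions, and the experience E is the set of previously observed flips.

import random

class FrequencyLearner:
    """Predicts the majority outcome observed so far."""
    def __init__(self):
        self.heads = 0
        self.flips = 0

    def predict(self):
        # Task T: guess the next outcome (True = heads).
        return self.flips == 0 or self.heads * 2 >= self.flips

    def observe(self, outcome):
        # Experience E: accumulate the outcomes seen so far.
        self.flips += 1
        self.heads += outcome

learner, correct = FrequencyLearner(), 0
for _ in range(1000):
    outcome = random.random() < 0.7        # a coin biased towards heads
    correct += learner.predict() == outcome
    learner.observe(outcome)
print(correct / 1000)                      # performance P improves with experience E (≈ 0.7)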
2.3 Embedding machine learning in computer games
2.3.1 The history of machine learning and games
The history of the interaction of machine learning and computer game-playing started at the
same time Artificial Intelligence appeared. Arthur Samuel was the first one to apply machine
learning techniques in games, working on his checker-playing program, this way being a pioneer
in machine-learning and game-playing techniques (Samuel, 1959, 1967). He facilitated the
evolution of both fields and since then the intersection of the two can be found regularly in
conferences in their respective fields and have evolved throughout research along the years.
In recent years, the computer games industry has discovered AI as a necessary element
to make games more entertaining and challenging and, vice versa, AI has discovered computer
games as an interesting and rewarding application area. The industry’s perspective is
witnessed by a multitude of recent books on introductions to AI techniques for game
programmers or a series of edited collections of articles. AI research on computer
games began to follow developments in the games industry early on, but it has gained real
momentum since John Laird’s keynote address at the AAAI 2000 conference, in which he
advocated Interactive Computer Games as a challenging and rewarding application area for AI
(Laird & van Lent, 2001) [20]; this demonstrates the growing importance of game-playing
applications for Artificial Intelligence.[21]
2.3.2 What do the games have to learn?
The need for AI in computer games is growing, and a good user experience must be matched by
an equally good artificial intelligence component. These needs can be recast as a call for new
practical and theoretical tools to help with [21]:
learning to play the game: Game worlds provide excellent test beds for investigating the
potential to improve agents’ capabilities via learning. The environment can be
constructed with varying characteristics, from deterministic and discrete as in classical
board and card games to non-deterministic and continuous as in action computer games.
Learning algorithms for such tasks have been studied quite thoroughly. Probably the best-known instance of a learning game-playing agent is the Backgammon-playing program
TD-Gammon (Tesauro, 1995) [23].
learning about players: Opponent modeling, partner modeling, team modeling, and
multiple team modeling are fascinating, interdependent and largely unsolved challenges
that aim at improving play by trying to discover and exploit the plans, strengths, and
weaknesses of a player’s opponents and/or partners. One of the grand challenges in this
line of work are games like Poker, where opponent modeling is crucial to improve over
game-theoretically optimal play (Billings et al., 2002) [22].
behavior capture of players: Creating a convincing avatar based on a player’s in-game
behavior is an interesting and challenging supervised learning task. For example, in
Massive Multiplayer Online Role-playing Games (MMORPGs) an avatar that is trained to
simulate a user’s game-playing behavior could take his creator’s place at times when the
human player cannot attend to his game character. First steps in this area have been made
in commercial video games such as Forza Motorsport (Xbox) where the player can train a
“Drivatar” that learns to go around the track in the style of the player by observing and
learning from the driving style of that player and generalizing to new tracks and cars.
model selection and stability: Online settings lead to what is effectively the unsupervised
construction of models by supervised algorithms. Methods for biasing the proposed
model space without significant loss of predictive power are critical not just for learning
efficiency, but also for interpretive ability and end-user confidence.
optimizing for adaptivity: Building opponents that can just barely lose in interesting ways is just as important for
the game world as creating world-class opponents. This requires building highly adaptive
models that can substantively personalize to adversaries or partners with a wide range of
competence and rapid shifts in play style. By introducing a very different set of update
and optimization criteria for learners, a wealth of new research targets are created [21].
model interpretation: “What’s my next move” is not the only query desired of models in
a game, but it is certainly the one which gets the most attention. Creating the illusion of
intelligence requires “painting a picture” of an agent’s thinking process. The ability to
describe the current state of a model and the process of inference in that model from
decision to decision enables queries that provide the foundation for a host of social
actions in a game such as predictions, contracts, counter-factual assertions, advice,
justification, negotiation, and demagoguery. These can have as much or more influence
on outcomes as actual in- game actions [21].
performance: Resource requirements for update and inference will always be of great
importance. The AI does not get the bulk of the CPU or memory, and the machines
driving the market will always be underpowered compared to typical desktops at any
point in time [21].
2.3.3 Creating a Learning Agent
We will now see how to implement a learning system in a game, given the need for games to be
adaptable. The example we will use is given by Nick Palmer in the article Machine Learning in
Games Development and concerns a team-strategy-based paintball game.
The aim of the program is for a team of seven agents to capture the opponent's flag and bring it
back to their starting position. They must do this without being hit by the opposing team's
paintballs. For a start, we shall exclude the team strategy from our discussions, and concentrate
on the individual agents - it is no good having a perfect strategy if the agents aren't smart enough
to survive on their own [24].
We must consider the factors which will influence an agent’s performance in the game. Terrain
is an obvious start point, as this is all that stands between the two teams, so it must be used to the
agent's advantage. Secondly, there must be an element of stealth behind the behavior of each
agent, as otherwise it will be simple to undermine any tactics used during the game by simply
picking off the naïve agents one-by-one [24].
A learning agent is composed of a few fundamental parts: a learning element, a performance
element, a curiosity element (or 'problem generator'), and a performance analyzer (or 'critic').
The learning element is the part of the agent which modifies the agent's behavior and creates
improvements. Tom M. Mitchell names this component in his book “Machine Learning” The
Generalizer [19]. The performance element is responsible for choosing external actions based
on the percepts it has received (percepts being information that is known by the agent about its
environment). To illustrate this, consider that one of our agents is in the woods playing paintball.
He is aware of an opposing paintballer nearby. This would be the percept that the agent responds
to, by selecting an action - moving behind a tree. This choice of action is made by the
performance element. This component is called The Performance System by Tom M. Mitchell.
The performance analyzer judges the performance of the agent against some suitable
performance measure. The performance must be judged on the same percepts as those received
by the performance element - the state of affairs 'known' to the agent. When the analysis of
performance has been made, the agent must decide whether or not a better performance could be
made in the future, under the same circumstances. This decision is then passed to the learning
element, which decides on the appropriate alteration to future behavior, and modifies the
performance element accordingly. This component is called The Critic in Tom M. Mitchell’s
book [19][24].
Below we have an illustrated example of the four components above working together as
mentioned in Tom M. Mitchell’s book.
Figure 6 - Tom M. Mitchell's Learning Model
How do we make sure that the agent advances in its learning, and doesn't merely confine itself to
previously observed behavior? This is dealt with by the curiosity element (so-called because it
searches for a better solution) which has a knowledge of the desirable behavior of the agent (i.e.
it knows that being shot is not desirable, and that finding the opponent's flag is). To achieve
optimal performance, this element will pose new challenges to the agent in an attempt to prevent
bad habits developing. This component is called The Experiment Generator by Tom Mitchell
[19].
To understand the benefits of this, we will consider a paintballer who is hiding behind a
tree. From his past experience, he knows that he is safe to stay where he is, and this would result
in an adequate performance. However, the curiosity element kicks in, and suggests that he makes
a break from his cover and heads to a nearby tree which is closer to the enemy flag. This may
result in the agent ultimately being shot at, but could also achieve a more desirable goal. It is
then up to the performance analyzer and the learning element to consider whether there is a
benefit to this change in strategy [24].
Figure 7 - Learning architecture proposed by Nick Palmer
This style of learning is known as reinforcement learning, which means that the agent can see the
result of its actions but is not told directly what it should have done instead. This means that the
agent must use what is, in effect, trial and error to evaluate its performance and learn from its mistakes. The
advantage to this is that there is no limitation on the behavior, other than the limit to alterations
suggested through the curiosity element. If after each action, the agent was told what its mistake
was and how it should correct its behavior, then the desired behavior must already be understood
[24].
As the learning agent is ultimately part of a game, it must not be left simply to work out for itself
how to play. The agents must be imparted with a fair degree of prior knowledge about the way to
behave. In the case of paintball, this could include methods for avoiding fire by using cover,
which may later be adapted during the learning process [24].
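A minimal sketch of how these four components could be wired together is given below, in Python; all names (best_action, explore, analyze, environment.observe, environment.step) are illustrative assumptions of ours rather than Palmer's or Mitchell's actual code.

import random

class LearningAgent:
    """Minimal wiring of the four components described above."""

    def __init__(self):
        self.best_action = {}    # knowledge used by the performance element
        self.best_score = {}     # remembered critic scores

    def act(self, percept):                      # performance element
        return self.best_action.get(percept, 0.0)

    def explore(self, percept):                  # curiosity element / experiment generator
        return self.act(percept) + random.uniform(-1.0, 1.0)

    def analyze(self, reward):                   # performance analyzer / critic
        return reward                            # here simply the game's feedback

    def learn(self, percept, action, score):     # learning element / generalizer
        if score > self.best_score.get(percept, float("-inf")):
            self.best_score[percept] = score
            self.best_action[percept] = action

def train(agent, environment, episodes=1000):
    for _ in range(episodes):
        percept = environment.observe()          # what the agent knows of the world
        action = agent.explore(percept)          # try a variation of the current habit
        reward = environment.step(action)        # play it out and receive feedback
        agent.learn(percept, action, agent.analyze(reward))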
2.3.4 Problems with Learning
Although machine learning can bring a very big improvement in gameplay there are certain
problems that can occur when trying to implement such an agent [24]:
Mimicking Stupidity - When teaching an AI by copying a human player's strategy, we
will find that the computer could be taught badly. This is more than likely when the
player is unfamiliar with a game. In this situation, a reset function may be required to
bring the AI player back to its initial state, or else a minimum level must be imposed on
the computer player to prevent its performance dropping below a predetermined standard.
Overfitting - This can occur if an AI agent is taught only a certain section of a game, and
then expected to display intelligent behavior based on its experience. Using a FPS as an
example, an agent which has learnt from its experience over one level will encounter
problems when attempting a new level, as it may not have learnt the correct 'lessons' from
its performance.
Local Optimality - When choosing a parameter on which the agent is to base its learning,
be sure to choose one which has no dependency on earlier actions. As an example, we
will take a snow-boarding game. The agent learns, through the use of an optimization
algorithm, the best course to take down the ski slope, using its rotation as a parameter.
This may mean that a non-optimal solution is reached, in which any small change cannot
improve performance.
Set Behavior - Once an agent has a record of its past behavior and the resulting
performance analysis, will it stick to the behavior which has been successful in the past,
or will it try new methods in an attempt to improve? This is a problem which must be
addressed or else an agent may either try to evaluate every possible behavior, or else stick
to one without finding the optimal solution [24].
III. Relevant applications of machine learning in games
In this chapter we will present the most relevant games with machine learning techniques and try
to analyze their structure and algorithms used.
3.1 TD-Gammon
One of the most impressive applications of reinforcement learning to date is Gerry Tesauro's
work on backgammon (Tesauro, 1992, 1994, 1995). Tesauro's program, TD-Gammon, required
little backgammon knowledge, yet it learned to play very well, close to the level of the world's
best players. TD-Gammon's learning algorithm was a straightforward combination of the TD(λ)
algorithm and nonlinear function approximation using a multilayer neural network trained by
backpropagating the TD errors. [23]
To apply the learning rule, a source of backgammon games was needed. Tesauro obtained an
unending sequence of training games by having the agent play against itself. To choose moves,
TD-Gammon considered each of the roughly 20 ways it could play its dice roll and the
corresponding positions that would result. The neural network was consulted to estimate the
value of each of them, and the move leading to the best estimated value was selected. Continuing
in this way, it was possible to easily generate a large number of games. Each game was treated
as an episode, with the sequence of positions s0, s1, s2, ..., and the TD rule was applied after
each move.
The network's weights were initialized to small random values, so the initial evaluations were
completely arbitrary. Since moves were selected based on these evaluations, the initial moves
were inevitably weak, and the early games often lasted hundreds or even thousands of moves
before either side won. After a few dozen games, however, performance improved rapidly.
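The sketch below shows the TD(λ) update for a single self-play episode, using a linear value function for brevity instead of TD-Gammon's multilayer neural network; all variable names are ours, and the code is meant only to illustrate the update rule described above.

import numpy as np

def td_lambda_episode(features, weights, alpha=0.1, lam=0.7, gamma=1.0, outcome=1.0):
    """Apply the TD(lambda) rule over one episode.
    `features` is the list of feature vectors x(s0), x(s1), ... seen after each
    move, and `outcome` is the final result of the game (e.g. 1 for a win)."""
    trace = np.zeros_like(weights)                    # eligibility trace
    for t in range(len(features)):
        x_t = features[t]
        value = weights @ x_t                         # current estimate V(s_t)
        if t + 1 < len(features):
            target = gamma * (weights @ features[t + 1])   # estimate V(s_{t+1})
        else:
            target = outcome                          # terminal position: true result
        delta = target - value                        # the temporal-difference error
        trace = gamma * lam * trace + x_t             # decay and accumulate gradients
        weights = weights + alpha * delta * trace     # move the weights toward the target
    return weights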
After playing about 300,000 games against itself, TD-Gammon 0.0, as described above, learned
to play as well as the best previous backgammon computer programs. This was a striking result,
because all previous high-performance implementations had used extensive backgammon
knowledge. For example, the champion at that time was undoubtedly Neurogammon, another
program written by Tesauro, which used neural network learning but not TD. The network used
by Neurogammon was trained on examples provided by a backgammon expert and, moreover,
began with a set of features specifically designed for backgammon. TD-Gammon 0.0, on the
other hand, was constructed with essentially zero backgammon knowledge. The fact that
TD-Gammon was as good as Neurogammon and the other previous programs demonstrates the
potential of learning algorithms in games [23].
3.2 Samuel's checkers program
An important precursor of TD-Gammon was part of Arthur Samuel's work (1959, 1967) on
building programs that learn to play checkers. Samuel was one of the first to make effective use
of search heuristics and of what we now call temporal-difference learning. Samuel wrote a
checkers program for the IBM 701 in 1952. Its learning component was completed in 1955 and
the program was shown on television in 1956. Later versions acquired the ability to play well,
but not at an expert level [25].
Samuel used two main methods of learning, the simplest one being called rote learning. It
consisted simply of saving a description of each board position encountered during play, along
with its value as determined by the minimax back-up procedure. The result was that, if a position
that had already been seen appeared again as a terminal position of a search tree, the search
depth was effectively amplified, because the position's value had been cached earlier.
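A rough Python sketch of this rote-learning idea is given below; hash_position and minimax_value stand for the game engine's own position hashing and backed-up minimax evaluation, and both names are our assumptions rather than Samuel's actual routines.

position_values = {}        # rote memory: board position -> backed-up value

def evaluate_with_rote(position, depth):
    """Return a backed-up evaluation, consulting the rote-learned memory first."""
    key = hash_position(position)               # assumed helper of the game engine
    if key in position_values:
        # A previously analysed position behaves as if it had already been
        # searched to its earlier depth, so the effective search depth grows.
        return position_values[key]
    value = minimax_value(position, depth)      # assumed ordinary minimax back-up
    position_values[key] = value                # remember it for future games
    return value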
Rote learning produced slow but continuous improvement, which was most effective for the
opening and the endgame. Samuel's program became a "better than average" program, having
learned from many games against itself, from a broad array of human opponents, and in a
supervised learning mode from examples in books [25].
3.3 NeuroChess
NeuroChess is a program created by Sebastian Thrun that learned to play the game of chess. It
learns from a combination of self-play and the observation of experts. The search algorithm is
taken from GNU Chess, but the evaluation function is a neural network.
NeuroChess is an application of "explanation-based neural networks" (EBNN). EBNN adapts
explanation-based learning to neural networks. NeuroChess represents a chess position as a
hand-designed feature vector of 175 features. It has two neural networks, V and M. V is the
evaluation function, which takes a feature vector as input and produces the assessment used for
play. M is a model of chess, which takes a feature vector as input and predicts the feature vector
two half-moves later. The output of M is used in training V [26].
After training M on 120,000 master-level games and V on another 2,400 games, NeuroChess
defeated GNU Chess in about 13% of the games. With the same preparation, a version not using
the model M won about 10% of the games against GNU Chess; EBNN thus improves the
performance. Of course these percentages grow with the number of games played.
Learning only from games against itself failed. However, learning from the experts introduced,
at first, some artifacts into the program's evaluations. A clear example of learning from experts
is that the program observed that the probability of winning is greater when experts move their
queen into the middle, without realizing that the queen is moved there only when threatened.
Therefore the best results are obtained by combining self-play with learning from experts [26].
3.4 SAL
SAL is a system developed by Michael Gherrity that learns to play multiple games.
SAL can apparently learn any two-player board game of perfect information, provided the game
can be played on a rectangular board and uses fixed types of pieces [27].
It works in the following way:
The user supplies a move generator, written in C, and some constants that declare the size of the
board and the number and type of pieces used in the game. This is SAL’s only knowledge of the
rules of the game.
SAL chooses its moves with a search, like any chess program. It uses a two-ply, full-width
alpha-beta search, plus a consistency search for quiescence.
SAL's evaluation function is a backpropagation neural network (in fact, it uses separate evaluators for the two sides of the game). The inputs to the network are features representing the board position, the number of pieces of each type on the board, the type of piece just moved, the type of piece just captured, and a number of ad hoc features describing pieces and squares under attack and threats to win the game [27].
The neural-network evaluator is trained by temporal-difference methods to estimate the outcome of the game from the current state.
SAL has been tested on three games: tic-tac-toe, Connect Four and chess.
In tic-tac-toe, SAL learned to play perfectly in 20,000 games, which is rather slow.
In Connect Four, a more complex game, SAL learned after 100,000 games to beat its opponent about 80% of the time.
In chess, SAL played 4,200 games against GNU Chess, with GNU Chess set to play at a rate of one move per second. SAL won eight games and lost the others. The first game shows random play by SAL, since it had not learned anything yet. A drawn game shows how much SAL improved, though it still plays very weakly. The graphs show that SAL continues to improve [27].
IV. “A Tale of Two Catapults” – machine learning techniques
applied in games
4.1 Problem statement
The application is a Windows Phone game: a turn-based game with catapults. The player controls a catapult and must hit the opponent (in our case an AI opponent). In order to win, the player needs to hit the opponent directly three times; if the player is hit three times by the opponent, the game is lost. The logic of the application is the essential part, because the AI opponent is not an ordinary one. The application is, in essence, an example of how machine learning techniques can be used in video games, and more than that: the central idea of this project is not the game itself but the design of a framework, an agent-oriented architecture that can be applied to any turn-based two-player game.
4.2 Motivation
Our motivation for this application is the growing need of computer game players for a better experience and a more capable AI. Over the last decades games have evolved drastically, driven by ever more powerful graphics processors. Along with this evolution came the evolution of the video gamer, whose need for a more competent opponent is still unmet today. The problem with recent games is that their AI component is too weak, because it is static and limited. There have been some very good examples of machine learning techniques used in video games, which have shown the great potential of machine learning, but unfortunately they are few. Inspired by Tom M. Mitchell's book "Machine Learning", we decided to build a computer game with a learning component and to present the results obtained.
4.3 Analysis and design
4.3.1 Problem analysis
Since this is a game that can only be played on the computer, and a fairly simple one, we do not have a human expert to teach our program different strategies. We have therefore chosen a learning technique that does not require direct feedback. The most suitable learning strategy for this kind of task is reinforcement learning, or any other technique based on the same principle, namely learning by playing repeated games against itself.
That being said, we have chosen gradient-descent learning for our program: a technique in which the player has to learn a function, called the target function, which reflects the mathematical function giving the distance travelled by a projectile launched with a given initial velocity and angle. In order to evaluate a game state, every state is assigned a value by the target function. For final game states this is easy: we decide that the value of the target function is V(s) = 2000 if the game is won and V(s) = -2000 if the game is lost, where s is the state being evaluated. For the intermediate states we need an approximation function that reflects the real value of the state as accurately as possible.
The mathematical function the program has to learn is D = (v0² / g) · sin(2θ), where v0 is the initial velocity, g is the gravitational acceleration (9.8 m/s²) and θ is the angle, given in degrees, between the ground and the catapult arm that fires. Because this function is too complex to be learned in that form, it will be expressed in an approximate form, as a linear combination of the features of the environment in a given state.
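For reference, the exact range function can be evaluated directly. The following C# fragment is a standalone sketch used only for illustration; it is not one of the application's classes, although the same computation appears later in Shoot.Execute().

using System;

static class Ballistics
{
    // distance travelled on level ground, ignoring wind:
    // D = (v0^2 / g) * sin(2 * theta)
    public static double Range(double v0, double angleDegrees)
    {
        const double g = 9.8;                           // gravitational acceleration, m/s^2
        double theta = Math.PI * angleDegrees / 180.0;  // convert degrees to radians
        return (v0 * v0 / g) * Math.Sin(2 * theta);
    }
}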
We consider the following features of the environment to be relevant for a state:
x1 – the number of hits player one can still take;
x2 – the number of hits player two can still take;
x3 – the distance between the two catapults (in meters);
x4 – the intensity of the wind (in m/s) and its direction;
x5 – the initial velocity of the projectile;
x6 – the angle, in degrees, at which the projectile is shot.
To compute the approximation function we assign to each feature a weight, represented by a floating-point number, plus a bias term. These weights are adjusted throughout the learning process, making the function converge towards the desired one. The approximation function can therefore be expressed as:
Ṽ(s) = w0 + Σi wi · xi
where xi are the values of the features, wi are the corresponding weights and w0 is the bias.
In order to learn properly, the agent needs some kind of feedback from a critic. In our case that feedback is given indirectly by the value of a function called the training function. For every state s, the training function is defined as:
Vtrain(s) = Ṽ(Successor(s))
where Successor(s) is the state that immediately follows s in the list of training examples provided by the game's performance system. We need just one more adjustment to make the algorithm work properly: the way the weights are recalculated at each learning step. For that we have chosen the LMS (least mean squares) algorithm, in which we minimize E, the error given by the sum of squared differences between the training values Vtrain(s) and the approximated values Ṽ(s). In the ideal situation, in which the agent has learned the function perfectly, the approximation coincides with the training function. Thus, at each learning step, for every training example (consisting of a state and its associated value), every weight is updated according to the rule:
wi = wi + η · (Vtrain(s) - Ṽ(s)) · xi
where η is a very small constant called the learning rate, and the other quantities are as defined above.
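The following C# fragment is a minimal, self-contained sketch of the approximation function and of this update rule, kept independent of the application's classes (the actual implementation, AdjustWeights(), is presented in section 4.4); the bias is treated here as the weight of a constant feature x0 = 1.

static class LmsSketch
{
    // V~(s) = w0 + sum_i w_i * x_i
    public static float Approximate(float[] w, float bias, float[] x)
    {
        float v = bias;
        for (int i = 0; i < x.Length; i++)
            v += w[i] * x[i];
        return v;
    }

    // one LMS step: w_i <- w_i + eta * (Vtrain(s) - V~(s)) * x_i
    public static void Update(float[] w, ref float bias, float[] x, float vTrain, float eta)
    {
        float error = vTrain - Approximate(w, bias, x);
        bias += eta * error;                 // the bias is the weight of the constant feature x0 = 1
        for (int i = 0; i < x.Length; i++)
            w[i] += eta * error * x[i];
    }
}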
After repeated simulations the learned function should approximate the desired function more and more closely, ideally converging to it.
As stated before, our learning model was inspired by Tom M. Mitchell's book "Machine Learning" [19], and it can be described using the same components:
[Figure 8 - Catapult Learning Model: the components Experiment Generator, Performance System, Critic and Generalizer, connected by the new initial state, the game history, the training examples and the updated target function.]
4.3.2 Solution’s design
Use case diagram
Since the application is simple, its use case diagram is also simple. The player can start the game, play the game or exit the game. The characteristics of the game are detailed in later sections.
Figure 9 - Use case diagram
Class diagram
The application has many classes: the UI, built with the XNA Game Studio 4.0 framework, and the logic of the application, which includes both the abstract classes representing the agent architecture and their concrete implementations with the algorithms specific to this game. We therefore show only the class diagram of the abstract agent model, in order to clarify the relationships between the classes.
Figure 10 - Class diagram
The diagrams were made using the CASE tool StarUML.
Description of classes
In this section we describe every component of the model: how it works and how it relates to the other classes. The main classes of the agent architecture are Play, Simulation, Player, AI, Action, Environment and State. The machine learning part consists of the classes Learner, Critic, Generalizer, TargetFunction and PerformanceSystem.
Play – this class is practically the core of the model. It has a single abstract method, "Start", which takes an initial state as parameter, and it holds as members the two players and the environment in which the game takes place. Two abstract classes are derived from it: Simulation and Game.
Simulation – this class is as important as its base class Play, because it implements the actual logic of the game at an abstract level, independent of the implementation of the classes involved. It also has a function "IsComplete", which queries the environment to check whether the game is over. A code snippet of the algorithm is presented in the implementation section. In addition, it holds two components that support machine learning, the Learner and the History, which are discussed later.
Player – this class is the abstract form of a player. It stores as a member the player's number in the turn-based game and is extended by two classes: AI and Human. It has a single method, "SelectAction", which chooses the best action from the current state in order to win the game.
AI – the representation of an AI player, holding all the components required to play the game properly. Like its parent class it has the "SelectAction" method, but in addition it contains two new elements that are part of the machine learning system: the classes TargetFunction and PerformanceSystem are specific to a player and each has its own role in the learning process. How these two classes work is described later in this chapter.
Action – this class represents a basic action an agent can perform when it is its turn. It has a single method, "Execute", which applies the effect of the player's selected action to the current state.
Environment – this class represents the environment in which the game takes place. It usually holds a current state and a method to update it. The environment is an important element of the model because it is one of the key elements in the interaction process. The State class is a property of the environment; it gives a complete description of a state of the process, as well as the parameters needed for its evaluation.
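Based on the descriptions above and on the code snippets in section 4.4, a possible skeleton of these abstract game classes could look as follows; this is a sketch only, and the exact member lists of the application may differ slightly.

public abstract class State
{
    public abstract void Display();
}

public abstract class Environment
{
    public abstract void SetInitialState(State initial);
    public abstract State CurrentState();
    public abstract void UpdateState(Player p, Action a);
}

public abstract class Action
{
    // applies the effect of the selected action to the given state
    public abstract State Execute(Player p, State s);
}

public abstract class Player
{
    public int playerNr;                                  // the player's number in the turn-based game
    public abstract Action SelectAction(State current);   // chooses the best action from the current state
}

public abstract class Play
{
    protected Player player, opponent;
    protected Environment env;
    public abstract void Start(State initState);
}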
Learner – the main component of the learning mechanism. It consists of the Critic and the Generalizer, combining the two processes and using a history of the game as input. The History is created during the simulation, and based on it the Learner adjusts the target function.
Critic – a component of the learning part that takes the history of the game and transforms it into training examples, each consisting of a game state and its value, as a floating-point number or any other desired format.
Generalizer – this component takes the training examples produced by the Critic and adjusts the weights using the LMS algorithm described above. After the adjustments are made, the Learner stores them so that they persist.
TargetFunction – this class has a single method and no members. Its job is to calculate the value of a state according to the method selected in its concrete implementation.
PerformanceSystem – one of the most important parts of the learning system. It is a property of the AI class and it chooses the best move to make, based on the target function and on additional action-selection mechanisms. Because at the beginning the target function is completely wrong, we need a way to widen the search for good moves. That is why we implement an epsilon-greedy mechanism for the selection of moves, with the value of epsilon increasing along with the performance of the system. For the generation of random numbers we used an algorithm proposed by George Marsaglia that generates random numbers according to a Gaussian continuous distribution [28].
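One well-known such algorithm is the Marsaglia polar method; the following C# fragment sketches it. It is an illustration only and is assumed, not guaranteed, to correspond to the RandomGenerator.GetNormal(mean, stdDev) routine used by the application.

using System;

static class GaussianSketch
{
    private static readonly Random rng = new Random();
    private static double? spare;   // the polar method produces values in pairs

    public static double NextNormal(double mean, double stdDev)
    {
        if (spare.HasValue)
        {
            double cached = spare.Value;
            spare = null;
            return mean + stdDev * cached;
        }
        double u, v, s;
        do
        {
            u = 2.0 * rng.NextDouble() - 1.0;   // uniform in (-1, 1)
            v = 2.0 * rng.NextDouble() - 1.0;
            s = u * u + v * v;
        } while (s >= 1.0 || s == 0.0);          // reject points outside the unit circle
        double factor = Math.Sqrt(-2.0 * Math.Log(s) / s);
        spare = v * factor;                      // keep the second value for the next call
        return mean + stdDev * (u * factor);
    }
}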
4.4 Implementation
The program is implemented in the C# programming language, using the XNA Game Studio 4.0 framework and the Windows Phone 7 SDK; as IDE we used Microsoft Visual Studio 2010. In the following section we present these frameworks and their characteristics.
Windows Phone 7
The Windows Phone Application Platform enables developers to create engaging consumer
experiences running on a Windows® Phone. It is built upon existing Microsoft® tools and
technologies such as Visual Studio, Expression Blend®, Silverlight®, and the XNA Framework.
Developers already familiar with those technologies and their related tools will be able to create
new applications for Windows Phone without a steep learning curve [29].
Runtimes
With Silverlight and the XNA Framework, all development is done in managed code, in a
protected sandbox allowing for the rapid development of safe and secure applications.
Applications written for Silverlight or the XNA Framework today will run on Windows Phone
with only a minor number of adjustments, such as for screen size or for device-specific features.
The two frameworks, along with Windows Phone–specific components and the Common Base
Class Library provide a substantial number of components for developers to construct
applications on [29].
Silverlight - Silverlight is the ideal framework for creating Rich Internet Application-style user interfaces. A Windows Phone Silverlight application exposes its UI through a set of pages. Windows Phone controls that match the Windows Phone visual style can be used to enhance the look and feel of the UI. Visual Studio or Expression Blend can be used for designing the XAML-based interfaces. Visual Studio can be used to implement the application logic by utilizing the media-rich Silverlight libraries or the base functionality provided by the Common Base Library.
XNA Framework - The XNA Framework is composed of software, services, and
resources focused on enabling game developers to be successful developing on Microsoft
gaming platforms. Microsoft provides technology that allows professional developers to
quickly enable games on platforms like Windows Phone, Xbox 360, Zune HD, and
Windows 7. The XNA Framework provides a complete set of managed APIs for game
development. This includes 2D sprite-based APIs that support rotation, scaling,
stretching, and filtering as well as 3D graphics APIs for 3D geometry, textures, and
standard lighting and shading.
Sensors - A variety of sensors will return data that can be consumed by developers. For
example, multi-touch input, accelerometer, compass, gyroscope, and microphone sensors
will all be accessible by APIs.
Media – Both Silverlight and the XNA Framework provide developers with a
programming model for building rich user experiences that incorporate graphics,
animation, and media. The managed APIs support a variety of media formats and allow
for discovery and enumeration of media on the device and for playback of that media.
Data - Isolated storage allows an application to create and maintain data in a sandboxed
isolated virtual folder. All I/O operations are restricted to isolated storage and do not have
direct access to the underlying operating system file system. This prevents unauthorized
access and data corruption by other applications.
Location - The Microsoft Location Service for Windows Phone allows application
developers to access the user’s physical location information from a single API.
Developers can query for the current location, subscribe to Location Changed events, set
the desired accuracy of the data, get access to the device heading and speed, and calculate
the distance between points. The Location APIs on the phone will work in conjunction
with the Location Cloud Services [29].
XNA Framework
The XNA Framework class library is a library of classes, interfaces, and value types that are
included in XNA Game Studio. This library provides access to XNA Framework functionality
and is designed to be the foundation on which XNA Game Studio applications, components, and
controls are built [30].
Namespaces
Microsoft.Xna.Framework
Provides commonly needed game classes such as timers and game loops.
Microsoft.Xna.Framework.Audio
Contains low-level application programming interface (API) methods that can load and
manipulate XACT-created project and content files to play audio.
Microsoft.Xna.Framework.Content
Contains the run-time components of the Content Pipeline.
Microsoft.Xna.Framework.Design
Provides a unified way of converting types of values to other types.
Microsoft.Xna.Framework.GamerServices
Contains classes that implement various services related to gamers. These services communicate
directly with the gamer, the gamers’ data, or otherwise reflect choices the gamer makes. Gamer
services include input device and profile data APIs.
Microsoft.Xna.Framework.Graphics
Contains low-level application programming interface (API) methods that take advantage of
hardware acceleration capabilities to display 3D objects.
Microsoft.Xna.Framework.Graphics.PackedVector
Represents data types with components that are not multiples of 8 bits.
Microsoft.Xna.Framework.Input
Contains classes to receive input from keyboard, mouse, and Xbox 360 Controller devices.
Microsoft.Xna.Framework.Input.Touch
Contains classes that enable access to touch-based input on devices that support it.
Microsoft.Xna.Framework.Media
Contains classes to enumerate, play, and view songs, albums, playlists, and pictures.
Microsoft.Xna.Framework.Net
Contains classes that implement support for Xbox LIVE, multiplayer, and networking for XNA
Framework games.
Microsoft.Xna.Framework.Storage
Contains classes that allow reading and writing of files [30].
Code snippets of important functions
In this section we will present relevant fragments of the main functions of the algorithms.
The Start() function, which implements the main loop of a simulation:
public override void Start(State initState)
{
    env.SetInitialState(initState);
    env.CurrentState().Display();
    history.AddToHistory(env.CurrentState());
    while (!IsComplete())
    {
        if (playersTurn)
        {
            // the first player selects an action and the environment applies it
            Action action = player.SelectAction(env.CurrentState());
            env.UpdateState(player, action);
            playersTurn = false;
        }
        else
        {
            // the opponent does the same on its turn
            Action action = opponent.SelectAction(env.CurrentState());
            env.UpdateState(opponent, action);
            playersTurn = true;
        }
        env.CurrentState().Display();
        history.AddToHistory(env.CurrentState());   // record every state for learning
    }
    learner.Learn(history);                          // learn from the completed game
    Console.WriteLine("END of simulation");
}
In the function above, the variable env represents the environment and the variables player and opponent represent the two players of the game. learner is the component that actually performs the learning once the game is over (detected by the IsComplete() function), based on history, which is a list of game states.
The Learn() function of the Learner component, which combines the Critic and the Generalizer:
public void Learn(History history)
{
    // the Critic turns the game history into training examples
    List<TrainingExample> examples = critic.CalculateTrainingFunction(history);
    // the Generalizer adjusts the weights with the LMS algorithm
    State state = generalizer.AdjustWeights(examples);
    SaveValues(state);   // persist the adjusted weights
}
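SaveValues() is not reproduced here; on Windows Phone, a natural place to persist the adjusted weights between runs is isolated storage (see the Data feature described earlier). The following sketch is purely hypothetical: the method name, file name and format are chosen for illustration only and may differ from the application's actual code.

using System.IO;
using System.IO.IsolatedStorage;

// hypothetical persistence of the adjusted weights to isolated storage
private void SaveWeights(GameState state)
{
    using (IsolatedStorageFile store = IsolatedStorageFile.GetUserStoreForApplication())
    using (IsolatedStorageFileStream stream = store.OpenFile("weightLog.txt", FileMode.Append))
    using (StreamWriter writer = new StreamWriter(stream))
    {
        foreach (Feature f in state.Features)
        {
            writer.WriteLine("{0};{1}", f.Name, f.Weight);
        }
    }
}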
The AdjustWeights() function of the Generalizer component, which implements the LMS algorithm:
public override State AdjustWeights(List<TrainingExample> exemple)
{
    float eta = 0.0000001f;                       // the learning rate
    GameState savedFeatures = new GameState();
    State sta = exemple[0].State;
    if (sta is GameState)
    {
        savedFeatures = (GameState)sta;
    }
    foreach (TrainingExample te in exemple)
    {
        GameState newState = null;
        State st = te.State;
        if (st is GameState)
        {
            newState = (GameState)st;
        }
        float vtrain = te.Vtrain;                         // Vtrain(s)
        float vaprox = agent.TF.CalculateValue(te.State); // V~(s)
        foreach (Feature f in newState.Features)
        {
            Feature newFeature = savedFeatures.getFeatureById(f.FeatureID);
            // LMS update: w_i <- w_i + eta * (Vtrain(s) - V~(s)) * x_i
            float weight = newFeature.Weight + eta * (vtrain - vaprox) * f.Value;
            newFeature.Weight = weight;
        }
    }
    return savedFeatures;
}
The ChooseBestMove() function from the implementation of the PerformanceSystem, the component that chooses the next move based on the target function, using the epsilon-greedy technique to increase the exploration rate:
public override State ChooseBestMove(State s)
{
    GameState current = null;
    if (s is GameState)
    {
        current = (GameState)s;
    }
    currentState = current;
    List<GameState> nextStates = GeneratePossibleSuccessors(current);
    RandomGenerator.SetSeedFromSystemTime();
    GameState bestState = new GameState(current);
    // start from the value of a randomly chosen successor
    float bestValue = agent.TF.CalculateValue(
        nextStates.ElementAt(randomIndex.Next(0, nextStates.Count - 1)));
    double minThreshold = 1;
    double maxThreshold = 0;
    foreach (GameState g in ScrambleArrayList(nextStates))
    {
        float approxVal = agent.TF.CalculateValue(g);
        if (approxVal > bestValue)
        {
            // draw a Gaussian threshold and accept the better move only with
            // a probability controlled by epsilon (epsilon-greedy exploration)
            double treshold = RandomGenerator.GetNormal(0.5, 0.15);
            if (treshold < minThreshold)
                minThreshold = treshold;
            if (treshold > maxThreshold)
                maxThreshold = treshold;
            if (Math.Abs(treshold) < 1 - epsilon)
            {
                bestValue = approxVal;
                bestState = g;
            }
        }
    }
    nextStates.Clear();
    return bestState;
}
The CalculateValue() function from the implementation of the TargetFunction class, which implements the formula by which the target function is calculated:
public override float CalculateValue(State s)
{
    GameState state = null;
    if (s is GameState)
    {
        state = (GameState)s;
    }
    float val = 0;
    int p1 = state.Features.Where(x => x.Name == "P1 Lives").First().Value;
    int p2 = state.Features.Where(x => x.Name == "P2 Lives").First().Value;
    if (p1 == 0) { val = -2000; }          // player one has lost
    else if (p2 == 0) { val = 2000; }      // player one has won
    else
    {
        // intermediate state: linear combination of features and weights
        foreach (Feature f in state.Features)
        {
            val += f.Value * f.Weight;
        }
    }
    return val;
}
The Execute() function from the Shoot class, an implementation of Action, where the effects of an action on the environment are applied:
public override State Execute(Player p, State s)
{
    ..............
    int varDist = dist - wind;                      // target distance corrected for wind
    float g = 9.8f;                                 // gravitational acceleration
    // range of the projectile: (v0^2 / g) * sin(2 * theta)
    int range = (int)((Math.Pow(velocity, 2) * Math.Sin(2 * (Math.PI * angle / 180.0))) / g);
    if (Math.Abs(varDist - range) < 5)              // a hit if within 5 meters of the target
    {
        if (p.playerNr == 1)
        {
            state.Features.Where(x => x.Name == "P2 Lives").First().Value--;
        }
        else if (p.playerNr == 2)
        {
            state.Features.Where(x => x.Name == "P1 Lives").First().Value--;
        }
    }
    .............
    return state;
}
4.5 User manual
When the player starts the application, the game starts automatically. The screen will look like the following screenshot: a simple environment with two catapults, blue and red.
Figure 11 - Screenshot from "A Tale of Two Catapults"
To select the angle and velocity, swipe freely on the screen of the phone and release when you want to shoot. The game is over once you have hit the opponent three times or have been hit three times yourself. You can start over by selecting the game again. At any time you can exit the game by pressing the "Back" button of the phone.
The figure below shows a flowchart of how the game works.
Figure 12 - Game flowchart
We tried to optimize the game for the best user experience, but there is room for future improvement both in the UI and in the logic of the game.
*The flowchart was made using the online tool Gliffy, www.gliffy.com
4.6 Results obtained
In the following section we present the performance obtained by our machine learning program, using logs from the application. The first chart shows the percentage of games won over many simulations against itself. The table below shows the evolution of the weights over those simulations.
Figure 13 - Percentage of games won
As we can see from the graph above, there is little difference between the wins and the losses, which remain almost equal. Although the improvements seem insignificant, the small differences reflect the evolution of the algorithm: because the agent plays against itself, about half of the games are won. The ups and downs of the curve correspond to the more significant changes in the weights' values, changes that affect the entire function.
Name              Initial value   After 200 games   After 400 games   After 600 games   After 800 games   After 1000 games
Bias              0.68            0.6731037         0.6733158         0.6732157         0.6702238         0.6715883
Player 1 Lives    0.1             0.1301638         0.1617171         0.1838343         0.2016241         0.2248101
Player 2 Lives    0.1             0.09603488        0.09260094        0.08035062        0.06279583        0.05061561
Distance          3               -0.2506298        0.04197381        0.3638703         -0.1142112        0.5817273
Wind              12              6.957904          4.366641          2.584595          1.598882          0.9481826
Velocity          7               3.011064          2.094112          1.595586          0.9499249         0.8309253
Angle             6               1.566059          0.07657859        -0.01758801       -0.05721566       -0.01786953
We can observe that each weight either grows or decreases, tending to converge towards a small value. In order to reach those values, however, the program needs more training, perhaps more relevant features and perhaps other action-selection mechanisms. Nonetheless, this shows that learning is a valuable mechanism in artificial intelligence, but also one that still needs refinement and research.
The values used in the chart and in the table were taken from the files "Log.txt" and "weightLog.txt", located in the application's resources folder.
*The chart was made using the online tool FooPlot, www.fooplot.com
4.7 Future work
Conclusions
Bibliography
[1] Salen, Katie; Zimmerman, Eric (2003). Rules of Play: Game Design Fundamentals.
MIT Press. p. 80. ISBN 0-262-24045-9
[2] Costikyan, Greg (1994). "I Have No Words & I Must Design". Interactive Fantasy, 2
(1994), 3
[3] Abt, Clark C. Serious Games. Viking Press. 1970. p. 6. ISBN 0-670-63490-5
[4] Avedon, Elliot; Sutton-Smith, Brian (1971). The Study of Games. J. Wiley. p. 405. ISBN
0-471-03839-3
[5] Maroney, Kevin (2001). My Entire Waking Life. The Games Journal. May 2001, 1
[6] Epstein, Susan L. Game Playing: The Next Moves. 1999, Sixteenth National Conference
on Artificial Intelligence, 987 - 993. Menlo Park, Calif.: AAAI Press.
[7] Crawford, Chris (2003). Chris Crawford on Game Design. New Riders. ISBN 0-88134-117-7.
[8] Kent, Steven L. The Ultimate History of Video Games. Roseville, Calif.: Prima, 2001
[9] Ted Stahl, Video Game Genres, 2005,
http://www.thocp.net/software/games/reference/genres.htm
[10] Robin, AI Search Techniques. December 13th, 2009
http://intelligence.worldofcomputing.net/ai-search/ai-search-techniques.html
[11] Knuth, Donald. The Art of Computer Programming. Volume 3: Sorting and
Searching. ISBN 0-201-89685-0.
[12] Maes, P. 1995. Modeling Adaptive Autonomous Agents. In Artificial Life: An
Overview, Ed. C. G. Langton, 135–162. Cambridge, Mass.: MIT Press
[13] Nwana, H. S. 1996. Software Agents: An Overview. Knowledge Engineering Review,
11(3): 205-244.
[14] Russell, S.J., Norvig, P. (1995). Artificial Intelligence: A Modern Approach,
Englewood Cliffs, NJ: Prentice Hall
[15] Wooldridge, M., Jennings, N.R. (1995). Intelligent Agents: Theory and Practice,
Knowledge Engineering Review
[16] Schermer, B.W., Software agents, surveillance, and the right to privacy: A legislative
framework for agent-enabled surveillance. Leiden University Press, 2007.
[17] Bradshaw, J. (1998). Software Agents, Menlo Park, California: AAAI Press
[18] Franklin, S., Graesser, A. 1996. Is It an Agent or Just a Program? A Taxonomy for
Autonomous Agents. In Proceedings of the Third International Workshop on Agent
Theories, Architectures, and Languages. New York: Springer-Verlag.
[19] Mitchell, T. (1997). Machine Learning, McGraw Hill. ISBN 0-07-042807-7
[20] Laird, J. E., & van Lent, M. (2001). Human-level AI’s killer application: Interactive
computer games. AI Magazine, 22(2), 15–26.
[21] Bowling, M., Furnkranz, J., Graepel, T., Musick, R. Machine learning and games,
Mach Learn (2006) 63:211–215 DOI 10.1007/s10994-006-8919-x
[22] Billings, D., Pena, L., Schaeffer, J., & Szafron, D. (2002). The challenge of poker.
Artificial Intelligence, 134(1–2), 201–240, Special Issue on Games, Computers and Artificial
Intelligence
[23] Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications
of the ACM, 38(3), 58–68.
[24] Palmer, Nick. Machine Learning in Games Development (June 2002) – http://www.ai-depot.com/GameAI/Learning.html
[25] Samuel, Arthur (1959-03-03). "Some Studies in Machine Learning Using the Game of
Checkers" (PDF). IBM Journal 3 (3): 210–229. Retrieved 2011-10-31
[26] Thrun, Sebastian. Learning to play the game of chess (1995) – http://www.robots.stanford.edu/papers/thrun.nips7.neuro-chess.html
[27] Gherrity, Michael (1993). A Game Learning Machine. Ph.D. Thesis, University of
California, San Diego
[28] G. Marsaglia, "Random numbers fall mainly in the planes", Proc. Natl. Acad. Sci.
61(1), 25–28 (1968).
[29] MSDN, Windows Phone, Application Platform Overview for Windows Phone. March
22, 2012. http://www.msdn.microsoft.com/en-us/library/ff402531(v=vs.92)
[30] MSDN, XNA Game Studio 4.0, XNA Framework Class Library.
http://www.msdn.microsoft.com/en-us/library/bb203940(XNAGameStudio.40).aspx
Images and diagrams were taken from
1. Figure 1 – OXO - http://www.dreamauthentics.blogspot.ro/2011/09/noughts-and-crosses-first-real.html
2. Figure 2 – Super Mario Bros. http://nintendo-okie.com/tag/new-super-mario-bros-wii/
3. Figure 3 – Assassin’s Creed - http://www.elder-geek.com/2009/09/re-reviewed-assassins-creed/
4. Figure 4 - Franklin and Graesser’s (1996) agent taxonomy - Franklin, S., and Graesser,
A. 1996. Is It an Agent or Just a Program? A Taxonomy for Autonomous Agents. In
Proceedings of the Third International Workshop on Agent Theories Architectures, and
Languages. New York: Springer-Verlag.
5. Figure 5 - Typology based on Nwana’s (Nwana 1996) primary attribute dimension Nwana, H. S. 1996. Software Agents: An Overview. Knowledge Engineering Review, 11(3):
205-244
6. Figure 6 – Tom M. Mitchell's Learning Model - Mitchell, T. (1997). Machine Learning,
McGraw Hill. ISBN 0-07-042807-7
7. Figure 7 – Learning architecture proposed by Nick Palmer - Palmer, Nick. Machine
Learning in Games Development (June 2002) – http://www.ai-depot.com/GameAI/Learning.html