Establishing an Evaluation Function for RTS Games

Authors*
*Author names withheld at the request of the organising committee.

Abstract

Any adaptation mechanism needs a good evaluation function that adequately measures the quality of the generated solutions. Because of the high complexity of modern video games, the task of creating a suitable evaluation function for adaptive game AI is quite difficult. In our research we aim at fully automatically generating a good evaluation function for adaptive game AI. This paper describes our approach, and discusses the experiments performed in the RTS game Spring. TD-learning is applied for establishing two different evaluation functions, one for a perfect-information environment, and one for an imperfect-information environment. From our results we may conclude that both evaluation functions are able to predict the outcome of a Spring game reasonably well, i.e., they are correct in about 75% of all games played.

Introduction

Modern video games present a complex and realistic environment in which game AI is expected to behave realistically (i.e., 'human-like'). One feature of human-like behaviour of game AI, namely the ability to adapt adequately to changing circumstances, has been explored with some success in previous research (Demasi & de O. Cruz 2002; Graepel, Herbrich, & Gold 2004; Spronck, Sprinkhuizen-Kuyper, & Postma 2004). This is called 'adaptive game AI'. When implementing adaptive game AI, arguably the most important factor is the evaluation function that rates the quality of newly generated game AI. An erroneous or suboptimal evaluation function will slow down the learning process, and may even result in weak game AI. Due to the complex nature of modern video games, determining a good evaluation function is often a difficult task. This paper discusses our work on fully automatically generating a good evaluation function for an RTS game.

The outline of the paper is as follows. Our approach for creating adaptive game AI for RTS games is discussed first, followed by a description of how we collect domain knowledge of an RTS game environment in a data store. Subsequently, we show how we establish an evaluation function for RTS games automatically, based on the collected data. We test the performance of the generated evaluation function and provide the experimental results. We conclude by discussing the results and looking at future work.

Approaches

A first approach to adaptive game AI is to change game AI incrementally so that it becomes more effective. The speed of learning depends on the learning rate: if the learning rate is low, learning takes a long time; if it is high, results are unreliable. Therefore, an incremental approach is not very suitable for rapidly adapting to observations in a complex video game environment. A second approach is to allow computer-controlled players to imitate human players. This approach can be particularly successful in games that have access to the Internet to store and retrieve samples of gameplay experiences (Spronck 2005). For this approach to be feasible, a central data store of gameplay samples must be created. Game AI can utilise this data store (1) to establish an evaluation function for games, and (2) as a model to be used by an adaptation mechanism. We follow the second approach in our research.
Our experimental focus is on real-time strategy (RTS) games, i.e., simulated war games in which a player needs to gather resources to construct buildings and combat units, and defeat an enemy army in a real-time battle. We use RTS games for their highly challenging nature, which stems from three factors: (1) their high complexity, (2) the large amount of inherent uncertainty, and (3) the need for rapid decision making (Buro & Furtak 2004). In the present research, we use the open-source RTS game Spring.

Data Store of Spring Gameplay Experiences

We define a gameplay experience as a set of observable features of the environment at a certain point in time. To create a data store of gameplay experiences for the Spring environment, we start by defining a basic set of features to observe. For our first experiments, we decided to use the following six features.

1. Number of units observed (maximum 5,000),
2. Types of units observed (maximum 200),
3. Number of units observed of each type,
4. Energy statistics (income / usage / storage),
5. Metal statistics (income / usage / storage),
6. Radar visibility.

Figure 1: Observation of information in a gameplay environment. Units controlled by the game AI currently reside in the dark region. In an imperfect-information environment, only information in the dark region is available. In a perfect-information environment, information on both the dark and the light region is available.

Spring applies a so-called 'Line of Sight' visibility mechanism to each unit. This implies that a non-cheating game AI only has access to feature data of those parts of the environment that are visible to its own units (illustrated in Figure 1). When the game AI's information is restricted to what its own units can observe, we call this an 'imperfect-information environment'. When we allow the game AI to access all information, regardless of whether it is visible to its own units or not, we call this a 'perfect-information environment'. Arguably, the reliability of an evaluation function is highest when perfect information is used to generate it.

For our experiments a data store was generated consisting of three different data sets: the first containing training data collected in a perfect-information environment, the second containing test data collected in a perfect-information environment, and the third containing test data collected in an imperfect-information environment.
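To make the shape of such a gameplay experience concrete, the sketch below shows one possible record holding the six features listed above. It is only an illustration under our own assumptions: the field names, types, and the use of a Python dataclass are not taken from the paper or from the Spring interface.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class GameplayExperience:
    # Hypothetical record of one observation; names are illustrative only.
    game_id: int                  # which game the snapshot belongs to
    game_cycle: int               # in-game time of the observation
    player_id: int                # the observing player
    unit_counts: Dict[str, int] = field(default_factory=dict)  # feature 3: observed units per type
    energy: tuple = (0.0, 0.0, 0.0)      # feature 4: (income, usage, storage)
    metal: tuple = (0.0, 0.0, 0.0)       # feature 5: (income, usage, storage)
    radar_visible_fraction: float = 0.0  # feature 6: radar visibility

    @property
    def num_units(self) -> int:          # feature 1: total number of units observed
        return sum(self.unit_counts.values())

    @property
    def num_unit_types(self) -> int:     # feature 2: number of distinct unit types observed
        return len(self.unit_counts)
```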
Evaluation Function for Spring Games

This section discusses the automatic generation of an evaluation function for Spring. First, we discuss the design of an evaluation function based on weighted unit counts. Second, we discuss the application of temporal-difference (TD) learning (Sutton 1988) using a perfect-information data store to determine appropriate weights for the evaluation function. Third, we discuss how the established evaluation function can be enhanced to enable it to deal with imperfect information.

Design of the Evaluation Function

Of the defined basic set of features, we selected the feature 'number of units observed of each type' as the basis of our evaluation function, since we felt that it was the most indicative of the final outcome of a game. Accordingly, the unit-based evaluation function for the game's status is denoted by

v = \sum_u w_u (c_{u0} - c_{u1})    (1)

where w_u is the current weight of unit type u, c_{u0} is the number of units of type u that the game AI has, and c_{u1} is the number of units of type u that its opponent has. Obviously, this evaluation function can only produce an adequate result in a perfect-information environment, since otherwise c_{u1} is unknown.

Learning Unit-type Weights

We used TD-learning to establish appropriate values w_u for all unit types u. TD-learning has been applied successfully to games, e.g., by Tesauro (1992) for creating a strong AI for backgammon. Our application of TD-learning to generate an evaluation function for Spring is similar to its application by Beal & Smith (1997) for determining piece values in chess. Given a set W of weights w_i, i ∈ N, i ≤ n, and successive predictions P_t, the weights are updated as follows (Beal & Smith 1997):

\Delta W_t = \alpha (P_{t+1} - P_t) \sum_{k=1}^{t} \lambda^{t-k} \nabla_W P_k    (2)

where \alpha is the learning rate and \lambda is the recency parameter controlling the weighting of predictions occurring k steps in the past. \nabla_W P_k is the vector of partial derivatives of P_k with respect to W, also called the gradient of P_k with respect to W.

To apply TD-learning, a series of successive prediction probabilities (in this case: the probability of a player winning the game) must be available. The prediction probability P(v) of a game's status v is defined as

P(v) = \frac{1}{1 + e^{-v}}    (3)

where v is the evaluation function of the game's current status, denoted in Equation 1 (this entails that learning must be applied in a perfect-information environment). Figure 2 illustrates how a game-state value v of 0.942 is converted to a prediction probability P(v) of 0.72.

Figure 2: Conversion from game status to prediction probability (Beal & Smith 1997).
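To make Equations 1–3 concrete, the sketch below shows one possible TD(λ) pass over the snapshots of a single game. It is a minimal illustration, not the authors' MatLab implementation: the function names, the per-game batch update, and the representation of unit counts as NumPy vectors indexed by unit type are our assumptions.

```python
import numpy as np

def sigmoid(v):
    # Equation 3: convert a game-status value into a winning probability.
    return 1.0 / (1.0 + np.exp(-v))

def evaluate(weights, own_counts, opp_counts):
    # Equation 1: v = sum_u w_u * (c_u0 - c_u1); counts are vectors over unit types.
    return float(np.dot(weights, own_counts - opp_counts))

def td_lambda_game_update(weights, snapshots, alpha=0.1, lam=0.95):
    """One TD(lambda) pass over `snapshots`, a list of (own_counts, opp_counts)
    pairs collected during a single game (Equation 2). The weight change is
    accumulated over the game and applied once at the end (an assumption;
    the paper does not state whether updates are applied online or per game)."""
    probs, diffs = [], []
    for own, opp in snapshots:
        diffs.append(own - opp)
        probs.append(sigmoid(evaluate(weights, own, opp)))

    trace = np.zeros_like(weights)   # running sum_k lambda^(t-k) * grad_W P_k
    delta = np.zeros_like(weights)
    for t in range(len(probs) - 1):
        grad = probs[t] * (1.0 - probs[t]) * diffs[t]       # gradient of P_t w.r.t. W
        trace = lam * trace + grad
        delta += alpha * (probs[t + 1] - probs[t]) * trace  # Equation 2
    return weights + delta
```

With the settings reported later in the experimental setup (weights initialised to 1.0, α = 0.1, λ = 0.95), `weights` would start as `np.ones(number_of_unit_types)`.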
Dealing with Imperfect Information

In an imperfect-information environment, the evaluation value of the function denoted in Equation 1, which has been learned in a perfect-information environment, will typically be an overestimation. To deal with the imperfect information inherent in the Spring environment, we assume that it is possible to map the imperfect feature-data values to a perfect feature-data prediction reliably. A straightforward enhancement of the unit-based evaluation function to this effect is to scale linearly the number of observed units of each type to the non-observed region of the environment. Accordingly, the approximating unit-based evaluation function is denoted by

v = \sum_u w_u \left( c_{u0} - \frac{o_{u1}}{R} \right)    (4)

where w_u and c_{u0} are as in Equation 1, o_{u1} is the number of observed units of type u that the opponent has, and R ∈ [0, 1] is the fraction of the environment that is visible to the game AI. If enemy units are homogeneously distributed over the environment, the approximating unit-based evaluation function will produce results close to those of the unit-based evaluation function.
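A minimal sketch of Equation 4 follows, again assuming unit counts are kept as NumPy vectors over unit types; the guard against a zero visible fraction is our addition and is not part of the paper.

```python
import numpy as np

def approximate_evaluate(weights, own_counts, observed_opp_counts, visible_fraction):
    # Equation 4: scale the observed opponent counts up to the whole map,
    # assuming enemy units are homogeneously distributed over the environment.
    R = max(visible_fraction, 1e-6)  # guard against R = 0 very early in a game (our addition)
    return float(np.dot(weights, own_counts - observed_opp_counts / R))
```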
Experiments

This section discusses experiments that test our approach. We first describe the setup of the experiments, followed by the experimental results.

Experimental Setup

To test our approach we first collected feature data in the data store. For each player, feature data was gathered during gameplay. We decided to gather data of games in which two game AIs are pitted against each other. Multiple Spring game AIs are available. Most of these are closed-source, and can therefore only be used as opponents for an adaptive AI. However, we found one AI which was open source, which we labelled 'AAI' (for 'Alexander AI'). We enhanced this AI with the ability to collect feature data in a data store, and the ability to disregard radar visibility so that perfect information on the environment was available. As opposing game AIs, we used AAI itself, as well as four other AIs, namely 'TSI' (for 'Towowa Sztuczna Intelligence'), 'KAI' (for 'Krogothe AI'), 'CSAI' (for 'C# AI'), and 'RAI' (for 'Reth AI'). Table 1 lists the number of games from which we built the data store.

Table 1: The number of Spring games collected in the data stores.

The data collection process was as follows. During each game, feature data was collected every 127 game cycles, which corresponds to the update frequency of AAI. With 30 game cycles per second, this resulted in feature data being collected every 4.233 seconds. The games were played on the map 'SmallDivide', which is a symmetrical, non-island map (i.e., a map without water areas). All games were played under identical starting conditions. A game is won by the player who first kills the opponent's 'Commander' unit.

We used a MatLab implementation of TD-learning to learn the unit-type weights w_u for the evaluation function of Equation 1. Unit-type weights were learned from feature data collected with perfect information from 800 games stored in the training set, namely all the games AAI played against itself (500), KAI (100), TSI (100), and CSAI (100). We did not include feature data collected in games where AAI was pitted against RAI, because we wanted to use RAI to test the generalisation ability of the learned evaluation function. The parameter α was set to 0.1, and the parameter λ was set to 0.95. Both parameter values were chosen in accordance with the research of Beal & Smith (1997). The unit-type weights were initialised to 1.0 before learning started.

To evaluate the performance of the learned evaluation functions, we determined to what extent they are capable of predicting the actual outcome of a Spring game. For this purpose we defined the measure 'absolute prediction' as the percentage of games of which the outcome is correctly predicted just before the end of the game. A high absolute prediction indicates that the evaluation function has the ability to evaluate a game's status correctly. It is imperative that an evaluation function can evaluate a game's status correctly not only at the end of the game, but also throughout the play of a game. We therefore defined the measure 'relative prediction' as the game time at which the outcome is predicted correctly for at least 50% of the tested games. Since not all games last for an equal amount of time, we scaled the game time to 100% by averaging over the predictions made for each data point. A low relative prediction indicates that the evaluation function can correctly evaluate a game's status relatively early in the game.

We determined the performance using two test sets: one containing feature data collected in a perfect-information environment, the other containing feature data collected in an imperfect-information environment. Feature data was collected for 500 games, in which AAI is pitted against itself (100), KAI (100), TSI (100), CSAI (100), and RAI (100). We first tested the unit-based evaluation function in a perfect-information environment. Subsequently, we tested the unit-based evaluation function in an imperfect-information environment. Finally, we tested the approximating unit-based evaluation function in an imperfect-information environment.
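The sketch below shows one way the two measures might be computed from per-snapshot outcome predictions. The paper does not spell out the bookkeeping, so the resampling of every game to a common 100-step time axis and the reading of relative prediction as "the earliest scaled time at which at least 50% of the tested games are predicted correctly" are our interpretation.

```python
import numpy as np

def absolute_prediction(per_game_predictions, outcomes):
    """per_game_predictions: list of sequences of predicted outcomes (+1 win / -1 loss),
    one sequence per game; outcomes: actual outcomes (+1 / -1), one per game.
    Uses the prediction made just before the end of each game."""
    last = np.array([p[-1] for p in per_game_predictions])
    return 100.0 * np.mean(last == np.array(outcomes))

def relative_prediction(per_game_predictions, outcomes, grid=100):
    """Scale every game to `grid` time steps and return the earliest scaled time (in %)
    at which at least 50% of the games are predicted correctly."""
    correct = np.zeros((len(per_game_predictions), grid))
    for i, (preds, outcome) in enumerate(zip(per_game_predictions, outcomes)):
        idx = np.linspace(0, len(preds) - 1, grid).astype(int)  # resample to a common time axis
        correct[i] = (np.asarray(preds)[idx] == outcome)
    rate = correct.mean(axis=0)          # fraction of games predicted correctly at each scaled time
    above = np.where(rate >= 0.5)[0]
    return 100.0 * above[0] / grid if len(above) else 100.0
```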
Results of Learning Unit-type Weights

The Spring environment supports over 200 different unit types. During feature collection, we found that 89 different unit types were used in the games played. The unit-based evaluation function therefore learned weights for these 89 unit types. A summary of the results is listed in Table 2. Below, we give three observations on these results.

Table 2: Learned unit-type weights (summary).

First, we observed that a surprisingly high weight has been assigned to the so-called 'Construction kbot' type. Assigning a low weight to this unit type would at first glance seem reasonable, since it is of no use in combat situations. However, at the time the game AI kills a 'Construction kbot', not only will the opponent's ability to construct or repair buildings decrease, but it is also likely that the game AI has already penetrated its opponent's defences, since these units typically reside close to the 'Commander'. This implies that killing 'Construction kbots' is a good indicator of success.

Second, we observed that some unit types obtained weights less than zero. This indicates that these unit types are of little use to the game AI and actually are a waste of resources. Notable examples of such unit types are the 'Stealthy cloakable metal extractor' and the 'Light amphibious tank'. The latter is predictably useless, since our test map contains no water.

Third, when looking at the weights of the unit types directly involved in combat, the 'Medium assault tank', 'Light plasma kbot' and 'Bomber' seem to be the most valuable.

Absolute-Prediction Performance Results

Using the learned unit-type weights, we determined the absolute-prediction performance of the unit-based evaluation function. Table 3 lists the results for the five trials in which AAI was pitted against each of its five opponent AIs. The absolute-prediction performance is the sum of the percentages in the cells 'Win (Predicted) / Win (Outcome)' and 'Loss (Predicted) / Loss (Outcome)'.

Table 3: Absolute-prediction performance results. Each cell contains the percentage of the total games played that belongs in that cell, for, in sequence, (1) the unit-based evaluation function in a perfect-information environment, (2) the unit-based evaluation function in an imperfect-information environment, and (3) the approximating unit-based evaluation function in an imperfect-information environment.

For the unit-based evaluation function in a perfect-information environment, the absolute-prediction performance in the AAI-AAI (self-play) trial is 65%. In the AAI-KAI, AAI-TSI, AAI-CSAI and AAI-RAI trials, it is 88%, 64%, 82% and 74%, respectively. Therefore, the average absolute-prediction performance is 75%. As the obtained absolute-prediction performance is consistently above 50%, we may conclude that the unit-based evaluation function provides an effective basis for evaluating a game's status.

In an imperfect-information environment, the unit-based evaluation function consistently 'predicts' a single outcome in practically all situations, namely a victory for the first player. This is explained by the fact that, since the opponent's units are only partially observed, the evaluation is heavily biased in favour of the first player: essentially, it considers anything that it does not see as non-existent. From this result, we may conclude that the unit-based evaluation function is incapable of reliably predicting the outcome of a game in an imperfect-information environment.

For the approximating unit-based evaluation function in an imperfect-information environment, the absolute-prediction performance in the AAI-AAI (self-play) trial is 69%. In the AAI-KAI, AAI-TSI, AAI-CSAI and AAI-RAI trials, it is 91%, 55%, 87% and 77%, respectively. Therefore, the average absolute-prediction performance is 76%. From this result, we may conclude that the approximating unit-based evaluation function in an imperfect-information environment successfully obtained an absolute-prediction performance comparable to the performance of the unit-based evaluation function in a perfect-information environment. It may even seem to give slightly better predictions in general, but the difference is not statistically significant.
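As a quick check of the reported averages (our own arithmetic, not taken from the paper):

```python
# Per-trial absolute-prediction percentages, in the order AAI, KAI, TSI, CSAI, RAI.
perfect_info = [65, 88, 64, 82, 74]    # unit-based, perfect information
approximating = [69, 91, 55, 87, 77]   # approximating, imperfect information
print(sum(perfect_info) / len(perfect_info))    # 74.6, reported as about 75%
print(sum(approximating) / len(approximating))  # 75.8, reported as 76%
```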
Relative-Prediction Performance Results

The relative-prediction performance is listed in Table 4.

Table 4: Relative-prediction performance results.

We observed that in the AAI-AAI trial the unit-based evaluation function correctly predicts the outcome of more than 50% of the tested games after 34% of the game has been played, and the approximating unit-based evaluation function after 41% of the game has been played. Thus, we see (as expected) that if the opponents are evenly matched, the unit-based evaluation function in a perfect-information environment manages to predict the outcome of a game considerably earlier than the approximating unit-based evaluation function in an imperfect-information environment.

In the other trials, the relative performances of the unit-based evaluation and the approximating unit-based evaluation diverge more. In the AAI-CSAI trial, the unit-based evaluation's relative performance is much better than the approximating unit-based evaluation's relative performance. In contrast, in the three other trials the opposite is the case. This might seem strange at first glance, but the explanation is rather straightforward: the five AIs differ considerably in quality. CSAI is much stronger than AAI, while AAI is stronger than the other three. Early in the game, when no units of the enemy have been observed yet, the approximating unit-based evaluation function will always predict a victory for the home team. Even though this prediction is meaningless (since it is generated without any information on the enemy), against a weaker AI it is very likely to be correct, while against a stronger AI it is likely to be false. Therefore, the approximating unit-based evaluation function gives a better relative performance than the unit-based evaluation function against relatively weak AIs (i.e., KAI, TSI, and RAI), and a much worse performance against a relatively strong AI (i.e., CSAI).

The aforementioned effect is clearly visible in Figure 3, in which the percentage of outcomes correctly predicted is plotted as a function over time. The figure compares the predictions of the unit-based evaluation function in a perfect-information environment with the predictions of the approximating unit-based evaluation function in an imperfect-information environment.

Figure 3: Comparison of outcomes correctly predicted as a function over time.

Discussion

The current evaluation functions are not able to predict correctly the outcomes of 100% of the tested games, not even at the game's end. This is explained as follows. Both the unit-based and the approximating unit-based evaluation function evaluate exclusively by means of the feature 'number of units observed of each type'. Tactical positions are not taken into account. A Spring game is not won by obtaining a territorial advantage, but by destroying the so-called 'Commander' unit. Thus, even a player who has a material disadvantage may win if the enemy's 'Commander' is taken out. Therefore, an evaluation function based purely on a comparison of material will never be able to predict the outcome of a game correctly in all cases. Nevertheless, with a current absolute-prediction performance of about 75%, there is certainly room for improvement. Some of the additional features that we expect to produce improved results when included in the evaluation functions are information on tactical positions and information on the phase of the game. This enhancement will be the focus of future work.

Conclusions and Future Work

In this paper we discussed an approach to automatically generating an evaluation function for game AI in RTS games. From our experimental results, we may conclude that both the unit-based and the approximating unit-based evaluation function effectively predict the outcome of a Spring game, expressed in terms of the absolute-prediction performance. The relative-prediction performance, which indicates how early in a game an outcome is predicted correctly, is lower (i.e., better) for the unit-based evaluation function than for the approximating unit-based evaluation function when the two players are evenly matched. Nevertheless, a relative-prediction performance for the approximating unit-based evaluation function of 41% is still relatively low. This indicates that even in an imperfect-information environment, the approximating unit-based evaluation function manages to predict the final outcome of the game correctly, on average, before the first half of the game has been played.

For future work, we will extend the evaluation functions with additional features and will incorporate a mechanism to evaluate a game's status dependent on the phase of the game. Our findings will be incorporated in the design of an adaptation mechanism for RTS games.

References

Beal, D., and Smith, M. 1997. Learning piece values using temporal differences. International Computer Chess Association (ICCA) Journal 20(3):147–151.

Buro, M., and Furtak, T. M. 2004. RTS games and real-time AI research. In Proceedings of the Behavior Representation in Modeling and Simulation Conference (BRIMS). Arlington, VA.

Demasi, P., and de O. Cruz, A. J. 2002. Online coevolution for action games. International Journal of Intelligent Games and Simulation 2(3):80–88.

Graepel, T.; Herbrich, R.; and Gold, J. 2004. Learning to fight. In Mehdi, Q.; Gough, N.; and Al-Dabass, D., eds., Proceedings of Computer Games: Artificial Intelligence, Design and Education (CGAIDE 2004), 193–200. University of Wolverhampton.

Spronck, P.; Sprinkhuizen-Kuyper, I.; and Postma, E. 2004. Online adaptation of game opponent AI with dynamic scripting. International Journal of Intelligent Games and Simulation 3(1):45–53.

Spronck, P. 2005. A model for reliable adaptive game intelligence. In Aha, D. W.; Munoz-Avila, H.; and van Lent, M., eds., Proceedings of the IJCAI-05 Workshop on Reasoning, Representation, and Learning in Computer Games, 95–100. Naval Research Laboratory, Navy Center for Applied Research in Artificial Intelligence, Washington, DC.

Sutton, R. S. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3:9–44.

Tesauro, G. 1992. Practical issues in temporal difference learning. Machine Learning 8:257–277.