Establishing an Evaluation Function for RTS Games

Authors*
*Author names withheld at the request of the organising committee.

Abstract

Any adaptation mechanism needs a good evaluation function that adequately measures the quality of the generated solutions. Because of the high complexity of modern video games, the task of creating a suitable evaluation function for adaptive game AI is quite difficult. In our research we aim at fully automatically generating a good evaluation function for adaptive game AI. This paper describes our approach, and discusses the experiments performed in the RTS game Spring. TD-learning is applied for establishing two different evaluation functions, one for a perfect-information environment, and one for an imperfect-information environment. From our results we may conclude that both evaluation functions are able to predict the outcome of a Spring game reasonably well, i.e., they are correct in about 75% of all games played.

Introduction

Modern video games present a complex and realistic environment in which game AI is expected to behave realistically (i.e., 'human-like'). One feature of human-like behaviour of game AI, namely the ability to adapt adequately to changing circumstances, has been explored with some success in previous research (Demasi & de O. Cruz 2002; Graepel, Herbrich, & Gold 2004; Spronck, Sprinkhuizen-Kuyper, & Postma 2004). This is called 'adaptive game AI'. When implementing adaptive game AI, arguably the most important factor is the evaluation function that rates the quality of newly generated game AI. An erroneous or suboptimal evaluation function will slow down the learning process, and may even result in weak game AI. Due to the complex nature of modern video games, determining a good evaluation function is often a difficult task. This paper discusses our work on fully automatically generating a good evaluation function for an RTS game.

The outline of the paper is as follows. Our approach for creating adaptive game AI for RTS games is discussed first, followed by a description of how we collect domain knowledge of an RTS game environment in a data store. Subsequently, we show how we establish an evaluation function for RTS games automatically, based on the collected data. We test the performance of the generated evaluation function and provide the experimental results. We conclude by discussing the results and looking at future work.

Approaches

A first approach to adaptive game AI is to change game AI incrementally so that it becomes more effective. The speed of learning depends on the learning rate: if the learning rate is low, learning takes a long time; if it is high, results are unreliable. Therefore, an incremental approach is not very suitable for rapidly adapting to observations in a complex video game environment. A second approach is to allow computer-controlled players to imitate human players. This approach can be particularly successful in games that have access to the Internet to store and retrieve samples of gameplay experiences (Spronck 2005). For this approach to be feasible, a central data store of gameplay samples must be created. Game AI can utilise this data store (1) to establish an evaluation function for games, and (2) as a model to be used by an adaptation mechanism. We follow the second approach in our research.
Our experimental focus is on real-time strategy (RTS) games, i.e., simulated war games in which a player needs to gather resources to construct buildings and combat units, and defeat an enemy army in a real-time battle. We use RTS games for their highly challenging nature, which stems from three factors: (1) their high complexity, (2) the large amount of inherent uncertainty, and (3) the need for rapid decision making (Buro & Furtak 2004). In the present research, we use the open-source RTS game Spring.

Data Store of Spring Gameplay Experiences

We define a gameplay experience as a set of observable features of the environment at a certain point in time. To create a data store of gameplay experiences for the Spring environment, we start by defining a basic set of features to observe. For our first experiments, we decided to use the following six features.

1. Number of units observed (maximum 5,000),
2. Types of units observed (maximum 200),
3. Number of units observed of each type,
4. Energy statistics (income / usage / storage),
5. Metal statistics (income / usage / storage),
6. Radar visibility.

Figure 1: Observation of information in a gameplay environment. Units controlled by the game AI currently reside in the dark region. In an imperfect-information environment, only information in the dark region is available. In a perfect-information environment, information on both the dark and the light region is available.

Spring applies a so-called 'Line of Sight' visibility mechanism to each unit. This implies that a non-cheating game AI only has access to feature data of those parts of the environment that are visible to its own units (illustrated in Figure 1). When the game AI's information is restricted to what its own units can observe, we call this an 'imperfect-information environment'. When we allow the game AI to access all information, regardless of whether it is visible to its own units or not, we call this a 'perfect-information environment'. Arguably, the reliability of an evaluation function is highest when perfect information is used to generate it.

For our experiments a data store was generated consisting of three different data sets: the first containing training data collected in a perfect-information environment, the second containing test data collected in a perfect-information environment, and the third containing test data collected in an imperfect-information environment.
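To make the shape of such a gameplay experience concrete, the sketch below shows one possible record holding the six features listed above. It is only an illustration under our own assumptions: the field names, types, and the use of a Python dataclass are not taken from the paper or from the Spring interface.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class GameplayExperience:
    # Hypothetical record of one observation; names are illustrative only.
    game_id: int                  # which game the snapshot belongs to
    game_cycle: int               # in-game time of the observation
    player_id: int                # the observing player
    unit_counts: Dict[str, int] = field(default_factory=dict)  # feature 3: observed units per type
    energy: tuple = (0.0, 0.0, 0.0)      # feature 4: (income, usage, storage)
    metal: tuple = (0.0, 0.0, 0.0)       # feature 5: (income, usage, storage)
    radar_visible_fraction: float = 0.0  # feature 6: radar visibility

    @property
    def num_units(self) -> int:          # feature 1: total number of units observed
        return sum(self.unit_counts.values())

    @property
    def num_unit_types(self) -> int:     # feature 2: number of distinct unit types observed
        return len(self.unit_counts)
```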
Evaluation Function for Spring Games

This section discusses the automatic generation of an evaluation function for Spring. First, we discuss the design of an evaluation function based on weighted unit counts. Second, we discuss the application of temporal-difference (TD) learning (Sutton 1988) using a perfect-information data store to determine appropriate weights for the evaluation function. Third, we discuss how the established evaluation function can be enhanced to enable it to deal with imperfect information.

Design of the Evaluation Function

Of the defined basic set of features, we selected the feature 'number of units observed of each type' as the basis of our evaluation function, since we felt that it was the most indicative of the final outcome of a game. Accordingly, the unit-based evaluation function for the game's status is denoted by

v = \sum_u w_u (c_{u0} - c_{u1})    (1)

where w_u is the current weight of unit type u, c_{u0} is the number of units of type u that the game AI has, and c_{u1} is the number of units of type u that its opponent has. Obviously, this evaluation function can only produce an adequate result in a perfect-information environment, since otherwise c_{u1} is unknown.

Learning Unit-type Weights

We used TD-learning to establish appropriate values w_u for all unit types u. TD-learning has been applied successfully to games, e.g., by Tesauro (1992) for creating a strong AI for backgammon. Our application of TD-learning to generate an evaluation function for Spring is similar to its application by Beal & Smith (1997) for determining piece values in chess. Given a set W of weights w_i, i ∈ N, i ≤ n, and successive predictions P_t, the weights are updated as follows (Beal & Smith 1997):

\Delta W_t = \alpha (P_{t+1} - P_t) \sum_{k=1}^{t} \lambda^{t-k} \nabla_W P_k    (2)

where \alpha is the learning rate and \lambda is the recency parameter controlling the weighting of predictions occurring k steps in the past. \nabla_W P_k is the vector of partial derivatives of P_k with respect to W, also called the gradient of P_k with respect to W.

To apply TD-learning, a series of successive prediction probabilities (in this case: the probability of a player winning the game) must be available. The prediction probability P(v) of a game's status v is defined as

P(v) = \frac{1}{1 + e^{-v}}    (3)

where v is the evaluation function of the game's current status, denoted in Equation 1 (this entails that learning must be applied in a perfect-information environment). Figure 2 illustrates how a game-state value v of 0.942 is converted to a prediction probability P(v) of 0.72.

Figure 2: Conversion from game status to prediction probability (Beal & Smith 1997).
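To make Equations 1–3 concrete, the sketch below shows one possible TD(λ) pass over the snapshots of a single game. It is a minimal illustration, not the authors' MatLab implementation: the function names, the per-game batch update, and the representation of unit counts as NumPy vectors indexed by unit type are our assumptions.

```python
import numpy as np

def sigmoid(v):
    # Equation 3: convert a game-status value into a winning probability.
    return 1.0 / (1.0 + np.exp(-v))

def evaluate(weights, own_counts, opp_counts):
    # Equation 1: v = sum_u w_u * (c_u0 - c_u1); counts are vectors over unit types.
    return float(np.dot(weights, own_counts - opp_counts))

def td_lambda_game_update(weights, snapshots, alpha=0.1, lam=0.95):
    """One TD(lambda) pass over `snapshots`, a list of (own_counts, opp_counts)
    pairs collected during a single game (Equation 2). The weight change is
    accumulated over the game and applied once at the end (an assumption;
    the paper does not state whether updates are applied online or per game)."""
    probs, diffs = [], []
    for own, opp in snapshots:
        diffs.append(own - opp)
        probs.append(sigmoid(evaluate(weights, own, opp)))

    trace = np.zeros_like(weights)   # running sum_k lambda^(t-k) * grad_W P_k
    delta = np.zeros_like(weights)
    for t in range(len(probs) - 1):
        grad = probs[t] * (1.0 - probs[t]) * diffs[t]       # gradient of P_t w.r.t. W
        trace = lam * trace + grad
        delta += alpha * (probs[t + 1] - probs[t]) * trace  # Equation 2
    return weights + delta
```

With the settings reported later in the experimental setup (weights initialised to 1.0, α = 0.1, λ = 0.95), `weights` would start as `np.ones(number_of_unit_types)`.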
Dealing with Imperfect Information

In an imperfect-information environment, the evaluation value of the function denoted in Equation 1, which has been learned in a perfect-information environment, will typically be an overestimation. To deal with the imperfect information inherent in the Spring environment, we assume that it is possible to map the imperfect feature-data values to a perfect feature-data prediction reliably. A straightforward enhancement of the unit-based evaluation function to this effect is to scale linearly the number of observed units of each type to the non-observed region of the environment. Accordingly, the approximating unit-based evaluation function is denoted by

v = \sum_u w_u \left( c_{u0} - \frac{o_{u1}}{R} \right)    (4)

where w_u and c_{u0} are as in Equation 1, o_{u1} is the number of observed units of type u that the opponent has, and R ∈ [0, 1] is the fraction of the environment that is visible to the game AI. If enemy units are homogeneously distributed over the environment, the approximating unit-based evaluation function will produce results close to those of the unit-based evaluation function.
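A minimal sketch of Equation 4 follows, again assuming unit counts are kept as NumPy vectors over unit types; the guard against a zero visible fraction is our addition and is not part of the paper.

```python
import numpy as np

def approximate_evaluate(weights, own_counts, observed_opp_counts, visible_fraction):
    # Equation 4: scale the observed opponent counts up to the whole map,
    # assuming enemy units are homogeneously distributed over the environment.
    R = max(visible_fraction, 1e-6)  # guard against R = 0 very early in a game (our addition)
    return float(np.dot(weights, own_counts - observed_opp_counts / R))
```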
Experiments

This section discusses experiments that test our approach. We first describe the setup of the experiments, followed by the experimental results.

Experimental Setup

To test our approach we first collected feature data in the data store. For each player, feature data was gathered during gameplay. We decided to gather data of games in which two game AIs are pitted against each other. Multiple Spring game AIs are available. Most of these are closed-source, and can therefore only be used as opponents for an adaptive AI. However, we found one AI which was open source, which we labelled 'AAI' (for 'Alexander AI'). We enhanced this AI with the ability to collect feature data in a data store, and the ability to disregard radar visibility so that perfect information on the environment was available. As opposing game AIs, we used AAI itself, as well as four other AIs, namely 'TSI' (for 'Towowa Sztuczna Intelligence'), 'KAI' (for 'Krogothe AI'), 'CSAI' (for 'C# AI'), and 'RAI' (for 'Reth AI'). Table 1 lists the number of games from which we built the data store.

Table 1: The number of Spring games collected in the data stores.

The data collection process was as follows. During each game, feature data was collected every 127 game cycles, which corresponds to the update frequency of AAI. With 30 game cycles per second, this resulted in feature data being collected every 4.233 seconds. The games were played on the map 'SmallDivide', which is a symmetrical, non-island map (i.e., a map without water areas). All games were played under identical starting conditions. A game is won by the player who first kills the opponent's 'Commander' unit.

We used a MatLab implementation of TD-learning to learn the unit-type weights w_u for the evaluation function of Equation 1. Unit-type weights were learned from feature data collected with perfect information from 800 games stored in the training set, namely all the games AAI played against itself (500), KAI (100), TSI (100), and CSAI (100). We did not include feature data collected in games where AAI was pitted against RAI, because we wanted to use RAI to test the generalisation ability of the learned evaluation function. The parameter α was set to 0.1, and the parameter λ was set to 0.95. Both parameter values were chosen in accordance with the research of Beal & Smith (1997). The unit-type weights were initialised to 1.0 before learning started.

To evaluate the performance of the learned evaluation functions, we determined to what extent they are capable of predicting the actual outcome of a Spring game. For this purpose we defined the measure 'absolute prediction' as the percentage of games of which the outcome is correctly predicted just before the end of the game. A high absolute prediction indicates that the evaluation function has the ability to evaluate a game's status correctly. It is imperative that an evaluation function can evaluate a game's status correctly not only at the end of the game, but also throughout the play of a game. We therefore defined the measure 'relative prediction' as the game time at which the outcome is predicted correctly for at least 50% of the tested games. Since not all games last for an equal amount of time, we scaled the game time to 100% by averaging over the predictions made for each data point. A low relative prediction indicates that the evaluation function can correctly evaluate a game's status relatively early in the game.

We determined the performance using two test sets: one containing feature data collected in a perfect-information environment, the other containing feature data collected in an imperfect-information environment. Feature data was collected for 500 games, in which AAI is pitted against itself (100), KAI (100), TSI (100), CSAI (100), and RAI (100). We first tested the unit-based evaluation function in a perfect-information environment. Subsequently, we tested the unit-based evaluation function in an imperfect-information environment. Finally, we tested the approximating unit-based evaluation function in an imperfect-information environment.
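The sketch below shows one way the two measures might be computed from per-snapshot outcome predictions. The paper does not spell out the bookkeeping, so the resampling of every game to a common 100-step time axis and the reading of relative prediction as "the earliest scaled time at which at least 50% of the tested games are predicted correctly" are our interpretation.

```python
import numpy as np

def absolute_prediction(per_game_predictions, outcomes):
    """per_game_predictions: list of sequences of predicted outcomes (+1 win / -1 loss),
    one sequence per game; outcomes: actual outcomes (+1 / -1), one per game.
    Uses the prediction made just before the end of each game."""
    last = np.array([p[-1] for p in per_game_predictions])
    return 100.0 * np.mean(last == np.array(outcomes))

def relative_prediction(per_game_predictions, outcomes, grid=100):
    """Scale every game to `grid` time steps and return the earliest scaled time (in %)
    at which at least 50% of the games are predicted correctly."""
    correct = np.zeros((len(per_game_predictions), grid))
    for i, (preds, outcome) in enumerate(zip(per_game_predictions, outcomes)):
        idx = np.linspace(0, len(preds) - 1, grid).astype(int)  # resample to a common time axis
        correct[i] = (np.asarray(preds)[idx] == outcome)
    rate = correct.mean(axis=0)          # fraction of games predicted correctly at each scaled time
    above = np.where(rate >= 0.5)[0]
    return 100.0 * above[0] / grid if len(above) else 100.0
```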
Results of Learning Unit-type Weights

The Spring environment supports over 200 different unit types. During feature collection, we found that 89 different unit types were used in the games played. The unit-based evaluation function therefore learned weights for these 89 unit types. A summary of the results is listed in Table 2. Below, we give three observations on these results.

Table 2: Learned unit-type weights (summary).

First, we observed that a surprisingly high weight has been assigned to the so-called 'Construction kbot' type. Assigning a low weight to this unit type would at first glance seem reasonable, since it is of no use in combat situations. However, at the time the game AI kills a 'Construction kbot', not only will the opponent's ability to construct or repair buildings decrease, but it is also likely that the game AI has already penetrated its opponent's defences, since these units typically reside close to the 'Commander'. This implies that killing 'Construction kbots' is a good indicator of success.

Second, we observed that some unit types obtained weights less than zero. This indicates that these unit types are of little use to the game AI and actually are a waste of resources. Notable examples of such unit types are the 'Stealthy cloakable metal extractor' and the 'Light amphibious tank'. The latter is predictably useless, since our test map contains no water.

Third, when looking at the weights of the unit types directly involved in combat, the 'Medium assault tank', 'Light plasma kbot' and 'Bomber' seem to be the most valuable.

Absolute-Prediction Performance Results

Using the learned unit-type weights, we determined the absolute-prediction performance of the unit-based evaluation function. Table 3 lists the results for the five trials in which AAI was pitted against each of its five opponent AIs. The absolute-prediction performance is the sum of the percentages in the cells 'Win (Predicted) / Win (Outcome)' and 'Loss (Predicted) / Loss (Outcome)'.

Table 3: Absolute-prediction performance results. Each cell contains the percentage of the total games played that belongs in that cell, for, in sequence, (1) the unit-based evaluation function in a perfect-information environment, (2) the unit-based evaluation function in an imperfect-information environment, and (3) the approximating unit-based evaluation function in an imperfect-information environment.

For the unit-based evaluation function in a perfect-information environment, the absolute-prediction performance in the AAI-AAI (self-play) trial is 65%. In the AAI-KAI, AAI-TSI, AAI-CSAI and AAI-RAI trials, it is 88%, 64%, 82% and 74%, respectively. Therefore, the average absolute-prediction performance is 75%. As the obtained absolute-prediction performance is consistently above 50%, we may conclude that the unit-based evaluation function provides an effective basis for evaluating a game's status.

In an imperfect-information environment, the unit-based evaluation function consistently 'predicts' a single outcome in practically all situations, namely a victory for the first player. This is explained by the fact that, since the opponent's units are only partially observed, the evaluation is heavily biased in favour of the first player: essentially, it considers anything that it does not see as non-existent. From this result, we may conclude that the unit-based evaluation function is incapable of reliably predicting the outcome of a game in an imperfect-information environment.

For the approximating unit-based evaluation function in an imperfect-information environment, the absolute-prediction performance in the AAI-AAI (self-play) trial is 69%. In the AAI-KAI, AAI-TSI, AAI-CSAI and AAI-RAI trials, it is 91%, 55%, 87% and 77%, respectively. Therefore, the average absolute-prediction performance is 76%. From this result, we may conclude that the approximating unit-based evaluation function in an imperfect-information environment successfully obtained an absolute-prediction performance comparable to the performance of the unit-based evaluation function in a perfect-information environment. It may even seem to give slightly better predictions in general, but the difference is not statistically significant.
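As a quick check of the reported averages (our own arithmetic, not taken from the paper):

```python
# Per-trial absolute-prediction percentages, in the order AAI, KAI, TSI, CSAI, RAI.
perfect_info = [65, 88, 64, 82, 74]    # unit-based, perfect information
approximating = [69, 91, 55, 87, 77]   # approximating, imperfect information
print(sum(perfect_info) / len(perfect_info))    # 74.6, reported as about 75%
print(sum(approximating) / len(approximating))  # 75.8, reported as 76%
```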
Relative-Prediction Performance Results

The relative-prediction performance is listed in Table 4.

Table 4: Relative-prediction performance results.

We observed that in the AAI-AAI trial the unit-based evaluation function correctly predicts the outcome of more than 50% of the tested games after 34% of the game has been played, and the approximating unit-based evaluation function after 41% of the game has been played. Thus, we see (as expected) that if the opponents are evenly matched, the unit-based evaluation function in a perfect-information environment manages to predict the outcome of a game considerably earlier than the approximating unit-based evaluation function in an imperfect-information environment.

In the other trials, the relative performances of the unit-based evaluation and the approximating unit-based evaluation diverge more. In the AAI-CSAI trial, the unit-based evaluation's relative performance is much better than the approximating unit-based evaluation's relative performance. In contrast, in the three other trials the opposite is the case. This might seem strange at first glance, but the explanation is rather straightforward: the five AIs differ considerably in quality. CSAI is much stronger than AAI, while AAI is stronger than the other three. Early in the game, when no units of the enemy have been observed yet, the approximating unit-based evaluation function will always predict a victory for the home team. Even though this prediction is meaningless (since it is generated without any information on the enemy), against a weaker AI it is very likely to be correct, while against a stronger AI it is likely to be false. Therefore, the approximating unit-based evaluation function gives a better relative performance than the unit-based evaluation function against relatively weak AIs (i.e., KAI, TSI, and RAI), and a much worse performance against a relatively strong AI (i.e., CSAI).

The aforementioned effect is clearly visible in Figure 3, in which the percentage of outcomes correctly predicted is plotted as a function over time. The figure compares the predictions of the unit-based evaluation function in a perfect-information environment with the predictions of the approximating unit-based evaluation function in an imperfect-information environment.

Figure 3: Comparison of outcomes correctly predicted as a function over time.

Discussion

The current evaluation functions are not able to predict correctly the outcomes of 100% of the tested games, not even at the game's end. This is explained as follows. Both the unit-based and the approximating unit-based evaluation function evaluate exclusively by means of the feature 'number of units observed of each type'. Tactical positions are not taken into account. A Spring game is not won by obtaining a territorial advantage, but by destroying the so-called 'Commander' unit. Thus, even a player who has a material disadvantage may win if the enemy's 'Commander' is taken out. Therefore, an evaluation function based purely on a comparison of material will never be able to predict the outcome of a game correctly in all cases. Nevertheless, with a current absolute-prediction performance of about 75%, there is certainly room for improvement. Some of the additional features that we expect to produce improved results when included in the evaluation functions are information on tactical positions and information on the phase of the game. This enhancement will be the focus of future work.

Conclusions and Future Work

In this paper we discussed an approach to automatically generating an evaluation function for game AI in RTS games. From our experimental results, we may conclude that both the unit-based and the approximating unit-based evaluation function effectively predict the outcome of a Spring game, expressed in terms of the absolute-prediction performance. The relative-prediction performance, which indicates how early in a game an outcome is predicted correctly, is lower (i.e., better) for the unit-based evaluation function than for the approximating unit-based evaluation function when the two players are evenly matched. Nevertheless, a relative-prediction performance for the approximating unit-based evaluation function of 41% is still relatively low. This indicates that even in an imperfect-information environment, the approximating unit-based evaluation function manages to predict the final outcome of the game correctly, on average, before the first half of the game has been played.

For future work, we will extend the evaluation functions with additional features and will incorporate a mechanism to evaluate a game's status dependent on the phase of the game. Our findings will be incorporated in the design of an adaptation mechanism for RTS games.

References

Beal, D., and Smith, M. 1997. Learning piece values using temporal differences. International Computer Chess Association (ICCA) Journal 20(3):147–151.

Buro, M., and Furtak, T. M. 2004. RTS games and real-time AI research. In Proceedings of the Behavior Representation in Modeling and Simulation Conference (BRIMS). Arlington, VA.

Demasi, P., and de O. Cruz, A. J. 2002. Online coevolution for action games. International Journal of Intelligent Games and Simulation 2(3):80–88.

Graepel, T.; Herbrich, R.; and Gold, J. 2004. Learning to fight. In Mehdi, Q.; Gough, N.; and Al-Dabass, D., eds., Proceedings of Computer Games: Artificial Intelligence, Design and Education (CGAIDE 2004), 193–200. University of Wolverhampton.

Spronck, P.; Sprinkhuizen-Kuyper, I.; and Postma, E. 2004. Online adaptation of game opponent AI with dynamic scripting. International Journal of Intelligent Games and Simulation 3(1):45–53.

Spronck, P. 2005. A model for reliable adaptive game intelligence. In Aha, D. W.; Munoz-Avila, H.; and van Lent, M., eds., Proceedings of the IJCAI-05 Workshop on Reasoning, Representation, and Learning in Computer Games, 95–100. Naval Research Laboratory, Navy Center for Applied Research in Artificial Intelligence, Washington, DC.

Sutton, R. S. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3:9–44.

Tesauro, G. 1992. Practical issues in temporal difference learning. Machine Learning 8:257–277.