Learning Goals in Sports Games

Jack van Rijswijck
Department of Computing Science
University of Alberta
Edmonton, Alberta, Canada T6G 2E8
[email protected]

The author gratefully acknowledges the support of Electronic Arts Canada, the Alberta Research Council, iCORE, IRIS, and NSERC.

Abstract: The illusion of intelligence is most easily destroyed by predictable or static behaviour. The ability of game characters to learn and adapt to the game player's actions will become increasingly important as the Artificial Intelligence (AI) in games develops. Yet in many games, and in all sports games in particular, the AI must be kept in a "sandbox": it must not be allowed to evolve in nonsensical directions. This paper describes a strategy learning experiment that is part of an AI architecture under design in collaboration with Electronic Arts for their series of sports games.

Keywords: Artificial Intelligence, Learning, Strategy.

Introduction

The ability to learn from experience is generally regarded as one of the most important future developments in game AI [9]. In most game genres, and especially in sports games, any adaptive AI must be prevented from developing nonsensical behaviours. The purpose of this paper is to describe a strategy learning method that is part of an AI architecture under development for sports games. The method was tested with the game engine of Electronic Arts' FIFA 2002 [5]. It uses a behavioural model [1], in which the learned strategy is one of a number of drives that are all active simultaneously. The drives are implemented as force fields, similar to those used, for example, in The Sims [7] and in the Robo-Cup robotic soccer tournaments [6,8]. The model can be extended to other sports games, and could also be applied to Real Time Strategy games.

Whereas AI is sometimes informally defined as "anything that is not graphics", this paper adopts the definition that AI refers to those and only those decisions that the human gamer also makes. This in effect regards the AI as just another gamer. It has the advantage of providing a clean interface between the AI and the rest of the code, and it necessarily avoids cheating on the part of the AI.

In the case of sports games, the characters within the game are sports players. To avoid confusion, in this paper the human game player is referred to as the gamer. The players on the soccer field are the game characters, including the ones controlled by the gamers as well as the ones controlled by the AI. Whenever the word player is used, it refers to a character, not a gamer.

Learning

In this paper, learning refers to acquiring truly new behaviour, as opposed to merely modifying the parameters of already existing behaviours. Learning is an attractive prospect, but there are serious concerns that need to be addressed. It is nevertheless possible to create a successful game that features real learning, as Black & White [3] demonstrates.

Gamers often prefer online gaming against other humans over playing against AI opponents. However, sports games involve entire teams of characters, of which the gamer controls only one character at any given time. Good AI therefore remains important, since it must be able not only to work against the opponent, but also to work with the human. In addition, contrary to many other games, the characters in a sports game tend not to die during the game. This gives them ample time to display their level of intelligence.
One concern about releasing a game that learns after it ships is that the AI is much more difficult to test. It is worth noting, though, that the developers of Black & White feel that the testing problem is actually quite manageable [2]. Another concern is that the AI must not learn the wrong lessons from gamers who are, possibly deliberately, being incompetent. Finally, the learning method must of course be able to run in real time within the hardware constraints of game platforms.

Commercial games are different from Robo-Cup soccer and most other games that academics have traditionally studied. The goal in a commercial game is not to win, but to entertain. When learning from a loss, an adaptive AI should not ensure that it never loses again, only that it never again loses in the same way.

Sports games

Sports game AI faces one specific challenge that is absent from many other game genres. The game simulates events that actually happen in the real world, and most gamers will be quite familiar with what those events look like. Every feature of a sports game tries to re-create its real-world counterpart as faithfully as possible; the AI should be no exception. This goes not only for the behaviour of the individual players, but also for the team as a whole.

One unique feature of sports games is that, ideally, the characters and teams should behave like their specific real-world counterparts. In a basketball game, the Shaquille O'Neal character should never go for a lay-up instead of a dunk, and Dennis Rodman should never take a 3-point shot. In a soccer game, playing against Brazil should feel different from playing against Italy. This makes the strategy sandbox even smaller; not only should the adapting Brazilian strategy stay sensible, it should stay Brazilian. Behavioural models make this possible.

Behavioural models

The Finite State Machine (FSM) is a commonly used paradigm for game AI, appearing in classical and fuzzy varieties. One feature of the FSM is that a character is by definition always in exactly one state. Suppose the creature in Figure 1 can be in states like "afraid: flee from enemy" and "hungry: search for food". If the creature chooses to flee, it will also move away from all the food. If it chooses to find food, the hunger state might tell it to head for the nearest food source, but that will take it dangerously close to the enemy. While hungry, it forgets all about its other goals in life.

Figure 1: Various influences

In a behavioural model [1], such drives are all active simultaneously, and they have varying levels of influence. In the model known as "schema-based coordination", each behaviour generates a force field, pushing the creature in a certain direction. The force driving the creature is the sum of all these influences. In Figure 1 there would be attractive forces from both food items and a repelling force from the enemy. The result is that the creature can satisfy both drives. Models like these are used, for example, in The Sims, with their attractiveness landscape, and in Robo-Cup soccer [8].

If a game AI engine attempts to calculate "whether or not" predictions, such as whether or not a character can reach a particular goal or whether or not it should execute a certain task, the resulting behaviour can be rigid and predictable. The problem is analogous: since the decision is always this or that, the character is confined to a patchwork of regions of identical behaviour.
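Before turning to sports-specific drives, here is a concrete illustration of schema-based coordination as described above: every drive stays active and contributes a force scaled by its current urgency, and the net steering force is simply their sum. This is a minimal sketch in C++ written for this paper; the vector type, drive directions, and urgency values are illustrative assumptions, not code or data from FIFA 2002.

// Illustrative sketch of schema-based coordination: every drive is active
// at once and contributes a force scaled by its current urgency.
// Types, drive names, and values are assumptions for the example only.
#include <cstdio>
#include <vector>

struct Vec2 {
    double x, y;
};

static Vec2 add(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }
static Vec2 scale(Vec2 v, double s) { return {v.x * s, v.y * s}; }

// One drive: a desired direction plus an urgency level.
struct Drive {
    Vec2 direction;   // where this drive wants the creature to go
    double urgency;   // how important the drive is right now
};

// The force acting on the creature is the urgency-weighted sum of all drives.
Vec2 combineDrives(const std::vector<Drive>& drives) {
    Vec2 total = {0.0, 0.0};
    for (const Drive& d : drives) {
        total = add(total, scale(d.direction, d.urgency));
    }
    return total;
}

int main() {
    // A hungry but threatened creature, as in Figure 1: two food attractors
    // and one repelling enemy, all active at the same time.
    std::vector<Drive> drives = {
        { { 1.0,  0.0}, 0.6 },  // "search for food": food item to the east
        { { 0.0,  1.0}, 0.4 },  // "search for food": food item to the north
        { {-0.7, -0.7}, 0.8 },  // "flee from enemy": away from an enemy to the north-east
    };
    Vec2 f = combineDrives(drives);
    std::printf("net force: (%.2f, %.2f)\n", f.x, f.y);
    return 0;
}

Because every drive remains active, adding a new drive, for example one learned from a past mistake, amounts to appending another entry to the list rather than rewiring state transitions.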
Behavioural approaches can produce more dynamic and fluid behaviour. In sports games like soccer, hockey, and basketball, the drives that govern an agent can be things like "intercept ball", "stay onside", "attack goal", and so on. The resulting behaviour satisfies all these goals as much as possible. Each drive has a certain direction and strength. The strength can depend on how important the drive is at a given instant. For example, "intercept ball" is not important when a teammate has the ball.

An additional advantage of behavioural models is maintainability. When a new state is added to an FSM, the programmer needs to work out all the new state transitions, as well as remember which other existing goals need to be satisfied at the same time in the new state. By contrast, adding a new behaviour in a behavioural model does not invalidate or duplicate the already existing behaviours, since they are all active simultaneously. Each behaviour just needs to know how important it is at any given time. These urgency levels present an important opportunity: they can be subject to learning.

A learning example

When something undesirable happens, such as a goal scored by the opponent or perhaps even just a shot on goal, it may be possible to go back and adjust the urgency levels of the various behaviours in order to stop the same thing from happening next time. One could even add a new behaviour for each mistake, taking care of avoiding the same mistake in the future. Each learning sample contains information about the situation in which it occurred, and information about what the agents should have done about it. The latter piece of information is a drive pushing the agents in a direction that hopefully avoids the mistake. The urgency of this drive depends on how similar the current situation is to the learning sample. One can think of ways to measure how similar two situations are, and ways to do this quickly enough that the whole procedure is computationally feasible under real-time constraints. But how can the drives be determined: what should the agents have done to avoid repeating a certain mistake?

Figure 2 shows a learning example in FIFA 2002. It encodes a situation that occurred in the past, in which the blue team ended up scoring. The trace shows the trajectory of the ball during that play. When the trace is solid blue, one of the players of team Blue, indicated by the jersey number, has the ball. When the trace is dotted, the ball is underway from one player to another. White dots indicate that the ball is close enough to the ground to be intercepted by a player; if the ball is too high in the air, the dots are red. The sequence starts with a goal kick by the blue goalkeeper, and ends with player 6 scoring a goal.

Figure 2: A learning example

If the AI has not learned anything from that sequence, then the same events can happen again. Figures 3 and 4 show two snapshots of the play sequence as it unfolded. In both cases, one of the blue players is about to send off a pass.

Figure 3: Blue 3 passes to Blue 11

Figure 4: Blue 11 crosses to Blue 6

Figure 3 shows player 3 about to pass the ball to the location indicated by the dotted circle. Player 11 will run to that location to receive the pass. He can do this because he is closer to the dotted circle than any of the opponent's players. Figure 4 shows player 11 about to cross the ball in front of the goal. The cross is targeted at player 6, who connects and scores the goal.
Again, the player was able to do this because he was closer to the reception point than any of the opponents.

What might the other team have learned from this? The mistake was mostly due to defender 5, who allowed attacker 6 to slip past him. In the first snapshot, attacker 6 was still far behind him. Defender 5 could have prevented this by moving closer to the key spot in Figure 4. At an earlier point in the play, defender 2 or 6 could have interfered with attacker 11's activities by moving closer to the key spot in Figure 3.

The defending team does not need to prescribe who goes to those spots, just that someone does. By the same token, it does not matter which one of the attacking players has the ball, just that he has the ball in a location that resembles the one in the learning example. It is the proximity of the ball to one of the key points that matters. Thus the learning example does not give strategic hints to specific players, but rather to specific areas of the field. The same learning example can become active in situations that are not identical but similar, when the ball is near the trajectory of the learning example. It can also be used by the attacking team, in order to try to repeat a successful play.

Force fields

The various drives of the players act as force fields. One of the drives is the one that represents the adaptive strategy. Pseudocode for this force field is given in the appendix. The behaviour learned from the example in the previous section is a force field that is anchored to the playing field, not to any player in particular. This follows the adage "the intelligence is in the environment, not in the ant", as in The Sims, where the instructions on how to use an object are contained in the object, not in the character that uses it.

A force field specifies its influence on the characters as a function of their location on the soccer field. In addition, it also needs to specify the context in which it applies. A particular learning example is relevant only if the current situation is similar to the one that occurred in the learning example. Thus the force depends on two parameters: location and context.

When learning is triggered, the algorithm first needs to choose when the relevant play sequence started. In this case the start of the sequence is defined as the latest time when the scoring team gained possession of the ball, or the latest re-start of play, whichever happened later. Next, the force field is calculated by making all the points on the ball trajectory attractors. This excludes the points where the ball was high in the air, indicated by red dots, since the ball could not be intercepted there. The attractor forces can be calculated as a gravity field, diminishing with the square of the distance. This ensures that not all players head towards the same spots, and it also deals with the problem of a single player caught between having to defend two spots: instead of staying in the middle and defending neither spot, other forces will make the player drift towards one of the two spots, and the stronger attraction will then force a commitment to that one.

For the learning example of Figure 2, the resulting force field is shown in Figure 5. The field is indicated only at the points of a grid that covers the field. Since the field is well-behaved, it is sufficient to store the field values only at those grid points and use interpolation elsewhere. The grid resolution can be chosen to meet any storage capacity limits.
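The sketch below illustrates one way such a field could be sampled on a grid: every interceptable point on the recorded ball trajectory attracts each grid cell, with a pull that weakens with distance in the (t-p) / |t-p|² form used in the appendix pseudocode. The grid resolution, pitch dimensions, and data layout are assumptions made for this example, not values taken from the actual game.

// Sketch of building a force field from a learning example: each point on
// the recorded ball trajectory where the ball was interceptable becomes an
// attractor. Grid resolution and data layout are assumptions.
#include <vector>

struct Vec2 { double x, y; };

struct TrajectoryPoint {
    Vec2 pos;
    bool highBall;   // true if the ball was too high to intercept (red dots)
};

const int GRID_W = 32;          // grid resolution across the pitch (assumed)
const int GRID_H = 20;
const double PITCH_W = 105.0;   // pitch dimensions in metres (assumed)
const double PITCH_H = 68.0;

// Field position of the centre of grid cell (i, j).
Vec2 cellCentre(int i, int j) {
    return { (i + 0.5) * PITCH_W / GRID_W, (j + 0.5) * PITCH_H / GRID_H };
}

// Compute the new force field on the grid from one recorded trajectory.
std::vector<Vec2> buildField(const std::vector<TrajectoryPoint>& trajectory) {
    std::vector<Vec2> field(GRID_W * GRID_H, Vec2{0.0, 0.0});
    for (int j = 0; j < GRID_H; ++j) {
        for (int i = 0; i < GRID_W; ++i) {
            Vec2 p = cellCentre(i, j);
            Vec2 sum = {0.0, 0.0};
            for (const TrajectoryPoint& t : trajectory) {
                if (t.highBall) continue;              // skip uninterceptable points
                double dx = t.pos.x - p.x;
                double dy = t.pos.y - p.y;
                double d2 = dx * dx + dy * dy + 1e-6;  // small guard against division by zero
                // Pull towards t, weakening with distance: (t-p) / |t-p|^2,
                // as in the appendix pseudocode.
                sum.x += dx / d2;
                sum.y += dy / d2;
            }
            field[j * GRID_W + i] = sum;
        }
    }
    return field;
}

The same loop could be restricted to the salient points discussed next, which would reduce the cost without changing the structure.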
Figure 5 shows the ball trajectory in white, and the salient points on the trajectory are indicated as white dots. Salient points are those where the ball changes possession; the ball can be possessed by a player or by "ground" or "air". Thus the points where the ball changes from close to the ground to high in the air are also salient points. In this example, the attractive forces are calculated not for all the points on the ball trajectory, but just for the salient points. This may be a sufficiently effective summary of the trajectory. Using the full trajectory requires more computation, but it only needs to be done once.

Figure 5: Force field resulting from the learning example

The second parameter of the force field, its context, specifies how much influence it carries in any given situation. Specifically, the field is relevant to the Yellow team if Yellow does not have possession of the ball and the ball is near the trajectory of the learning example. Thus the strength of the force field depends on the position of the ball. Figure 6 shows the force field resulting from two learning examples, for a situation where the ball is closer to the original trajectory than to the new one. In this situation, the former force is stronger than the latter.

Figure 6: Two learning examples

As Figure 6 suggests, it is fortunately not necessary to maintain one force field for each learning example, since force fields are additive. For each position of the ball, the strengths of all the force fields can be calculated and the fields can be added together. This results in one net force field for each position of the ball. The collection of ball positions can, in turn, also be sampled on a grid and interpolated. At runtime, the position of the ball indexes into one of the force fields. In order to find the resulting force during game play, all that is needed is four direct look-ups and an interpolation.

Temporal discounting can be introduced by making the earlier points more attractive than the later points, to encourage the players to disrupt the play as soon as possible. This increases the computational cost, but again that does not matter, since it is an offline computation that only needs to be done once. Figure 7 shows such a field.

Figure 7: Temporally discounted force field

Later in the same sequence it is no longer necessary to control the points that have already been passed; only the remaining points on the trajectory are interesting. This can be addressed by adding the partial trajectory from each salient point to the end point as a learning example, which is equivalent to temporal discounting where the later points in the sequence are more attractive. This discounting factor is a parameter that can be experimented with.

Before and after

After the forces resulting from the learning example have been calculated, the same situation can be re-started to see whether the Yellow team has learned anything. Figure 8 shows the result. The play starts approximately the same. It does start deviating a bit when the ball gets to player Blue-3, but the pass to the outside left wing is still fired off. However, defender Yellow-6 has now altered his behaviour sufficiently to get there in time and even manage to intercept the ball. Figures 9 and 10 show a closer look at Yellow-6's trajectory. In the first case Yellow-6 wastes too much time before heading over to Blue's pass reception. The second case starts the same, but then sees Yellow-6 turning around sooner and getting back in time.

Figure 8: After learning, Yellow intercepts the ball

Figure 9: Yellow-6's path before...

Figure 10: ... and after learning
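Before moving on to the discussion, the runtime query mentioned earlier, four direct look-ups and an interpolation, can be made concrete with the following sketch. It assumes the ball position has already selected one of the stored per-ball-cell fields by snapping the ball to its grid; the grid sizes, pitch dimensions, and storage layout are the same illustrative assumptions as in the earlier sketch, not the engine's actual representation.

// Sketch of the runtime query: the net force at field position p, for the
// field selected by the current ball position, is read from a pre-computed
// grid with four direct look-ups and a bilinear interpolation.
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec2 { double x, y; };

const int GRID_W = 32;
const int GRID_H = 20;
const double PITCH_W = 105.0;
const double PITCH_H = 68.0;

// field holds one force vector per grid cell, row-major.
Vec2 lookupForce(const std::vector<Vec2>& field, Vec2 p) {
    // Convert field coordinates to continuous grid coordinates.
    double gx = p.x / PITCH_W * GRID_W - 0.5;
    double gy = p.y / PITCH_H * GRID_H - 0.5;
    int i0 = std::clamp((int)std::floor(gx), 0, GRID_W - 2);
    int j0 = std::clamp((int)std::floor(gy), 0, GRID_H - 2);
    double fx = std::clamp(gx - i0, 0.0, 1.0);   // fractional offsets inside the cell
    double fy = std::clamp(gy - j0, 0.0, 1.0);

    // The four surrounding grid values (the four direct look-ups).
    Vec2 a = field[j0 * GRID_W + i0];
    Vec2 b = field[j0 * GRID_W + i0 + 1];
    Vec2 c = field[(j0 + 1) * GRID_W + i0];
    Vec2 d = field[(j0 + 1) * GRID_W + i0 + 1];

    // Bilinear blend of the four look-ups.
    Vec2 result;
    result.x = (1 - fy) * ((1 - fx) * a.x + fx * b.x) + fy * ((1 - fx) * c.x + fx * d.x);
    result.y = (1 - fy) * ((1 - fx) * a.y + fx * b.y) + fy * ((1 - fx) * c.y + fx * d.y);
    return result;
}

With both grids at modest resolution, the whole query is a handful of array accesses, which is what keeps the runtime cost negligible.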
Discussion

The main focus of this learning experiment is to augment, not replace, any existing strategy. This may be compared to the subsumption architecture as used in robotics [4], and corresponds to the behavioural approach of allowing all behaviours to be active, instead of only one. An additional benefit is that the learning component is easy to add to an existing commercial game program.

The learning model is deliberately kept simple in order to be cheap to calculate at runtime. It involves a one-time calculation which can be done offline, for instance during goal celebration animations. At runtime, a table look-up and an interpolation suffice. The memory requirements can be adjusted as needed by modifying the resolution of the grid on which the force fields are sampled.

No attempt is made to discover theoretically optimal solutions, nor to model or predict game situations as they occur. The goal is to let game characters behave human-like, as well as to make sure that they do not become unbeatable; for both purposes, optimality is undesirable. More important than making sure that the AI does not lose is making sure that it does not lose in the same way repeatedly.

Another requirement that is kept in mind is that the model should be able to learn from very few examples, since the examples are to be provided by a human gamer playing the game in real time. The examples shown in this paper all involve changes that occurred as the result of a single training example. In general, very few examples are needed to adjust the playing style sufficiently to disrupt previous mistakes, while not disturbing the overall strategy.

There are several parameters and options to experiment with in the force field model. The decay rates in space, time, and relevance can all be adjusted. The forces can be modelled as gravity fields, with the strength of the force proportional to 1/r², where r is the distance to the attractor point, but other fields are also possible. For instance, the field could be proportional to 1/r² when r > R and to r/R³ when r < R, corresponding to the gravity field of an object of radius R. The intuition behind this type of field is to diminish the attractive force when the location is already under the player's control; near the centre of attraction, the force becomes zero.

The experiments in this paper were performed using Electronic Arts' FIFA 2002 software, but the approach could also be applied to other sports games, such as hockey and basketball, as well as to other game genres, such as Real Time Strategy games.

Acknowledgments

I am indebted to Electronic Arts Canada, and in particular to John Buchanan, Jason Rupert, and Matt Brown, for their cooperation, and to Jonathan Schaeffer for providing feedback on early drafts of this paper.

References

1. Arkin, Ronald C. Behavior-Based Robotics. MIT Press, 1998.
2. Barnes, Jonty, and Jason Hutchens. Testing Undefined Behavior as a Result of Learning. In Steve Rabin, editor, AI Game Programming Wisdom, pages 615-623. Charles River Media, 2002.
3. Black & White. Lionhead Studios / Electronic Arts, 2001. See www.bwgame.com.
4. Brooks, Rodney A. Challenges for Complete Creature Architectures. In From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior. MIT Press, 1990.
5. FIFA 2002. Electronic Arts, 2001. See www.fifa2002.ea.com.
6. Robo-Cup soccer tournament. See www.robocup.org.
7. The Sims. Maxis / Electronic Arts, 2000. See www.thesims.com.
8. Stone, Peter, and David McAllester. An Architecture for Action Selection in Robotic Soccer. In Jörg P. Müller, Elisabeth Andre, Sandip Sen, and Claude Frasson, editors, Proceedings of the Fifth International Conference on Autonomous Agents, pages 316-323, Montreal, Canada, 2001. ACM Press.
9. Woodcock, Steven. Game AI: The State of the Industry 2001-2002. Game Developer Magazine, July 2002, pages 26-31.

Pseudocode

Below follows high-level pseudocode for calculating the force field associated with a new training example, updating the existing force fields, and determining the forces at runtime. Let FieldGrid and BallGrid be discrete sets of points, both covering the soccer field at some arbitrary resolution. When a new training example arrives, containing a ball trajectory, a force field NewField[p] is calculated, where p is the position on the field.

foreach p in FieldGrid {
    foreach t in trajectory {
        NewField[p] += (t-p) / |t-p|²;
    }
}

Note that p and t are vectors, and that |t-p| denotes the length of the vector. The existing forces are encoded in MainField[b,p], which gives the force at field location p when the ball is at position b. The parameter b specifies the context. The new field is added to the main field, with its influence depending on the distance of b to the trajectory.

foreach b in BallGrid {
    /* determine d = squared distance of ball to trajectory */
    d = infinity;
    foreach t in trajectory {
        d = min(d, |b-t|²);
    }
    /* add NewField to MainField[b] with strength 1/(1+d) */
    foreach p in FieldGrid {
        MainField[b,p] += NewField[p] / (1+d);
    }
}

At runtime, the force at field position p when the ball is at position b can be found by snapping p to the FieldGrid and b to the BallGrid and looking up MainField[b,p] directly. If the two grids have low resolution, the field can instead be looked up at the surrounding grid points and then interpolated.
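For concreteness, one possible compilable rendering of the context-weighting step above is sketched here in C++. It assumes the same illustrative grid layout as the earlier sketches and a NewField that has already been computed as shown after the Force fields section; the container choices and names are assumptions made for this paper, not the engine's actual data structures.

// Sketch of the context-weighting step: a freshly computed NewField is
// blended into MainField[b] for every ball-grid cell b with strength
// 1/(1+d), where d is the squared distance from b to the nearest
// trajectory point. All sizes and names are illustrative assumptions.
#include <algorithm>
#include <vector>

struct Vec2 { double x, y; };

const int FIELD_W = 32, FIELD_H = 20;            // FieldGrid resolution (assumed)
const int BALL_W  = 16, BALL_H  = 10;            // BallGrid resolution (assumed)
const double PITCH_W = 105.0, PITCH_H = 68.0;    // pitch size in metres (assumed)

using Field = std::vector<Vec2>;       // one force vector per FieldGrid cell
using MainField = std::vector<Field>;  // one Field per BallGrid cell

// Field position of the centre of ball-grid cell (i, j).
Vec2 ballCell(int i, int j) {
    return { (i + 0.5) * PITCH_W / BALL_W, (j + 0.5) * PITCH_H / BALL_H };
}

void addTrainingExample(MainField& mainField, const Field& newField,
                        const std::vector<Vec2>& trajectory) {
    for (int j = 0; j < BALL_H; ++j) {
        for (int i = 0; i < BALL_W; ++i) {
            Vec2 b = ballCell(i, j);
            // d = squared distance from this ball position to the trajectory.
            double d = 1e30;
            for (const Vec2& t : trajectory) {
                double dx = b.x - t.x, dy = b.y - t.y;
                d = std::min(d, dx * dx + dy * dy);
            }
            double w = 1.0 / (1.0 + d);
            // Blend the new field into the net field for this ball cell.
            Field& target = mainField[j * BALL_W + i];
            for (int k = 0; k < FIELD_W * FIELD_H; ++k) {
                target[k].x += w * newField[k].x;
                target[k].y += w * newField[k].y;
            }
        }
    }
}

// Usage (assumed): MainField mainField(BALL_W * BALL_H,
//                                      Field(FIELD_W * FIELD_H, Vec2{0.0, 0.0}));
// then call addTrainingExample(mainField, newField, trajectory) after each mistake.

Temporal discounting or partial trajectories, as discussed earlier, would presumably only change how newField is built; this accumulation step would stay the same.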