An Analysis of Automated Decision Making Methodologies in Role Playing Video Games: Centralized Approach

Christopher Bush
University of Texas, Austin
Supervising Professor: Dr. Risto Miikkulainen
Spring 2010

Table of Contents

Abstract
Introduction
    Hypothesis
Background
    rtNEAT
    Dungeons and Dragons, MUDs & MMORPGs
Implementation & Domain
    KinchoMUD
        SDK
        Turn Based Combat
        Experiment 0: 1v1 Battle Simulation
        Parties
Agent Decision Centralization
    Prior Work
        Fractured Decision Space
Experiments
    Actions and Mobs
        Actions (Casting mob stats affected in parentheses)
        Mobs
    Experiment 1: Heterogeneous 2v1, Hybrid Fitness
    Experiment 2: Heterogeneous 3v1, Hybrid Fitness
    Experiment 3: Homogeneous 2v1, Hybrid Fitness
    Experiment 4: Homogeneous 2v1, Damage Fitness Focus
Conclusion
Bibliography

Abstract

This paper analyzes an approach to evolving intelligent agents that work together to accomplish a common task in a role-playing video game environment. A platform was created that allows research into the capabilities of automated neuro-evolving agents in team (party) environments, where they must work together to defeat a common enemy or perform a common task. This research focuses on a centralized methodology: all party members share a common agent, or brain. Party members are given a set of qualities containing the information that either a player or a computer enemy would need in a MUD or MMORPG combat situation. These qualities are then used by the centralized agent to make decisions for each party member. This shared agent is rewarded based on the performance of the party as a whole. This research examines the effectiveness and limitations of that centralized agent in this environment.

Introduction

In an environment in which multiple, separate individuals share a common goal, it may be beneficial for those individuals to work together to accomplish it. Many of these individuals may have different abilities, or only a subset of the abilities necessary to perform the tasks at hand, so they may need to learn to cooperate with others to be successful. These individuals then become part of a party, all with a shared purpose. Many questions arise concerning these parties. How should they be organized? In learning to accomplish tasks, how should individuals be rewarded for positive actions? If the party members are all homogeneous in learning and skill, will unique roles emerge that the party members take on to accomplish tasks?
If the party members are heterogeneous, can they use their abilities to work together meaningfully? What information do party members need to know about one another to make proper decisions? And, most importantly, should they be rewarded individually, or as a party? This paper addresses these questions using a centralized approach to learning in a game domain created to mimic role-playing video games. Teams are composed of either homogeneous or heterogeneous individuals; these individuals all share a common goal and a common brain. The shared goal is to defeat a common enemy in a battle simulation; the shared brain makes decisions and is rewarded based on the party's success in defeating its enemies.

Hypothesis

As the number of party members grows in this domain, the number of inputs to the shared brain increases drastically, as does the importance of each of those inputs at any point during battle. I expect to see tremendous success in experiments with parties that have a low number of members and actions per member, but as the number of members and actions increases, rtNEAT will be incapable of making sense of the inputs presented.

Background

rtNEAT

rtNEAT is a platform that provides a "method for evolving increasingly complex artificial neural networks in real time, as a game is being played" (Kenneth O. Stanley, 2005). In other words, behaviors can be evolved during gameplay that change the actions performed by rtNEAT-backed characters in a game. At the most basic level, a population of agents is created, each with a uniform set of inputs and outputs. The inputs are run through each agent's neural network to determine the values of its outputs. rtNEAT allows a developer to assign a fitness to an agent based on its performance at the task at hand. After a period of time, neuroevolution takes place: the weakest agents are removed from the population and replaced with offspring combined from the better performing agents.
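As a rough illustration, the evaluate-and-replace cycle can be sketched as follows. This is a simplified sketch, not the actual rtNEAT implementation: real rtNEAT replaces agents continuously during play and evolves network topology as well as weights, whereas here agents are hypothetical flat weight vectors and `replace_fraction` is an assumed parameter.

```python
import random

def evolve_step(population, evaluate, replace_fraction=0.2):
    """One simplified evolution step: score every agent, drop the
    weakest, and refill the population by combining strong parents.
    Agents are plain lists of weights for illustration only."""
    # Rank agents from best to worst by their assigned fitness.
    scored = sorted(population, key=evaluate, reverse=True)
    n_replace = int(len(scored) * replace_fraction)
    survivors = scored[:len(scored) - n_replace]
    offspring = []
    for _ in range(n_replace):
        # Pick two parents from the top half of the survivors.
        a, b = random.sample(survivors[:max(2, len(survivors) // 2)], 2)
        # "Merged version of the better performing agents": average the
        # parents' weights as a stand-in for NEAT crossover.
        child = [(wa + wb) / 2 for wa, wb in zip(a, b)]
        offspring.append(child)
    return survivors + offspring
```

The key property the sketch preserves is the reward loop: fitness is assigned externally by the developer, and only relative fitness determines which agents persist.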
Over time, this could allow agents to learn behaviors in a game that maximize their success at the task at hand (Kenneth O. Stanley, 2005, p. 15).

Dungeons and Dragons, MUDs & MMORPGs

In 1974, the popular fantasy role-playing game Dungeons and Dragons was released for the first time. "The game provides rules for creating and playing heroes in a fantasy world filled with unbelievable magic, fierce dragons, and brave knights" (Wizards of the Coast). Players come up with their own characters, form parties with characters their friends create, and go through adventures developed by the person running the game (the Dungeon Master). From 1975 to 1977, around the same time as the initial development of Dungeons and Dragons, two single-player, text-based role-playing games were under development. In 1975, Will Crowther released Adventure, the "first popular computer adventure game," for the DEC PDP-10 computer; it was later extended by Don Woods at Stanford University in 1976. At the most basic level, Adventure was a computerized game similar to Dungeons and Dragons in which the dungeon master was the program itself. Inspired by Adventure, 1977 saw the release of Zork, which turned out to be much more successful and wide-reaching than its predecessor (Bartle). In 1978, Roy Trubshaw released a multiplayer version of this new type of text-based computer game, simply called MUD, or Multi-User Dungeon. It allowed multiple players to control characters that interact in a shared world to complete different tasks. From there, many offshoots and extensions of the original MUD were released into the mid-1990s (Bartle), but these quickly lost popularity once graphical forms of the games appeared, the first of which was Neverwinter Nights, released in 1991 (Daglow, 2008). In 1999, after more than three years of development, Verant Interactive released Everquest.
By 2004, Everquest had more than 450,000 subscribers (Wolf, 2008). Even this popularity pales in comparison to World of Warcraft, released in 2004: by 2008, World of Warcraft boasted more than 10 million subscribers worldwide (Alexander, 2008). Despite this popularity, most of these games rely heavily on scripted actions for the non-playable computer enemies (henceforth referred to as mobs). As players learn the behaviors of these mobs, the difficulty of these games can drop drastically. We saw a potential opening here for rtNEAT, which led to the following question: can rtNEAT be successfully utilized within this genre of games to create interesting, non-scripted gameplay experiences and behaviors for the characters and objects with which players interact?

Implementation & Domain

KinchoMUD

As MUDs are the foundation of MMORPGs such as World of Warcraft and Everquest, rtNEAT research surrounding MUDs seemed a logical place to begin.

SDK

In the summer of 2009, we began development of a software development kit (SDK) that could be used to build a MUD. We decided that starting from scratch was necessary for three reasons: its educational value on the game development front, licensing flexibility if we ever decide to release the game professionally, and simpler rtNEAT integration for our research. The underlying purpose of the SDK is to allow a developer to build a world similar to one in a standard MUD, containing rooms, playable characters, interactive objects, non-playable characters, and a battle system between the characters. The long-term goal is to allow this world to be built through markup (XML) files editable at runtime, so that developers do not have to touch the source to change the content of the game itself. The SDK is still a work in progress and currently consists of:

- Game Loop: Entry point of the game.
Kicks off a login page (LoginPage) and the building of the world (Chain).
- LoginPage: Allows a player to log in to the game. Currently used to point to the Mob the player controls when traversing the world.
- Chain: Singleton object that controls the current state of the world at any given moment. All actions taken by mobile objects in the world must get permission from Chain before a change in the world is allowed to happen.
- Area: A set of rooms logically placed together. The long-term goal is to have one area loaded into memory at a time.
- Room: A section of the world that a Mob can interact with at any given time. The long-term goal is to also have a room contain objects with which a Mob can interact.
- Mob: A mobile unit, often a party member, in the world, either player controlled or automated.
- Action: A behavior or activity a Mob can take in a variety of situations in the game. There are currently eight types of actions:
  o Physical Damage (Instant): Actions that damage an enemy target once with no effect on stats.
  o Magical Damage (Instant): Actions that damage an enemy target once with an effect on a stat container such as health or mana.
  o Damage over Time (DoT): Actions that continuously damage an enemy target for more than one turn of battle; these generally affect a stat container.
  o Heal (Instant): Restores a stat container, generally at the expense of another container.
  o Heal over Time (HoT): Restores a stat container over multiple turns, generally at the expense of another container.
  o Buff: Increases a stat for a number of turns, generally specified by a stat.
  o Debuff: Decreases a stat for a number of turns, generally specified by a stat.
  o Support: Any action that supports a party member; for instance, covering a fellow party member so that it does not take damage from an enemy Mob.
- Stat: An attribute a Mob contains that affects the outcome of attempted Actions.
Containers are stats that govern things like the health of the Mob or the number of certain Actions a Mob can perform in a battle.
- Console: A set of procedures to print to the console. In place to allow the way the game is presented to the user to be easily changed.
- Battle: Currently a set of simulated battles for rtNEAT experimentation purposes. Kicked off by an Action taken by the human player's Mob.

The SDK also contains a few items that sit as a level of abstraction between kinchoMUD and rtNEAT, allowing for rtNEAT integration in the game:

- BrainFactory: Keeps track of all rtNEAT agents (brains) that Mobs are attached to in the game.
- MobBrain: The rtNEAT agent attached to a Mob.
- Commander: In place for party code experimentation. The commander concept is explained below.

Turn Based Combat

The battle simulation is a turn-based combat system similar to those common in video games in the mid-1990s. One mob makes a decision at a time, in an order determined by a stat. Depending on this stat, a mob may be able to make decisions multiple times before another mob is allowed to do so. Essentially, this creates a queue of acting mobs throughout the battle, and a mob may appear in this queue multiple times. All mobs, regardless of party, are included in this queue, so an enemy mob will have turns mixed in with those of the player party. This system allows for strict progression of decision-making, and allows survivability to be measured by the number of turns each individual mob survives. The figure above represents the turn-based combat queue with four acting mobs (represented by the shapes in the queue). When a mob is ready to act, it is added to the queue, and when the mob reaches the front of the queue, it is given the opportunity to make a decision.
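The readiness queue described above can be sketched as a charge-accumulation loop. The stat name, the charge threshold of 100, and the tick granularity are all assumptions for illustration; the SDK's actual mechanism may differ.

```python
from collections import deque

def build_turn_queue(mobs, ticks):
    """Simulate the act-readiness queue: every mob accumulates 'charge'
    each tick according to its speed stat, and joins the queue each time
    its charge fills.  Faster mobs therefore appear in the queue more
    often.  `mobs` is a list of (name, speed) pairs."""
    queue = deque()
    charge = {name: 0 for name, _ in mobs}
    for _ in range(ticks):
        for name, speed in mobs:
            charge[name] += speed
            while charge[name] >= 100:  # threshold to earn a turn
                charge[name] -= 100
                queue.append(name)
    return queue
```

For example, with speeds of 25, 50, and 25 over eight ticks, the speed-50 mob earns four turns while the others earn two each, and all mobs, allies and enemies alike, share the single queue.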
Experiment 0: 1v1 Battle Simulation

In the fall of 2009, we took what we had from the SDK into a CS 370 course and extended it to allow a first set of experiments to be run using rtNEAT. A five-room world was created containing a Mob the player could log in as and an enemy Mob in the basement of the world. A player could traverse this world, find this enemy, and start a simulated battle in which the player had a set of actions that could be performed against the enemy Mob. The goal we set out for rtNEAT was for the player to deal as much damage as possible to the enemy Mob given whatever set of actions was available.

rtNEAT Setup

A population of agents, generally between 10 and 50, is set up. At the beginning of each battle, the player mob requests an organism from this population, and following the battle, the organism is retired with its calculated fitness. Once the whole population has been run through, neuroevolution takes place and the battle simulation cycle continues with the new set of agents.

Fitness

The fitness function is set up so that there is an injective mapping between the amount of damage dealt to the enemy mob and the fitness: for every point of damage dealt per battle, one fitness point is given to the player Mob.

Rewards

Rewards are calculated at the end of each battle. This gives the agent a chance to perform all the actions necessary to maximize its fitness. Fitness rewards and organism retirement were also attempted at the end of each turn; unfortunately, this did not allow enough time for the agents to learn anything useful.

Stats (Inputs)

The player mob could see the following stats:
- HP: The mob's current health. If this drops to 0, the mob dies.
- MP: The mob's current "spell" points. Certain actions can be performed at the expense of these points.

Actions (Outputs)

We gave the player mob the following set of actions:
- HIT: An instant physical damage attack.
- CURE: An instant heal to recover the player's HP, cast at the cost of MP. The player dies if their HP drops below 0.
- FIRE: An instant magical damage attack, cast at the cost of MP.
- POISON: A DoT spell lasting the duration of the battle, cast at the cost of MP. Casting it more than once does not increase the effectiveness of the spell.
- CRIT: An instant physical damage attack stronger than HIT, used at the cost of HP.
- IDLE: Do nothing. This was in place for sanity's sake; a mob should NEVER idle if it can hit.

Figure: Inputs and outputs into an Experiment 0 mob brain.

Results

In almost every case, the agents would relatively quickly evolve a set and pattern of actions that performed the maximum amount of damage possible before dying. Two of the most interesting follow.

Instant Damage/Instant Heal Pattern

When the other MP-absorbing actions were relatively weak, a mob would HIT until its HP was 1, cast CURE to restore its HP to full, and repeat that cycle until its MP was 0. In a few instances, the mobs even learned to CRIT when MP was 0 and HP was 1. In the CRIT case, the extra damage was negligible enough that the learned action would not arise very often; had CRIT's damage been set too high, it would have been beneficial for the mob simply to CRIT instead of doing anything else, as the fitness would actually be higher. But if learning was allowed to continue long enough, these mobs would essentially sacrifice their lives to maximize their damage once their death was inevitable.

DoT/Instant Damage/Instant Heal Pattern

POISON was set up so that it would not have increased effectiveness if cast twice.
We set it up so that the maximal amount of damage possible was achieved by spending 1 point of MP casting POISON at the very beginning of the battle, then performing the HIT/CURE/CRIT pattern described above. The agents now had to learn to cast POISON once at the beginning of the battle, cast CURE when their HP was 1, and HIT otherwise. This pattern took a little longer to arise, but the mobs learned it as well. We saw this pattern continue: regardless of the setup of the actions above, the agents quickly learned how to maximize their damage and stay alive as long as they possibly could.

Parties

While the CS 370 research was successful, the whole point of a MUD is that it is MULTI-USER. Even in graphical games like World of Warcraft, players work in teams of 25 or more to defeat an enemy or traverse a dungeon. Many of these characters have different jobs and roles: some are healers, some deal damage, and some do nothing more than increase the stats of the party members around them. While we showed that a mob can be successful individually, a new question presented itself: can mobs learn to work together in a party environment with different roles and be as successful as they were individually? This is a much more complex task; it is no longer the success of the individual that matters, but the success of the party. We came up with two possibilities for how this could work. The first is considerably decentralized: keep agents for individual mobs and analyze different fitnesses and reward mechanisms to see if the mobs can learn to work together to perform interesting tasks while maximizing their damage and increasing their success as a party.

Agent Decision Centralization

The second method is to abstract the agents and rtNEAT away from the individual mobs and consolidate the inputs and outputs into one "commanding" agent that instructs the mobs on their actions for each turn in battle.
Different setups, fitnesses, reward mechanisms, and battle situations are experimentally analyzed to see if a party can relatively quickly maximize its damage to an enemy party. To research this, the approach to agent brains had to be extended. Mobs now need to be aware of the status of their party members as well as their enemies. They also need to be able to determine which mob to target as well as what action to perform on the targeted mob. In the centralized approach, since there is one shared brain among all mobs, the number of inputs increases dramatically. The figure above shows the possible inputs and outputs in a combat situation with 3 party members against 1 common enemy. There are now 8 inputs the centralized commander brain must be aware of when making a decision for a mob in the party. The centralized brain must make an action decision based on these inputs, and must also determine the target on which the action is to be performed. If the action is friendly, such as a heal, the brain needs to learn to target only a mob in its own party; hostile actions should target enemy mobs only. The only remaining piece concerns the heterogeneous case, in which mobs may not all share the same possible actions. If the commander brain tells Mob 0 to perform an action, and that action is not available to Mob 0, what is the mob to do? To resolve this, another input is added, called the "Unique Mob Identifier". This input identifies the mob that is acting in the turn-based combat system described above.

Prior Work

Fractured Decision Space

The addition of the Unique Mob Identifier creates a potential issue for the learning and decision-making capabilities of the centralized commander brain. Subtle changes in this one input drastically change the necessary outputs of the brain, with much more weight than the other inputs.
This problem possibly falls in the realm of a fractured decision space (Kohl & Miikkulainen), "loosely defined as a space where adjacent states require radically different actions." In other words, the brain may have trouble building meaningful combat strategies, as this input is constantly changing, causing a complete shift in the necessary action and target outputs.

Experiments

In designing the experiments to determine the effectiveness of the centralized approach, two areas were analyzed. First, a split is examined between parties of heterogeneous agents, which all have differing stats and available actions, and parties of homogeneous agents. Second, different fitness reward strategies are examined. A preferred hybrid fitness is used first, rewarding the number of turns survived, the damage done to the enemy, and the number of enemies killed; this was chosen to allow a party to be generally successful in this domain. If parties are not successful under this reward scheme, a subset of the hybrid fitness is tried to see if strategies can be learned from it. Each experiment contains a party with a set of mobs, each of which has a standard set of stats and actions available. These mobs and actions are described below.
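The two reward schemes just described can be sketched as simple scoring functions. The relative weights are assumptions: the report specifies which quantities are rewarded, not their magnitudes.

```python
def hybrid_fitness(turns_survived, damage_dealt, enemies_killed,
                   w_survive=1.0, w_damage=1.0, w_kill=50.0):
    """Hybrid party fitness combining the three reward signals:
    turns survived, damage dealt, and enemies killed.  The weights
    are hypothetical placeholders."""
    return (w_survive * turns_survived
            + w_damage * damage_dealt
            + w_kill * enemies_killed)

def damage_only_fitness(damage_dealt):
    """Subset of the hybrid fitness used when rewarding damage alone
    (as in Experiment 4), so survivability no longer contributes."""
    return damage_dealt
```

Because the fitness is computed over the party's combined performance and assigned to the single shared brain, no individual mob is rewarded separately under either scheme.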
Actions and Mobs

Actions (Casting mob stats affected in parentheses)

- Hit
  o Basic melee attack
- Cure (-MP, +HP)
  o Attempt to restore HP with a cure spell
- Fire (-MP)
  o Attempt to damage an enemy with a fire spell
- Poison (-MP)
  o Attempt to cast a poison damage-over-time spell on an enemy
- Reap (-HP)
  o Powerful melee hit at the expense of one's own HP
- Regen (-MP)
  o Slowly regenerates an ally's HP
- Refresh (-MP)
  o Slowly regenerates an ally's MP
- Drain (-MP)
  o Steals an enemy's HP for your own use
- Protect (-MP)
  o Greatly increases an ally's defense, lowering damage received
- Berserk (-MP)
  o Doubles damage given, but also doubles damage received
- Haste (-MP)
  o Doubles an ally's speed, increasing the frequency of available turns
- Sleep (-MP)
  o Causes a foe to fall asleep, making it unable to act
- Blink (-MP)
  o Grants a target a protective shadow, reducing damage
- Cover
  o Takes physical damage in place of an ally

Mobs

- Paladin
  o Mob that contains 3 actions: Hit, Cure and Cover
- Sorcerer
  o Mob that contains 4 actions: Hit, Refresh, Haste and Blink
- Magician
  o Mob that contains 4 actions: Hit, Fire, Poison, and Sleep
- Blue Onion Knight
  o Mob that contains all actions (homogeneous)
- Green Onion Knight
  o Mob that contains all actions (homogeneous)

Experiment 1: Heterogeneous 2v1, Hybrid Fitness

Party

The enemy mob contains 1 action: Hit. The party consists of a Paladin and a Sorcerer.

Results

After roughly 25,000 encounters (about 15 minutes of runtime), a strategy was found. The paladin covers himself and the sorcerer, and cures both party members as needed. The sorcerer refreshes the paladin as needed so that the paladin can continue curing, then essentially stands behind the paladin's cover and attacks the enemy mob. In the following figure, one notices that the stable solution found is not the highest fitness attained at any given point. The point here is STABILITY: the solution found allows for consistently high fitness and success rates in battle.
There may be cases in which a party gets lucky, or a solution works very well sometimes but gets the party killed at other times.

Discussion

This initial experiment shows that the mobs work together in interesting ways to maximize their fitness in EVERY case. Risky solutions are found that achieve very high fitnesses some of the time, but because they are not consistent, they are abandoned for more stable solutions. The paladin learned to cover and cure the sorcerer, and the sorcerer learned to refresh the paladin so that the paladin could continue casting cure. The party also found a way to defeat enemy mobs in this situation, as the sorcerer would use spare turns to hit the enemy mob. In this case, the party seemed to lean primarily towards defeating as many enemies as possible while doing the minimum necessary to keep itself alive.

Experiment 2: Heterogeneous 3v1, Hybrid Fitness

Party

The enemy mob contains 1 action: Hit. The party consists of a Paladin, a Sorcerer and a Magician.

Results

The primary strategy found was essentially to let the magician keep the enemy mob asleep while the other two party members defeated it. This maximized both the damage dealt and the survivability of the party. This solution, however, is much less stable than the one found in the 2v1 case. This is to be expected, as the party members have to concern themselves with keeping more than one ally alive as well as themselves; in these experiments, if one party member dies, the battle ends. From the figure below, however, it can be seen that the overall fitness rises over time, and that the gap between the maximum and minimum attained fitnesses becomes much less drastic over time as well.

Discussion

This sleep-spamming solution arose relatively quickly in multiple runs of this experiment.
The other mobs worked with the magician primarily in support roles to keep the party alive, prioritizing survivability first. This is a very different approach from the first experiment, but it shows that parties can learn different tactics and take different priorities while continuing to be successful in combat.

Experiment 3: Homogeneous 2v1, Hybrid Fitness

Party

The enemy mob contains 1 action: Hit. The party consists of a Green and a Blue Onion Knight.

Results

The individual mobs generally started off performing the same actions, and did not come up with any meaningful strategies in a reasonable amount of time beyond spamming sleep (in some cases). In no case here did the mobs work together to defeat their enemies: they never targeted one another with friendly spells, and they did not develop any meaningful hostile action patterns (as were seen in Experiment 0). The figure shows the sleep-spamming solution that was found; all other cases provided little success, so this is all that was found.

Discussion

Essentially, the mobs seemed to have trouble developing unique roles within the party. The majority of the initial agents performed the same actions regardless of the Unique Mob Identifier. Since the fitness here is shared across all the elements of combat success, the mobs seemed to have problems building a strategy to work together, and since they were not individually rewarded, it seems it was too difficult a task to learn a role and maximize the fitness of the party.

Experiment 4: Homogeneous 2v1, Damage Fitness Focus

Party

The enemy mob contains 1 action: Hit. The party consists of a Green and a Blue Onion Knight.

Results

Here, the most successful solution was a party whose mobs did nothing more than spam Hit. Essentially, both mobs would hit until a party member died (which ends the battle). The figure showcases the instability of the parties in this experiment.
In many cases, the parties would get lucky and stay alive for a while (as they did in the abandoned solutions of Experiment 1), but since the results of those solutions were not consistent, nothing was learned.

Discussion

This goes back to the problem in Experiment 3: since the mobs have access to all actions from the start, their Unique Mob Identifier is meaningless. They never develop a unique identity or role in their party, so they learn to spam whatever is successful. In this case, since sleep is not rewarded (survivability is not part of the fitness), the most successful thing they could come up with was to spam Hit until the party was wiped out.

Conclusion

The centralized brain was very successful in coming up with strategies in the heterogeneous cases. In the homogeneous cases, however, the centralized brain had trouble converging on any meaningful combat strategy beyond survivability. This is most likely related to the fractured decision space problem, as well as the problem of rewarding learning in meaningful ways: since the mobs do not have unique roles to begin with, the Unique Mob Identifier is meaningless and never develops a use during learning. In many cases, the homogeneous party seemed about to converge on a spamming solution to maximize its fitness, but such solutions never stayed around for very long. The overall conclusion is that the centralized approach is a viable option in the heterogeneous case: as long as mobs have well-defined roles, the shared brain can come up with strategies to defeat their common enemy. However, if mobs are poorly defined or homogeneous in nature, a meaningful solution is not found in a reasonable amount of time.

Bibliography

Alexander, L. (2008, January 22). World of Warcraft Hits 10 Million Subscribers. Retrieved April 11, 2010, from Gamasutra: http://www.gamasutra.com/php-bin/news_index.php?story=17062

Bartle, R. (n.d.).
MUD History, Who Invented MUD's, How MUD's Were Invented. Retrieved April 10, 2010, from http://www.livinginternet.com/d/di_major.htm

Daglow, D. (2008, January 10). Neverwinter Nights | Press Release by MCV. Retrieved April 12, 2010, from MCV: http://www.mcvuk.com/pressreleases/34298/NeverWinter-Nights

Kenneth O. Stanley, B. D. (2005). Real-Time Neuroevolution in the NERO Video Game. IEEE Transactions on Evolutionary Computation, 1-3.

Kohl, N., & Miikkulainen, R. (n.d.). Evolving Neural Networks for Strategic Decision-Making Problems.

Wizards of the Coast. (n.d.). Wizards of the Coast FAQ Archive. Retrieved April 10, 2010, from Wizards of the Coast: http://www.wizards.com/dnd/DnDArchives_FAQ.asp

Wolf, M. J. (2008). The Video Game Explosion: A History from PONG to PlayStation and Beyond. Westport: Greenwood Press.