Creating a virtual agent by integrating player interaction into its decision-making algorithm

Martin de Laat
University of Twente
P.O. Box 217, 7500AE Enschede
The Netherlands
[email protected]

ABSTRACT
In this study an attempt is made to create a Virtual Agent (VA) designed to exercise control over the opponent's behavior through interaction with its human opponent, in order to influence the outcome of a game. The goal is to create an artificial player that succeeds in influencing its human opponent in a game with no optimal strategy, one based around the (iterated) prisoner's dilemma.

Keywords
Virtual Agent, AI, Iterated Prisoner's Dilemma, player interaction, influencing opponent

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. 17th Twente Student Conference on IT, June 25th, 2012, Enschede, The Netherlands. Copyright 2012, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science.

1. INTRODUCTION
Nowadays, many sophisticated AIs exist that are designed to play the best strategy in order to achieve a goal, such as winning a game. But what if no optimal strategy exists, or the optimal strategy is not sufficient for the required end state? What if there is another, perhaps easier, way to influence the outcome of a game or situation, one that goes beyond the traditional boundaries of the game? Can a Virtual Agent positively influence the outcome of the game by interacting with its opponent? An attempt to find out will hopefully not only provide new options for existing AIs, but might also provide more insight into the ability of Virtual Agents to influence a human opponent's behavior. This study is aimed at finding out if, and in what way, we can influence the opponent's behavior, and to what extent this influence can be used to create a stronger virtual opponent.

1.1 Problem statement
With games like tic-tac-toe, checkers or even chess, it is within our reach to create an AI capable of outperforming the world's top human players without fail. The state and goal of the game are always known, and there always exists a best move to be played. When playing a game such as poker in real life, players do not solely follow a set of guidelines to decide whether to Fold, Call or Raise. Good poker players do not base their decisions purely on statistics and probability, but also on reading their opponent's behavior, facial expressions or the way they interact. Poker players sometimes bluff, or try to affect an opponent's next move by vocal or visual deception. In short, poker "is a game of imperfect information, where multiple competing agents must deal with probabilistic knowledge, risk assessment, and possible deception, not unlike decisions made in the real world." [3] Up till now, rather successful attempts have been made to integrate player analysis into an artificial poker player's decision-making algorithm, such as Poki. "Poki uses learning techniques to construct statistical models of each opponent, and dynamically adapts to exploit observed patterns and tendencies." [3]
Although these systems manage to adequately analyze a player's playing style, they have yet to reach the level at which world-class poker players play. Above all, player analysis is just one of the many ways real-life poker players influence the game. Besides influencing the game by making a move based on your own hand and speculation about your opponent's, one can alter the outcome of a game by influencing the opponent's behavior, by making him (re)act in a way that is favorable to you. Just like a world-class poker player would try to taunt or deceive his opponent, a Virtual Agent could do more than just follow a set of tactics; it could also exercise control by interacting with its opponent. This study intends to explore to what extent it is possible to successfully integrate player interaction into a VA's decision-making algorithm, and does so in an attempt to create one.

But even if we were able to create a VA capable of doing just that, how can we confidently conclude that the human opponent's decision was in fact influenced by the VA? We could ask the participant to fill in a questionnaire after playing the game to describe how the VA influenced his decision-making process, but can someone really claim to know exactly what did and did not influence him into making a decision? Most thought patterns occur subconsciously, and we want to be able to statistically prove that the VA works, not to psychologically interpret the participants' rationale afterwards. In order to successfully prove the Virtual Agent's ability to influence the behavior of a human opponent, we must design or choose a game where we can easily interpret the data consisting of the moves played, without the external factors and uncertainty a game like poker would bring. Ideally, we must use a game that, without any form of interaction, has no definitive best strategy, making interaction with the opponent the only possible way to successfully excel at it. The game chosen for this is one based around the Iterated Prisoner's Dilemma. Why this game is ideally suitable for our study is explained in paragraph 2.1.1: Choosing the game.

A problem arises when we try to exercise control over the human opponent. One human player differs from the other, and might be susceptible to different methods of persuasion. For this we must ask to what extent human players differ from one another, and how we can successfully categorize them within the scope and time limits of this research. Concluding, this study aims to research the feasibility and possibilities of integrating player interaction into an AI's decision-making algorithm, and does so by creating a Virtual Agent aimed at positively reinforcing the opponent to choose to Cooperate in an Iterated Prisoner's Dilemma game.

1.2 Research Questions
The primary research question is as follows: to what extent can we integrate player interaction into an AI's decision-making algorithm? The answer to this question will be based upon an attempt to realize a Virtual Agent capable of interacting with its opponent. In designing the Virtual Agent, the following sub-questions must be explored:
- Which tactics are deemed successful and suitable for our goals?
- How can we distinguish between players' playing styles?
- What is a suitable way to interact with each type of player?
- How can we feasibly integrate game theory and interaction strategies into a working VA whose decision-making algorithm turns player interaction into a usable strategy?

Only once these questions are explored can a successful attempt at creating the Virtual Agent be made. Once the VA is created, it shall be put to the test against a number of human opponents, and a comparable AI without interaction will be used as reference material. With the data that follow from these tests, we will be able to conclude to what extent we managed to influence the opponent's behavior and improve the strategy used.

2. METHOD OF RESEARCH
In this chapter, paragraphs 1 through 5 describe the design phases. In phase 1 we define the game design and mechanics. Phase 2 gives an insight into proven tactics for the IPD. Next, in phase 3, we choose how to categorize an opponent. In phase 4 we discuss in what ways we can influence the opponent, and in phase 5 we decide how to combine the game tactics found with the ways to influence the opponent. Finally, in chapter 3 we evaluate the results after having participants play against our VA, and see to what extent we have succeeded in creating a VA capable of positively influencing the outcome of a game.

2.1 Setting up the test environment
2.1.1 Choosing the game
The game the VA is designed for is one based around the Iterated Prisoner's Dilemma. This game is ideally suitable for the intended research for the following reasons:

No optimal strategy. As there is no optimal strategy, the only way to excel is by carefully guided interaction with the opponent.

Narrow scope. Because the game only allows the player to choose between two options, to either Cooperate or to Defect, it is easy to evaluate the data without having to look at each specific scenario and weigh factors such as the current state of the game, as you would with games like poker or chess. It also increases the likelihood of the VA successfully interacting with its opponent, because in this case it will only have to motivate its opponent to choose one option over the other.

Feasibility. As discussed in the points above, the narrow scope of the game allows for easy communication and evaluation whilst still holding true to the no-optimal-strategy requirement. A game like poker is not deemed feasible given the time frame.
2.1.2 Designing the game's mechanics
A game based on the Prisoner's Dilemma usually revolves around two players with a choice: either to Cooperate or to Defect. If players 1 and 2 both choose to Cooperate, their reward will be greater than if they both choose to Defect. If one chooses to Defect and the other to Cooperate, the player who chose to Defect will get a higher reward than in the first scenario, and the player who Cooperated an even lower reward than if both had Defected. Table 1 lists all possible moves two players (1 and 2) can make.

Table 1 Possible moves Prisoner's Dilemma
                  1) Cooperate    1) Defect
2) Cooperate      A, A            C, B
2) Defect         B, C            D, D

In this table, 'B, C' means that player 2 chooses to Defect whilst player 1 decides to Cooperate. To be a Prisoner's Dilemma, the following must be true:

B > A > D > C

For Iterated Prisoner's Dilemma (IPD) games, the following rule generally applies as well:

2A > B + C

This way, continuous Cooperation gives a greater reward than if two players decide to 'cheat' the system and alternately choose to Cooperate and Defect. For the point distribution we follow the standard payoff model, also used by Pelc and Pelc [9].

Table 2 Point distribution
                  1) Cooperate    1) Defect
2) Cooperate      3, 3            0, 5
2) Defect         5, 0            1, 1

If all moves were made randomly, the expected result of each turn would be (3 + 0 + 5 + 1) / 4 = 2.25 points. This makes Cooperation between players the most efficient way to play the game. If we look at this point distribution closely, however, we see that no matter what the opponent chooses to do, the best outcome for you in a single turn is always achieved by Defecting. Therefore, especially if a game were to consist of only one move, one might claim that the best strategy is to always Defect. This is not the case, as in an IPD situation your opponent will react to your defection accordingly. Just as Defecting always gives the best return no matter what the opponent chooses, it is also most beneficial for you, independent of your own choice, if the opponent chooses to Cooperate. Choosing to Cooperate to mutually benefit from a scenario with the prospect of an increased future payoff is the basic idea of reciprocal altruism [11]. The only way to get a higher reward than the expected result is by Cooperating with your opponent and creating a relationship of trust. Depending on your estimation of what the opponent will do, you may choose to betray him at the right time to gain even more points.

The goal of the game for the players is to gain as many points as possible, regardless of what their opponent does. The Prisoner's Dilemma is meant to pose a tough decision regardless of the fate of your opponent, and we ensure this by taking the VA's sessions off the ranking list. Therefore, at the beginning of the game, the players are told that they have to gain as many points as possible, and that their score will be put on a ranking list. Additionally, the participant is only shown the points gained last round, not the total scores. This is to prevent players from getting the urge to play solely against the VA. The goal for the VA is to successfully interact with the opponent and to see to what extent it can influence the player's decisions; ideally, the VA would be able to increase the number of Cooperated moves the opponent plays by effectively interacting with him.

As explained by Axelrod and Hamilton [2], the number of rounds a game has affects an opponent's playing style. If a round contains 15 turns, it would be best to Defect on the last turn, as no future iterations can occur. If you can expect your opponent to Defect on the last turn n, your best solution is to also Defect on the (n − 1)th turn, and so on all the way down to the first turn. To prevent the number of rounds from affecting the player's decisions, the exact number of rounds is not shown to the participant. Similarly, in designing the VA, we do not take this information into account in its decision-making algorithm.
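The payoff structure and the two conditions above are easy to verify mechanically. The following Python sketch is our own illustration (the names PAYOFF, is_prisoners_dilemma and so on are not part of the game implementation); it encodes Table 2 and checks both conditions as well as the expected value under random play:

# Payoffs from Table 2: (my_move, opponent_move) -> my points.
# 'C' = Cooperate, 'D' = Defect.
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

A, B, C, D = 3, 5, 0, 1  # mutual cooperation, temptation, sucker, mutual defection

def is_prisoners_dilemma():
    # Strict Prisoner's Dilemma ordering of the payoffs.
    return B > A > D > C

def rewards_sustained_cooperation():
    # IPD condition: mutual Cooperation must beat alternating C/D.
    return 2 * A > B + C

# Expected payoff per turn if both players move uniformly at random.
expected = sum(PAYOFF.values()) / len(PAYOFF)  # (3 + 0 + 5 + 1) / 4 = 2.25

assert is_prisoners_dilemma() and rewards_sustained_cooperation()
print(expected)  # 2.25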
2.1.3 The test environment
As is customary with these kinds of studies, the participant has to fill in a survey before he can start. Besides the usual information, such as a name and email address (to prevent someone from participating multiple times), the participant is asked a question that will help us determine what type of player he is. What this question is, and why it is deemed necessary, is explained in paragraph 2.3. When the game is over, the participant is asked to fill in a small survey to log his experience. The outline of this survey can be found in paragraph 3.1.

After filling in the entry form, the participant plays one round against the created VA, and one round against a silent AI. To prevent misleading data caused by the participant altering his playing style based on the experience gained in the first game, the order of the AIs the participant plays against is randomized. Apart from being able to interpret the final data independent of the order, we are also able to note the difference between the first and second game played. After the survey, the participant is introduced to Alice, the created Virtual Agent.

While playing the game, the player has the following two possible actions:
- Cooperate
- Defect

At all times, the human player has access to the score system (Table 2) and the scores achieved last round, to keep him engaged with the game and to guide him in his decision-making process. At certain times, the VA, and the VA alone, can propose to the participant that they Cooperate together. When the human player is proposed to Cooperate, he can respond to the VA whether he wants to Cooperate or not; what he actually does is up to him. This is the only way for the participant to interact with the VA, as he is not able to propose cooperation to the VA himself. This conforms to the idea of keeping the information flow going in one direction only. The VA, on the other hand, can interact with the human player via text messages, a facial expression or, as stated, by proposing to Cooperate. Appendix A illustrates the playing environment of the game. Here, the participant is able to see the scores gained last round, a facial expression of the VA, text messages from the VA and the three buttons to play the game. When playing against the silent AI, the participant sees no face or text pop-ups.

2.2 Game tactics
It is impossible to define a true tactic that will always yield the best result. Most strategies in the IPD incorporate a decision algorithm that bases the decision in round n + 1 on what happened in round n. Some more advanced strategies model the behavior of the opponent as a Markov process and/or make use of Bayesian inference, techniques that are regularly used for poker agents as well. Following Cohen's conceptualization of iterated decisions [4], we can expect some players to settle on one decision somewhere in the game and stick to it. Numerous studies have been conducted to capture a valid pattern between a player's moves based on the opponent's last move. A study by Rapoport and Dale [10] presents two models with which the authors rather successfully resembled the progress of a game: one assuming that at some point a player will stick to a certain decision, and one assuming that "at each play the probability of choosing C is a linear function of that probability on the preceding play, whereby the parameters of the transformation depend on the preceding outcome."
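As an illustration of the second, linear model, consider the following sketch. The update rule follows the quoted description, but the coefficient values are invented for demonstration only and are not the parameters fitted by Rapoport and Dale [10]:

# Linear model of cooperation probability (after Rapoport and Dale [10]):
# p(n+1) = alpha * p(n) + beta, with (alpha, beta) depending on the
# outcome of round n. The coefficients below are purely illustrative.
COEFFS = {
    ('C', 'C'): (0.9, 0.10),  # mutual cooperation reinforces cooperating
    ('C', 'D'): (0.5, 0.00),  # being betrayed sharply lowers p
    ('D', 'C'): (0.8, 0.10),
    ('D', 'D'): (0.7, 0.05),
}

def next_cooperation_probability(p, my_move, opp_move):
    alpha, beta = COEFFS[(my_move, opp_move)]
    return min(1.0, alpha * p + beta)

p = 0.8  # initial inclination to Cooperate
for outcome in [('C', 'C'), ('C', 'D'), ('D', 'D')]:
    p = next_cooperation_probability(p, *outcome)
    print(round(p, 3))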
Au and Komorita [1] conclude in their study that "The results of this study clearly show that (a) a cooperative strategy - one that initiates unilateral cooperation at the outset and then adopts a TFT strategy - is very effective in inducing subsequent cooperation from the other party, (b) the effectiveness of a cooperative strategy varies directly with the cooperative orientation of the other party (a cooperative strategy is more effective against a cooperative than a competitive person), and (c) initial cooperation is more effective if it is repeated more than once."

Intriguingly, Axelrod and Hamilton [2] showed that perhaps the best strategy of all is actually one of the simplest in design. In a computer tournament for the Prisoner's Dilemma, they attained the highest average score with the simplest of all strategies submitted: TIT FOR TAT (TFT). The TFT strategy never Defects first, retaliates when the opponent decides to Defect, but forgives its opponent when he decides to Cooperate again. The TFT strategy proved to be a very robust one. Moreover, Axelrod and Hamilton showed that once TFT is established, it can resist invasion by any possible mutant strategy, provided that the individuals who interact have a sufficiently large probability ω of meeting again. That is, if:

ω ≥ (B − A) / (B − D) and ω ≥ (B − A) / (A − C)

With our payoffs this gives (5 − 3) / (5 − 1) = 1/2 ≤ 0.93 and (5 − 3) / (3 − 0) = 2/3 ≤ 0.93, so both conditions are satisfied. They demonstrate that TFT is evolutionarily stable if ω is constant and sufficiently large.

One of the reasons why TFT works so well is that it only remembers the previous situation. The downside is that if the opponent only chooses to Defect, TFT will copy this pattern without trying to persuade the opponent to Cooperate. If the opponent wants to Cooperate in the round after Defecting, TFT will Defect that round, causing the opponent to consider Defecting again the following move. This results in a spiral where one of the players has to 'man up' and continue to Cooperate. This is a deficiency that can be overcome with interaction between players, creating possibilities for our AI.

Doebeli and Hauert [5] claim that "TFT performs poorly in a noisy world, in which players are prone to make erroneous moves that can cause long series of low paying retaliatory behavior." They suggest the introduction of probabilistic strategies, for instance as used by Generous TFT, where the probability to retaliate is lowered to 2/3.

Another way to conquer the weaknesses of TFT is an equally simple strategy called 'win-stay, lose-shift' (WSLS) [7]. WSLS repeats the previous move if the resulting payoff has met its aspiration level, and changes otherwise. The main argument for WSLS is that it is able to correct mistakes from the opponent: two TFT strategies put against each other can cause each other to alternate between Cooperating and Defecting, whilst WSLS corrects this. If the payoff of a round is A or B (in our case 3 or 5 points respectively), WSLS repeats the same action; if the payoff is C or D (0 or 1 point respectively), it changes its move. A second argument for WSLS is that it can dominate an all-Cooperating strategy by Defecting, while TFT simply copies the opponent's move. A final reason to choose WSLS over TFT is that with TFT, the game can get stuck in a situation where everybody chooses to Defect; WSLS gives the opponent a reason to Cooperate.
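To make the differences between these strategies concrete, the sketch below implements TFT, Generous TFT and WSLS as move functions over the game history. This is our own illustration using the payoffs of Table 2, not the code of the agents used in the experiment:

import random

C, D = 'C', 'D'

def tit_for_tat(my_moves, opp_moves):
    # Cooperate first, then copy the opponent's last move.
    return C if not opp_moves else opp_moves[-1]

def generous_tft(my_moves, opp_moves, retaliation_p=2/3):
    # Like TFT, but retaliate after a Defect only with probability 2/3,
    # as suggested by Doebeli and Hauert [5] for noisy environments.
    if not opp_moves or opp_moves[-1] == C:
        return C
    return D if random.random() < retaliation_p else C

def win_stay_lose_shift(my_moves, opp_moves):
    # Repeat the previous move after a payoff of A or B (3 or 5 points),
    # switch after C or D (0 or 1 point) [7].
    if not my_moves:
        return C
    payoff = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}[(my_moves[-1], opp_moves[-1])]
    if payoff >= 3:                        # aspiration level met: stay
        return my_moves[-1]
    return C if my_moves[-1] == D else D   # otherwise: shift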
Imhof et al. [7] showed that in an IPD with noise, where the benefit-to-cost ratio exceeds 3, the WSLS strategy (also called Pavlov) is superior to TFT; if not, it is best to always Defect. The issue with WSLS, however, is that it is only superior to TFT in a noisy environment because it 'forgives' the opponent when the game is going the wrong way. It also dominates all-Cooperating strategies, but when playing against human opponents or any capable AI, none of them will continue Cooperating when the opponent has Defected for the last few rounds. For both players to Cooperate, trust is necessary. This creates the problem of having to 'trust' the opponent with nothing to substantiate this trust. WSLS blindly trusts the opponent when the game is going the wrong way; as a result, a strategy that always Defects exploits WSLS in every second round, whereas this is not the case with TFT. Integrating player interaction with TFT, so that it can, up to a certain extent, influence and predict the opponent's next move, might surpass them both. In order to substantially evaluate the effect of the interaction, the participant will play one round against a silent TFT strategy, and one round against a modified TFT strategy which incorporates player interaction.

2.3 Categorizing the human player
In their experiment, Kuhlman and Marshello [8] divided their participants into three groups with the following motivational orientations: Cooperative, Competitive and Individualistic. Their study showed that Competitive players Defected against TFT, 100% Cooperation and 100% Defection strategies, while the Cooperatively and Individualistically oriented players cooperated with a TFT strategy. We will be using a strategy that is rather similar to TFT, thus creating an overlap in playing styles between cooperatively and individualistically oriented players. Because of this, and because the participants are urged not to play competitively, the participant will be categorized into one of the following styles:

1) Cooperating player
2) Self-Interested player

Cooperating players will try to Cooperate as much as possible, because they see a mutual gain. Such a player does not only look at what is best for him individually, but also sees value in helping his opponent; he will be more easily persuaded into Cooperating. The Self-Interested player, on the other hand, has a tendency to Defect, or to act only in his own interest. He rationalizes from his own perspective and reacts only to what is in his best interest.

In order to categorize the participant to match his playing style, the participant fills in a survey before starting the game. Apart from the general questions, the participant answers a question that determines what type of player he is. This way, we do not have to base our assumptions about the type of player we are facing on an analysis of previously played moves, but can instantly and reliably interact accordingly. The participant is asked the following question:

If I were to give you 50 euros and ask you to divide it between you and your best friend, would you do it? He will not find out if you choose not to split it.

This simple question in fact determines two aspects of the participant. The first is to what extent he cares about the fate of his friend. The second is to find out if he can be motivated into splitting the money. Self-Interested players will see this as an opportunity to gain 50 euros, especially if the friend doesn't find out. Cooperating players will have no problem splitting the money, as they got it for free and were asked to; they care about the fate of their friend. This question proved to match the participants' own insights in 72% of the cases (see paragraph 3.1).
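In terms of implementation, this categorization reduces to a single boolean mapping; a minimal sketch (the function name is our own illustration):

def categorize_player(would_split_money: bool) -> str:
    # A player who splits the free 50 euros with his friend is treated
    # as a Cooperating player; one who keeps it as Self-Interested.
    return 'Cooperating' if would_split_money else 'Self-Interested'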
2.4 Influencing the game
No matter what move you choose, it is most beneficial for you if the opponent chooses to Cooperate with you. A major factor of the game is thus to convince the opponent to Cooperate. This includes both positive stimulation when the opponent chooses to Cooperate and a negative response when the opponent chooses to Defect. In order to influence the opponent's decisions, the Virtual Agent must be able to interact with its opponent. The VA has the following ways to interact:
- Textual messaging
- Facial expressions
- Proposing to Cooperate
- Choosing which move to play (indirect)

Cooperation can be encouraged either by appealing to the opponent's cognitive reasoning or by playing into his emotional state. A Virtual Agent can use reasoning to make the opponent behave as required, or can use kindness or guilt against him. Textual attempts to play into the opponent's emotions can be accompanied by a visual representation of a human emotion (e.g. avatars or emoticons) in order to be more affective and make the player feel like he is playing against something more than just a computer. Geiwitz [6] concluded that cooperation between players increased when the players were able to send both threats and messages of intention to each other in an IPD, instead of solely messages of intention. This indicates that threats do seem to have a function in an IPD.

Based upon the possible moves played in a round, there are four outcomes of the last round to respond to. When the opponent and Alice both Cooperate, the visual reaction must be joy. When both players choose to Defect, shame or a sign of wastefulness is appropriate, because this way hardly anyone gains any points. When the opponent chooses to Defect while Alice Cooperates, anger is deemed the correct emotional response. These are all fairly straightforward emotional responses, but when Alice Defects and the opponent Cooperates, we can choose between two responses. One would be to show joy or laughter; the only effect this would achieve, however, is for the opponent to get angry and start Defecting out of retaliation, a situation not preferable for either party. The chosen emotional response is therefore that of a classic 'oops'. This way, the opponent will presumably not hold too much of a grudge against Alice.

The facial responses are deemed suitable for both a Cooperating and a Self-Interested player. The textual messages, however, differ per player type. Table 3 shows Alice's textual responses to a certain move played last round, depending on the type of player she is up against. Table 4 shows captions of Alice's facial responses to the moves previously played. In the actual game, the participants are shown a series of short movies to illustrate Alice's reaction, rather than just images.

Table 3 Textual responses
P: Coop. VA: Coop.
  Against Coop. player: "We are an awesome team!"
  Against SI player: "Let's keep Cooperating for more points!"
P: Coop. VA: Defect
  Against Coop. player: "Oops, I'm sorry. I wasn't sure you would change your mind. Let's both Cooperate next round!"
  Against SI player: "Your actions in the previous rounds didn't convince me you would Cooperate. Let's start with a clean slate and both Cooperate next round!"
P: Defect VA: Coop.
  Against Coop. player: "Why did you do that? I thought we were a team! Let's work together!"
  Against SI player: "You betrayed me! You better start Cooperating or I will start working against you!"
P: Defect VA: Defect
  Against Coop. player: "I don't want to do this, but you force me to. Please show me a sign you want to start working together again!"
  Against SI player: "If this is how you want to play it, I'll Defect until you start to Cooperate again!"

Table 4 Facial responses
P: Coop. VA: Coop.    joy
P: Coop. VA: Defect   oops
P: Defect VA: Coop.   anger
P: Defect VA: Defect  shame/wastefulness

When playing against a Cooperating player, the VA uses kind words while trying to make the opponent feel guilty for Defecting. A Self-Interested player might respond better to threats or to theoretical reasoning that is in line with his own goal. Note: the text in the situation P: Defect, VA: Defect only appears when the player betrayed the VA after agreeing to Cooperate. More on this in paragraph 2.5.
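Tables 3 and 4 amount to a lookup keyed on last round's moves and the player type. A minimal sketch of such a lookup is given below; the texts are abbreviated here, and the structure is our own illustration rather than the actual game code:

# (player_move, va_move) -> facial expression, per Table 4.
FACE = {('C', 'C'): 'joy', ('C', 'D'): 'oops',
        ('D', 'C'): 'anger', ('D', 'D'): 'shame/wastefulness'}

# (player_move, va_move, player_type) -> textual response, per Table 3
# (texts abbreviated).
TEXT = {
    ('C', 'C', 'Cooperating'): "We are an awesome team!",
    ('C', 'C', 'Self-Interested'): "Let's keep Cooperating for more points!",
    ('C', 'D', 'Cooperating'): "Oops, I'm sorry. Let's both Cooperate next round!",
    ('C', 'D', 'Self-Interested'): "Let's start with a clean slate and both Cooperate!",
    ('D', 'C', 'Cooperating'): "Why did you do that? I thought we were a team!",
    ('D', 'C', 'Self-Interested'): "You betrayed me! You better start Cooperating!",
    ('D', 'D', 'Cooperating'): "I don't want to do this, but you force me to.",
    ('D', 'D', 'Self-Interested'): "I'll Defect until you start to Cooperate again!",
}

def respond(player_move, va_move, player_type):
    # Returns Alice's facial and textual reaction to the last round.
    return FACE[(player_move, va_move)], TEXT[(player_move, va_move, player_type)]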
2.5 Designing the Virtual Human
The VA is designed by integrating aspects of the Tit-For-Tat strategy with the interactions proposed in paragraph 2.4. The goal is to overcome the weaknesses of Tit-For-Tat by achieving a solid basis for predicting a player's response to a certain situation and the action that follows from it. Because to date no studies can be found that give useful insight into setting up such a basis, the VA is designed to test the effect the proposed interaction has on the opponent. This information is vital for understanding to what extent it is possible to successfully use interaction with the opponent to your advantage.

If we take a look at the Tit-For-Tat strategy, taking into account that TFT starts off by Cooperating, the possible game states are shown in Figure 1. In the illustration, S represents the start of the game. DC describes a state where Alice's opponent Defects whilst Alice chooses to Cooperate. The numbers on the green circles mark the four unique game states; the yellow circles redirect to the green states.

Figure 1 Game states pure TFT

Before we have grounds to successfully deceive and betray the opponent, we want to prevent the following two scenarios from happening:
- A game where players alternately Cooperate and Defect
- A game that is stuck where both players continue to Defect

We attempt to solve the first situation by mere textual and facial interaction with the opponent. By reacting positively to Cooperation from the opponent, and negatively to a Defection, we aim to influence the opponent into Cooperating more often. The textual response, as stated, depends on the type of opponent the VA is up against. To prevent a game from reaching a state where both players resort to doing nothing but Defecting, Alice can propose a pact to her opponent: when the game reaches a state where both players choose to Defect, Alice can propose to the opponent that they make a pact to Cooperate together. If the player agrees to Cooperate, Alice can (and in our case will) choose to Cooperate as well. A representation of the possible states is shown in Figure 2; a clearer illustration can be consulted in Appendix B.

Figure 2 Game states Alice

With the option to propose a pact, a possible problem arises when the opponent chooses to betray Alice's trust by agreeing to Cooperate whilst in fact choosing to Defect. For this, a trust system with so-called Mojo points is introduced. A player starts off with 1 point. Whenever he chooses to Defect, he loses 1 point. The next round, conforming to TFT, Alice chooses to Defect. If the player then chooses to Cooperate for the following two rounds, he is rewarded with a Mojo point. If he decides to Cooperate once and Defect after that, he ends up back in state 2 with no Mojo points. If he decides to Defect twice in a row, he reaches state 4. If Alice trusts the player enough (the player has zero or more Mojo points), he is offered a proposition to Cooperate. If he agrees, and Cooperates afterwards, Alice rewards him with a Mojo point and the player ends up back in state 1 with one Mojo point, which he keeps until he chooses to Defect again.

This system revolves around the fact that the player will normally have 0 or 1 Mojo points: in a Cooperation cycle he has 1 point, and if he exits that cycle by Defecting, he loses his Mojo point. If the game reaches state 4, the player is normally offered a proposition to Cooperate. If he agrees to Cooperate but decides to Defect anyway, he loses 1 Mojo point. As the saying goes: fool me once, shame on you; fool me twice, shame on me. Whenever the opponent chooses to betray Alice, he ends up in a cycle where it is impossible to reach zero or more Mojo points when arriving in state 4. If the opponent ends up in state 4 when Alice's trust is lost, Alice moves over to the general TFT strategy, while trying to both visually and textually persuade her opponent into Cooperating, as outlined in Table 3 and Table 4.
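Since the exact bookkeeping of the original implementation is not published, the sketch below gives one plausible reading of the Mojo rules described above; in particular, the cap at 1 point and the rule that only a Defect that exits a Cooperation cycle (or betrays a pact) costs a point are our interpretation:

class MojoTracker:
    # One plausible reading of the Mojo trust system; details are
    # reconstructed from the prose description, not from published code.
    def __init__(self):
        self.mojo = 1            # a player starts off with 1 Mojo point
        self.coop_streak = 0
        self.last_move = 'C'     # the game effectively starts cooperatively

    def record(self, player_move, betrayed_pact=False):
        if player_move == 'D':
            # Exiting a Cooperation cycle, or betraying a pact, costs a point.
            if self.last_move == 'C' or betrayed_pact:
                self.mojo -= 1
            self.coop_streak = 0
        else:
            self.coop_streak += 1
            if self.coop_streak == 2:             # two Cooperations in a row
                self.mojo = min(1, self.mojo + 1)  # earn a point back, capped at 1
                self.coop_streak = 0
        self.last_move = player_move

    def may_propose(self):
        # Alice only proposes a pact while the player has >= 0 Mojo points.
        return self.mojo >= 0

Under this reading, a player who betrays a pact drops to −1 Mojo; two Cooperations bring him back to 0 at most, and the Defect that re-enters state 4 drops him below 0 again, reproducing the "fool me twice" lockout described above.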
3. FINDINGS AND CONCLUSION
3.1 Experimental findings
A population of 29 Dutch participants, ranging from 11 to 51 years old with an average age of 23 years, participated in this experiment. Before playing against Alice and the silent AI, the participants had to answer the following question:

If I were to give you 50 euros and ask you to divide it between you and your best friend, would you do it? He will not find out if you choose not to split it.

Of the 29 participants, only four answered the question by saying they would keep the money for themselves. At the end of the experiment, the participants were asked whether they saw themselves as Cooperating or Self-Interested players. All four participants who chose to keep the money (and were thus marked as Self-Interested players by the game) identified themselves as Self-Interested players. Of the remaining 25 players, however, eight saw themselves as Self-Interested players. This comes down to a correct interpretation of (1 − 8/29) × 100% ≈ 72%.

At the end of the game, the participants were asked the following additional questions:
1. On a scale from 1 to 5 (5 meaning 'a lot'), did Alice influence you in making a decision?
2. On a scale from 1 to 5 (5 meaning 'a lot'), did you feel remorse when you betrayed Alice?
3. Did you think Alice was cleverer than the other PC player?

The results of these questions are listed in Table 5.

Table 5 Survey results
Response:       1    2    3    4    5
Question 1)     8    4    5    9    3
Question 2)    16    5    4    2    2
Response:     yes   the same   no
Question 3)     4    9         16

Oddly enough, from these results we can conclude that most of the population thought of the silent player as the cleverer one, while most of the population admitted to feeling almost no remorse towards the VA at all. Some participants responded that the VA succeeded in influencing their decision, but this could go both ways.
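For completeness, the ≈72% figure quoted above follows directly from the reported counts:

participants = 29
kept_money = 4       # marked Self-Interested; all self-identified as such
misidentified = 8    # of the remaining 25, saw themselves as Self-Interested
accuracy = (1 - misidentified / participants) * 100
print(round(accuracy))  # 72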
In playing the game, twelve participants encountered the silent AI first, while the remaining sixteen played against Alice first. Table 6 lists data from the experiment. To recapitulate: a participant plays 40 turns in total, either against the silent (TFT) AI first or against Alice first, and both rounds take 20 turns each.

Table 6 Data game order
                                             Silent first   Alice first
Number of participants                       12             16
Avg. Cooperated moves per game               24.33          24.13
Avg. total cooperation moves vs. silent AI   11.75          12.19
Avg. total cooperation moves vs. Alice       12.58          11.94
Total avg. points gained by player           101            100.25
Total avg. points gained by AIs              90.17          91.19

Note: this table covers 28 of the 29 participants, as the moves of one of the games were not transferred to the server correctly.

From these data we can assume that the order in which the participants play the games does not significantly affect the total points gained. It does, however, affect the cooperation rate: on average, players Cooperate more often in the second round they play, showing an average increase of roughly 4.2% when playing for the second time. To compare the statistics, we therefore multiply the number of Cooperative moves in the second game by 0.958. The results are shown in Table 7.

Table 7 Data game order - adapted
                                             Silent first   Alice first
Avg. total cooperation moves vs. silent AI   11.75          11.68
Avg. total cooperation moves vs. Alice       12.05          11.94

On average, this shows that people were 2.4% more likely to Cooperate with Alice than with a pure Tit-For-Tat strategy. Although this shows an increase in favor of the developed VA, the difference is not significant enough to confidently conclude that we were able to influence the participants' behavior. If we look into the data of the moves played, however, we see that of the 55 times Alice proposed to Cooperate, participants agreed with the proposal 47 times. Of the eight times the proposal was rejected, three (individual) participants actually chose to Cooperate anyhow. Of the 47 proposals that were accepted, the participant actually Cooperated 29 times; in only about 37% of the cases did the participant keep true to his word.

One of the goals of the tweaked TFT strategy was to prevent the game from reaching a state where both players Defect continuously. With the plain TFT strategy, there were 189 instances where two consecutive moves of the participant were Defects, and 140 instances where a participant Defected three times in a row. Against Alice, however, only 154 instances occurred where a participant Defected twice in a row, and only 83 where a participant Defected three times in a row. This comes down to a decrease of roughly 19% and 41% respectively.
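The adjustment behind Table 7 and the streak percentages can be reproduced from the reported counts as follows:

# Second-game cooperation counts are deflated by the observed ~4.2%
# second-game increase before comparing against first-game counts.
adjust = 0.958
print(round(12.58 * adjust, 2), round(12.19 * adjust, 2))  # 12.05 11.68

# Decrease in consecutive-Defect streaks against Alice vs. plain TFT.
double_tft, double_alice = 189, 154
triple_tft, triple_alice = 140, 83
print(round((1 - double_alice / double_tft) * 100))  # ~19
print(round((1 - triple_alice / triple_tft) * 100))  # ~41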
3.2 Conclusion
From the data listed in paragraph 3.1, we must conclude that we were not able to significantly alter the opponents' playing style. The participants proved not to be trustworthy enough for the (Mojo) model used. The data in Table 5 also show that we did not succeed in creating a strong enough bond between the player and the VA to make the player care about whether he betrayed Alice. Moreover, judging from the number of participants who did not care about betraying Alice and who thought that the silent AI was in fact the smartest, it is quite possible that the participants mistook Alice's kindness for weakness.

Because the number of Cooperated moves did not significantly increase with the designed VA, and the success rate after an agreement to Cooperate was only around 37%, we have not found any evidence indicating that the opponent's move can be predicted based on interaction between the VA and the human opponent. Thus, the findings cannot be used to integrate predictions of what the opponent will do next into the decision-making algorithm. Based on the statistics of consecutive Defected moves, however, we can conclude that the model used contributes to preventing continuing series of Defecting moves. Although the population is not large enough to substantiate the claim that we succeeded in remedying one of the flaws of the Tit-For-Tat strategy, the results seem to indicate so.

4. REFERENCES
[1] Au, W.T. and Komorita, S.S. Effects of initial choices in the Prisoner's Dilemma. Journal of Behavioral Decision Making, 15 (4), 343-359.
[2] Axelrod, R. and Hamilton, W.D. The evolution of cooperation. Science, 211 (4489), 1390-1396.
[3] Billings, D., Davidson, A., Schaeffer, J. and Szafron, D. The challenge of poker. Artificial Intelligence, 134 (1-2), 201-240.
[4] Cohen, B.P. Conflict and Conformity: A Probability Model and Its Application. M.I.T. Press, Cambridge, Mass., 1963.
[5] Doebeli, M. and Hauert, C. Models of cooperation based on the Prisoner's Dilemma and the Snowdrift game. Ecology Letters, 8 (7), 748-766.
[6] Geiwitz, P.J. The effects of threats on prisoner's dilemma. Behavioral Science, 12 (3), 232-233.
[7] Imhof, L.A., Fudenberg, D. and Nowak, M.A. Tit-for-tat or win-stay, lose-shift? Journal of Theoretical Biology, 247 (3), 574-580.
[8] Kuhlman, D.M. and Marshello, A.F. Individual differences in game motivation as moderators of preprogrammed strategy effects in prisoner's dilemma. Journal of Personality and Social Psychology, 32 (5), 922-931.
[9] Pelc, A. and Pelc, K.J. Same game, new tricks: What makes a good strategy in the Prisoner's Dilemma? Journal of Conflict Resolution, 53 (5), 774-793.
[10] Rapoport, A. and Dale, P. Models for Prisoner's Dilemma. Journal of Mathematical Psychology, 3 (2), 269-286.
[11] Trivers, R. The evolution of reciprocal altruism. The Quarterly Review of Biology, 46 (1), 35-57.

APPENDICES
4.1 Appendix A
4.2 Appendix B