Creating a virtual agent by integrating player interaction into its decision-making algorithm

Martin de Laat
University of Twente
P.O. Box 217, 7500AE Enschede
The Netherlands
[email protected]

ABSTRACT
In this study an attempt is made to create a Virtual Agent (VA) designed to exercise control over the opponent's behavior through interaction with its human opponent, in order to influence the outcome of a game. The goal is to create an artificial player that succeeds in influencing its human opponent in a game with no optimal strategy, one based around the (iterated) prisoner's dilemma.

Keywords
Virtual Agent, AI, Iterated Prisoner's Dilemma, player interaction, influencing opponent

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. 17th Twente Student Conference on IT, June 25th, 2012, Enschede, The Netherlands. Copyright 2012, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science.

1. INTRODUCTION
Nowadays, many sophisticated AIs exist that are designed to play the best strategy in order to achieve a goal, such as winning a game. But what if no optimal strategy exists, or the optimal strategy is not sufficient for the required end state? What if there is another, perhaps easier, way to influence the outcome of a game or situation, one that goes beyond the traditional boundaries of the game? Can a Virtual Agent positively influence the outcome of the game by interacting with its opponent? An attempt to find out will hopefully not only provide new options for existing AIs, but might also provide more insight into the ability of Virtual Agents to influence a human opponent's behavior. This study is aimed at finding out if, and in what way, we can influence the opponent's behavior, and to what extent this influence can be used to create a stronger virtual opponent.

1.1 Problem statement
With games like tic-tac-toe, checkers or even chess, it is within our reach to create an AI capable of outperforming the world's top human players without fail. The state and goal of the game are always known, and there always exists a best move to be played. When playing a game such as poker in real life, players do not solely follow a set of guidelines to decide whether to Fold, Call or Raise. Good poker players do not base their decisions purely on statistics and probability, but also on reading their opponent's behavior, facial expressions or the way they interact. Poker players sometimes bluff, or try to affect an opponent's next move by vocal or visual deception. In short, poker "is a game of imperfect information, where multiple competing agents must deal with probabilistic knowledge, risk assessment, and possible deception, not unlike decisions made in the real world." [3] Up till now, rather successful attempts have been made to integrate player analysis into an artificial poker player's decision-making algorithm, such as Poki. "Poki uses learning techniques to construct statistical models of each opponent, and dynamically adapts to exploit observed patterns and tendencies." [3]
Although these systems manage to adequately analyze a player's playing style, they have yet to reach the level at which world-class poker players play. Above all, player analysis is just one of the many ways real-life poker players influence the game. Besides influencing the game by making a move based on your own hand and speculation about your opponent's, one can alter the outcome of a game by influencing the opponent's behavior, by making him (re)act in a way that is favorable to you. Just like a world-class poker player would try to taunt or deceive his opponent, a Virtual Agent could do more than just follow a set of tactics; it could also exercise control by interacting with its opponent. This study intends to explore to what extent it is possible to successfully integrate player interaction into a VA's decision-making algorithm, and does so in an attempt to create one.

But even if we were able to create a VA capable of doing just that, how can we confidently conclude that the human opponent's decision was in fact influenced by the VA? We could ask the participant to fill in a questionnaire after playing the game to describe how the VA influenced his decision-making process, but can someone really claim to know exactly what did and did not influence him into making a decision? Most thought patterns occur subconsciously, and we want to be able to statistically prove that the VA works, not to psychologically interpret the participants' rationale afterwards. In order to successfully prove the Virtual Agent's ability to influence the behavior of a human opponent, we must design or choose a game where we can easily interpret the data consisting of the moves played, without the external factors and uncertainty a game like poker would bring. Ideally, we must use a game that, without any form of interaction, has no definitive best strategy, making interaction with the opponent the only possible way to successfully excel at it. The game chosen for this is one based around the Iterated Prisoner's Dilemma. Why this game is ideally suitable for our study is explained in paragraph 2.1.1: Choosing the game.

A problem arises when we try to exercise control over the human opponent. One human player differs from the other, and might be susceptible to different methods of persuasion. For this we must ask to what extent human players differ from one another, and how we can successfully categorize them within the scope and time limits of this research. Concluding, this study aims to research the feasibility and possibilities of integrating player interaction into an AI's decision-making algorithm, and does so by creating a Virtual Agent aimed at positively reinforcing the opponent to choose to Cooperate in an Iterated Prisoner's Dilemma game.

1.2 Research Questions
The primary research question is as follows: to what extent can we integrate player interaction into an AI's decision-making algorithm? The answer to this question will be based upon an attempt to realize a Virtual Agent capable of interacting with its opponent. In designing the Virtual Agent, the following sub-questions must be explored:
- Which tactics are deemed successful and suitable for our goals?
- How can we distinguish between players' playing styles?
- What is a suitable way to interact with each type of player?
- How can we feasibly integrate game theory and interaction strategies into a working VA whose decision-making algorithm turns player interaction into a usable strategy?

Only once these questions are explored can a successful attempt at creating the Virtual Agent be made. Once the VA is created, it shall be put to the test against a number of human opponents, and a comparable AI without interaction will be used as reference material. With the data that follow from these tests, we will be able to conclude to what extent we managed to influence the opponent's behavior and improve the strategy used.

2. METHOD OF RESEARCH
In this chapter, paragraphs 1 through 5 describe the design phases. In phase 1 we define the game design and mechanics. Phase 2 gives an insight into proven tactics for the IPD. Next, in phase 3, we choose how to categorize an opponent. In phase 4 we discuss in what ways we can influence the opponent, and in phase 5 we decide how to combine the game tactics found with the ways to influence the opponent. Finally, in chapter 3 we evaluate the results after having participants play against our VA, and see to what extent we have succeeded in creating a VA capable of positively influencing the outcome of a game.

2.1 Setting up the test environment
2.1.1 Choosing the game
The game the VA is designed for is one based around the Iterated Prisoner's Dilemma. This game is ideally suitable for the intended research for the following reasons:

No optimal strategy. As there is no optimal strategy, the only way to excel is by carefully guided interaction with the opponent.

Narrow scope. Because the game only allows the player to choose between two options, to either Cooperate or to Defect, it is easy to evaluate the data without having to look at each specific scenario and weigh factors such as the current state of the game, as you would with games like poker or chess. It also increases the likelihood of the VA successfully interacting with its opponent, because in this case it will only have to motivate its opponent to choose one option over the other.

Feasibility. As discussed in the points above, the narrow scope of the game allows for easy communication and evaluation whilst still holding true to the no-optimal-strategy requirement. A game like poker is not deemed feasible given the time frame.
2.1.2 Designing the game's mechanics
A game based on the Prisoner's Dilemma usually revolves around two players with a choice: either to Cooperate or to Defect. If players 1 and 2 both choose to Cooperate, their reward will be greater than if they both choose to Defect. If one chooses to Defect and the other to Cooperate, the player who chose to Defect will get a higher reward than in the first scenario, and the player who Cooperated an even lower reward than if both had Defected. Table 1 lists all possible moves two players (1 and 2) can make.

Table 1 Possible moves Prisoner's Dilemma
                  1) Cooperate    1) Defect
2) Cooperate      A, A            C, B
2) Defect         B, C            D, D

In this table, 'B, C' means that player 2 chooses to Defect whilst player 1 decides to Cooperate. To be a Prisoner's Dilemma, the following must be true:

B > A > D > C

For Iterated Prisoner's Dilemma (IPD) games, the following rule generally applies as well:

2A > B + C

This way, continuous Cooperation gives a greater reward than if two players decide to 'cheat' the system and alternately choose to Cooperate and Defect. For the point distribution we follow the standard payoff model, also used by Pelc and Pelc [9].

Table 2 Point distribution
                  1) Cooperate    1) Defect
2) Cooperate      3, 3            0, 5
2) Defect         5, 0            1, 1

If all moves were made randomly, the expected result of each turn would be (3 + 0 + 5 + 1) / 4 = 2.25 points. This makes Cooperation between players the most efficient way to play the game. If we look at this point distribution closely, however, we see that no matter what the opponent chooses to do, the best outcome for you in a single turn is always achieved by Defecting. Therefore, especially if a game were to consist of only one move, one might claim that the best strategy is to always Defect. This is not the case, as in an IPD situation your opponent will react to your defection accordingly. Just as Defecting always gives the best return no matter what the opponent chooses, it is also most beneficial for you, independent of your own choice, if the opponent chooses to Cooperate. Choosing to Cooperate to mutually benefit from a scenario with the prospect of an increased future payoff is the basic idea of reciprocal altruism [11]. The only way to get a higher reward than the expected result is by Cooperating with your opponent and creating a relationship of trust. Depending on your estimation of what the opponent will do, you may choose to betray him at the right time to gain even more points.

The goal of the game for the players is to gain as many points as possible, regardless of what their opponent does. The Prisoner's Dilemma is meant to pose a tough decision regardless of the fate of your opponent, and we ensure this by taking the VA's sessions off the ranking list. Therefore, at the beginning of the game, the players are told that they have to gain as many points as possible, and that their score will be put on a ranking list. Additionally, the participant is only shown the points gained last round, not the total scores. This is to prevent players from getting the urge to play solely against the VA. The goal for the VA is to successfully interact with the opponent and to see to what extent it can influence the player's decisions; ideally, the VA would be able to increase the number of Cooperated moves the opponent plays by effectively interacting with him.

As explained by Axelrod and Hamilton [2], the number of rounds a game has affects an opponent's playing style. If a round contains 15 turns, it would be best to Defect on the last turn, as no future iterations can occur. If you can expect your opponent to Defect on the last turn n, your best solution is to also Defect on the (n − 1)th turn, and so on all the way down to the first turn. To prevent the number of rounds from affecting the player's decisions, the exact number of rounds is not shown to the participant. Similarly, in designing the VA, we do not take this information into account in its decision-making algorithm.
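The payoff structure and the two conditions above are easy to verify mechanically. The following Python sketch is our own illustration (the names PAYOFF, is_prisoners_dilemma and so on are not part of the game implementation); it encodes Table 2 and checks both conditions as well as the expected value under random play:

# Payoffs from Table 2: (my_move, opponent_move) -> my points.
# 'C' = Cooperate, 'D' = Defect.
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

A, B, C, D = 3, 5, 0, 1  # mutual cooperation, temptation, sucker, mutual defection

def is_prisoners_dilemma():
    # Strict Prisoner's Dilemma ordering of the payoffs.
    return B > A > D > C

def rewards_sustained_cooperation():
    # IPD condition: mutual Cooperation must beat alternating C/D.
    return 2 * A > B + C

# Expected payoff per turn if both players move uniformly at random.
expected = sum(PAYOFF.values()) / len(PAYOFF)  # (3 + 0 + 5 + 1) / 4 = 2.25

assert is_prisoners_dilemma() and rewards_sustained_cooperation()
print(expected)  # 2.25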
2.1.3 The test environment
As is customary with these kinds of studies, the participant has to fill in a survey before he can start. Besides the usual information, such as a name and email address (to prevent someone from participating multiple times), the participant is asked a question that will help us determine what type of player he is. What this question is, and why it is deemed necessary, is explained in paragraph 2.3. When the game is over, the participant is asked to fill in a small survey to log his experience. The outline of this survey can be found in paragraph 3.1.

After filling in the entry form, the participant plays one round against the created VA, and one round against a silent AI. To prevent misleading data caused by the participant altering his playing style based on the experience gained in the first game, the order of the AIs the participant plays against is randomized. Apart from being able to interpret the final data independent of the order, we are also able to note the difference between the first and second game played. After the survey, the participant is introduced to Alice, the created Virtual Agent.

While playing the game, the player has the following two possible actions:
- Cooperate
- Defect

At all times, the human player has access to the score system (Table 2) and the scores achieved last round, to keep him engaged with the game and to guide him in his decision-making process. At certain times, the VA, and the VA alone, can propose to the participant that they Cooperate together. When the human player is proposed to Cooperate, he can respond to the VA whether he wants to Cooperate or not; what he actually does is up to him. This is the only way for the participant to interact with the VA, as he is not able to propose cooperation to the VA himself. This conforms to the idea of keeping the information flow going in one direction only. The VA, on the other hand, can interact with the human player via text messages, a facial expression or, as stated, by proposing to Cooperate. Appendix A illustrates the playing environment of the game. Here, the participant is able to see the scores gained last round, a facial expression of the VA, text messages from the VA and the three buttons to play the game. When playing against the silent AI, the participant sees no face or text pop-ups.

2.2 Game tactics
It is impossible to define a true tactic that will always yield the best result. Most strategies in the IPD incorporate a decision algorithm that bases the decision in round n + 1 on what happened in round n. Some more advanced strategies model the behavior of the opponent as a Markov process and/or make use of Bayesian inference, techniques that are regularly used for poker agents as well. Following Cohen's conceptualization of iterated decisions [4], we can expect some players to settle on one decision somewhere in the game and stick to it. Numerous studies have been conducted to capture a valid pattern between a player's moves based on the opponent's last move. A study by Rapoport and Dale [10] presents two models with which the authors rather successfully resembled the progress of a game: one assuming that at some point a player will stick to a certain decision, and one assuming that "at each play the probability of choosing C is a linear function of that probability on the preceding play, whereby the parameters of the transformation depend on the preceding outcome."
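As an illustration of the second, linear model, consider the following sketch. The update rule follows the quoted description, but the coefficient values are invented for demonstration only and are not the parameters fitted by Rapoport and Dale [10]:

# Linear model of cooperation probability (after Rapoport and Dale [10]):
# p(n+1) = alpha * p(n) + beta, with (alpha, beta) depending on the
# outcome of round n. The coefficients below are purely illustrative.
COEFFS = {
    ('C', 'C'): (0.9, 0.10),  # mutual cooperation reinforces cooperating
    ('C', 'D'): (0.5, 0.00),  # being betrayed sharply lowers p
    ('D', 'C'): (0.8, 0.10),
    ('D', 'D'): (0.7, 0.05),
}

def next_cooperation_probability(p, my_move, opp_move):
    alpha, beta = COEFFS[(my_move, opp_move)]
    return min(1.0, alpha * p + beta)

p = 0.8  # initial inclination to Cooperate
for outcome in [('C', 'C'), ('C', 'D'), ('D', 'D')]:
    p = next_cooperation_probability(p, *outcome)
    print(round(p, 3))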
Au and Komorita [1] conclude in their study that "The results of this study clearly show that (a) a cooperative strategy - one that initiates unilateral cooperation at the outset and then adopts a TFT strategy - is very effective in inducing subsequent cooperation from the other party, (b) the effectiveness of a cooperative strategy varies directly with the cooperative orientation of the other party (a cooperative strategy is more effective against a cooperative than a competitive person), and (c) initial cooperation is more effective if it is repeated more than once."

Intriguingly, Axelrod and Hamilton [2] showed that perhaps the best strategy of all is actually one of the simplest in design. In a computer tournament for the Prisoner's Dilemma, they attained the highest average score with the simplest of all strategies submitted: TIT FOR TAT (TFT). The TFT strategy never Defects first, retaliates when the opponent decides to Defect, but forgives its opponent when he decides to Cooperate again. The TFT strategy proved to be a very robust one. Moreover, Axelrod and Hamilton showed that once TFT is established, it can resist invasion by any possible mutant strategy, provided that the individuals who interact have a sufficiently large probability ω of meeting again. That is, if:

ω ≥ (B − A) / (B − D) and ω ≥ (B − A) / (A − C)

With our payoffs this gives (5 − 3) / (5 − 1) = 1/2 ≤ 0.93 and (5 − 3) / (3 − 0) = 2/3 ≤ 0.93, so both conditions are satisfied. They demonstrate that TFT is evolutionarily stable if ω is constant and sufficiently large.

One of the reasons why TFT works so well is that it only remembers the previous situation. The downside is that if the opponent only chooses to Defect, TFT will copy this pattern without trying to persuade the opponent to Cooperate. If the opponent wants to Cooperate in the round after Defecting, TFT will Defect that round, causing the opponent to consider Defecting again the following move. This results in a spiral where one of the players has to 'man up' and continue to Cooperate. This is a deficiency that can be overcome with interaction between players, creating possibilities for our AI.

Doebeli and Hauert [5] claim that "TFT performs poorly in a noisy world, in which players are prone to make erroneous moves that can cause long series of low paying retaliatory behavior." They suggest the introduction of probabilistic strategies, for instance as used by Generous TFT, where the probability to retaliate is lowered to 2/3.

Another way to conquer the weaknesses of TFT is an equally simple strategy called 'win-stay, lose-shift' (WSLS) [7]. WSLS repeats the previous move if the resulting payoff has met its aspiration level, and changes otherwise. The main argument for WSLS is that it is able to correct mistakes from the opponent: two TFT strategies put against each other can cause each other to alternate between Cooperating and Defecting, whilst WSLS corrects this. If the payoff of a round is A or B (in our case 3 or 5 points respectively), WSLS repeats the same action; if the payoff is C or D (0 or 1 point respectively), it changes its move. A second argument for WSLS is that it can dominate an all-Cooperating strategy by Defecting, while TFT simply copies the opponent's move. A final reason to choose WSLS over TFT is that with TFT, the game can get stuck in a situation where everybody chooses to Defect; WSLS gives the opponent a reason to Cooperate.
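To make the differences between these strategies concrete, the sketch below implements TFT, Generous TFT and WSLS as move functions over the game history. This is our own illustration using the payoffs of Table 2, not the code of the agents used in the experiment:

import random

C, D = 'C', 'D'

def tit_for_tat(my_moves, opp_moves):
    # Cooperate first, then copy the opponent's last move.
    return C if not opp_moves else opp_moves[-1]

def generous_tft(my_moves, opp_moves, retaliation_p=2/3):
    # Like TFT, but retaliate after a Defect only with probability 2/3,
    # as suggested by Doebeli and Hauert [5] for noisy environments.
    if not opp_moves or opp_moves[-1] == C:
        return C
    return D if random.random() < retaliation_p else C

def win_stay_lose_shift(my_moves, opp_moves):
    # Repeat the previous move after a payoff of A or B (3 or 5 points),
    # switch after C or D (0 or 1 point) [7].
    if not my_moves:
        return C
    payoff = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}[(my_moves[-1], opp_moves[-1])]
    if payoff >= 3:                        # aspiration level met: stay
        return my_moves[-1]
    return C if my_moves[-1] == D else D   # otherwise: shift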
Imhof et al. [7] showed that in an IPD with noise, where the benefit-to-cost ratio exceeds 3, the WSLS strategy (also called Pavlov) is superior to TFT; if not, it is best to always Defect. The issue with WSLS, however, is that it is only superior to TFT in a noisy environment because it 'forgives' the opponent when the game is going the wrong way. It also dominates all-Cooperating strategies, but when playing against human opponents or any capable AI, none of them will continue Cooperating when the opponent has Defected for the last few rounds. For both players to Cooperate, trust is necessary. This creates the problem of having to 'trust' the opponent with nothing to substantiate this trust. WSLS blindly trusts the opponent when the game is going the wrong way; as a result, a strategy that always Defects exploits WSLS in every second round, whereas this is not the case with TFT. Integrating player interaction with TFT, so that it can, up to a certain extent, influence and predict the opponent's next move, might surpass them both. In order to substantially evaluate the effect of the interaction, the participant will play one round against a silent TFT strategy, and one round against a modified TFT strategy which incorporates player interaction.

2.3 Categorizing the human player
In their experiment, Kuhlman and Marshello [8] divided their participants into three groups with the following motivational orientations: Cooperative, Competitive and Individualistic. Their study showed that Competitive players Defected against TFT, 100% Cooperation and 100% Defection strategies, while the Cooperatively and Individualistically oriented players cooperated with a TFT strategy. We will be using a strategy that is rather similar to TFT, thus creating an overlap in playing styles between cooperatively and individualistically oriented players. Because of this, and because the participants are urged not to play competitively, the participant will be categorized into one of the following styles:

1) Cooperating player
2) Self-Interested player

Cooperating players will try to Cooperate as much as possible, because they see a mutual gain. Such a player does not only look at what is best for him individually, but also sees value in helping his opponent; he will be more easily persuaded into Cooperating. The Self-Interested player, on the other hand, has a tendency to Defect, or to act only in his own interest. He rationalizes from his own perspective and reacts only to what is in his best interest.

In order to categorize the participant to match his playing style, the participant fills in a survey before starting the game. Apart from the general questions, the participant answers a question that determines what type of player he is. This way, we do not have to base our assumptions about the type of player we are facing on an analysis of previously played moves, but can instantly and reliably interact accordingly. The participant is asked the following question:

If I were to give you 50 euros and ask you to divide it between you and your best friend, would you do it? He will not find out if you choose not to split it.

This simple question in fact determines two aspects of the participant. The first is to what extent he cares about the fate of his friend. The second is to find out if he can be motivated into splitting the money. Self-Interested players will see this as an opportunity to gain 50 euros, especially if the friend doesn't find out. Cooperating players will have no problem splitting the money, as they got it for free and were asked to; they care about the fate of their friend. This question proved to match the participants' own insights in 72% of the cases (see paragraph 3.1).
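In terms of implementation, this categorization reduces to a single boolean mapping; a minimal sketch (the function name is our own illustration):

def categorize_player(would_split_money: bool) -> str:
    # A player who splits the free 50 euros with his friend is treated
    # as a Cooperating player; one who keeps it as Self-Interested.
    return 'Cooperating' if would_split_money else 'Self-Interested'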
2.4 Influencing the game
No matter what move you choose, it is most beneficial for you if the opponent chooses to Cooperate with you. A major factor of the game is thus to convince the opponent to Cooperate. This includes both positive stimulation when the opponent chooses to Cooperate and a negative response when the opponent chooses to Defect. In order to influence the opponent's decisions, the Virtual Agent must be able to interact with its opponent. The VA has the following ways to interact:
- Textual messaging
- Facial expressions
- Proposing to Cooperate
- Choosing which move to play (indirect)

Cooperation can be encouraged either by appealing to the opponent's cognitive reasoning or by playing into his emotional state. A Virtual Agent can use reasoning to make the opponent behave as required, or can use kindness or guilt against him. Textual attempts to play into the opponent's emotions can be accompanied by a visual representation of a human emotion (e.g. avatars or emoticons) in order to be more affective and make the player feel like he is playing against something more than just a computer. Geiwitz [6] concluded that cooperation between players increased when the players were able to send both threats and messages of intention to each other in an IPD, instead of solely messages of intention. This indicates that threats do seem to have a function in an IPD.

Based upon the possible moves played in a round, there are four outcomes of the last round to respond to. When the opponent and Alice both Cooperate, the visual reaction must be joy. When both players choose to Defect, shame or a sign of wastefulness is appropriate, because this way hardly anyone gains any points. When the opponent chooses to Defect while Alice Cooperates, anger is deemed the correct emotional response. These are all fairly straightforward emotional responses, but when Alice Defects and the opponent Cooperates, we can choose between two responses. One would be to show joy or laughter; the only effect this would achieve, however, is for the opponent to get angry and start Defecting out of retaliation, a situation not preferable for either party. The chosen emotional response is therefore that of a classic 'oops'. This way, the opponent will presumably not hold too much of a grudge against Alice.

The facial responses are deemed suitable for both a Cooperating and a Self-Interested player. The textual messages, however, differ per player type. Table 3 shows Alice's textual responses to a certain move played last round, depending on the type of player she is up against. Table 4 shows captions of Alice's facial responses to the moves previously played. In the actual game, the participants are shown a series of short movies to illustrate Alice's reaction, rather than just images.

Table 3 Textual responses
P: Coop. VA: Coop.
  Against Coop. player: "We are an awesome team!"
  Against SI player: "Let's keep Cooperating for more points!"
P: Coop. VA: Defect
  Against Coop. player: "Oops, I'm sorry. I wasn't sure you would change your mind. Let's both Cooperate next round!"
  Against SI player: "Your actions in the previous rounds didn't convince me you would Cooperate. Let's start with a clean slate and both Cooperate next round!"
P: Defect VA: Coop.
  Against Coop. player: "Why did you do that? I thought we were a team! Let's work together!"
  Against SI player: "You betrayed me! You better start Cooperating or I will start working against you!"
P: Defect VA: Defect
  Against Coop. player: "I don't want to do this, but you force me to. Please show me a sign you want to start working together again!"
  Against SI player: "If this is how you want to play it, I'll Defect until you start to Cooperate again!"

Table 4 Facial responses
P: Coop. VA: Coop.    joy
P: Coop. VA: Defect   oops
P: Defect VA: Coop.   anger
P: Defect VA: Defect  shame/wastefulness

When playing against a Cooperating player, the VA uses kind words while trying to make the opponent feel guilty for Defecting. A Self-Interested player might respond better to threats or to theoretical reasoning that is in line with his own goal. Note: the text in the situation P: Defect, VA: Defect only appears when the player betrayed the VA after agreeing to Cooperate. More on this in paragraph 2.5.
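Tables 3 and 4 amount to a lookup keyed on last round's moves and the player type. A minimal sketch of such a lookup is given below; the texts are abbreviated here, and the structure is our own illustration rather than the actual game code:

# (player_move, va_move) -> facial expression, per Table 4.
FACE = {('C', 'C'): 'joy', ('C', 'D'): 'oops',
        ('D', 'C'): 'anger', ('D', 'D'): 'shame/wastefulness'}

# (player_move, va_move, player_type) -> textual response, per Table 3
# (texts abbreviated).
TEXT = {
    ('C', 'C', 'Cooperating'): "We are an awesome team!",
    ('C', 'C', 'Self-Interested'): "Let's keep Cooperating for more points!",
    ('C', 'D', 'Cooperating'): "Oops, I'm sorry. Let's both Cooperate next round!",
    ('C', 'D', 'Self-Interested'): "Let's start with a clean slate and both Cooperate!",
    ('D', 'C', 'Cooperating'): "Why did you do that? I thought we were a team!",
    ('D', 'C', 'Self-Interested'): "You betrayed me! You better start Cooperating!",
    ('D', 'D', 'Cooperating'): "I don't want to do this, but you force me to.",
    ('D', 'D', 'Self-Interested'): "I'll Defect until you start to Cooperate again!",
}

def respond(player_move, va_move, player_type):
    # Returns Alice's facial and textual reaction to the last round.
    return FACE[(player_move, va_move)], TEXT[(player_move, va_move, player_type)]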
2.5 Designing the Virtual Human
The VA is designed by integrating aspects of the Tit-For-Tat strategy with the interactions proposed in paragraph 2.4. The goal is to overcome the weaknesses of Tit-For-Tat by achieving a solid basis for predicting a player's response to a certain situation and the action that follows from it. Because to date no studies can be found that give useful insight into setting up such a basis, the VA is designed to test the effect the proposed interaction has on the opponent. This information is vital for understanding to what extent it is possible to successfully use interaction with the opponent to your advantage.

If we take a look at the Tit-For-Tat strategy, taking into account that TFT starts off by Cooperating, the possible game states are shown in Figure 1. In the illustration, S represents the start of the game. DC describes a state where Alice's opponent Defects whilst Alice chooses to Cooperate. The numbers on the green circles mark the four unique game states; the yellow circles redirect to the green states.

Figure 1 Game states pure TFT

Before we have grounds to successfully deceive and betray the opponent, we want to prevent the following two scenarios from happening:
- A game where players alternately Cooperate and Defect
- A game that is stuck where both players continue to Defect

We attempt to solve the first situation by mere textual and facial interaction with the opponent. By reacting positively to Cooperation from the opponent, and negatively to a Defection, we aim to influence the opponent into Cooperating more often. The textual response, as stated, depends on the type of opponent the VA is up against. To prevent a game from reaching a state where both players resort to doing nothing but Defecting, Alice can propose a pact to her opponent: when the game reaches a state where both players choose to Defect, Alice can propose to the opponent that they make a pact to Cooperate together. If the player agrees to Cooperate, Alice can (and in our case will) choose to Cooperate as well. A representation of the possible states is shown in Figure 2; a clearer illustration can be consulted in Appendix B.

Figure 2 Game states Alice

With the option to propose a pact, a possible problem arises when the opponent chooses to betray Alice's trust by agreeing to Cooperate whilst in fact choosing to Defect. For this, a trust system with so-called Mojo points is introduced. A player starts off with 1 point. Whenever he chooses to Defect, he loses 1 point. The next round, conforming to TFT, Alice chooses to Defect. If the player then chooses to Cooperate for the following two rounds, he is rewarded with a Mojo point. If he decides to Cooperate once and Defect after that, he ends up back in state 2 with no Mojo points. If he decides to Defect twice in a row, he reaches state 4. If Alice trusts the player enough (the player has zero or more Mojo points), he is offered a proposition to Cooperate. If he agrees, and Cooperates afterwards, Alice rewards him with a Mojo point and the player ends up back in state 1 with one Mojo point, which he keeps until he chooses to Defect again.

This system revolves around the fact that the player will normally have 0 or 1 Mojo points: in a Cooperation cycle he has 1 point, and if he exits that cycle by Defecting, he loses his Mojo point. If the game reaches state 4, the player is normally offered a proposition to Cooperate. If he agrees to Cooperate but decides to Defect anyway, he loses 1 Mojo point. As the saying goes: fool me once, shame on you; fool me twice, shame on me. Whenever the opponent chooses to betray Alice, he ends up in a cycle where it is impossible to reach zero or more Mojo points when arriving in state 4. If the opponent ends up in state 4 when Alice's trust is lost, Alice moves over to the general TFT strategy, while trying to both visually and textually persuade her opponent into Cooperating, as outlined in Table 3 and Table 4.
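Since the exact bookkeeping of the original implementation is not published, the sketch below gives one plausible reading of the Mojo rules described above; in particular, the cap at 1 point and the rule that only a Defect that exits a Cooperation cycle (or betrays a pact) costs a point are our interpretation:

class MojoTracker:
    # One plausible reading of the Mojo trust system; details are
    # reconstructed from the prose description, not from published code.
    def __init__(self):
        self.mojo = 1            # a player starts off with 1 Mojo point
        self.coop_streak = 0
        self.last_move = 'C'     # the game effectively starts cooperatively

    def record(self, player_move, betrayed_pact=False):
        if player_move == 'D':
            # Exiting a Cooperation cycle, or betraying a pact, costs a point.
            if self.last_move == 'C' or betrayed_pact:
                self.mojo -= 1
            self.coop_streak = 0
        else:
            self.coop_streak += 1
            if self.coop_streak == 2:             # two Cooperations in a row
                self.mojo = min(1, self.mojo + 1)  # earn a point back, capped at 1
                self.coop_streak = 0
        self.last_move = player_move

    def may_propose(self):
        # Alice only proposes a pact while the player has >= 0 Mojo points.
        return self.mojo >= 0

Under this reading, a player who betrays a pact drops to −1 Mojo; two Cooperations bring him back to 0 at most, and the Defect that re-enters state 4 drops him below 0 again, reproducing the "fool me twice" lockout described above.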
3. FINDINGS AND CONCLUSION
3.1 Experimental findings
A population of 29 Dutch participants, ranging from 11 to 51 years old with an average age of 23 years, participated in this experiment. Before playing against Alice and the silent AI, the participants had to answer the following question:

If I were to give you 50 euros and ask you to divide it between you and your best friend, would you do it? He will not find out if you choose not to split it.

Of the 29 participants, only four answered the question by saying they would keep the money for themselves. At the end of the experiment, the participants were asked whether they saw themselves as Cooperating or Self-Interested players. All four participants who chose to keep the money (and were thus marked as Self-Interested players by the game) identified themselves as Self-Interested players. Of the remaining 25 players, however, eight saw themselves as Self-Interested players. This comes down to a correct interpretation of (1 − 8/29) × 100% ≈ 72%.

At the end of the game, the participants were asked the following additional questions:
1. On a scale from 1 to 5 (5 meaning 'a lot'), did Alice influence you in making a decision?
2. On a scale from 1 to 5 (5 meaning 'a lot'), did you feel remorse when you betrayed Alice?
3. Did you think Alice was cleverer than the other PC player?

The results of these questions are listed in Table 5.

Table 5 Survey results
Response:       1    2    3    4    5
Question 1)     8    4    5    9    3
Question 2)    16    5    4    2    2
Response:     yes   the same   no
Question 3)     4    9         16

Oddly enough, from these results we can conclude that most of the population thought of the silent player as the cleverer one, while most of the population admitted to feeling almost no remorse towards the VA at all. Some participants responded that the VA succeeded in influencing their decision, but this could go both ways.
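For completeness, the ≈72% figure quoted above follows directly from the reported counts:

participants = 29
kept_money = 4       # marked Self-Interested; all self-identified as such
misidentified = 8    # of the remaining 25, saw themselves as Self-Interested
accuracy = (1 - misidentified / participants) * 100
print(round(accuracy))  # 72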
In playing the game, twelve participants encountered the silent AI first, while the remaining sixteen played against Alice first. Table 6 lists data from the experiment. To recapitulate: a participant plays 40 turns in total, either against the silent (TFT) AI first or against Alice first, and both rounds take 20 turns each.

Table 6 Data game order
                                             Silent first   Alice first
Number of participants                       12             16
Avg. Cooperated moves per game               24.33          24.13
Avg. total cooperation moves vs. silent AI   11.75          12.19
Avg. total cooperation moves vs. Alice       12.58          11.94
Total avg. points gained by player           101            100.25
Total avg. points gained by AIs              90.17          91.19

Note: this table covers 28 of the 29 participants, as the moves of one of the games were not transferred to the server correctly.

From these data we can assume that the order in which the participants play the games does not significantly affect the total points gained. It does, however, affect the cooperation rate: on average, players Cooperate more often in the second round they play, showing an average increase of roughly 4.2% when playing for the second time. To compare the statistics, we therefore multiply the number of Cooperative moves in the second game by 0.958. The results are shown in Table 7.

Table 7 Data game order - adapted
                                             Silent first   Alice first
Avg. total cooperation moves vs. silent AI   11.75          11.68
Avg. total cooperation moves vs. Alice       12.05          11.94

On average, this shows that people were 2.4% more likely to Cooperate with Alice than with a pure Tit-For-Tat strategy. Although this shows an increase in favor of the developed VA, the difference is not significant enough to confidently conclude that we were able to influence the participants' behavior. If we look into the data of the moves played, however, we see that of the 55 times Alice proposed to Cooperate, participants agreed with the proposal 47 times. Of the eight times the proposal was rejected, three (individual) participants actually chose to Cooperate anyhow. Of the 47 proposals that were accepted, the participant actually Cooperated 29 times; in only about 37% of the cases did the participant keep true to his word.

One of the goals of the tweaked TFT strategy was to prevent the game from reaching a state where both players Defect continuously. With the plain TFT strategy, there were 189 instances where two consecutive moves of the participant were Defects, and 140 instances where a participant Defected three times in a row. Against Alice, however, only 154 instances occurred where a participant Defected twice in a row, and only 83 where a participant Defected three times in a row. This comes down to a decrease of roughly 19% and 41% respectively.
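The adjustment behind Table 7 and the streak percentages can be reproduced from the reported counts as follows:

# Second-game cooperation counts are deflated by the observed ~4.2%
# second-game increase before comparing against first-game counts.
adjust = 0.958
print(round(12.58 * adjust, 2), round(12.19 * adjust, 2))  # 12.05 11.68

# Decrease in consecutive-Defect streaks against Alice vs. plain TFT.
double_tft, double_alice = 189, 154
triple_tft, triple_alice = 140, 83
print(round((1 - double_alice / double_tft) * 100))  # ~19
print(round((1 - triple_alice / triple_tft) * 100))  # ~41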
3.2 Conclusion
From the data listed in paragraph 3.1, we must conclude that we were not able to significantly alter the opponents' playing style. The participants proved not to be trustworthy enough for the (Mojo) model used. The data in Table 5 also show that we did not succeed in creating a strong enough bond between the player and the VA to make the player care about whether he betrayed Alice. Moreover, judging from the number of participants who did not care about betraying Alice and who thought that the silent AI was in fact the smartest, it is quite possible that the participants mistook Alice's kindness for weakness.

Because the number of Cooperated moves did not significantly increase with the designed VA, and the success rate after an agreement to Cooperate was only around 37%, we have not found any evidence indicating that the opponent's move can be predicted based on interaction between the VA and the human opponent. Thus, the findings cannot be used to integrate predictions of what the opponent will do next into the decision-making algorithm. Based on the statistics of consecutive Defected moves, however, we can conclude that the model used contributes to preventing continuing series of Defecting moves. Although the population is not large enough to substantiate the claim that we succeeded in remedying one of the flaws of the Tit-For-Tat strategy, the results seem to indicate so.

4. REFERENCES
[1] Au, W.T. and Komorita, S.S. Effects of initial choices in the Prisoner's Dilemma. Journal of Behavioral Decision Making, 15 (4), 343-359.
[2] Axelrod, R. and Hamilton, W.D. The evolution of cooperation. Science, 211 (4489), 1390-1396.
[3] Billings, D., Davidson, A., Schaeffer, J. and Szafron, D. The challenge of poker. Artificial Intelligence, 134 (1-2), 201-240.
[4] Cohen, B.P. Conflict and Conformity: A Probability Model and Its Application. M.I.T. Press, Cambridge, Mass., 1963.
[5] Doebeli, M. and Hauert, C. Models of cooperation based on the Prisoner's Dilemma and the Snowdrift game. Ecology Letters, 8 (7), 748-766.
[6] Geiwitz, P.J. The effects of threats on prisoner's dilemma. Behavioral Science, 12 (3), 232-233.
[7] Imhof, L.A., Fudenberg, D. and Nowak, M.A. Tit-for-tat or win-stay, lose-shift? Journal of Theoretical Biology, 247 (3), 574-580.
[8] Kuhlman, D.M. and Marshello, A.F. Individual differences in game motivation as moderators of preprogrammed strategy effects in prisoner's dilemma. Journal of Personality and Social Psychology, 32 (5), 922-931.
[9] Pelc, A. and Pelc, K.J. Same game, new tricks: What makes a good strategy in the Prisoner's Dilemma? Journal of Conflict Resolution, 53 (5), 774-793.
[10] Rapoport, A. and Dale, P. Models for Prisoner's Dilemma. Journal of Mathematical Psychology, 3 (2), 269-286.
[11] Trivers, R. The evolution of reciprocal altruism. The Quarterly Review of Biology, 46 (1), 35-57.

APPENDICES
4.1 Appendix A
4.2 Appendix B