Iterated Prisoner’s Dilemma
Candidate Name: (redacted)
Candidate Number: (redacted)
GEMS Wellington International School
Table of Contents
Introduction and Background
    Key terms
Strategies
    Dominant strategy equilibrium
    Nash Equilibrium
The Experiment
Probabilities
Probability Tree Diagram
Formula for binomial distribution
Analysis of the two methods utilized
Utility
Conclusion
Bibliography
References
Introduction and Background
In my search for a math IA idea, I became fascinated with game theory, particularly
the prisoner’s dilemma. My interest came about after watching a video about the
British game show ‘Golden Balls’. The idea is that each of the two final contestants
must decide whether to split the prize money with the other or to take it all for
themselves. The two main issues are that neither player is allowed to tell the other
contestant about their decision, and that if both contestants decide to take the money
for themselves, both end up receiving nothing. The element of trust and rational human
thinking was an interesting concept I took away from the video, and I decided to
investigate it further using the prisoner’s dilemma. I was interested in testing the
iterated prisoner’s dilemma for myself, so I decided to test it, with a few slight
changes, on my little brother and sister, because they are closer to each other than
they are to me. Instead of jail sentences, I opened a pack of gummy bears and gave each
of them two choices: share or take. I aimed to use the prisoner’s dilemma because I
wanted to test the trust between my two younger siblings and to see whether they would
put their own benefit first rather than cooperate.
The prisoner’s dilemma is a concept that was devised by two American scientists,
Merrill Flood and Melvin Dresher. A Princeton mathematician, Albert W. Tucker, later
formalized it. Basically, the prisoner’s dilemma is a situation in which two or more
individuals must each decide whether to improve their own situation while leaving the
others worse off, or to find a mutual means of cooperation. A basic example is two
convicts suspected of being part of a robbery who are facing prison sentences. The
police are certain that they can extract additional information from both criminals to
uncover the location of the stolen items and other suspects involved in the crime.
The police take the convicts to different rooms and give both men two choices: either
confess to the crime or deny it. The convicts are not allowed to talk to each other and
therefore do not know each other’s choice.
                             Convict 2
                             Confess (C)    Deny (D)
Convict 1    Confess (C)     (-2, -2)       (0, -9)
             Deny (D)        (-9, 0)        (-10, -10)
Figure 1: The matrix showing the payoff for each convict, written as
(convict 1, convict 2). The numbers denote the number of years each convict may
face in prison
Figure 1 shows the possible options both convicts are given. The numbers denote the
number of years in prison each convict faces depending on both of their actions. The
number of years is given as a negative because it is more beneficial for the convicts
to minimize the time each of them spends in jail or, in other words, to receive the
highest payoff. The payoff is therefore larger in the (C,C) box, as -2 > -10, which
makes it the most preferable choice for both convicts. As shown, it is tempting for
each criminal to confess to the police. However, there is the possibility that both
will deny, leading to the maximum prison time. On the other hand, both can confess and
receive a less substantial time in prison. The biggest issue here is that the choice
each convict makes affects the other.
I discovered through further research that even simple real-life situations, such as
the game of rock, paper, scissors and penalty kicks taken in football, contain elements
similar to the prisoner’s dilemma. That said, I decided to investigate the mathematics
behind the iterated prisoner’s dilemma. The iterated prisoner’s dilemma is essentially
the prisoner’s dilemma repeated several times between the same players. For example, in
Figure 1 we saw the number of years both convicts faced depending on their actions.
Suppose convict 1 chose to confess and convict 2 decided to deny. Convict 2 would
serve nine years in prison while convict 1 walked free. If both convicts were caught in
a robbery again and faced the same options, convict 2 might choose to confess instead
of denying, based on convict 1’s previous actions.
Key terms
For the iterated prisoner’s dilemma, each individual will have a strategy profile and
a utility gained from using strategies within that strategy profile:
s = a strategy within the strategy profile (S)
s’ = a strategy within a second strategy profile (S’)
Strategy profile (S) = the set of possible strategies (s) that each individual player
can use in the game (example: S convict 2 = (s1, s2, …, sn))
Utility [U(S)] = the (expected) utility that each individual can gain from the
strategy profile (S)
∀ = this notation means “for all”
∃ = this notation means “there exists”
Certain strategies (s) are classified as strongly dominant or weakly dominant:
Strongly dominant: A strategy where U(s) > U(s’) for all cases (Nau, 2010)
∀ (s1, s2, s3…, sn), U (s1, s2, s3…, sn) > U (s1’, s2’, s3’…, sn’)
Weakly dominant: A strategy is weakly dominant over others if U(s) ≥ U(s’) in all
cases and there is at least one case where U(s) > U(s’) (Nau, 2010)
∀ (s1, s2, s3…, sn), U (s1, s2, s3…, sn) ≥ U (s1’, s2’, s3’…, sn’)
and
∃ (s1, s2, s3…, sn), U (s1, s2, s3…, sn) > U (s1’, s2’, s3’…, sn’)
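The two definitions can be checked mechanically against the payoffs in Figure 1. The following is a minimal Python sketch and was not part of the original investigation; the names `PAYOFF_1`, `strongly_dominates` and `weakly_dominates` are illustrative assumptions, and only convict 1's payoffs are used.

```python
# Convict 1's payoffs from Figure 1, indexed as
# PAYOFF_1[(convict 1's action, convict 2's action)].
PAYOFF_1 = {
    ("C", "C"): -2, ("C", "D"): 0,
    ("D", "C"): -9, ("D", "D"): -10,
}
ACTIONS = ("C", "D")


def strongly_dominates(s, s_alt, payoff=PAYOFF_1):
    """True if s gives a strictly higher payoff than s_alt
    against every action of the other player."""
    return all(payoff[(s, o)] > payoff[(s_alt, o)] for o in ACTIONS)


def weakly_dominates(s, s_alt, payoff=PAYOFF_1):
    """True if s is never worse than s_alt and is strictly better
    against at least one action of the other player."""
    never_worse = all(payoff[(s, o)] >= payoff[(s_alt, o)] for o in ACTIONS)
    sometimes_better = any(payoff[(s, o)] > payoff[(s_alt, o)] for o in ACTIONS)
    return never_worse and sometimes_better


print(strongly_dominates("C", "D"))  # Confess compared with Deny
print(weakly_dominates("D", "C"))    # Deny compared with Confess
```

With the Figure 1 numbers, Confess is strictly better than Deny against both of convict 2's actions (-2 > -9 and 0 > -10), so the first check holds and the second does not.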
Strategies
Dominant strategy equilibrium
This is when every strategy (s) in the profile (S) is dominant, so that U(S) > U(S’)
for any other profile (S’).
In Figure 1 with the convicts, for example, the dominant strategy profile is
(Confess, Confess): whatever the other convict does, confessing gives a shorter
sentence than denying (-2 > -9 and 0 > -10), so it is the most appealing and
worthwhile strategy.
Nash Equilibrium
A dominant strategy equilibrium is also a Nash equilibrium. A Nash equilibrium is a
strategy profile in which no individual can achieve better utility by switching
strategies unilaterally (Nau, 2010).
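The "no unilateral switch" condition can be tested for every box of Figure 1. The sketch below is an illustration only, under the assumption that each pair in Figure 1 is written as (convict 1, convict 2); the names `FIGURE_1` and `pure_nash_equilibria` are hypothetical helpers, not something taken from Nau (2010).

```python
from itertools import product

# Figure 1 payoffs as (convict 1, convict 2), indexed by
# (convict 1's action, convict 2's action).
FIGURE_1 = {
    ("C", "C"): (-2, -2), ("C", "D"): (0, -9),
    ("D", "C"): (-9, 0), ("D", "D"): (-10, -10),
}


def pure_nash_equilibria(game, actions=("C", "D")):
    """Return every profile from which neither player can gain
    by switching their own action while the other stays put."""
    equilibria = []
    for a1, a2 in product(actions, repeat=2):
        u1, u2 = game[(a1, a2)]
        # Player 1 compares against the other rows in the same column.
        best_for_1 = all(u1 >= game[(alt, a2)][0] for alt in actions)
        # Player 2 compares against the other columns in the same row.
        best_for_2 = all(u2 >= game[(a1, alt)][1] for alt in actions)
        if best_for_1 and best_for_2:
            equilibria.append((a1, a2))
    return equilibria


print(pure_nash_equilibria(FIGURE_1))  # [('C', 'C')]
```

For the Figure 1 payoffs the only profile that survives the check is (Confess, Confess), which agrees with the dominant strategy profile described above.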
The Experiment
                               Sister (S)
                               Share (S)    Take (T)
Brother (B)    Share (S)       5, 5         3, 7
               Take (T)        7, 3         2, 2
Figure 2: The matrix showing the payoff for each choice, written as
(brother, sister)
Each pair of numbers gives the number of gummy bears available to my brother and
sister depending on the choices they made. For example, should both choose to share,
each of them would receive 5 gummy bears.
The payoff for each of them was as follows:
- If one player chooses to share and the other chooses to take, the player sharing
will get only 3 gummy bears compared to the taker, who will get 7
- If both players choose to share, both will get an equal 5 gummy bears
- If both players choose to take, both will get an equal 2 gummy bears
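The profile with the best joint payoff in this case was (S,S). Strictly speaking it is
not a Nash equilibrium, because from (S,S) either player can gain by switching
unilaterally to (Take) (7 > 5), and neither choice is dominant, since (Share) is the
better reply to (Take) (3 > 2). I was also curious to observe which strategies my
siblings would pick.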
Pure Strategy
This is a strategy in which the player chooses the same single action in every round
Mixed Strategy
This is a strategy in which the player mixes pure strategies, choosing each with some
probability
To test the rationality of each of my siblings, I attempted to predict the probability
of the different choices they would make, based on the probabilities of the only two
available choices. Through this, I wanted to predict the payoff and overall expected
utility that each would gain. The game lasted for 5 rounds in total.
Probabilities
In round 1, my brother chose (Share) and my sister chose the (Take) option.
Notations:
Sb = strategy profile of my little brother
Sx = strategy profile of my little sister
b = this variable represents my little brother
x = this variable represents my little sister
p = probability of my little sister choosing (Share)
Sb (Round 1) = (S, T)
→ Payoff b = 3
→ Payoff x = 7
Consider the probability of my sister making either one of the choices:
Sx (Share) = p
Sx (Take) = 1 – p
Expected Utility (Share) = Payoff Total × Probability
→ Expected U(Share) = ∑(Payoff) P(SS’)
US = (7 + 5) × (p)
US = 12p
Expected Utility (Take) = Payoff Total × Probability
UT = (2 + 3) × (1 – p)
UT = 5 – 5p
Since the expected utility should be the same for both cases:
12p = 5 – 5p
17p = 5
$p = \frac{5}{17}$
So the two choices in my sister’s strategy profile had probabilities of:
$S(\text{Share}) = \frac{5}{17}$
$S(\text{Take}) = \frac{12}{17}$
Which therefore meant that my little brother’s strategies in his strategy profile had
probabilities of:
$S(\text{Take}) = \frac{5}{17}$
$S(\text{Share}) = \frac{12}{17}$
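The algebra for p can be reproduced with exact fractions. This is only a sketch of the equation used in this section (12p = 5 – 5p), not a general method for mixed strategies; the variable names are assumptions introduced for illustration.

```python
from fractions import Fraction

# Payoff totals taken from the two expressions in the text:
# (7 + 5) multiplies p and (2 + 3) multiplies (1 - p).
share_total = 7 + 5
take_total = 2 + 3

# Setting share_total * p = take_total * (1 - p) and solving for p
# gives p = take_total / (share_total + take_total).
p_share = Fraction(take_total, share_total + take_total)
p_take = 1 - p_share

print(p_share, p_take)  # 5/17 12/17
```

Using `Fraction` keeps the values exact, so the two sides of the equation can be compared without rounding.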
Probability Tree Diagram
Figure 3: A probability tree of my
sister’s choices
There are overall 32 combinations of strategies my sister could have chosen from:
12
12
12
12
12
(TTTTT) = (17) (17) (17) (17) (17) =
248,832
1,419,857
12
5
5
5
5
7,500
12
5
5
5
12
18,000
12
5
5
12
5
18,000
12
5
5
12
12
43,200
12
5
12
5
5
18,000
12
5
12
5
12
43,200
12
5
12
12
12
103,680
12
5
12
12
5
43,200
12
12
5
5
5
18,000
12
12
5
5
12
43,200
12
12
5
12
5
43,200
12
12
5
12
12
103,680
12
12
12
5
5
43,200
17
17
17
17
17
1,419,857
12
12
12
5
12
103,680
12
12
12
12
5
103,680
5
12
12
12
12
× 100 = 17.5%
(TSSSS) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 0.53%
(TSSST) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 1.27%
(TSSTS) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 1.27%
(TSSTT) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 3.04%
(TSTSS) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 1.27%
(TSTST) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 3.04%
(TSTTT) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 7.30%
(TSTTS) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 3.04%
(TTSSS) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 1.27%
(TTSST) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 3.04%
(TTSTS) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 3.04%
(TTSTT) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 7.30%
(TTTSS) = ( ) ( ) ( ) ( ) ( ) =
× 100 = 3.04%
(TTTST) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 7.30%
(TTTTS) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 7.30%
(STTTT) = (17) (17) (17) (17) (17) =
5
5
5
5
5
103,680
1,419,857
× 100 = 7.30%
3,125
(SSSSS) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 0.22%
8
5
5
5
5
12
7,500
5
5
5
12
5
7,500
5
5
5
12
12
18,000
5
5
12
5
5
7,500
5
5
12
5
12
7,500
5
5
12
12
12
43,200
5
5
12
12
5
18,000
5
12
5
5
5
7,500
5
12
5
5
12
18,000
5
12
5
12
5
18,000
5
12
5
12
12
43,200
5
12
12
5
5
18,000
5
12
12
5
12
43,200
5
12
12
12
5
43,200
(SSSST) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 0.53%
(SSSTS) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 0.53%
(SSSTT) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 1.27%
(SSTSS) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 0.53%
(SSTST) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 1.27%
(SSTTT) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 3.04%
(SSTTS) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 1.27%
(STSSS) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 0.53%
(STSST) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 1.27%
(STSTS) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 1.27%
(STSTT) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 3.04%
(STTSS) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 1.27%
(STTST) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 3.04%
(STTTS) = (17) (17) (17) (17) (17) = 1,419,857 × 100 = 3.04%
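The 32 branch products can also be generated by a short program, which is one way to guard against slips in the hand calculation. The following Python sketch assumes the probabilities 5/17 for (Share) and 12/17 for (Take) found earlier; the names `P`, `path_probability` and `paths` are illustrative and not part of the original working.

```python
from fractions import Fraction
from itertools import product

# Per-round probabilities for my sister, as found earlier in the text.
P = {"S": Fraction(5, 17), "T": Fraction(12, 17)}


def path_probability(path):
    """Multiply the per-round probabilities along one path, e.g. 'TSSST'."""
    prob = Fraction(1)
    for action in path:
        prob *= P[action]
    return prob


# Every sequence of five choices: 2 ** 5 = 32 paths.
paths = ["".join(choice) for choice in product("TS", repeat=5)]

for path in paths:
    prob = path_probability(path)
    print(f"({path}) = {prob.numerator:,}/{prob.denominator:,}"
          f" = {float(prob) * 100:.2f}%")
```

Because the fractions are exact, the 32 probabilities add up to exactly 1, which is a quick check that no branch has been missed.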
For each group of strategy profiles containing the same number of (Take) and (Share)
actions, I added the repeated percentages. Possible outcomes:
- Exactly one (Share) action chosen by sister = (TTTTS), (TTSTT), (STTTT),
(TSTTT), (TTTST)
= 7.30% + 7.30% + 7.30% + 7.30% + 7.30%
= 36.5%
- Exactly two (Share) actions chosen by sister = (TSSTT), (TSTST), (TSTTS),
(TTSST), (TTSTS), (TTTSS), (SSTTT), (STSTT), (STTST), (STTTS)
= 3.04% + 3.04% + 3.04% + 3.04% + 3.04% + 3.04% + 3.04% + 3.04% +
3.04% + 3.04%
= 30.4%
- Exactly three (Share) actions chosen by sister = (TSSST), (TSSTS), (TSTSS),
(TTSSS), (SSSTT), (SSTST), (SSTTS), (STSST), (STSTS), (STTSS)
= 1.27% + 1.27% + 1.27% + 1.27% + 1.27% + 1.27% + 1.27% + 1.27% +
1.27% + 1.27%
= 12.7%
- Exactly four (Share) actions chosen by sister = (TSSSS), (SSSST), (SSSTS),
(SSTSS), (STSSS)
= 0.53% + 0.53% + 0.53% + 0.53% + 0.53%
= 2.65%
- Five (Share) actions chosen by sister = (SSSSS)
= 0.22%
- No (Share) actions chosen by sister = (TTTTT)
= 17.5%
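The grouping step can be sketched in code by counting the (Share) actions in each path and adding the probabilities of paths with the same count. This is an illustrative sketch rather than the method used in the investigation; `totals` and `counts` are hypothetical names, and the probabilities 5/17 and 12/17 are taken from the text.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

P_SHARE, P_TAKE = Fraction(5, 17), Fraction(12, 17)

totals = defaultdict(Fraction)  # number of (Share) actions -> summed probability
counts = defaultdict(int)       # number of (Share) actions -> number of paths

for path in product("ST", repeat=5):
    shares = path.count("S")
    totals[shares] += P_SHARE ** shares * P_TAKE ** (5 - shares)
    counts[shares] += 1

for shares in range(6):
    print(shares, counts[shares], f"{float(totals[shares]) * 100:.2f}%")
```

The totals here are rounded only once, at the end, so they can differ from the hand totals in the last digit (for example 2.64% rather than 2.65% for four (Share) actions, since the hand total adds five values that were each rounded up to 0.53%).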
This method was tedious and repetitive, which may have led to errors in
calculations. So I opted to use the binomial distribution, since the choices made
during the game were treated as independent of each other. I used the binomial
distribution to calculate the probability of a certain number of (Share) choices being
made. I performed these calculations for my sister’s strategy with the results of the
first round in mind.
Formula for binomial distribution:
$P(r) = \binom{n}{r} p^{r} q^{n-r}$
Where r is the number of (Share) choices out of 5 rounds, n is the number of rounds, p
is the probability of (Share) occurring, and q is 1 – p
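As a minimal sketch, the formula can be written as a small Python function. The function name `binomial_probability` and its default arguments (n = 5 rounds, p = 5/17) are assumptions chosen to match this investigation; `math.comb` supplies the number of combinations.

```python
from fractions import Fraction
from math import comb


def binomial_probability(r, n=5, p=Fraction(5, 17)):
    """P(r) = C(n, r) * p**r * q**(n - r), with q = 1 - p."""
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)


# Probability of each possible number of (Share) choices in 5 rounds.
for r in range(6):
    print(r, f"{float(binomial_probability(r)):.6f}")
```

Keeping p as a `Fraction` means the six probabilities sum to exactly 1, and they are only converted to decimals for display.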
Consider the possible outcomes, each shown with one example path:
- Exactly one (Share) option chosen by sister, e.g. (TTTTS)
- Exactly two (Share) options chosen by sister, e.g. (TTTSS)
- Exactly three (Share) options chosen by sister, e.g. (TTSSS)
- Exactly four (Share) options chosen by sister, e.g. (TSSSS)
- No (Share) options chosen by sister = (TTTTT)
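Exactly one: $P(1) = \binom{5}{1}\left(\frac{5}{17}\right)^{1}\left(\frac{12}{17}\right)^{4}$
$= \frac{5!}{1!(5-1)!}\left(\frac{5}{17}\right)^{1}\left(\frac{12}{17}\right)^{4}$
= 0.365107
= 36.5%
Exactly two: $P(2) = \binom{5}{2}\left(\frac{5}{17}\right)^{2}\left(\frac{12}{17}\right)^{3}$
$= \frac{5!}{2!(5-2)!}\left(\frac{5}{17}\right)^{2}\left(\frac{12}{17}\right)^{3}$
= 0.304256
= 30.4%
Exactly three: $P(3) = \binom{5}{3}\left(\frac{5}{17}\right)^{3}\left(\frac{12}{17}\right)^{2}$
$= \frac{5!}{3!(5-3)!}\left(\frac{5}{17}\right)^{3}\left(\frac{12}{17}\right)^{2}$
= 0.126773
= 12.7%
Exactly four: $P(4) = \binom{5}{4}\left(\frac{5}{17}\right)^{4}\left(\frac{12}{17}\right)^{1}$
$= \frac{5!}{4!(5-4)!}\left(\frac{5}{17}\right)^{4}\left(\frac{12}{17}\right)^{1}$
= 0.026411
= 2.6%
No (Share) options: $P(0) = \binom{5}{0}\left(\frac{5}{17}\right)^{0}\left(\frac{12}{17}\right)^{5}$
$= \frac{5!}{0!(5-0)!}\left(\frac{5}{17}\right)^{0}\left(\frac{12}{17}\right)^{5}$
= 0.175252
= 17.5%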
To summarize:
Probability of no (Share) options chosen in 5 rounds = 17.5%
Probability of one (Share) option chosen in 5 rounds = 36.5%
Probability of two (Share) options chosen in 5 rounds = 30.4%
Probability of three (Share) options chosen in 5 rounds = 12.7%
Probability of four (Share) options chosen in 5 rounds = 2.6%
Analysis of the two methods utilized
The probability tree is a basic method of visually representing all possible actions that
either one of my little brother and sister could have chosen. It is essentially a visual
representation of the binomial distribution formula, which simplifies the repetitive
nature of the probability tree diagram.
For example, finding the probability of exactly one (Share) action occurring means
multiplying one (Share) probability value by four (Take) values. This would need to be
repeated, as there are different paths that give the same result, such as (TSTTT) and
(TTSTT). I would also have to add all these values to obtain the actual probability
percentage ((TSTTT) + (TTSTT) + (TTTST) +…).
The binomial distribution simplifies this tedious method. Using the formula for
combinations, $\frac{n!}{r!(n-r)!}$, I can easily take into account all of the repeated
paths, for instance those for three (Share) actions. I would simply multiply this by
the two probability values of (Share) and (Take), each raised to the power of how many
times it occurs, e.g. three (Share) actions and two (Take) actions would mean
multiplying by $\left(\frac{5}{17}\right)^{3}$ and $\left(\frac{12}{17}\right)^{2}$.
The binomial distribution can be derived from the probability tree, which shows the
relation between the two. To find the probability of a certain outcome, such as three
(Share) and two (Take) actions, I need to know the number of ways this can occur. As
shown in the above calculations, there are exactly 10 ways this outcome can happen,
for example (SSTST).
Let p be the probability of a (Share) action occurring. This means that 1 – p is the
probability of a (Take) action occurring. We can substitute q for 1 – p, so the
probability of the path (SSTST) is:
$p \times p \times q \times p \times q = p^{3}q^{2}$
As stated earlier, the number of ways three (Share) actions can occur is 10. So the
probability of exactly 3 (Share) actions occurring, P(X = 3), is:
$P(X = 3) = 10\,p^{3}q^{2}$
The combinations formula, where we choose r (Share) actions out of n trials, or in
other words $\binom{n}{r}$, should also give 10:
$\binom{5}{3} = 10$
So to simplify the equation, we get:
$P(X = 3) = \binom{5}{3} p^{3}q^{2}$, which is the same as saying $P(r) = \binom{n}{r} p^{r} q^{n-r}$
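The link between the tree and the formula can be checked numerically for the three-(Share) case. The sketch below is illustrative only and assumes p = 5/17 from earlier in the text; it lists the paths with exactly three (Share) actions, adds their branch products one by one, and compares the total with the binomial shortcut. The variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

p = Fraction(5, 17)  # probability of (Share), from the text
q = 1 - p            # probability of (Take)

# All five-round paths that contain exactly three (Share) actions.
three_share_paths = ["".join(path) for path in product("ST", repeat=5)
                     if path.count("S") == 3]

# Tree method: every such path has the same product p^3 * q^2,
# and the products are added branch by branch.
tree_total = sum(p ** 3 * q ** 2 for _ in three_share_paths)

# Binomial shortcut: number of paths times the common product.
formula_total = comb(5, 3) * p ** 3 * q ** 2

print(len(three_share_paths), comb(5, 3))
print(tree_total == formula_total)
```

Both routes count the same 10 paths, which is why the binomial coefficient can replace the repeated addition.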
Utility
The Expected Utility is the outcome that is expected after selecting a certain action
(Silkin, n.d.).
The path my sister ended up choosing after the final round was (TTSTS). The
expected utility was:
Ux = ∑(Payoff) P(SS’)
Ux = (Payoff1 + Payoff2 + Payoff3 + Payoff4 + Payoff5)(Probability of exactly two
(Share) choices occurring)
Total Payoff = (7+2) + (7+2) + (5+3) + (7+2) + (5+3) = 43
$U_x = [(7+2) + (7+2) + (5+3) + (7+2) + (5+3)]\left(\frac{30.4}{100}\right)$
Ux = 13.072
This value is actually much higher than her actual expected utility, as for every
round it counts both possible payoffs of her action, for example both 7 gummy bears
and 2 gummy bears whenever my sister chose (Take). My brother’s strategy profile,
(STTST), which is commonly classified as tit-for-tat (TFT), is not taken into account
in this equation.
If my brother’s responses are considered, the sum of my sister’s payoffs is actually
22 rather than the 43 assumed above, because each round then contributes only the
single payoff determined by both choices. In other words:
Total Payoff = 7+2+3+7+3 = 22
So the correct expected utility for my sister is:
$U_x = (22)\left(\frac{30.4}{100}\right) = 6.688$
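The round-by-round payoffs can be read off Figure 2 programmatically. The following Python sketch is not part of the original experiment; it assumes the two recorded paths, (TTSTS) for my sister and (STTST) for my brother, and the rounded probability 30.4% used in the text. The names `PAYOFF`, `sister_payoffs` and `is_tit_for_tat` are illustrative.

```python
# Payoffs from Figure 2, indexed by (own action, other player's action).
PAYOFF = {("S", "S"): 5, ("S", "T"): 3, ("T", "S"): 7, ("T", "T"): 2}

sister_path = "TTSTS"   # recorded in the text
brother_path = "STTST"  # recorded in the text

# One payoff per round, determined by both players' choices in that round.
sister_payoffs = [PAYOFF[(x, b)] for x, b in zip(sister_path, brother_path)]
brother_payoffs = [PAYOFF[(b, x)] for x, b in zip(sister_path, brother_path)]

# Tit-for-tat check: from round 2 onwards the brother repeats
# the sister's previous action.
is_tit_for_tat = all(brother_path[i] == sister_path[i - 1] for i in range(1, 5))

probability_two_shares = 30.4 / 100  # rounded value used in the text
expected_utility = sum(sister_payoffs) * probability_two_shares

print(sister_payoffs, sum(sister_payoffs))
print(brother_payoffs, sum(brother_payoffs))
print(is_tit_for_tat, round(expected_utility, 3))
```

Indexing the payoff table by both actions means each round contributes a single value, which is the difference between the total of 22 and the earlier total of 43.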
Conclusion
The experiment I carried out with the help of my brother and sister was designed to
examine the rational thinking of two related individuals given choices that would
directly affect how they see each other. This was why I chose an item that both my
siblings had a common interest in and divided it between them. The prisoner’s dilemma
strikes me as a unique topic of exploration because of the mathematics present in such
a psychological concept and the way two reasonable human beings are forced to act
under pressure or when given tough decisions. It was fascinating to see how my own
siblings reacted given their limited options.
On the whole, however, the results of this exploration are limited to this situation,
because my siblings are very close and may generally find making decisions in
situations such as the one I placed them in very easy. My brother was initially
willing to share gummy bears, while my sister decided to take them for herself
instead. Her actions evidently led to my brother feeling betrayed and retaliating
purely out of the desire to get back at her. In a situation where the two individuals
do not know each other, as was the case on the Golden Balls show, different courses of
action may be taken, as there would be less emotional attachment to the other player.
In the end, however, the tit-for-tat nature of both strategies ended in both receiving
the same payoff, which should not be confused with expected utility.
Another observation concerns the situation I put them in. I tempted them both by
providing them with their favourite candy, and the desire for this item may have made
both siblings less reasonable due to greed. The same can be said about any situation.
Humans are in fact not as rational as might be thought. The desire to gain the most
utility in most prisoner’s dilemma situations would hamper any possibility of
cooperation between two players. So theoretically it may be possible to determine
expected utility through the probability of certain actions occurring; in practice,
however, results may differ.
Overall, using the prisoner’s dilemma, I was able to fulfil my aim, which was to see
whether my brother and sister would put their own well-being ahead of each other’s. My
sister evidently decided to do so, which led my brother to react by making the same
choices she made (as aforementioned, a tit-for-tat situation).
Bibliography
- Nau, D. S. (2010). CMSC 421, Intro to AI - Spring 2010. Retrieved October 21, 2015,
from http://www.cs.umd.edu/~nau/cmsc421/game-theory.pdf
- Silkin, N. (n.d.). Game Theory and Expected Utility Theory. Retrieved October 17,
2015, from Academia:
http://www.academia.edu/5541272/Game_Theory_and_Expected_Utility_Theory
References
- http://www.econlib.org/library/Enc/PrisonersDilemma.html
- http://math.ou.edu/~epearse/talks/prisoners-dilemma.pdf
- http://ibmathsresources.com/tag/prisoner-dilemma/
- https://www.youtube.com/watch?v=qIzC1-9PwQo