Iterated Prisoner’s Dilemma

Candidate Name: (redacted)
Candidate Number: (redacted)
GEMS Wellington International School

Table of Contents
Introduction and Background
Key terms
Strategies
Dominant strategy equilibrium
Nash Equilibrium
The Experiment
Probabilities
Probability Tree Diagram
Formula for binomial distribution
Analysis of the two methods utilized
Utility
Conclusion
Bibliography
References

Introduction and Background

In my search for a math IA idea, I became fascinated with game theory, particularly the prisoner’s dilemma. My interest came about after watching a video about the British game show ‘Golden Balls’. The idea is that each of the two final contestants must decide whether to split the winning money with the other or to take the money for themselves. The two main issues are that neither player is allowed to tell the other contestant about their decision, and that if both contestants decide to take the money for themselves, both end up receiving nothing. The element of trust and rational human thinking was an interesting concept I took away from the video, and I decided to investigate it further using the prisoner’s dilemma.

I was interested in testing the iterated prisoner’s dilemma for myself, so I elected to test the concept on my little brother and sister, who are closer to each other than they are to me, with a few slight changes. Instead of jail sentences, I opened a pack of gummy bears and gave each of them two choices: share or take. I wanted to test the trust between my two younger siblings and to see whether they would put their own benefit first rather than cooperate.

The prisoner’s dilemma is a concept devised by two American scientists, Merrill Flood and Melvin Dresher. A Princeton mathematician, Albert W. Tucker, later formalized it. The prisoner’s dilemma is a situation in which two or more individuals must each make a decision that either improves their own situation while leaving the others worse off, or achieves a mutual means of cooperation. A basic example is two convicts suspected of being part of a robbery who are facing prison sentences.
The police are certain that they can extract additional information from both criminals to uncover the location of the stolen items and the other suspects involved in the crime. The police take the convicts to different rooms and give each of them two choices: either confess to the crime or deny it. The convicts are not allowed to talk to each other and therefore do not know what the other has chosen.

                          Convict 2
                          Confess (C)    Deny (D)
Convict 1   Confess (C)   (-2, -2)       (0, -9)
            Deny (D)      (-9, 0)        (-10, -10)

Figure 1: The matrix showing the payoff for each convict (convict 1 first, convict 2 second). The numbers denote the number of years each convict may face in prison.

Figure 1 shows the possible options both convicts are given. The numbers are the years in prison each convict faces depending on their actions. The number of years is given as a negative because each convict benefits from minimizing the time spent in jail or, in other words, from receiving the highest payoff. The payoff is therefore larger in the (C,C) box, as -2 > -10, which makes it the most preferable joint outcome for both convicts. As shown, it is tempting for each criminal to confess to the police. However, there is the possibility that both will deny, leading to the maximum prison time of ten years each. On the other hand, both can confess and receive much less time in prison. The biggest issue here is that the choice each convict makes affects the other.

I discovered through further research that even simple real-life situations, such as the game of rock, paper, scissors and penalty kicks taken in football, contain elements similar to the prisoner’s dilemma. That said, I decided to investigate the mathematics behind the iterated prisoner’s dilemma. The iterated prisoner’s dilemma is essentially the prisoner’s dilemma repeated several times between the same players. For example, in Figure 1 we saw the number of years both convicts faced depending on their actions.
Suppose convict 1 chose to confess and convict 2 decided to deny. Convict 2 would serve nine years in prison while convict 1 would walk free. If both convicts were caught in a robbery again and faced the same options, convict 2 might choose to confess instead of denying, based on convict 1’s previous action.

Key terms

For the iterated prisoner’s dilemma, each individual has a strategy profile and a utility gained from using the strategies within that profile:

s: a strategy within the strategy profile (S).
s’: a strategy within a second strategy profile (S’).
Strategy profile (S): the set of possible strategies (s) that each individual player can use in the game. (Example: S convict 2 = (s1, s2, …, sn))
Utility [U(S)]: the (expected) utility that each individual can gain from the strategy profile (S).

∀ = this notation means “for all”
∃ = this notation means “there exists”

Certain strategies (s) are classified as strongly dominant or weakly dominant:

Strongly dominant: a strategy where U(s) > U(s’) in all cases (Nau, 2010):

$\forall (s_1, s_2, s_3, \dots, s_n):\ U(s_1, s_2, s_3, \dots, s_n) > U(s_1', s_2', s_3', \dots, s_n')$

Weakly dominant: a strategy is weakly dominant over others if U(s) ≥ U(s’) in all cases and there is at least one case where U(s) > U(s’) (Nau, 2010):

$\forall (s_1, s_2, s_3, \dots, s_n):\ U(s_1, s_2, s_3, \dots, s_n) \geq U(s_1', s_2', s_3', \dots, s_n')$
and
$\exists (s_1, s_2, s_3, \dots, s_n):\ U(s_1, s_2, s_3, \dots, s_n) > U(s_1', s_2', s_3', \dots, s_n')$

Strategies

Dominant strategy equilibrium

This is when every player’s strategy (s) in the profile (S) is dominant, so that U(S) > U(S’). In Figure 1 with the convicts, for example, the dominant strategy equilibrium is (Confess, Confess): confessing gives each convict a shorter sentence whatever the other convict does (-2 > -9 and 0 > -10), so it is the most appealing and worthwhile strategy.

Nash Equilibrium

Every dominant strategy equilibrium is also a Nash equilibrium.
A Nash equilibrium is a strategy profile in which no individual can achieve better utility by switching strategies unilaterally (Nau, 2010).

The Experiment

                          Sister (S)
                          Share (S)    Take (T)
Brother (B)   Share (S)   5, 5         3, 7
              Take (T)    7, 3         2, 2

Figure 2: The matrix showing the payoff for each choice (brother’s payoff first, sister’s second).

Each pair of numbers is the number of gummy bears available to my brother and sister depending on the choices they made. For example, should both choose to share, each of them would receive 5 gummy bears.

The payoff for each of them was as follows:
- If one player chooses to share and one chooses to take, the player sharing will get only 3 gummy bears compared to the taker, who will get 7
- If both players choose to share, both will get an equal 5 gummy bears
- If both players choose to take, both will get an equal 2 gummy bears

The profile with the highest combined payoff in this case was (S,S). It is not a Nash equilibrium, however, because either player could gain by switching unilaterally to (Take) (7 > 5). No strategy is dominant here: (Take) is the better reply to (Share), while (Share) is the better reply to (Take) (3 > 2), so the pure-strategy Nash equilibria are (S,T) and (T,S). I was also curious to observe which strategies my siblings would pick.

Pure Strategy: a single-action strategy that is chosen by the player in every round.
Mixed Strategy: a strategy that includes a mixture of pure strategies.

To test the rationality of each of my siblings, I attempted to predict the probability of the different sequences of choices they would make, based on the probabilities of the only two available choices. Through this, I wanted to predict the payoff and overall expected utility that each would gain. The game lasted for 5 rounds in total.

Probabilities

In round 1, my brother chose (Share) and my sister chose (Take).
Notations:

Sb: strategy profile of my little brother
Sx: strategy profile of my little sister
x: this variable represents my little sister
p: probability of my little sister choosing (Share)
b: this variable represents my little brother

Round 1: (Sb, Sx) = (S, T)
Payoff b = 3
Payoff x = 7

Consider the probability of my sister making either one of the choices:

Sx(Share) = p
Sx(Take) = 1 − p

Expected Utility (Share) = Payoff Total × Probability

$U_S = \sum (\text{Payoff})\, P(SS')$
$U_S = (7 + 5) \times p$
$U_S = 12p$

Expected Utility (Take) = Payoff Total × Probability

$U_T = (2 + 3) \times (1 - p)$
$U_T = 5 - 5p$

Since the expected utility should be the same for both cases:

$12p = 5 - 5p$
$17p = 5$
$p = \frac{5}{17}$

So the two choices in my sister’s strategy profile had probabilities of:

$S(\text{Share}) = \frac{5}{17}$
$S(\text{Take}) = \frac{12}{17}$

This therefore meant that the strategies in my little brother’s strategy profile had probabilities of:

$S(\text{Take}) = \frac{5}{17}$
$S(\text{Share}) = \frac{12}{17}$

Probability Tree Diagram

Figure 3: A probability tree of my sister’s choices

There are overall 32 combinations of strategies my sister could have chosen from:

(TTTTT) = (12/17)(12/17)(12/17)(12/17)(12/17) = 248,832/1,419,857 × 100 = 17.5%
(TSSSS) = (12/17)(5/17)(5/17)(5/17)(5/17) = 7,500/1,419,857 × 100 = 0.53%
(TSSST) = (12/17)(5/17)(5/17)(5/17)(12/17) = 18,000/1,419,857 × 100 = 1.27%
(TSSTS) = (12/17)(5/17)(5/17)(12/17)(5/17) = 18,000/1,419,857 × 100 = 1.27%
(TSSTT) = (12/17)(5/17)(5/17)(12/17)(12/17) = 43,200/1,419,857 × 100 = 3.04%
(TSTSS) = (12/17)(5/17)(12/17)(5/17)(5/17) = 18,000/1,419,857 × 100 = 1.27%
(TSTST) = (12/17)(5/17)(12/17)(5/17)(12/17) = 43,200/1,419,857 × 100 = 3.04%
(TSTTT) = (12/17)(5/17)(12/17)(12/17)(12/17) = 103,680/1,419,857 × 100 = 7.30%
(TSTTS) = (12/17)(5/17)(12/17)(12/17)(5/17) = 43,200/1,419,857 × 100 = 3.04%
(TTSSS) = (12/17)(12/17)(5/17)(5/17)(5/17) = 18,000/1,419,857 × 100 = 1.27%
(TTSST) = (12/17)(12/17)(5/17)(5/17)(12/17) = 43,200/1,419,857 × 100 = 3.04%
(TTSTS) = (12/17)(12/17)(5/17)(12/17)(5/17) = 43,200/1,419,857 × 100 = 3.04%
(TTSTT) = (12/17)(12/17)(5/17)(12/17)(12/17) = 103,680/1,419,857 × 100 = 7.30%
(TTTSS) = (12/17)(12/17)(12/17)(5/17)(5/17) = 43,200/1,419,857 × 100 = 3.04%
(TTTST) = (12/17)(12/17)(12/17)(5/17)(12/17) = 103,680/1,419,857 × 100 = 7.30%
(TTTTS) = (12/17)(12/17)(12/17)(12/17)(5/17) = 103,680/1,419,857 × 100 = 7.30%
(STTTT) = (5/17)(12/17)(12/17)(12/17)(12/17) = 103,680/1,419,857 × 100 = 7.30%
(SSSSS) = (5/17)(5/17)(5/17)(5/17)(5/17) = 3,125/1,419,857 × 100 = 0.22%
(SSSST) = (5/17)(5/17)(5/17)(5/17)(12/17) = 7,500/1,419,857 × 100 = 0.53%
(SSSTS) = (5/17)(5/17)(5/17)(12/17)(5/17) = 7,500/1,419,857 × 100 = 0.53%
(SSSTT) = (5/17)(5/17)(5/17)(12/17)(12/17) = 18,000/1,419,857 × 100 = 1.27%
(SSTSS) = (5/17)(5/17)(12/17)(5/17)(5/17) = 7,500/1,419,857 × 100 = 0.53%
(SSTST) = (5/17)(5/17)(12/17)(5/17)(12/17) = 18,000/1,419,857 × 100 = 1.27%
(SSTTT) = (5/17)(5/17)(12/17)(12/17)(12/17) = 43,200/1,419,857 × 100 = 3.04%
(SSTTS) = (5/17)(5/17)(12/17)(12/17)(5/17) = 18,000/1,419,857 × 100 = 1.27%
(STSSS) = (5/17)(12/17)(5/17)(5/17)(5/17) = 7,500/1,419,857 × 100 = 0.53%
(STSST) = (5/17)(12/17)(5/17)(5/17)(12/17) = 18,000/1,419,857 × 100 = 1.27%
(STSTS) = (5/17)(12/17)(5/17)(12/17)(5/17) = 18,000/1,419,857 × 100 = 1.27%
(STSTT) = (5/17)(12/17)(5/17)(12/17)(12/17) = 43,200/1,419,857 × 100 = 3.04%
(STTSS) = (5/17)(12/17)(12/17)(5/17)(5/17) = 18,000/1,419,857 × 100 = 1.27%
(STTST) = (5/17)(12/17)(12/17)(5/17)(12/17) = 43,200/1,419,857 × 100 = 3.04%
(STTTS) = (5/17)(12/17)(12/17)(12/17)(5/17) = 43,200/1,419,857 × 100 = 3.04%

For each group of sequences containing the same number of (Take) and (Share) actions, I added the repeated percentages.
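The 32 branch products and their grouping by number of (Share) actions can be reproduced with a short script. This is only a sketch under the assumptions stated in the text, P(Share) = 5/17 and P(Take) = 12/17 in every round. The names `P`, `sequence_probability`, `branches` and `by_share_count` are my own and are not part of the original working.

```python
from fractions import Fraction
from itertools import product

# Assumed values taken from the text: P(Share) = 5/17, P(Take) = 12/17.
P = {"S": Fraction(5, 17), "T": Fraction(12, 17)}
ROUNDS = 5


def sequence_probability(seq):
    """Multiply the per-round probabilities along one branch of the tree."""
    prob = Fraction(1)
    for choice in seq:
        prob *= P[choice]
    return prob


# All 2**5 = 32 branches of the tree, keyed by a string such as "TTSTS".
branches = {
    "".join(seq): sequence_probability(seq)
    for seq in product("ST", repeat=ROUNDS)
}

# Add up the branches that contain the same number of Share actions.
by_share_count = {}
for seq, prob in branches.items():
    k = seq.count("S")
    by_share_count[k] = by_share_count.get(k, Fraction(0)) + prob

for k in sorted(by_share_count):
    print(f"{k} Share action(s): {float(by_share_count[k]) * 100:.2f}%")
```

Using exact fractions avoids the rounding that creeps in when percentages rounded to two decimal places are added by hand.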
Possible outcomes:

- Exactly one (Share) action chosen by sister = (TTTTS), (TTSTT), (STTTT), (TSTTT), (TTTST)
  = 7.30% + 7.30% + 7.30% + 7.30% + 7.30% = 36.5%
- Exactly two (Share) actions chosen by sister = (TSSTT), (TSTST), (TSTTS), (TTSST), (TTSTS), (TTTSS), (SSTTT), (STSTT), (STTST), (STTTS)
  = 3.04% + 3.04% + 3.04% + 3.04% + 3.04% + 3.04% + 3.04% + 3.04% + 3.04% + 3.04% = 30.4%
- Exactly three (Share) actions chosen by sister = (TSSST), (TSSTS), (TSTSS), (TTSSS), (SSSTT), (SSTST), (SSTTS), (STSST), (STSTS), (STTSS)
  = 1.27% + 1.27% + 1.27% + 1.27% + 1.27% + 1.27% + 1.27% + 1.27% + 1.27% + 1.27% = 12.7%
- Exactly four (Share) actions chosen by sister = (TSSSS), (SSSST), (SSSTS), (SSTSS), (STSSS)
  = 0.53% + 0.53% + 0.53% + 0.53% + 0.53% = 2.65%
- Five (Share) actions chosen by sister = (SSSSS) = 0.22%
- No (Share) actions chosen by sister = (TTTTT) = 17.5%

This method was tedious and repetitive, which may have led to errors in calculation. So I opted to use the binomial distribution, since I treated the choices made in each round as independent of each other. I used the binomial distribution to calculate the probability of a certain number of (Share) choices being made. I performed these calculations for my sister’s strategy with the results of the first round in mind.

Formula for binomial distribution:

$P(r) = \binom{n}{r} p^r q^{n-r}$

where r is the number of (Share) choices out of 5 rounds, n is the number of rounds, p is the probability of (Share) occurring, and q is 1 − p.

Consider the possible outcomes, each with one representative sequence:

- One (Share) option chosen by sister, e.g. (TTTTS)
- Two (Share) options chosen by sister, e.g. (TTTSS)
- Three (Share) options chosen by sister, e.g. (TTSSS)
- Four (Share) options chosen by sister, e.g. (TSSSS)
- No (Share) options chosen by sister = (TTTTT)

One (Share) option:
$P(1) = \binom{5}{1}\left(\frac{5}{17}\right)^1\left(\frac{12}{17}\right)^4 = \frac{5!}{1!(5-1)!}\left(\frac{5}{17}\right)^1\left(\frac{12}{17}\right)^4 = 0.365107 = 36.5\%$

Two (Share) options:
$P(2) = \binom{5}{2}\left(\frac{5}{17}\right)^2\left(\frac{12}{17}\right)^3 = \frac{5!}{2!(5-2)!}\left(\frac{5}{17}\right)^2\left(\frac{12}{17}\right)^3 = 0.304256 = 30.4\%$

Three (Share) options:
$P(3) = \binom{5}{3}\left(\frac{5}{17}\right)^3\left(\frac{12}{17}\right)^2 = \frac{5!}{3!(5-3)!}\left(\frac{5}{17}\right)^3\left(\frac{12}{17}\right)^2 = 0.126773 = 12.7\%$

Four (Share) options:
$P(4) = \binom{5}{4}\left(\frac{5}{17}\right)^4\left(\frac{12}{17}\right)^1 = \frac{5!}{4!(5-4)!}\left(\frac{5}{17}\right)^4\left(\frac{12}{17}\right)^1 = 0.026411 = 2.6\%$

No (Share) options:
$P(0) = \binom{5}{0}\left(\frac{5}{17}\right)^0\left(\frac{12}{17}\right)^5 = \frac{5!}{0!(5-0)!}\left(\frac{5}{17}\right)^0\left(\frac{12}{17}\right)^5 = 0.175251 = 17.5\%$

To summarize:

Probability of no (Share) options chosen in 5 rounds = 17.5%
Probability of one (Share) option chosen in 5 rounds = 36.5%
Probability of two (Share) options chosen in 5 rounds = 30.4%
Probability of three (Share) options chosen in 5 rounds = 12.7%
Probability of four (Share) options chosen in 5 rounds = 2.6%

Analysis of the two methods utilized

The probability tree is a basic method of visually representing all possible sequences of actions that either my little brother or my little sister could have chosen. It is essentially a visual representation of the binomial distribution formula, which in turn removes the repetitive work of the probability tree diagram. For example, to find the probability of exactly one (Share) action occurring, I would need to multiply one (Share) probability value by four (Take) values. This would need to be repeated, as there are different paths that lead to the same result, such as (TSTTT) and (TTSTT). I would also have to add all these values to obtain the actual probability percentage ((TSTTT) + (TTSTT) + (TTTST) + …).

The binomial distribution simplifies this tedious method. Using the formula for combinations, $\frac{n!}{r!(n-r)!}$, I can easily take into account all of the repeated paths for a given number of (Share) actions. I would simply multiply this by the two probability values of (Share) and (Take), each raised to the power of how many times it occurs; e.g. three (Share) actions and two (Take) actions would mean multiplying by $\left(\frac{5}{17}\right)^3$ and $\left(\frac{12}{17}\right)^2$. The binomial distribution can be derived from the probability tree, which shows the relation between the two.
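The binomial values can be checked with a few lines of Python. This is a minimal sketch, assuming p = 5/17 and n = 5 as in the text. The function name `binomial_pmf` and the variable names are my own, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(r, n, p):
    """P(r) = C(n, r) * p**r * q**(n - r), with q = 1 - p."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


p_share = Fraction(5, 17)  # assumed probability of Share, taken from the text
n_rounds = 5

# Exact probability of r Share choices for r = 0, 1, ..., 5.
dist = {r: binomial_pmf(r, n_rounds, p_share) for r in range(n_rounds + 1)}

for r, prob in dist.items():
    print(f"P({r}) = {float(prob):.6f}")
```

Because the six probabilities are exact fractions, they sum to exactly 1, which is a quick way to confirm that no outcome has been left out.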
To find the probability of a particular outcome, such as three (Share) and two (Take) actions, I need to know the number of ways this outcome can occur. As shown in the above calculations, there are exactly 10 ways this outcome can happen, for example (SSTST). Let p be the probability of a (Share) action occurring. This means that 1 − p is the probability of a (Take) action occurring. We can substitute q for 1 − p:

$p \times p \times q \times p \times q = p^3 q^2$

As stated earlier, the number of ways three (Share) actions can occur is 10. So the probability of exactly 3 (Share) actions occurring, P(X = 3), is:

$P(X = 3) = 10\, p^3 q^2$

If we use the combinations formula, where we choose r (Share) actions in a set of n trials, or in other words $\binom{n}{r}$, we should still get 10:

$\binom{5}{3} = 10$

So the equation becomes:

$P(X = 3) = \binom{5}{3} p^3 q^2$

which is the same as saying

$P(r) = \binom{n}{r} p^r q^{n-r}$

Utility

The expected utility is the outcome that is expected after selecting a certain action (Silkin, n.d.). The path my sister ended up choosing after the final round was (TTSTS), which contains two (Share) choices. The expected utility was:

$U_x = \sum (\text{Payoff})\, P(SS')$

Ux = (Payoff1 + Payoff2 + Payoff3 + Payoff4 + Payoff5)(Probability of two (Share) choices occurring)

Total Payoff = (7+2) + (7+2) + (5+3) + (7+2) + (5+3) = 43

$U_x = [(7+2) + (7+2) + (5+3) + (7+2) + (5+3)]\left(\frac{30.4}{100}\right)$

$U_x = 13.072$

This value is much higher than her actual expected utility, as it counts both possible payoffs for each of her choices: either 7 or 2 gummy bears if she chose (Take), and either 5 or 3 if she chose (Share). My brother’s strategy profile, (STTST), which is commonly classified as tit-for-tat (TFT), is not taken into account in this equation.
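The round-by-round payoffs that result once my brother’s replies are included can be sketched in Python. This is an illustration under assumptions: the payoff table is Figure 2 read as (brother, sister) with my brother’s choice first, which matches the round 1 result of 3 and 7, and `tit_for_tat` is a hypothetical helper that opens with (Share) and then copies the opponent’s previous move. The weight 0.304 is the probability of two (Share) choices quoted above.

```python
# Payoffs from Figure 2, keyed by (brother's choice, sister's choice)
# and stored as (brother's gummy bears, sister's gummy bears).
PAYOFF = {
    ("S", "S"): (5, 5),
    ("S", "T"): (3, 7),
    ("T", "S"): (7, 3),
    ("T", "T"): (2, 2),
}


def tit_for_tat(opponent_moves, first_move="S"):
    """Open with first_move, then copy the opponent's previous move."""
    return first_move + opponent_moves[:-1]


sister_moves = "TTSTS"                     # path reported in the text
brother_moves = tit_for_tat(sister_moves)  # reproduces (STTST)

# One (brother, sister) payoff pair per round.
rounds = [PAYOFF[(b, s)] for b, s in zip(brother_moves, sister_moves)]
brother_total = sum(b for b, _ in rounds)
sister_total = sum(s for _, s in rounds)

# Weight the sister's realised total by the probability of two Share choices.
weighted = sister_total * 0.304

print(brother_moves, rounds)
print(brother_total, sister_total, round(weighted, 3))
```

Keeping the payoffs in a dictionary keyed by both players’ choices means each round contributes only the payoff that actually occurred, rather than both possible payoffs for a given choice.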
If my brother’s responses are considered, then the sum of my sister’s payoffs is actually 22 rather than the 43 assumed above, because each round then contributes only the payoff that actually occurred instead of both possible payoffs for her choice. In other words:

Total Payoff = 7 + 2 + 3 + 7 + 3 = 22

So the correct expected utility for my sister is:

$U_x = (22)\left(\frac{30.4}{100}\right) = 6.688$

Conclusion

The experiment I carried out with the help of my brother and sister was designed to examine the rational thinking of two related individuals given choices that would directly affect how they see each other. This was why I chose an item that both my siblings had a common interest in and divided it between them. The prisoner’s dilemma strikes me as a unique topic of exploration because of the existence of mathematics in such a psychological concept, and because of how two reasonable human beings are forced to act under pressure or when given tough decisions. It was fascinating to see how my own siblings reacted given their limited options.

On the whole, however, the results of this exploration are limited to this situation only, because my siblings are very close and may generally find decisions in situations such as the one I placed them in very easy to make. My brother was initially willing to share the gummy bears while my sister decided to take them for herself instead. Her actions would evidently have led to my brother feeling betrayed, making him retaliate purely out of the desire to get back at her. In a situation where the two individuals do not know each other, as was the case on the Golden Balls show, different courses of action may be taken, as there would be less emotional attachment to the other player. In the end, however, the tit-for-tat nature of the game ended in both receiving the same payoff, which should not be confused with expected utility. Another observation concerns the situation I put them in.
I tempted them both by providing them with their favourite candy, and the desire for this item may have made both siblings less reasonable due to greed. The same can be said about many situations. Humans are in fact not as rational as is often assumed. The desire to gain the most utility in most prisoner’s dilemma situations would hamper any possibility of cooperation between two players. So in theory it may be possible to determine expected utility through the probability of certain actions occurring; in practice, however, results may differ.

Overall, using the prisoner’s dilemma, I was able to achieve my aim, which was to see whether my brother and sister would put their own well-being ahead of each other’s. My sister evidently decided to do so, which led my brother to respond by making the same choices she made (as aforementioned, a tit-for-tat situation).

Bibliography

Nau, D. S. (2010). CMSC 421, Intro to AI - Spring 2010. Retrieved October 21, 2015, from http://www.cs.umd.edu/~nau/cmsc421/game-theory.pdf

Silkin, N. (n.d.). Game Theory and Expected Utility Theory. Retrieved October 17, 2015, from Academia: http://www.academia.edu/5541272/Game_Theory_and_Expected_Utility_Theory

References

- http://www.econlib.org/library/Enc/PrisonersDilemma.html
- http://math.ou.edu/~epearse/talks/prisoners-dilemma.pdf
- http://ibmathsresources.com/tag/prisoner-dilemma/
- https://www.youtube.com/watch?v=qIzC1-9PwQo