The Ghost in the Machine: Inferring Machine-Based Strategies from Observed Behavior

Jim Engle-Warnick∗, William J. McCausland†, John H. Miller‡

September 3, 2007

Abstract: We model strategies of experimental subjects playing indefinitely repeated games as finite state machines. In order to do likelihood-based inference on machines using data from experiments, we add a stochastic element to the machines. As in most machine models for repeated game strategies, state transitions are deterministic. Actions, however, are random and their distribution is state dependent. We describe Markov Chain Monte Carlo methods for simulating machine strategies from their posterior distribution. We apply our procedures to data from an experiment in which human subjects play an indefinitely repeated Prisoners' Dilemma. We show examples of estimation, prediction and testing.

Keywords: strategy inference, Bayesian methods, experimental economics, Prisoners' Dilemma, repeated games.

JEL classification nos.: C11, C15, C72, C73, C92

Acknowledgments: We thank Alexander Campbell, Julie Héroux, and Anthony Walsh for valuable research assistance. We gratefully acknowledge financial support from the Economic and Social Research Council, UK, and Nuffield College, University of Oxford. The Centre for Interuniversity Research and Analysis on Organizations (CIRANO) and the Bell University Laboratory in Electronic Commerce and Experimental Economy graciously provided use of the experimental laboratory. Software used in this paper is open source and freely available: send a request to [email protected].

∗ McGill University, Department of Economics, 855 Sherbrooke St. W., Montreal, QC, H3A 2T7, Canada, CIREQ and CIRANO. email: [email protected].
† (corresponding author) Université de Montréal, Département de sciences économiques, C.P. 6128, Succursale centre-ville, Montreal, QC, H3C 3J7, Canada, CIREQ and CIRANO. email: [email protected].
‡ Carnegie Mellon University, Department of Social and Decision Sciences, Pittsburgh, PA 15213 and the Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501. email: [email protected].

1 Introduction

Understanding human strategic behavior in games requires both a theoretical perspective on what factors drive such decisions and a careful identification of the empirical foundations of actual behavior. We hypothesize that we can effectively capture observed strategic behavior by mapping it onto simple algorithms (here, finite state machines) using the techniques described below. While we remain agnostic as to the underlying forces promoting such mappings (such machines could result from either a carefully derived rational process or an over-reliance on simple heuristics), the analysis of such processes is valuable nonetheless. Using this technique, we can ask questions such as: how complicated are the machines that drive human strategic behavior, what strategic patterns do such machines embrace, can we make better predictions of actual behavior, and can we identify strategic learning?

Machines (or finite automata) are discrete-time, finite-state systems that generate discrete outputs in response to discrete inputs. Aumann (1981) suggested that machines would be a useful way to represent repeated game strategies in economics. Inputs are the actions of opponents and outputs are the actions played by the machine strategy.
Automata have been used to model bounded rationality in repeated games in Neyman (1985), Rubinstein (1986), Abreu and Rubinstein (1988), Kalai and Stanford (1988) and Ben-Porath (1993). Binmore and Samuelson (1992) used them to explore evolutionary games. Miller (1996) considered learning. Axelrod (1984) ran a tournament for computerized strategies in the repeated Prisoners' Dilemma, and found that the simple two-state machine strategy Tit-For-Tat proved to be a remarkable and robust performer.

This paper introduces likelihood-based inference for machines in a repeated game setting. For this purpose, we add a stochastic element to the machines. State transitions are still deterministic, but actions are random and their distribution is state dependent. We choose this approach over stochastic state transitions for three reasons. First, a machine with random actions and deterministic transitions corresponds more directly to a behavioral strategy, a well known theoretical construct. Second, random actions are more parsimonious than random state transitions, since actions depend only on the state and state transitions also depend on the action profile. Third, deterministic finite state machines already condense the history of play in a supergame to a discrete value identifying the current state; making state transitions stochastic adds a second source of memory imperfection that some may find redundant.

We stress that the randomness we introduce is in the random choice tradition, not the random preference tradition. Thus our approach is similar to that of El Gamal and Grether (1995), who analyse data from an individual dynamic discrete choice experiment. They suppose that the data generating process is governed by a decision rule and an error probability, and go on to estimate decision rules. In our approach, we can, but need not, interpret random actions as deviations from deterministic actions, and realizations of low probability actions as errors. We emphasize that these errors are not payoff related. In particular, our approach should not be confused with approaches used in some experimental work based on Quantal Response Equilibria, introduced by McKelvey and Palfrey (1995). There, actions are random because payoffs are random.

We allow state transitions to depend not only on opponents' actions but also on the machine's own action. This is in contrast to state transitions for deterministic machines, which depend explicitly on opponents' actions but not the own action. For these machines, the own action is a deterministic function of the current state, so explicit dependence would be redundant. Once a machine's actions are stochastic, however, own actions are no longer deterministic functions of the state: they bear relevant additional information that may be important for future stages of the supergame. We will see that for at least some experimental subjects, there is empirical evidence of own-action dependence. We also note that ruling out own-action dependence would involve only minor changes to the methods described below.

Several authors have used experiments to study how frequencies of observed actions respond to changes in the parameters of a repeated game. Dal Bo and Frechette (2007) vary the continuation probability and the mutual cooperation payoff in an indefinitely repeated Prisoners' Dilemma game and report how rates of mutual cooperation change with experience in situations where repeated cooperation is and is not an equilibrium of the repeated game.
Norman and Wallace (2004) consider repeated Prisoners' Dilemma games with three different termination rules and report rates of cooperation. Duffy and Ochs (2006) look at cooperation frequencies for different subject matching processes. Camera and Casari (2007) go a step further towards uncovering strategies, using a Probit model to estimate conditional action probabilities given features of previous play. They use these to classify subjects' strategies in several variants of a repeated Prisoners' Dilemma.

Other papers have explicitly tried to uncover repeated game strategies. In Selten, Mitzkewitz, and Uhlich (1997), experienced traders in a duopoly game explicitly choose strategies, in the form of algorithms, rather than sequential actions. Aoyagi and Frechette (2004) estimate parameters of a threshold strategy in a repeated Prisoners' Dilemma where subjects learn about opponents' actions only through a noisy signal. Engle-Warnick and Slonim (2006) use a counting method to fit deterministic finite state machines to observed actions in repeated trust games. The main difference between this paper and Engle-Warnick and Slonim (2006) is that we introduce a stochastic element to machines and do formal inference.

We assume that each subject is associated with a single machine throughout the sample used for inference. We know this assumption is not innocuous, since learning is plausible. However, we use statistical tests for break points to show that after five rounds (out of twenty), the play of all subjects except two (out of twelve) is plausibly stable. We use rounds 6 through 20 for inference and, while we report results for all subjects, we note that for two of them the assumption of stability is untenable. We stress that our inferential approach does not commit us to the assumption of stability. In the conclusions, we discuss ways to extend our inferential methods to models where subjects may change machines during an experiment.

The main contribution of this paper is the introduction of the first formal statistical procedure for inferring finite state machines as repeated game strategies. We introduce the procedure with the well known indefinitely repeated Prisoners' Dilemma game. We illustrate the usefulness of the procedure with examples of estimation, prediction, and model selection.

In Section 2, we give definitions and notation for our stochastic machines and other objects. Section 3 lays out assumptions about machine play in repeated game experiments. These assumptions specify a stochastic data generating process, which gives the conditional probability mass function for all actions in an experiment given a vector of machines, one for each subject. Interpreted as a likelihood function, this function factors by subject, which justifies subject-by-subject, as opposed to joint, inference. Section 4 provides a prior distribution for the unknown machines. The prior completes the model, since we now have a joint distribution of unknown machines and observed actions. This allows Bayesian inference for the unknown machines. Section 5 shows how we can analytically integrate out action probabilities from the posterior distribution of machines and explains why this facilitates Bayesian inference. Section 6 introduces a Markov Chain Monte Carlo method for simulating machines from their posterior distribution, and a method for computing Bayes factors, which are important for model comparisons.
While this section is technically difficult, it describes a method that delivers something conceptually simple: for each subject, a large sample of machines representing the uncertainty we have about her strategy after observing her play in the laboratory. Section 7 describes a laboratory experiment on the repeated Prisoners' Dilemma. Twelve subjects play twenty rounds of supergames with a mean duration of five periods. Section 8 presents empirical results, including examples of estimation, testing and prediction. We obtain these results with Monte Carlo methods, using the large samples from the posterior distribution of each subject's machine, drawn using the methods of Section 6.

2 Games and Machines

Here we give some formal definitions and examples that provide the needed framework for what follows. We consider a world in which agents play a repeated stage game. We call a single instance of a repeated stage game a supergame. An agent's repeated-game strategy is embodied by a machine, a representation that provides a compact description of a broad swath of potential strategies.

Agents repeatedly play a stage game, γ, defined by the triple (N, A, (ui)i∈N). The set of players is given by N. Each player i ∈ N has a set Ai of potential actions. Let A = ×i∈N Ai be the action profile set and let a = (ai)i∈N ∈ A be the action profile, the vector of actual actions chosen by the players during the stage game. The payoff function for each player i ∈ N is given by ui : A → R. For example, a Prisoner's Dilemma stage game has N = {1, 2}, A = A1 × A2 = {d, c} × {D, C} (where we use lower case to indicate the actions of Player 1), and u1(d, D) = u2(d, D) = uP (punishment), u1(c, D) = u2(d, C) = uS (sucker), u1(d, C) = u2(c, D) = uT (temptation), and u1(c, C) = u2(c, C) = uR (reward).

Agents employ machines to implement a given strategy. We define a machine by the quadruple (Q, q°, λ, µ). Each machine has a non-empty and finite set of states Q. One of these states, q° ∈ Q, is the initial state. We define a (deterministic) state transition function λ : A × Q → Q that maps the current action profile and state of the machine to the next state that the machine will enter. Thus, if the action profile is a ∈ A and the machine is in state q ∈ Q, then the machine will enter state λ(a; q). The machine takes a random action governed by a state dependent action probability mass function, µ : Ai × Q → R. Thus, if the machine is in state q ∈ Q, it plays ai ∈ Ai with probability µ(ai; q). We also define, for a machine m = (Q, q°, λ, µ), its machine skeleton m̄ ≡ (Q, q°, λ).

The following example illustrates the above ideas. We illustrate a machine m that implements an "85% Tit-For-Tat" strategy for player 1 in the Prisoner's Dilemma:

m = (Q, q°, λ, µ), Q = {1, 2}, q° = 1,
λ((d, D), 1) = λ((c, D), 1) = 2, λ((d, C), 1) = λ((c, C), 1) = 1,
λ((d, D), 2) = λ((c, D), 2) = 2, λ((d, C), 2) = λ((c, C), 2) = 1,
µ(d; 1) = 0.15, µ(c; 1) = 0.85, µ(d; 2) = 0.85, µ(c; 2) = 0.15.

Figure 1 shows a graphical representation of the machine skeleton (Q, q°, λ). An action profile (a1, a2) on the arc from state q to state q′ means that λ((a1, a2), q) = q′; that is, the transition function evaluated at action profile (a1, a2) and current state q is q′. In both states, if the opponent plays C (which happens in action profiles (c, C) and (d, C)), the machine will be in State 1 for the next stage. If the opponent plays D (action profiles (c, D) and (d, D)), the machine will be in State 2.
In State 1, the machine plays C with probability 0.85 and D with probability 0.15, while in State 2 the probabilities are reversed. Thus, the machine behaves very much like a traditional Tit-For-Tat except that it always has a chance (15%) of taking the opposite of the "traditional" action.

Figure 1: Tit-For-Tat machine skeleton. [State diagram: from either state, action profiles (c, C) and (d, C) lead to State 1, and action profiles (c, D) and (d, D) lead to State 2.]

3 The Data Generating Process

In this section and the next, we describe our model for the repeated game strategies of subjects in an experiment. This section concerns the data generating process, by which random actions are governed by machines. We complete the model in the next section by adding a prior distribution over the unknown machines.

For illustrative purposes, we consider the following example (to which we frequently return throughout the paper). Two subjects, corresponding to s = 1 and s = 2, twice play a repeated prisoners' dilemma with a random number of periods. We observe the data in Table 1. The first supergame, corresponding to σ = 1, has eight periods. Subject 1 plays as player i = 1 and subject 2 plays as player i = 2. The second supergame, corresponding to σ = 2, has nine periods. Here, subject 1 plays as player i = 2 and subject 2 plays as player i = 1. The table gives the actions aitσ played by player i at period t of supergame σ.

Table 1: Action aitσ by player i, period t and supergame σ

σ  i  s  t=1  t=2  t=3  t=4  t=5  t=6  t=7  t=8  t=9
1  1  1   c    c    d    c    c    d    d    d
1  2  2   C    D    C    C    D    C    D    D
2  1  2   d    c    c    d    d    c    c    c    c
2  2  1   C    D    C    C    C    D    C    C    C

3.1 Data Structures

We need data structures for representing supergame data collected in the laboratory. There is a set Σ of supergames, matches of random duration between two subjects. We let Tσ be the duration of supergame σ ∈ Σ in periods. In our example, Σ = {1, 2}, T1 = 8 and T2 = 9.

We use the notation aitσ ∈ Ai to represent the action of player i in period t of supergame σ. Grouping across players, we define atσ ∈ A as the action profile (aitσ)i∈N. Thus, for i = 1, t = 4 and σ = 2, aitσ = d and atσ = (d, C). Grouping across time, we let aσ be the supergame action profile vector (a1σ, . . . , aTσσ). Thus for σ = 1, aσ consists of all the actions in the first two rows of the table. Finally, we let a be the vector (aσ)σ∈Σ of all actions.

We have a set S of subjects, and in our example, S = {1, 2}. For each s ∈ S, we let Σs ⊆ Σ be the set of supergames in which subject s plays. In our example, Σ1 = Σ2 = Σ, but in experiments with several subjects, the Σs will be strict subsets. For each i ∈ N and σ ∈ Σ, we let siσ ∈ S be the subject playing as player i in supergame σ. For each s ∈ S and σ ∈ Σ, we let isσ be the player that subject s plays in supergame σ.

3.2 The Data Generating Process

We consider actions to be endogenous and all other features of our experimental data, including subject pairings and supergame durations, to be exogenous. In this paper, we suppose that each subject, s, uses a single machine, ms, as a repeated game strategy in all the supergames it plays. This rules out learning, an assumption that is not innocuous. In our empirical analysis, we omit data during a transient learning phase, and suppose that subjects use the same machine in later supergames. In the concluding section of the paper, we discuss models where subjects can switch from machine to machine.

The unknown variables of the model are, for each subject s ∈ S, a machine ms = (Qs, qs°, λs, µs) governing the actions of subject s. We let m be the vector (ms)s∈S of all machines.
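To fix ideas, the following minimal sketch (ours, not the authors' open-source software; the class name, encoding and helper names are illustrative assumptions) represents a stochastic machine as defined in Section 2 and replays subject 1's role in supergame 1 of Table 1, sampling random actions from the state-dependent distributions and applying the deterministic state transitions.

```python
import random

class StochasticMachine:
    """A machine (Q, q0, lambda, mu) with deterministic transitions and random actions."""

    def __init__(self, states, initial_state, transition, action_probs):
        self.states = states                  # Q, e.g. {1, 2}
        self.initial_state = initial_state    # q0
        self.transition = transition          # lambda: dict mapping (profile, state) -> next state
        self.action_probs = action_probs      # mu: dict mapping state -> {action: probability}
        self.state = initial_state

    def reset(self):
        self.state = self.initial_state

    def act(self, rng):
        """Sample an action from the state-dependent distribution mu(.; q)."""
        actions = list(self.action_probs[self.state])
        weights = [self.action_probs[self.state][a] for a in actions]
        return rng.choices(actions, weights)[0]

    def update(self, profile):
        """Apply the deterministic state transition lambda(a; q)."""
        self.state = self.transition[(profile, self.state)]

# The 85% Tit-For-Tat machine for player 1 from Section 2: the machine moves to
# State 1 whenever the opponent plays C and to State 2 whenever the opponent plays D.
tft85 = StochasticMachine(
    states={1, 2},
    initial_state=1,
    transition={(p, q): (1 if p[1] == 'C' else 2) for q in (1, 2)
                for p in [('c', 'C'), ('d', 'C'), ('c', 'D'), ('d', 'D')]},
    action_probs={1: {'c': 0.85, 'd': 0.15}, 2: {'c': 0.15, 'd': 0.85}},
)

# Replay supergame 1 of Table 1 from subject 1's side: the opponent's observed
# actions are fed in, and the machine's own actions are drawn at random.
rng = random.Random(0)
opponent_actions = ['C', 'D', 'C', 'C', 'D', 'C', 'D', 'D']
tft85.reset()
for opp in opponent_actions:
    own = tft85.act(rng)
    print(f"state {tft85.state}: played {own} against {opp}")
    tft85.update((own, opp))
```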
Our objective is to learn about plausible machine strategies from data such as those in Table 1. We will use a likelihood-based approach to inference, and to do so, we need to be able to compute the probability of realized play for any vector (ms)s∈S of machines. As an example, here we compute the probability that two subjects using 85% Tit-For-Tat machines generate the actions in Table 1.

We make the following three assumptions. The first (Machine Play) is that if subject s plays player i in supergame σ, the conditional probability that s plays action ai at period t, given play in previous periods of that supergame, is given by µs(ai; q), where q is the state that the player's machine is in after previous play in supergame σ. Recall that q depends deterministically on the machine skeleton (Qs, λs, qs°) and previous play in σ. So, for example, the conditional probability that player 1 plays d in period t = 3 of supergame σ = 1, given that action profiles (c, C) and (c, D) are observed in the first two periods of that supergame, is
\[
\Pr[a_{131} = d \mid (a_{111}, a_{211}) = (c, C),\; (a_{121}, a_{221}) = (c, D),\; m_1] = 0.85.
\]
This is because subject 1's machine is in state 2 after the opponent defected in period 2, and subject 1's machine defects with probability 0.85 in state 2.

Assumption 3.1 (Machine Play) For all i ∈ N, t ∈ Tσ and σ ∈ Σ, the probability mass function for aitσ | a1σ, . . . , at−1,σ is given by µsiσ(aitσ; q(a1σ, . . . , at−1,σ; qsiσ°, λsiσ)), where the state tracking function q(a1, . . . , at−1; q°, λ) gives the state that machine (Q, q°, λ, µ) will be in after the action history (a1, . . . , at−1). That is,
\[
q(a_1, \ldots, a_{t-1}; q^\circ, \lambda) \equiv
\begin{cases}
q^\circ & t = 1, \\
\lambda\bigl(a_{t-1},\, q(a_1, \ldots, a_{t-2}; q^\circ, \lambda)\bigr) & t \ge 2.
\end{cases}
\tag{1}
\]

Our second assumption (Conditional Action Independence) is that in each period of a supergame, players' actions are conditionally independent given previous play in the supergame. This rules out correlation devices.

Assumption 3.2 (Conditional Action Independence) For all σ ∈ Σ and t ∈ Tσ, the actions aitσ, i ∈ N, are conditionally independent given (a1σ, . . . , at−1,σ).

In our example, the conditional probability of action profile (d, C) in period t = 3 of supergame σ = 1, given that action profiles (c, C) and (c, D) are observed in the first two periods of that supergame, is
\[
\begin{aligned}
\Pr[(a_{131}, a_{231}) = (d, C) \mid (a_{111}, a_{211}) = (c, C),\; (a_{121}, a_{221}) = (c, D),\; m_1, m_2]
&= \Pr[a_{131} = d \mid (a_{111}, a_{211}) = (c, C),\; (a_{121}, a_{221}) = (c, D),\; m_1] \\
&\quad \cdot \Pr[a_{231} = C \mid (a_{111}, a_{211}) = (c, C),\; (a_{121}, a_{221}) = (c, D),\; m_2] \\
&= (0.85)^2.
\end{aligned}
\]

Our third and final assumption (Supergame Independence) is that play across supergames is independent. In our example, play in supergame 1 is independent of play in supergame 2.

Assumption 3.3 (Supergame Independence) The supergame action profile vectors aσ are independent.

The assumptions imply that the conditional probability of the vector a of all actions, given machines (ms)s∈S, is given by
\[
\Pr(a \mid m) = \prod_{\sigma \in \Sigma} \prod_{t=1}^{T_\sigma} \prod_{i \in N}
\mu_{s_{i\sigma}}\bigl(a_{it\sigma};\, q(a_{1\sigma}, \ldots, a_{t-1,\sigma};\, q^\circ_{s_{i\sigma}}, \lambda_{s_{i\sigma}})\bigr).
\tag{2}
\]

3.3 The Likelihood Function

We can interpret the right hand side of (2) as a function of m for fixed observed data a (that is, as a likelihood function) and denote it L(m; a). The likelihood is a measure of the support that the data a gives to the vector m of machines. We can factor the likelihood by subject to give
\[
L(m; a) = \prod_{s \in S} L_s(m_s; a),
\]
where
\[
L_s(m_s; a) \equiv \prod_{\sigma \in \Sigma_s} \prod_{t=1}^{T_\sigma}
\mu_s\bigl(a_{i_{s\sigma} t \sigma};\, q(a_{1\sigma}, \ldots, a_{t-1,\sigma};\, q^\circ_s, \lambda_s)\bigr).
\tag{3}
\]
It is important to note that the factor Ls associated with subject s depends only on the machine ms used by subject s, and not the other machines. This makes the likelihood L multiplicatively separable in the machines ms. Intuitively, this means that we lose nothing by using each Ls separately for inference on machine ms, rather than using L for joint inference on the vector m. For the purposes of inference about ms, we can consider the actions of all other subjects as fixed. In a Bayesian approach, if the ms are a priori independent, they will also be a posteriori independent; that is, conditionally independent given the data.

We can compute the likelihood factor Ls for machine ms in two steps. First, using the machine skeleton (the state set Qs, the initial state qs° and the state transition function λs) and the data a, we can track the state that the machine is in at every period of every supergame, counting the number of times it plays each action in each state. In Table 2, we show the results of this exercise for our example. Each row gives the action count cs(ai, q), the number of times machine ms plays action ai in state q.

Table 2: Action counts cs(ai, q) by machine state

s  q  ai  cs(ai, q)
1  1  c   10
1  1  d    1
1  2  c    1
1  2  d    5
2  1  c    7
2  1  d    5
2  2  c    3
2  2  d    2

So, for example, the first row tells us that machine 1 cooperates ten times while in state 1. This information is derived directly from the raw data in Table 1, which shows subject 1 cooperating twice at the beginning of a supergame and eight times immediately after its opponent cooperates, for a total of ten times.

The second step is to use the state counts cs(ai, q) computed in the first step and the action probability mass function µs to compute the likelihood factor for each machine. In our example, we can rewrite the likelihood factor Ls(ms; a) from equation (3) as
\[
L_s(m_s; a) = \prod_{q=1}^{2} \prod_{a_i \in \{c,d\}} [\mu_s(a_i; q)]^{c_s(a_i, q)}.
\]
Using the information in Table 2, we compute the likelihood factor for machine 1 as
\[
L_1(m_1; a) = \mu_1(c; 1)^{c_1(c,1)}\, \mu_1(d; 1)^{c_1(d,1)} \cdot \mu_1(c; 2)^{c_1(c,2)}\, \mu_1(d; 2)^{c_1(d,2)}
= (0.85)^{10} (0.15)^{1} \cdot (0.15)^{1} (0.85)^{5} \approx 1.96547 \times 10^{-3}.
\]
For machine 2, we compute L2(m2; a) ≈ 5.93609 × 10^{-8}. We see that the actions of player 1 are much more consistent with an 85% Tit-For-Tat machine than those of player 2.

In the general case, we can write the likelihood factor for machine s as
\[
L_s(m_s; a) = \prod_{q \in Q_s} \prod_{a_i \in A_{i_s}} [\mu_s(a_i; q)]^{c_s(a_i, q; \lambda_s)},
\tag{4}
\]
where the action counts cs(ai, q; λs), giving the number of times the machine ms plays action ai in machine state q, are given by
\[
c_s(a_i, q; \lambda) = \sum_{\sigma \in \Sigma_s} \sum_{t=1}^{T_\sigma}
\delta_{q,\, q(a_{1\sigma}, \ldots, a_{t-1,\sigma}; \lambda)}\; \delta_{a_i,\, a_{it\sigma}},
\qquad a_i \in A_{i_s},\; q \in Q_s.
\tag{5}
\]
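Before turning to the prior, it may help to make this two-step computation concrete. The sketch below (ours, not the authors' software; the helper function and variable names are illustrative assumptions) tracks the 85% Tit-For-Tat skeleton through subject 1's two supergames in Table 1, accumulates the action counts of Table 2, and evaluates the likelihood factor (4).

```python
import math

# Deterministic skeleton of the 85% Tit-For-Tat machine: the next state depends
# only on the opponent's most recent action (C -> state 1, D -> state 2).
def tft_next_state(state, own_action, opp_action):
    return 1 if opp_action.upper() == 'C' else 2

# State-dependent action probabilities mu(.; q) for the machine of Section 2.
MU = {1: {'c': 0.85, 'd': 0.15}, 2: {'c': 0.15, 'd': 0.85}}

# Subject 1's two supergames from Table 1, as (own action, opponent action) pairs.
# Actions are lower-cased so that 'c'/'d' index the same machine in both player roles.
supergames = [
    list(zip('ccdccddd', 'CDCCDCDD')),    # supergame 1: subject 1 plays as player 1
    list(zip('CDCCCDCCC', 'dccddcccc')),  # supergame 2: subject 1 plays as player 2
]

# Step 1: track states and accumulate the action counts c_1(a_i, q).
counts = {(q, a): 0 for q in (1, 2) for a in ('c', 'd')}
for game in supergames:
    state = 1                             # initial state q0 = 1
    for own, opp in game:
        counts[(state, own.lower())] += 1
        state = tft_next_state(state, own, opp)
print(counts)  # state 1: 10 c, 1 d; state 2: 1 c, 5 d, as in Table 2

# Step 2: evaluate the likelihood factor (4) from the counts.
log_lik = sum(n * math.log(MU[q][a]) for (q, a), n in counts.items())
print(math.exp(log_lik))  # approximately 1.97e-3, matching L_1(m_1; a) above
```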
4 Identification and Prior Distributions

In this section, we complete the model by providing a prior distribution over m, the vector of all machines. But first we need to address two identification issues arising over the initial state q° and state transition function λ. First, any permutation of states gives an observationally equivalent combination of initial state and transition function. Second, if some states are unreachable from the initial state, the transition function is observationally equivalent to a transition function on a smaller state set, one that excludes the unreachable states. For convenience and without loss of generality, we assume hereafter that A = {1, . . . , |A|} and Q = {1, . . . , |Q|}.

We resolve these identification issues by selecting one combination (q°, λ) of initial state and transition function from each equivalence class to represent the class. We will assign positive prior probability to the chosen combination, and zero probability to its observationally equivalent alternatives. Specifically, we choose the (q°, λ) such that q° is State 1, all states are reachable, and λ satisfies the following state ordering condition: if we list the function values λ(a, q) in q-then-a lexicographic order, then the first appearances of the non-initial states (2, . . . , |Q|) occur in ascending order. In other terms, we choose the q° and λ that satisfy the following three conditions.

1. (order of initial state) q° = 1;
2. (no unreachable states) for every a ∈ A and every q ∈ Q \ {q°}, there exists a q′ ∈ Q and a′ ∈ A such that q′ < q and λ(a′, q′) = q;
3. (order of non-initial states) for all (a, q) ∈ A × Q, there exists a (a′, q′) ∈ A × Q such that
   (a) either q′ < q, or both q′ = q and a′ < a, and
   (b) λ(a′, q′) = max(λ(a, q) − 1, 1).

Having effectively set q° equal to one, we suppress notation for initial states for the rest of the paper. We will call a state transition function λ regular if it satisfies conditions 2 and 3, and denote by Λ(A, Q) the set of regular state transition functions on A × Q.

4.1 Prior Distributions

We complete the model by providing a prior distribution over all subjects' machines. Machines are a priori independent and identically distributed. The prior distribution of a machine m = (Q, λ, µ) has the following conditional independence structure:
\[
f(m) = f(Q, \lambda, \mu) = f(Q) \cdot f(\lambda \mid Q) \cdot f(\mu \mid Q).
\tag{6}
\]
The three right-hand-side factors are addressed in the following three sections.

Distribution Q. We leave the prior on the number of machine states unspecified, and therefore up to the researcher to choose. For j ∈ N, we denote by θj the prior probability that the number of states |Q| is j. Thus the probability mass function for Q is
\[
f(Q) = \theta_{|Q|}.
\tag{7}
\]
We recommend choosing a prior that falls off quickly. This favors machine parsimony and computational simplicity. For the empirical examples in the paper, we take a Poisson distribution with mean 1 for |Q| − 1, and truncate the distribution of |Q| to the set {1, 2, 3, 4, 5}.

Distribution λ|Q. Given Q, the state transition function λ has a uniform distribution over the set Λ(A, Q) of regular transition functions. Thus the probability mass function for λ|Q is
\[
f(\lambda \mid Q) = \frac{1}{n(|A|, |Q|)},
\tag{8}
\]
where n(|A|, |Q|) is the number of regular state transition functions λ : A × Q → Q. For posterior simulation, we need to be able to compute n(|A|, |Q|) and we show how to do this in Appendix A.

We use a uniform prior because it is neutral with respect to state transitions: all machine skeletons with the same number of states have the same prior probability. In the conclusions, we discuss extensions where the prior favors some machines over others, either for theoretical plausibility or for structural parsimony.

Distribution µ|Q. Recall that for each machine state q ∈ Q, µ(·; q) is a probability mass function on the finite action set Ai. It can therefore be represented by a vector in the |Ai|-dimensional unit simplex. We choose a prior such that given Q, the µ(·; q), q ∈ Q, are independent and identically distributed, and each has a Dirichlet distribution. Dirichlet distributions are commonly used for distributions on simplexes, and in particular, for distributions over distributions over finite sets.
Dirichlet distributions on the |Ai|-dimensional simplex are indexed by |Ai| free parameters. In our context, there is a parameter for each possible action. Setting all parameters equal to a single free parameter ν imposes a kind of symmetry over actions, specifically the exchangeability of action probabilities. This is a desirable feature if one wishes to be neutral with respect to different actions. In the context of our Prisoners' Dilemma example, exchangeability means that the probability of cooperating and the probability of defecting have the same prior distribution, or equivalently, that the distribution of the cooperation probability is symmetric around 0.5.

For ν = 1, the Dirichlet distribution with all parameters equal to ν is the uniform distribution on the |Ai|-dimensional simplex. Setting ν less than 1 favors action probabilities near 0 or 1. In our Prisoners' Dilemma example, |Ai| = 2. Figure 2 shows the density of the cooperation probability for ν = 0.6. Its symmetry around 0.5 implies that it is also the density for the defection probability. We see that probabilities near 0 or 1 are favored. The concentration of prior probability near 0 or 1 is desirable if one wants to express the prior information that the machine is close to some deterministic machine, without taking a stand on which should be the high probability action in each machine state. Low values of ν impose a kind of penalty for deviations from the deterministic machine, and the penalty increases as ν approaches zero. This is probably reasonable for cases, like our Prisoners' Dilemma example, where there are no mixed strategy equilibria for the stage game. For other cases, such as repeated matching pennies, one might prefer a uniform distribution over probabilities (ν = 1) or even a distribution with a single mode at 1/2 (ν > 1).

Figure 2: Density of the Di(0.6, 0.6) distribution. [Plot of the density on [0, 1], with modes near 0 and 1.]

We will see in the next section that the choice of the Dirichlet distribution for action probabilities allows us to integrate out µ from the posterior distribution, an enormous computational advantage. Given Q, the action probability mass functions µ(·; q) are i.i.d. Dirichlet with all parameters equal to ν. Thus the density for µ|Q is
\[
f(\mu \mid Q) = \prod_{q \in Q} f\bigl((\mu(a_i; q))_{a_i \in A_i} \mid Q\bigr)
= \prod_{q \in Q} \left\{ \frac{\Gamma(\nu |A_i|)}{[\Gamma(\nu)]^{|A_i|}} \prod_{a_i \in A_i} [\mu(a_i; q)]^{\nu - 1} \right\}.
\tag{9}
\]

5 Integrating out Action Probabilities

In Section 3, we specified a data generating process; that is, a conditional distribution of actions in experiments given subjects' machines. When we interpreted the expression for the density f(a|m) as a likelihood function, a function of m for fixed observed actions a, we denoted it L(m; a). We also factored L into likelihood factors Ls(ms; a). Each factor Ls measures the degree of support that the data gives to the machine ms. In Section 4, we specified a prior (or marginal) density f(m) over subjects' machines. The a priori independence of these machines implies that f(m) factors by machine. Our model is now complete, and gives the joint distribution of machines and actions.

The fact that both prior and likelihood factor by machine implies that the posterior density f(m|a) does as well. In principle, we have everything we need to do Bayesian inference for unknown machines given observed actions. It would be possible to proceed directly to developing methods to simulate the posterior distribution, the conditional distribution of machines given observed actions.
As we will see, however, a little analytic integration goes a long way. First, it allows us to compute integrated likelihood factors that measure the degree of support that the data give to a machine skeleton. This allows us to perform the statistical tests we report in Section 8. It also makes numerically efficient posterior simulation possible.

In this section we show how to analytically integrate out the action probability mass function µs from the posterior distribution of machine ms = (Qs, λs, µs), thereby obtaining the marginal posterior distribution of machine skeleton m̄s = (Qs, λs). We will obtain an expression for f(a|m̄), where m̄ ≡ (m̄s)s∈S is the vector of all subjects' machine skeletons. We can interpret this as an integrated likelihood function L̄(m̄; a). We show that the integrated likelihood function L̄, like the likelihood L, factors by subject. Each integrated likelihood factor L̄s(m̄s; a) gives the degree of support that the data gives the machine skeleton m̄s. Multiplying the integrated likelihood factor L̄s(m̄s; a) by the prior factor f(m̄s), we obtain a function proportional to the marginal posterior density f(m̄s|a), which we can use to simulate this distribution. We also derive analytic expressions for f(µs|m̄s, a), which gives the conditional posterior distribution of action probabilities given machine skeletons. These are members of the well known Dirichlet family of distributions and we can draw variates directly from them using standard methods.

Deriving these expressions serves two purposes. First, the closed form for L̄s(m̄s; a), a measure of the support that the data a give to the machine skeleton m̄s, allows us easily to test the hypothesis that a subject uses a particular machine skeleton, such as the Tit-For-Tat skeleton illustrated in Figure 1. Section 8 gives several examples of such tests. Second, it allows us to simulate the posterior distribution of machine ms with high numerical efficiency. This is because we can simulate the marginal posterior distribution of the machine skeleton m̄s using Markov Chain Monte Carlo methods and then draw action probabilities µs from their conditional posterior distribution given machine skeleton m̄s. The high numerical efficiency is relative to a more obvious Gibbs sampling approach involving the simulation of the two conditional distributions Qs, λs|µs, a and µs|Qs, λs, a. The Gibbs sampling approach is numerically inefficient because of the strong a posteriori statistical dependence of λs and µs.

We first show a simple example of how to integrate out action probabilities using well known properties of the Dirichlet distribution. We refer the reader unfamiliar with these properties to Bernardo and Smith (1994, p. 436) for details. Let pC and pD be the probabilities that a machine cooperates or defects, respectively, in a given state. Suppose that (pC, pD) has the Dirichlet distribution Di(α, β) with parameters α > 0 and β > 0. (Dirichlet distributions on 2-dimensional simplexes are usually known as Beta distributions; we use the more general name here because we use it throughout the paper.) Then the marginal (not conditional on (pC, pD)) probability that the machine cooperates nc times and defects nd times while in that state is given by
\[
\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \cdot
\frac{\Gamma(\alpha + n_c)\,\Gamma(\beta + n_d)}{\Gamma(\alpha + \beta + n_c + n_d)}.
\]
The posterior distribution of (pC, pD) given observed values of nc and nd is the distribution Di(α + nc, β + nd).
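As a small numerical aside (ours, not part of the paper), the marginal probability above is easy to evaluate with log-gamma functions; with α = β = 0.6 and the state-1 counts of machine 1 from Table 2, it reproduces the value used in the example that follows.

```python
from math import lgamma, exp

def log_marginal(n_c, n_d, alpha, beta):
    """Log marginal probability of a particular sequence with n_c cooperations and
    n_d defections in one state, under a Di(alpha, beta) prior on (p_C, p_D)."""
    return (lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
            + lgamma(alpha + n_c) + lgamma(beta + n_d)
            - lgamma(alpha + beta + n_c + n_d))

# Machine 1, state 1 in Table 2: 10 cooperations and 1 defection, prior Di(0.6, 0.6).
print(exp(log_marginal(10, 1, 0.6, 0.6)))  # approximately 0.0081, as in the text
```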
We now return to our artificial data example and particularly the derived action counts in Table 2. If we specify a prior distribution Di(0.6, 0.6) for the action probabilities (µ1(C; 1), µ1(D; 1)) of machine 1 in state 1, the probability of the sequence of observed actions for machine 1 in state 1 is
\[
\frac{\Gamma(0.6 + 0.6)}{\Gamma(0.6)\,\Gamma(0.6)} \cdot
\frac{\Gamma(10 + 0.6)\,\Gamma(1 + 0.6)}{\Gamma(10 + 1 + 0.6 + 0.6)} \approx 0.0081025.
\]
The conditional distribution of the action probabilities (µ1(C; 1), µ1(D; 1)), given the data and the Tit-For-Tat skeleton, is Di(10.6, 1.6). The conditional density of µ1(C; 1), the probability of cooperating, is plotted in Figure 3. After observing the machine cooperate ten times and defect once while in state 1, there is much less uncertainty about the probability of cooperating while in state 1, and much more posterior probability on high values of the cooperation probability in state 1.

Figure 3: Density of the Di(10.6, 1.6) distribution. [Plot of the density on [0, 1], concentrated near 1.]

In a similar way, we can compute an integrated likelihood function L̄(m̄1, m̄2; a), defined as the probability that fixed machine skeletons m̄1 and m̄2 generate actions a. The steps are as follows. We first multiply the prior density f(µ1) f(µ2) by the likelihood factor L(m1, m2; a), where mi, i = 1, 2, is the machine with skeleton m̄i and action probability mass function µi. Since L(m1, m2; a) is the probability of a given µ1, µ2 and fixed m̄i, the product gives the joint distribution of µ1, µ2 and a. Integrating out µ1 and µ2 gives the desired marginal probability as
\[
\bar{L}(\bar{m}_1, \bar{m}_2; a) = \prod_{s=1}^{2} \prod_{q=1}^{2}
\frac{\Gamma(2\nu)}{[\Gamma(\nu)]^2} \cdot
\frac{\prod_{a_i \in \{c,d\}} \Gamma(\nu + c_s(a_i, q))}{\Gamma\bigl(2\nu + \sum_{a_i \in \{c,d\}} c_s(a_i, q)\bigr)}.
\]
Just as the likelihood function L(m1, m2; a) measures how well the data support various combinations (m1, m2) of two machines, the integrated likelihood function measures how well the data support various combinations m̄1 = (Q1, λ1) and m̄2 = (Q2, λ2) of machine skeletons.

We note that the integrated likelihood function also factors by machine. We define the integrated likelihood factor for machine s as
\[
\bar{L}_s(\bar{m}_s; a) \equiv \prod_{q=1}^{2}
\frac{\Gamma(2\nu)}{[\Gamma(\nu)]^2} \cdot
\frac{\prod_{a_i \in \{c,d\}} \Gamma(\nu + c_s(a_i, q))}{\Gamma\bigl(2\nu + \sum_{a_i \in \{c,d\}} c_s(a_i, q)\bigr)}.
\]
We compute the integrated likelihood factor for player 1 as
\[
\bar{L}_1(\bar{m}_1; a) = \frac{\Gamma(1.2)}{[\Gamma(0.6)]^2} \frac{\Gamma(10.6)\,\Gamma(1.6)}{\Gamma(12.2)}
\cdot \frac{\Gamma(1.2)}{[\Gamma(0.6)]^2} \frac{\Gamma(1.6)\,\Gamma(5.6)}{\Gamma(7.2)}
\approx 1.7566 \times 10^{-4},
\]
and the integrated likelihood factor for player 2 as L̄2(m̄2; a) ≈ 9.4698 × 10^{-7}. We see that the actions of player 1 are much more consistent with a Tit-For-Tat machine skeleton than those of player 2. This is similar to the earlier result in Section 3 that the actions of player 1 are much more consistent with the 85% Tit-For-Tat machine than those of player 2. We emphasize that the conclusion based on the comparison of integrated likelihood factors is not dependent on any particular choice of µ1 and µ2. It averages over various µ1 and µ2, favoring action probabilities near 0 or 1.

In the general case, we can write the integrated likelihood factor for machine s as
\[
\bar{L}_s(\bar{m}_s; a) = \frac{[\Gamma(\nu |A_i|)]^{|Q_s|}}{[\Gamma(\nu)]^{|A_i| |Q_s|}} \cdot
\prod_{q \in Q_s}
\frac{\prod_{a_i \in A_i} \Gamma(\nu + c_s(a_i, q; \lambda_s))}{\Gamma\bigl(\sum_{a_i \in A_i} (\nu + c_s(a_i, q; \lambda_s))\bigr)},
\]
where i is the player type played by subject s. Recall that the action counts cs(ai, q; λ) are given by (5).

6 Bayesian Inference for Machines

Here we consider the problem of Bayesian inference for machines. We have seen that the prior and likelihood, and therefore the posterior, factor by subject.
Thus the subjects' machines are a posteriori independent, and we can do inference for each machine separately. We therefore focus on the problem of inference for the machine m = (Q, λ, µ) used by a particular subject s. To simplify the subsequent exposition, we will suppress the index s where possible.

We implement Bayesian inference by simulating the posterior distribution of unknown objects given observed data. Here, the relevant unknown object is the machine m. Posterior simulation delivers a large sample {m(j)}, j = 1, . . . , M, of machines representing the posterior distribution ms|a. Reporting features of the posterior distribution using these samples is conceptually straightforward. For example, we estimate the posterior probability that the machine has three machine states by counting the number of machines m(j) with three machine states and dividing by M. More generally, we estimate posterior population moments by computing the corresponding sample moments. We choose M to control the numerical error of the estimates.

The ability to analytically integrate out action probabilities allows us to draw a posterior sample from the distribution ms|a using the following two steps. First, we draw a posterior sample {m̄(j)}, j = 1, . . . , M, of machine skeletons from the distribution m̄s|a and then, for each skeleton m̄(j), we draw the action probability mass functions µ(j) from the distribution µs|m̄s = m̄(j), a. (In practice, we interleave the two steps, but the result is the same.)

We draw the sample {m̄(j)}, j = 1, . . . , M, of machine skeletons using a Markov chain with stationary (or invariant) distribution m̄s|a. At each step j, we take as input a machine skeleton m̄(j) and apply a random update procedure to generate a new machine skeleton m̄(j+1). The update procedure defines a Markov transition distribution m̄(j+1)|m̄(j). Our update procedure consists of a chain of simpler update procedures, which we will call blocks. There are three blocks, denoted BQ,λ, B′Q,λ, and Bλ, and we describe them below. The blocks have the special property that they preserve the distribution m̄s|a. That is, if we apply one of the blocks to an input machine skeleton having the distribution m̄s|a, then the random output machine skeleton has the distribution m̄s|a as well. The blocks can be chained together in any order, possibly with repetition, and the compound update procedure will also preserve the distribution m̄s|a. Thus the compound update procedure has the posterior distribution m̄s|a as its stationary distribution.

Drawing µ from the conditional posterior distribution µs|Qs, λs, a is easy, as we will show that this distribution is Dirichlet. We draw Dirichlet variates using standard methods.

In the rest of Section 6 we show how to decompose the posterior distribution of machines into a marginal posterior distribution of machine skeletons and a conditional posterior distribution of action probability mass functions. We then briefly describe the update blocks we use to simulate the posterior distribution of machine skeletons. Finally, we show how to compute factors of the marginal likelihood, the evaluation of the marginal mass function f(a) at the observed value of the action vector a. The marginal likelihood factors are important for model comparison using Bayes factors.

6.1 Decomposing the Posterior Distribution

We first integrate out the action probability mass function µ from the posterior Q, λ, µ|a to derive an unnormalized probability mass function for the skeleton Q, λ|a.
We also derive the density function for the distribution µ|Q, λ, a. These give us a decomposition of the joint posterior distribution Q, λ, µ|a into the marginal posterior distribution Q, λ|a and the conditional posterior distribution µ|Q, λ, a. This decomposition is extremely useful. It allows us to simulate, using MCMC methods, the posterior distribution of the machine skeleton, rather than the posterior distribution of the machine itself. We can, and do, still obtain draws of machines, by drawing action probability mass functions directly from µ|Q, λ, a.

We noted in Section 3 that the likelihood function factored by player and subject. Since the prior also factors in this way, the posterior does too. Bayes' rule then tells us that the posterior distribution of a subject's machine is proportional to the product of its prior distribution and the likelihood factor attributable to that player and subject:
\[
f(Q, \lambda, \mu \mid a) \propto f(Q, \lambda, \mu) \cdot L(Q, \lambda, \mu; a).
\tag{10}
\]
It is convenient to express the likelihood factor L(Q, λ, µ; a) in terms of action counts. The action count c(ai, q; λ) gives the number of times the machine (Q, λ, µ) plays action ai in machine state q. That is,
\[
c(a_i, q; \lambda) = \sum_{\sigma \in \Sigma_s} \sum_{t=1}^{T_\sigma}
\delta_{q,\, q(a_{1\sigma}, \ldots, a_{t-1,\sigma}; \lambda)}\; \delta_{a_i,\, a_{it\sigma}},
\qquad a_i \in A_i,\; q \in Q,
\tag{11}
\]
where δ denotes the Kronecker delta function. In terms of action counts, the likelihood factor L(Q, λ, µ; a) becomes
\[
L(Q, \lambda, \mu; a) = \prod_{q \in Q} \prod_{a_i \in A_i} [\mu(a_i; q)]^{c(a_i, q; \lambda)}.
\tag{12}
\]
Combining equations (6), (9), (10) and (12), we obtain the posterior density
\[
f(Q, \lambda, \mu \mid a) \propto f(Q) \cdot f(\lambda \mid Q) \cdot
\frac{[\Gamma(\nu |A_i|)]^{|Q|}}{[\Gamma(\nu)]^{|A_i| |Q|}}
\prod_{q \in Q} \prod_{a_i \in A_i} [\mu(a_i; q)]^{\nu + c(a_i, q; \lambda) - 1}.
\tag{13}
\]
We know that f(µ|Q, λ, a) is proportional (in µ) to f(Q, λ, µ|a) by Bayes' rule, and therefore it is proportional in turn to the expression on the right hand side of equation (13). Taking this expression as a function of µ, we recognize it as proportional to a product (over Q) of Dirichlet densities with parameters (ν + c(ai, q; λ))ai∈Ai. Since it must integrate to one, f(µ|Q, λ, a) is equal to this product of Dirichlet densities:
\[
f(\mu \mid Q, \lambda, a) = \prod_{q \in Q} \left[
\frac{\Gamma\bigl(\sum_{a_i \in A_i} (\nu + c(a_i, q; \lambda))\bigr)}{\prod_{a_i \in A_i} \Gamma(\nu + c(a_i, q; \lambda))}
\prod_{a_i \in A_i} [\mu(a_i; q)]^{\nu + c(a_i, q; \lambda) - 1} \right].
\tag{14}
\]
We see that the µ(·; q), q ∈ Q, are conditionally independent and Dirichlet distributed, given Q, λ and a. Since f(Q, λ, µ|a) = f(Q, λ|a) · f(µ|Q, λ, a), we can divide (13) by (14) to obtain the following unnormalized probability mass function for the marginal posterior distribution Q, λ|a:
\[
f(Q, \lambda \mid a) \propto f(Q) \cdot f(\lambda \mid Q) \cdot
\frac{[\Gamma(\nu |A_i|)]^{|Q|}}{[\Gamma(\nu)]^{|A_i| |Q|}}
\prod_{q \in Q}
\frac{\prod_{a_i \in A_i} \Gamma(\nu + c(a_i, q; \lambda))}{\Gamma\bigl(\sum_{a_i \in A_i} (\nu + c(a_i, q; \lambda))\bigr)}.
\tag{15}
\]

6.2 Machine Skeleton Update Blocks

Here, we briefly describe the three update blocks that we chain together to construct our compound update block. Each block takes as input a machine skeleton (Q, λ) and randomly generates a machine skeleton (Q′, λ′). All three blocks preserve the marginal posterior distribution of the skeleton and all are examples of Metropolis-Hastings updates. For more information on Metropolis-Hastings updates, we direct the reader to Geweke (2005) or Koop (2003). For details on these parameter blocks, we refer the reader to Appendix B.

Block BQ,λ draws a proposal Q∗ from a distribution tailored to the posterior distribution Q|a, then a proposal λ∗ from the prior distribution λ|Q. The proposal is randomly accepted, in which case Q′ = Q∗ and λ′ = λ∗; or rejected, in which case Q′ = Q and λ′ = λ.
Block B′Q,λ draws the proposal Q∗ from the same posterior-tailored distribution, but draws the proposal λ∗ from a distribution that more closely resembles the posterior distribution Q, λ|a. The acceptance probability is higher, but so is the computational cost.

Block Bλ does nothing to Q. That is, Q′ = Q. It generates a random proposal λ∗ by applying a random mutation to λ. If the proposal is accepted, λ′ = λ∗. Otherwise, λ′ = λ.

6.3 Computing the Marginal Likelihood

We use the method of Gelfand and Dey (1994) to compute the marginal likelihood, the evaluation of the marginal probability mass function f(a) at the observed vector of all actions. The marginal likelihood is a standard Bayesian model evaluation criterion, and has two important interpretations. First, it is the out-of-sample prediction record of the model, for data a. The greater the value, the better the out-of-sample prediction record. Second, it gives the weight of the model's posterior probability in a compound model consisting of a mixture of models with equal prior probability.

To compute f(a), we first factor it by player and subject. The marginal likelihood factor attributable to player i, subject s is
\[
\psi_{is} \equiv \prod_{\sigma \in \Sigma_s} \prod_{t=1}^{T_\sigma} f(a_{it\sigma} \mid a_{1\sigma}, \ldots, a_{t-1,\sigma}).
\]
We compute the ψis using the Gelfand and Dey (1994) procedure. The procedure requires the specification of a normalized distribution approximating the posterior distribution of unknown parameters. We describe our implementation in Appendix C.

7 Data

We use laboratory data on the repeated Prisoners' Dilemma described in Engle-Warnick (2007). In the stage game, the punishment (mutual defection) payoff is 60, the temptation payoff (play d when opponent plays C) is 180, the sucker payoff (play c when opponent plays D) is 0, and the reward payoff (play c when opponent plays C) is 90. Payoff units are Canadian cents. A supergame is an indefinitely repeated Prisoner's Dilemma stage game, with a continuation probability of 0.8. There are twenty rounds, in each of which all subjects are randomly and anonymously paired to play a supergame. In each round, all pairs play a supergame of the same random duration. Across rounds, durations are statistically independent. Details on experimental design and procedures, as well as some data analysis, can be found in Engle-Warnick (2007), which refers to the stage game as "Hawk-Dove." We use the second session of data, the first of three sessions with twelve subjects.

8 Results

We have applied our methods to the data described in Section 7, generating a posterior sample of 100,000 machines for each of the twelve experimental subjects. We will first report some high posterior probability machine skeletons. Although we have a diffuse prior over a vast number of possible machine skeletons and a limited amount of data for each subject, we manage to discern some high probability machine skeletons in the posterior distribution. For most subjects, however, the posterior distribution scatters most of the posterior probability over a large number of individually improbable machine skeletons. We will see later in the section, however, that the summary statistics provide useful insights into these posterior distributions.

There are many possible summary statistics to report, and different researchers will have different ideas about what features of the posterior distribution are important. We hope the many examples we offer in this section are interesting in and of themselves.
In their variety, they also suggest the rich assortment of possibilities. We classify our results as examples of estimation, prediction and testing, corresponding to three major themes in statistics and econometrics. We caution the reader against attaching much significance to the results for subjects 7 and 10, since there is strong evidence that these subjects do not play the same strategy throughout the sample used for inference.

8.1 High Posterior Probability Machine Skeletons

We now report, for each subject, any machine skeletons with high posterior probability. No machine skeleton has posterior probability greater than 0.01 for subjects 2, 7, 9 and 10. Subjects 3 and 5 have 8 and 2 skeletons, respectively, with posterior probability between 0.01 and 0.02, and none more probable. Subjects 8 and 12 play the one-state (memoryless) machine with probabilities 0.592 and 0.704, respectively, but their action probabilities are very different. For subject 8, the conditional posterior probability of cooperating, given the one-state machine, is 0.592, while for subject 12, the conditional posterior probability of defecting is 0.991.

For subjects 1, 4, 6 and 11, there are several machine skeletons with posterior probability greater than 0.02 and these are shown in Table 3. Each column gives the value of λ evaluated at q, the current state; a1, the machine's action; and a2, the opponent's action. In Figure 4, we illustrate graphically the machine skeleton in the first row of the table. We draw attention to subject 1's third skeleton in the table and note that this is also a high posterior probability skeleton for subjects 4 and 6. It is similar to the Tit-For-Tat skeleton of Figure 1 except that the states are reversed: the initial state is that associated with an opponent's previous defection, rather than the state associated with an opponent's cooperation. Since both skeletons are equally deserving of the name Tit-For-Tat, we will call them C-Tit-For-Tat and D-Tit-For-Tat, according to the initial state. We include the C-Tit-For-Tat and D-Tit-For-Tat skeletons in Table 3 to allow comparison.

The four equally probable (within numerical standard error) subject 1 machine skeletons of Table 3 agree on six of eight transition function values. Together, they cover the four possible combinations of the other two transition function values. It is difficult to distinguish among these four machines given the play we observed during the experiment, as we did not have any observations that would allow one to discriminate among these possible transitions. In general, Table 3 shows us that the various skeletons of subjects 1, 4, 6 and 11 are similar to each other (and D-Tit-For-Tat), both within- and between-subject. Although the posterior probability of any particular skeleton is moderate, the similarity among the skeletons implies a much more likely, fairly narrow range of prototypical behavior.
Table 3: High posterior probability machine skeletons, with C-Tit-For-Tat and D-Tit-For-Tat skeletons

Subject  Pr     1,c,C  1,d,C  1,c,D  1,d,D  2,c,C  2,d,C  2,c,D  2,d,D
1        0.108    2      2      1      1      2      1      1      1
1        0.108    2      2      1      1      2      1      1      2
1        0.108    2      2      1      1      2      2      1      1
1        0.108    2      2      1      1      2      2      1      2
4        0.093    2      2      2      1      2      2      1      1
4        0.093    1      2      2      1      2      2      1      1
4        0.074    1      1      1      1      -      -      -      -
4        0.045    2      2      1      1      2      2      1      2
4        0.025    2      2      1      1      2      2      1      1
6        0.050    2      2      1      1      2      2      1      1
6        0.050    2      2      2      1      2      1      1      1
6        0.049    2      2      1      1      2      2      1      2
6        0.049    2      2      1      1      2      1      1      2
6        0.049    2      2      1      1      2      1      1      1
6        0.049    1      2      2      1      2      2      1      1
6        0.048    2      2      2      1      2      2      1      1
11       0.060    2      2      2      1      2      2      1      1
11       0.060    2      2      2      1      2      1      1      1
TFT-C             1      1      2      2      1      1      2      2
TFT-D             2      2      1      1      2      2      1      1

Figure 4: A high probability machine skeleton for subject 1. [State diagram: from State 1, profiles (c, C) and (d, C) lead to State 2, while (c, D) and (d, D) lead back to State 1; from State 2, (c, C) leads back to State 2, while (d, C), (c, D) and (d, D) lead to State 1.]

8.2 Estimation

We first report the posterior distribution of a simple measure of the complexity of a machine strategy, the number of states. More sophisticated measures of machine complexity are certainly possible, perhaps taking into account how many distinct next states are possible in various current states.

We can easily estimate the posterior probability that a subject's machine has |Q| states. We simply count the number of |Q|-state machines in the posterior sample, and divide by the total number of posterior draws. We can make the numerical standard errors of these estimates as low as we like by setting the number of posterior draws. Table 4 gives the discrete posterior probability distribution of the number of states for all twelve subjects. It also gives the prior distribution to allow comparisons. All numerical standard errors are less than 0.01.

Table 4: Posterior distributions of complexity, by subject

Subject  |Q|=1  |Q|=2  |Q|=3  |Q|=4  |Q|=5
(prior)  0.369  0.369  0.185  0.062  0.015
1        0.000  0.451  0.426  0.106  0.017
2        0.000  0.181  0.369  0.342  0.109
3        0.000  0.419  0.416  0.134  0.030
4        0.074  0.551  0.277  0.081  0.017
5        0.001  0.360  0.359  0.202  0.078
6        0.000  0.782  0.180  0.033  0.005
7        0.000  0.250  0.392  0.277  0.081
8        0.592  0.261  0.105  0.035  0.008
9        0.000  0.674  0.252  0.062  0.012
10       0.000  0.054  0.652  0.242  0.052
11       0.000  0.612  0.292  0.080  0.015
12       0.704  0.228  0.055  0.011  0.002

We note several features of this distribution. For nine out of twelve subjects, the posterior probability of the one state machine is exceedingly low. This is clear evidence that these subjects react to previous play. We see some variation in complexity. Smallest 70%-high posterior probability sets for the 12 subjects are (in order) {2, 3}, {3, 4}, {2, 3}, {2, 3}, {2, 3}, {2}, {2, 3, 4}, {1, 2}, {2, 3}, {3, 4}, {2, 3}, and {1}. We see that the posterior distributions assign quite high probability to fairly simple strategies. Only for subjects 2, 5, 7 and 10 is the posterior probability of a four or five state machine greater than 0.25. Revealingly, three of these four subjects are also the subjects for whom there is strong (7 and 10) or moderate (2) evidence of a change in strategy over time.

We can approximate other moments of the posterior distribution in the same way: an estimate of the population moment is the corresponding sample moment. Our second estimation exercise looks at a machine's initial probability p0 of cooperating, and the minimal and maximal probabilities pmin and pmax of cooperating. The maxima and minima are taken over all machine states. Table 5 reports posterior moments relating to pmin. The first two columns show the mean and standard deviation of this quantity.
The third column shows the posterior probability that the machine's probability of cooperating in the initial state is minimal. The fourth and fifth columns show the mean and standard deviation of p0 − pmin, the maximal decrease in the probability of cooperation that can be induced by the actions of an opponent. Table 6 reports analogous posterior moments relating to pmax.

Table 5: Posterior moments involving minimum cooperation probabilities, by subject

Subject  E[pmin]  std[pmin]  Pr[pmin = p0]  E[p0 − pmin]  std[p0 − pmin]
1        0.049    0.036      0.969          0.001         0.010
2        0.177    0.132      0.021          0.645         0.185
3        0.067    0.049      0.953          0.003         0.016
4        0.073    0.044      0.870          0.011         0.039
5        0.077    0.067      0.457          0.153         0.176
6        0.011    0.013      0.984          0.000         0.005
7        0.052    0.050      0.146          0.369         0.209
8        0.531    0.122      0.783          0.048         0.118
9        0.178    0.087      0.000          0.807         0.089
10       0.049    0.043      0.015          0.486         0.153
11       0.104    0.050      0.856          0.025         0.073
12       0.009    0.011      0.946          0.001         0.009

Table 6: Posterior moments involving maximum cooperation probabilities, by subject

Subject  E[pmax]  std[pmax]  Pr[pmax = p0]  E[pmax − p0]  std[pmax − p0]
1        0.978    0.029      0.000          0.927         0.047
2        0.948    0.048      0.454          0.126         0.147
3        0.835    0.109      0.000          0.765         0.123
4        0.532    0.215      0.092          0.447         0.231
5        0.723    0.155      0.107          0.493         0.231
6        0.875    0.092      0.000          0.864         0.094
7        0.837    0.159      0.236          0.416         0.289
8        0.639    0.106      0.723          0.060         0.122
9        0.985    0.018      0.980          0.001         0.007
10       0.961    0.057      0.056          0.427         0.168
11       0.967    0.041      0.000          0.839         0.080
12       0.124    0.264      0.745          0.114         0.264

For most subjects, there is a large difference between E[pmax] and E[pmin], indicating a large degree of reactivity to opponents' actions. The two subjects with low values of E[pmax] − E[pmin], 8 and 12, are the same two subjects whose machines have high posterior probability of having only one state. Thus two results, one pertaining to the machine skeleton and the other pertaining to action probabilities, both suggest that these two subjects react weakly to the opponents' actions.

Even among the subjects who react strongly to previous play, there is considerable heterogeneity. We see this clearly in the statistics involving p0, the probability of cooperating in the initial state. Some subjects are wary, starting a supergame with a low probability of cooperation but apparently prepared to greatly increase this probability in reaction to suitable previous play. Others are trusting, starting a supergame with a high probability of cooperation. Subjects 1 and 9 are clear examples. Both react strongly to previous play, but while the probability that subject 1 begins in the state with minimal cooperation probability is 0.969, the probability that subject 9 begins in the state with maximal cooperation probability is 0.980. Figure 5 gives an idea of the whole distribution of p0, pmin, and pmax for these two subjects. The figure shows posterior scatterplots of both pmin versus pmax (left) and p0 − pmin versus pmax − p0 (right). The left panels show how both subjects are similar: the cooperation probabilities vary greatly over states. The right panels show how they are different: subject 1 is quite wary in the first state, while subject 9 is quite trusting.

Figure 5: Posterior scatterplots of pmin versus pmax (left) and p0 − pmin versus pmax − p0 (right) for subjects 1 (top) and 9 (bottom).

8.3 Prediction

Using our posterior samples of machines, we predict mean supergame payoffs for all possible pairings of subjects 1, 8, 9, and 12. We define the mean supergame payoff vector for a match between two machines as the expected value of the vector of total accumulated payoffs at the end of the supergame. The expectation is over the random supergame duration and the random actions of both machines. Since there is posterior uncertainty about machines, there is posterior uncertainty about the mean supergame payoff vector.
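As an illustration (ours, not the authors' code; the function and variable names are assumptions), the mean supergame payoff for one pair of machines can be approximated by straightforward Monte Carlo simulation over random durations and random actions, using the stage-game payoffs and the 0.8 continuation probability from Section 7. In the paper this is done across posterior draws of the machines, which is what produces the histograms in Figure 6 below.

```python
import random

# Stage-game payoffs from Section 7, in Canadian cents: (player 1, player 2).
PAYOFF = {('c', 'c'): (90, 90), ('c', 'd'): (0, 180),
          ('d', 'c'): (180, 0), ('d', 'd'): (60, 60)}

def mean_supergame_payoff(machine1, machine2, rng, n_sims=10_000, cont_prob=0.8):
    """Monte Carlo estimate of player 1's mean total payoff in a supergame."""
    total = 0.0
    for _ in range(n_sims):
        q1, q2 = 1, 1                        # both machines start in state 1
        while True:
            a1 = 'c' if rng.random() < machine1['mu'][q1] else 'd'
            a2 = 'c' if rng.random() < machine2['mu'][q2] else 'd'
            total += PAYOFF[(a1, a2)][0]     # accumulate player 1's stage payoff
            q1 = machine1['next'](q1, a1, a2)
            q2 = machine2['next'](q2, a2, a1)
            if rng.random() >= cont_prob:    # the supergame ends with probability 0.2
                break
    return total / n_sims

# Example: the 85% Tit-For-Tat machine of Section 2 (written here with state 1 as
# the state entered after the opponent cooperates) matched against a copy of itself.
tft85 = {'mu': {1: 0.85, 2: 0.15},           # probability of cooperating in each state
         'next': lambda q, own, opp: 1 if opp == 'c' else 2}
print(mean_supergame_payoff(tft85, tft85, random.Random(1)))
```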
Figure 6 gives a matrix of histograms for the posterior distribution of supergame payoffs. In rows 1 through 4, the machine playing as player 1 is that of subject 1, 8, 9 and 12, respectively. In columns 1 through 4, the machine playing as player 2 is that of subject 1, 8, 9 and 12, respectively. Each histogram gives the mean supergame payoff to player 1. Vertical lines mark the values 0, 300, 450 and 900. If a player who always defects plays a player who always cooperates, the former obtains the maximal payoff of 900 and the latter obtains the minimal payoff of 0. Two players who always defect receive an average payoff of 300; two players who always cooperate, 450.

[Figure 6: Histograms of row player mean supergame payoffs. Row and column players are 1, 8, 9 and 12.]

We have seen that subjects 1 and 9 react strongly to previous play and that they differ greatly in terms of their initial cooperation probability. Subjects 8 and 12 are the subjects for whom the posterior probability of the one-state machine skeleton is greater than 0.5. Recall, though, that their cooperation probabilities are very different.

We point out a few features of this prediction exercise. Some predictions are more uncertain than others. We have relatively good predictions of how subjects 9 and 12 would fare against replicas of themselves, on average. We have much less certainty about subject 12's average payoff against subjects 8 and 9. Among subjects 1 (reactive but wary), 9 (reactive but trusting), and 12 (frequent defector), mean payoffs are what we would expect. We have seen that subject 8 is unique among subjects in showing little reactivity to previous play, but (unlike subject 12) defecting and cooperating frequently. This may seem like a strange strategy, but we predict subject 8 to do quite well against subjects 1 and 9 on average, better than these subjects do against their replicas. Only against subject 12, who defects with high probability, does subject 8 do badly. It is possible that subject 8 is more concerned with inducing future cooperation in reactive opponents than with reacting to previous play.

8.4 Testing

We can use marginal likelihoods to compare a model with alternatives. Suppose we have two models, M1 and M2, with marginal likelihoods Pr[a|M1] and Pr[a|M2]. We can construct a compound model to express uncertainty about the identity of the model, and assign a priori probabilities Pr[M1] and Pr[M2] to represent this uncertainty.
Whether or not the compound model includes other models, the posterior odds ratio comparing the two models is given by the following expression, derived from Bayes' rule:

Pr[M1 | a] / Pr[M2 | a] = (Pr[M1] / Pr[M2]) · (Pr[a | M1] / Pr[a | M2]).

The second right-hand-side factor, the ratio of marginal likelihoods, is known as the Bayes factor. Since each marginal likelihood gives the probability of the data without reference to unknown parameters of the respective model, we can interpret the Bayes factor as giving the relative out-of-sample prediction records of the two models: a value greater than 1 implies that M1 has the better out-of-sample prediction record.

We have already seen that the marginal likelihood factors by subject, so we consider each subject separately. We will refer to the model described in Section 3 as "base." We consider five different alternatives to base. The first is a model we will call "Breakpoint," in which the subject changes machines between rounds r and r + 1, where r is uniformly distributed between 6 and 19, inclusive. We assume that actions before and actions after the breakpoint are independent. We also assume that the pre-break and post-break machines are a priori independent, with the same prior distributions as in the base model. We will interpret a comparison between the base model and this alternative as a test of machine stability across rounds.

The four remaining alternatives are models in which the machine skeleton is fixed, not stochastic. Action probabilities remain random, and their prior distribution is unchanged. We call these four models "Grim", "C-Tit-For-Tat", "D-Tit-For-Tat" and "Memoryless". Grim is a machine skeleton with two machine states: the initial state and an absorbing state that the machine enters the first time the opponent defects. Thus, Grim is the machine skeleton that corresponds to the familiar "grim trigger" strategy. We stress that Grim is a machine skeleton: action probabilities are unrestricted. In particular, there is no a priori information favoring cooperation in the initial state and defection in the absorbing state. We have already seen C-Tit-For-Tat and D-Tit-For-Tat, which both remember the opponent's previous action; they differ only in terms of the initial state. Memoryless is the unique one-state machine skeleton, and it has the property that actions do not depend on previous play in a supergame.

Table 7 shows the log Bayes factors in favor of each alternative model, for each subject. Thus, the Bayes factor in favor of Breakpoint over the base model for subject 1 is B = exp(−0.643). Suppose we have a compound model in which Breakpoint and the base model both have a priori probability 0.5. Then the prior odds ratio equals 1 and the posterior odds ratio equals the Bayes factor, so the Breakpoint model and the base model have a posteriori probability B/(B + 1) ≈ 0.345 and 1/(B + 1) ≈ 0.655, respectively. We see that for subjects 7 and 10, there is strong evidence against machine stability: if Breakpoint and base are equally probable a priori, Breakpoint is more than 1000 times more probable a posteriori. For subject 2, there is moderate evidence against machine stability, but with an odds ratio of about 16.22 in favor of a breakpoint, the base model is still plausible. For the remaining nine subjects, machine stability is quite reasonable.

We now turn to the four machine skeleton alternatives. For most combinations of subject and machine skeleton, the Bayes factor favors the base model, often decisively.

Table 7: Log Bayes factors in favor of five alternatives, by subject

Subject   Breakpoint     Grim     C-Tit-For-Tat   D-Tit-For-Tat   Memoryless
1           -0.643     -29.948       -20.546           4.246        -30.044
2            2.786      -2.986         0.661          -3.213         -9.825
3           -0.347     -13.439        -9.479           2.443        -12.022
4            0.367      -2.780        -1.214           2.769         -1.622
5           -0.582      -7.172        -3.476          -0.215         -6.431
6           -1.885     -19.668       -10.613           3.455        -19.179
7            8.306     -11.672        -8.061          -9.199        -10.818
8            0.655      -0.546        -0.199          -0.734          0.472
9            0.620     -18.496         3.914         -11.238        -22.074
10           7.877       0.475        -3.768          -8.361        -21.056
11          -0.688     -21.494       -11.558           1.527        -21.498
12          -1.608      -1.391        -1.443          -0.890          0.651
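To make the conversion from the log Bayes factors in Table 7 to posterior model probabilities concrete, here is a minimal sketch; the equal-prior-odds default mirrors the Breakpoint example above, and the function name is ours rather than the paper's:

    import numpy as np

    def posterior_prob_from_log_bf(log_bayes_factor, prior_odds=1.0):
        # posterior probability of the alternative model, given the log Bayes
        # factor in its favor and the prior odds (alternative : base)
        posterior_odds = prior_odds * np.exp(log_bayes_factor)
        return posterior_odds / (1.0 + posterior_odds)

    print(posterior_prob_from_log_bf(-0.643))   # about 0.345: subject 1, Breakpoint
    print(posterior_prob_from_log_bf(8.306))    # nearly 1: subject 7, Breakpoint

Lowering prior_odds below one, as suggested in the next paragraph for the fixed-skeleton alternatives, shifts these probabilities back toward the base model.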
Furthermore, given that there is only one base model and several possible machine skeletons to consider, it makes sense to assign prior odds in favor of any particular machine skeleton that are considerably less than one. Even so, there is an intriguing result here. The D-Tit-For-Tat machine skeleton has considerable empirical support. For four subjects (1, 3, 4, 6), the Bayes factor in favor of it over the base model exceeds 10. For another four subjects (5, 8, 11, and 12) D-Tit-For-Tat is plausible. Of the remaining four subjects, C-Tit-For-Tat is plausible for two (2 and 9), and the other two (7 and 10) are the two subjects for which machine stability is implausible in any case.

9 Conclusions

We introduced a probabilistic machine suitable for likelihood-based inference for repeated-game strategies. Making actions random and keeping state transitions deterministic brings several advantages: it is relatively parsimonious, there is a direct interpretation of machine strategies as behavioral strategies, and there is only one way in which memory of the history of play deteriorates.

We provided numerically efficient methods for inferring machine strategies given all observed play associated with a machine. The key technical insight that makes inference tractable is the recognition that action probabilities can be integrated out analytically from the posterior distribution. This allowed us to separate the problem of inference for machines into two much simpler problems: inference for machine skeletons and conditional inference for action probabilities given machine skeletons. We grant that the more obvious solution of Gibbs sampling, with blocks for state transition functions and action probabilities, also separates the problem into two simpler problems. However, the very strong posterior dependence between the transition function and the action probabilities implies very low numerical efficiency for the Gibbs approach. An additional advantage of integrating out action probabilities is that we can compare our model with alternatives that describe fixed machine skeletons, without reference to action probabilities.

Our empirical application illustrates some ways to report features of the posterior distribution of machines. While we hope these are interesting in and of themselves, we recognize that different researchers will be interested in different features of the posterior distribution. We hope, therefore, that these examples suggest the wide variety of possibilities. We learn that the strategy of most, but not all, subjects is plausibly stable after five rounds. Most subjects react strongly to previous play, but differ considerably in terms of their initial probability of cooperation. The play of most subjects is at least reasonably well captured by one of two simple machines: a stochastic version of the traditional Tit-For-Tat machine and a similar machine whose initial state is the "defect" state.
In future work, we hope to relax two assumptions that we made here to simplify an already detailed paper. The first is that subjects play according to a single machine throughout the sample used for inference. The second is that subjects' machines are a priori independent. We point out that some sort of within- or between-subject statistical dependence is necessary in order to do meaningful inference: we cannot learn very much about a machine using data from a single supergame. But the two assumptions above represent only one possible way of introducing such dependence, and it is an extreme case: they imply the strongest possible within-subject dependence and the weakest possible between-subject dependence. Fortunately, we will be able to consider much more flexible models of within- and between-subject variation, and still be able to use our methods for machine inference. We can do this using the following framework: there is a population of K machines, and a model stochastically assigns one of these machines to each subject in each supergame. Values of K less than the number of subjects ensure both within- and between-subject dependence, while still allowing for subject heterogeneity as long as K > 1. We can easily introduce additional within-subject dependence by introducing dependence between machine assignments involving the same subject but different supergames.

Other ideas for future work involve non-uniform priors over state transition functions. One possibility is to introduce a penalty for machine complexity by putting more prior probability on less complex machines. A second is to introduce a theoretical element by putting higher prior probability on machines that earn higher expected payoffs against their machine opponents.

The main goal of this work is to sensibly link the behavior we observe when thoughtful agents play games to the underlying "strategic programs" that might direct such behavior. Here we developed a statistically appropriate and computationally feasible technique for uncovering the strategic "machines" that can account for the observed behavior. This method provides a new window from which to better understand the behavior of subjects in strategic situations. In a simple repeated Prisoners' Dilemma experiment, we found that humans tended to employ fairly stable, heterogeneous strategies (with some notable overlaps), encapsulated by relatively simple machines. This ability to illuminate the strategic programs used by subjects should provide new insights into actual behavior, as well as suggest new theoretical directions. Moreover, the mere existence of strategic ghosts in these machines raises interesting questions about how presumably complex strategic reasoning processes can become embedded in such simple programs.

A Counting machines

We describe here how to compute the number n(|A|, |Q|) of maps λ : A × Q → Q that are regular. We first derive a recursive expression for the number n̄(|A|, |Q|) of maps satisfying condition 2 (no unreachable states) but not necessarily condition 3 (order of non-initial states). The total number of maps is |Q|^{|A||Q|}. The number of maps with exactly m unreachable states is

\binom{|Q|-1}{m} \cdot \bar{n}(|A|, |Q|-m) \cdot |Q|^{m|A|}.

The first factor gives the number of choices of m unreachable states out of the |Q| − 1 non-initial states. The second factor gives the number of maps on the |Q| − m reachable states for which all of those states are indeed reachable. The third factor gives the number of maps A × Q∗ → Q, where Q∗ is the set of m unreachable states.

For |Q| = 1, the total number of maps is 1, and for this map all states are reachable, so n̄(|A|, 1) = 1. For |Q| > 1, we can calculate recursively

\bar{n}(|A|, |Q|) = |Q|^{|A||Q|} - \sum_{m=1}^{|Q|-1} \binom{|Q|-1}{m} \cdot \bar{n}(|A|, |Q|-m) \cdot |Q|^{m|A|}.

We then obtain n(|A|, |Q|) by dividing by the number of permutations of the non-initial states:

n(|A|, |Q|) = \frac{\bar{n}(|A|, |Q|)}{(|Q|-1)!}.
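The recursion translates directly into code. The following minimal sketch (function names are ours, not the paper's) reproduces the counts used in Appendix C: for |A| = 4 action profiles it gives 1, 240 and 243,000 regular transition functions with one, two and three states, respectively.

    from math import comb, factorial

    def n_bar(n_profiles, n_states):
        # number of maps lambda: A x Q -> Q with no unreachable states (condition 2)
        if n_states == 1:
            return 1
        total = n_states ** (n_profiles * n_states)
        for m in range(1, n_states):
            total -= (comb(n_states - 1, m)
                      * n_bar(n_profiles, n_states - m)
                      * n_states ** (m * n_profiles))
        return total

    def n_regular(n_profiles, n_states):
        # impose the ordering of non-initial states by dividing by (|Q| - 1)!
        return n_bar(n_profiles, n_states) // factorial(n_states - 1)

    print([n_regular(4, k) for k in (1, 2, 3)])   # [1, 240, 243000]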
B Detailed Descriptions of Blocks

We describe here in detail the three stochastic update blocks that we use to update the skeleton (Q, λ). All blocks preserve the marginal posterior distribution of the skeleton. Each block takes as input a skeleton (Q, λ) and randomly generates a skeleton (Q′, λ′).

Block B_{Q,λ}

Block B_{Q,λ} is an independence Metropolis-Hastings update. The first step is to draw (Q∗, λ∗) from a proposal distribution. We draw Q∗ from the distribution with probability mass function f_P(Q∗) = ϑ_{|Q∗|}, which we can tailor to approximate the posterior distribution Q|a. In practice, we use state number counts during a burn-in period to determine the probabilities ϑ_{|Q|}. After we draw Q∗, we draw λ∗ from the (uniform) prior distribution λ|Q = Q∗.

The next step is to randomly accept or reject the proposal. Acceptance means we set Q′ = Q∗ and λ′ = λ∗; rejection means we set Q′ = Q and λ′ = λ. Thus, if (Q∗, λ∗) is rejected, the skeleton remains unchanged. In order for the block to preserve the distribution Q, λ|a, we accept the proposal with probability

\min\left\{1,\ \frac{f(Q^*,\lambda^*\mid a)}{f(Q,\lambda\mid a)} \cdot \frac{f_P(Q)\,f(\lambda\mid Q)}{f_P(Q^*)\,f(\lambda^*\mid Q^*)}\right\}.

The normalization constants for f(λ|Q), which we do not know, cancel, and we can evaluate the Hastings ratio, the second argument of the min function, as

\frac{f(Q^*,\lambda^*\mid a)}{f(Q,\lambda\mid a)} \cdot \frac{f_P(Q)\,f(\lambda\mid Q)}{f_P(Q^*)\,f(\lambda^*\mid Q^*)}
= \frac{\theta_{|Q^*|}\,\vartheta_{|Q|}}{\theta_{|Q|}\,\vartheta_{|Q^*|}}
\cdot \left[\frac{\Gamma(\nu|A_i|)}{[\Gamma(\nu)]^{|A_i|}}\right]^{|Q^*|-|Q|}
\cdot \frac{\prod_{q\in Q^*}\left[\prod_{a_i\in A_i}\Gamma(\nu+c(a_i,q;\lambda^*))\,\big/\,\Gamma\!\left(\sum_{a_i\in A_i}(\nu+c(a_i,q;\lambda^*))\right)\right]}
        {\prod_{q\in Q}\left[\prod_{a_i\in A_i}\Gamma(\nu+c(a_i,q;\lambda))\,\big/\,\Gamma\!\left(\sum_{a_i\in A_i}(\nu+c(a_i,q;\lambda))\right)\right]}.

Block B′_{λ,Q}

Block B′_{λ,Q} is also an independence Metropolis-Hastings update, but it features a proposal distribution more closely resembling the posterior distribution Q, λ|a. This increases the acceptance probability, but it takes longer to draw a proposal. Experience suggests that it is sometimes optimal to use block B_{Q,λ}, sometimes B′_{λ,Q}, and sometimes both.

Again, the first step is to draw a proposal (Q∗, λ∗). As before, we draw Q∗ from the distribution with probability mass function f_P(Q∗) = ϑ_{|Q∗|}. We construct λ∗ stochastically, one value λ∗(a, q) at a time, in q-then-a lexicographic order. Each λ∗(a, q) has a distribution over Q∗ favoring compatibility of the emerging λ∗ with the data. We measure the data compatibility of an incompletely specified transition function, with values given up to (a, q), by constructing a completion λ̄_{a,q} on an infinite state set and then evaluating

h(Q, \bar\lambda; a) \equiv \prod_{q\in\mathbb{N}} \frac{\Gamma(\nu|A_i|)}{[\Gamma(\nu)]^{|A_i|}}
\cdot \frac{\prod_{a_i\in A_i}\Gamma(\nu + c(a_i,q;\bar\lambda))}{\Gamma\!\left(\sum_{a_i\in A_i}(\nu + c(a_i,q;\bar\lambda))\right)},

which we take as our measure of data compatibility. Recalling equation (15), we recognize h(Q, λ̄; a) as the factor of f(Q, λ|a) depending on λ. We can also interpret it as the likelihood factor L(Q, λ, µ; a) with µ integrated out with respect to the prior.
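In practice a quantity like h is best evaluated on the log scale. A minimal sketch, assuming the action counts c(a_i, q; λ̄) have already been tabulated into a dictionary keyed by state (all names hypothetical):

    from math import lgamma

    def log_state_factor(counts, nu):
        # Dirichlet-multinomial marginal of one state's action counts, i.e. one
        # factor of h(Q, lambda-bar; a), with symmetric prior parameter nu
        k = len(counts)            # |A_i|, the number of own actions
        n = sum(counts)
        return (lgamma(nu * k) - k * lgamma(nu)
                + sum(lgamma(nu + c) for c in counts)
                - lgamma(nu * k + n))

    def log_h(counts_by_state, nu):
        # states that are never visited contribute a factor of one (log factor 0),
        # so summing over the tabulated states suffices
        return sum(log_state_factor(c, nu) for c in counts_by_state.values())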
The completion λ̄_{a,q} : A × ℕ → ℕ is a state transition function on an infinite state set, defined as

\bar\lambda_{a,q}(a',q') = \begin{cases}
\lambda^*(a',q') & (a',q') \le (a,q), \\
|Q^*|+1 & (a',q') = (a,q)^+, \\
\bar\lambda_{a,q}\!\left((a',q')^-\right) + 1 & (a',q') > (a,q)^+,
\end{cases}

where ≤, =, and > are derived from the q-then-a lexicographic order,² (a, q)⁺ denotes the successor of (a, q), and (a′, q′)⁻ denotes the predecessor of (a′, q′). We see that λ̄_{a,q} agrees with λ∗ for arguments up to (a, q); that is the sense in which it is a completion. In terms of its other arguments, λ̄_{a,q} is maximally unstructured, in the following sense: once the machine encounters an action profile a′ in a machine state q′ with (a′, q′) > (a, q), it never revisits a state it has already been in.

² Precisely, (a, q) > (a′, q′) iff q > q′, or q = q′ and a > a′.

The value λ∗(a, q) of the proposal λ∗ is chosen with the following probabilities:

\Pr[\lambda^*(a,q) = q^*] \propto \begin{cases}
\infty & q^* = q+1,\ a = |A|,\ \text{and } \max_{(a',q')<(a,q)} \lambda^*(a',q') = q, \\
h(Q, \bar\lambda_{a,q}; a) & q^* \in \{1, \ldots, \max_{(a',q')<(a,q)} \lambda^*(a',q') + 1\}, \\
0 & \text{otherwise.}
\end{cases}

The value ∞ forces us to choose λ∗(a, q) = q + 1 if the resulting machine would otherwise have unreachable states. The support {1, . . . , max_{(a′,q′)<(a,q)} λ∗(a′, q′) + 1} ensures that the non-initial states are correctly ordered, by preventing new state values from being introduced out of sequence. Note that with a finite amount of data, the action counts c(a_i, q; λ̄) are non-zero only for a finite number of states q ∈ ℕ, so only a finite number of factors of h(Q, λ̄; a) take values other than one.

The conditional probability mass function for the proposal λ∗, given Q∗, is therefore

h(\lambda^* \mid Q^*, a) \equiv \prod_{(a,q)\in A\times Q^*} \Pr[\lambda^*(a,q)].

To ensure that the block preserves the distribution Q, λ|a, we accept (Q∗, λ∗) with probability

\min\left\{1,\
\frac{h(\lambda \mid Q, a)}{h(\lambda^* \mid Q^*, a)}
\cdot \frac{\theta_{|Q^*|}\,\vartheta_{|Q|}}{\theta_{|Q|}\,\vartheta_{|Q^*|}}
\cdot \left[\frac{\Gamma(\nu|A_i|)}{[\Gamma(\nu)]^{|A_i|}}\right]^{|Q^*|-|Q|}
\cdot \frac{\prod_{q\in Q^*}\left[\prod_{a_i\in A_i}\Gamma(\nu+c(a_i,q;\lambda^*))\,\big/\,\Gamma\!\left(\sum_{a_i\in A_i}(\nu+c(a_i,q;\lambda^*))\right)\right]}
        {\prod_{q\in Q}\left[\prod_{a_i\in A_i}\Gamma(\nu+c(a_i,q;\lambda))\,\big/\,\Gamma\!\left(\sum_{a_i\in A_i}(\nu+c(a_i,q;\lambda))\right)\right]}
\right\}.

Block B_λ

Block B_λ is a random walk Metropolis update which proposes a λ∗ identical to λ except for a random mutation. We first determine the position of the mutation: we draw a random pair (a, q) from the uniform distribution on A × Q. We then mutate the value at (a, q): we draw λ∗(a, q) from the uniform distribution on Q\{λ(a, q)}. For all other pairs (a′, q′), we set λ∗(a′, q′) = λ(a′, q′). Then, if necessary, we permute the non-initial states so that λ∗ satisfies the ordering condition for non-initial states. The proposal is thus a random walk on the space of transition functions satisfying condition 3 (order of non-initial states) but not necessarily condition 2 (no unreachable states). We accept the proposal λ∗ with probability

\min\left\{1,\
\frac{\prod_{q\in Q}\left[\prod_{a_i\in A_i}\Gamma(\nu+c(a_i,q;\lambda^*))\,\big/\,\Gamma\!\left(\sum_{a_i\in A_i}(\nu+c(a_i,q;\lambda^*))\right)\right]}
     {\prod_{q\in Q}\left[\prod_{a_i\in A_i}\Gamma(\nu+c(a_i,q;\lambda))\,\big/\,\Gamma\!\left(\sum_{a_i\in A_i}(\nu+c(a_i,q;\lambda))\right)\right]}
\cdot 1_{\Lambda(A,Q)}(\lambda^*)
\right\}.

The indicator function 1_{Λ(A,Q)}, which comes from the prior on λ, means that we reject λ∗ with certainty if it has unreachable states.
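A minimal sketch of one B_λ update follows. It assumes a skeleton with at least two states, and the count and reachability helpers are hypothetical stand-ins for the paper's data structures rather than its actual code; the relabelling of non-initial states is omitted for brevity.

    import numpy as np
    from math import lgamma

    rng = np.random.default_rng(1)

    def log_lambda_factor(counts_by_state, nu):
        # the part of the marginal posterior that depends on lambda; the
        # remaining constants are identical for lambda and lambda* and cancel
        out = 0.0
        for counts in counts_by_state.values():
            out += sum(lgamma(nu + c) for c in counts)
            out -= lgamma(sum(nu + c for c in counts))
        return out

    def b_lambda_step(lam, n_states, n_profiles, count_fn, has_unreachable, nu):
        # lam maps (profile index, state) -> next state; count_fn(lam) returns
        # the action counts c(a_i, q; lam) implied by the observed play;
        # has_unreachable(lam) checks for unreachable states
        a = int(rng.integers(n_profiles))
        q = int(1 + rng.integers(n_states))
        proposal = dict(lam)
        alternatives = [s for s in range(1, n_states + 1) if s != lam[(a, q)]]
        proposal[(a, q)] = alternatives[int(rng.integers(len(alternatives)))]
        if has_unreachable(proposal):
            return lam                      # the indicator rejects the proposal
        log_ratio = (log_lambda_factor(count_fn(proposal), nu)
                     - log_lambda_factor(count_fn(lam), nu))
        return proposal if np.log(rng.random()) < log_ratio else lam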
C Computing Marginal Likelihood Factors

We show here how to use the Gelfand and Dey (1994) procedure to compute the marginal likelihood factors ψ_{is}. In order to use this procedure, we construct a properly normalized probability mass function f̂(Q, λ). For efficiency, we choose a probability mass function resembling f(Q, λ|a) and having "thin tails" relative to it. We take f̂ to be the truncation of f(Q, λ|a) to the parameter subspace where |Q| ≤ Q̄:

\hat f(Q, \lambda) \equiv f(Q, \lambda \mid a,\ |Q| \le \bar Q).

We choose Q̄ so that it is feasible to compute the normalization constant of this probability mass function by summing over all regular state transition functions with up to Q̄ states. For a two-player game with two possible actions each, such as the Prisoners' Dilemma, |A| = 4. Since n(|A|, |Q|) = 243,000 for |A| = 4 and |Q| = 3, truncation to Q̄ = 3 states is quite feasible.

The reciprocal of the marginal likelihood factor ψ_{is} is approximated by the following posterior sample moment:

\frac{1}{M}\sum_{j=1}^{M}
\frac{\hat f(Q_j, \lambda_j)}
     {f(Q_j)\, f(\lambda_j \mid Q_j)\, \prod_{\sigma\in\Sigma}\prod_{t=1}^{T_\sigma} f(a_{it\sigma} \mid a_{1\sigma},\ldots,a_{t-1,\sigma}, Q_j, \lambda_j)},

where M is the size of the posterior sample and (Q_j, λ_j), j ∈ {1, . . . , M}, are the posterior draws of (Q, λ) generated by our Markov chain. We ignore the posterior draws of µ. Provided that our posterior simulation chain is ergodic and has Q, λ, µ|a as its invariant distribution, this sample moment converges almost surely to its population counterpart:

\frac{1}{M}\sum_{j=1}^{M}
\frac{\hat f(Q_j,\lambda_j)}
     {f(Q_j)\,f(\lambda_j\mid Q_j)\,\prod_{\sigma\in\Sigma}\prod_{t=1}^{T_\sigma} f(a_{it\sigma}\mid a_{1\sigma},\ldots,a_{t-1,\sigma},Q_j,\lambda_j)}
\;\longrightarrow\;
E\!\left[\frac{\hat f(Q,\lambda)}{f(Q)\,f(\lambda\mid Q)\,\prod_{\sigma\in\Sigma}\prod_{t=1}^{T_\sigma} f(a_{it\sigma}\mid a_{1\sigma},\ldots,a_{t-1,\sigma},Q,\lambda)} \,\middle|\, a\right]

= \sum_{|Q|\le\bar Q,\ \lambda\in\Lambda(A,Q)}
\frac{\hat f(Q,\lambda)\, f(Q,\lambda\mid a)}
     {f(Q)\,f(\lambda\mid Q)\,\prod_{\sigma\in\Sigma}\prod_{t=1}^{T_\sigma} f(a_{it\sigma}\mid a_{1\sigma},\ldots,a_{t-1,\sigma},Q,\lambda)}

= \sum_{|Q|\le\bar Q,\ \lambda\in\Lambda(A,Q)}
\frac{\hat f(Q,\lambda)}
     {\prod_{\sigma\in\Sigma}\prod_{t=1}^{T_\sigma} f(a_{it\sigma}\mid a_{1\sigma},\ldots,a_{t-1,\sigma})}

= \frac{1}{\prod_{\sigma\in\Sigma}\prod_{t=1}^{T_\sigma} f(a_{it\sigma}\mid a_{1\sigma},\ldots,a_{t-1,\sigma})},

which is indeed the reciprocal of the marginal likelihood factor ψ_{is}. We compute the marginal likelihood f(a) as the product over all players i ∈ N and subjects s ∈ S_i of these marginal likelihood factors.
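On the log scale the sample moment can be computed stably with a log-sum-exp. A minimal sketch, assuming the three required log terms have already been evaluated at each posterior draw (names hypothetical):

    import numpy as np
    from scipy.special import logsumexp

    def log_marginal_likelihood_factor(log_fhat, log_prior, log_like):
        # Gelfand-Dey estimate of log psi_is.  Each argument is a length-M array
        # evaluated at the posterior draws (Q_j, lambda_j): log f_hat(Q_j, lambda_j),
        # log f(Q_j) + log f(lambda_j | Q_j), and the log of the product of
        # one-step-ahead action probabilities.
        m = len(log_fhat)
        log_terms = np.asarray(log_fhat) - np.asarray(log_prior) - np.asarray(log_like)
        log_reciprocal = logsumexp(log_terms) - np.log(m)   # log of the sample average
        return -log_reciprocal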
References

Abreu, D., and Rubinstein, A. (1988). 'The Structure of Nash Equilibrium in Repeated Games with Finite Automata', Econometrica, 56: 1259–1281.

Aoyagi, M., and Frechette, G. (2004). 'Collusion in Repeated Games with Imperfect Monitoring', Working paper, Harvard Business School.

Aumann, R. J. (1981). 'Survey of Repeated Games', in R. J. Aumann (ed.), Essays in Game Theory and Mathematical Economics in Honor of Oskar Morgenstern, pp. 11–42. Bibliographisches Institut, Zurich.

Axelrod, R. (1984). The Evolution of Cooperation. Basic Books, New York.

Ben-Porath, E. (1993). 'Repeated Games with Finite Automata', Journal of Economic Theory, 59(1): 17–32.

Bernardo, J. M., and Smith, A. F. M. (1994). Bayesian Theory. John Wiley and Sons, Chichester, England.

Binmore, K., and Samuelson, L. (1992). 'Evolutionary Stability in Repeated Games Played by Finite Automata', Journal of Economic Theory, 57: 278–305.

Camera, G., and Casari, M. (2007). 'Cooperation Among Strangers: An Experiment with Indefinite Interaction', Working Paper 1201, Krannert School of Management, Purdue University.

Dal Bo, P., and Frechette, G. (2007). 'The Evolution of Cooperation in Infinitely Repeated Games: Experimental Evidence', Working paper, New York University, Department of Economics.

Duffy, J., and Ochs, J. (2006). 'Cooperative Behavior and the Frequency of Social Interaction', Working paper 274, University of Pittsburgh, Department of Economics.

El Gamal, M., and Grether, D. (1995). 'Are People Bayesian? Uncovering Behavioral Strategies', Journal of the American Statistical Association, 90: 1127–1145.

Engle-Warnick, J. (2007). 'Five Indefinitely Repeated Games in the Laboratory', Série Scientifique 2007s-11, Centre interuniversitaire de recherche en analyse des organisations (CIRANO).

Engle-Warnick, J., and Slonim, R. L. (2006). 'Inferring Repeated-Game Strategies from Actions: Evidence from Trust Game Experiments', Economic Theory, 28: 603–632.

Gelfand, A. E., and Dey, D. K. (1994). 'Bayesian Model Choice: Asymptotics and Exact Calculations', Journal of the Royal Statistical Society Series B, 56: 501–514.

Geweke, J. (2005). Contemporary Bayesian Econometrics and Statistics. Wiley, Hoboken, New Jersey.

Kalai, E., and Stanford, W. (1988). 'Finite Rationality and Interpersonal Complexity in Repeated Games', Econometrica, 56(2): 387–410.

Koop, G. (2003). Bayesian Econometrics. John Wiley & Sons, Chichester, England.

McKelvey, R. D., and Palfrey, T. R. (1995). 'Quantal Response Equilibria for Normal Form Games', Games and Economic Behavior, 10: 6–38.

Miller, J. (1996). 'The Coevolution of Automata in the Repeated Prisoner's Dilemma', Journal of Economic Behavior and Organization, 29: 87–112.

Neyman, A. (1985). 'Bounded Complexity Justifies Cooperation in the Finitely Repeated Prisoner's Dilemma', Economics Letters, 19: 227–229.

Normann, H.-T., and Wallace, B. (2004). 'The Impact of the Termination Rule on Cooperation in a Prisoner's Dilemma Experiment', Discussion Paper dp04/11, Royal Holloway, University of London.

Rubinstein, A. (1986). 'Finite Automata Play the Repeated Prisoner's Dilemma', Journal of Economic Theory, 39: 83–96.

Selten, R., Mitzkewitz, M., and Uhlich, G. R. (1997). 'Duopoly Strategies Programmed by Experienced Players', Econometrica, 65: 517–555.