
The Ghost in the Machine: Inferring Machine-Based Strategies from Observed Behavior
Jim Engle-Warnick∗
William J. McCausland†
John H. Miller‡
September 3, 2007
Abstract
We model strategies of experimental subjects playing indefinitely repeated games
as finite state machines. In order to do likelihood-based inference on machines using
data from experiments, we add a stochastic element to the machines. As in most
machine models for repeated game strategies, state transitions are deterministic.
Actions, however, are random and their distribution is state dependent. We describe Markov Chain Monte Carlo methods for simulating machine strategies from
their posterior distribution. We apply our procedures to data from an experiment
in which human subjects play an indefinitely repeated Prisoners’ Dilemma. We
show examples of estimation, prediction and testing.
Keywords: strategy inference, Bayesian methods, experimental economics, Prisoners’ Dilemma,
repeated games.
JEL classification nos. C11, C15, C72, C73, C92
Acknowledgments: We thank Alexander Campbell, Julie Héroux, and Anthony Walsh for
valuable research assistance. We gratefully acknowledge financial support from the Economic
and Social Research Council, UK and Nuffield College, University of Oxford. The Centre for
Interuniversity Research and Analysis on Organizations (CIRANO) and the Bell University
Laboratory in Electronic Commerce and Experimental Economy graciously provided use of the
experimental laboratory. Software used in this paper is open source and freely available: send
a request to [email protected].
∗ McGill University, Department of Economics, 855 Sherbrooke St. W., Montreal, QC, H3A 2T7, Canada, CIREQ and CIRANO. email: [email protected].
† (corresponding author) Université de Montréal, Département de sciences économiques, C.P. 6128, Succursale centre-ville, Montreal, QC H3C 3J7, Canada, CIREQ and CIRANO. email: [email protected].
‡ Carnegie Mellon University, Department of Social and Decision Sciences, Pittsburgh, PA 15213 and the Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, e-mail: [email protected].
1 Introduction
Understanding human strategic behavior in games requires both a theoretical perspective
on what factors drive such decisions as well as a careful identification of the empirical
foundations of actual behavior. We hypothesize that we can effectively capture observed
strategic behavior by mapping it onto simple algorithms (here finite state machines) using
the techniques described below. While we remain agnostic as to the underlying forces
promoting such mappings—such machines could result from either a carefully derived
rational process or an over-reliance on simple heuristics—the analysis of such processes is
valuable nonetheless. Using this technique, we can ask questions such as how complicated
are the machines that drive human strategic behavior, what strategic patterns do such
machines embrace, can we make better predictions of actual behavior, and can we identify
strategic learning.
Machines (or finite automata) are discrete-time finite-state systems that generate discrete outputs in response to discrete inputs. Aumann (1981) suggested that machines
would be a useful way to represent repeated game strategies in economics. Inputs are
the actions of opponents and outputs are the actions played by the machine strategy.
Automata have been used to model bounded rationality in repeated games in Neyman
(1985), Rubinstein (1986), Abreu and Rubinstein (1988), Kalai and Stanford (1988) and
Ben-Porath (1993). Binmore and Samuelson (1992) used them to explore evolutionary
games. Miller (1996) considered learning. Axelrod (1984) ran a tournament for computerized strategies in the repeated Prisoners’ Dilemma, and found that the simple two-state
machine strategy Tit-For-Tat proved to be a remarkable and robust performer.
This paper introduces likelihood-based inference for machines in a repeated game
setting. For this purpose, we add a stochastic element to the machines. State transitions
are still deterministic, but actions are random and their distribution is state dependent.
We choose this approach over stochastic state transitions for three reasons. First, a
machine with random actions and deterministic transitions corresponds more directly
to a behavioral strategy, a well known theoretical construct. Second, random actions
are more parsimonious than random state transitions, since actions depend only on the
state and state transitions also depend on the action profile. Third, deterministic finite
state machines already condense the history of play in a supergame to a discrete value
identifying the current state; making state transitions stochastic adds a second source of
memory imperfection that some may find redundant.
We stress that the randomness we introduce is in the random choice tradition, not
the random preference tradition. Thus our approach is similar to that of El Gamal and
Grether (1995), who analyse data from an individual dynamic discrete choice experiment.
They suppose that the data generating process is governed by a decision rule and an error
probability, and go on to estimate decision rules. In our approach, we can, but need
not, interpret random actions as deviations from deterministic actions, and realizations
of low probability actions as errors. We emphasize that these errors are not payoff
related. In particular, our approach should not be confused with approaches used in
some experimental work based on Quantal Response Equilibria, introduced by McKelvey
and Palfrey (1995). There, actions are random because payoffs are random.
We allow state transitions to depend not only on opponents’ actions but also the
machine’s own action. This is in contrast to state transitions for deterministic machines, which depend explicitly on opponents’ actions but not the own action. For these
machines, the own action is a deterministic function of the current state, so explicit dependence would be redundant. Once a machine’s actions are stochastic, however, own
actions are no longer deterministic functions of the state: they bear relevant additional
information that may be important for future stages of the supergame. We will see that
for at least some experimental subjects, there is empirical evidence of own-action dependence. We also note that ruling out own-action dependence would involve only minor
changes to the methods described below.
Several authors have used experiments to study how frequencies of observed actions
respond to changes in the parameters of a repeated game. Dal Bo and Frechette (2007)
vary the continuation probability and the mutual cooperation payoff in an indefinitely
repeated Prisoners' Dilemma game and report how rates of mutual cooperation change
with experience in situations where repeated cooperation is and is not an equilibrium of
the repeated game. Norman and Wallace (2004) consider repeated Prisoners’ Dilemma
games with three different termination rules and report rates of cooperation. Duffy and
Ochs (2006) look at cooperation frequencies for different subject matching processes.
Camera and Casari (2007) go a step further towards uncovering strategies, using a
Probit model to estimate conditional action probabilities given features of previous play.
They use these to classify subjects' strategies in several variants of the repeated Prisoners' Dilemma.
Other papers have explicitly tried to uncover repeated game strategies. In Selten, Mitzkewitz, and Urlich (1997), experienced traders in a duopoly game explicitly
choose strategies, in the form of algorithms, rather than sequential actions. Aoyagi and
Frechette (2004) estimate parameters of a threshold strategy in a repeated Prisoners’
Dilemma where subjects learn about opponents’ actions only through a noisy signal.
Engle-Warnick and Slonim (2006) use a counting method to fit deterministic finite state
machines to observed actions in repeated trust games. The main difference between this
paper and Engle-Warnick and Slonim (2006) is that we introduce a stochastic element to
machines and do formal inference.
We assume that each subject is associated with a single machine throughout the
sample used for inference. We know this assumption is not innocuous, since learning
is plausible. However, we use statistical tests for break points to show that after five
rounds (out of twenty), the play of all subjects except two (out of twelve) is plausibly
stable. We use rounds 6 through 20 for inference and while we report results for all
subjects, we note that for two of them, the assumption of stability is untenable. We
stress that our inferential approach does not commit us to the assumption of stability.
In the conclusions, we discuss ways to extend our inferential methods to models where
subjects may change machines during an experiment.
The main contribution of this paper is the introduction of the first formal statistical
procedure inferring finite state machines as repeated game strategies. We introduce
the procedure with the well known indefinitely repeated Prisoners’ Dilemma game. We
illustrate the usefulness of the procedure with examples of estimation, prediction, and
model selection.
In Section 2, we give definitions and notation for our stochastic machines and other
objects. Section 3 lays out assumptions about machine play in repeated game experiments. These assumptions specify a stochastic data generating process, which gives the
conditional probability mass function for all actions in an experiment given a vector of
machines, one for each subject. Interpreted as a likelihood function, this function factors
by subject, which justifies subject-by-subject, as opposed to joint, inference.
Section 4 provides a prior distribution for the unknown machines. The prior completes
the model, since we now have a joint distribution of unknown machines and observed
actions. This allows Bayesian inference for the unknown machines. Section 5 shows how
we can analytically integrate out action probabilities from the posterior distribution of
machines and explains why this facilitates Bayesian inference.
Section 6 introduces a Markov Chain Monte Carlo method for simulating machines
from their posterior distribution, and a method for computing Bayes factors, which are
important for model comparisons. While this section is technically difficult, it describes
a method that delivers something conceptually simple: for each subject, a large sample
of machines representing the uncertainty we have about her strategy after observing her
play in the laboratory.
Section 7 describes a laboratory experiment on the repeated Prisoners’ Dilemma.
Twelve subjects play twenty rounds of supergames with a mean duration of five periods.
Section 8 presents empirical results, including examples of estimation, testing and
prediction. We obtain these results with Monte Carlo methods, using the large samples
from the posterior distribution of each subject’s machine, drawn using the methods of
Section 6.
2 Games and Machines
Here we provide the formal definitions and examples that set up the needed
framework for what follows. We consider a world in which agents play a repeated stage
game. We call a single instance of a repeated stage game a supergame. An agent’s
repeated-game strategy is embodied by a machine, a representation that provides a compact description of a broad swath of potential strategies.
Agents repeatedly play a stage game, γ, defined by the triple (N, A, (ui )i∈N ). The set
of players is given by N. Each player i ∈ N has a set Ai of potential actions. Let A =
×i∈N Ai be the action profile set and let a = (ai )i∈N ∈ A be the action profile, the vector of
actual actions chosen by the players during the stage game. The payoff function for each
player i ∈ N is given by ui : A → R. For example, a Prisoner’s Dilemma stage game has
N = {1, 2}, A = A1 ×A2 = {d, c}×{D, C} (where we use lower case to indicate the actions
of Player 1), and u1 (d, D) = u2(d, D) = uP (punishment), u1 (c, D) = u2 (d, C) = uS
(sucker), u1 (d, C) = u2 (c, D) = uT (temptation), and u1 (c, C) = u2 (c, C) = uR (reward).
Agents employ machines to implement a given strategy. We define a machine by the
quadruple (Q, q ◦ , λ, µ). Each machine has a non-empty and finite set of states Q. One
of these states, q ◦ ∈ Q, is the initial state. We define a (deterministic) state transition
function λ : A × Q → Q, that maps the current action profile and state of the machine
to the next state that the machine will enter. Thus, if the action profile is a ∈ A
and the machine is in state q ∈ Q, then the machine will enter state λ(a; q). The
machine takes a random action governed by a state dependent action probability mass
function, µ : Ai × Q → R. Thus, if the machine is in state q ∈ Q, it plays ai ∈ Ai with
probability µ(ai ; q). We also define, for a machine m = (Q, q ◦ , λ, µ), its machine skeleton
m̄ ≡ (Q, q ◦ , λ).
The following example illustrates the above ideas: a machine m that implements an “85% Tit-For-Tat” strategy for player 1 in the Prisoner's Dilemma:
m = (Q, q ◦ , λ, µ),
Q = {1, 2},
q◦ = 1
λ((d, D), 1) = λ((c, D), 1) = 2,
λ((d, C), 1) = λ((c, C), 1) = 1
λ((d, D), 2) = λ((c, D), 2) = 2,
λ((d, C), 2) = λ((c, C), 2) = 1
µ(d; 1) = 0.15,
µ(c; 1) = 0.85,
µ(d; 2) = 0.85,
µ(c; 2) = 0.15.
Figure 1 shows a graphical representation of the machine skeleton (Q, q ◦ , λ). An action
profile (a1 , a2 ) on the arc from state q to state q ′ means that λ((a1 , a2 ), q) = q ′ , that is,
the transition function evaluated at action profile (a1 , a2 ) and current state q, is q ′ . In
both states, if the opponent plays C (which happens in action profiles (c, C) and (d, C)),
the machine will be in State 1 for the next stage. If the opponent plays D (action profiles
(c, D) and (d, D)) the machine will be in State 2.
In State 1, the machine plays C with probability 0.85 and D with probability 0.15,
while in State 2 the probabilities are reversed. Thus, the machine behaves very much like
a traditional Tit-For-Tat except that it always has a chance (15%) of taking the opposite
of the “traditional” action.
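To make the construct concrete, the following sketch (our own Python illustration, not the authors' software; the class and variable names are ours) represents such a machine by its state set, initial state, transition table, and state-dependent action probabilities, and encodes the 85% Tit-For-Tat machine above.

    import random

    class StochasticMachine:
        """A machine m = (Q, q0, lam, mu): deterministic state transitions,
        random state-dependent actions."""

        def __init__(self, states, initial, transition, action_probs):
            self.states = states              # Q, e.g. {1, 2}
            self.state = initial              # current state, starts at q0
            self.transition = transition      # lam: (action profile, state) -> next state
            self.action_probs = action_probs  # mu: state -> {action: probability}

        def act(self):
            """Draw a random action from the current state's distribution mu(.; q)."""
            actions, weights = zip(*self.action_probs[self.state].items())
            return random.choices(actions, weights=weights)[0]

        def update(self, action_profile):
            """Move to the next state lam(a; q) after observing the action profile a."""
            self.state = self.transition[(action_profile, self.state)]

    # The 85% Tit-For-Tat machine for player 1: state 1 after opponent cooperation
    # (and initially), state 2 after opponent defection.
    tft85 = StochasticMachine(
        states={1, 2},
        initial=1,
        transition={(('d', 'D'), 1): 2, (('c', 'D'), 1): 2,
                    (('d', 'C'), 1): 1, (('c', 'C'), 1): 1,
                    (('d', 'D'), 2): 2, (('c', 'D'), 2): 2,
                    (('d', 'C'), 2): 1, (('c', 'C'), 2): 1},
        action_probs={1: {'c': 0.85, 'd': 0.15},
                      2: {'c': 0.15, 'd': 0.85}},
    )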
Figure 1: Tit-For-Tat machine skeleton
Table 1: Action aitσ by player i, period t and supergame σ

    σ   i   s   t=1  t=2  t=3  t=4  t=5  t=6  t=7  t=8  t=9
    1   1   1    c    c    d    c    c    d    d    d
    1   2   2    C    D    C    C    D    C    D    D
    2   1   2    d    c    c    d    d    c    c    c    c
    2   2   1    C    D    C    C    C    D    C    C    C

3 The Data Generating Process
In this section and the next, we describe our model for the repeated game strategies of
subjects in an experiment. This section concerns the data generating process, by which
random actions are governed by machines. We complete the model in the next section
by adding a prior distribution over the unknown machines.
For illustrative purposes, we consider the following example (to which we frequently
return throughout the paper). Two subjects, corresponding to s = 1 and s = 2, twice
play a repeated prisoners’ dilemma with a random number of periods. We observe the
data in Table 1. The first supergame, corresponding to σ = 1, has eight periods. Subject
1 plays as player i = 1 and subject 2 plays as player i = 2. The second supergame,
corresponding to σ = 2, has nine periods. Here, subject 1 plays as player i = 2 and
subject 2 plays as player i = 1. The table gives the actions aitσ played by player i at
period t of supergame σ.
3.1 Data Structures
We need data structures for representing supergame data collected in the laboratory.
There is a set Σ of supergames, matches of random duration between two subjects. We
let Tσ be the duration of supergame σ ∈ Σ in periods. In our example, Σ = {1, 2}, T1 = 8
and T2 = 9.
We use the notation aitσ ∈ Ai to represent the action of player i in period t of
supergame σ. Grouping across players, we define atσ ∈ A as the action profile (aitσ )i∈N .
Thus, for i = 1, t = 4 and σ = 2, aitσ = d and atσ = (d, C). Grouping across time, we let
aσ be the supergame action profile vector (a1σ , . . . , aTσ σ ). Thus for σ = 1, aσ consists of
all the actions in the first two rows of the table. Finally, we let a be the vector (aσ )σ∈Σ
of all actions.
We have a set S of subjects, and in our example, S = {1, 2}. For each s ∈ S, we let
Σs ⊆ Σ be the set of supergames in which subject s plays. In our example, Σ1 = Σ2 = Σ,
but in experiments with several subjects, the Σs will be strict subsets. For each i ∈ N
and σ ∈ Σ, we let siσ ∈ S be the subject playing as player i in supergame σ. For each
s ∈ S and σ ∈ Σ, we let isσ be the player that subject s plays in supergame σ.
3.2 The Data Generating Process
We consider actions to be endogenous and all other features of our experimental data,
including subject pairings and supergame durations, to be exogenous. In this paper, we
suppose that each subject, s, uses a single machine, ms , as a repeated game strategy
in all the supergames it plays. This rules out learning, which is not innocuous. In our
empirical analysis, we omit data during a transient learning phase, and suppose that
subjects use the same machine in later supergames. In the concluding section of the
paper, we discuss models where subjects can switch from machine to machine.
The unknown variables of the model are, for each subject s ∈ S, a machine ms =
(Qs , qs◦ , λs , µs ) governing the actions of subject s. We let m be the vector (ms )s∈S of all
machines.
Our objective is to learn about plausible machine strategies from data such as those
in Table 1. We will use a likelihood-based approach to inference, and to do so, we need
to be able to compute the probability of realized play for any vector (ms )s∈S of machines.
As an example, here we compute the probability that two subjects using 85% Tit-For-Tat machines generate the actions in Table 1.
We make the following three assumptions. The first (Machine Play) is that if subject
s plays player i in supergame σ, the conditional probability that s plays action ai at
period t, given play in previous periods of that supergame, is given by µs (ai ; q), where
q is the state that the player’s machine is in after previous play in supergame σ. Recall
that q depends deterministically on the machine skeleton (Qs , λs , qs◦ ) and previous play
in σ.
So, for example, the conditional probability that player 1 plays d in period t = 3 of
supergame σ = 1, given that action profiles (c, C) and (c, D) are observed in the first
two periods of that supergame, is
Pr[a131 = d|(a111 , a211 ) = (c, C), (a121 , a221 ) = (c, D), m1 ] = 0.85.
This is because subject 1’s machine is in state 2 after the opponent defected in period 2,
and subject 1’s machine defects with probability 0.85 in state 2.
Assumption 3.1 (Machine Play) For all i ∈ N, t ∈ Tσ and σ ∈ Σ, the probability
mass function for aitσ |a1σ , . . . , at−1,σ is given by
µsiσ (aitσ ; q(a1σ , . . . , at−1,σ ; qs◦iσ , λsiσ )),
where the state tracking function q(a1 , . . . , at−1 ; q ◦ , λ) gives the state that machine (Q, q ◦ , λ, µ)
will be in after the action history (a1 , . . . , at−1 ). That is,
\[
q(a_1, \ldots, a_{t-1};\, q^\circ, \lambda) \equiv
\begin{cases}
q^\circ & t = 1, \\
\lambda\big(a_{t-1},\, q(a_1, \ldots, a_{t-2};\, q^\circ, \lambda)\big) & t \geq 2.
\end{cases}
\qquad (1)
\]
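As a small illustration (our own sketch rather than the authors' code), the state tracking function in (1) amounts to folding the observed action-profile history through the transition function:

    def track_state(history, initial_state, transition):
        """State tracking function q(a_1, ..., a_{t-1}; q0, lam) of equation (1):
        the state the machine is in after the action-profile history."""
        state = initial_state
        for profile in history:          # action profiles a_1, ..., a_{t-1}, in order
            state = transition[(profile, state)]
        return state

    # Example: the Tit-For-Tat skeleton of Figure 1 is in state 2 after the history
    # ((c, C), (c, D)), so an 85% Tit-For-Tat machine then defects with probability 0.85.
    tft_transition = {(a, q): (2 if a[1] == 'D' else 1)
                      for a in [('c', 'C'), ('d', 'C'), ('c', 'D'), ('d', 'D')]
                      for q in (1, 2)}
    assert track_state([('c', 'C'), ('c', 'D')], 1, tft_transition) == 2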
Our second assumption (Conditional Action Independence) is that in each period of
a supergame, players’ actions are conditionally independent given previous play in the
supergame. This rules out correlation devices.
Assumption 3.2 (Conditional Action Independence) For all σ ∈ Σ and t ∈ Tσ ,
the actions aitσ , i ∈ N, are conditionally independent given (a1σ , . . . , at−1,σ ).
In our example, the conditional probability of action profile (d, C) in period t = 3 of
supergame σ = 1, given that action profiles (c, C) and (c, D) are observed in the first
two periods of that supergame, is
Pr[(a131 , a231 ) = (d, C) | (a111 , a211 ) = (c, C), (a121 , a221 ) = (c, D), m1 , m2 ]
= Pr[a131 = d | (a111 , a211 ) = (c, C), (a121 , a221 ) = (c, D), m1 ] · Pr[a231 = C | (a111 , a211 ) = (c, C), (a121 , a221 ) = (c, D), m2 ]
= (0.85)^2 .
Our third and final assumption (Supergame Independence) is that play across supergames is independent. In our example, play in supergame 1 is independent of play in
supergame 2.
Assumption 3.3 (Supergame Independence) The supergame action profile vectors
aσ are independent.
The assumptions imply that the conditional probability of the vector a of all actions,
given machines (ms )s∈S is given by
\[
\Pr(a \mid m) = \prod_{\sigma \in \Sigma} \prod_{t=1}^{T_\sigma} \prod_{i \in N}
\mu_{s_{i\sigma}}\!\big(a_{it\sigma};\, q(a_{1\sigma}, \ldots, a_{t-1,\sigma};\, q^\circ_{s_{i\sigma}}, \lambda_{s_{i\sigma}})\big).
\qquad (2)
\]
3.3 The Likelihood Function
We can interpret the right hand side of (2) as a function of m for fixed observed data a
(that is, as a likelihood function) and denote it L(m; a). The likelihood is a measure of
the support that the data a gives to the vector m of machines.
We can factor the likelihood by subject to give:
\[
L(m; a) = \prod_{s \in S} L_s(m_s; a),
\]
where
\[
L_s(m_s; a) \equiv \prod_{\sigma \in \Sigma_s} \prod_{t=1}^{T_\sigma}
\mu_s\!\big(a_{i_{s\sigma} t \sigma};\, q(a_{1\sigma}, \ldots, a_{t-1,\sigma};\, q^\circ_s, \lambda_s)\big).
\qquad (3)
\]
It is important to note that the factor Ls associated with subject s depends only
on the machine ms used by subject s, and not the other machines. This makes the
likelihood L multiplicatively separable in the machines ms . Intuitively, this means that
we lose nothing by using each Ls separately for inference on machine ms , rather than
using L for joint inference on the vector m. For the purposes of inference about ms , we
can consider the actions of all other subjects as fixed. In a Bayesian approach, if the ms
are a priori independent, they will also be a posteriori independent; that is, conditionally
independent given the data.
We can compute the likelihood factor Ls for machine ms in two steps. First, using the
machine skeleton (the state set Qs , the initial state qs◦ and the state transition function
λs ) and the data a, we can track the state that the machine is in at every period of every
supergame, counting the number of times it plays each action in each state. In Table 2,
we show the results of this exercise for our example. Each row gives the action count
cs (ai , q), the number of times machine ms plays action ai in state q. So for example, the
Table 2: Action counts cs(ai, q) by machine state

    s   q   ai   cs(ai, q)
    1   1   c    10
    1   1   d     1
    1   2   c     1
    1   2   d     5
    2   1   c     7
    2   1   d     5
    2   2   c     3
    2   2   d     2
first row tells us that machine 1 cooperates ten times while in state 1. This information is
derived directly from the raw data in Table 1, which shows subject 1 cooperating twice at
the beginning of a supergame and eight times immediately after its opponent cooperates,
for a total of ten times.
The second step is to use the state counts cs (ai , q) computed in the first step and the
action probability mass function µs to compute the likelihood factor for each machine.
In our example, we can rewrite the likelihood factor Ls (ms ; a) from equation (3) as
\[
L_s(m_s; a) = \prod_{q=1}^{2} \prod_{a_i \in \{c,d\}} [\mu_s(a_i; q)]^{c_s(a_i, q)}.
\]
Using the information in Table 2, we compute the likelihood factor for machine 1 as
\[
L_1(m_1; a) = \mu_1(c; 1)^{c_1(c,1)}\, \mu_1(d; 1)^{c_1(d,1)} \cdot \mu_1(c; 2)^{c_1(c,2)}\, \mu_1(d; 2)^{c_1(d,2)}
= (0.85)^{10} (0.15)^{1} \cdot (0.15)^{1} (0.85)^{5}
\approx 1.96547 \times 10^{-3}.
\]
For machine 2, we compute L2 (m2 ; a) ≈ 5.93609 × 10^{-8}. We see that the actions of player
1 are much more consistent with an 85% Tit-For-Tat machine than those of player 2.
In the general case, we can write the likelihood factor for machine s as
\[
L_s(m_s; a) = \prod_{q \in Q_s} \prod_{a_i \in A_{i_s}} [\mu_s(a_i; q)]^{c_s(a_i, q; \lambda_s)},
\qquad (4)
\]
where action counts cs (ai , q; λs ), giving the number of times the machine ms plays action
ai in machine state q, are given by
\[
c_s(a_i, q; \lambda) = \sum_{\sigma \in \Sigma_s} \sum_{t=1}^{T_\sigma}
\delta_{q,\, q(a_{1\sigma}, \ldots, a_{t-1,\sigma};\, \lambda)}\; \delta_{a_i,\, a_{it\sigma}},
\qquad a_i \in A_{i_s},\; q \in Q_s.
\qquad (5)
\]
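A minimal sketch of this computation (our own code; it takes the action counts of Table 2 as given rather than recomputing them from Table 1 via (5)) reproduces the likelihood factors of the example above:

    def likelihood_factor(counts, action_probs):
        """Equation (4): the product over states q and actions a_i of
        mu_s(a_i; q) raised to the action count c_s(a_i, q)."""
        value = 1.0
        for (action, state), count in counts.items():
            value *= action_probs[state][action] ** count
        return value

    # Action counts from Table 2 and the 85% Tit-For-Tat action probabilities.
    mu_tft85 = {1: {'c': 0.85, 'd': 0.15}, 2: {'c': 0.15, 'd': 0.85}}
    counts_1 = {('c', 1): 10, ('d', 1): 1, ('c', 2): 1, ('d', 2): 5}   # subject 1
    counts_2 = {('c', 1): 7, ('d', 1): 5, ('c', 2): 3, ('d', 2): 2}    # subject 2

    print(likelihood_factor(counts_1, mu_tft85))   # approximately 1.96547e-03
    print(likelihood_factor(counts_2, mu_tft85))   # approximately 5.93609e-08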
4 Identification and Prior Distributions
In this section, we complete the model by providing a prior distribution over m, the
vector of all machines. But first we need to address two identification issues arising
over the initial state q ◦ and state transition function λ. First, any permutation of states
gives an observationally equivalent combination of initial state and transition function.
Second, if some states are unreachable from the initial state, the transition function
is observationally equivalent to a transition function on a smaller state set, one that
excludes the unreachable states.
For convenience and without loss of generality, we assume hereafter that A = {1, . . . , |A|}
and Q = {1, . . . , |Q|}.
We resolve these identification issues by selecting one combination (q ◦ , λ) of initial
state and transition function from each equivalence class to represent the class. We
will assign positive prior probability to the chosen combination, and zero probability
to its observationally equivalent alternatives. Specifically, we choose the (q ◦ , λ) such
that q ◦ is State 1, all states are reachable, and λ satisfies the following state ordering
condition: if we list the function values λ(a, q) in q-then-a lexicographic order, then the
first appearances of the non-initial states (2, . . . , |Q|) occur in ascending order.
In other terms, we choose the q ◦ and λ that satisfy the following three conditions.
1. (order of initial state) q ◦ = 1,
2. (no unreachable states) for every a ∈ A and every q ∈ Q\{q ◦ }, there exists a q ′ ∈ Q
and a′ ∈ A such that q ′ < q and λ(a′ , q ′) = q,
3. (order of non-initial states) For all (a, q) ∈ A × Q, there exists a (a′ , q ′ ) ∈ A × Q
such that
(a) either (q ′ < q) or both (q ′ = q) and (a′ < a), and
(b) λ(a′ , q ′ ) = max(λ(a, q) − 1, 1).
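The following sketch (ours, with hypothetical argument names) checks whether a transition function, stored as a dictionary over (action profile, state) pairs, satisfies conditions 2 and 3, using the "first appearances in ascending order" form of condition 3; the two-state Tit-For-Tat skeleton of Figure 1, for example, passes both checks.

    def is_regular(transition, action_profiles, num_states):
        """Check conditions 2 and 3 for a transition function lam on A x Q,
        with states labelled 1, ..., num_states, q0 = 1, and action_profiles
        listed in the fixed order used for the lexicographic scan."""
        # Condition 2: every non-initial state q is entered from some state q' < q.
        for q in range(2, num_states + 1):
            if not any(transition[(a, qp)] == q
                       for qp in range(1, q) for a in action_profiles):
                return False
        # Condition 3: scanning lam(a, q) in q-then-a lexicographic order, a state
        # larger than all states seen so far must be exactly the next unseen state.
        largest_seen = 1
        for q in range(1, num_states + 1):
            for a in action_profiles:
                value = transition[(a, q)]
                if value > largest_seen + 1:
                    return False
                largest_seen = max(largest_seen, value)
        return True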
Having effectively set q ◦ equal to one, we suppress notation for initial states for the rest
of the paper. We will call a state transition function λ regular if it satisfies conditions 2
and 3, and denote by Λ(A, Q) the set of regular state transition functions on A × Q.
4.1 Prior Distributions
We complete the model by providing a prior distribution over all subjects’ machines.
Machines are a priori independent and identically distributed. The prior distribution of
a machine m = (Q, λ, µ) has the following conditional independence structure:
f (m) = f (Q, λ, µ) = f (Q) · f (λ|Q) · f (µ|Q).    (6)
The three right-hand-side factors are addressed in the following three sections.
Distribution Q
We leave the prior on the number of machine states unspecified, and therefore up to the
researcher to choose. For j ∈ N, we denote by θj the prior probability that the number
of states |Q| is j. Thus the probability mass function for Q is
f (Q) = θ|Q| .    (7)
We recommend choosing a prior that falls off quickly. This favors machine parsimony and
computational simplicity. For the empirical examples in the paper, we take a Poisson
distribution with mean 1 for |Q| − 1, and truncate the distribution of |Q| to the set
{1, 2, 3, 4, 5}.
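As an illustration of this choice (a small sketch under the stated assumptions, with our own function name), the implied prior probabilities θ1, . . . , θ5 can be computed directly; they match the prior row reported later in Table 4.

    import math

    def truncated_poisson_prior(mean=1.0, max_states=5):
        """Prior theta_j over the number of states |Q| = j, j = 1, ..., max_states,
        taking |Q| - 1 ~ Poisson(mean) and renormalizing after truncation."""
        weights = [math.exp(-mean) * mean ** (j - 1) / math.factorial(j - 1)
                   for j in range(1, max_states + 1)]
        total = sum(weights)
        return [w / total for w in weights]

    print([round(p, 3) for p in truncated_poisson_prior()])
    # [0.369, 0.369, 0.185, 0.062, 0.015]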
Distribution λ|Q
Given Q, the state transition function λ has a uniform distribution over the set Λ(A, Q)
of regular transition functions. Thus the probability mass function for λ|Q is
\[
f(\lambda \mid Q) = \frac{1}{n(|A|, |Q|)},
\qquad (8)
\]
where n(|A|, |Q|) is the number of regular state transition functions λ : A × Q → Q. For
posterior simulation, we need to be able to compute n(|A|, |Q|) and we show how to do
this in Appendix A.
We use a uniform prior because it is neutral with respect to state transitions: all
machine skeletons with the same number of states have the same prior probability. In
the conclusions, we discuss extensions where the prior favors some machines over others,
either for theoretical plausibility or for structural parsimony.
Distribution µ|Q
Recall that for each machine state q ∈ Q, µ(·, q) is a probability mass function on the
finite action set Ai . It can therefore be represented by a vector in the |Ai |-dimensional
unit simplex. We choose a prior such that given Q, the µ(·, q), q ∈ Q, are independent
and identically distributed, and each has a Dirichlet distribution. Dirichlet distributions
are commonly used for distributions on simplexes, and in particular, for distributions
over distributions over finite sets.
Dirichlet distributions on the |Ai |-dimensional simplex are indexed by |Ai | free parameters. In our context, there is a parameter for each possible action. Setting all parameters
equal to a single free parameter ν imposes a kind of symmetry over actions, specifically
the exchangeability of action probabilities. This is a desirable feature if one wishes to
be neutral with respect to different actions. In the context of our Prisoners’ Dilemma
example, exchangeability means that the probability of cooperating and the probability
of defecting have the same prior distribution, or equivalently, that the distribution of the
cooperation probability is symmetric around 0.5.
For ν = 1, the Dirichlet distribution with all parameters equal to ν is the uniform
distribution on the |Ai |-dimensional simplex. Setting ν less than 1 favors action probabilities near 0 or 1. In our Prisoners’ Dilemma example, |Ai | = 2. Figure 2 shows the
density of the cooperation probability for ν = 0.6. Its symmetry around 0.5 implies that
it is also the density for the defection probability. We see that probabilities near 0 or 1
are favored.
The concentration of prior probability near 0 or 1 is desirable if one wants to express
the prior information that the machine is close to some deterministic machine, without
taking a stand on which should be the high probability action in each machine state.
Low values of ν impose a kind of penalty for deviations from the deterministic machine,
and the penalty increases as ν approaches zero.
This is probably reasonable for cases, like our Prisoners' Dilemma example, where there are no mixed strategy equilibria for the stage game. For other cases, such as repeated matching pennies, one might prefer a uniform distribution over probabilities (ν = 1) or even a distribution with a single mode at 1/2 (ν > 1).

Figure 2: Density of Di(0.6, 0.6) distribution
We will see in the next section that the choice of the Dirichlet distribution for action
probabilities allows us to integrate out µ from the posterior distribution, an enormous
computational advantage.
Given Q, the action probability mass functions µ(·; q) are i.i.d. Dirichlet with all
parameters equal to ν. Thus the density for µ|Q is
\[
f(\mu \mid Q) = \prod_{q \in Q} f\big((\mu(a_i; q))_{a_i \in A_i} \mid Q\big)
= \prod_{q \in Q} \left\{ \frac{\Gamma(\nu |A_i|)}{[\Gamma(\nu)]^{|A_i|}} \prod_{a_i \in A_i} [\mu(a_i; q)]^{\nu - 1} \right\}.
\qquad (9)
\]

5 Integrating out Action Probabilities
In Section 3, we specified a data generating process; that is, a conditional distribution of
actions in experiments given subjects’ machines. When we interpreted the expression for
the density f (a|m) as a likelihood function, a function of m for fixed observed actions a,
we denoted it L(m; a). We also factored L into likelihood factors Ls (ms ; a). Each factor
Ls measures the degree of support that the data gives to the machine ms .
In Section 4, we specified a prior (or marginal) density f (m) over subjects’ machines.
The a priori independence of these machines implies that f (m) factors by machine.
Our model is now complete, and gives the joint distribution of machines and actions.
The fact that both prior and likelihood factor by machine implies that the posterior
density f (m|a) does as well.
In principle, we have everything we need to do Bayesian inference for unknown machines given observed actions. It would be possible to proceed directly to developing
methods to simulate the posterior distribution, the conditional distribution of machines
given the observed actions. As we will see, however, a little analytic integration goes a long way. First,
it allows us to compute integrated likelihood factors that measure the degree of support
that the data give to a machine skeleton. This allows us to perform the statistical tests
we report in Section 8. It also makes numerically efficient posterior simulation possible.
In this section we show how to analytically integrate out the action probability mass
function µs from the posterior distribution of machine ms = (Qs , λs , µs ), thereby obtaining the marginal posterior distribution of machine skeleton m̄s = (Qs , λs ). We will
obtain an expression for f (a|m̄), where m̄ ≡ (m̄s )s∈S is the vector of all subjects’ machine skeletons. We can interpret this as an integrated likelihood function L̄(m̄; a). We
show that the integrated likelihood function L̄, like the likelihood L, factors by subject.
Each integrated likelihood factor L̄s (m̄s ; a) gives the degree of support that the data gives
the machine skeleton m̄s . Multiplying the integrated likelihood factor L̄s (m̄s ; a) by the
prior factor f (m̄s ), we obtain a function proportional to the marginal posterior density
f (m̄s |a), which we can use to simulate this distribution.
We also derive analytic expressions for f (µs |m̄s , a), which gives the conditional posterior distribution of action probabilities given machine skeletons. These are members of
the well known Dirichlet family of distributions and we can draw variates directly from
them using standard methods.
Deriving these expressions serves two purposes. First, the closed form for L̄s (m̄s ; a),
a measure of the support that the data a give to the machine skeleton m̄s , allows us
easily to test the hypothesis that a subject uses a particular machine skeleton, such as
the Tit-For-Tat skeleton illustrated in Figure 1. Section 8 gives several examples of such
tests.
Second, it allows us to simulate the posterior distribution of machine ms with high
numerical efficiency. This is because we can simulate the marginal posterior distribution
of the machine skeleton m̄s using Markov Chain Monte Carlo methods and then draw action probabilities µs from their conditional posterior distribution given machine skeleton
m̄s . The high numerical efficiency is relative to a more obvious Gibbs sampling approach
involving the simulation of the two conditional distributions Qs , λs |µs , a and µs |Qs , λs , a.
The Gibbs sampling approach is numerically inefficient because of the strong a posteriori
statistical dependence of λs and µs .
We first show a simple example of how to integrate out action probabilities using
well known properties of the Dirichlet distribution. We refer the reader unfamiliar with
these properties to Bernardo and Smith (1994, p. 436) for details. Let pC and pD be the
probabilities that a machine cooperates or defects, respectively, in a given state. Suppose
that (pC , pD ) has the Dirichlet distribution Di(α, β) with parameters α > 0 and β > 0.
(Dirichlet distributions on 2-dimensional simplexes are usually known as Beta distributions; we use the more general name here because we use it throughout the paper.)
Then the marginal (not conditional on (pC , pD )) probability that the machine cooperates nc times and defects nd times while in that state is given by
\[
\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \cdot
\frac{\Gamma(\alpha + n_c)\,\Gamma(\beta + n_d)}{\Gamma(\alpha + \beta + n_c + n_d)}.
\]
The posterior distribution of (pC , pD ) given observed values of nc and nd is the distribution
Di(α + nc , β + nd ).
We now return to our artificial data example and particularly the derived action
counts in Table 2. If we specify a prior distribution Di(0.6, 0.6) for the action probabilities
(µ1 (C; 1), µ1(D; 1)) of machine 1 in state 1, the probability of the sequence of observed
actions for machine 1 in state 1 is
\[
\frac{\Gamma(0.6 + 0.6)}{\Gamma(0.6)\,\Gamma(0.6)} \cdot
\frac{\Gamma(10 + 0.6)\,\Gamma(1 + 0.6)}{\Gamma(10 + 1 + 0.6 + 0.6)} \approx 0.0081025.
\]
The conditional distribution of the action probabilities (µ1 (C; 1), µ1(D; 1)), given the data
and the Tit-For-Tat skeleton, is Di(10.6, 1.6). The conditional density of µ1 (C; 1), the
probability of cooperating, is plotted in Figure 3. After observing the machine cooperate
ten times and defect once while in state 1, there is much less uncertainty about the
probability of cooperating while in state 1, and much more posterior probability on high
values of the cooperation probability in state 1.
In a similar way, we can compute an integrated likelihood function L̄(m̄1 , m̄2 ; a),
defined as the probability that fixed machine skeletons m̄1 and m̄2 generate actions a.
The steps are as follows. We first multiply the prior density f (µ1 )f (µ2 ) by the likelihood
factor L(m1 , m2 ; a), where mi , i = 1, 2, is the machine with skeleton m̄i and action
probability mass function µi . Since L(m1 , m2 ; a) is the probability of a given µ1 , µ2 and
fixed m̄i , the product gives the joint distribution of µ1 , µ2 and a. Integrating out µ1 and
µ2 gives the desired marginal probability as
\[
\bar{L}(\bar{m}_1, \bar{m}_2; a) = \prod_{s=1}^{2} \prod_{q=1}^{2}
\frac{\Gamma(2\nu)}{[\Gamma(\nu)]^2} \cdot
\frac{\prod_{a_i \in \{c,d\}} \Gamma(\nu + c_s(a_i, q))}{\Gamma\big(2\nu + \sum_{a_i \in \{c,d\}} c_s(a_i, q)\big)}.
\]
Just as the likelihood function L(m1 , m2 ; a) measures how well the data support various combinations (m1 , m2 ) of two machines, the integrated likelihood function measures
how well the data support various combinations m̄1 = (Q1 , λ1 ) and m̄2 = (Q2 , λ2 ) of
machine skeletons.
Figure 3: Density of Di(10.6, 1.6) distribution
We note that the integrated likelihood function also factors by machine. We define
the integrated likelihood factor for machine s as
\[
\bar{L}_s(\bar{m}_s; a) \equiv \prod_{q=1}^{2}
\frac{\Gamma(2\nu)}{[\Gamma(\nu)]^2} \cdot
\frac{\prod_{a_i \in \{c,d\}} \Gamma(\nu + c_s(a_i, q))}{\Gamma\big(2\nu + \sum_{a_i \in \{c,d\}} c_s(a_i, q)\big)}.
\]
We compute the integrated likelihood factor for player 1 as
\[
\bar{L}_1(\bar{m}_1; a) =
\frac{\Gamma(1.2)}{[\Gamma(0.6)]^2} \cdot \frac{\Gamma(10.6)\,\Gamma(1.6)}{\Gamma(12.2)} \cdot
\frac{\Gamma(1.2)}{[\Gamma(0.6)]^2} \cdot \frac{\Gamma(1.6)\,\Gamma(5.6)}{\Gamma(7.2)}
\approx 1.7566 \times 10^{-4},
\]
and the integrated likelihood factor for player 2 as L̄2 (m̄2 ; a) ≈ 9.4698 × 10^{-7}. We see
that the actions of player 1 are much more consistent with a Tit-For-Tat machine skeleton
than those of player 2. This is similar to the earlier result in Section 3 that the actions of
player 1 are much more consistent with the 85% Tit-For-Tat machine than those of player
2. We emphasize that the conclusion based on the comparison of integrated likelihood
factors is not dependent on any particular choice of µ1 and µ2 . It averages over various
µ1 and µ2 , favoring action probabilities near 0 or 1.
In the general case, we can write the integrated likelihood factor for machine s as
\[
\bar{L}_s(\bar{m}_s; a) =
\frac{[\Gamma(\nu |A_i|)]^{|Q_s|}}{[\Gamma(\nu)]^{|A_i|\,|Q_s|}}
\prod_{q \in Q_s}
\frac{\prod_{a_i \in A_i} \Gamma(\nu + c_s(a_i, q; \lambda_s))}{\Gamma\big(\sum_{a_i \in A_i} \nu + c_s(a_i, q; \lambda_s)\big)},
\]
where i is the player type played by subject s. Recall that the action counts cs (ai , q; λ)
are given by (5).
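The integrated likelihood factor is easy to evaluate from the action counts on the log scale; the following sketch (ours, using the standard-library log-gamma function) reproduces the value reported above for player 1.

    from math import exp, lgamma

    def integrated_likelihood_factor(counts_by_state, nu, num_actions=2):
        """For each state, a Dirichlet-multinomial term
        Gamma(nu*|A|)/Gamma(nu)^|A| * prod_a Gamma(nu + c_a) / Gamma(nu*|A| + sum_a c_a),
        multiplied over states."""
        log_value = 0.0
        for counts in counts_by_state:   # action counts for one state, e.g. (10, 1)
            log_value += lgamma(nu * num_actions) - num_actions * lgamma(nu)
            log_value += sum(lgamma(nu + c) for c in counts)
            log_value -= lgamma(nu * num_actions + sum(counts))
        return exp(log_value)

    # Subject 1's counts from Table 2: state 1 -> (10, 1), state 2 -> (1, 5); nu = 0.6.
    print(integrated_likelihood_factor([(10, 1), (1, 5)], nu=0.6))   # approx. 1.7566e-04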
6 Bayesian Inference for Machines
Here we consider the problem of Bayesian inference for machines. We have seen that the
prior and likelihood, and therefore the posterior, factor by subject. Thus the subjects’
machines are a posteriori independent, and we can do inference for each machine separately. We therefore focus on the problem of inference for the machine m = (Q, λ, µ)
used by a particular subject s. To simplify the subsequent exposition, we will suppress
the index s where possible.
We implement Bayesian inference by simulating the posterior distribution of unknown
objects given observed data. Here, the relevant unknown object is the machine m. Posterior simulation delivers a large sample {m(j)}, j = 1, . . . , M, of machines representing the posterior distribution ms |a.
Reporting features of the posterior distribution using these samples is conceptually
straightforward. For example, we estimate the posterior probability that the machine
has three machine states by counting the number of machines m(j) with three machine
states and dividing by M. More generally, we estimate posterior population moments by
computing the corresponding sample moments. We choose M to control the numerical
error of the estimates.
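As a small illustration (our own sketch, with made-up sample values), such estimates are simple averages over the posterior draws:

    # Hypothetical posterior sample of state-set sizes |Q| for one subject.
    sample_num_states = [2, 3, 3, 2, 3, 4, 2, 3, 2, 2]
    M = len(sample_num_states)

    prob_three_states = sum(1 for n in sample_num_states if n == 3) / M   # Pr(|Q| = 3 | a)
    mean_num_states = sum(sample_num_states) / M                          # posterior mean of |Q|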
The ability to analytically integrate out action probabilities allows us to draw a posterior sample from the distribution ms |a using the following two steps. First, we draw
a posterior sample {m̄(j)}, j = 1, . . . , M, of machine skeletons from the distribution m̄s |a, and then
for each skeleton m̄(j) , we draw the action probability mass functions µ(j) from the distribution µs |m̄s = m̄(j) , a. (In practice, we interleave the two steps, but the result is the
same.)
We draw the sample {m̄(j)}, j = 1, . . . , M, of machine skeletons using a Markov Chain with
stationary (or invariant) distribution m̄s |a. At each step j, we take as input a machine
skeleton m̄(j) and apply a random update procedure to generate a new machine skeleton
m̄(j+1) . The update procedure defines a Markov transition distribution m̄(j+1) |m̄(j) .
Our update procedure consists of a chain of simpler update procedures, which we will
call blocks. There are three blocks, denoted BQ,λ , B′Q,λ , and Bλ , and we describe them
below. The blocks have the special property that they preserve the distribution m̄s |a.
That is, if we apply one of the blocks to an input machine skeleton having the distribution
m̄s |a, then the random output machine skeleton has the distribution m̄s |a as well. The
blocks can be chained together in any order, possibly with repetition, and the compound
update procedure will also preserve the distribution ms |a. Thus the compound update
procedure has the posterior distribution m̄s |a as its stationary distribution.
Drawing µ from the conditional posterior distribution µs |Qs , λs , a is easy, as we will
show that this distribution is Dirichlet. We draw Dirichlet variates using standard methods.
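One standard way to do this, sketched below with our own function name, is to normalize independent Gamma draws; the example uses the Di(10.6, 1.6) conditional posterior of Section 5.

    import random

    def draw_dirichlet(parameters):
        """Draw one variate from a Dirichlet distribution by normalizing
        independent Gamma(alpha_k, 1) draws."""
        gammas = [random.gammavariate(alpha, 1.0) for alpha in parameters]
        total = sum(gammas)
        return [g / total for g in gammas]

    # Conditional posterior of (mu1(c; 1), mu1(d; 1)) given the Tit-For-Tat skeleton
    # and the example data: Di(10.6, 1.6).
    cooperation_prob, defection_prob = draw_dirichlet([10.6, 1.6])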
In the rest of Section 6 we show how to decompose the posterior distribution of
machines into a marginal posterior distribution of machine skeletons and a conditional
posterior distribution of action probability mass functions. We then briefly describe the
update blocks we use to simulate the posterior distribution of machine skeletons. Finally
we show how to compute factors of the marginal likelihood, the evaluation of the marginal
mass function f (a) at the observed value of the action vector a. The marginal likelihood
factors are important for model comparison using Bayes factors.
6.1 Decomposing the Posterior Distribution
We first integrate out the action probability mass function µ from the posterior Q, λ, µ|a
to derive an unnormalized probability mass function for the skeleton Q, λ|a. We also
derive the density function for the distribution µ|Q, λ, a. These give us a decomposition
of the joint posterior distribution Q, λ, µ|a into the marginal posterior distribution Q, λ|a
and the conditional posterior distribution µ|Q, λ, a.
This decomposition is extremely useful. It allows us to simulate, using MCMC methods, the posterior distribution of the machine skeleton, rather than the posterior distribution of the machine itself. We can, and do, still obtain draws of machines, by drawing
action probability mass functions directly from µ|Q, λ, a.
We noted in Section 3 that the likelihood function factored by player and subject.
Since the prior also factors in this way, the posterior does too. Bayes’ rule then tells us
that the posterior distribution of a subject’s machine is proportional to the product of
its prior distribution and the likelihood factor attributable to that player and subject:
f (Q, λ, µ|a) ∝ f (Q, λ, µ) · L(Q, λ, µ; a).    (10)
It is convenient to express the likelihood factor L(Q, λ, µ; a) in terms of action counts.
The action count c(ai , q; λ) gives the number of times the machine (Q, λ, µ) plays action
ai in machine state q. That is,
\[
c(a_i, q; \lambda) = \sum_{\sigma \in \Sigma_s} \sum_{t=1}^{T_\sigma}
\delta_{q,\, q(a_{1\sigma}, \ldots, a_{t-1,\sigma};\, \lambda)}\; \delta_{a_i,\, a_{it\sigma}},
\qquad a_i \in A_i,\; q \in Q,
\qquad (11)
\]
where δ denotes the Kronecker delta function. In terms of action counts, the likelihood
factor L(Q, λ, µ; a) becomes
\[
L(Q, \lambda, \mu; a) = \prod_{q \in Q} \prod_{a_i \in A_i} [\mu(a_i; q)]^{c(a_i, q; \lambda)}.
\qquad (12)
\]
Combining equations (6), (9), (10) and (12), we obtain the posterior density
\[
f(Q, \lambda, \mu \mid a) \propto f(Q) \cdot f(\lambda \mid Q) \cdot
\frac{[\Gamma(\nu |A_i|)]^{|Q|}}{[\Gamma(\nu)]^{|A_i|\,|Q|}}
\prod_{q \in Q} \prod_{a_i \in A_i} [\mu(a_i; q)]^{\nu + c(a_i, q; \lambda) - 1}.
\qquad (13)
\]
We know that f (µ|Q, λ, a) is proportional (in µ) to f (Q, λ, µ|a) by Bayes’ rule, and
therefore it is proportional in turn to the expression on the right hand side of equation
(13). Taking this expression as a function of µ, we recognize it as proportional to a
product (over Q) of Dirichlet densities with parameters (ν + c(ai , q; λ))ai ∈Ai . Since it
must integrate to one, f (µ|Q, λ, a) is equal to this product of Dirichlet densities:
\[
f(\mu \mid Q, \lambda, a) = \prod_{q \in Q}
\left[
\frac{\Gamma\big(\sum_{a_i \in A_i} \nu + c(a_i, q; \lambda)\big)}{\prod_{a_i \in A_i} \Gamma(\nu + c(a_i, q; \lambda))}
\prod_{a_i \in A_i} [\mu(a_i; q)]^{\nu + c(a_i, q; \lambda) - 1}
\right].
\qquad (14)
\]
We see that the µ(·; q), q ∈ Q, are conditionally independent and Dirichlet distributed,
given Q, λ and a.
Since f (Q, λ, µ|a) = f (Q, λ|a) · f (µ|Q, λ, a), we can divide (13) by (14) to obtain the
following unnormalized probability mass function for the marginal posterior distribution
Q, λ|a:
\[
f(Q, \lambda \mid a) \propto f(Q) \cdot f(\lambda \mid Q) \cdot
\frac{[\Gamma(\nu |A_i|)]^{|Q|}}{[\Gamma(\nu)]^{|A_i|\,|Q|}}
\prod_{q \in Q}
\frac{\prod_{a_i \in A_i} \Gamma(\nu + c(a_i, q; \lambda))}{\Gamma\big(\sum_{a_i \in A_i} \nu + c(a_i, q; \lambda)\big)}.
\qquad (15)
\]

6.2 Machine Skeleton Update Blocks
Here, we briefly describe the three update blocks that we chain together to construct
our compound update block. Each block takes as input a machine skeleton (Q, λ) and
randomly generates a machine skeleton (Q′ , λ′ ). All three blocks preserve the marginal
posterior distribution of the skeleton and all are examples of Metropolis-Hastings updates.
For more information on Metropolis-Hastings updates, we direct the reader to Geweke
(2005) or Koop (2003). For details on these parameter blocks, we refer the reader to
Appendix B.
Block BQ,λ draws a proposal Q∗ from a distribution tailored to the posterior distribution Q|a, then a proposal λ∗ from the prior distribution λ|Q. The proposal is randomly
accepted, in which case Q′ = Q∗ and λ′ = λ∗ ; or rejected, in which case Q′ = Q and
λ′ = λ.
Block B′Q,λ draws the proposal Q∗ from the same posterior-tailored distribution, but
draws the proposal λ∗ from a proposal distribution more closely resembling the posterior
distribution Q, λ|a. The acceptance probability is higher, but so is the computational
cost.
Block Bλ does nothing to Q. That is, Q′ = Q. It generates a random proposal λ∗
by applying a random mutation to λ. If the proposal is accepted, λ′ = λ∗ . Otherwise,
λ′ = λ.
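To give the flavour of such an update with a minimal sketch (ours, and deliberately simpler than the blocks above): if, for a fixed Q, the proposal for λ∗ is the uniform prior (8) itself, then an independence Metropolis-Hastings step accepts with probability equal to the ratio of integrated likelihood factors, because the prior and proposal terms cancel.

    import random

    def mh_update_lambda(current_lambda, integrated_likelihood, propose_regular_lambda):
        """One independence Metropolis-Hastings update of the transition function,
        proposing from the uniform prior over regular transition functions; the
        acceptance probability reduces to min(1, L_bar(lambda*) / L_bar(lambda))."""
        proposal = propose_regular_lambda()
        ratio = integrated_likelihood(proposal) / integrated_likelihood(current_lambda)
        if random.random() < min(1.0, ratio):
            return proposal        # accept
        return current_lambda      # reject

    # integrated_likelihood evaluates the integrated likelihood factor for a candidate
    # transition function (the state-count terms of (15)); propose_regular_lambda draws
    # uniformly from the regular transition functions on A x Q. Both are assumed given.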
6.3 Computing the Marginal Likelihood
We use the method of Gelfand and Dey (1994) to compute the marginal likelihood, the
evaluation of the marginal probability mass function f (a) at the observed vector of all
actions. The marginal likelihood is a standard Bayesian model evaluation criterion, and
has two important interpretations. First, it is the out-of-sample prediction record of the
model, for data a. The greater the value, the better the out-of-sample prediction record.
Second, it gives the weight of the model’s posterior probability in a compound model
consisting of a mixture of models with equal prior probability.
To compute f (a), we first factor it by player and subject. The marginal likelihood
factor attributable to player i, subject s is
\[
\psi_{is} \equiv \prod_{\sigma \in \Sigma_s} \prod_{t=1}^{T_\sigma} f(a_{it\sigma} \mid a_{1\sigma}, \ldots, a_{t-1,\sigma}).
\]
We compute the ψis using the Gelfand and Dey (1994) procedure. The procedure
requires the specification of a normalized distribution approximating the posterior distribution of unknown parameters. We describe our implementation in Appendix C.
7 Data
We use laboratory data on the repeated Prisoners’ Dilemma described in Engle-Warnick
(2007). In the stage game, the punishment (mutual defection) payoff is 60, the temptation
payoff (play d when opponent plays C) is 180, the sucker payoff (play c when opponent
plays D) is 0, and the reward payoff (play c when opponent plays C) is 90. Payoff units
are Canadian cents. A supergame is an indefinitely repeated Prisoner’s Dilemma stage
game, with a continuation probability of 0.8. There are twenty rounds, in each of which
all subjects are randomly and anonymously paired to play a supergame. In each round,
all pairs play a supergame of the same random duration. Across rounds, durations are
statistically independent. Details on experimental design and procedures, as well as some
data analysis can be found in Engle-Warnick (2007), which refers to the stage game as
“Hawk-Dove.” We use the second session of data, the first of three sessions with twelve
subjects.
8 Results
We have applied our methods to the data described in Section 7, generating a posterior
sample of 100,000 machines for each of the twelve experimental subjects.
We will first report some high posterior probability machine skeletons. Although
we have a diffuse prior over a vast number of possible machine skeletons and a limited
amount of data for each subject, we manage to discern some high probability machine
skeletons in the posterior distribution.
For most subjects, however, the posterior distribution scatters most of the posterior
probability over a large number of individually improbable machine skeletons. We will
see later in the section that summary statistics nonetheless provide useful insights into
these posterior distributions.
There are many possible summary statistics to report, and different researchers will
have different ideas about what features of the posterior distribution are important. We
hope the many examples we offer in this section are interesting in and of themselves. In
their variety, they also suggest the rich assortment of possibilities. We classify our results
as examples of estimation, prediction and testing, corresponding to three major themes
in statistics and econometrics.
We caution the reader against attaching much significance to the results for subjects
7 and 10, since there is strong evidence that these subjects do not play the same strategy
throughout the sample used for inference.
8.1 High Posterior Probability Machine Skeletons
We now report, for each subject, any machine skeletons with high posterior probability.
No machine skeleton has posterior probability greater than 0.01 for subjects 2, 7, 9 and
10. Subjects 3 and 5 have 8 and 2 skeletons, respectively, with posterior probability
between 0.01 and 0.02, and none more probable.
Subjects 8 and 12 play the one-state (memoryless) machine with probabilities 0.592
and 0.704, respectively, but their action probabilities are very different. For subject 8, the
conditional posterior probability of cooperating, given the one-state machine, is 0.592,
while for subject 12, the conditional posterior probability of defecting is 0.991.
For subjects 1, 4, 6 and 11, there are several machine skeletons with posterior probability greater than 0.02 and these are shown in Table 3. Each column gives the value
of λ evaluated at q, the current state; a1 , the machine’s action; and a2 , the opponent’s
action. In Figure 4, we illustrate graphically the machine skeleton in the first row of the
table.
We draw attention to subject 1’s third skeleton in the table and note that this is also
a high posterior probability skeleton for subjects 4 and 6. It is similar to the Tit-For-Tat
skeleton of Figure 1 except that the states are reversed: the initial state is that associated
with an opponent’s previous defection, rather than the state associated with opponent’s
cooperation. Since both skeletons are equally deserving of the name Tit-For-Tat, we will
call them C-Tit-For-Tat and D-Tit-For-Tat, according to the initial state. We include
the C-Tit-For-Tat and D-Tit-For-Tat skeletons in Table 3 to allow comparison.
The four equally probable (within numerical standard error) subject 1 machine skeletons of Table 3 agree on six of eight transition function values. Together, they cover the
four possible combinations of the other two transition function values. It is difficult to
distinguish among these four machines given the play we observed during the experiment,
as we did not have any observations that would allow one to discriminate among these
possible transitions.
In general, Table 3 shows us that the various skeletons of subjects 1, 4, 6 and 11 are
similar to each other (and D-Tit-For-Tat), both within- and between-subject. Although
the posterior probability of any particular skeleton is moderate, the similarity among the
skeletons implies a much more likely, fairly narrow range of prototypical behavior.
Table 3: High posterior probability machine skeletons, with C-Tit-For-Tat and D-Tit-For-Tat skeletons

    Subject   Pr      1,c,C  1,d,C  1,c,D  1,d,D  2,c,C  2,d,C  2,c,D  2,d,D
    1         0.108     2      2      1      1      2      1      1      1
    1         0.108     2      2      1      1      2      1      1      2
    1         0.108     2      2      1      1      2      2      1      1
    1         0.108     2      2      1      1      2      2      1      2
    1         0.093     2      2      2      1      2      2      1      1
    1         0.093     1      2      2      1      2      2      1      1
    4         0.074     1      1      1      1
    4         0.045     2      2      1      1      2      2      1      2
    4         0.025     2      2      1      1      2      2      1      1
    6         0.050     2      2      1      1      2      2      1      1
    6         0.050     2      2      2      1      2      1      1      1
    6         0.049     2      2      1      1      2      2      1      2
    6         0.049     2      2      1      1      2      1      1      2
    6         0.049     2      2      1      1      2      1      1      1
    6         0.049     1      2      2      1      2      2      1      1
    6         0.048     2      2      2      1      2      2      1      1
    11        0.060     2      2      2      1      2      2      1      1
    11        0.060     2      2      2      1      2      1      1      1
    TFT-C               1      1      2      2      1      1      2      2
    TFT-D               2      2      1      1      2      2      1      1
Figure 4: A high probability machine skeleton for subject 1
8.2 Estimation
We first report the posterior distribution of a simple measure of the complexity of a
machine strategy, the number of states. More sophisticated measures of machine complexity are certainly possible, perhaps taking into account how many distinct next states
are possible in various current states.
We can easily estimate the posterior probability that a subject’s machine has |Q|
states. We simply count the numbers of |Q|-state machines in the posterior sample, and
divide by the total number of posterior draws. We can make the numerical standard
errors of these estimates as low as we like by setting the number of posterior draws.
Table 4 gives the discrete posterior probability distribution of the number of states
for all twelve subjects. It also gives the prior distribution to allow comparisons. All
numerical standard errors are less than 0.01.
We note several features of this distribution. For nine out of twelve subjects, the
posterior probability of the one state machine is exceedingly low. This is clear evidence
that these subjects react to previous play. We see some variation in complexity. Smallest
70%-high posterior probability sets for the 12 subjects are (in order) {2, 3}, {3, 4}, {2, 3},
{2, 3}, {2, 3}, {2}, {2, 3, 4}, {1, 2}, {2, 3}, {3, 4}, {2, 3}, and {1}. We see that the posterior
distributions assign quite high probability to fairly simple strategies. Only for subjects
2, 5, 7 and 10 is the posterior probability of a four or five state machine greater than
0.25. Revealingly, three of these four machines are also the machines for which there is
strong (7 and 10) or moderate (2) evidence of a change in strategy over time.
We can approximate other moments of the posterior distribution in the same way:
an estimate of the population moment is the corresponding sample moment.
Table 4: Posterior distributions of complexity, by subject

    Subject   |Q| = 1   |Q| = 2   |Q| = 3   |Q| = 4   |Q| = 5
    (prior)    0.369     0.369     0.185     0.062     0.015
    1          0.000     0.451     0.426     0.106     0.017
    2          0.000     0.181     0.369     0.342     0.109
    3          0.000     0.419     0.416     0.134     0.030
    4          0.074     0.551     0.277     0.081     0.017
    5          0.001     0.360     0.359     0.202     0.078
    6          0.000     0.782     0.180     0.033     0.005
    7          0.000     0.250     0.392     0.277     0.081
    8          0.592     0.261     0.105     0.035     0.008
    9          0.000     0.674     0.252     0.062     0.012
    10         0.000     0.054     0.652     0.242     0.052
    11         0.000     0.612     0.292     0.080     0.015
    12         0.704     0.228     0.055     0.011     0.002

Our second estimation exercise looks at a machine's initial probability p0 of cooperating, and the minimal and maximal probabilities pmin and pmax of cooperating. The
maxima and minima are taken over all machine states. Table 5 reports posterior moments relating to pmin . The first two columns show the mean and standard deviation
of this quantity. The third column shows the posterior probability that the machine’s
probability of cooperating in the initial state is minimal. The fourth and fifth columns
show the mean and standard deviation of p0 − pmin , the maximal decrease in the probability of cooperation that can be induced by the actions of an opponent. Table 6 reports
analogous posterior moments relating to pmax .
For most subjects, there is a large difference between E[pmax ] and E[pmin ], indicating
a large degree of reactivity to opponents' actions. The two subjects for whom
E[pmax ] − E[pmin ] is small, 8 and 12, are the same two subjects whose machines have high posterior
probability of having only one state. Thus two results, one pertaining to the machine
skeleton and the other pertaining to action probabilities, both suggest that these two
subjects react weakly to the opponents’ actions.
Even among the subjects who react strongly to previous play, there is considerable
heterogeneity. We see this clearly in the statistics involving the initial state p0 . Some
subjects are wary, starting a supergame with a low probability of cooperation but apparently prepared to greatly increase this probability in reaction to suitable previous play.
Others are trusting, starting a supergame with a high probability of cooperation.
Table 5: Posterior moments involving minimum cooperation probabilities, by subject

    Subject   E[pmin]   std[pmin]   Pr[pmin = p0]   E[p0 − pmin]   std[p0 − pmin]
    1          0.049      0.036        0.969           0.001          0.010
    2          0.177      0.132        0.021           0.645          0.185
    3          0.067      0.049        0.953           0.003          0.016
    4          0.073      0.044        0.870           0.011          0.039
    5          0.077      0.067        0.457           0.153          0.176
    6          0.011      0.013        0.984           0.000          0.005
    7          0.052      0.050        0.146           0.369          0.209
    8          0.531      0.122        0.783           0.048          0.118
    9          0.178      0.087        0.000           0.807          0.089
    10         0.049      0.043        0.015           0.486          0.153
    11         0.104      0.050        0.856           0.025          0.073
    12         0.009      0.011        0.946           0.001          0.009
Subjects 1 and 9 are clear examples. Both react strongly to previous play, but while
the probability that subject 1 begins in the state with minimal cooperation probability
is 0.969, the probability that subject 9 begins in the state with maximal cooperation
probability is 0.980. Figure 5 gives an idea of the whole distribution of p0 , pmin, and pmax
for these two subjects. The figure shows posterior scatterplots of both pmin versus pmax
(left) and p0 − pmin versus pmax − p0 (right). The left panels show how both subjects are similar:
the cooperation probabilities vary greatly over states. The right panels show how they
are different: subject 1 is quite wary in the first state, while subject 9 is quite trusting.
8.3 Prediction
Using our posterior samples of machines, we predict mean supergame payoffs for all
possible pairings of subjects 1, 8, 9, and 12. We define the mean supergame payoff
vector for a match between two machines as the expected value of the vector of total
accumulated payoffs at the end of the supergame. The expectation is over the random
supergame duration and the random actions of both machines. Since there is posterior
uncertainty about machines, there is posterior uncertainty about the mean supergame
payoff vector. Figure 6 gives a matrix of histograms for the posterior distribution of supergame payoffs.
Figure 5: Posterior scatterplot of pmin versus pmax (left) and p0 − pmin versus pmax − p0
(right) for subjects 1 (top) and 9 (bottom)
Table 6: Posterior moments involving maximum cooperation probabilities, by subject

Subject   E[pmax]   std[pmax]   Pr[pmax = p0]   E[pmax − p0]   std[pmax − p0]
1           0.978       0.029           0.000          0.927            0.047
2           0.948       0.048           0.454          0.126            0.147
3           0.835       0.109           0.000          0.765            0.123
4           0.532       0.215           0.092          0.447            0.231
5           0.723       0.155           0.107          0.493            0.231
6           0.875       0.092           0.000          0.864            0.094
7           0.837       0.159           0.236          0.416            0.289
8           0.639       0.106           0.723          0.060            0.122
9           0.985       0.018           0.980          0.001            0.007
10          0.961       0.057           0.056          0.427            0.168
11          0.967       0.041           0.000          0.839            0.080
12          0.124       0.264           0.745          0.114            0.264
In rows 1 through 4, the machine playing as player 1 is 1, 8, 9 and
12, respectively. In columns 1 through 4, the machine playing as player 2 is 1, 8, 9 and 12,
respectively. Each histogram gives the mean supergame payoff to player 1. Vertical lines
mark the values 0, 300, 450 and 900. If a player who always defects plays a player who
always cooperates, the former obtains the maximal payoff of 900 and the latter obtains
the minimal payoff of 0. Two players who always defect receive an average payoff of 300;
two players who always cooperate, 450.
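The following sketch illustrates one way such mean payoffs could be estimated by simulation. The stage-game payoffs and the continuation probability below are placeholders, not the experiment's parameters, and the two machines are purely illustrative; in the paper's exercise the expectation would also be taken over posterior draws of both subjects' machines.

```python
# Sketch: Monte Carlo estimate of the mean supergame payoff vector for two machines.
# A machine is encoded by state-dependent cooperation probabilities ("coop") and
# deterministic transitions on the action profile ("next"), as in the model.
import random

def play_supergame(m1, m2, payoff, cont_prob, rng):
    """One simulated supergame; returns the total payoff to each player."""
    q1, q2, total1, total2 = 0, 0, 0.0, 0.0
    while True:
        a1 = "C" if rng.random() < m1["coop"][q1] else "D"
        a2 = "C" if rng.random() < m2["coop"][q2] else "D"
        u1, u2 = payoff[(a1, a2)]
        total1, total2 = total1 + u1, total2 + u2
        if rng.random() >= cont_prob:            # supergame ends with prob. 1 - cont_prob
            return total1, total2
        q1 = m1["next"][(a1, a2)][q1]            # deterministic state transitions
        q2 = m2["next"][(a2, a1)][q2]

def mean_payoffs(m1, m2, payoff, cont_prob, n_sims=10000, seed=0):
    rng = random.Random(seed)
    sims = [play_supergame(m1, m2, payoff, cont_prob, rng) for _ in range(n_sims)]
    return tuple(sum(x) / n_sims for x in zip(*sims))

# Illustrative inputs only: a Tit-For-Tat-like machine, an always-defect machine,
# placeholder stage payoffs, and a placeholder continuation probability of 0.8.
tft = {"coop": [0.95, 0.05],
       "next": {("C", "C"): [0, 0], ("D", "C"): [0, 0],
                ("C", "D"): [1, 1], ("D", "D"): [1, 1]}}
alld = {"coop": [0.05],
        "next": {k: [0] for k in [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]}}
stage = {("C", "C"): (3, 3), ("C", "D"): (0, 5), ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
print(mean_payoffs(tft, alld, stage, cont_prob=0.8))
```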
We have seen that subjects 1 and 9 react strongly to previous play and that they
differ greatly in terms of their initial cooperation probability. Subjects 8 and 12 are the
subjects for whom the posterior probability of the one-state machine skeleton is greater
than 0.5. Recall, though, that their cooperation probabilities are very different.
We point out a few features of this prediction exercise. Some predictions are more
uncertain than others. We have relatively good predictions of how subjects 9 and 12
would fare against replicas of themselves, on average. We have much less certainty about
subject 12’s average payoff against subjects 8 and 9.
Among subjects 1 (reactive but wary), 9 (reactive but trusting), and 12 (frequent
defector), mean payoffs are what we would expect. We have seen that subject 8 is
Figure 6: Histograms of row player mean supergame payoffs. Row and column players
are 1, 8, 9 and 12.
unique among subjects in showing little reactivity to previous play while, unlike subject
12, both defecting and cooperating frequently. This may seem like a strange strategy, but
we predict subject 8 to do quite well against subjects 1 and 9 on average, better than
these subjects do against their replicas. Only against subject 12, who defects with high
probability, does subject 8 do badly. It is possible that subject 8 is more concerned with
inducing future cooperation in reactive opponents than reacting to previous play.
8.4 Testing
We can use marginal likelihoods to compare a model with alternatives. Suppose we
have two models, M1 and M2 , with marginal likelihoods Pr[a|M1 ] and Pr[a|M2 ]. We can
construct a compound model to express uncertainty about the identity of the model, and
assign a priori probabilities Pr[M1 ] and Pr[M2 ] to represent this uncertainty. Whether
or not the compound model includes other models, the posterior odds ratio comparing
the two models is given by the following expression, derived from Bayes’ rule:
$$\frac{\Pr[M_1 \mid a]}{\Pr[M_2 \mid a]} = \frac{\Pr[M_1]}{\Pr[M_2]} \cdot \frac{\Pr[a \mid M_1]}{\Pr[a \mid M_2]}.$$
The second right-hand-side factor, the ratio of marginal likelihoods, is known as the
Bayes factor. Since each marginal likelihood gives the probability of the data without
reference to unknown parameters of the respective model, we can interpret the Bayes
factor as giving the relative out-of-sample prediction records of the two models: a value
greater than 1 implies that M1 has the better out-of-sample prediction record.
We have already seen that the marginal likelihood factors by subject, so we consider
each subject separately.
We will refer to the model described in Section 3 as “base.” We consider five different
alternatives to base.
The first is a model we will call “Breakpoint,” in which the subject changes machines
between rounds r and r + 1, where r is uniformly distributed between 6 and 19, inclusive.
We assume that actions before and actions after the breakpoint are independent. We also
assume that the pre-break and post-break machines are a priori independent with the
same prior distributions as in the basic model. We will interpret a comparison between
the basic model and this alternative as a test of machine stability across rounds.
The four remaining alternatives are models in which the machine skeleton is fixed, not
stochastic. Action probabilities remain random, and their prior distribution is unchanged.
We call these four models “Grim”, “C-Tit-For-Tat”, “D-Tit-For-Tat” and “Memoryless”.
Grim is a machine skeleton with two machine states, the initial state and an absorbing
state the machine enters the first time the opponent defects. Thus, Grim is the machine
skeleton that corresponds to the familiar “grim trigger” strategy. We stress that Grim
is a machine skeleton: action probabilities are unrestricted. In particular, there is no a
priori information favoring cooperation in the initial state and defection in the absorbing
state. We have already seen C-Tit-For-Tat and D-Tit-For-Tat, which both remember
the opponent's previous action. They differ only in terms of the initial state. Memoryless
is the unique one-state machine skeleton, and it has the property that actions do not
depend on previous play in a supergame.
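As a concrete illustration, the four fixed skeletons can be written down as small transition tables, as in the sketch below. The encoding and names are ours; for these skeletons the next state depends only on the opponent's previous action, and only the transitions are fixed, with action probabilities left free as in the text.

```python
# Sketch of the four fixed machine skeletons as transition tables, indexed by the
# opponent's previous action ("C" or "D"); state 0 is the initial state.
GRIM = {                    # absorbing state 1 is entered at the first defection
    "states": 2,
    "next": {0: {"C": 0, "D": 1}, 1: {"C": 1, "D": 1}},
}
C_TIT_FOR_TAT = {           # remembers the opponent's last action; starts in state 0
    "states": 2,
    "next": {0: {"C": 0, "D": 1}, 1: {"C": 0, "D": 1}},
}
D_TIT_FOR_TAT = {           # same transitions, but the start state is the "defect"
    "states": 2,            # state; states are relabeled so 0 is still the start state
    "next": {0: {"C": 1, "D": 0}, 1: {"C": 1, "D": 0}},
}
MEMORYLESS = {              # the unique one-state skeleton
    "states": 1,
    "next": {0: {"C": 0, "D": 0}},
}
```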
Table 7 shows the log Bayes factors in favor of each alternative model, for each subject.
Thus, the Bayes factor in favor of Breakpoint over the base model for subject 1 is B =
exp(−0.643). Suppose we have a compound model in which Breakpoint and the base
model both have a priori probability 0.5. Then the prior odds ratio equals 1 and the
posterior odds ratio equals the Bayes factor. The Breakpoint model and the base model
have a posteriori probabilities B/(B + 1) ≈ 0.345 and 1/(B + 1) ≈ 0.655, respectively.
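The arithmetic is easy to reproduce; the short sketch below converts a log Bayes factor into posterior model probabilities under equal prior odds.

```python
# Sketch: converting a log Bayes factor into posterior model probabilities.
import math

def posterior_prob(log_bayes_factor, prior_odds=1.0):
    """Posterior probability of the alternative model against the base model."""
    odds = prior_odds * math.exp(log_bayes_factor)   # posterior odds ratio
    return odds / (1.0 + odds)

# Subject 1, Breakpoint vs. base: log Bayes factor -0.643 (Table 7).
p_breakpoint = posterior_prob(-0.643)
print(p_breakpoint, 1.0 - p_breakpoint)              # approximately 0.345 and 0.655
```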
We see that for subjects 7 and 10, there is strong evidence against machine stability:
if Breakpoint and Base are equally probable a priori, Breakpoint is more than 1000 times
as probable a posteriori. For subject 2, there is moderate evidence against machine stability,
but with an odds ratio of about 16.22 in favor of a breakpoint, the base model is still
plausible. For the remaining nine subjects, machine stability is quite reasonable.
We now turn to the four machine skeleton alternatives. For most combinations of
subject and machine skeleton, the Bayes factor favors the base model, often decisively.
Furthermore, given that there is only one base model and several possible machine skeletons to consider, it makes sense to assign prior odds in favor of any particular machine
skeleton that are considerably less than one.
Even so, there is an intriguing result here. The D-Tit-For-Tat machine skeleton has
considerable empirical support. For four subjects (1,3,4,6), the Bayes factor in favor of it
over the base model exceeds 10. For another four subjects (5, 8, 11, and 12), D-Tit-For-Tat is plausible. Of the remaining four subjects, C-Tit-For-Tat is plausible for two (2 and 9); the other two (7 and 10) are the subjects for which machine stability is implausible in any case.
Table 7: Log Bayes factors in favor of five alternatives, by subject

Subject   Breakpoint      Grim   C-Tit-For-Tat   D-Tit-For-Tat   Memoryless
1             -0.643   -29.948         -20.546           4.246      -30.044
2              2.786    -2.986           0.661          -3.213       -9.825
3             -0.347   -13.439          -9.479           2.443      -12.022
4              0.367    -2.780          -1.214           2.769       -1.622
5             -0.582    -7.172          -3.476          -0.215       -6.431
6             -1.885   -19.668         -10.613           3.455      -19.179
7              8.306   -11.672          -8.061          -9.199      -10.818
8              0.655    -0.546          -0.199          -0.734        0.472
9              0.620   -18.496           3.914         -11.238      -22.074
10             7.877     0.475          -3.768          -8.361      -21.056
11            -0.688   -21.494         -11.558           1.527      -21.498
12            -1.608    -1.391          -1.443          -0.890        0.651
9 Conclusions
We introduced a probabilistic machine suitable for likelihood-based inference for repeated-game strategies. Making actions random and keeping state transitions deterministic
brings several advantages: it is relatively parsimonious, there is a direct interpretation
of machine strategies as behavioral strategies, and there is only one way that memory of
the history of play deteriorates.
We provided numerically efficient methods for inferring machine strategies given all
observed play associated with a machine. The key technical insight that makes inference
tractable is the recognition that action probabilities can be integrated out analytically
from the posterior distribution. This allowed us to separate the problem of inference for
machines into two much simpler problems: inference for machine skeletons and conditional inference for action probabilities given machine skeletons. We grant that the more
obvious solution of Gibbs sampling, with blocks for state transition functions and action
probabilities, also separates the problem into two simpler problems. However, the very
strong posterior dependence between the transition function and action probabilities implies very low numerical efficiency for the Gibbs approach. An additional advantage of
integrating out action probabilities is that we can compare our model with alternatives
that describe fixed machine skeletons, without reference to action probabilities.
Our empirical application illustrates some ways to report features of the posterior
distribution of machines. While we hope these are interesting in and of themselves, we
recognize that different researchers will be interested in different features of the posterior distribution. We hope, therefore, that these examples suggest the wide variety of
possibilities.
We learn that the strategy of most, but not all, subjects is plausibly stable after five
rounds. Most subjects react strongly to previous play, but differ considerably in terms of
their initial probability of cooperation. The play of most subjects is at least reasonably
well captured by either of two simple machines: a stochastic version of the traditional
Tit-For-Tat machine and a similar machine whose initial state is the “defect” state.
In future work, we hope to relax two assumptions that we made here to simplify an
already detailed paper. The first is that subjects play according to a single machine
throughout the sample used for inference. The second is that subjects’ machines are a
priori independent. We point out that some sort of within- or between-subject statistical
dependence is necessary in order to do meaningful inference: we cannot learn very much
about a machine using data from a single supergame. But the two assumptions above
represent only one possible way of introducing such dependence, and it is an extreme case:
they imply the strongest possible within-subject dependence and the weakest possible
between-subject dependence.
Fortunately, we will be able to consider much more flexible models of within- and
between-subject variation, and still be able to use our methods for machine inference.
We can do this using the following framework: there is a population of K machines, and
a model stochastically assigns one of these machines to each subject in each supergame.
Values of K less than the number of subjects ensure both within- and between-subject
dependence while still allowing for subject heterogeneity as long as K > 1. We can easily introduce additional within-subject dependence by introducing dependence between
machine assignments involving the same subject, but in different supergames.
Other ideas for future work involve non-uniform priors over state transition functions.
One possibility is to introduce a penalty for machine complexity by putting more prior
probability on less complex machines. A second is to introduce a theoretical element by
putting higher prior probability on machines that have higher expected payoffs when playing
against their machine opponents.
The main goal of this work is to sensibly link the behavior we observe when thoughtful agents play games to the underlying “strategic programs” that might direct such
behavior. Here we developed a statistically appropriate—and computationally feasible—
technique for uncovering the strategic “machines” that can account for the observed
behavior. This method provides a new window from which to understand better the
behavior of subjects in strategic situations. In a simple repeated Prisoner’s Dilemma experiment, we found that humans tended to employ fairly stable, heterogeneous strategies
(with some notable overlaps), encapsulated by relatively simple machines. This ability to
illuminate the strategic programs used by subjects should provide new insights into actual behavior, as well as suggest new theoretical directions. Moreover, the mere existence
of strategic ghosts in these machines raises interesting questions about how presumably
complex strategic reasoning processes can become embedded in such simple programs.
A Counting machines
We describe here how to compute the number n(|A|, |Q|) of maps λ : A × Q → Q that
are regular. We first derive a recursive expression for the number n̄(|A|, |Q|) of maps
satisfying condition 2 (no unreachable states) but not necessarily condition 3 (order of
non-initial states). The total number of maps is $|Q|^{|A||Q|}$. The number of maps with
exactly m unreachable states is
$$\binom{|Q|-1}{m} \cdot \bar{n}(|A|,|Q|-m) \cdot |Q|^{m|A|}.$$
The first factor gives the number of choices of m unreachable states out of the |Q| − 1 non-initial
states where all states are indeed reachable. The third factor gives the number of maps
A × Q∗ → Q, where Q∗ is a set of m unreachable states. For |Q| = 1, the total number
of maps is 1, and for this map, all states are reachable. Therefore n̄(|A|, 1) = 1. For
|Q| > 1, we can calculate recursively
$$\bar{n}(|A|,|Q|) = |Q|^{|A||Q|} - \sum_{m=1}^{|Q|-1} \binom{|Q|-1}{m} \cdot \bar{n}(|A|,|Q|-m) \cdot |Q|^{m|A|}.$$
We now obtain n(|A|, |Q|) by dividing by the number of permutations of the non-initial
states:
$$n(|A|,|Q|) = \frac{\bar{n}(|A|,|Q|)}{(|Q|-1)!}.$$
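The recursion is straightforward to implement; the following sketch computes n̄ and n directly from the expressions above (the function names are ours).

```python
# Sketch of the counting recursion above. nbar counts maps with no unreachable
# states; dividing by (|Q|-1)! imposes the ordering condition on non-initial states.
from functools import lru_cache
from math import comb, factorial

@lru_cache(maxsize=None)
def nbar(n_actions: int, n_states: int) -> int:
    if n_states == 1:
        return 1
    total = n_states ** (n_actions * n_states)
    for m in range(1, n_states):
        total -= comb(n_states - 1, m) * nbar(n_actions, n_states - m) \
                 * n_states ** (m * n_actions)
    return total

def n_regular(n_actions: int, n_states: int) -> int:
    return nbar(n_actions, n_states) // factorial(n_states - 1)

# Example: counts of regular skeletons for |A| = 4 (as in the Prisoners' Dilemma)
# and one, two, or three states.
print([n_regular(4, k) for k in (1, 2, 3)])
```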
B Detailed Descriptions of Blocks
We describe here in detail the three stochastic update blocks that we use to update the
skeleton (Q, λ). All blocks preserve the marginal posterior distribution of the skeleton.
Each block takes as input a skeleton (Q, λ) and randomly generates a skeleton (Q′ , λ′).
Block BQ,λ
Block BQ,λ is an independence Metropolis-Hastings update. The first step is to draw
(Q∗ , λ∗ ) from a proposal distribution. We draw Q∗ from the distribution with probability mass function fP (Q∗ ) = ϑ|Q∗ | , which we can tailor to approximate the posterior
distribution Q|a. In practice, we use state number counts during a burn-in period to
determine the probabilities ϑ|Q| . After we draw Q∗ , we draw λ∗ from the (uniform) prior
distribution λ|Q = Q∗ .
The next step is to randomly accept or reject the proposal. Acceptance means we set
Q′ = Q∗ and λ′ = λ∗. Rejection means we set Q′ = Q and λ′ = λ. Thus, if (Q∗, λ∗) is
rejected, the skeleton remains unchanged.
In order for the block to preserve the distribution Q, λ|a, we accept the proposal with
probability
$$\min\left\{1,\ \frac{f(Q^*,\lambda^*\mid a)}{f(Q,\lambda\mid a)} \cdot \frac{f_P(Q)\,f(\lambda\mid Q)}{f_P(Q^*)\,f(\lambda^*\mid Q^*)}\right\}.$$
The normalization constants for f(λ|Q), which we do not know, cancel, and we can
evaluate the Hastings ratio, the second argument of the min function, as follows:
$$\frac{f(Q^*,\lambda^*\mid a)}{f(Q,\lambda\mid a)} \cdot \frac{f_P(Q)\,f(\lambda\mid Q)}{f_P(Q^*)\,f(\lambda^*\mid Q^*)}
= \frac{\theta_{|Q^*|}\,\vartheta_{|Q|}}{\theta_{|Q|}\,\vartheta_{|Q^*|}} \cdot
\left[\frac{\Gamma(\nu|A_i|)}{[\Gamma(\nu)]^{|A_i|}}\right]^{|Q^*|-|Q|} \cdot
\frac{\prod_{q\in Q^*} \dfrac{\prod_{a_i\in A_i}\Gamma\bigl(\nu+c(a_i,q;\lambda^*)\bigr)}{\Gamma\bigl(\sum_{a_i\in A_i}\nu+c(a_i,q;\lambda^*)\bigr)}}
{\prod_{q\in Q} \dfrac{\prod_{a_i\in A_i}\Gamma\bigl(\nu+c(a_i,q;\lambda)\bigr)}{\Gamma\bigl(\sum_{a_i\in A_i}\nu+c(a_i,q;\lambda)\bigr)}}.$$
Block B′Q,λ

Block B′Q,λ is also an independence Metropolis-Hastings update, but features a proposal
distribution more closely resembling the posterior distribution Q, λ|a. This increases the
acceptance probability, but it takes longer to draw a proposal. Experience suggests that
it is sometimes optimal to use block BQ,λ, sometimes B′Q,λ, and sometimes both BQ,λ and B′Q,λ.
Again, the first step is to draw a proposal (Q∗ , λ∗ ). As before, we draw Q∗ from the
distribution with probability mass function fP (Q∗ ) = ϑ|Q∗ | .
We construct λ∗ stochastically, one value λ∗ (a, q) at a time in q-then-a lexicographic
order. Each λ∗ (a, q) has a distribution over Q favoring compatibility of the emerging λ∗
with the data. We measure the data compatibility of an incompletely specified transition
function, with values given up to (a, q), by constructing a completion λ̄a,q on an infinite
state set and then evaluating
$$h(Q, \bar\lambda; a) \equiv \prod_{q\in\mathbb{N}} \frac{\Gamma(\nu|A_i|)}{[\Gamma(\nu)]^{|A_i|}} \cdot \frac{\prod_{a_i\in A_i}\Gamma\bigl(\nu+c(a_i,q;\bar\lambda)\bigr)}{\Gamma\bigl(\sum_{a_i\in A_i}\nu+c(a_i,q;\bar\lambda)\bigr)},$$
which we take as our measure of data compatibility.
Recalling equation (15), we recognize h(Q, λ̄; a) as the factor of f (Q, λ|a) depending
on λ. We can also interpret it as the likelihood factor L(Q, λ, µ; a) with µ integrated
with respect to the prior.
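On the log scale this factor is a product of standard gamma-function terms and can be evaluated directly from the action counts. The sketch below (our own, with an assumed array layout) evaluates the log of the Dirichlet-multinomial factor for the visited states only, since all other factors equal one.

```python
# Sketch: log of the data-compatibility measure from the action counts
# c(a_i, q; lambda_bar) for the states actually visited in the data.
import numpy as np
from scipy.special import gammaln

def log_h(counts, nu):
    """counts: array of shape (n_visited_states, |A_i|); nu: Dirichlet parameter."""
    counts = np.asarray(counts, dtype=float)
    n_actions = counts.shape[1]
    per_state = (gammaln(nu * n_actions) - n_actions * gammaln(nu)
                 + gammaln(nu + counts).sum(axis=1)
                 - gammaln(nu * n_actions + counts.sum(axis=1)))
    return per_state.sum()
```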
The completion λ̄a,q : A × ℕ → ℕ is a state transition function on an infinite state set,
defined as


$$\bar\lambda_{a,q}(a',q') = \begin{cases} \lambda(a',q') & (a',q') \le (a,q) \\ |Q^*|+1 & (a',q') = (a,q)_+ \\ \bar\lambda_{a,q}\bigl((a',q')_-\bigr)+1 & (a',q') > (a,q)_+, \end{cases}$$
where ≤, =, and > are derived from the q-then-a lexicographic order² and (a, q)+ denotes
the successor of (a, q). We see that λ̄a,q agrees with λ for arguments up to (a, q). That is
the sense in which it is a completion. In terms of its other arguments, λ̄a,q is maximally
unstructured, in the following sense: once the machine encounters an action profile a′
in a machine state q′, where (a′, q′) > (a, q), the machine never revisits a state it has
already been in.
² Precisely, (a, q) > (a′, q′) iff (q > q′) or (q = q′ ∧ a > a′).
The value λ∗(a, q) of the proposal λ∗ is chosen with the following probabilities:
$$\Pr[\lambda^*(a,q) = q^*] \propto \begin{cases} \infty & q^* = q+1,\ a = |A|,\ \max_{(a',q')<(a,q)} \lambda^*(a',q') = q \\ h(Q,\bar\lambda;a) & q^* \in \bigl\{1,\ldots,\max_{(a',q')<(a,q)} \lambda^*(a',q') + 1\bigr\} \\ 0 & \text{otherwise.} \end{cases}$$
The value ∞ forces us to choose λ∗(a, q) = q + 1 if otherwise the resulting machine would have
unreachable states. The support {1, . . . , max(a′,q′)<(a,q) λ∗(a′, q′) + 1} ensures that the
non-initial states are correctly ordered, by preventing new state values from being introduced
out of sequence. Note that with a finite amount of data, the action counts c(ai, q; λ̄) will
be non-zero only for a finite number of states q ∈ ℕ, and thus only a finite number of
factors of h(Q, λ̄; a) will take on values other than one.
The conditional probability mass function for the proposal λ∗, given Q∗, is therefore
$$h(\lambda^* \mid Q^*, a) \equiv \prod_{(a,q)\in A\times Q^*} \Pr[\lambda^*(a,q)].$$
To ensure the block preserves the distribution Q, λ|a, we accept (Q∗, λ∗) with probability
$$\min\left\{1,\ \frac{h(\lambda\mid Q,a)\,\theta_{|Q^*|}\,\vartheta_{|Q|}}{h(\lambda^*\mid Q^*,a)\,\theta_{|Q|}\,\vartheta_{|Q^*|}} \cdot
\left[\frac{\Gamma(\nu|A_i|)}{[\Gamma(\nu)]^{|A_i|}}\right]^{|Q^*|-|Q|} \cdot
\frac{\prod_{q\in Q^*} \dfrac{\prod_{a_i\in A_i}\Gamma\bigl(\nu+c(a_i,q;\lambda^*)\bigr)}{\Gamma\bigl(\sum_{a_i\in A_i}\nu+c(a_i,q;\lambda^*)\bigr)}}
{\prod_{q\in Q} \dfrac{\prod_{a_i\in A_i}\Gamma\bigl(\nu+c(a_i,q;\lambda)\bigr)}{\Gamma\bigl(\sum_{a_i\in A_i}\nu+c(a_i,q;\lambda)\bigr)}}\right\}.$$
Block Bλ
Block Bλ is a random walk Metropolis update which proposes a λ∗ identical to λ except
for a random mutation. We first determine the position of the mutation: we draw a
random pair (a, q) from the uniform distribution on A × Q. We then mutate the value
at (a, q): we draw λ∗ (a, q) from the uniform distribution on Q\{λ(a, q)}. For all other
pairs (a′ , q ′ ), we set λ∗ (a′ , q ′ ) = λ(a′ , q ′ ). Then if necessary, we permute the non-initial
states so that λ∗ satisfies the ordering condition for non-initial states.
The proposal is a random walk on the space of transition functions satisfying condition
3 (order of non-initial states) but not necessarily condition 2 (no unreachable states). We accept
the proposal λ∗ with probability
$$\min\left\{1,\ \frac{\prod_{q\in Q} \dfrac{\prod_{a_i\in A_i}\Gamma\bigl(\nu+c(a_i,q;\lambda^*)\bigr)}{\Gamma\bigl(\sum_{a_i\in A_i}\nu+c(a_i,q;\lambda^*)\bigr)}}{\prod_{q\in Q} \dfrac{\prod_{a_i\in A_i}\Gamma\bigl(\nu+c(a_i,q;\lambda)\bigr)}{\Gamma\bigl(\sum_{a_i\in A_i}\nu+c(a_i,q;\lambda)\bigr)}} \cdot 1_{\Lambda(A,Q)}(\lambda^*)\right\}.$$
The indicator function 1Λ(A,Q) , which comes from the prior on λ, means we reject λ∗ with
certainty if it has unreachable states.
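A minimal sketch of one such update is given below. The mutation, reordering, and reachability check follow the description above, while `log_target` stands in for the log of the λ-dependent posterior factor (the gamma-function ratio) and is assumed to be supplied elsewhere; the encoding of λ as a dictionary is ours.

```python
# Sketch of one B_lambda update: mutate one entry of the transition function,
# restore the ordering of non-initial states, reject outright if a state becomes
# unreachable, and otherwise apply a Metropolis accept/reject step.
import math
import random

def reachable(lam, n_states, n_actions):
    """True if every state can be reached from the initial state 0."""
    seen, frontier = {0}, [0]
    while frontier:
        q = frontier.pop()
        for a in range(n_actions):
            nxt = lam[(a, q)]
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen) == n_states

def canonical(lam, n_states, n_actions):
    """Relabel non-initial states in order of first appearance (ordering condition)."""
    relabel, queue, i = {0: 0}, [0], 0
    while i < len(queue):
        q = queue[i]; i += 1
        for a in range(n_actions):
            nxt = lam[(a, q)]
            if nxt not in relabel:
                relabel[nxt] = len(relabel)
                queue.append(nxt)
    for q in range(n_states):              # any unreachable states keep the last labels
        if q not in relabel:
            relabel[q] = len(relabel)
    return {(a, relabel[q]): relabel[lam[(a, q)]]
            for q in range(n_states) for a in range(n_actions)}

def b_lambda_update(lam, n_states, n_actions, log_target, rng):
    """One random-walk Metropolis update of the transition function."""
    a, q = rng.randrange(n_actions), rng.randrange(n_states)
    proposal = dict(lam)
    proposal[(a, q)] = rng.choice([s for s in range(n_states) if s != lam[(a, q)]])
    proposal = canonical(proposal, n_states, n_actions)
    if not reachable(proposal, n_states, n_actions):
        return lam                          # the indicator function rejects outright
    if rng.random() < math.exp(min(0.0, log_target(proposal) - log_target(lam))):
        return proposal
    return lam
```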
C Computing Marginal Likelihood Factors
We show here how to use the Gelfand and Dey (1994) procedure to compute the marginal
likelihood factors ψis. In order to use this procedure, we construct a properly normalized
probability mass function f̂(Q, λ). For efficiency, we choose a probability mass function
resembling f(Q, λ|a) and having “thin tails” relative to it. We take f̂ to be the truncation
of f(Q, λ|a) to the parameter subspace where |Q| ≤ Q̄:
$$\hat f(Q,\lambda) \equiv f(Q, \lambda \mid a, |Q| \le \bar Q).$$
We choose Q̄ so that it is feasible to compute the normalization constant of this probability mass function by summing over all regular state transition functions with up to Q̄
states. For a two player game with two possible actions each, such as Prisoners’ Dilemma,
|A| = 4. Since n(|A|, |Q|) = 343, 000 for |A| = 4 and |Q| = 3, truncation to Q̄ = 3 states
is quite feasible.
The reciprocal of the marginal likelihood factor ψis is approximated by the following
posterior sample moment:
$$\frac{1}{M}\sum_{j=1}^{M} \frac{\hat f(Q_j,\lambda_j)}{f(Q_j)\,f(\lambda_j\mid Q_j)\,\prod_{\sigma\in\Sigma}\prod_{t=1}^{T_\sigma} f(a_{it\sigma}\mid a_{1\sigma},\ldots,a_{t-1,\sigma},Q_j,\lambda_j)},$$
where M is the size of the posterior sample and (Qj , λj )j∈{1,...,M } are the posterior draws
of (Q, λ) generated by our Markov chain. We ignore the posterior draws of µ.
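In practice this average is best computed on the log scale. The sketch below assumes functions `log_fhat`, `log_prior` and `log_like` that return, for a draw, the log of f̂, the log of the prior f(Q)f(λ|Q), and the log of the integrated likelihood; these names are ours, not the paper's software.

```python
# Sketch of the Gelfand-Dey computation: average, over posterior draws, the ratio
# of the truncated density to prior times integrated likelihood, working in logs.
import numpy as np
from scipy.special import logsumexp

def log_reciprocal_psi(draws, log_fhat, log_prior, log_like):
    """Log of the posterior-sample estimate of 1 / psi_is."""
    log_terms = np.array([log_fhat(Q, lam) - log_prior(Q, lam) - log_like(Q, lam)
                          for (Q, lam) in draws])
    return logsumexp(log_terms) - np.log(len(draws))

# The marginal likelihood factor itself is psi_is = exp(-log_reciprocal_psi(...)).
```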
Provided that our posterior simulation chain is ergodic and has Q, λ, µ|a as its invariant
distribution, this sample moment converges almost surely to its population counterpart:
$$\begin{aligned}
\frac{1}{M}\sum_{j=1}^{M} &\frac{\hat f(Q_j,\lambda_j)}{f(Q_j)\,f(\lambda_j\mid Q_j)\,\prod_{\sigma\in\Sigma}\prod_{t=1}^{T_\sigma} f(a_{it\sigma}\mid a_{1\sigma},\ldots,a_{t-1,\sigma},Q_j,\lambda_j)} \\
&\to E\!\left[\frac{\hat f(Q,\lambda)}{f(Q)\,f(\lambda\mid Q)\,\prod_{\sigma\in\Sigma}\prod_{t=1}^{T_\sigma} f(a_{it\sigma}\mid a_{1\sigma},\ldots,a_{t-1,\sigma},Q,\lambda)}\right] \\
&= \sum_{|Q|\le\bar Q,\ \lambda\in\Lambda(A,Q)} \frac{\hat f(Q,\lambda)\cdot f(Q,\lambda\mid a)}{f(Q)\,f(\lambda\mid Q)\,\prod_{\sigma\in\Sigma}\prod_{t=1}^{T_\sigma} f(a_{it\sigma}\mid a_{1\sigma},\ldots,a_{t-1,\sigma},Q,\lambda)} \\
&= \frac{1}{\prod_{\sigma\in\Sigma}\prod_{t=1}^{T_\sigma} f(a_{it\sigma}\mid a_{1\sigma},\ldots,a_{t-1,\sigma})}\sum_{|Q|\le\bar Q,\ \lambda\in\Lambda(A,Q)} \hat f(Q,\lambda) \\
&= \frac{1}{\prod_{\sigma\in\Sigma}\prod_{t=1}^{T_\sigma} f(a_{it\sigma}\mid a_{1\sigma},\ldots,a_{t-1,\sigma})},
\end{aligned}$$
which is indeed the reciprocal of the marginal likelihood factor ψis . We compute the
marginal likelihood f (a) as the product over all players i ∈ N and subjects s ∈ Si of
these marginal likelihood factors.
References
Abreu, D., and Rubinstein, A. (1988). ‘The Structure of Nash Equilibrium in Repeated
Games with Finite Automata’, Econometrica, 56: 1259–1281.
Aoyagi, M., and Frechette, G. (2004). ‘Collusion in Repeated Games with Imperfect
Monitoring’, Working paper, Harvard Business School.
Aumann, R. J. (1981). ‘Survey of Repeated Games’, in R. J. Aumann (ed.), Essays
in Game Theory and Mathematical Economics in Honor of Oskar Morgenstern, pp.
11–42. Bibliographisches Institut, Zurich.
Axelrod, R. (1984). The Evolution of Cooperation. Basic Books, New York.
Ben-Porath, E. (1993). ‘Repeated Games with Finite Automata’, Journal of Economic
Theory, 59(1): 17–32.
Bernardo, J. M., and Smith, A. F. M. (1994). Bayesian Theory. John Wiley and Sons,
Chichester, England.
Binmore, K., and Samuelson, L. (1992). ‘Evolutionary stability in repeated games played
by finite automata’, Journal of Economic Theory, 57: 278–305.
Camera, G., and Casari, M. (2007). ‘Cooperation among strangers: an experiment with
indefinite interaction’, Working Paper 1201, Krannert School of Management, Purdue
University.
Dal Bo, P., and Frechette, G. (2007). ‘The Evolution of Cooperation in Infinitely Repeated
Games: Experimental Evidence’, Working paper, New York University, Department
of Economics.
Duffy, J., and Ochs, J. (2006). ‘Cooperative Behavior and the Frequency of Social Interaction’, Working paper 274, University of Pittsburgh, Department of Economics.
El Gamal, M., and Grether, D. (1995). ‘Are People Bayesian? Uncovering Behavioral
Strategies’, Journal of the American Statistical Association, 90: 1127–1145.
Engle-Warnick, J. (2007). ‘Five Indefinitely Repeated Games in the Laboratory’, Série
Scientifique, Centre interuniversitaire de recherche en analyse des organisations, no.
2007s-11.
Engle-Warnick, J., and Slonim, R. L. (2006). ‘Inferring repeated-game strategies from
actions: evidence from trust game experiments’, Economic Theory, 28: 603–632.
Gelfand, A. E., and Dey, D. K. (1994). ‘Bayesian Model Choice: Asymptotics and Exact
Calculations’, Journal of the Royal Statistical Society Series B, 56: 501–514.
Geweke, J. (2005). Contemporary Bayesian Econometrics and Statistics. Wiley, Hoboken,
New Jersey.
Kalai, E., and Stanford, W. (1988). ‘Finite Rationality and Interpersonal Complexity in
Repeated Games’, Econometrica, 56(2): 387–410.
Koop, G. (2003). Bayesian Econometrics. John Wiley & Sons, Chichester, England.
McKelvey, R. D., and Palfrey, T. R. (1995). ‘Quantal Response Equilibria for Normal
Form Games’, Games and Economic Behavior, 10: 6–38.
Miller, J. (1996). ‘The Coevolution of Automata in the Repeated Prisoner’s Dilemma’,
Journal of Economic Behavior and Organization, 29: 87–112.
Neyman, A. (1985). ‘Bounded Complexity Justifies Cooperation in the Finitely Repeated
Prisoner’s Dilemma’, Economics Letters, 19: 227–229.
Normann, H., and Wallace, B. (2004). ‘The Impact of the Termination Rule on Cooperation
in a Prisoner’s Dilemma Experiment’, Discussion Paper dp04/11, University of London,
Royal Holloway.
Rubinstein, A. (1986). ‘Finite Automata Play the Repeated Prisoner’s Dilemma’, Journal
of Economic Theory, 39: 83–96.
Selten, R., Mitzkewitz, G., and Uhlich, R. (1997). ‘Duopoly Strategies Programmed by
Experienced Traders’, Econometrica, 65: 517–555.