Adaptive Learning Models
Christos A. Ioannou
Motivation
- Which models describe human behavior best?
- Why is this important?
  - Design effective and efficient market mechanisms.
  - Apply counterfactual analysis.
  - Refine strategy sets.
Types of Adaptive Action Learning Models

Belief-based Models
- Fictitious Play - Brown (1951)
- Cournot Best Response - Cournot (1960)
- γ-Weighted Beliefs - Cheung & Friedman (1997)

Reinforcement-based Models
- Cumulative Reinforcement - Harley (1981), Erev & Roth (1995, 1998)
- Averaged Reinforcement - Mookherjee & Sopher (1994, 1997)

Hybridized Models
- Experience Weighted Attraction (EWA) - Camerer & Ho (1999)
- Inertia, Sampling And Weighting (I-SAW) - Nevo & Erev (2012)
Preliminaries
- The set of players is denoted by $I = \{1, \ldots, n\}$.
- Each player $i \in I$ has an action set denoted by $A_i$. An action profile $a = (a_i, a_{-i})$ consists of the action of player $i$ and the actions of the other players, denoted by $a_{-i} = (a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n) \in A_{-i}$.
- In addition, each player $i$ has a real-valued, stage-game payoff function $g_i : A \to \mathbb{R}$, which maps every action profile $a \in A$ into a payoff for $i$, where $A$ denotes the Cartesian product of the action spaces $A_i$, written as $A \equiv \times_{i=1}^{n} A_i$.
- The indicator function $I(a_i^j, a_i(t))$ equals 1 if $a_i^j = a_i(t)$ and 0 otherwise.
- In the infinitely-repeated game with perfect monitoring, the stage game in each time period $t = 0, 1, \ldots$ is played, with the action profile chosen in period $t$ publicly observed at the end of that period.
Preliminaries
- The history of play at time $t$ is denoted by $h^t = (a^0, \ldots, a^{t-1}) \in A^t$, where $a^r = (a_1^r, \ldots, a_n^r)$ denotes the actions taken in period $r$.
- The set of histories is given by
  $$H = \bigcup_{t=0}^{\infty} A^t,$$
  where we define the initial history as the null set $A^0 = \{\emptyset\}$.
- A strategy $s_i \in S_i$ for player $i$ is, then, a function $s_i : H \to A_i$, where the strategy space of $i$ consists of $K_i$ discrete strategies; that is, $S_i = \{s_i^1, s_i^2, \ldots, s_i^{K_i}\}$.
- The set of joint-strategy profiles is denoted by $S = S_1 \times \cdots \times S_n$.
- Each player $i$ has a payoff function $\pi_i^t : S \to \mathbb{R}$, which represents the average payoff per period.
Experience Weighted Attraction (EWA)
- Attraction for player $i$'s action is updated as follows:
  $$N(t) = \rho \cdot N(t-1) + 1,$$
  $$A_i^j(t) = \frac{\phi \cdot N(t-1) \cdot A_i^j(t-1) + \left[\delta + (1-\delta) \cdot I(a_i^j, a_i(t))\right] \cdot g_i(a_i^j, a_{-i}(t))}{N(t)}$$
- $\phi$ - discount factor on previous attractions
- $\rho$ - discount factor on previous experience
- $\delta$ - weight on forgone (hypothetical) payoffs
- Players choose an action each period with
  $$P_i^j(t+1) = \frac{e^{\lambda \cdot A_i^j(t)}}{\sum_{k=1}^{m_i} e^{\lambda \cdot A_i^k(t)}}.$$
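The update above can be expressed compactly in code. The sketch below is a minimal illustration of one EWA attraction update followed by the logit choice rule; the array layout, function names, and the example payoffs are assumptions introduced for illustration, not part of the original model specification.

```python
import numpy as np

def ewa_step(A, N, chosen, payoffs, phi, rho, delta):
    """One EWA update: A is the attraction vector, N the experience weight,
    chosen the index of the action actually played, payoffs[j] the payoff
    action j would have earned against the opponents' realized actions."""
    N_new = rho * N + 1.0
    indicator = np.zeros_like(A)
    indicator[chosen] = 1.0
    weight = delta + (1.0 - delta) * indicator   # forgone payoffs weighted by delta
    A_new = (phi * N * A + weight * payoffs) / N_new
    return A_new, N_new

def logit_choice_probs(A, lam):
    """Logit choice probabilities P_i^j(t+1)."""
    z = lam * A
    z -= z.max()                                  # numerical stability
    expz = np.exp(z)
    return expz / expz.sum()

# Illustrative use with two actions and made-up payoffs:
A = np.zeros(2)                      # initial attractions
N = 1.0                              # initial experience weight N(0)
payoffs = np.array([3.0, 4.0])       # hypothetical payoffs vs. the opponent's realized action
A, N = ewa_step(A, N, chosen=0, payoffs=payoffs, phi=0.9, rho=0.9, delta=0.5)
print(logit_choice_probs(A, lam=1.0))
```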
Nested Models

  φ           δ   ρ   N(0)       Learning Model
  1           1   1   -          Fictitious Play
  0           1   0   -          Cournot Best-Response
  φ ∈ (0,1)   1   φ   -          Weighted Fictitious Play
  φ ∈ [0,1]   0   0   1          Cumulative Reinforcement
  φ ∈ [0,1]   0   φ   1/(1-φ)    Averaged Reinforcement

EWA fits experimental data better than belief-learning and reinforcement-learning models in:
- constant-sum games with a unique mixed equilibrium,
- coordination games with multiple Pareto-ranked equilibria, and
- beauty contest games with a unique dominance-solvable equilibrium.
Self-Tuning EWA
- Attraction for player $i$'s action is updated as follows:
  $$N(t) = \phi_i(t) \cdot N(t-1) + 1,$$
  $$A_i^j(t) = \frac{\phi_i(t) \cdot N(t-1) \cdot A_i^j(t-1) + \left[\delta_i^j + (1-\delta_i^j) \cdot I(a_i^j, a_i(t))\right] \cdot g_i(a_i^j, a_{-i}(t))}{N(t)}$$
- The model makes the parameters φ, δ, and ρ self-tuning functions.
- Players choose an action each period with
  $$P_i^j(t+1) = \frac{e^{\lambda \cdot A_i^j(t)}}{\sum_{k=1}^{m_i} e^{\lambda \cdot A_i^k(t)}}.$$
Self-Tuning Functions
- The attention function, $\delta_i^j(t)$, is
  $$\delta_i^j(t) = \begin{cases} 1 & \text{if } g_i(a_i^j, a_{-i}(t)) \geq g_i(t) \\ 0 & \text{otherwise.} \end{cases}$$
- The decay function, $\phi_i(t)$, consists of
  - the cumulative history vector across the other players' actions $k$, which records the historical frequencies (including the last period $t$) of the choices by the other players,
    $$\sigma(a_{-i}^k, t) = \frac{1}{t} \sum_{\tau=1}^{t} I(a_{-i}^k, a_{-i}(\tau)),$$
  - so that the surprise index is
    $$S_i(t) = \sum_{k=1}^{m_{-i}} \left(\sigma(a_{-i}^k, t) - r(a_{-i}^k, t)\right)^2.$$
  - Finally, the decay function is
    $$\phi_i(t) = 1 - \frac{1}{2} S_i(t).$$
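A minimal sketch of the two self-tuning functions at the action level follows. The recent-history vector $r$ is passed in by the caller rather than computed, since its exact definition is model-specific; all function and variable names are assumptions of this illustration.

```python
import numpy as np

def attention(forgone_payoff, realized_payoff):
    """delta_i^j(t): attend to action j if it would have done at least as well
    as the payoff actually obtained this period."""
    return 1.0 if forgone_payoff >= realized_payoff else 0.0

def decay(opponent_history, recent_vector):
    """phi_i(t) = 1 - S_i(t)/2, where the surprise index compares the
    cumulative frequency vector sigma with a recent-history vector r."""
    opponent_history = np.asarray(opponent_history)   # opponent action indices, periods 1..t
    m = len(recent_vector)                            # number of opponent actions
    sigma = np.array([(opponent_history == k).mean() for k in range(m)])
    surprise = np.sum((sigma - np.asarray(recent_vector)) ** 2)
    return 1.0 - 0.5 * surprise

# Illustrative use with two opponent actions:
history = [0, 0, 1, 0]        # opponent played actions 0, 0, 1, 0
r = [1.0, 0.0]                # e.g., last period's choice as the recent-history vector
print(attention(forgone_payoff=3.0, realized_payoff=2.0), decay(history, r))
```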
Inertia, Sampling and Weighting (I-SAW)
- I-SAW is an instance-based model, which allows for three response modes: exploration, inertia, and exploitation.
- In each period, a player enters one of the modes with different probabilities.
- There are $n$ players in the game.
- Each player has a set of parameters $(p_A, \varepsilon_i, \pi_i, \mu_i, \rho_i, \omega_i)$.
- The parameter $p_A \in [0, 1]$ is the same for all agents.
- The other parameters are idiosyncratic with $\varepsilon_i \sim U[0, \varepsilon]$, $\pi_i \sim U[0, \pi]$, $\mu_i \sim U[0, \mu]$, $\rho_i \sim U[0, \rho]$, and $\omega_i \sim U[0, \omega]$.
Inertia, Sampling and Weighting (I-SAW)
- For simplicity, assume two actions.
- Player $i$'s action set is $A_i = \{A, B\}$.
- Let $a_i^t$ be the action of player $i$ that was played in period $t$, and define $h_i(t_1, t_2) = \{a_i^{t_1}, a_i^{t_1+1}, \ldots, a_i^{t_2}\}$ for $t_1 \leq t_2$.
- Similarly, let $a_{-i}^t$ be the actions of the players other than $i$ in period $t$, and define $h_{-i}(t_1, t_2) = \{a_{-i}^{t_1}, a_{-i}^{t_1+1}, \ldots, a_{-i}^{t_2}\}$ for $t_1 \leq t_2$.
- We explain next the three response modes.
Schematic of I-SAW
[Flowchart: at the start of each new period, the player explores with probability ε_i and does not explore with probability 1 - ε_i; a non-exploring player enters the inertia mode with probability π_i^Surprise and otherwise, with probability 1 - π_i^Surprise, calculates the ESV (exploitation).]
Exploration
- In exploration, each player chooses action A with probability $p_A$ and action B with probability $1 - p_A$. The probabilities are the same for all players.
Inertia
- A player might enter the inertia mode after period 2 with probability $\pi_i^{Surprise(t)}$, where $\pi_i \in [0, 1]$ and $Surprise(t) \in [0, 1]$. Specifically,
  $$Gap(t) = \frac{1}{4}\left[\sum_{j=1}^{2} |Obtained_j(t-1) - Obtained_j(t)| + \sum_{j=1}^{2} |GMean_j(t) - Obtained_j(t)|\right]$$
  $$Surprise(t) = \frac{Gap(t)}{MeanGap(t) + Gap(t)}$$
  $$MeanGap(t+1) = MeanGap(t)\left(1 - \frac{1}{r}\right) + Gap(t)\frac{1}{r}$$
  where $r$ is the expected number of trials in the experiment.
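The sketch below illustrates the gap/surprise bookkeeping from this slide. The function name, the way obtained payoffs are passed in, and the guard against a zero denominator are assumptions made for the illustration.

```python
def update_surprise(obtained_prev, obtained_curr, gmean, mean_gap, r):
    """Compute Gap(t), Surprise(t), and the updated MeanGap(t+1).

    obtained_prev, obtained_curr, gmean: length-2 sequences of payoffs
    (one entry per player in the pair); r: expected number of trials."""
    gap = 0.25 * (sum(abs(p - c) for p, c in zip(obtained_prev, obtained_curr))
                  + sum(abs(g - c) for g, c in zip(gmean, obtained_curr)))
    surprise = gap / (mean_gap + gap) if (mean_gap + gap) > 0 else 0.0
    mean_gap_next = mean_gap * (1.0 - 1.0 / r) + gap * (1.0 / r)
    return gap, surprise, mean_gap_next

# Illustrative numbers only:
gap, surprise, mean_gap = update_surprise([3.0, 1.0], [2.0, 2.0], [2.5, 1.5],
                                          mean_gap=1.0, r=100)
p_inertia = 0.6 ** surprise     # probability of inertia with pi_i = 0.6
```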
Exploitation
- In exploitation trials, an individual selects the action with the highest Estimated Subjective Value (ESV).
- To determine the ESV, player $i$ randomly selects $\mu_i$ elements from $h_{-i}(0, t-1)$ with replacement; let us call this set $M_{-i}(0, t-1)$.
- This set is chosen according to the following: with probability $\rho_i$ the player chooses $a_{-i}^{t-1}$, and with probability $1 - \rho_i$ the player chooses uniformly over $h_{-i}(0, t-1)$.
- The same set $M_{-i}$ is used for each $a_i \in A_i$.
- The sample mean for action $a_i'$ is then defined as
  $$SampleM(a_i', t) = \frac{1}{|M_{-i}(0, t-1)|} \sum_{a_{-i} \in M_{-i}(0, t-1)} g_i(a_i', a_{-i}).$$
Exploitation
- The $GrandM(a_i', t)$ is defined as
  $$GrandM(a_i', t) = \frac{1}{|h_{-i}(0, t-1)|} \sum_{a_{-i} \in h_{-i}(0, t-1)} g_i(a_i', a_{-i}).$$
- Then, player $i$'s ESV of action $a_i'$ is
  $$ESV(a_i') = (1 - \omega_i) \cdot SampleM(a_i', t) + \omega_i \cdot GrandM(a_i', t),$$
  where $\omega_i$ is the weight assigned to the payoff based on the entire history (GrandM) and $1 - \omega_i$ is the weight assigned to the payoff based on the sample from the history (SampleM).
- Then, the player simply chooses the $a_i'$ that maximizes the ESV (and chooses randomly in ties).
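A compact sketch of the exploitation step is given below, assuming a two-player game with a payoff function supplied by the caller. The sampling helper, the names, and the tie-breaking via random choice follow the description above but are otherwise illustrative assumptions.

```python
import random

def esv(action, opp_history, sample_idx, payoff, omega):
    """ESV(a') = (1 - omega) * SampleM + omega * GrandM for a given action."""
    grand = sum(payoff(action, a) for a in opp_history) / len(opp_history)
    sample = sum(payoff(action, opp_history[k]) for k in sample_idx) / len(sample_idx)
    return (1.0 - omega) * sample + omega * grand

def exploit(actions, opp_history, mu, rho, omega, payoff, rng=random):
    """Draw the sample indices once (the same M_{-i} for every action), then
    pick the action with the highest ESV, breaking ties at random."""
    t_last = len(opp_history) - 1
    sample_idx = [t_last if rng.random() < rho else rng.randrange(len(opp_history))
                  for _ in range(mu)]
    values = {a: esv(a, opp_history, sample_idx, payoff, omega) for a in actions}
    best = max(values.values())
    return rng.choice([a for a, v in values.items() if v == best])

# Illustrative use with the row player's Prisoner's Dilemma payoffs:
pd = {('A', 'A'): 3, ('A', 'B'): 1, ('B', 'A'): 4, ('B', 'B'): 2}
choice = exploit(['A', 'B'], ['A', 'B', 'A'], mu=3, rho=0.5, omega=0.3,
                 payoff=lambda a, b: pd[(a, b)])
```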
γ-Weighted Beliefs
- A player updates his beliefs on the opponent's actions with parameter $\gamma$; that is,
  $$b_i(a_{-i}^k, t) = \frac{I(a_{-i}^k, a_{-i}(t)) + \sum_{r=1}^{t-1} \gamma^{t-r} \cdot I(a_{-i}^k, a_{-i}(r))}{1 + \sum_{r=1}^{t-1} \gamma^{t-r}}.$$
- Setting $\gamma = 0$ yields the Cournot learning rule, and setting $\gamma = 1$ yields fictitious play.
- We have adaptive learning when $0 < \gamma < 1$.
- The player then calculates the expected payoff of action $a_i^j$ as
  $$E_i(a_i^j, t) = \sum_{k=1}^{m_{-i}} b_i(a_{-i}^k, t) \cdot g_i(a_i^j, a_{-i}^k).$$
γ-Weighted Beliefs
- Finally, an action is selected via the logit specification with parameter $\lambda$; that is, the probability of choosing action $a_i^j$ in period $t+1$ is
  $$P_i(a_i^j, t+1) = \frac{e^{\lambda \cdot E_i(a_i^j, t)}}{\sum_{k=1}^{m_i} e^{\lambda \cdot E_i(a_i^k, t)}}.$$
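The following sketch strings the three γ-Weighted Beliefs steps together (belief update, expected payoffs, logit choice). Array shapes, function names, and the sample payoff matrix are illustrative assumptions.

```python
import numpy as np

def gamma_beliefs(opp_actions, m_opp, gamma):
    """b_i(a_{-i}^k, t): geometrically weighted frequencies of the opponent's
    past actions (opp_actions holds action indices for periods 1..t)."""
    t = len(opp_actions)
    weights = gamma ** (t - np.arange(1, t + 1))     # gamma^(t-r), weight 1 on period t
    counts = np.zeros(m_opp)
    for w, a in zip(weights, opp_actions):
        counts[a] += w
    return counts / weights.sum()

def choice_probs(payoff_matrix, beliefs, lam):
    """Expected payoffs E_i(a_i^j, t) and the resulting logit choice probabilities."""
    expected = payoff_matrix @ beliefs               # row j: sum_k b_k * g_i(a^j, a^k)
    z = lam * expected
    z -= z.max()                                     # numerical stability
    expz = np.exp(z)
    return expz / expz.sum()

# Illustrative use: row player's Prisoner's Dilemma payoffs, opponent history A, B, A:
g = np.array([[3.0, 1.0], [4.0, 2.0]])
b = gamma_beliefs([0, 1, 0], m_opp=2, gamma=0.5)
print(choice_probs(g, b, lam=1.0))
```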
Why Not Repeated-Game Strategy Learning Models?

"Incorporating a richer specification of strategies is important because stage-game strategies are not always the most natural candidates for the strategies that players learn about."
Camerer and Ho (1999)

1. The set of possible strategies in repeated games is infinite.
2. Beliefs become more complex because several different strategies can lead to the same history.
3. Repeated-game strategies need several periods to be evaluated.
A Generalized Approach
- We propose novel rules to facilitate the operability of belief-learning models with repeated-game strategies.
- The impact of the rules is assessed on a self-tuning Experience Weighted Attraction model, a γ-Weighted Beliefs model, and an Inertia, Sampling and Weighting model.
- The predictions of the three models with strategy learning are validated with experimental data in four symmetric games.
- We also validate the predictions of the respective action learning models.
- The strategy learning models approximate subjects' behavior substantially better than their respective action learning models.
Deterministic Finite Automata (DFA)
A Moore Machine is a four-tuple $(Q_i, q_i^0, f_i, \tau_i)$ where
- $Q_i$ is a finite set of states,
- $q_i^0$ is the initial state,
- $f_i : Q_i \to A_i$ is an output function, and
- $\tau_i : Q_i \times A_{-i} \to Q_i$ is a transition function.

Example (a two-state automaton that plays A in state $q_1$ and B in state $q_2$, starts in $q_1$, stays in $q_1$ as long as the opponent plays A, and moves to the absorbing state $q_2$ otherwise):
- $Q_i = \{q_1, q_2\}$
- $q_i^0 = q_1$
- $f_i(q) = \begin{cases} A & \text{if } q = q_1 \\ B & \text{if } q = q_2 \end{cases}$
- $\tau_i(q, a_{-i}) = \begin{cases} q_1 & \text{if } (q, a_{-i}) = (q_1, A) \\ q_2 & \text{otherwise} \end{cases}$
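Below is a minimal Python rendering of a Moore machine, with the example automaton above encoded as data; the class name and the dictionary encoding are assumptions made for illustration.

```python
class MooreMachine:
    """A Moore machine (Q, q0, f, tau): f maps states to own actions,
    tau maps (state, opponent action) pairs to successor states."""

    def __init__(self, states, start, output, transition):
        self.states, self.start = states, start
        self.output, self.transition = output, transition
        self.state = start

    def reset(self):
        self.state = self.start

    def action(self):
        """Output function f_i: the action prescribed in the current state."""
        return self.output[self.state]

    def step(self, opponent_action):
        """Transition function tau_i: move on the observed opponent action."""
        self.state = self.transition[(self.state, opponent_action)]

# The example automaton above: play A while the opponent plays A,
# then switch to B forever after observing B.
example = MooreMachine(
    states={"q1", "q2"},
    start="q1",
    output={"q1": "A", "q2": "B"},
    transition={("q1", "A"): "q1", ("q1", "B"): "q2",
                ("q2", "A"): "q2", ("q2", "B"): "q2"},
)
```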
Fitness Function
- Define the fitness function $F : S_{-i} \times \mathbb{N} \to [0, T_i(\chi)]$ as
  $$F(s_{-i}, \chi) = \max\left\{ t' \mid s_{-i} \text{ is consistent with } h^{T_i(\chi)} \text{ for the last } t' \text{ periods} \right\}.$$
- Define the belief function $B : S_{-i} \times \mathbb{N} \to [0, 1]$ as
  $$B(s_{-i}, \chi) = \frac{F(s_{-i}, \chi)}{\sum_{r \in S_{-i}} F(r, \chi)},$$
  which can be interpreted as player $i$'s belief that the other player was using strategy $s_{-i}$ at the end of block $\chi$.
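A sketch of how the fitness and belief functions might be computed for automaton strategies follows, reusing the MooreMachine interface (reset/action/step) from the previous example. The consistency check, which replays each candidate automaton from its initial state against a suffix of the observed history, is an assumption about implementation details not spelled out on the slide.

```python
def fitness(candidate, own_history, opp_history):
    """F(s_{-i}, chi): the largest t' such that the candidate automaton,
    replayed against the observed history, matches the opponent's last t' actions."""
    T = len(opp_history)
    for start in range(T):                    # try suffixes from longest to shortest
        candidate.reset()
        consistent = True
        for t in range(start, T):
            if candidate.action() != opp_history[t]:
                consistent = False
                break
            candidate.step(own_history[t])    # the opponent observes *our* action
        if consistent:
            return T - start                  # length of the longest consistent suffix
    return 0

def beliefs(candidates, own_history, opp_history):
    """B(s_{-i}, chi): normalized fitness over the candidate strategy set."""
    scores = [fitness(c, own_history, opp_history) for c in candidates]
    total = sum(scores)
    return [s / total if total > 0 else 1.0 / len(scores) for s in scores]
```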
Asynchronous Updating of Strategies
- Players update their strategies with the completion of a block of periods.
- The probability of updating depends on the expected block length, which is calculated at the end of each period.
- The probability that player $i$ updates his strategy in period $t$, $1/P_i^t$, is therefore determined endogenously via the expected block-length term, $P_i^t$, which is updated recursively:
  $$P_i^t = P_i^{t-1} - \frac{1}{P_i^{t-1}} \cdot \frac{\frac{1}{t - t(\chi(t))} \sum_{s=t(\chi(t))}^{t-1} g_i(a_i^s, a_{-i}^s) - E_i^{s_i(\chi(t))}(\chi(t))}{\bar{g} - \underline{g}}.$$
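The recursion can be written as a one-line update. The sketch below assumes the caller tracks the payoffs realized so far in the current block and the expected payoff of the strategy chosen for the block; the clamping of P at 1, which keeps the updating probability 1/P well defined, is an assumption rather than part of the slide.

```python
def update_block_length(P_prev, block_payoffs, expected_payoff, g_max, g_min):
    """Recursive update of the expected block length P_i^t.

    block_payoffs: payoffs g_i realized so far in the current block;
    expected_payoff: E_i for the strategy chosen at the start of the block."""
    realized_avg = sum(block_payoffs) / len(block_payoffs)
    P = P_prev - (1.0 / P_prev) * (realized_avg - expected_payoff) / (g_max - g_min)
    return max(P, 1.0)          # assumption: keep the update probability 1/P in (0, 1]

# Illustrative use:
P = update_block_length(P_prev=5.0, block_payoffs=[2, 3, 3], expected_payoff=3.0,
                        g_max=4.0, g_min=1.0)
prob_update = 1.0 / P
```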
Self-tuning EWA with Strategies
The model consists of two variables:
1. $N_i(\chi)$ is interpreted as the number of observation-equivalents of past experience in block $\chi$ of player $i$, and
2. $A_i^j(\chi)$ indicates player $i$'s attraction to strategy $j$ after the $\chi$th block of periods.

The evolution of learning over the $\chi$th block with $\chi \geq 1$ is governed by the following rules:
$$N_i(\chi) = \phi_i(\chi) \cdot N_i(\chi - 1) + 1,$$
and
$$A_i^j(\chi) = \frac{\phi_i(\chi) \cdot N_i(\chi - 1) \cdot A_i^j(\chi - 1) + I(s_i^j, s_i(\chi)) \cdot R_i(\chi) + \delta_i^j(\chi) \cdot E_i^j(\chi)}{N_i(\chi)},$$
where
- $\chi$ is the block index,
- $R_i(\chi)$ is the reinforcement payoff from block $\chi$,
- $E_i^j(\chi)$ is the forgone payoff from block $\chi$,
- $\delta_i^j(\chi)$ is the self-tuning attention function, and
- $\phi_i(\chi)$ is the self-tuning decay function.

Players choose a strategy at the beginning of block $\chi + 1$ with
$$P_i^j(\chi + 1) = \frac{e^{\lambda \cdot A_i^j(\chi)}}{\sum_{k=1}^{K} e^{\lambda \cdot A_i^k(\chi)}}.$$
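The block-level attraction update mirrors the action-level EWA code shown earlier; a minimal sketch (names and array layout are again assumptions) is:

```python
import numpy as np

def stewa_block_update(A, N, chosen, R, E, phi, delta, lam):
    """One block update of the strategy attractions A (length K), given the
    reinforcement payoff R for the chosen strategy, the forgone payoffs E[j],
    the self-tuning decay phi, and the attention vector delta[j]."""
    indicator = np.zeros_like(A)
    indicator[chosen] = 1.0
    N_new = phi * N + 1.0
    A_new = (phi * N * A + indicator * R + delta * E) / N_new
    probs = np.exp(lam * (A_new - A_new.max()))
    probs /= probs.sum()                     # choice probabilities for block chi + 1
    return A_new, N_new, probs
```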
Reinforcement and Forgone Payoffs
- Reinforcement payoffs:
  $$R_i(\chi) = \frac{1}{T_i(\chi)} \sum_{a \in h(\chi)} g_i(a)$$
- Forgone payoffs:
  - Define the fitness function
    $$F(s_{-i}, \chi) = \max\left\{ t' \mid s_{-i} \text{ is consistent with } h(\chi) \text{ for the last } t' \text{ periods} \right\}.$$
  - The belief of player $i$ that the other played $s_{-i}$, when the history was $h$ and player $i$ played $s_i$, is
    $$B(s_{-i}, \chi) = \frac{F(s_{-i}, \chi)}{\sum_{r \in S_{-i}} F(r, \chi)}.$$
  - The forgone payoff is
    $$E_i^j(\chi) = \sum_{s_{-i} \in S_{-i}} \pi_i(s_i^j, s_{-i}) \cdot B(s_{-i}, \chi).$$
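Building on the earlier fitness/belief sketch, the block-level reinforcement and forgone payoffs could be computed as below. Since $\pi_i$ denotes the average payoff per period, it is approximated here by simulating the two automata against each other for the block length, which is an implementation assumption; automata are assumed to expose the reset/action/step interface from the MooreMachine sketch.

```python
def reinforcement(block_payoffs):
    """R_i(chi): average payoff realized over the block."""
    return sum(block_payoffs) / len(block_payoffs)

def average_payoff(own_auto, opp_auto, payoff, length):
    """Approximate pi_i(s_i^j, s_{-i}) by simulating the pair for `length` periods."""
    own_auto.reset(); opp_auto.reset()
    total = 0.0
    for _ in range(length):
        a, b = own_auto.action(), opp_auto.action()
        total += payoff(a, b)
        own_auto.step(b); opp_auto.step(a)
    return total / length

def forgone_payoff(own_strategy, opponent_set, belief, payoff, length):
    """E_i^j(chi): belief-weighted average payoff of strategy j against the candidates."""
    return sum(average_payoff(own_strategy, s, payoff, length) * b
               for s, b in zip(opponent_set, belief))
```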
Self-tuning Functions
- The attention function is
  $$\delta_i^j(\chi) = \begin{cases} 1 & \text{if } E_i^j(\chi) \geq R_i(\chi) \text{ and } s_i^j \neq s_i(\chi) \\ 0 & \text{otherwise.} \end{cases}$$
- The decay function, $\phi_i(\chi)$, consists of
  - the averaged belief, which is
    $$\sigma(s_{-i}, \chi) = \frac{1}{\chi} \sum_{j=1}^{\chi} B(s_{-i}, j),$$
  - so that the surprise index is
    $$S_i(\chi) = \sum_{s_{-i} \in S_{-i}} \left(\sigma(s_{-i}, \chi) - B(s_{-i}, \chi)\right)^2.$$
  - The decay function is
    $$\phi_i(\chi) = 1 - \frac{1}{2} S_i(\chi).$$
γ-Weighted Beliefs with Strategies
- The attractions in this model evolve according to the following two rules and parameter $\gamma$:
  $$N_i(\chi) = \gamma \cdot N_i(\chi - 1) + 1,$$
  and
  $$A_i^j(\chi) = \frac{\gamma \cdot N_i(\chi - 1) \cdot A_i^j(\chi - 1) + E_i^j(\chi)}{N_i(\chi)}.$$
- A player chooses a strategy at the beginning of block $\chi + 1$ with
  $$P_i^j(\chi + 1) = \frac{e^{\lambda \cdot A_i^j(\chi)}}{\sum_{k=1}^{K} e^{\lambda \cdot A_i^k(\chi)}}.$$
I-SAW with Strategies
- Exploration: A strategy is randomly selected.
- Inertia: The probability is $(1 - \frac{1}{P_i^t})$ rather than $\pi_i^{Surprise}$.
- Exploitation: The grand mean is
  $$GrandM_i(s^j, \chi) = \frac{1}{\chi} \sum_{k=1}^{\chi} E_i^j(k).$$
  Let $M_i(\chi)$ be a set of $\mu_i$ numbers drawn with replacement from $\{1, 2, \ldots, \chi\}$. Then, the sample mean is
  $$SampleM_i(s^j, \chi) = \frac{1}{|M_i(\chi)|} \sum_{k \in M_i(\chi)} E_i^j(k).$$
  Finally, the ESV is calculated as
  $$ESV_i(s^j, \chi) = (1 - \omega_i) \cdot SampleM_i(s^j, \chi) + \omega_i \cdot GrandM_i(s^j, \chi).$$
Simulations
- In the first (pre-experimental) phase, players engage in a lengthy process of learning among strategies.
- Each pair of agents stays matched until the average payoff of the given pair converges.
- The second (experimental) phase consists of a fixed-pair matching of 100 periods.
- The results are averaged over 1,000 simulations.

Prisoner's Dilemma:
      A     B
  A  3,3   1,4
  B  4,1   2,2

Battle of the Sexes:
      A     B
  A  1,1   2,4
  B  4,2   1,1

Stag-Hunt:
      A     B
  A  3,3   0,2
  B  2,0   1,1

Chicken:
      A     B
  A  3,3   1,4
  B  4,1   0,0
Strategy Set
[Figure: the strategy set, consisting of Automaton 1 through Automaton 26, each a Moore machine with at most two states (q1, q2) over the actions A and B, depicted as a state diagram with an output action per state and transitions labeled by the opponent's action.]
Prisoner's Dilemma
[Figure: simulated choices and payoffs in the Prisoner's Dilemma for STEWA, γ-WB, and I-SAW; panels plot Choice 1 vs. Choice 2 and Payoff 1 vs. Payoff 2.]

Battle of the Sexes
[Figure: simulated choices and payoffs in the Battle of the Sexes for STEWA, γ-WB, and I-SAW; panels plot Choice 1 vs. Choice 2 and Payoff 1 vs. Payoff 2.]

Stag-Hunt
[Figure: simulated choices and payoffs in the Stag-Hunt for STEWA, γ-WB, and I-SAW; panels plot Choice 1 vs. Choice 2 and Payoff 1 vs. Payoff 2.]

Chicken
[Figure: simulated choices and payoffs in Chicken for STEWA, γ-WB, and I-SAW; panels plot Choice 1 vs. Choice 2 and Payoff 1 vs. Payoff 2.]
Strategy Learning, Phase II
[Figure: Phase II payoffs (Payoff 1 vs. Payoff 2) of the strategy learning models STEWA, γ-WB, and I-SAW alongside human subjects in the Prisoner's Dilemma, Battle of the Sexes, Stag-Hunt, and Chicken.]

Action Learning, Phase II
[Figure: Phase II payoffs (Payoff 1 vs. Payoff 2) of the action learning models STEWA, γ-WB, and I-SAW alongside human subjects in the Prisoner's Dilemma, Battle of the Sexes, Stag-Hunt, and Chicken.]
Which is Better? Formal Evidence
Payoffs are discretized via
$$D(\pi) = \varepsilon \left\lfloor \frac{\pi}{\varepsilon} \right\rceil,$$
where
- $\pi$ is the payoff,
- $\varepsilon$ is the accuracy of the discretization,
- $D(\pi)$ denotes the transformed payoff, and
- $\lfloor \cdot \rceil$ rounds the fraction to the nearest integer.
  Game    STEWA              γ-WB               I-SAW
          Action  Strategy   Action  Strategy   Action  Strategy
  PD      0.532   0.228      0.535   0.237      0.486   0.257
  BoS     0.642   0.126      0.580   0.153      0.531   0.455
  SH      0.191   0.165      0.782   0.122      0.614   0.140
  CH      0.698   0.326      0.679   0.381      0.773   0.221
  Total   2.064   0.846      2.575   0.893      2.404   1.073
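As a quick illustration of the discretization used here, the following sketch snaps payoffs to a grid of width ε (the function name is illustrative):

```python
def discretize(payoff, eps):
    """D(pi) = eps * round(pi / eps): snap a payoff to the nearest multiple of eps."""
    return eps * round(payoff / eps)

# e.g., with eps = 0.5, a payoff of 2.3 maps to 2.5 and 2.2 maps to 2.0
assert discretize(2.3, 0.5) == 2.5
assert discretize(2.2, 0.5) == 2.0
```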
Robustness Check
[Figure: total Euclidean distance (y-axis, 0.0-2.5) as a function of the discretization accuracy ε (x-axis, 0.1-1.0) for I-SAW, γ-WB, and STEWA.]
Fitness Function 1
[Figure: payoffs (Payoff 1 vs. Payoff 2) for γ-WB, STEWA, I-SAW, and human subjects in the Prisoner's Dilemma, Battle of the Sexes, Stag-Hunt, and Chicken under Fitness Function 1.]
Fitness Function 2
[Figure: payoffs (Payoff 1 vs. Payoff 2) for γ-WB, STEWA, I-SAW, and human subjects in the Prisoner's Dilemma, Battle of the Sexes, Stag-Hunt, and Chicken under Fitness Function 2.]
Synchronous Updating of Strategies
[Figure: payoffs (Payoff 1 vs. Payoff 2) for γ-WB, STEWA, I-SAW, and human subjects in the Prisoner's Dilemma, Battle of the Sexes, Stag-Hunt, and Chicken under synchronous updating of strategies.]
Inferred Rules of Behavior
[Figure: frequencies (10%-90%) of inferred rules of behavior and their equivalent automata (Win-Stay, Lose-Shift; Tit-For-Tat; Grim-Trigger; Alternations; AA Triggers B; BB Triggers A; BA Triggers A; AA or B Once; BB or A Once; Quick A, Then B; Quick B, Then A) for EWA, γ-WB, I-SAW, and human subjects in the Prisoner's Dilemma, Battle of the Sexes, Stag-Hunt, and Chicken.]
Contribution
Our suggested modeling framework:
1. accommodates a richer specification of strategies,
2. is consistent with belief-learning,
3. allows asynchronous updating of strategies,
4. is flexible enough to incorporate larger strategy sets, and
5. makes no a priori assumptions on social preferences.
Future Work
- Incomplete information
- Random matching protocols
- Cut-off strategies
- Asymmetric payoffs
- Small amounts of errors