Belief Learning in an Unstable
Infinite Game
Paul J. Healy
CMU
Overview
• Issue #1: Infinite games
• Issue #2: Unstable games
• Issue #3: Belief learning
Issue #1: Infinite Games
• Typical Learning Model:
– Finite set of strategies
– Strategies get weight based on ‘fitness’
– Bells & Whistles: experimentation, spillovers…
• Many important games have infinite strategy spaces
– Duopoly, public goods, bargaining, auctions, wars of attrition…
• Is the quality of fit sensitive to the grid size used to discretize?
• Standard models ignore the structure of the strategy space
Previous Work
• Effect of grid size on fit quality:
– Arifovic & Ledyard
• Groves-Ledyard mechanisms
• Convergence failure of RL with |S| = 51
• Strategy space structure:
– Erev & Roth AER ’98
• Quality-of-fit/error measures
– What’s the right metric space?
• Closeness in probs. or closeness in strategies?
Issue #2: Unstable Game
• Usually predicting convergence rates
– Example: p–beauty contests
• Instability:
– Toughest test for learning models
– Most statistical power
Previous Work
• Chen & Tang ‘98
– Walker mechanism & unstable Groves-Ledyard
– Reinforcement > Fictitious Play > Equilibrium
• Healy ’06
– 5 PG mechanisms, predicting convergence or not
• Feltovich ’00
– Unstable finite Bayesian game
– Fit varies by game, error measure
Issue #3: Belief Learning
• If subjects are forming beliefs, measure them!
• Method 1: Direct elicitation
– Incentivized guesses about s_{-i}
• Method 2: Inferred from payoff table usage
– Tracking payoff ‘lookups’ may inform our models
Previous Work
• Nyarko & Schotter ‘02
– Subjects BR to stated beliefs
– Stated beliefs not too accurate
• Costa-Gomes, Crawford & Broseta ’01
– Mouselab to identify types
– How players solve games, not learning
This Paper
• Pick an unstable infinite game
• Give subjects a calculator tool & track usage
• Elicit beliefs in some sessions
• Fit models to data in standard way
• Study the formation of “beliefs”
– “Beliefs” inferred from calculator tool usage
– “Beliefs” measured by elicited guesses
The Game
• Walker’s PG mechanism for 3 players
• Added a ‘punishment’ parameter
N = {1, 2, 3},  S_i = [-10, 10] ⊂ ℝ
u_i(s_i, s_{-i}) = v_i(y(s)) − t_i(s),  where y(s) = Σ_j s_j
v_i(y) = b_i·y − a_i·y²
t_i(s) = (s_{i+1} − s_{i−1})·y(s)  (indices mod 3)
Parameters & Equilibrium
• v_i(y) = b_i·y − a_i·y² + c_i
• Pareto optimum: y° = 7.5
• Unique PSNE: s_i* = 2.5

Player | a_i | b_i | c_i
1      | 0.1 | 1.5 | 110
2      | 0.2 | 3.0 | 125
3      | 0.3 | 4.5 | 140

• Punishment γ = 0.1
• Purpose: keep play from getting too wild; payoffs rarely negative
• Guessing payoff: 10 − |g_L − s_L|/4 − |g_R − s_R|/4  (g = guess about the left/right neighbor’s strategy)
• Game payoffs: Pr(payoff < 50) = 8.9%, Pr(payoff > 100) = 71%
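As a concreteness check, here is a minimal payoff sketch in Python. It assumes the reconstructed tax t_i = (s_{i+1} − s_{i−1})·y(s), omits the punishment term (its exact functional form is not recoverable from the slide), and uses function names of my own:

```python
import numpy as np

# Slide parameters: v_i(y) = b_i*y - a_i*y**2 + c_i
a = np.array([0.1, 0.2, 0.3])
b = np.array([1.5, 3.0, 4.5])
c = np.array([110.0, 125.0, 140.0])

def payoffs(s):
    """Walker-mechanism payoffs for the 3 players (punishment term omitted)."""
    s = np.asarray(s, dtype=float)
    y = s.sum()                               # public good level y(s)
    v = b * y - a * y**2 + c                  # private valuations
    nxt, prv = np.roll(s, -1), np.roll(s, 1)  # s_{i+1}, s_{i-1}, indices mod 3
    t = (nxt - prv) * y                       # Walker taxes (they sum to zero)
    return v - t

# At the PSNE s* = (2.5, 2.5, 2.5): y = 7.5, the Pareto-optimal level,
# and all taxes vanish by symmetry.
print(payoffs([2.5, 2.5, 2.5]))  # [115.625, 136.25, 156.875]
```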
Choice of Grid Size
• S = [-10, 10]

Grid Width | # Grid Points | % of Choices on Grid
5          | 5             | 59.7
2          | 11            | 61.6
1          | 21            | 88.7
1/2        | 41            | 91.6
1/4        | 81            | 91.9
1/8        | 161           | 91.9
Properties of the Game
• Best response (from the FOC b_i − 2a_i·y − (s_{i+1} − s_{i−1}) = 0):

s_i^BR = b_i/(2a_i) − (1 + 1/(2a_i))·s_{i+1} + (1/(2a_i) − 1)·s_{i−1}

• BR dynamics: unstable
– One eigenvalue has modulus 2
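A quick numerical check of the instability claim, under the best-response coefficients reconstructed above (a sketch; with the opposite neighbor-indexing convention the off-diagonal signs change, but the eigenvalue moduli do not):

```python
import numpy as np

a = np.array([0.1, 0.2, 0.3])

# Row i of M holds player i's BR coefficients on s_{i+1} and s_{i-1}.
M = np.zeros((3, 3))
for i in range(3):
    k = 1.0 / (2.0 * a[i])
    M[i, (i + 1) % 3] = -(1.0 + k)  # coefficient on s_{i+1}
    M[i, (i - 1) % 3] = k - 1.0     # coefficient on s_{i-1}

eig = np.linalg.eigvals(M)
print(np.round(eig, 3))   # one real root of modulus 2, complex pair 1±5j
print(max(abs(eig)) > 1)  # True: simultaneous best-response dynamics diverge
```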
Interface
[Figure: screenshot of the experimental interface]

Design
• PEEL Lab, U. Pittsburgh
• All sessions:
– 3-player groups, 50 periods
– Same group and ID#s for all periods
– Payoffs etc. common information
– No explicit public-good framing
– Calculator always available
– 5-minute ‘warm-up’ with the calculator
• Sessions 1-6:
– Guess s_L and s_R (elicited beliefs)
• Sessions 7-13:
– Baseline: no guesses
Does Elicitation Affect Choice?
• Total Variation: Σ_t |x_t − x_{t−1}|
– No significant difference (p=0.745)
• No. of Strategy Switches:
– No significant difference (p=0.405)
• Autocorrelation (predictability):
– Slightly more without elicitation
• Total Earnings per Session:
– No significant difference (p=1)
• Missed Periods:
– Elicited: 9/300 (3%) vs. Not: 3/350 (0.8%)
Does Play Converge?
[Figure: average distance from equilibrium by period (1-50). Left panel: average |s_i − s_i*| per period; right panel: average |y − y°| per period.]
Does Play Converge, Part 2
[Figure: strategy choices by period (1-50); vertical axis is the strategy space S = [-10, 10].]
Accuracy of Beliefs
• Guesses get better over time

[Figure: average ||ŝ_{-i}(t) − s_{-i,t}|| per period (1-50). Left panel: elicited guesses; right panel: calculator inputs.]
Model 1: Parametric EWA
A_i^t(s_i) = [ φ·N^{t−1}·A_i^{t−1}(s_i) + (δ + (1 − δ)·I(s_i, s_i^t))·u_i(s_i, s_{-i}^t) ] / N^t

N^t = ρ·N^{t−1} + 1

P_i^t(s_i) = e^{λ·A_i^t(s_i)} / Σ_{x∈S} e^{λ·A_i^t(x)}

• δ: weight on foregone payoffs of strategies not actually played
• φ: decay rate of past attractions
• ρ: decay rate of past experience
• A(0): initial attractions
• N(0): initial experience
• λ: response sensitivity to attractions
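A minimal sketch of one EWA step over a finite strategy grid, following the update above; the function and argument names are mine, and payoff(s_i, s_others) stands in for the mechanism payoff (e.g., the payoffs sketch earlier):

```python
import numpy as np

def ewa_update(A, N, grid, payoff, s_played, s_others,
               delta=0.8, phi=0.9, rho=0.9):
    """One parametric-EWA step. A: attractions over grid, N: experience."""
    N_new = rho * N + 1.0
    played = (grid == s_played).astype(float)          # I(s_i, s_i^t)
    u = np.array([payoff(x, s_others) for x in grid])  # incl. foregone payoffs
    A_new = (phi * N * A + (delta + (1 - delta) * played) * u) / N_new
    return A_new, N_new

def logit_choice_probs(A, lam):
    """Logit response: P(s) proportional to exp(lambda * A(s))."""
    z = lam * A
    z -= z.max()  # numerical stabilization before exponentiating
    p = np.exp(z)
    return p / p.sum()
```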
Model 1’: Self-Tuning EWA
A_i^t(s_i) = [ φ_i(t)·N^{t−1}·A_i^{t−1}(s_i) + (δ_t(s_i) + (1 − δ_t(s_i))·I(s_i, s_i^t))·u_i(s_i, s_{-i}^t) ] / N^t,  N^t = φ_i(t)·N^{t−1} + 1

• N(0) = 1
• Replace δ and φ with deterministic functions:

δ_t(s_i) = 1 if u_i(s_i, s_{-i,t}) ≥ u_i(s_t), 0 otherwise

φ_{i,t} = 1 − ½·Σ_{x∈S_{-i}} [ (1/t)·Σ_{τ=1}^{t} I(x, s_{-i,τ}) − I(x, s_{-i,t}) ]²
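A sketch of the two self-tuning functions, assuming the reconstruction above; opponent profiles are treated as hashable tuples (s_L, s_R), and all names are mine:

```python
import numpy as np
from collections import Counter

def delta_t(grid, payoff, s_others_t, u_realized):
    """Attention function: delta_t(s) = 1 iff strategy s would have earned
    at least the realized payoff against this period's opponent profile."""
    u = np.array([payoff(x, s_others_t) for x in grid])
    return (u >= u_realized).astype(float)

def phi_t(history_others):
    """Change detector: phi = 1 - 0.5 * sum_x (h_x - r_x)^2, where h_x is the
    empirical frequency of opponent profile x over periods 1..t and r_x
    indicates the period-t profile. Stable play gives phi near 1; a surprise
    pushes phi down, discounting old attractions."""
    t = len(history_others)
    counts = Counter(history_others)
    last = history_others[-1]
    s = 0.0
    for x, cnt in counts.items():
        h = cnt / t
        r = 1.0 if x == last else 0.0
        s += (h - r) ** 2
    return 1.0 - 0.5 * s
```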
STEWA: Setup
• Only remaining parameters: λ and A(0)
– λ will be estimated
– 5 minutes of ‘calculator time’ gives A(0)
• Initial attraction = average payoff over the calculator trials of that strategy:

A^0(s_i) = [ Σ_{t=1}^{T} I(s_i, s_{i,t})·u_i(s_t) ] / [ Σ_{t=1}^{T} I(s_i, s_{i,t}) ]  if Σ_{t=1}^{T} I(s_i, s_{i,t}) ≥ 1;
A^0(s_i) = (1/T)·Σ_{t=1}^{T} u_i(s_t)  otherwise
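A direct transcription of the A(0) rule into Python (a sketch; the calculator-log format is my assumption):

```python
import numpy as np

def initial_attractions(grid, trials):
    """A(0) from the calculator 'warm-up': mean payoff of the trials in which
    a strategy was entered; untried strategies get the overall mean payoff.
    trials: list of (own_strategy, payoff) pairs from the calculator log."""
    s_tried = np.array([s for s, _ in trials])
    u_tried = np.array([u for _, u in trials])
    A0 = np.full(len(grid), u_tried.mean())  # default: overall average payoff
    for k, x in enumerate(grid):
        mask = (s_tried == x)
        if mask.any():
            A0[k] = u_tried[mask].mean()     # average payoff when x was tried
    return A0
```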
STEWA: Fit
• Likelihoods are ‘zero’ for all λ
– Guess: lots of near misses in predictions
• Alternative measure: quadratic scoring rule (lower = better)

QSR^t = Σ_{k∈S} ( P_i^t(k) − I(k, s_{i,t}) )²

– Best fit: λ* = 0.04 (previous studies: λ > 4)
– Suggests attractions are very concentrated
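The per-period score, as reconstructed above (a sketch; 0 is a perfect point prediction, 2 a fully confident miss):

```python
import numpy as np

def quadratic_score(probs, chosen_index):
    """Quadratic scoring rule for one period: sum_k (P_k - I_k)^2."""
    indicator = np.zeros_like(probs)
    indicator[chosen_index] = 1.0
    return float(np.sum((probs - indicator) ** 2))

# Uniform play on the 21 integer strategies scores about 0.952, matching the
# Random Choice row of the model-comparison table below.
```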
[Figure: STEWA predicted probabilities with λ = 4; Session 3, Player 2, periods 11-30. Axes: EWA probability (0-1) × period (11-30) × strategy (-10 to 10).]
[Figure: STEWA predicted probabilities with λ = 0.04; Session 3, Player 2, periods 11-30. Axes: EWA probability (0-1) × period (11-30) × strategy (-10 to 10).]
STEWA: Adjustment Attempts
• The problem: near misses in strategy space, not in time
• Suggests: alter δ (weight on hypotheticals)
– Original specification: QSR* = 1.193 @ λ* = 0.04
– δ = 0.7 (p-beauty estimate): QSR* = 1.056 @ λ* = 0.03
– δ = 1 (belief model): QSR* = 1.082 @ λ* = 0.175
– δ(k,t) = % of BR payoff: QSR* = 1.077 @ λ* = 0.06
• Altering φ:
– 1/8 weight on surprises: QSR* = 1.228 @ λ* = 0.04
STEWA: Other Modifications
• Equal initial attractions: worse
• Smoothing
– Takes advantage of strategy space structure
• λ spreads probability across strategies evenly
• Smoothing spreads probability to nearby strategies
– Smoothed Attractions
– Smoothed Probabilities
– But… No Improvement in QSR* or λ* !
• Tentative Conclusion:
– STEWA: either it isn’t broken, or it can’t be fixed…
Other Standard Models
• Nash equilibrium
• Uniform mixed strategy (‘Random’)
• Logistic Cournot BR
• Deterministic Cournot BR
• Logistic fictitious play
• Deterministic fictitious play
• k-period BR: best respond to the average of the last k observations (sketch below)

s_i^{BR,t} = BR_i( (1/k)·Σ_{τ=t−k}^{t−1} s_{-i}^τ )
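A sketch of the k-period best-reply rule (names are mine; k = 1 gives Cournot, while averaging the whole history approximates fictitious play):

```python
import numpy as np

def k_period_br(history_others, k, best_reply):
    """Best respond to the mean opponent profile over the last k periods.
    history_others: list of opponent profiles (s_{i+1}, s_{i-1});
    best_reply: maps an opponent profile to player i's best response."""
    recent = np.asarray(history_others[-k:], dtype=float)
    return best_reply(recent.mean(axis=0))
```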
“New” Models
• Best respond to stated beliefs (S1-S6 only)
• Best respond to calculator entries
– Issue: how to aggregate calculator usage?
– Decaying average of inputs (see the sketch after this list)
• Reinforcement based on calculator payoffs
– Decaying average of payoffs
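A sketch of the decaying-average aggregator used for both calculator-based models; the geometric-decay form and names are my assumptions, with δ = 1/2 as fixed in the comparison table below:

```python
import numpy as np

def decaying_average(entries, delta=0.5):
    """Weight the j-th most recent calculator entry by delta**j and normalize.
    Works for strategy inputs (BR model) or payoffs (reinforcement model)."""
    x = np.asarray(entries, dtype=float)
    w = delta ** np.arange(len(x))[::-1]  # oldest entry gets the smallest weight
    return float(np.sum(w * x) / np.sum(w))
```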
Model Comparisons

MODEL              PARAM  BIC           2-QSR                  MAD                    MSD
Random Choice*     N/A    In: Infinite  In: 0.952  Out: 0.878  In: 7.439  Out: 7.816  In: 82.866  Out: 85.558
Logistic STEWA*    λ      In: Infinite  In: 0.807  Out: 0.665  In: 3.818  Out: 3.180  In: 34.172  Out: 22.853
                                        (λ* = 0.04)            (λ* = 0.41)            (λ* = 0.35)
Logistic Cournot*  λ      In: Infinite  In: 0.952  Out: 0.878  In: 4.222  Out: 3.557  In: 38.186  Out: 25.478
                                        (λ* = 0.00 (!))        (λ* = 4.30)            (λ* = 4.30)
Logistic F.P.*     λ      In: Infinite  In: 0.955  Out: 0.878  In: 4.265  Out: 3.891  In: 31.062  Out: 22.133
                                        (λ* = 14.98)           (λ* = 4.47)            (λ* = 4.47)

* Estimates on the grid of integers {-10, -9, …, 9, 10}
In = periods 1-35; Out = periods 36-end
Model Comparisons 2

MODEL                      PARAM     MAD                      MSD
BR(Guesses) (6 sessions)   N/A       In: 5.5924  Out: 3.3693  In: 57.874  Out: 19.902
BR(Calculator Input)       δ (=1/2)  In: 6.394   Out: 8.263   In: 79.29   Out: 116.7
Calculator Reinforcement*  δ (=1/2)  In: 7.389   Out: 7.815   In: 82.407  Out: 85.495
k-Period BR                k         In: 4.2126  Out: 3.582   In: 35.185  Out: 23.455
                                     (k* = 4)                 (k* = 4)
Cournot                    N/A       In: 4.7974  Out: 3.857   In: 45.283  Out: 29.058
Weighted F.P.              δ         In: 4.500   Out: 3.518   In: 38.290  Out: 22.426
                                     (δ* = 0.56)              (δ* = 0.65)
The “Take-Homes”
• Methodological issues
– Infinite strategy space
– Convergence vs. Instability
– Right notion of error
• Self-Tuning EWA fits best.
• Guesses & calculator input don’t seem to offer
any more predictive power… ?!?!