EWA Learning
Teck H. Ho, Duke PhD Summer Camp, August 2007

Outline
- Motivation
- Mutual Consistency: CH Model
- Noisy Best-Response: QRE Model
- Instant Convergence: EWA Learning
Standard Assumptions in Equilibrium Analysis

Assumption                Nash Equilibrium   Cognitive Hierarchy   QRE   EWA Learning
Strategic Thinking               X                    X             X         X
Best Response                    X                    X
Solution method:
  Mutual Consistency             X                                  X
  Instant Convergence            X                    X             X
Motivation: EWA Learning
- Provide a formal model of dynamic adjustment based on information feedback (i.e., a model of equilibration)
- Allow players who have varying levels of expertise to behave differently (adaptation vs. sophistication)
Outline
- Research Question
- Criteria of a Good Model
- Experience-Weighted Attraction (EWA 1.0) Learning Model
- Sophisticated EWA Learning (EWA 2.0)
Median-Effort Game

[Figure 1a: Actual choice frequencies, by strategy (1-7) and period (1-10).]

Van Huyck, Battalio, and Beil (1990)
The Learning Setting
- Each player knows the payoff table
- Player i's strategy space consists of discrete choices indexed by j (e.g., 1, 2, ..., 7)
- The game is repeated for several rounds
- At each round, all players observe:
  - The strategy (action) history of all other players
  - Their own payoff history
Research Question
- To develop a good descriptive model of adaptive learning that predicts the probability $P_i^j(t)$ of player i (i = 1, ..., n) choosing strategy j at round t, in any game
Criteria of a "Good" Model
- Uses (potentially) all the information subjects receive, in a sensible way
- Satisfies plausible principles of behavior (i.e., conforms with other sciences such as psychology)
- Fits and predicts choice behavior well
- Generates new insights
- Is as simple as the data allow
Models of Learning
- Introspection ($P^j$): requires too much human cognition
  - Nash equilibrium (Nash, 1950)
  - Quantal response equilibrium (Nash-$\lambda$) (McKelvey and Palfrey, 1995)
- Evolution ($P^j(t)$): players are pre-programmed
  - Replicator dynamics (Friedman, 1991)
  - Genetic algorithm (Ho, 1996)
- Learning ($P_i^j(t)$): uses about the right level of cognition
  - Experience-weighted attraction learning (Camerer and Ho, 1999)
  - Reinforcement (Roth and Erev, 1995)
  - Belief-based learning
    - Cournot best-response dynamics (Cournot, 1838)
    - Simple fictitious play (Brown, 1951)
    - Weighted fictitious play (Fudenberg and Levine, 1998)
  - Directional learning (Selten, 1991)
Information Usage in Learning
- Choice reinforcement learning (Thorndike, 1911; Bush and Mosteller, 1955; Herrnstein, 1970; Arthur, 1991; Erev and Roth, 1998): successful strategies are played again
- Belief-based learning (Cournot, 1838; Brown, 1951; Fudenberg and Levine, 1998): form beliefs based on opponents' action history and choose according to expected payoffs
- So reinforcement learning uses a player's own payoff history, while belief-based models use opponents' action history
- EWA uses both kinds of information
"Laws" of Effects in Learning

Row player's payoff table (their opponents chose L):

         L     R
    T    8     8
    M    5     9
    B    4    10

Colin chose B and received 4; Teck chose M and received 5.

- Law of actual effect: successes increase the probability of chosen strategies
  - Teck is more likely than Colin to "stick" to his previous choice (other things being equal)
- Law of simulated effect: strategies with simulated successes will be chosen more often
  - Colin is more likely to switch to T than to M (T's foregone payoff of 8 beats M's 5)
- Law of diminishing effect: the incremental effect of reinforcement diminishes over time
  - $1 has more impact in round 2 than in round 7
Assumptions of Reinforcement and Belief Learning
- Reinforcement learning ignores the simulated effect
- Belief learning predicts that actual and simulated effects are equally strong
- EWA learning allows for a positive simulated effect that is smaller than the actual effect
The EWA Model

- Initial attractions and experience: $A_i^j(0)$, $N(0)$

- Updating rules:

$$A_i^j(t) = \begin{cases} \dfrac{\phi \cdot N(t-1) \cdot A_i^j(t-1) + \pi_i(s_i^j, s_{-i}(t))}{N(t)} & \text{if } s_i^j = s_i(t) \\[1ex] \dfrac{\phi \cdot N(t-1) \cdot A_i^j(t-1) + \delta \cdot \pi_i(s_i^j, s_{-i}(t))}{N(t)} & \text{if } s_i^j \neq s_i(t) \end{cases}$$

$$N(t) = \rho \cdot N(t-1) + 1$$

- Choice probabilities:

$$P_i^j(t+1) = \frac{e^{\lambda \cdot A_i^j(t)}}{\sum_{k=1}^{m_i} e^{\lambda \cdot A_i^k(t)}}$$

Camerer and Ho (Econometrica, 1999)
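The update and choice rules above translate directly into code. Below is a minimal Python sketch of one EWA updating step for a single player; the function names `ewa_update` and `choice_probs` are illustrative, not from the original paper, and parameter names follow the slide's notation.

```python
import numpy as np

def ewa_update(A, N, chosen, payoffs, phi, delta, rho):
    """One EWA step for a single player.

    A       : attractions A_i^j(t-1), one entry per strategy
    N       : experience weight N(t-1)
    chosen  : index of the strategy actually played, s_i(t)
    payoffs : payoffs[j] = pi_i(s_i^j, s_-i(t)), the payoff strategy j
              would have earned against the opponents' realized play
    phi     : decay of past attractions
    delta   : weight on foregone (simulated) payoffs
    rho     : decay of experience
    """
    N_new = rho * N + 1.0                    # N(t) = rho * N(t-1) + 1
    weights = np.full(len(A), delta)         # unchosen strategies: delta * payoff
    weights[chosen] = 1.0                    # chosen strategy: full payoff
    A_new = (phi * N * A + weights * payoffs) / N_new
    return A_new, N_new

def choice_probs(A, lam):
    """Logit choice probabilities P_i^j(t+1) from attractions A_i^j(t)."""
    z = np.exp(lam * (A - A.max()))          # subtract max for numerical stability
    return z / z.sum()
```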
EWA Model and Laws of Effects

(The updating rules are as on the previous slide, with $N(t) = \rho \cdot N(t-1) + 1$.)

- Law of actual effect: successes increase the probability of chosen strategies (positive incremental reinforcement increases attraction, and hence probability)
- Law of simulated effect: strategies with simulated successes will be chosen more often ($\delta > 0$)
- Law of diminishing effect: the incremental effect of reinforcement diminishes over time ($N(t) \geq N(t-1)$)
The EWA Model: An Example

Row player's payoff table; history: period 1 = (B, L)

         L     R
    T    8     8
    B    4    10

- Period 0: $A^T(0)$, $A^B(0)$
- Period 1: B was played, so its payoff of 4 is reinforced in full, while T's foregone payoff of 8 is weighted by $\delta$:

$$A^T(1) = \frac{\phi \cdot N(0) \cdot A^T(0) + \delta \cdot 8}{\rho \cdot N(0) + 1}, \qquad A^B(1) = \frac{\phi \cdot N(0) \cdot A^B(0) + 4}{\rho \cdot N(0) + 1}$$
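Plugging this example into the earlier sketch, with illustrative parameter values (the helper `ewa_update` is the hypothetical function defined above):

```python
import numpy as np

# Period-0 attractions for (T, B), neutral priors; illustrative parameters.
A, N = np.array([0.0, 0.0]), 1.0
phi, delta, rho, lam = 0.9, 0.8, 0.9, 1.0

# Period 1: row player chose B (index 1), opponent chose L,
# so payoffs against L are T -> 8 (foregone), B -> 4 (actual).
A, N = ewa_update(A, N, chosen=1, payoffs=np.array([8.0, 4.0]),
                  phi=phi, delta=delta, rho=rho)
# A[0] = (0.9*1*0 + 0.8*8) / (0.9*1 + 1) = 6.4/1.9 ≈ 3.37
# A[1] = (0.9*1*0 + 4.0)   / 1.9         = 4.0/1.9 ≈ 2.11
print(A, choice_probs(A, lam))
```

Even though B was the chosen strategy, T's attraction rises more here because its foregone payoff (8) outweighs B's realized payoff (4) once $\delta = 0.8$: the simulated effect at work.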
Reinforcement Model: An Example

Row player's payoff table; history: period 1 = (B, L)

         L     R
    T    8     8
    B    4    10

- Period 0: $R^T(0)$, $R^B(0)$
- Period 1:

$$R^T(1) = \phi \cdot R^T(0), \qquad R^B(1) = \phi \cdot R^B(0) + 4$$

Compare with EWA:

$$A^T(1) = \frac{\phi \cdot N(0) \cdot A^T(0) + \delta \cdot 8}{\rho \cdot N(0) + 1}, \qquad A^B(1) = \frac{\phi \cdot N(0) \cdot A^B(0) + 4}{\rho \cdot N(0) + 1}$$

If $\delta = 0$, $\rho = 0$, and $N(0) = 1$, then EWA = RF (reinforcement).
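A quick numerical check of this reduction, reusing the hypothetical `ewa_update` sketch from above:

```python
import numpy as np

# With delta = 0, rho = 0, N(0) = 1, the EWA update collapses to
# cumulative reinforcement: R(t) = phi * R(t-1) + realized payoff.
A, N = np.array([2.0, 3.0]), 1.0          # arbitrary period-0 attractions
phi = 0.9
A1, N1 = ewa_update(A, N, chosen=1, payoffs=np.array([8.0, 4.0]),
                    phi=phi, delta=0.0, rho=0.0)
assert np.allclose(A1, [phi * 2.0, phi * 3.0 + 4.0])   # matches the RF update
```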
Belief-Based (BB) Model: An Example

Row player's payoff table; history: period 1 = (B, L)

         L     R
    T    8     8
    B    4    10

- Period 0: prior counts $N^L(0)$ and $N^R(0)$, with $N(0) = N^L(0) + N^R(0)$; beliefs

$$B^L(0) = \frac{N^L(0)}{N(0)}, \qquad B^R(0) = \frac{N^R(0)}{N(0)}$$

and expected payoffs

$$E^T(0) = 8 \cdot B^L(0) + 8 \cdot B^R(0) = \frac{N^L(0) \cdot 8 + N^R(0) \cdot 8}{N(0)}$$

$$E^B(0) = 4 \cdot B^L(0) + 10 \cdot B^R(0) = \frac{N^L(0) \cdot 4 + N^R(0) \cdot 10}{N(0)}$$

- Period 1 (after observing L):

$$B^L(1) = \frac{\rho \cdot N^L(0) + 1}{\rho \cdot N(0) + 1}, \qquad B^R(1) = \frac{\rho \cdot N^R(0) + 0}{\rho \cdot N(0) + 1}$$

$$E^T(1) = 8 \cdot B^L(1) + 8 \cdot B^R(1) = \frac{\rho \cdot N(0) \cdot E^T(0) + 8}{\rho \cdot N(0) + 1}$$

$$E^B(1) = 4 \cdot B^L(1) + 10 \cdot B^R(1) = \frac{\rho \cdot N(0) \cdot E^B(0) + 4}{\rho \cdot N(0) + 1}$$

This is Bayesian learning with Dirichlet priors.
Relationship Between Belief-Based (BB) and EWA Learning Models

BB:

$$E^T(1) = \frac{\rho \cdot N(0) \cdot E^T(0) + 8}{\rho \cdot N(0) + 1}, \qquad E^B(1) = \frac{\rho \cdot N(0) \cdot E^B(0) + 4}{\rho \cdot N(0) + 1}$$

EWA:

$$A^T(1) = \frac{\phi \cdot N(0) \cdot A^T(0) + \delta \cdot 8}{\rho \cdot N(0) + 1}, \qquad A^B(1) = \frac{\phi \cdot N(0) \cdot A^B(0) + 4}{\rho \cdot N(0) + 1}$$

If $\delta = 1$ and $\phi = \rho$, then EWA = BB.
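A small numerical check of this equivalence, again reusing the hypothetical `ewa_update` sketch; the prior counts are arbitrary:

```python
import numpy as np

# Belief-based updating with Dirichlet prior counts (N_L, N_R) = (2, 1).
N_L, N_R, rho = 2.0, 1.0, 0.8
N0 = N_L + N_R
payoff = np.array([[8.0, 8.0],     # T vs (L, R)
                   [4.0, 10.0]])   # B vs (L, R)
E0 = payoff @ np.array([N_L, N_R]) / N0     # expected payoffs E^T(0), E^B(0)
B1 = (rho * np.array([N_L, N_R]) + np.array([1.0, 0.0])) / (rho * N0 + 1)
E1 = payoff @ B1                            # E^T(1), E^B(1) after observing L

# EWA with delta = 1, phi = rho, attractions initialized at expected payoffs.
A1, _ = ewa_update(E0, N0, chosen=1, payoffs=payoff[:, 0],
                   phi=rho, delta=1.0, rho=rho)
assert np.allclose(A1, E1)                  # EWA reproduces belief learning
```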
Model Interpretation
- Simulation or attention parameter ($\delta$): measures the degree of sensitivity to foregone payoffs
- Exploitation parameter ($\kappa$): measures how rapidly players lock in to a strategy (average versus cumulative reinforcement)
- Stationarity or motion parameter ($\phi$): measures players' perception of the degree of stationarity of the environment
Model Interpretation

[EWA parameter cube: familiar models sit at the vertices, including weighted fictitious play, Cournot best-response dynamics, fictitious play, cumulative reinforcement, and average reinforcement.]
New Insight
- Reinforcement and belief learning were thought to be fundamentally different for 50 years.
- For instance: "... in rote [reinforcement] learning success and failure directly influence the choice probabilities. ... Belief learning is very different. Here experiences strengthen or weaken beliefs. Belief learning has only an indirect influence on behavior." (Selten, 1991)
- EWA shows that belief and reinforcement learning are related: both are special cases of EWA learning.
Actual versus Belief-Based Model Frequencies

[Figure 1a: Actual choice frequencies vs. Figure 1b: Belief-based model frequencies, by strategy (1-7) and period (1-10).]
Actual versus Reinforcement Model Frequencies

[Figure 1a: Actual choice frequencies vs. Figure 1c: Choice reinforcement model frequencies, by strategy (1-7) and period (1-10).]
Actual versus EWA Model Frequencies

[Figure 1a: Actual choice frequencies vs. Figure 1d: EWA model frequencies, by strategy (1-7) and period (1-10).]
Estimation and Results

Game: Median Action (M = 378). Calibration: LL, AIC, BIC, ρ²; validation: LL, MSD.

Model                    Params   Calib. LL      AIC        BIC       ρ²    Valid. LL     MSD
1-segment:
  Random Choice             0      -677.29    -677.29    -677.29   0.0000    -315.24    0.1217
  Choice Reinforcement      8      -341.70    -349.70    -365.44   0.4837     -80.27    0.0301
  Belief-based              9      -438.74    -447.74    -465.45   0.3389    -113.90    0.0519
  EWA                      11      -309.30    -320.30    -341.94*  0.5271     -41.05    0.0185
2-segment:
  Random Choice             0      -677.29    -677.29    -677.29   0.0000    -315.24    0.1217
  Choice Reinforcement     17      -331.25    -348.25    -381.70   0.4858     -66.32    0.0245
  Belief-based             19      -379.24    -398.24    -435.62   0.4120     -70.31    0.0250
  EWA                      23      -290.25    -313.25*   -358.51   0.5374*    -34.79*   0.0139*
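The penalized fit measures in the table follow the log-likelihood convention AIC = LL - k and BIC = LL - (k/2) ln M, where k is the number of parameters and M the number of observations. A quick check in Python (the function name is illustrative), reproducing the 1-segment EWA row:

```python
import numpy as np

def aic_bic(ll, k, m):
    """Penalized log-likelihoods as reported in the table:
    AIC = LL - k, BIC = LL - (k/2) * ln(M)."""
    return ll - k, ll - 0.5 * k * np.log(m)

aic, bic = aic_bic(ll=-309.30, k=11, m=378)
# aic ≈ -320.30, bic ≈ -341.94, matching the 1-segment EWA row
```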
Extensions
- Heterogeneity (JMP, Camerer and Ho, 1999)
- Payoff learning (EJ, Ho, Wang, and Camerer, 2006)
- Sophistication and strategic teaching
  - Sophisticated learning (JET, Camerer, Ho, and Chong, 2002)
  - Reputation building (GEB, Chong, Camerer, and Ho, 2006)
- EWA Lite (self-tuning EWA learning) (JET, Ho, Camerer, and Chong, 2007)
- Applications:
  - Product choice at supermarkets (JMR, Ho and Chong, 2004)
Homework
- Provide a general proof that Bayesian learning (i.e., weighted fictitious play) is a special case of EWA learning.
- If players face a stationary environment (i.e., decision problems), will EWA learning lead to expected-utility maximization in the long run?
Outline
- Research Question
- Criteria of a Good Model
- Experience-Weighted Attraction (EWA 1.0) Learning Model
- Sophisticated EWA Learning (EWA 2.0)
Three User Complaints about EWA 1.0
- Experience matters.
- EWA 1.0's predictions are not sensitive to the structure of the learning setting (e.g., the matching protocol).
- The EWA 1.0 model does not use opponents' payoff matrix to predict behavior.
Example 3: p-Beauty Contest
- n players
- Every player simultaneously chooses a number from 0 to 100
- Compute the group average
- Define the Target Number to be 0.7 times the group average
- The winner is the player whose number is closest to the Target Number
- The prize to the winner is US$20

Ho, Camerer, and Weigelt (AER, 1998)
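As a concrete illustration of the rules above, here is a minimal sketch of one round of the p-beauty contest; the function name `pbc_round` is illustrative, and ties are broken here by lowest index for simplicity:

```python
import numpy as np

def pbc_round(choices, p=0.7):
    """One round of the p-beauty contest: returns the target and winner index.

    choices : numbers in [0, 100], one per player
    p       : multiplier applied to the group average (0.7 in the experiment)
    """
    choices = np.asarray(choices, dtype=float)
    target = p * choices.mean()
    winner = int(np.argmin(np.abs(choices - target)))   # closest to target
    return target, winner

target, winner = pbc_round([50, 35, 20, 10])
# average = 28.75, target = 20.125, so the player who chose 20 wins US$20
```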
Actual versus Belief-Based Model Frequencies: pBC (Inexperienced Subjects)

[Figure 2a: Actual choice frequencies vs. Figure 3d: Belief learning model frequencies, by choice bracket (0, 1~10, ..., 91~100) and round.]
Actual versus Reinforcement Model Frequencies: pBC (Inexperienced Subjects)

[Figure 2a: Actual choice frequencies vs. Figure 3e: Reinforcement model frequencies, by choice bracket and round.]
Actual versus EWA Model Frequencies: pBC (Inexperienced Subjects)

[Figure 2a: Actual choice frequencies vs. Figure 2b: Adaptive EWA model frequencies, by choice bracket and round.]
Actual versus Belief-Based Model Frequencies: pBC (Experienced Subjects)

[Figure 3a: Actual choice frequencies vs. Figure 4d: Belief learning model frequencies, by choice bracket and round.]
Actual versus Reinforcement Model Frequencies: pBC (Experienced Subjects)

[Figure 3a: Actual choice frequencies vs. Figure 4e: Reinforcement model frequencies, by choice bracket and round.]
Actual versus EWA Model Frequencies: pBC (Experienced Subjects)

[Figure 3a: Actual choice frequencies vs. Figure 3b: Adaptive EWA model frequencies, by choice bracket and round.]
Sophisticated EWA Learning (EWA 2.0)
- The population consists of both adaptive and sophisticated players.
- The proportion of sophisticated players is denoted by $\alpha$. Each sophisticated player, however, believes the proportion of sophisticated players to be $\alpha'$.
- Use latent-class methods to estimate the parameters.
The EWA 2.0 Model: Adaptive Players

- Adaptive proportion ($1 - \alpha$) + sophisticated proportion ($\alpha$)
- Adaptive players update exactly as in EWA 1.0:

$$A_i^j(a, t) = \begin{cases} \dfrac{\phi \cdot N(t-1) \cdot A_i^j(a, t-1) + \pi_i(s_i^j, s_{-i}(t))}{N(t)} & \text{if } s_i^j = s_i(t) \\[1ex] \dfrac{\phi \cdot N(t-1) \cdot A_i^j(a, t-1) + \delta \cdot \pi_i(s_i^j, s_{-i}(t))}{N(t)} & \text{if } s_i^j \neq s_i(t) \end{cases}$$

$$P_i^j(a, t+1) = \frac{e^{\lambda \cdot A_i^j(a, t)}}{\sum_{k=1}^{m_i} e^{\lambda \cdot A_i^k(a, t)}}$$
The EWA 2.0 Model: Sophisticated Players

- Adaptive proportion ($1 - \alpha$) + sophisticated proportion ($\alpha$)
- Sophisticated players believe a proportion ($1 - \alpha'$) of the players are adaptive, and best respond based on that belief:

$$A_i^j(s, t) = \sum_{k=1}^{m_{-i}} \left[ (1 - \alpha') \cdot P_{-i}^k(a, t+1) + \alpha' \cdot P_{-i}^k(s, t+1) \right] \cdot \pi_i(s_i^j, s_{-i}^k)$$

$$P_i^j(s, t+1) = \frac{e^{\lambda \cdot A_i^j(s, t)}}{\sum_{k=1}^{m_i} e^{\lambda \cdot A_i^k(s, t)}}$$

- Better-than-others ($\alpha' < \alpha$); false consensus ($\alpha' > \alpha$)
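Because sophisticated attractions depend on sophisticated choice probabilities, which depend back on the attractions, they form a fixed point, much like QRE. Below is a minimal sketch that iterates this fixed point, under the simplifying assumptions that the game is symmetric (so the opponent's sophisticated probabilities can be computed from the same payoff matrix) and that the opponent's adaptive probabilities are already known; all names are illustrative.

```python
import numpy as np

def softmax(A, lam):
    z = np.exp(lam * (A - A.max()))
    return z / z.sum()

def sophisticated_attractions(payoff, P_opp_adaptive, alpha_prime, lam,
                              n_iter=100):
    """Fixed-point iteration for sophisticated EWA attractions A_i^j(s, t).

    payoff         : payoff[j, k] = pi_i(s_i^j, s_-i^k)
    P_opp_adaptive : opponent's adaptive choice probabilities P_-i^k(a, t+1)
    alpha_prime    : believed proportion of sophisticated players
    """
    m = payoff.shape[1]
    P_opp_soph = np.full(m, 1.0 / m)     # initial guess for P_-i^k(s, t+1)
    for _ in range(n_iter):
        belief = (1 - alpha_prime) * P_opp_adaptive + alpha_prime * P_opp_soph
        A_s = payoff @ belief            # expected payoff of each own strategy
        P_opp_soph = softmax(A_s, lam)   # symmetric-game shortcut (assumption)
    return A_s, softmax(A_s, lam)
```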
Well-Known Special Cases
- Nash equilibrium: $\alpha = \alpha' = 1$ and $\lambda = \infty$
- Quantal response equilibrium: $\alpha = \alpha' = 1$
- Rational expectations model: $\alpha = \alpha'$
- Better-than-others model: $\alpha' < \alpha$
Results

[Figure 3a: Actual choice frequencies vs. Figure 3c: Sophisticated EWA model frequencies for experienced subjects, by choice bracket and round.]
MLE Estimates
Summary
- The EWA cube provides a simple but useful framework for studying learning in games.
- The EWA 1.0 model fits and predicts better than reinforcement and belief learning in many classes of games because it allows for a positive simulated effect that is smaller than the actual effect.
- The EWA 2.0 model allows us to study equilibrium and adaptation simultaneously (and nests most familiar cases, including QRE).
Key References for This Talk
- Refer to Ho, Lim, and Camerer (JMR, 2006) for important references
- CH model: Camerer, Ho, and Chong (QJE, 2004)
- QRE model: McKelvey and Palfrey (GEB, 1995); Baye and Morgan (RAND, 2004)
- EWA learning: Camerer and Ho (Econometrica, 1999); Camerer, Ho, and Chong (JET, 2002)