Outline
- Motivation
- Mutual Consistency: CH Model
- Noisy Best-Response: QRE Model
- Instant Convergence: EWA Learning

Standard Assumptions in Equilibrium Analysis

                          Nash          Cognitive    QRE    EWA
                          Equilibrium   Hierarchy           Learning
Assumptions
  Strategic Thinking         X             X          X       X
  Best Response              X             X
Solution Method
  Mutual Consistency         X                        X
  Instant Convergence        X             X          X

Motivation: EWA Learning
- Provide a formal model of dynamic adjustment based on information feedback (i.e., a model of equilibration).
- Allow players with varying levels of expertise to behave differently (adaptation vs. sophistication).

Outline
- Research Question
- Criteria of a Good Model
- Experience-Weighted Attraction (EWA 1.0) Learning Model
- Sophisticated EWA Learning (EWA 2.0)

Median-Effort Game

[Figure 1a: Actual choice frequencies, by strategy (1-7) and period (1-10); Van Huyck, Battalio, and Beil (1990)]

The Learning Setting
- Games in which each player knows the payoff table.
- Player i's strategy space consists of discrete choices indexed by j (e.g., 1, 2, ..., 7).
- The game is repeated for several rounds.
- At the end of each round, all players observe:
  - the action history of all other players, and
  - their own payoff history.

Research Question
To develop a good descriptive model of adaptive learning that predicts $P_i^j(t)$, the probability that player i (i = 1, ..., n) chooses strategy j in round t, in any game.

Criteria of a "Good" Model
- Uses (potentially) all the information subjects receive, in a sensible way.
- Satisfies plausible principles of behavior (i.e., conforms with other sciences such as psychology).
- Fits and predicts choice behavior well.
- Is able to generate new insights.
- Is as simple as the data allow.

Models of Learning
- Introspection ($P^j$): requires too much human cognition
  - Nash equilibrium (Nash, 1950)
  - Quantal response equilibrium (Nash-$\lambda$) (McKelvey and Palfrey, 1995)
- Evolution ($P^j(t)$): players are pre-programmed
  - Replicator dynamics (Friedman, 1991)
  - Genetic algorithm (Ho, 1996)
- Learning ($P_i^j(t)$): uses about the right level of cognition
  - Experience-weighted attraction learning (Camerer and Ho, 1999)
  - Reinforcement (Roth and Erev, 1995)
  - Belief-based learning
    - Cournot best-response dynamics (Cournot, 1838)
    - Simple fictitious play (Brown, 1951)
    - Weighted fictitious play (Fudenberg and Levine, 1998)
  - Directional learning (Selten, 1991)

Information Usage in Learning
- Choice reinforcement learning (Thorndike, 1911; Bush and Mosteller, 1955; Herrnstein, 1970; Arthur, 1991; Erev and Roth, 1998): successful strategies are played again.
- Belief-based learning (Cournot, 1838; Brown, 1951; Fudenberg and Levine, 1998): form beliefs from opponents' action history and choose according to expected payoffs.
- So reinforcement learning uses a player's own payoff history, while belief-based models use opponents' action history.
- EWA uses both kinds of information.
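To make the belief-based half of this contrast concrete, here is a minimal sketch of simple fictitious play (Brown, 1951), one of the models listed above: beliefs are the empirical frequencies of opponents' past actions, and the player best responds to expected payoffs. The code is not from the talk; the function name, the deterministic tie-free best response, and the uniform prior before any observations are illustrative assumptions.

```python
import numpy as np

def fictitious_play_choice(payoffs, opp_history):
    """Simple fictitious play: best respond to opponents' empirical frequencies.

    payoffs     : payoffs[j, k] = own payoff from strategy j vs opponent action k
    opp_history : list of opponent action indices observed so far
    """
    n_opp = payoffs.shape[1]
    counts = np.bincount(opp_history, minlength=n_opp).astype(float)
    beliefs = counts / counts.sum() if counts.sum() > 0 else np.full(n_opp, 1.0 / n_opp)
    expected = payoffs @ beliefs          # expected payoff of each own strategy
    return int(np.argmax(expected))       # best response to the beliefs

# Row player's table from the examples later in the deck: T = (8, 8), B = (4, 10).
payoffs = np.array([[8.0, 8.0], [4.0, 10.0]])
fictitious_play_choice(payoffs, opp_history=[0, 0, 1])
# beliefs (2/3, 1/3) over (L, R) -> expected payoffs (8, 6) -> chooses T (index 0)
```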
"Laws" of Effects in Learning

Row player's payoff table:

      L    R
 T    8    8
 M    5    9
 B    4   10

Colin chose B and received 4; Teck chose M and received 5; their opponents chose L.

- Law of actual effect: successes increase the probability of chosen strategies.
  - Teck is more likely than Colin to "stick" with his previous choice (other things being equal).
- Law of simulated effect: strategies with simulated (foregone) successes are chosen more often.
  - Colin is more likely to switch to T than to M.
- Law of diminishing effect: the incremental effect of reinforcement diminishes over time.
  - $1 has more impact in round 2 than in round 7.

Assumptions of Reinforcement and Belief Learning
- Reinforcement learning ignores the simulated effect.
- Belief learning predicts that actual and simulated effects are equally strong.
- EWA learning allows for a positive simulated effect that is smaller than the actual effect.

The EWA Model
- Initial attractions and experience: $A_i^j(0)$ and $N(0)$.
- Updating rules:
$$N(t) = \rho \cdot N(t-1) + 1$$
$$A_i^j(t) = \frac{\phi \cdot N(t-1) \cdot A_i^j(t-1) + [\delta + (1-\delta) \cdot I(s_i^j, s_i(t))] \cdot \pi_i(s_i^j, s_{-i}(t))}{N(t)}$$
  where $I(s_i^j, s_i(t)) = 1$ if $s_i^j$ was actually chosen in round t and 0 otherwise, so a chosen strategy is reinforced by its full payoff while an unchosen strategy is reinforced by $\delta$ times its foregone payoff.
- Choice probabilities:
$$P_i^j(t+1) = \frac{e^{\lambda \cdot A_i^j(t)}}{\sum_{k=1}^{m_i} e^{\lambda \cdot A_i^k(t)}}$$

Camerer and Ho (Econometrica, 1999)

EWA Model and the Laws of Effects
- Law of actual effect: positive incremental reinforcement increases a chosen strategy's attraction and hence its probability.
- Law of simulated effect: strategies with simulated successes are chosen more often whenever $\delta > 0$.
- Law of diminishing effect: the incremental effect of reinforcement diminishes over time because $N(t) \geq N(t-1)$.

The EWA Model: An Example

Row player's payoff table:

      L    R
 T    8    8
 B    4   10

History: period 1 = (B, L).

Period 0: attractions $A^T(0)$, $A^B(0)$ and experience $N(0)$.

Period 1 (T's foregone payoff of 8 is weighted by $\delta$; B's realized payoff of 4 gets full weight):
$$A^T(1) = \frac{\phi \cdot N(0) \cdot A^T(0) + \delta \cdot 8}{\rho \cdot N(0) + 1}, \qquad A^B(1) = \frac{\phi \cdot N(0) \cdot A^B(0) + 4}{\rho \cdot N(0) + 1}$$

Reinforcement Model: An Example

Same payoff table and history. Period 0: reinforcements $R^T(0)$, $R^B(0)$.

Period 1 (only the chosen strategy is reinforced):
$$R^T(1) = \phi \cdot R^T(0), \qquad R^B(1) = \phi \cdot R^B(0) + 4$$

If $\delta = 0$, $N(0) = 1$, and $\rho = 0$, the EWA updates above reduce exactly to these reinforcement (RF) updates: EWA = RF.

Belief-Based (BB) Model: An Example

Same payoff table and history. This is Bayesian learning with Dirichlet priors: beliefs are counts over opponents' actions.

Period 0: prior counts $N^L(0)$ and $N^R(0)$, with $N(0) = N^L(0) + N^R(0)$ and beliefs
$$B^L(0) = \frac{N^L(0)}{N(0)}, \qquad B^R(0) = \frac{N^R(0)}{N(0)}$$
Expected payoffs:
$$E^T(0) = B^L(0) \cdot 8 + B^R(0) \cdot 8 = 8, \qquad E^B(0) = B^L(0) \cdot 4 + B^R(0) \cdot 10$$

Period 1 (the opponent chose L):
$$B^L(1) = \frac{N^L(0) + 1}{N(0) + 1}, \qquad B^R(1) = \frac{N^R(0)}{N(0) + 1}$$
$$E^T(1) = B^L(1) \cdot 8 + B^R(1) \cdot 8, \qquad E^B(1) = B^L(1) \cdot 4 + B^R(1) \cdot 10$$

Relationship Between the Belief-Based (BB) and EWA Learning Models

The BB expected-payoff updates above can be rewritten as
$$E^T(1) = \frac{N(0) \cdot E^T(0) + 8}{N(0) + 1}, \qquad E^B(1) = \frac{N(0) \cdot E^B(0) + 4}{N(0) + 1}$$
Comparing these with the EWA updates
$$A^T(1) = \frac{\phi \cdot N(0) \cdot A^T(0) + \delta \cdot 8}{\rho \cdot N(0) + 1}, \qquad A^B(1) = \frac{\phi \cdot N(0) \cdot A^B(0) + 4}{\rho \cdot N(0) + 1}$$
shows that if $\delta = 1$ and $\phi = \rho$, EWA = BB: attractions are exactly expected payoffs given the updated beliefs (the Dirichlet example above is the case $\phi = \rho = 1$).
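The following is a minimal sketch of the EWA 1.0 update and logit choice rule applied to the 2x2 example above; it is not the authors' estimation code, and the function names and illustrative initial attractions are my assumptions. Running it checks both reductions numerically: $\delta = 0$, $N(0) = 1$, $\rho = 0$ gives reinforcement learning, and $\delta = 1$, $\rho = \phi$ gives weighted fictitious play.

```python
import numpy as np

def ewa_update(A, N, chosen, opp_action, payoffs, phi, delta, rho):
    """One round of EWA 1.0 updating (Camerer and Ho, 1999).

    A       : attractions A^j(t-1), one per own strategy j
    N       : experience weight N(t-1)
    chosen  : index of the strategy actually played, s_i(t)
    payoffs : payoffs[j, k] = pi_i(s^j, s_-i = k)
    """
    N_new = rho * N + 1.0
    # Actual effect gets full weight 1; simulated (foregone) effect gets delta.
    weights = np.full(len(A), delta)
    weights[chosen] = 1.0
    A_new = (phi * N * A + weights * payoffs[:, opp_action]) / N_new
    return A_new, N_new

def choice_probs(A, lam):
    """Logit choice: P^j(t+1) proportional to exp(lambda * A^j(t))."""
    z = lam * A - np.max(lam * A)          # shift for numerical stability
    return np.exp(z) / np.exp(z).sum()

payoffs = np.array([[8.0, 8.0],            # row T against opponent L, R
                    [4.0, 10.0]])          # row B
A0, N0 = np.array([6.0, 7.0]), 1.0         # illustrative priors, not from the slides

# delta = 0, N(0) = 1, rho = 0: cumulative reinforcement.
A1, _ = ewa_update(A0, N0, chosen=1, opp_action=0, payoffs=payoffs,
                   phi=0.9, delta=0.0, rho=0.0)
assert np.allclose(A1, [0.9 * A0[0], 0.9 * A0[1] + 4.0])

# delta = 1, rho = phi: weighted fictitious play (expected-payoff updating).
A1, _ = ewa_update(A0, N0, chosen=1, opp_action=0, payoffs=payoffs,
                   phi=0.9, delta=1.0, rho=0.9)
assert np.allclose(A1, (0.9 * N0 * A0 + payoffs[:, 0]) / (0.9 * N0 + 1.0))

print(choice_probs(A1, lam=1.0))           # implied next-round choice probabilities
```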
Model Interpretation
- Simulation or attention parameter ($\delta$): measures the degree of sensitivity to foregone payoffs.
- Exploitation parameter ($\kappa$): measures how rapidly players lock in to a strategy (average versus cumulative attractions); in the updating rules above, $\rho = \phi \cdot (1 - \kappa)$, so $\kappa = 0$ averages attractions and $\kappa = 1$ cumulates them.
- Stationarity or motion parameter ($\phi$): measures players' perception of the degree of stationarity of the environment.

[Figure: the EWA parameter cube, whose corners include weighted fictitious play, Cournot best-response dynamics, fictitious play, cumulative reinforcement, and average reinforcement]

New Insight
- Reinforcement and belief learning were thought to be fundamentally different for 50 years. For instance: "... in rote [reinforcement] learning success and failure directly influence the choice probabilities. ... Belief learning is very different. Here experiences strengthen or weaken beliefs. Belief learning has only an indirect influence on behavior." (Selten, 1991)
- EWA shows that belief and reinforcement learning are related: both are special kinds of EWA learning.

Actual versus Belief-Based Model Frequencies
[Figure 1a: actual choice frequencies; Figure 1b: belief-based model frequencies, by strategy (1-7) and period (1-10)]

Actual versus Reinforcement Model Frequencies
[Figure 1a: actual choice frequencies; Figure 1c: choice reinforcement model frequencies]

Actual versus EWA Model Frequencies
[Figure 1a: actual choice frequencies; Figure 1d: EWA model frequencies]

Estimation and Results

Median-action game (M = 378). AIC and BIC are in log-likelihood units; * marks the best value in each column. (A sketch of how the calibration columns are computed follows this table.)

Model (no. of parameters)        Cal. LL   AIC       BIC       rho^2    Val. LL   MSD
1-segment:
  Random choice (0)              -677.29   -677.29   -677.29   0.0000   -315.24   0.1217
  Choice reinforcement (8)       -341.70   -349.70   -365.44   0.4837    -80.27   0.0301
  Belief-based (9)               -438.74   -447.74   -465.45   0.3389   -113.90   0.0519
  EWA (11)                       -309.30   -320.30   -341.94*  0.5271    -41.05   0.0185
2-segment:
  Random choice (0)              -677.29   -677.29   -677.29   0.0000   -315.24   0.1217
  Choice reinforcement (17)      -331.25   -348.25   -381.70   0.4858    -66.32   0.0245
  Belief-based (19)              -379.24   -398.24   -435.62   0.4120    -70.31   0.0250
  EWA (23)                       -290.25   -313.25*  -358.51   0.5374*   -34.79*  0.0139*
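Here is a small sketch of how the calibration columns above fit together. The formulas (AIC = LL - k and BIC = LL - (k/2) ln M in log-likelihood units, with the pseudo-rho^2 measured against random choice) are inferred from the reported numbers rather than stated on the slide, and the function name is mine.

```python
import math

def fit_measures(ll, k, ll_random, m):
    """Penalized fit measures in log-likelihood units.

    ll        : calibration log-likelihood of the model
    k         : number of estimated parameters
    ll_random : log-likelihood of random choice
    m         : number of observations (M = 378 for the median-action game)
    """
    aic = ll - k                         # AIC column
    bic = ll - (k / 2.0) * math.log(m)   # BIC column
    rho2 = 1.0 - aic / ll_random         # pseudo-rho^2 column
    return aic, bic, rho2

# 1-segment EWA row of the table: 11 parameters, LL = -309.30.
print(fit_measures(-309.30, 11, -677.29, 378))
# -> (-320.30, -341.94, 0.5271), matching the reported row after rounding
```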
Extensions
- Heterogeneity (JMP, Camerer and Ho, 1999)
- Payoff learning (EJ, Ho, Wang, and Camerer, 2006)
- Sophistication and strategic teaching:
  - Sophisticated learning (JET, Camerer, Ho, and Chong, 2002)
  - Reputation building (GEB, Chong, Camerer, and Ho, 2006)
- EWA Lite (self-tuning EWA learning) (JET, Ho, Camerer, and Chong, 2007)
- Applications: product choice at supermarkets (JMR, Ho and Chong, 2004)

Homework
- Provide a general proof that Bayesian learning (i.e., weighted fictitious play) is a special case of EWA learning.
- If players face a stationary environment (i.e., decision problems), will EWA learning lead to expected-utility maximization in the long run?

Outline
- Research Question
- Criteria of a Good Model
- Experience-Weighted Attraction (EWA 1.0) Learning Model
- Sophisticated EWA Learning (EWA 2.0)

Three User Complaints of EWA 1.0
- Experience matters: players with different levels of expertise should behave differently.
- EWA 1.0's predictions are not sensitive to the structure of the learning setting (e.g., the matching protocol).
- EWA 1.0 does not use opponents' payoff matrix to predict behavior.

Example 3: p-Beauty Contest
- n players each simultaneously choose a number from 0 to 100.
- Compute the group average.
- The Target Number is 0.7 times the group average.
- The winner is the player whose number is closest to the Target Number.
- The prize to the winner is US$20.

Ho, Camerer, and Weigelt (AER, 1998)
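A minimal sketch of one round of these rules follows; the function name and the tie-splitting rule are my assumptions (the slide does not say how ties are resolved).

```python
def pbc_round(numbers, p=0.7, prize=20.0):
    """One round of the p-beauty contest: the player closest to p times the
    group average wins the prize (ties split it, an assumed convention)."""
    target = p * sum(numbers) / len(numbers)
    best = min(abs(x - target) for x in numbers)
    winners = [i for i, x in enumerate(numbers) if abs(x - target) == best]
    payoffs = [prize / len(winners) if i in winners else 0.0
               for i in range(len(numbers))]
    return target, payoffs

target, payoffs = pbc_round([50, 35, 24, 10])
# group average = 29.75, target = 20.825 -> the player who chose 24 wins $20
```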
Actual versus Belief-Based Model Frequencies: pBC (Inexperienced Subjects)
[Figure 2a: actual choice frequencies; Figure 3d: belief learning model frequencies, by choice bin (0, 1-10, ..., 91-100) and round (1-10)]

Actual versus Reinforcement Model Frequencies: pBC (Inexperienced Subjects)
[Figure 2a: actual choice frequencies; Figure 3e: reinforcement model frequencies]

Actual versus EWA Model Frequencies: pBC (Inexperienced Subjects)
[Figure 2a: actual choice frequencies; Figure 2b: adaptive EWA model frequencies]

Actual versus Belief-Based Model Frequencies: pBC (Experienced Subjects)
[Figure 3a: actual choice frequencies; Figure 4d: belief learning model frequencies]

Actual versus Reinforcement Model Frequencies: pBC (Experienced Subjects)
[Figure 3a: actual choice frequencies; Figure 4e: reinforcement model frequencies]

Actual versus EWA Model Frequencies: pBC (Experienced Subjects)
[Figure 3a: actual choice frequencies; Figure 3b: adaptive EWA model frequencies]

Sophisticated EWA Learning (EWA 2.0)
- The population consists of both adaptive and sophisticated players.
- The proportion of sophisticated players is $\alpha$.
- Each sophisticated player, however, believes the proportion of sophisticated players to be $\alpha'$.
- Latent-class methods are used to estimate the parameters.

The EWA 2.0 Model: Adaptive Players
The population is adaptive with proportion $(1 - \alpha)$ and sophisticated with proportion $\alpha$. Adaptive players follow the EWA 1.0 rules:
$$A_i^j(a, t) = \frac{\phi \cdot N(t-1) \cdot A_i^j(a, t-1) + [\delta + (1-\delta) \cdot I(s_i^j, s_i(t))] \cdot \pi_i(s_i^j, s_{-i}(t))}{N(t)}, \qquad N(t) = \rho \cdot N(t-1) + 1$$
$$P_i^j(a, t+1) = \frac{e^{\lambda \cdot A_i^j(a, t)}}{\sum_{k=1}^{m_i} e^{\lambda \cdot A_i^k(a, t)}}$$

The EWA 2.0 Model: Sophisticated Players
Sophisticated players believe a proportion $(1 - \alpha')$ of the other players are adaptive and best respond to that belief (a numerical sketch follows below):
$$A_i^j(s, t) = \sum_{k=1}^{m_{-i}} \left[ (1 - \alpha') \cdot P_{-i}^k(a, t+1) + \alpha' \cdot P_{-i}^k(s, t+1) \right] \cdot \pi_i(s_i^j, s_{-i}^k)$$
$$P_i^j(s, t+1) = \frac{e^{\lambda \cdot A_i^j(s, t)}}{\sum_{k=1}^{m_i} e^{\lambda \cdot A_i^k(s, t)}}$$
Better-than-others: $\alpha' < \alpha$; false consensus: $\alpha' > \alpha$.
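The sketch below computes the sophisticated player's attractions and logit choice from the definition above, assuming the opponents' predicted choice probabilities are already in hand; the function names and the example numbers are illustrative, not from the talk.

```python
import numpy as np

def sophisticated_attractions(payoffs, p_opp_adaptive, p_opp_soph, alpha_prime):
    """A_i^j(s,t): expected payoff of each own strategy against a mixture of
    adaptive opponents (weight 1 - alpha') and sophisticated opponents (alpha').

    payoffs        : payoffs[j, k] = pi_i(s_i^j, s_-i^k)
    p_opp_adaptive : P_-i^k(a, t+1), adaptive opponents' predicted probabilities
    p_opp_soph     : P_-i^k(s, t+1), sophisticated opponents' probabilities
    """
    mix = (1.0 - alpha_prime) * p_opp_adaptive + alpha_prime * p_opp_soph
    return payoffs @ mix

def logit_choice(A, lam):
    """P_i^j(s, t+1): logit response to the sophisticated attractions."""
    z = lam * A - np.max(lam * A)
    return np.exp(z) / np.exp(z).sum()

# Row player's 2x2 table from the earlier example, with made-up opponent forecasts:
payoffs = np.array([[8.0, 8.0], [4.0, 10.0]])
A_s = sophisticated_attractions(payoffs,
                                p_opp_adaptive=np.array([0.7, 0.3]),
                                p_opp_soph=np.array([0.2, 0.8]),
                                alpha_prime=0.4)
print(logit_choice(A_s, lam=1.0))
# Setting alpha = alpha' = 1 nests QRE; adding lam -> infinity nests Nash.
```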
Well-Known Special Cases
- Nash equilibrium: $\alpha = \alpha' = 1$ and $\lambda = \infty$
- Quantal response equilibrium: $\alpha = \alpha' = 1$
- Rational expectations model: $\alpha = \alpha'$
- Better-than-others model: $\alpha' < \alpha$

Results
[Figure 3a: actual choice frequencies for experienced subjects; Figure 3c: sophisticated EWA model frequencies for experienced subjects]

MLE Estimates
[Table of maximum-likelihood parameter estimates]

Summary
- The EWA cube provides a simple but useful framework for studying learning in games.
- EWA 1.0 fits and predicts better than reinforcement and belief learning in many classes of games because it allows for a positive simulated effect that is smaller than the actual effect.
- EWA 2.0 allows us to study equilibrium and adaptation simultaneously (and nests most familiar cases, including QRE).

Key References for This Talk
- Refer to Ho, Lim, and Camerer (JMR, 2006) for important references.
- CH model: Camerer, Ho, and Chong (QJE, 2004)
- QRE model: McKelvey and Palfrey (GEB, 1995); Baye and Morgan (RAND, 2004)
- EWA learning: Camerer and Ho (Econometrica, 1999); Camerer, Ho, and Chong (JET, 2002)