
Game Theory-Based Opponent Modeling
in Large Imperfect-Information Games
Tuomas Sandholm
Carnegie Mellon University
Computer Science Department
Joint work with Sam Ganzfried
Traditionally, two approaches
• Game theory approach (abstraction+equilibrium finding)
– Safe in 2-person 0-sum games
– Doesn’t maximally exploit weaknesses in opponent(s)
• Opponent modeling
– Get-taught-and-exploited problem [Sandholm AIJ-07]
– Needs prohibitively many repetitions to learn in large games
(loses too much during learning)
• Opponent modeling has been crushed by the game theory approach in
Texas Hold’em, even with just 2 players and limit betting
• The same tends to be true of no-regret learning algorithms
Let’s hybridize the two approaches
• Start playing based on game theory approach
• As we learn how the opponent(s) deviate from equilibrium,
start adjusting our strategy to exploit those weaknesses (see the sketch below)
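A minimal sketch of this hybrid on a normal-form (matrix) game; the function names, the blending rule, and the switch point are our illustration, not the actual implementation:

import numpy as np

def best_response(payoff, opp_strategy):
    # Pure best response of the row player to a modeled column strategy.
    expected = payoff @ opp_strategy          # expected payoff of each row action
    br = np.zeros(len(expected))
    br[np.argmax(expected)] = 1.0
    return br

def hybrid_strategy(payoff, equilibrium, opp_model, n_obs, n_switch=1000):
    # Play equilibrium early; as observations accumulate, shift weight
    # toward the best response to the opponent model.
    w = min(1.0, n_obs / n_switch)            # exploitation weight in [0, 1]
    return (1 - w) * equilibrium + w * best_response(payoff, opp_model)

# Example: rock-paper-scissors against an opponent modeled to overplay rock.
payoff = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
eq = np.array([1/3, 1/3, 1/3])
opp = np.array([0.6, 0.2, 0.2])
print(hybrid_strategy(payoff, eq, opp, n_obs=500))  # mixes toward paper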
The dream of safe exploitation
• Wish: Let’s avoid the get-taught-and-exploited problem by
exploiting only to an extent that risks no more than what we have won so far
• Proposition. It is impossible to exploit to any extent (beyond
what the best equilibrium strategy would exploit) while
preserving the safety guarantee of equilibrium play (one formalization is sketched below)
• So we give up some worst-case safety …
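One way to formalize the proposition in a two-player zero-sum game with value v* for player 1 (the notation is ours, not from the slides): a strategy is safe iff it guarantees v*, the safe strategies are exactly the maximin (equilibrium) strategies, and hence no safe strategy can exploit a given opponent model more than the best equilibrium strategy does:

\[
\Sigma_1^{\text{safe}} \;=\; \bigl\{\sigma_1 : \min_{\sigma_2} u_1(\sigma_1,\sigma_2) \ge v^*\bigr\} \;=\; \Sigma_1^{\text{maximin}}
\]
\[
\Rightarrow\quad \max_{\sigma_1\in\Sigma_1^{\text{safe}}} u_1(\sigma_1,\hat\sigma_2) \;=\; \max_{\sigma_1^*\in\Sigma_1^{\text{maximin}}} u_1(\sigma_1^*,\hat\sigma_2) \quad\text{for every opponent model } \hat\sigma_2.
\]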
Deviation-Based Best Response (DBBR) algorithm
(can be generalized to multi-player non-zero-sum)
[Algorithm diagram: observed opponent action frequencies are combined with a Dirichlet prior into posterior action probabilities]
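A minimal sketch of the Dirichlet smoothing, assuming the prior is centered on the equilibrium strategy; the parameter names and prior strength are illustrative:

import numpy as np

def posterior_action_probs(counts, eq_probs, prior_strength=10.0):
    # Posterior mean of the opponent's action distribution at one public
    # history: a Dirichlet prior (mean = equilibrium) updated with counts.
    alpha = prior_strength * np.asarray(eq_probs, dtype=float)
    counts = np.asarray(counts, dtype=float)
    return (alpha + counts) / (alpha.sum() + counts.sum())

# Few observations keep the posterior near equilibrium; many observations
# pull it toward the observed frequencies.
print(posterior_action_probs([8, 1, 1], [1/3, 1/3, 1/3]))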
• Many ways to determine the opponent’s “best” strategy
that is consistent with the observations (a simplified sketch follows this list)
– L1 or L2 distance to equilibrium strategy
– Custom weight-shifting algorithm
– ...
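A simplified sketch of such a weight shift at a single information set: pin one action to its posterior probability and move the remaining mass as little as possible in L2 distance. The scheme below is our simplification, not the paper’s algorithm:

import numpy as np

def shift_weight(eq_probs, action, target_prob):
    p = np.asarray(eq_probs, dtype=float).copy()
    others = [i for i in range(len(p)) if i != action]
    p[action] = target_prob
    # Spreading the freed-up (or missing) mass evenly over the other actions
    # minimizes L2 distance to equilibrium when no probability goes negative.
    p[others] += (1.0 - target_prob - p[others].sum()) / len(others)
    p = np.clip(p, 0.0, None)                 # crude fix-up for negatives;
    p[others] *= (1.0 - target_prob) / p[others].sum()  # assumes some mass remains
    return p

# Equilibrium: fold 50%, call 30%, raise 20%; the posterior says this
# opponent raises 40% here.
print(shift_weight([0.5, 0.3, 0.2], action=2, target_prob=0.4))  # [0.4, 0.2, 0.4]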
Experiments
• Performs significantly better than the game-theory-based base
strategy in 2-player Limit Texas Hold’em, both against trivial
opponents and against weak opponents from the AAAI computer
poker competitions
• Can be turned on only against weak opponents
• Examples of win-rate evolution: [plots omitted]