Game Theory-Based Opponent Modeling in Large Imperfect-Information Games
Tuomas Sandholm
Carnegie Mellon University, Computer Science Department
Joint work with Sam Ganzfried

Traditionally two approaches
• Game theory approach (abstraction + equilibrium finding)
  – Safe in 2-person 0-sum games
  – Doesn't maximally exploit weaknesses in opponent(s)
• Opponent modeling
  – Get-taught-and-exploited problem [Sandholm AIJ-07]
  – Needs prohibitively many repetitions to learn in large games (loses too much during learning)
    • Crushed by the game theory approach in Texas Hold'em… even with just 2 players and limit betting
    • Same tends to be true of no-regret learning algorithms

Let's hybridize the two approaches
• Start playing based on the game theory approach
• As we learn that the opponent(s) deviate from equilibrium, start adjusting our strategy to exploit their weaknesses (the safety/exploitation tradeoff is illustrated in a toy sketch after the slides)

The dream of safe exploitation
• Wish: avoid the get-taught-and-exploited problem by exploiting only to an extent that risks what we have won so far
• Proposition. It is impossible to exploit to any extent (beyond what the best equilibrium strategy would exploit) while preserving the safety guarantee of equilibrium play
• So we give up some worst-case safety…

Deviation-Based Best Response (DBBR) algorithm
(can be generalized to multi-player non-zero-sum)
• Dirichlet prior (a minimal modeling sketch appears after the slides)
• Many ways to determine the opponent's "best" strategy that is consistent with observations
  – L1 or L2 distance to the equilibrium strategy
  – Custom weight-shifting algorithm
  – …

Experiments
• Performs significantly better in 2-player Limit Texas Hold'em, against both trivial opponents and weak opponents from the AAAI computer poker competitions, than the game-theory-based base strategy
• Can be turned on only against weak opponents
• Examples of winrate evolution: [plots not reproduced in this text version]
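A toy illustration of the tradeoff behind the hybrid approach and the impossibility proposition above. This is a minimal sketch in rock-paper-scissors rather than poker; the game, the mixing parameter eps, and the modeled opponent are assumptions for illustration, not from the talk. Playing equilibrium guarantees value 0, best-responding to an opponent model earns more if the model is right, and any weight shifted toward the best response gives up some of the worst-case guarantee.

```python
import numpy as np

# Zero-sum payoff matrix for the row player in rock-paper-scissors.
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

eq = np.ones(3) / 3.0                 # equilibrium strategy; guarantees value 0

def best_response(opp_model):
    """Pure best response against a modeled opponent mixed strategy."""
    br = np.zeros(3)
    br[np.argmax(A @ opp_model)] = 1.0
    return br

def hybrid(opp_model, eps):
    """Mix equilibrium with the best response to the model.

    eps = 0 keeps the equilibrium safety guarantee; eps = 1 maximally
    exploits the model but gives up worst-case safety entirely.
    """
    return (1.0 - eps) * eq + eps * best_response(opp_model)

opp_model = np.array([0.6, 0.2, 0.2])  # opponent modeled as over-playing rock
for eps in (0.0, 0.5, 1.0):
    s = hybrid(opp_model, eps)
    vs_model = s @ A @ opp_model       # payoff if the model is accurate
    worst = (s @ A).min()              # payoff against a worst-case opponent
    print(f"eps={eps:.1f}  vs_model={vs_model:+.2f}  worst_case={worst:+.2f}")
```

Running it shows eps = 0 yields (0.00 vs. model, 0.00 worst case) while eps = 1 yields (+0.40 vs. model, −1.00 worst case): any exploitation beyond equilibrium costs some worst-case safety, matching the proposition on the slides.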
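The DBBR slide names a Dirichlet prior for the opponent model. Below is a minimal sketch of that modeling step at a single information set, assuming the equilibrium action probabilities are precomputed; the function name, array layout, and prior_strength value are illustrative assumptions, and DBBR's full procedure (turning such posteriors into a complete opponent strategy via distance minimization or weight shifting, as on the slide, then best-responding) is not reproduced here.

```python
import numpy as np

def posterior_opponent_strategy(eq_probs, obs_counts, prior_strength=5.0):
    """Posterior-mean model of the opponent's action distribution at one
    information set, with a Dirichlet prior centered on equilibrium.

    eq_probs       : (A,) equilibrium action probabilities
    obs_counts     : (A,) observed counts of the opponent's actions
    prior_strength : total pseudo-count on the prior (illustrative value)
    """
    alpha = prior_strength * np.asarray(eq_probs)   # Dirichlet parameters
    posterior = alpha + np.asarray(obs_counts)      # conjugate update
    return posterior / posterior.sum()              # posterior mean

if __name__ == "__main__":
    eq = np.array([0.5, 0.3, 0.2])   # hypothetical fold/call/raise equilibrium
    counts = np.array([0, 2, 18])    # opponent raised 18 of 20 observed times
    print(posterior_opponent_strategy(eq, counts))
    # -> [0.10, 0.14, 0.76]: the model has shifted toward raising
```

Because the prior is centered on equilibrium, the model (and hence a best response to it) starts near safe equilibrium play and drifts toward exploitation only as evidence of the opponent's deviations accumulates, which is the hybrid behavior the slides describe.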