Poker - CSE, IIT Bombay

Poker and AI
How the most “stable”
creature on earth got used to
that good old game from the
west!
A game of (p)luck!
• Cards:
– 2 Blinds
– Flop : 3 community cards
– Turn : 1 more community card
– River : 1 last community card
• Betting rounds after every card deal/flip
• Fold OR Call (Check) OR Raise (Bet)
• Showdown, if you get there
Poker as a non trivial act of
intelligence
Phil Hellmuth
Mike Matusow
Phil used my knowledge of Phil against me
Ain’t this an AI seminar?
• Games have always been an allure to AI
theoreticians.
• Game of incomplete information
• Several successful implementations:
BluffBot (Teppo Salonen), Polaris (Univ of
Alberta), Poki, Casper… we will see some.
• AAAI Annual Poker Competition :
http://www.cs.ualberta.ca/~pokert/
The essence of Poker
• Hand Strength & Hand Potential : Assess the
strength of the current hand.
– Cards in game
– Number of players in the game
– Position of the player
– History
– Draws
– Risks
The essence of Poker
• Pot Odds
– Pot odds are the ratio of the bet to be called to the
total pot, compared with the odds of winning
– Example: If the cards in hand are A(H)-A(D) and
the cards on board are A(C)-2-3-7-?, then 10 of the
46 unseen cards (the last ace, or any 2, 3 or 7) make a
full house or better, so the odds of getting a very
strong hand after the river are 10:36, i.e. 5:18.
– The pot odds for a $10 bet on a $40 pot are 1:4,
while on a $10 pot they are 1:1.
– The first is favorable, the second is not.
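The pot odds comparison can be sketched in a few lines. This is only an illustration: the function names are made up here, and counting 10 outs among 46 unseen cards is this sketch's own reading of "a very strong hand" (a full house or better).

```python
from fractions import Fraction

def break_even_probability(bet, pot):
    """Minimum win probability at which calling `bet` into `pot` breaks even."""
    return Fraction(bet, bet + pot)

def call_is_favorable(bet, pot, p_win):
    """True when the chance of winning beats the price offered by the pot."""
    return p_win > break_even_probability(bet, pot)

# Assumed outs for the A-A example: the last ace plus three each of 2, 3 and 7.
outs, unseen = 10, 46
p_improve = Fraction(outs, unseen)

print(call_is_favorable(10, 40, p_improve))  # $10 to call into a $40 pot
print(call_is_favorable(10, 10, p_improve))  # $10 to call into a $10 pot
```

Pot odds of 1:4 need a win probability above 1/5, while 1:1 needs more than 1/2.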
The essence of Poker
• Bluffing & Unpredictability
– Different strategies in similar situations
– Element of non determinacy
• Opponent Modeling
– Used to guess the opponents’ cards based on
history
LOKI & POKI
A look at how
The Experts do it!
Encoding the Problem
• Probability triples – simplicity itself
Pr := ( f , c , r )
“Marvin thinks for an eon and comes up with the
three magic numbers to make tea!”
The output of all analysis at any game point
is the probability with which poki folds or
calls or raises. The final decision is non
deterministic adding natural noise.
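A minimal sketch of how a probability triple could be turned into a move. The function name and the validation are assumptions of this example, not Poki's actual code.

```python
import random

ACTIONS = ("fold", "call", "raise")

def choose_action(triple, rng=random):
    """Sample one action according to the (f, c, r) probability triple."""
    f, c, r = triple
    if min(triple) < 0 or abs(f + c + r - 1.0) > 1e-9:
        raise ValueError("triple must be non-negative and sum to 1")
    # random.choices draws with the given weights, so the same situation
    # does not always lead to the same move.
    return rng.choices(ACTIONS, weights=triple, k=1)[0]

print(choose_action((0.1, 0.6, 0.3), random.Random(42)))
```

Passing a seeded `random.Random` makes a run repeatable. The default module-level generator gives the unpredictability the slide describes.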
Building the system
• Pre-flop strategies : Almost zero information
guess!
• How do humans start: Sklansky’s rankings
– Collected into groups of similar cards (as far as
poker is concerned) and categorized into 8
groups, of decreasing strength
– Tuned for 10 player games, not considering
opponent characteristics
• A Rule based system on this information
Man as a hand-wavy standard
• Moving away from External information:
– Eliminate the use of human knowledge
whenever possible
– Calculated information can be quantitative
rather than qualitative
– The algorithmic approach can be applied to many
different specific situations (such as having
exactly six players in the game)
Rebuilding the system
• Roll Out simulations
– Pre-flop, the blinds are called by all players, who then
check until the showdown. The probability of
winning with a given pair of cards gives the Income
rate
– Coarse
• Iterated Roll Out simulations
– Income rates from the previous simulation decide
whether a player calls or folds pre-flop.
– The values stabilize over repeated iterations
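A rough sketch of a plain (non-iterated) roll-out: every player stays in to the showdown, and we record how often a starting hand wins. The hand evaluator, the function names and the use of win share in place of a money-based income rate are all simplifications assumed here.

```python
import random
from collections import Counter
from itertools import combinations

DECK = [(rank, suit) for rank in range(2, 15) for suit in "cdhs"]

def rank5(cards):
    """Comparable strength of five (rank, suit) cards; a higher tuple wins."""
    counts = Counter(rank for rank, _ in cards)
    # Ranks sorted by multiplicity, then by rank (pairs before kickers).
    ordered = sorted(counts, key=lambda r: (counts[r], r), reverse=True)
    shape = sorted(counts.values(), reverse=True)
    flush = len({suit for _, suit in cards}) == 1
    straight = len(counts) == 5 and max(counts) - min(counts) == 4
    if sorted(counts) == [2, 3, 4, 5, 14]:  # A-2-3-4-5 plays as five-high
        straight, ordered = True, [5, 4, 3, 2, 1]
    if straight and flush:
        category = 8
    elif shape == [4, 1]:
        category = 7
    elif shape == [3, 2]:
        category = 6
    elif flush:
        category = 5
    elif straight:
        category = 4
    elif shape == [3, 1, 1]:
        category = 3
    elif shape == [2, 2, 1]:
        category = 2
    elif shape == [2, 1, 1, 1]:
        category = 1
    else:
        category = 0
    return (category, ordered)

def best7(cards):
    """Best five-card hand that can be made from seven cards."""
    return max(rank5(five) for five in combinations(cards, 5))

def rollout_win_rate(hole, n_opponents=1, trials=300, seed=0):
    """Share of showdowns won (ties split) when nobody ever folds."""
    rng = random.Random(seed)
    rest = [card for card in DECK if card not in hole]
    share = 0.0
    for _ in range(trials):
        drawn = rng.sample(rest, 5 + 2 * n_opponents)
        board, opp = drawn[:5], drawn[5:]
        ours = best7(list(hole) + board)
        theirs = [best7(opp[2 * i:2 * i + 2] + board) for i in range(n_opponents)]
        top = max(theirs)
        if ours > top:
            share += 1
        elif ours == top:
            share += 1 / (1 + theirs.count(top))
    return share / trials

aces = [(14, "s"), (14, "h")]
seven_deuce = [(7, "c"), (2, "d")]
print(rollout_win_rate(aces), rollout_win_rate(seven_deuce))
```

An iterated version would rerun this with players folding the hands whose rate came out negative in the previous pass.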
Think!
(diagram: Hand Strength and Hand Potential -> Effective Hand Strength -> Probability Triple -> Random Number Generator)
Hand Strength
• Hand Strength is the probability that a given
hand is better than that of an active opponent
– How? Calculate all possible hands that can be
made with the current hand, and also those that
are better / equal / worse than ours
• Extrapolate to n-opponents by raising the
found probability to n
HS_n = (HS_1)^n
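A small sketch of the calculation, starting from counts of opponent holdings that are worse, equal or better than ours. Counting a tie as half a win is an assumption of this example, and the counts are illustrative (they sum to C(47,2) = 1081, the number of possible opponent pairs on the flop).

```python
def hand_strength(ahead, tied, behind):
    """Probability of currently holding the best hand against one opponent."""
    total = ahead + tied + behind
    return (ahead + tied / 2) / total

def hand_strength_n(hs1, n_opponents):
    """Extrapolate to n opponents by raising the one-opponent value to n."""
    return hs1 ** n_opponents

hs1 = hand_strength(ahead=628, tied=9, behind=444)
print(round(hs1, 3), round(hand_strength_n(hs1, 5), 3))
```

The power rule treats opponents as independent, which is why a decent hand against one player looks weak against five.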
Hand Potential
• Positive Potential: Of all possible games with
the current hand, we calculate all scenarios
where Poki is behind but ends up winning.
• Negative Potential: Of all possible games with
the current hand, we calculate all scenarios
where Poki is ahead but ends up losing.
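One way to turn those scenario counts into numbers. The 3x3 table layout, the half-weight given to ties and the sample counts are assumptions of this sketch.

```python
AHEAD, TIED, BEHIND = 0, 1, 2

def potentials(hp):
    """hp[now][later] counts scenarios by status now and status at the showdown.

    Returns (PPot, NPot): the chance of improving when not ahead, and the
    chance of being overtaken when not behind.
    """
    total = [sum(row) for row in hp]
    ppot_den = total[BEHIND] + total[TIED] / 2
    npot_den = total[AHEAD] + total[TIED] / 2
    ppot_num = hp[BEHIND][AHEAD] + hp[BEHIND][TIED] / 2 + hp[TIED][AHEAD] / 2
    npot_num = hp[AHEAD][BEHIND] + hp[AHEAD][TIED] / 2 + hp[TIED][BEHIND] / 2
    ppot = ppot_num / ppot_den if ppot_den else 0.0
    npot = npot_num / npot_den if npot_den else 0.0
    return ppot, npot

# Hypothetical counts: rows are status now, columns are status at the end.
counts = [
    [80, 5, 15],   # ahead now
    [2, 6, 2],     # tied now
    [30, 10, 60],  # behind now
]
ppot, npot = potentials(counts)
print(round(ppot, 3), round(npot, 3))
```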
Effective Hand Strength
Pr(win)
= Pr(ahead)×Pr(opponent does not improve) +
Pr(behind)×Pr(we improve)
= HS×(1 − NPot) + (1 − HS)×PPot
≈ HS + (1 − HS)×PPot (taking NPot = 0)
EHS_n = HS_n + (1 − HS_n)×PPot (multiple opponents)
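The formulas above fit in one small function. The name and the default arguments are choices made for this sketch; leaving `npot` at 0 gives the simplified form.

```python
def effective_hand_strength(hs, ppot, npot=0.0, n_opponents=1):
    """EHS = HS_n * (1 - NPot) + (1 - HS_n) * PPot, with HS_n = HS ** n."""
    hs_n = hs ** n_opponents
    return hs_n * (1 - npot) + (1 - hs_n) * ppot

print(effective_hand_strength(0.6, 0.2))                 # simplified form
print(effective_hand_strength(0.6, 0.2, npot=0.1))       # with negative potential
print(effective_hand_strength(0.6, 0.2, n_opponents=2))  # two opponents
```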
Adding Sophistication
• Not all card pairs are equally likely at a given
point in time
• Maintain a weighting table that stores, for each
opponent, the probability of each card pair the
opponent may be holding at the given point in the
game, depending on history.
• Re-weighting: update this table on every
move.
EHS_i = HS_i + (1 − HS_i)×PPot_i
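A toy sketch of the weighting idea. Holdings are reduced to made-up strength ranks, and the likelihood of the observed action per holding is a hypothetical table. A real system would derive both from the game state.

```python
def reweight(weights, action_likelihood):
    """Scale each holding's weight by how likely it makes the observed action."""
    return {hand: w * action_likelihood[hand] for hand, w in weights.items()}

def weighted_hand_strength(our_rank, opp_ranks, weights):
    """Hand strength where each opponent holding counts by its weight."""
    ahead = tied = total = 0.0
    for hand, w in weights.items():
        total += w
        if our_rank > opp_ranks[hand]:
            ahead += w
        elif our_rank == opp_ranks[hand]:
            tied += w
    return (ahead + tied / 2) / total

# Hypothetical strength ranks and a uniform starting table.
opp_ranks = {"AA": 3, "KQ": 2, "72": 1}
weights = {"AA": 1.0, "KQ": 1.0, "72": 1.0}
before = weighted_hand_strength(2, opp_ranks, weights)

# The opponent raises: strong holdings become more likely (assumed likelihoods).
weights = reweight(weights, {"AA": 0.9, "KQ": 0.5, "72": 0.1})
after = weighted_hand_strength(2, opp_ranks, weights)
print(round(before, 3), round(after, 3))
```

After the raise, our middling hand looks weaker because the table now leans towards strong holdings.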
“No poker strategy is complete without a
good opponent modeling system”
(diagram: Inputs -> Neural Net -> Fold / Call / Bet)
A neural net trained for an opponent is fed 19 game
characteristics and outputs a probability triple
for the opponent's next action.
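A bare-bones forward pass, to show the shape of such a predictor. The hidden layer size, the random untrained weights and the softmax output are assumptions of this sketch. Only the 19 inputs and the three outputs come from the slide.

```python
import math
import random

def make_network(n_inputs=19, n_hidden=4, n_outputs=3, seed=0):
    """Random, untrained weights; a real model would be fitted to game logs."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_outputs)]
    return w1, w2

def predict_triple(network, features):
    """Map game features to (fold, call, raise) probabilities."""
    w1, w2 = network
    hidden = [1 / (1 + math.exp(-sum(w * x for w, x in zip(row, features))))
              for row in w1]
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # softmax, shifted for stability
    norm = sum(exps)
    return tuple(e / norm for e in exps)

features = [0.5] * 19  # placeholder values for the 19 game characteristics
triple = predict_triple(make_network(), features)
print([round(p, 3) for p in triple])
```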
There are other ways to
make money
CASe based Poker playER
• Stores a large case base obtained through the
simulation of other bots (Loki/Poki)
• For a particular situation, calculates a similarity
value for each case and sorts them (quick sort)
• Takes the cases above a similarity threshold of 97%,
or the top 20 (whichever is applicable)
• Finds the probability triple (f, c, r), i.e., the frequency
of the various decisions taken in these cases.
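The retrieval steps can be sketched as below. The similarity measure and the feature vectors are invented for illustration, and falling back to the top 20 when no case reaches 97% is this sketch's reading of "whichever applicable".

```python
from collections import Counter

def similarity(a, b):
    """Toy similarity in [0, 1]: 1 minus the mean absolute feature difference."""
    return 1 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def decide(case_base, situation, threshold=0.97, top_k=20):
    """Return (f, c, r) as decision frequencies among the retrieved cases."""
    if not case_base:
        raise ValueError("case base is empty")
    scored = sorted(((similarity(feats, situation), action)
                     for feats, action in case_base), reverse=True)
    # Keep cases above the threshold; if there are none, keep the top_k most similar.
    chosen = [sa for sa in scored if sa[0] >= threshold] or scored[:top_k]
    counts = Counter(action for _, action in chosen)
    return tuple(counts[a] / len(chosen) for a in ("fold", "call", "raise"))

# Hypothetical cases: (hand strength, pot odds) -> action taken.
case_base = [
    ((0.90, 0.50), "raise"),
    ((0.90, 0.52), "raise"),
    ((0.88, 0.50), "call"),
    ((0.10, 0.10), "fold"),
]
print(decide(case_base, (0.90, 0.50)))
```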
CASe based Poker playER
• Performs well against other bots and against
real opponents in play money games
• Testing in real money games was expensive!!
Reasons given for this
– Insufficient real money cases
– Different strategy adopted by people
Evolving Adaptive Play
(diagram: playing styles on two axes, Loose vs. Tight and Passive vs. Aggressive)
Evolution starts
A particular human trait is represented by a matrix which
stores information such as the probability tuple in various game
situations
Evolution
• Matrices corresponding to the new generation
are formed by randomizing/swapping some
values in the matrix.
• The most promising matrices are selected
through multiple game plays.
• The final set of matrices correspond to the
best solution in the current playing
environment.
• Can adapt to any change in the strategy of
other players
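A compact sketch of this loop. The fitness function here is a stand-in (closeness to a made-up target matrix). In the real setting it would be winnings over multiple game plays. All names are hypothetical.

```python
import random

# Hypothetical target: (fold, call, raise) rows that happen to do well in
# some made-up environment. A real fitness would come from playing games.
TARGET = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]

def fitness(matrix):
    """Stand-in score: negative squared distance to TARGET."""
    return -sum((a - b) ** 2
                for row, goal in zip(matrix, TARGET)
                for a, b in zip(row, goal))

def random_matrix(rng):
    rows = []
    for _ in range(3):
        raw = [rng.uniform(0.05, 1.0) for _ in range(3)]
        rows.append([v / sum(raw) for v in raw])
    return rows

def mutate(matrix, rng, rate=0.3):
    """New generation: swap or randomize some values, keeping rows normalized."""
    child = [row[:] for row in matrix]
    for row in child:
        if rng.random() < rate:
            i, j = rng.sample(range(len(row)), 2)
            row[i], row[j] = row[j], row[i]
        if rng.random() < rate:
            row[rng.randrange(len(row))] = rng.uniform(0.05, 1.0)
            total = sum(row)
            row[:] = [v / total for v in row]
    return child

def evolve(population, generations=40, keep=4, seed=0):
    """Keep the most promising matrices and refill with mutated copies."""
    rng = random.Random(seed)
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[:keep]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(len(population) - keep)]
        population = parents + children
    return max(population, key=fitness)

rng = random.Random(1)
start = [random_matrix(rng) for _ in range(12)]
best = evolve(start)
print(round(fitness(best), 4))
```

Because the best matrices survive each generation unchanged, the best score never gets worse. Swapping in a fitness measured against the current opponents is what would let the population follow changes in their play.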
Evolution: Martians can’t exist on Earth
W_tight(A_tight) > W_tight(A)
W_loose(A_loose) > W_loose(A)
W_tight(A_tight) > W_tight(A_loose)
W_loose(A_loose) > W_loose(A_tight)
W_x : Performance in environment ‘x’
A_y : Program developed in environment ‘y’
Human traits are generally not fixed and
their domain is not so small 
Stereotypes
• People play with certain “prejudiced” strategies.
Extensive statistics collected to jot down possible
stereotypes
• In an early game, lack of data hampers effective
opponent modeling : use stereotypes
• Extend the idea to the whole game.
Stereotypes are game-play styles adopted
by different people, recorded by watching a large
number of games.
A façade is used to match the decisions taken by
the player at each betting round. The stereotype
with the least mean square deviation is chosen as
the match.
The matched stereotype is then used to predict the
player's future actions.
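A sketch of the matching step. The stereotype names, their per-round (f, c, r) values and the observed frequencies are all made up for illustration.

```python
def mean_square_deviation(observed, stereotype):
    """Average, over betting rounds, of the squared gap between two triples."""
    per_round = [sum((o - s) ** 2 for o, s in zip(obs, ster))
                 for obs, ster in zip(observed, stereotype)]
    return sum(per_round) / len(per_round)

def match_stereotype(observed, stereotypes):
    """Name of the stereotype with the least mean square deviation."""
    return min(stereotypes,
               key=lambda name: mean_square_deviation(observed, stereotypes[name]))

# Hypothetical stereotypes: one (fold, call, raise) triple per betting round.
STEREOTYPES = {
    "tight-passive": [(0.60, 0.35, 0.05)] * 4,
    "loose-aggressive": [(0.10, 0.40, 0.50)] * 4,
}
observed = [(0.2, 0.4, 0.4), (0.1, 0.3, 0.6), (0.0, 0.5, 0.5), (0.1, 0.5, 0.4)]
print(match_stereotype(observed, STEREOTYPES))
```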
Poker and Game Theory
How to find the “optimal” strategy in
the game of imperfect information –
poker?
Applications of Game Theory
• To mathematically capture behavior in strategic
situations, in which an individual's success in
making choices depends on the choices of others
• In an equilibrium, each player of the game has
adopted a strategy that they cannot improve on by
changing it unilaterally, e.g. Nash Equilibrium
applied to Climate Change Models
A One Card Poker
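(illustration: two players, OPENER and DEALER, and a three-card deck of ACE, DEUCE and TREY; the ace is the lowest card and the trey the highest)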
How is the game played?
A One Card Poker
Players: OPENER and DEALER
1. Dealer deals one card to each player
2. Each player puts in $100
3. Check or Bet depending on
how the other player plays!!
One card poker – decision tree
Opener has a choice
• Opener Bets
– Dealer Folds => Opener wins
– Dealer Calls => Showdown
• Opener Checks
– Dealer Checks => Showdown
– Dealer Bets
   · Opener Calls => Showdown
   · Opener Folds => Dealer wins
The tree goes to a maximum depth of 3
A One Card Poker – typical situation
(illustration: OPENER and DEALER; one player announces "I Bet!!" and the other, holding the DEUCE, wonders "What to do??? Is he bluffing?")
Assumption: Obvious Plays and Stupid
Mistakes
1. Folding the trey (3)
2. Calling with the ace
3. Checking with the trey “in position”
4. Betting with the deuce
Strategic Plays and Expected Value
Consider the following variables:
p1 = probability the opener bluffs with the ace,
p2 = probability the opener calls with the deuce,
p3 = probability the opener bets with the trey,
q1 = probability the dealer bluffs with the ace,
q2 = probability the dealer calls with the deuce.
Opener’s post-ante expected value
• There are three possible non-zero post-ante results
for the opener. Either he loses $100, wins $200, or
wins $300. We will begin by computing the
probabilities of each of these outcomes.
Case 1: The opener has the ace, the dealer has the deuce
P(-100 $) = p1q2, P(200 $) = p1(1 - q2), P(300 $) = 0
Case 2: The opener has the ace, the dealer has the trey (3)
P(-100 $) = p1, P(200 $) = P(300 $) = 0
Opener’s post-ante expected value
Case 3: The opener has the deuce (2), the dealer has the ace
P(-100 $) = 0, P(200 $) = 1 – q1, P(300 $) = q1p2
Case 4: The opener has the deuce (2), the dealer has the trey
P(-100 $) = p2, P(200 $) = P(300 $) = 0
Case 5: The opener has the trey (3), the dealer has the ace
P(-100 $) = 0, P(200 $) = 1 - (1 - p3)q1 , P(300 $) = (1 - p3)q1
Case 6: The opener has the trey (3), the dealer has the deuce
P(-100 $) = 0, P(200 $) = 1 - p3q2 , P(300 $) = p3q2
Game Theoretic Analysis
The opener’s total Expected Value for the entire
hand is:
[q1(3p2 − p3 − 1) + q2(p3 − 3p1) + (p1 − p2)] / 6
If q1 = q2 = 1/3, then EV = −1/18 (in units of the $100
ante), and this does not depend on the opener’s
choices of the numbers p1, p2, and p3
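The closed form can be checked against the six cases with exact fractions. This is only a verification sketch: the function names are made up, and amounts are in units of the $100 ante.

```python
from fractions import Fraction as F
from itertools import product

def ev_closed_form(p1, p2, p3, q1, q2):
    """Opener's total expected value for the hand, in antes."""
    return (q1 * (3 * p2 - p3 - 1) + q2 * (p3 - 3 * p1) + (p1 - p2)) / 6

def ev_from_cases(p1, p2, p3, q1, q2):
    """Same value built from the six equally likely deals."""
    # Each tuple is (P(lose 1), P(win 2), P(win 3)) after the ante.
    cases = [
        (p1 * q2, p1 * (1 - q2), 0),              # opener ace, dealer deuce
        (p1, 0, 0),                               # opener ace, dealer trey
        (0, 1 - q1, q1 * p2),                     # opener deuce, dealer ace
        (p2, 0, 0),                               # opener deuce, dealer trey
        (0, 1 - (1 - p3) * q1, (1 - p3) * q1),    # opener trey, dealer ace
        (0, 1 - p3 * q2, p3 * q2),                # opener trey, dealer deuce
    ]
    post_ante = sum(-lose + 2 * win2 + 3 * win3 for lose, win2, win3 in cases) / 6
    return post_ante - 1  # subtract the ante the opener put in

third = F(1, 3)
grid = [F(0), F(1, 2), F(1)]
values = {ev_closed_form(p1, p2, p3, third, third)
          for p1, p2, p3 in product(grid, repeat=3)}
print(values)
```

Against q1 = q2 = 1/3 every opener strategy in the grid gives the same value, which is what "indifferent" means here.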
Optimal strategy: Game Theoretic
Analysis
• The dealer has an advantage in the game.
The only way for the dealer to prevent the
opener from being able to seize back some
of this advantage is to play the indifferent
strategy,
q1 = q2 = 1/3
• It is for this reason that the indifferent
strategy is more commonly referred to as
the “optimal” strategy.
Game Theory – How to win?
You cannot win with the optimal strategy,
but you can make sure you don’t lose.
Game Theory – How to win?
• So the object of the game is not to play optimally.
It is to spot the times when your opponent is not
playing optimally, or even to induce him not to
play optimally, to recognize the way in which he is
deviating from optimality, and then to choose a
non-optimal strategy for yourself which capitalizes
on his mistakes. You must play non-optimally in
order to win. To capitalize on your opponent’s
mistakes, you must play in a way that leaves you
vulnerable.
Game Theory – to the other games
             Perfect Information     Imperfect Information
No chance    Chess, Go               Inspection Game, Battleships
Chance       Backgammon, Monopoly    Poker
Interesting finds in Game Theoretical Poker Research:
• Gautam Rao, a poker expert, said about PsOpti: “You have a very strong
program. Once you add opponent modeling to it, it will kill everyone.”
• In poker, knowing the basic approach of the opponent is essential,
since it will dictate how to properly handle many situations that arise.
Some players wrongly attributed intelligence where none was present.
References
• Billings, Davidson, Schaeffer, Szafron; The challenge of
poker, 2002
• Billings, Davidson, Schaeffer, Szafron; Opponent modeling
in poker, 1998
• Luigi Barone, Lyndon While; Evolving adaptive play for
simplified poker, 1998
• Watson and Rubin, Case Based Poker Bot, 2008
• Layton, Vamplew, Turville; Using stereotypes to improve
early match poker play, 2008
• Jason Swanson, Game Theory and Poker, 2005
• Billings, Burch, Davidson, Holte, Schaeffer, Schauenberg,
Szafron; Approximating game-theoretic optimal strategies
for full-scale poker, 2003