Learning Blackjack with ANN (Artificial Neural Network)
Ip Kei Sam
[email protected]
ID: 9012828100
Goal
 Use a Reinforcement Learning algorithm to learn Blackjack strategies.
 Train an MLP to play Blackjack without explicitly teaching it the rules of the game.
 Develop an ANN strategy that beats the dealer’s 17-point rule.

Baseline results:
Strategy                 Win %   Tie %
Player’s random moves    31%     8%
Dealer’s 17-point rule   61%     8%
Blackjack
 Draw cards from a 52-card deck to reach a total value as close to 21 as possible without going over.
 The game is simplified so that the only actions on each turn are hit or stand (see the sketch below).
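For concreteness, a minimal Python sketch of this simplified game. It assumes an infinite deck (draws with replacement) and counts aces as 1; the names (draw, play_hand, policy) are illustrative, not from the project.

import random

def draw():
    # Card value from the deck; J/Q/K count as 10, ace as 1 here.
    return min(random.randint(1, 13), 10)

def play_hand(policy):
    # policy(total) returns "hit" or "stand"; returns the final hand total.
    cards = [draw(), draw()]
    while sum(cards) < 21 and policy(sum(cards)) == "hit":
        cards.append(draw())
    return sum(cards)  # a total above 21 is a bust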
Reinforcement Learning
 Map situations to actions such that the reward value is maximized.
 Decide which action (hit/stand) to take by finding the action that yields the highest reward through trial and error.
 Update the winning probability of the intermediate states after each game (a sketch of one such update follows).
 The winning probability of each state converges as the learning parameter decreases after each game.
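A minimal tabular sketch of this update loop in Python. The slides do not give the exact update rule, so the 1/n decaying learning parameter and the neutral 0.5 initial estimate below are assumptions:

win_prob = {}   # state -> estimated winning probability
visits = {}     # state -> visit count (drives the decaying learning parameter)

def update_after_game(visited_states, outcome):
    # outcome: 1.0 for a win, 0.0 for a loss (a tie could be scored as 0.5).
    for s in visited_states:
        visits[s] = visits.get(s, 0) + 1
        alpha = 1.0 / visits[s]          # learning parameter decreases each game
        p = win_prob.get(s, 0.5)         # neutral initial estimate
        win_prob[s] = p + alpha * (outcome - p)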
Result table from learning
 Columns 1-5 = the dealer’s cards
 Columns 6-10 = the player’s cards
 Cards are sorted in ascending order
 Column 11 = the winning probability of the state
 Columns 12 & 13 = the action taken by the player: [1 0] -> “hit”, [0 1] -> “stand”, [1 1] -> end state
2.0000  5.0000   0        0  0    6.0000  6.0000   0       0       0    0.3700  1.0000  0
2.0000  5.0000   0        0  0    4.0000  6.0000  6.0000   0       0    0.2500  1.0000  0
2.0000  5.0000  10.0000   0  0    4.0000  6.0000  6.0000  7.0000   0    0       1.0000  1.0000
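One way such a row could be assembled in Python; encode_row is a hypothetical helper, shown only to make the column layout concrete:

def encode_row(dealer_cards, player_cards, win_prob, action):
    # Five dealer-card slots and five player-card slots (sorted ascending,
    # zero-padded), then the winning probability and the two action bits.
    pad = lambda cards: sorted(cards) + [0] * (5 - len(cards))
    return pad(dealer_cards) + pad(player_cards) + [win_prob] + list(action)

row = encode_row([2, 5], [6, 6], 0.37, (1, 0))   # the first “hit” row above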
MLP and game flow
MLP Configurations
 Feature vectors are normalized and scaled to the range -5 to 5.
 Max. training epochs: 1000; epoch size = 64.
 Activation function (hidden layers) = hyperbolic tangent
 Activation function (output layer) = sigmoidal
 MLP1: α = 0.1, µ = 0 (learning rate and momentum), MLP config 4-10-10-10-2: 89.5%
 MLP2: α = 0.1, µ = 0.8, MLP config 5-10-10-10-2: 91.1%
 MLP3: α = 0.8, µ = 0, MLP config 5-10-10-10-2: 92.5%
 MLP4: α = 0.1, µ = 0, MLP config 6-12-12-12-2: 90.2%
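A minimal numpy sketch of one of these networks (MLP3’s 5-10-10-10-2 layout): three hyperbolic-tangent hidden layers and a two-unit sigmoidal output for the [hit, stand] bits. Training (backpropagation with learning rate α and momentum µ) is omitted, and the weight initialization here is an assumption.

import numpy as np

sizes = [5, 10, 10, 10, 2]                 # MLP3: 5-10-10-10-2
rng = np.random.default_rng(0)
weights = [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    # x: feature vector already normalized and scaled to [-5, 5].
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.tanh(x @ W + b)             # hyperbolic-tangent hidden layers
    z = x @ weights[-1] + biases[-1]
    return 1.0 / (1.0 + np.exp(-z))        # sigmoidal output layer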
Experiment Results
When the dealer uses the 17-point rule:
Strategy          Win %   Tie %
Player with MLP   56.5%   9%

When the player uses random moves:
Strategy          Win %   Tie %
Dealer with MLP   68.2%   3%

When both the dealer and the player use MLP:
Strategy          Win %   Tie %
Player with MLP   54%     3%
Dealer with MLP   43%     3%
Conclusion
 The MLP network works best for highly random and dynamic games, where the game rules and strategies are hard to define and the game outcomes are hard to predict exactly.
 Strategy interpreted from Reinforcement Learning: hit if the hand total is less than 15, otherwise stand (sketched below).
 As the number of games increases, the learned strategy keeps changing over time.
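Written as code, the interpreted strategy is a one-line policy:

def learned_policy(player_total):
    # The rule read off from the learned table: hit below 15, else stand.
    return "hit" if player_total < 15 else "stand"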
Future work
 The current hand depends on previous hands when cards are not reshuffled; add card memory to Blackjack.
 Train the ANN with a teacher to eliminate duplicate patterns (for example, 4 + 7 = 7 + 4 = 5 + 6 = …) and to identify misclassified patterns.
 Train the ANN to play against different experts so that it can pick up various game strategies.
 Include game tricks and strategies in a table for the ANN to look up.
 Explore other learning methods.