Learning BlackJack with ANN (Artificial Neural Network)
Ip Kei Sam
[email protected]
ID: 9012828100

Goal
- Use a reinforcement learning algorithm to learn strategies in Blackjack.
- Train an MLP to play Blackjack without explicitly teaching it the rules of the game.
- Develop a strategy with the ANN that beats the dealer's 17-point rule.

Baseline performance:

  Strategy                 Win %   Tie %
  Player's random moves    31%     8%
  Dealer's 17-point rule   61%     8%

Blackjack
- Draw cards from a deck of 52 cards to reach a total value as close to 21 as possible without exceeding it.
- Blackjack is simplified so that the only moves in each turn are hit or stand.

Reinforcement Learning
- Map situations to actions such that the reward value is maximized.
- Decide which action (hit or stand) to take by finding, through trial and error, the action that yields the highest reward.
- Update the winning probability of the intermediate states after each game.
- The winning probability of each state converges as the learning parameter decreases after each game. (A code sketch of this update loop appears at the end of this poster.)

Result table from learning
- The first 5 columns are the dealer's cards; the next 5 columns are the player's cards.
- Cards are sorted in ascending order; unused slots are 0.
- Column 11 is the winning probability of each state.
- Columns 12 and 13 encode the action taken by the player:
  [1 0] = "hit", [0 1] = "stand", [1 1] = end state.

  Dealer's cards    Player's cards    Win prob   Action
  2  0  0  5  0     6  6  0  0  0     0.3700     1 0   (hit)
  5  2  0 10  0     6  4  6  0  0     0.2500     1 1   (end state)
  0  5  2  0  0     4  6  6  7  0     0          0 1   (stand)

MLP and game flow
(Figure: MLP architecture and game-flow diagram.)

MLP Configurations
- Feature vectors are normalized and scaled to the range -5 to 5.
- Maximum training epochs: 1000; epoch size: 64.
- Activation function (hidden layers): hyperbolic tangent.
- Activation function (output layer): sigmoid.
- MLP1: α = 0.1, µ = 0,   configuration 4-10-10-10-2: 89.5%.
- MLP2: α = 0.1, µ = 0.8, configuration 5-10-10-10-2: 91.1%.
- MLP3: α = 0.8, µ = 0,   configuration 5-10-10-10-2: 92.5%.
- MLP4: α = 0.1, µ = 0,   configuration 6-12-12-12-2: 90.2%.

Here α is the learning rate and µ the momentum. (A code sketch of the MLP2 configuration appears at the end of this poster.)

Experiment Results

When the dealer uses the 17-point rule:

  Strategy          Win %   Tie %
  Player with MLP   56.5%   9%

When the player uses random moves:

  Strategy          Win %   Tie %
  Dealer with MLP   68.2%   3%

When both the dealer and the player use an MLP:

  Strategy          Win %   Tie %
  Player with MLP   54%     3%
  Dealer with MLP   43%     3%

Conclusion
- The MLP network works best for highly random and dynamic games, where the game rules and strategies are hard to define and the game outcomes are hard to predict exactly.
- Strategy interpreted from reinforcement learning: hit if the hand total is less than 15, otherwise stand.
- As the number of games increases, the learned strategy changes over time.

Future work
- The current hand depends on the previous hands: use card memory (card counting) in Blackjack.
- Train the ANN with a teacher to eliminate duplicate patterns (for example, 4 + 7 = 7 + 4 = 5 + 6 = ...) and to identify misclassified patterns.
- Train the ANN to play against different experts so that it can pick up various game strategies.
- Include game tricks and strategies in a table for the ANN to look up.
- Explore other learning methods.
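Code sketch: reinforcement learning loop
The following is a minimal sketch of the tabular value-update loop described in the Reinforcement Learning section, not the poster's actual implementation. Assumptions not stated in the poster: the state is reduced to the player's hand total, aces always count as 11, the reward is 1 for a win, 0.5 for a tie, and 0 for a loss, and the per-state learning rate decays as 1/N(s, a) so that the value estimates converge.

```python
# Minimal tabular RL sketch for simplified Blackjack (hit/stand only).
import random
from collections import defaultdict

CARDS = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11]  # ace fixed at 11 (simplification)

def draw():
    return random.choice(CARDS)

def dealer_total():
    """Dealer follows the 17-point rule: hit until reaching 17 or more."""
    total = draw() + draw()
    while total < 17:
        total += draw()
    return total

def play_episode(q, n, epsilon):
    """Play one hit/stand game, then push the outcome back into the value table."""
    player = draw() + draw()
    trajectory = []                      # (state, action) pairs visited this game
    while True:
        if player > 21:                  # bust: immediate loss
            reward = 0.0
            break
        if random.random() < epsilon:    # explore
            action = random.choice(("hit", "stand"))
        else:                            # exploit current estimates
            action = max(("hit", "stand"), key=lambda a: q[(player, a)])
        trajectory.append((player, action))
        if action == "hit":
            player += draw()
        else:
            d = dealer_total()
            reward = 1.0 if (d > 21 or player > d) else 0.5 if player == d else 0.0
            break
    for s_a in trajectory:               # Monte-Carlo update with decaying step size
        n[s_a] += 1
        q[s_a] += (reward - q[s_a]) / n[s_a]

q = defaultdict(float)                   # estimated winning probability of (total, action)
n = defaultdict(int)                     # visit counts, drive the decaying learning rate
for game in range(200_000):
    play_episode(q, n, epsilon=0.1)

# Read off the learned policy; the poster reports it reduces to
# "hit below 15, otherwise stand".
for total in range(12, 21):
    best = max(("hit", "stand"), key=lambda a: q[(total, a)])
    print(total, best, round(q[(total, best)], 3))
```

The 1/N(s, a) step size is one concrete choice of "learning parameter that decreases after each game"; any schedule that decays toward zero would serve the same convergence argument.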
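Code sketch: MLP configuration
The following is a minimal sketch of one of the poster's MLP configurations (MLP2: α = 0.1, µ = 0.8, layers 5-10-10-10-2) using scikit-learn as a stand-in for the original implementation. The five-feature state layout and the toy training labels below are assumptions for illustration (the poster trains on states produced by the reinforcement learning phase), and batch_size=64 is one reading of "epoch size = 64".

```python
# Hedged sketch of MLP2; feature layout and labels are hypothetical.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Hypothetical 5-dimensional state: player total, dealer up-card,
# number of player cards, number of dealer cards, ace flag.
X = np.column_stack([
    rng.integers(4, 22, 5000),      # player hand total
    rng.integers(2, 12, 5000),      # dealer's visible card
    rng.integers(2, 6, 5000),       # cards in player's hand
    rng.integers(1, 6, 5000),       # cards in dealer's hand
    rng.integers(0, 2, 5000),       # player holds an ace
])
# Toy labels from the strategy the poster recovers: hit below 15, else stand.
y = (X[:, 0] < 15).astype(int)      # 1 = hit, 0 = stand

# Scale features to [-5, 5] as in the poster's normalization step.
X = MinMaxScaler(feature_range=(-5, 5)).fit_transform(X)

# tanh hidden layers; for binary labels scikit-learn uses a single logistic
# (sigmoid) output unit, which collapses the poster's two one-hot outputs.
mlp = MLPClassifier(hidden_layer_sizes=(10, 10, 10),
                    activation="tanh",
                    solver="sgd",
                    learning_rate_init=0.1,   # alpha
                    momentum=0.8,             # mu
                    nesterovs_momentum=False, # classical momentum
                    batch_size=64,
                    max_iter=1000,            # maximum training epochs
                    random_state=0)
mlp.fit(X, y)
print("training accuracy:", mlp.score(X, y))
```

Swapping hidden_layer_sizes and the α/µ values reproduces the other three configurations in the table; the 4- and 6-input variants would additionally need a different feature vector.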