Reinforcement Learning for Blackjack
Saqib A Kakvi
Supervisor: Dr Marco Gillies
April 30, 2009
Abstract
This report explores the development of an Artificial Intelligence system for an already existing framework of card games, called SKCards. This system was developed by the author as a way of learning Java.
The old Artificial Intelligence is highly flawed.
Reinforcement Learning was chosen as the method to be employed. Reinforcement Learning attempts
to teach a computer certain actions, given certain states, based on past experience and numerical rewards
gained. The agent either assigns values to states, or to actions in states.
This will initially be developed for Blackjack, with possible extensions to other games. Blackjack is one
of the simpler games and the only game in the SKCards package which needs an Artificial Intelligence
agent. All the other games are single player.
Acknowledgements
I am hugely grateful to my project supervisor, Dr. Marco Gillies, and all the members of staff, for all the
support and assistance they gave me. I would also like to thank my colleagues, whose help was invaluable.
In particular I would like to thank David Jarret, Rohan Sherrard, Daniel Smith, Benoy Sen, Jusna Begum,
Arron Donnely & Josephene Amirthanayagam. And a very special thank you to Nidhi Shah and Fatima
Nassir for their help in proofreading this report.
Contents

1 Introduction
  1.1 The problem I am solving
    1.1.1 Reinforcement Learning Agent
    1.1.2 SKCards
  1.2 The layout of the report
2 Background
  2.1 Blackjack
    2.1.1 Overview
    2.1.2 Game play
  2.2 Reinforcement Learning
    2.2.1 Introduction
    2.2.2 Rewards
    2.2.3 Policies
    2.2.4 Agent Behaviour
  2.3 Blackjack in Reinforcement Learning
3 Project Description
  3.1 Technologies involved
  3.2 System Design
4 System Development
  4.1 Prototyping
    4.1.1 Prototype 1 - Next card higher/lower
    4.1.2 Prototype 2 - GridWorld
  4.2 Final System
    4.2.1 AI
    4.2.2 SKCards
5 System Evaluation
  5.1 System testing
  5.2 Results
    5.2.1 Table
    5.2.2 Graphs
  5.3 Result Analysis
  5.4 Possible Future Improvements
    5.4.1 Card Counting
    5.4.2 Probabilities
    5.4.3 Busting
    5.4.4 SKCards
6 Summary and Conclusions
  6.1 Summary of the Project
    6.1.1 Goals
    6.1.2 Outcomes
  6.2 Conclusion
7 Project Self-Evaluation
  7.1 Introduction
  7.2 Project outcomes
  7.3 Project planning
  7.4 Self-evaluation
A Card High/Low prototype
  A.1 Source Code
  A.2 Results
B GridWorld prototype
  B.1 Source Code
  B.2 Results
C Blackjack AI
  C.1 Source Code
    C.1.1 Face
    C.1.2 Suit
    C.1.3 Card
    C.1.4 Pack
    C.1.5 Blackjack
D Final System
  D.1 Source Code
    D.1.1 Face
    D.1.2 Suit
    D.1.3 Card
    D.1.4 Pack
    D.1.5 Shoe
    D.1.6 BJPlayer
    D.1.7 BJUI
    D.1.8 BJRules
    D.1.9 RLPlayer
E Sample Blackjack Game
List of Figures

3.1 Old System Architecture
3.2 New System Architecture
3.3 Use Cases
3.4 Single Episode
4.1 The frequency of the extended actions
5.1 Q-Learning Policies
5.2 Win Percentages
5.3 Loss Percentages
5.4 Bust Percentages
5.5 Net Win Percentages
A.1 Values for Joker
A.2 Values for Ace
A.3 Values for 2
A.4 Values for 3
A.5 Values for 4
A.6 Values for 5
A.7 Values for 6
A.8 Values for 7
A.9 Values for 8
A.10 Values for 9
A.11 Values for 10
A.12 Values for Jack
A.13 Values for Queen
A.14 Values for King
B.1 Episode 1
B.2 Episode 25
B.3 Episode 50
B.4 Episode 100
B.5 Episode 150
B.6 Episode 200
B.7 Episode 250
B.8 Episode 500
B.9 Episode 750
B.10 Episode 1000
B.11 Episode 1500
B.12 Episode 2000
B.13 Episode 2500
List of Tables

5.1 Final Results
A.1 Final Policy
B.1 Predicted values for the GridWorld problem
Chapter 1
Introduction
1.1 The problem I am solving
This project has two goals. The major goal is the development of a Reinforcement Learning agent for
Blackjack, using parts of an existing system. The second and less important goal is to improve the design
of the existing system, so that it complies with software engineering norms. These goals may be addressed
independently and then combined to make the final system.
1.1.1 Reinforcement Learning Agent
When the initial system was built, it lacked a proper Artificial Intelligence Agent for the player to play
against. This project aims to build an AI agent using Reinforcement Learning for Blackjack. This agent
can be the dealer, or it can be another player in the game. Reinforcement Learning is well suited to the
task of Blackjack because its states are easily represented, and the agent could, over time, learn to match
a user's style of play.
1.1.2 SKCards
SKCards was built as a framework for card games. It provides the basic Cards, Packs of Cards and
a standard interface. The concept is that any programmer could create their own card game and easily
integrate it into SKCards. This would give us an infinitely extensible suite of card games. SKCards was
created as an experiment with the Java programming language.
1.2 The layout of the report
This report will aim to break down the problem and present it in several smaller pieces. Each piece will
follow logically from the last and build on what came before. In this way the report should give a
smooth transition from theory to implementation and finally the results and conclusions.
The next chapter will discuss the separate parts of the project, namely, Blackjack and Reinforcement
Learning. Then it discusses the two together. The following chapter will discuss the design aspects of the
system. After that the development of the system will be looked at. This will detail the implementation of
the system. Following that will be the evaluation of the system. This will detail the experiments, the results
and an analysis of these results. Finally, the last chapter will cover the self-evaluation.
Chapter 2
Background
2.1 Blackjack
2.1.1 Overview
Blackjack is a very simple game. The object is to get as close to, but not over, 21 points in a maximum
of 5 cards. This is achieved by assigning values to each of the cards in the deck, irrespective of suit, as
listed below:
2 - 10 ⇒ Face value (i.e. a 2 is 2 points, a 3 is 3 points)
Courts (Jack, Queen, King) ⇒ 10 points
A ⇒ 1 or 11 points. The player decides if he wants the ace “high” (11) or “low” (1). This can be changed
in a situation where a “high” ace would cause a player to be bust.
2.1.2 Game play
The player is dealt 2 cards face down. The dealer then deals themselves 2 cards, the first face up and the
second face down. The player places a bet on his hand and turns over his cards. The player may then take
any of several actions, listed below. These have been divided into core and extended rules, where the core
rules form the basis of the game and the extended rules give additional gameplay.
Core Rules
HIT: The player may ask for another card from the pack to be added to their hand, this is known as a
“hit” or “twist” (the term “hit” shall be used henceforth). The dealer “hits” the player’s hand. The player’s
score is increased by the point value of the card added. If the player’s score exceeds 21, they are “bust” and
lose that hand. If not, they can continue “hitting” until they are bust or have 5 cards in their hand.
STAND: If the player is satisfied with their hand they may “stand”. This means that they are done with the
current hand. The player then moves on to play any other hands they have (see Split in section 2.1.2). After
“standing” on all their hands, the player ends their turn.
Extended rules
These rules are extensions of the rules in the previous section. They provide further functionality; however, they are mostly based on the core rules.
BLACKJACK: The quickest way to win. A blackjack is an Ace and a Court Card (King, Queen or
Jack) bringing the player up to a total of 21, known as a “natural”. The player receives 2.5 times the money
he was supposed to receive. This situation does not apply if the player has an Ace and a Ten. This rule was
introduced in casinos to increase popularity of the game. Originally it only applied to an Ace and the Jack
of Clubs, the “Black Jack”, which is where the game derives its name from.
INSURANCE: If the dealer’s exposed card is an ace, threatening a Blackjack, the player may take out
“insurance” against a Blackjack. They place another bet on the table. If the dealer has a Blackjack, the player
gets a 2-to-1 payout. If the dealer doesn’t have a Blackjack, the player loses that money. As the name states,
this serves as the player’s backup, should they lose to a dealer blackjack.
SPLIT: If both the cards have the same face value (two Queens, two Fives, two Sevens. . . ) the player may
opt to “split” their hand into two separate hands. The cards are separated and one card is added to each
hand to make two cards in each hand. The player then places the same bet on the second hand. A player
may have up to 5 hands.
DOUBLE DOWN: If the total points of the player’s cards is greater than or equal to 11, the player may opt
to “double down”. In this case the player doubles their bet and the player’s hand is hit, after which they
are forced to stand, provided the player is not bust. This rule is basically a gamble by the player that they
will need no more than 1 Card to reach a favourable score. This increased risk is why there is an increased
reward.
After the player is finished, the next player plays in the same manner. After all players have finished
playing, play moves to the dealer, who plays in a similar manner. After the dealer plays, the winner is the
hand which is closest to 21. It must be noted that the dealer generally has some restrictions imposed on
them to make it fair on the players.
Dealer restrictions normally come in the form of a hitting threshold, a score above which they can no
longer hit. It must be noted that a dealer can hit on high-counting hands that are “soft”, i.e. they have an
Ace counting high. The standard policy in casinos tends to be a threshold score of 17.
Rule variations
The rules listed above are mostly in effect in the United States of America. There are some variations,
specifically on the British version of the game, known as Pontoon. The details of this game are described
in Parlett (1994). Although the basic principles of the game are the same, there are several rule variations.
These variations are described below.
In this game, the role of dealer rotates between the players. The other players are called punters. The
dealer also places a stake and plays as normal. Listed below are some of the other rule variations:
BUY: This is a variation on the hit rule. Under this rule, the player may request a card to be dealt
face down to them. This allows the player to keep their score secret. To do this the player must pay an
amount greater than what they paid for the last card, where applicable, and less than their total stake. The
fifth card of the hand may not be bought.
5-CARD TRICK: If a player has a hand of 5 cards and has not gone bust, then they automatically win.
They receive a payment equivalent to double their stake.
PONTOON: In Pontoon, Tens, Jacks, Queens & Kings are called tenths. A regular Pontoon is any tenth
and an Ace. A Royal Pontoon is three sevens. The payouts for a Pontoon and Royal Pontoon are 2 times
and 3 times respectively. Royal Pontoons can only be claimed by a punter. If the dealer holds a pontoon,
then each punter pays the dealer double their stake. If the punter holds a Pontoon, they only pay an amount
equal to their stake.
BANKER’S PLAY: The banker plays in a similar way to the punters; however, there are some restrictions
on their play. The banker is not allowed to split pairs. The players may only beat the dealer if they hold
a score of 16 or higher. After the dealer has stood and is not bust, at a score of s < 21, they make a call
of “pay s + 1”. Any players holding hands scored at s + 1 or greater must reveal their hands and claim
payment. A Royal Pontoon is unbeatable.
A sample game which illustrates most of the gameplay and the basic and extended rules can be found in
Appendix E.
2.2 Reinforcement Learning
2.2.1 Introduction
Reinforcement learning is an unsupervised Artificial Intelligence learning method in which the agent
teaches itself. The central principle is based on rewards for a sequence of actions taken or sequence of state
transitions. Given a state S, the agent will take an action A, based on its knowledge of the environment.
This will lead to a change in the environment, and eventually a reward, either positive or negative, will be
given. The agent will then take this into consideration and accordingly adjust its knowledge. It must be
noted that rewards are not given after every action, but may be after a series of actions.
The agent has a preference to each possible action for a given state. This is based on what is known as
the action’s value. This is a numeric value attached to each action A, given a state S. Every reward will
adjust the action’s value and hence make it more or less desirable. There are several ways to value actions
and the choice is mainly arbitrary. It depends mainly on the developer’s understanding of the system and
their preference as to how the agent should behave.
Reinforcement Learning agents can act based on 3 things: values of states, values of actions or values of
actions in states. The choice of which method to employ is left to the developer. This choice is made based
on the developer’s understanding of the environment and their ability to represent it accurately. It must be
noted that an agent can learn in more than one way from the same environment. Action valuing is only
applicable in very trivial tasks and hence is not used as much as state or state-action valuing.
The agent learns based on 2 things, namely the current reward and the previous rewards for A, S, or A &
S. To simplify calculations and storage, we simply take the average of the previous rewards and update it.
As the rewards are arbitrary, although mostly people use +1, -1 and 0, the agent will not know how good a
reward is. By comparing it to the average of the past rewards, it can judge how high or low the reward was.
A “high” reward would increase the value of the action and a “low” reward would reduce it.
How exactly rewards are allocated is a bit of a mathematical quandary. Some methods allow rewards to
be “passed back” to previous actions. This follows directly from the fact that rewards are allocated after
a series of actions. This is so a low valued action which can be followed by a high valued action seems
more attractive. If this were not done then the action values would not be balanced and would only reflect
immediate rewards.
2.2.2 Rewards
The basic value method is based on discounted expected future returns. As mentioned above, when an
agent receives a reward, it passes back some of that reward. However, this is not always passed back all the
way and may go only to the previous state. By doing this, when the agent next encounters the newly rewarded
state, it passes back to the previous state and so on and so forth. Based on these values, and the probability
of the states occurring, we calculate the expected reward. So in effect we are summing the rewards of the
given possible states over their probability distribution, from 0 → ∞.
This is not always possible to do, due to the large problem space. Instead, whenever a reward is received,
all previous states or state-action pairs are updated by a fraction of that reward. This fraction is a constant
called the discount rate and is denoted by γ. It follows that a state S1 which is n steps from state S2, which
receives a reward of r, will get a reward of γ^n × r, where 0 < γ < 1.
It is very important, especially for stochastic tasks, to minimise the effect noise will have on the learning
algorithm. To do this, we use what is called the step-size parameter, denoted by α. This means that only
a small portion of any reward will be passed back, mitigating any effects of noise in the data. Valid data
will be observed several times, which will eventually add up to its true value. The value of the step-size
parameter is bounded as 0 < α < 1.
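To make the roles of α and γ concrete, the sketch below shows a minimal incremental value update of the kind described above. It is an illustration only, not code from SKCards; the class name, method name and constant values are assumptions.

// Minimal sketch of a discounted, step-size-controlled value update.
// ALPHA (step size) and GAMMA (discount rate) are illustrative constants.
public class ValueUpdateSketch {
    static final double ALPHA = 0.1;  // step-size parameter, 0 < alpha < 1
    static final double GAMMA = 0.9;  // discount rate, 0 < gamma < 1

    // Moves the stored value of a state towards the observed return.
    // Only a fraction ALPHA of the error is applied, so a single noisy
    // reward cannot drag the estimate too far from its running average.
    static double update(double currentValue, double reward, double nextStateValue) {
        double target = reward + GAMMA * nextStateValue; // discounted future return
        return currentValue + ALPHA * (target - currentValue);
    }

    public static void main(String[] args) {
        double v = 0.0;            // initial value of some state
        v = update(v, 0.0, 10.0);  // no immediate reward, but a promising next state
        System.out.println(v);     // prints 0.9 = 0.1 * (0 + 0.9 * 10)
    }
}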
2.2.3 Policies
By assigning rewards to states or state-action pairs, the agent learns what is known as a policy, denoted
by P. The policy is the set of actions the agent will take in a given state. The aim of Reinforcement
Learning is to learn a policy which is optimal or as close to the optimal policy as possible, denoted by P ∗ .
For some problems, such as the grid world, the optimal policy is defined. However for stochastic and/or
non-stationary problems, this is much harder and some other statistic must be used to determine if the agent
has learnt the policy it was required to.
Most Reinforcement Learning techniques have been mathematically proven to converge on P ∗ for the
limit n → ∞, where n is the number of runs. However, as it is not always possible to learn the optimal
policy in a reasonable amount of time, the focus is on learning a policy that is as close as possible to P ∗,
if not exactly P ∗.
Reinforcement learning is a very powerful Artificial Intelligence tool. This very closely mimics the way
humans learn by association. For every set of actions taken, there is a reward and this is associated to those
actions, relative to past experience. However not every action is rewarded independently, but instead the
entire series of actions receives a reward, from which each action’s share is mathematically calculated.
Hence it is not as powerful as human association, but it is nonetheless very close to it.
What makes it even more powerful is that it is an unsupervised technique. So the agent can run through
thousands of plays on its own and learn some policy P. With supervised learning, the user would be required
to tell the agent something after every play. This way it saves time and completely eliminates any human
error in the learning process. Assuming the implementation is done correctly, as stated above, it will, in the
limit n → ∞, converge on P ∗ .
2.2.4 Agent Behaviour
Agents can behave in several ways. The primary question with Reinforcement Learning is ”exploitation
vs. exploration.” An agent can either exploit what it believes to be the best action, or it can explore new
options, or ideally it can do some of both. The real trick is striking the correct balance between the two
factors. Exploration is best for learning, as the agent will cover as much of the state space as possible,
whereas exploitation is very useful after the agent has learnt a good policy.
The simplest exploitative and most intuitive behaviour is called a greedy agent. This agent will always
choose the action with the highest reward value, known as exploitation. In this way it is guaranteed large
immediate rewards, but possibly lower rewards in the long run. I believe that, given the small number of
actions needed to reach a goal, this is the best approach. It must still be said, though, that some low-valued
actions may lead to much higher rewards than greedy actions.
In slight contrast to a greedy agent, we have what is known as an ε-greedy agent, which is more explorative. This agent explores other actions with lower reward values with a small probability ε. All
non-greedy actions have an equal chance of being chosen. This allows the agent to explore more alternatives and possibly find a better action than the current greedy action. However in my opinion, this is not a
very balanced way of having the agent behave, as it may take a very poor action as readily as the next-best
action.
A more explorative agent behaviour is called a softmax selection agent. It takes all actions with a probability proportionate to their value. It creates a probability distribution according to the formula below:
e^{Q_t(a)/τ} / Σ_{b=1}^{n} e^{Q_t(b)/τ}
(Sutton and Barto, 1998)
This ensures that the agent picks better actions more often than not. This is, in my opinion, very close
to human behaviour and can be perceived as the agent being able to take risks, as it will not always take the
greediest action, but it tends to take good actions rather than bad. Also this strikes a fine balance between
exploitation and exploration. The parameter τ is called the temperature and decides how much the agent
explores. The values of the temperature are bounded as 0 < τ < 1
High temperatures cause the actions to be all (nearly) equiprobable. Low temperatures cause
a greater difference in selection probability for actions that differ in their value estimates. In
the limit as τ → 0 , softmax action selection becomes the same as greedy action selection.
(Sutton and Barto, 1998)
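As an illustration of the formula above, the sketch below picks one of n actions with softmax selection. It is a stand-alone example rather than the final system's code; the value array and temperature are placeholders.

import java.util.Random;

// Stand-alone sketch of softmax action selection for n actions.
public class SoftmaxSketch {
    static final double TAU = 0.5;          // temperature; lower values are greedier
    static final Random RNG = new Random();

    // Returns the index of an action chosen with probability
    // e^(Q(a)/tau) / sum over b of e^(Q(b)/tau).
    static int select(double[] q) {
        double[] weights = new double[q.length];
        double total = 0.0;
        for (int a = 0; a < q.length; a++) {
            weights[a] = Math.exp(q[a] / TAU);
            total += weights[a];
        }
        double r = RNG.nextDouble() * total; // pick a point on the cumulative scale
        double cumulative = 0.0;
        for (int a = 0; a < q.length; a++) {
            cumulative += weights[a];
            if (r <= cumulative) {
                return a;
            }
        }
        return q.length - 1;                 // guard against rounding error
    }

    public static void main(String[] args) {
        double[] q = { 1.0, 0.5, -0.5 };     // better-valued actions are picked more often
        System.out.println(select(q));
    }
}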
Most agents start off with initial values all set to 0. However based on human knowledge of the system,
optimistic initial values can be set. As the system interacts with its environment, it will update the values
based on a formula defined by the user. This gives the agent a statistical bias. However, Sutton and Barto
(1998) make the point that it provides “an easy way to supply some prior knowledge about what level of
rewards can be expected.”
2.3 Blackjack in Reinforcement Learning
Taking the existing Card/Pack system, which works very well, I intend to rebuild the functionality of the
Player and Main classes. I will re-implement the basic GUI I had before, but the Player class will now be
redefined. Each game will have 2 types of players, a human player, an improved version of the previous
system, and a reinforcement learning agent. This will be the main concentration of my project.
Blackjack is a fairly simple game with minimal mathematical complexity both in terms of rules and
betting. Also, a relatively small number of actions are required to reach an outcome and hence a reward.
Both these factors make Blackjack suitable as a reinforcement learning problem.
After having made a successful agent for blackjack, it should be possible to abstract the basic functionality of the agent and hence create AIPlayer, which would integrate with my old framework and make an
infinitely extensible suite of card games, with AI playing capabilities.
Blackjack is what is known as a Markov Decision Process (MDP). Blackjack is said to have the Markov
property because the current state carries all the information needed to choose an action; the particular
sequence of previous states and actions that led to it does not matter. This is the case with a large majority
of Reinforcement Learning problems.
The basic state representation will be the sum of the value of the cards the player has. This is the most
important information, as far as the game is concerned. It would be wasteful to have a separate state for
each combination of cards, as the individual cards are irrelevant: a 6 and a 7 is the same as a 5 and an 8.
However there are some complexities involved in this. The most prominent case is the fact that an Ace
can count high (11) or low (1). Also deriving from this is the fact that when a high ace would cause a player
to be bust, it is automatically switched to a low ace. Another complexity would be enabling the agent to
split. Considering the state is only the sum, we can miss a split, as a 5 and 5 is treated the same as a 2
and 8. Also the double down rule would be a minor complication, but this would not be due to the state
representation, but because of the inherent complexity of the rule itself.
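The ace complexity described above has to be handled when the hand score (and hence the state) is computed. The sketch below is one possible way to do this; it is not the method used in the final system, and the card representation is simplified to plain integer point values.

// Sketch of scoring a Blackjack hand where one Ace may count high (11) or low (1).
// Cards are given here as their low point values (Ace = 1, courts = 10).
public class HandScoreSketch {

    // Returns the best score, counting one Ace as high only when it does not bust the hand.
    static int score(int[] cardPoints) {
        int total = 0;
        boolean hasAce = false;
        for (int p : cardPoints) {
            total += p;
            if (p == 1) {
                hasAce = true;
            }
        }
        // Count one Ace as 11 instead of 1 only if the hand stays at 21 or below.
        if (hasAce && total + 10 <= 21) {
            total += 10;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(score(new int[] { 1, 6 }));     // soft 17
        System.out.println(score(new int[] { 1, 6, 9 }));  // the Ace drops to 1, score 16
    }
}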
Chapter 3
Project Description
3.1 Technologies involved
The system will be developed using the Java programming language, as the existing system is written in
Java. From this fact, it follows that any additions to the system must be in Java. To change to any other
programming language would require rebuilding the entire system, after it has been refactored. Due to the
time constraints, it was decided to continue using Java.
In terms of Reinforcement Learning, the algorithms provided in Sutton and Barto (1998) are all in
pseudo-code. Hence we can take these algorithms and implement them in any given programming language. As mentioned above, the existing system was developed in Java, hence the learning algorithms will
all be implemented in Java to be added to the system.
3.2 System Design
The old system is based on 3 main components. Firstly the Card/Pack system, which is currently the
only fully operational part. This may still be modified to improve on the current functionality. The second
part is the Player interface and its subclasses, which form the representation of the Player. Thirdly we have the
Main interface, its subclasses and SKMain which provide the GUI elements. SKMain is the Main Menu of
the program, but does not implement Main. This can be illustrated in the class diagram (see figure 3.1).
This system is not very stable, as the dependencies and interactions between Main and Player are very
complicated and tend to cause unexpected errors. The new system uses a less densely connected design to
ensure a more streamlined structure and easier communication between objects. The interface “GameRules”
will be introduced to act as a conduit for communication. This is illustrated in the class diagram (see figure
3.2).
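The decoupling described above could be expressed along the lines of the sketch below. It is only an illustration of the conduit idea: the method names, and the exact shape of the real GameRules/Rules, Player and UI types in the refactored system, are assumptions.

// Illustrative sketch of the "conduit" idea: Player and UI never reference each
// other directly; all communication goes through the game's rules object.
interface Player {
    int requestAction(int score);        // e.g. HIT or STAND, decided by a human or an agent
    void receiveOutcome(double reward);  // told how the hand ended
}

interface UI {
    void showScore(String who, int score);
    void showMessage(String message);
}

abstract class GameRules {
    protected final Player player;
    protected final UI ui;

    protected GameRules(Player player, UI ui) {
        this.player = player;
        this.ui = ui;
    }

    // Runs one game, mediating every interaction between the Player and the UI.
    public abstract void playGame();
}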
There are two basic Use Cases for the system. The first is a Player playing the game; this case is implemented
in the old system, but it is specific to a Human Player. The second Use Case is the Reinforcement Learning
agent actually learning how to play the game. This case includes the first Use Case. This can be illustrated
by the Use Case Diagram (see figure 3.3).
Based on the reading and research, it has become apparent that Blackjack is an episodic task. The
agent will learn something each time it plays an episode, which in this case is a game of Blackjack. For
Blackjack, there can be no more than 2 states between the initial state and the final state. This is because
a Player may not have more than 5 cards in his hand. The initial state is 2 cards. Between these states
we have 3 and 4 cards as the two possible intermediate states. The agent will operate as illustrated in the
sequence diagram (see figure 3.4).
9
10
CHAPTER 3. PROJECT DESCRIPTION
However it must be noted that in some problems, there may be a reward at an intermediate state. This is
not true for Blackjack, hence this was not included in the diagram.
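The episode structure in figure 3.4 can be summarised in Java as below. This sketch shows the control flow only; method names such as dealInitialHand() and chooseAction() are placeholders, not the real system's API.

// Sketch of one Blackjack episode as seen by the learning agent.
// All method and interface names are placeholders for illustration.
public class EpisodeSketch {
    static final int HIT = 0, STAND = 1;

    void runEpisode(Agent agent, Game game) {
        int score = game.dealInitialHand();         // initial state: two cards
        int action = agent.chooseAction(score);     // softmax choice
        // At most two intermediate states (3 and 4 cards) before the hand ends.
        while (action == HIT && !game.isBust() && game.cardsInHand() < 5) {
            score = game.hit();
            action = agent.chooseAction(score);     // no intermediate reward is given
        }
        game.playDealer();
        agent.receiveReward(game.outcomeReward());  // single reward at the end of the episode
    }

    interface Agent {
        int chooseAction(int score);
        void receiveReward(double reward);
    }

    interface Game {
        int dealInitialHand();
        int hit();
        boolean isBust();
        int cardsInHand();
        void playDealer();
        double outcomeReward();
    }
}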
Figure 3.1: Old System Architecture
Figure 3.2: New System Architecture
Figure 3.3: Use Cases
Figure 3.4: Single Episode
Chapter 4
System Development
4.1 Prototyping
To better understand Reinforcement Learning, two prototypes were built before the final system was
built. These were chosen in such a way that they shared some features with the final system, thus
enabling code reuse. Both of these prototypes are discussed below.
4.1.1 Prototype 1 - Next card higher/lower
This system trains an agent to guess if the next Card is higher or lower than the current card. This
was chosen so as to see how the Card/Pack system would be used in an agent. No proper Reinforcement
Learning algorithm was used, but merely a sum of all past rewards.
The agent used the points() method in the Card class to determine if a card was higher or lower. This
gives a Joker 0 points, Ace 1 point, number cards face value and Jack, Queen and King get 11, 12, and 13
respectively. The agent received a reward of +1 for guessing correctly, and -1 for guessing incorrectly. The
source code for this prototype can be found in Appendix A.1.
The prototype did not take into account when two cards were equal and just counted this as an incorrect
guess. This situation does not occur very often and was not considered for the prototype as it was simply
a proof of concept as opposed to a proper system. Results from this prototype can be found in Appendix
A.2.
This prototype learns a policy which is fairly consistent with what is expected. However as this does not
use a proper Reinforcement Learning method, it is not guaranteed to converge on P ∗ . Despite this, it uses
the softmax probability as described in section 2.2.3.
From this prototype, code that calculated softmax probabilities was written. This was carried over to
the next prototype and the final system. The prototype also showed how to correctly use the Card/Pack
system in a Reinforcement Learning Agent. Parts of this code were carried over to the final system.
4.1.2 Prototype 2 - GridWorld
This prototype solves what is known as the GridWorld problem. This problem has an agent in a grid-based environment. Each square is either a goal, a penalty or empty. The agent begins in a random square
and must navigate its way to the goal(s) and avoid any penalty(ies). If the agent moves to a penalty or goal
square, the episode ends. It receives a reward of +100 for a goal and -100 for a penalty. The source code
for this prototype can be found in Appendix B.1.
For this prototype, the Monte Carlo method was used (see Sutton and Barto (1998) chapter 5). This is
mathematically proven to converge on P ∗ , which this system does. For an easier graphical representation,
states were valued instead of state-action pairs. As stated before, either approach is valid and can be chosen
based on the user’s requirements. Some results of this can be found in Appendix B.2.
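For reference, a Monte Carlo-style update of state values of the kind used in this prototype could look like the sketch below. It is a simplified stand-alone illustration, not the prototype's actual code (see Appendix B.1 for that).

import java.util.List;

// Sketch of an every-visit Monte Carlo update of state values after one episode.
// States are indexed 0..N-1; 'values' and 'visits' persist across episodes.
public class MonteCarloSketch {

    // Updates running-average state values with the return observed at the end of an episode.
    static void updateEpisode(List<Integer> visitedStates, double finalReward,
                              double gamma, double[] values, int[] visits) {
        // Walk backwards from the terminal state, discounting the reward at each step.
        double g = finalReward;
        for (int i = visitedStates.size() - 1; i >= 0; i--) {
            int s = visitedStates.get(i);
            visits[s]++;
            values[s] += (g - values[s]) / visits[s]; // incremental average of returns
            g *= gamma;                               // earlier states see a discounted return
        }
    }
}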
From the previous prototype, code that calculated softmax probabilities was used. This prototype created
the code for reward functions and value calculation. The code was modified, as the Monte Carlo method
was abandoned in favour of Q-Learning for the final system. However, the learning parameters were carried
over.
4.2 Final System
4.2.1 AI
As the Reinforcement Learning Agent was the primary objective of the project, it was built first. It was
decided that the agent should be built using the Temporal Difference Q-Learning Algorithm. Although
Sutton and Barto (1998) cover Blackjack under the Monte Carlo methods, Temporal Difference is a more
powerful method in general.
For this agent, only hit and stand were considered as valid actions. The rest of the actions (as listed in
2.1.2) were not considered for the system, as it was thought that the agent would not learn anything of
substance. These reasons are enumerated below:
• BLACKJACK: This would not help the agent learn anything except that a starting pair adding up to
21 is greatly desired. Hence the logical action would be to stand. The system will already learn this, as
it will always get a negative reward for hitting at 21. Although the increased payout was added into
the system, the agent does not learn anything from it.
• INSURANCE: Taking out insurance would not be beneficial in any way to an AI agent’s policy, as
the payout of this is based on probability. Also this occurs so infrequently, that it was not considered
worth adding in.
• SPLIT: Splitting hands would only give the agent more hands to learn on. Although it would lead to
more learning, it would not reduce the learning time enough to justify its inclusion. Also due to the
complexity of being able to split many times it was not included in the final system.
• DOUBLE DOWN: Doubling down would lead to a hit and then stand. The agent would effectively
learn a policy for hitting at a score s_n and standing at score s_{n+1}. This would not lead to any
additional learning and may also impede the learning of the agent. As a separate action, it would not
add anything useful, as it is just a hit and a stand.
For further clarification, a graph indicating how frequently each action may have been taken is included
below. As can be seen, all the actions apart from double down occur less than 10% of the time. This
would not have led to any significant learning. Although double down occurs over 70% of the time, as
stated above it would not add anything meaningful to the learning.
Figure 4.1: The frequency of the extended actions
The first step was to build the graphical user interface. This was built with the same design as the
previous system. This was to ensure that the maximum of code could be re-used in the new system, so as
to avoid any complications of porting the code from one interface to another.

After the interface was built, the underlying game mechanics were built. For the AI, only hit and stand
were considered for actions. The reasons for this have already been discussed above. To facilitate this, a
class called BPlayer was created as an inner class, to allow easy access. This was done instead of using
accessor methods to manipulate the object, for simplicity and quick implementation.

The BPlayer was created with the basic functionality required for hit and stand. The BPlayer had no real
functionality, apart from the Reinforcement Learning, as most of the gameplay is done in the Blackjack
object. After this was established, the Reinforcement Learning methods were added. The player has an
exploration method, which uses softmax, a greedy method and a fixed method, for the dealer.

Finally the Reinforcement Learning agent was developed. This mostly used code from the prototype,
with modifications to fit into the system. The completion of the Reinforcement Learning agent makes both
systems equal in functionality, although it must be noted that the new AI works, whereas the old AI does not.

To achieve this, we need a state representation. For Blackjack, this is simple, as it is just the score of the
player minus 1. The actions were stored as numbers. The values of the state-action pairs were stored in a
2-dimensional array called val. The value for action A in state (score) S is given by val[A][S-1]. The states
are represented as S-1 as Java uses 0 as the first array index and n − 1 as the nth index, hence the S-1.

The softmax method was fairly simple to implement as there are only 2 possible actions, hence only two
probabilities. These were calculated using the following code:

double hit = Math.exp(val[HIT][score - 1] / TAU);
double stand = Math.exp(val[STAND][score - 1] / TAU);
double sum = hit + stand;
double hprob = hit / sum;
double sprob = stand / sum;
After these probabilities are calculated, we update the last state-action pair, if any. For the case where it
is the first action, lastaxn is set to DEAL (-1). Then a random number r is generated to choose the action.
After the action is chosen, we set the values of lastaxn and lastscore. This is achieved with the code given
below:

double maxval = Math.max(val[STAND][score - 1], val[HIT][score - 1]);
if (lastaxn != DEAL) {
    double rd = A * ((G * maxval) - val[lastaxn][lastscore - 1]);
    val[lastaxn][lastscore - 1] += rd;
}
double r = Math.random();
lastscore = score;
if (r > hprob) {
    stand();
    lastaxn = STAND;
}
else {
    hit();
    lastaxn = HIT;
}

The above code is an implementation of the general formula:

Q(s_t, a_t) = Q(s_t, a_t) + α(r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t))

(Sutton and Barto, 1998)

Here the reward r_{t+1} is 0, as no reward is given.

After the learning method, a greedy method was devised. This is for use when the agent has learnt a
policy P ≈ P ∗. As stated previously, after exploring, the agent should exploit what it has learnt. This was
implemented using the following code:

if (val[HIT][score - 1] > val[STAND][score - 1]) {
    hit();
}
else {
    stand();
}
Finally the reward function was created. This gives a reward to the agent, based on the game outcome.
After the dealer stands, the outcome is calculated. There are 6 possible outcomes. Based on the outcome,
the player was judged to have achieved one of 4 results (Win, Lose, Bust or Push) and based on that it was
given a reward. Listed below are the 6 outcomes with their end state and reward in brackets:
1. Player Busts (Bust -10)
2. Dealer Busts (Win +10)
3. Player and Dealer bust (Bust -10)
4. Player Wins (Win +10)
5. Player Loses (Lose -10)
6. Push (Push +5)
This was achieved using the code below:
if (player.score > 21 && dealer.score <= 21) {
    // player bust
    player.reward(-20);
}
else if (player.score <= 21 && dealer.score > 21) {
    // dealer bust
    player.funds += bet * 2;
    player.reward(10);
}
else if (player.score > 21 && dealer.score > 21) {
    // dealer and player bust
    player.reward(-10);
}
else if (player.score > dealer.score) {
    // player wins
    player.funds += bet * 2;
    player.reward(10);
}
else if (player.score < dealer.score) {
    // player loses
    player.reward(-10);
}
else {
    // push
    player.funds += bet;
    player.reward(5);
}
The reward method uses the same formula as employed by the learning method, except the reward is
explicitly passed to it as can be seen above. The code for this method is given below:
private void reward(double r) {
    double maxval = Math.max(val[STAND][lastscore - 1], val[HIT][lastscore - 1]);
    if (lastaxn != DEAL) {
        double rd = A * (r + (G * maxval) - val[lastaxn][lastscore - 1]);
        val[lastaxn][lastscore - 1] += rd;
    }
}
It must be noted that the final class has more code than listed above, but this code was for testing purposes
only and does not interfere with the learning algorithm. For the final code see Appendix C.1.5.
4.2.2 SKCards
The first feature that was added to SKCards was the Shoe class (see Appendix D.1.5). This class is
used to represent what is known as a dealer’s shoe. This is in effect a Pack, which is not fully used. A
cut card is inserted at random in the Pack. Once this cut card is dealt, the Shoe is discarded and a new
one is brought into play. This is used mainly in casinos for Blackjack to counter card counting. This was
implemented for completeness of the system. The dealer does not distinguish between “soft” and “hard”
scores. This is because dealer play was not the objective of the experiments.
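A dealer's shoe of the kind described above can be sketched as follows. This is an illustrative outline only; the actual Shoe class is listed in Appendix D.1.5 and may differ in its details.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch of a dealer's shoe: several packs shuffled together,
// with a cut card placed at a random depth; once it is reached the shoe is spent.
public class ShoeSketch {
    private final List<Integer> cards = new ArrayList<>(); // card point values, for simplicity
    private final int cutPosition;
    private int dealt = 0;

    ShoeSketch(int packs, Random rng) {
        for (int p = 0; p < packs; p++) {
            for (int c = 0; c < 52; c++) {
                cards.add(Math.min(1 + c % 13, 10)); // Ace = 1, 2-10 face value, courts = 10
            }
        }
        Collections.shuffle(cards, rng);
        // Place the cut card somewhere in the last third of the shoe.
        cutPosition = cards.size() - 1 - rng.nextInt(cards.size() / 3);
    }

    boolean cutCardReached() {
        return dealt >= cutPosition;
    }

    int deal() {
        return cards.get(dealt++);
    }
}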
Then the Player interface was reworked, to fit in the new system. This included eliminating all relations
to the Main interface, which was renamed to UI. Simultaneously, the UI interface was modified to eliminate
all relations to the Player interface. The Rules abstract class was then created and the two interfaces were
joined using this. With this the framework was completed.
After completing the framework, the existing BJPlayer and BJUI (formerly BJMain) were built up.
Where possible, code from the original system was used, in its original form or modified. Where it was not
possible, code from the prototype was used, or it was written from scratch. This now makes the new system
equivalent to the old system in functionality, barring the AI.
Chapter 5
System Evaluation
5.1 System testing
There are 4 basic tests that have been devised, all of which depend on the agent being able to react to a
dealer policy effectively. As previously stated, we wish to come as close as possible to P ∗ , which in this
case would be a policy better than its opponent (the dealer).
Each test uses different values of rewards to achieve a different purpose. The first test uses the normal
values as listed below:
• Win = +10
• Push = +5
• Lose = -10
• Bust = -10
The subsequent tests involve doubling the reward for win, lose and bust and keeping the rest at their
normal values. The push value was not modified, as its low frequency of occurrence means its effect is
minimal. These values were tested against 7 dealer policies, namely the dealer hitting below 11 through 17. As the dealer
plays a more aggressive strategy, the agent should also adjust its policy, within the parameters provided.
This range of policies was chosen because the two boundary values represent significant policies. The lower
bound (11) represents the policy a player must play to avoid going bust. The upper bound (17) represents the
policy that is employed by several casinos, making it the de facto standard. All other policies were included
for completeness.
The constants in the learning equation were kept the same as listed below:
• Softmax temperature τ = 0.5
• Step size parameter α = 0.6
• Discount rate γ = 0.75
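The test programs themselves are not listed in this report, but a harness along the following lines could produce the percentages in table 5.1. This is a sketch under the assumption of an agent and game API similar to the one in Appendix C; the Game interface and its method are placeholders, not the real test code.

// Sketch of a test harness: for one reward configuration and dealer policy,
// run many learning episodes and record win/lose/bust percentages.
public class TestHarnessSketch {
    static final int WIN = 0, LOSE = 1, BUST = 2, PUSH = 3;

    interface Game {
        // Plays one learning episode and reports its outcome.
        int playLearningEpisode();
    }

    static void runTestbed(Game game, int episodes, int dealerPolicy) {
        int wins = 0, losses = 0, busts = 0;
        for (int e = 0; e < episodes; e++) {
            int outcome = game.playLearningEpisode();
            if (outcome == WIN) wins++;
            else if (outcome == LOSE) losses++;
            else if (outcome == BUST) busts++;
        }
        System.out.println("policy " + dealerPolicy
                + ": win " + (100.0 * wins / episodes)
                + "% lose " + (100.0 * losses / episodes)
                + "% bust " + (100.0 * busts / episodes) + "%");
    }
}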
5.2 Results
5.2.1 Table
This table provides the results of running the agent with different parameters. The data and therefore the
results are highly susceptible to noise, due to the random element in the system. However, separate data
sets with the same parameters tend to show a consistency of approximately ±2.5%, giving a total error range
of 5%. This error can be attributed to noise in the data, as well as the random factor within the game. Due
to the nature of card games, even an optimal action can lead to a sub-optimal result, causing interference
in the data. This is mitigated to a certain extent by the Law of Large Numbers.
DEALER POLICY   WIN REWARD   LOSE REWARD   BUST REWARD   WIN(%)   LOSE(%)   BUST(%)   NET WINS(%)
11              10           -10           -10           44.62    17.6      30.66     -3.64
12              10           -10           -10           43.38    19.88     29        -5.5
13              10           -10           -10           37.58    21.98     29.06     -8.96
14              10           -10           -10           34.76    22.28     29.34     -10.76
15              10           -10           -10           27.86    23.08     30.22     -12.46
16              10           -10           -10           26.2     23.9      29.26     -11.74
17              10           -10           -10           19.08    17.4      38.74     -12.86
11              20           -10           -10           45.98    23.1      22.02     0.86
12              20           -10           -10           42.76    19.54     29.82     -6.6
13              20           -10           -10           37.62    20.96     29.12     -7.92
14              20           -10           -10           35.02    21.92     29.66     -11.04
15              20           -10           -10           28.14    24.06     28.72     -12.26
16              20           -10           -10           26.12    23.88     29.88     -12.82
17              20           -10           -10           17.92    25.46     29.54     -13.18
11              10           -10           -20           43.02    47.22     0         -4.2
12              10           -10           -20           37.1     52.5      0         -15.4
13              10           -10           -20           30.06    56.22     0         -22.02
14              10           -10           -20           27.04    57.42     0         -24.4
15              10           -10           -20           19.48    58.88     0         -26.26
16              10           -10           -20           18.3     58.26     0         -25.72
17              10           -10           -20           12.44    54.34     0         -16.76
11              10           -20           -10           29.26    3.92      60.08     -34.74
12              10           -20           -10           27.52    5.48      59.56     -37.52
13              10           -20           -10           25.28    5.02      59.86     -35.44
14              10           -20           -10           25.28    5.02      59.32     -32.9
15              10           -20           -10           20.66    5.86      60.84     -32.28
16              10           -20           -10           20.46    5.8       59.86     -31.12
17              10           -20           -10           16.14    6.04      59.02     -25.7
Table 5.1: Final Results
5.2.2 Graphs
All the data from table 5.1 is illustrated graphically in the graphs below. Each graph shows the comparison between the 4 separate testbeds. For each run the following values were calculated:
1. Final Policy
2. Winning Percentage
3. Losing Percentage
4. Bust Percentage
5. Net Winning Percentage
The final policy is the highest score at which a greedy agent will hit and will stand on any higher score,
based on the values learnt. It is expected that as the dealer policy increases, the agent will also increase its
policy to try and stay ahead of it. It is also expected that as the agent’s reward values are adjusted, its policy
will also be adjusted.
The winning percentage is the percentage of games in which neither dealer nor agent has gone bust and
the player has won the game. Although when the dealer goes bust the agent is considered to have won,
such games were not counted here, as this situation arises from poor play by the dealer and not necessarily
good play from the agent.
Similarly, the losing percentage is the percentage of the games in which the dealer has won. Again the
case of the player going bust was not counted here, as it can be attributed to poor play by the agent
and not necessarily good play from the dealer.
However, busting is an essential part of the game and must not be ignored. To this effect, the percentage
of busts was recorded. This number is simply the percentage of times the player has been bust during the
course of the learning.
When considering all of these figures, we must remember that there is a large random factor that affects
the game and therefore the results. Although the step-size parameter can smooth the learning process, there
is really no way to compensate for the noise in the final results data.
To consolidate all these figures, the net winning percentage was also calculated. This can be found using
the formula below:
net wins = (wins + dealer busts) − (losses + player busts)
Figure 5.1: Q-Learning Policies
Figure 5.2: Win Percentages
Figure 5.3: Loss Percentages
Figure 5.4: Bust Percentages
Figure 5.5: Net Win Percentages
5.3 Result Analysis
As can be seen from Figure 5.1, when the agent is given a reward of -20 for busting, it learns the most
effective policy to counter it, that is hitting at 11 or below. Here we can see the agent prefers to lose (-10),
rather than go bust. The best possible reward is +10 for it winning. However, it chooses not to pursue
this, as it simply tries to avoid going bust and receiving the -20 reward. However, when it is given a
reward of -20 for losing, it plays a very aggressive strategy, specifically 20, hitting on everything but a 21.
Here the agent tries to avoid losing, even at the risk of busting. The best way to do that is play a very
aggressive strategy, which it does. By increasing the win reward to +20, the agent plays a slightly more
defensive strategy. However, on several points, both agents pick the same strategy. Clearly, the negative
rewards have a greater bearing on the learning than the positive rewards. Hence any tuning of the agent
would concentrate more on the negative rewards than the positive.
One point that must be noted is that the agent consistently learns a threshold score at which to stop
hitting. This is not specified in the algorithm. The algorithm merely provides the agent the ability to
value the actions and hence build a policy based on its experience. This can be said to be one of the major
successes of this project, in terms of the Reinforcement Learning Agent.
Furthermore, from Figure 5.2 it can clearly be seen that as the dealer plays a more aggressive strategy,
the player will win less often. This is as expected, because as the dealer plays a more aggressive policy,
it will score higher, hence making it harder for the player to win. A point worth noting is that having a
higher bust or lose penalty results in a lower winning percentage, whereas increasing the win reward has
a minimal effect.
As we proceed to Figure 5.3, we can see that this graph shows trends opposite to Figure 5.1. This is
expected, as the better the policy the agent plays, the less it should lose. The lines are not as smooth, due to the
noise in the data, but the basic trend can be seen. Also it must be noted that the dealer policy has a minimal
effect on the loss percentage of the agent.
Figure 5.4 shows the same trend we saw in Figure 5.1. The more aggressive policy the agent plays, the
more it will bust. As it chooses to hit on higher scores, there is a larger probability of it going bust. Hence,
the same trend repeats itself, as expected. However, it must be noted that the agent busts more than the
dealer. The reason for this is not known, but it has been observed that the player busts about 5% more than
the dealer playing the same policy.
From the above graphs and analysis it is clear that the best results are obtained with the win reward set at +10
or +20. Changing the penalties may improve the performance in certain areas, but the overall performance
is degraded. Hence the best agent under these circumstances would have a win reward of +10 or +20.
5.4 Possible Future Improvements
As can be seen from Figure 5.5 the agent tends to lose more than it wins when playing a policy of 12
or greater, no matter what the learning parameters. As this is the primary performance benchmark, any
non-trivial improvement to the system would have to increase this figure. There are two possible ways this
can be done, card counting and probability calculations. Both of these are discussed below.
5.4.1 Card Counting
Card counting is a tactic used by some players to try and “beat the system”. This tactic involves keeping
a running total of what is known as the “count” of the deck. Each low card “counts” as +1 and each high
card “counts” as -1. By adding the “count” of all the cards observed so far, the player can tell how “heavy”
or “light” a deck is, indicating a large number of high or low cards respectively. Based on this, the player
can adjust their strategy.
The best way to combat card counting is to use a Shoe, which has been implemented (see Appendix D.1.5). Once
the shoe is changed out, the player no longer has a count of the deck and must begin counting again; while
they are counting, they would presumably fall back on a fixed strategy. By keeping the cut-off point of the
Shoe as low as possible, card counting can be countered.
5.4.2 Probabilities
Another possible way to improve the system is to use probabilities. This assumes knowledge of how
many decks are in play, which is normally kept hidden from the player. However, if the number of decks
is known, then the number of copies of each card is known. Based on the cards already played, the agent
can calculate the probability of being dealt the required card(s) and adjust its policy using these values.
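A minimal sketch of such a calculation is given below. It is illustrative only and not part of SKCards: it assumes the number of decks is known, that no Jokers are in play, and that every dealt card is observed and reported by its Blackjack points value (1–10).

    // Illustrative probability calculation; not part of the SKCards framework.
    public class CardProbability {
        private final int decks;
        private final int[] seen = new int[11];   // seen[v] = cards of points value v dealt so far

        public CardProbability(int decks) { this.decks = decks; }

        public void observe(int bjPoints) { seen[bjPoints]++; }

        /** Probability that the next card has the given Blackjack points value. */
        public double probabilityOfValue(int bjPoints) {
            // 4 cards of each value per deck, except 16 ten-valued cards (10, J, Q, K).
            int perDeck = (bjPoints == 10) ? 16 : 4;
            int remainingOfValue = decks * perDeck - seen[bjPoints];
            int totalSeen = 0;
            for (int v = 1; v <= 10; v++) {
                totalSeen += seen[v];
            }
            int remainingTotal = decks * 52 - totalSeen;
            return remainingTotal == 0 ? 0.0 : (double) remainingOfValue / remainingTotal;
        }
    }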
The simplest counter to this approach is simply not to reveal the number of decks. However, a well-programmed
agent could maintain several probability estimates at the same time, keeping multiple variables and
updating each based on the Cards observed. The number of decks could also be inferred from the frequency
with which particular Cards appear; for example, three occurrences of the King of Spades would indicate at
least three decks. Again, if we use a Shoe instead of a Pack, we can counter this quite effectively.
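For example, a lower bound on the number of decks could be inferred as sketched below. This is illustrative only; the card is identified by its string representation (e.g. "KS" for the King of Spades, as produced by the SKCards Card.toString() method), and the class name is hypothetical.

    // Illustrative: infer a lower bound on the number of decks from how many
    // times each specific card (e.g. the King of Spades) has been observed.
    import java.util.HashMap;

    public class DeckEstimator {
        private final HashMap<String, Integer> counts = new HashMap<String, Integer>();
        private int minDecks = 1;

        /** Record one observed card, identified by its string form, e.g. "KS". */
        public void observe(String cardName) {
            int n = counts.containsKey(cardName) ? counts.get(cardName) + 1 : 1;
            counts.put(cardName, n);
            if (n > minDecks) {
                minDecks = n;   // e.g. 3 Kings of Spades implies at least 3 decks
            }
        }

        public int getMinimumDecks() { return minDecks; }
    }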
5.4.3 Busting
It has been noted that the agent seems to bust too much, and there is no apparent reason for this. Based on
analysis of the code in Appendix C.1.5, the agent should perform equally to the dealer; however, this is
not the case. The software needs to be thoroughly investigated to determine the cause of this error, and once
found, the error needs to be rectified. This would improve the net win percentage of the system, which, as
stated above, is the measure of improvement for this system.
5.4.4 SKCards
There is scope for improvement in the SKCards framework. The Blackjack game only implements the core rules;
the extended rules could be implemented for the player to make it a complete Blackjack game. Also, the
GUIs are functional but not very attractive, and could be improved. There is
also the possibility of adding more games to the framework.
Chapter 6
Summary and Conclusions
6.1 Summary of the Project
6.1.1 Goals
This project had two main goals. The first was to develop a Reinforcement Learning agent
for Blackjack. The second was to refactor the SKCards system and incorporate the Reinforcement
Learning agent into it.
6.1.2 Outcomes
As we can see, the Reinforcement Learning agent was successfully developed. Although it was unable
to beat the dealer, this is due to the nature of the game and not to any shortcoming in the learning agent.
However, it must also be noted that the large number of busts led it to perform worse than expected.
The refactoring was successful. The new system was created, and conforms to the model shown in
Figure 3.2. However, due to the large amount of time spent on trying to implement the AI, there was little time
for the final system. Although the system was built, it has only the most basic functionality and supports
only the core rules as stated in Section 2.1.2. Although it was planned to include all the rules in the
system, this was not possible.
6.2 Conclusion
Based on the results gained from the system, it can be said that the Reinforcement Learning agent learns
an effective policy. Although the algorithm does not specify that the agent should learn a threshold score
for hitting, the agent does learn one. This can be seen as a major success of the learning algorithm. From
Figure 5.1 we can see that modifying the bust and lose rewards has a much greater effect than modifying
the win parameter. It can be concluded that the agent does learn to play Blackjack and that Reinforcement
Learning is suitable for Blackjack.
We also notice that as the dealer plays more aggressively, the player tends to lose more, even when
playing an equal or better policy. It can be inferred that, through playing strategy alone, the
casino policy is very difficult to beat.
We can also see that some noise affects the system, as shown in Figure 5.4. The graph is
fairly consistent with what is expected, as the number of busts relates to the policy the agent plays, and
Figures 5.4 and 5.1 show the same basic trend. However, in Figure 5.4 the last point for the agent with a -20
loss reward is significantly higher than the others. This is the noise and the random element of the game
clouding the results.
Although the Law of Large Numbers mitigates this effect to a certain extent, it is not sufficient. The
Pack is shuffled using the Math.random() function in Java, a general-purpose pseudo-random number generator
that is not ideal when very large quantities of random numbers are required. This factor is not very significant
and does not affect most of the tests. However, combined with the noise and the random element in Blackjack,
it can explain anomalous results such as this one.
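If this were a concern, one standard alternative (a sketch only under that assumption, not the current SKCards implementation) would be to shuffle with the standard library's Fisher–Yates implementation, Collections.shuffle, driven by an explicit Random instance:

    // Sketch: shuffling a list of cards with Collections.shuffle instead of
    // repeated Math.random() calls. Written generically so it is self-contained.
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Random;

    public class ShuffleSketch {
        public static <T> ArrayList<T> shuffle(ArrayList<T> items, Random rng) {
            ArrayList<T> copy = new ArrayList<T>(items);
            Collections.shuffle(copy, rng);   // Fisher-Yates shuffle from the standard library
            return copy;
        }
    }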
Chapter 7
Project Self-Evaluation
7.1 Introduction
This project aimed to build a Reinforcement Learning agent for Blackjack and to integrate it into an existing
framework. The framework had to be refactored to make it more usable and to conform with software engineering norms. It can be said that this was successful to a large extent.
7.2 Project outcomes
The first outcome, which was to develop a Reinforcement Learning agent, was successful. The agent
does learn a policy of hitting until a threshold score, and this score varies with the rewards it is given.
It can be seen from Figure 5.1 that the agent learns a suitable policy based on what it is told to do.
There is, however, the problem of the agent busting more than the dealer.
The second outcome was to refactor the old system and include the Reinforcement Learning agent. This
was successfully achieved. However, it was only built to support the core rules of Blackjack. This was due
to a planning oversight on my part, which meant that I spent too much time on the AI development.
7.3 Project planning
The initial planning of my project was quite challenging. There were several tasks to be completed and
these all needed to be scheduled. The three main tasks were:
• Background reading and research
• Prototyping
• System development
Although I had allocated time for all these tasks, the prototyping stage spilled over into the system
development stage, which gave me less time to work on the system. Barring this, the project went according
to plan, with a maximum deviation of one or two days. I now realise that I should have allowed
a larger buffer between tasks for spill-over and unforeseen errors.
7.4 Self-evaluation
During the course of this project, I learnt the limitations of my programming skills. Although I consider
myself highly competent in terms of programming, developing graphical user interfaces is an area where
there is great room for improvement. My interfaces tend to lean towards functionality at the expense of
appearance, which makes them less aesthetically pleasing.
I have also realised that my planning tends to be too optimistic and does not account for errors and other
spill-over. This is due to my underestimation of the buffer time needed between tasks, which would be taken
up by error correction.
Finally, I have learnt that I am quite competent at analysing numerical data. Based on the prototypes
I have built and their results (see Appendices A.2 and B.2), it has become apparent to me that my
mathematical skills are better than I thought before the project. However, if I work with the same data for
too long, I tend to miss the bigger picture.
Appendix A
Card High/Low prototype
A.1 Source Code

// Adding HighLow to SKCards
package SKCards;

/*
 * Importing the ArrayList class
 * and the IO components
 */
import java.util.ArrayList;
import java.io.*;

/**
 * This is the 1st prototype for the Reinforcement Learning agent
 * for the SKCards system, specifically SKBlackJack. This agent
 * attempts to guess if the next card is higher or lower than the
 * current card. It achieves this by using a simple sum of all
 * the past rewards.
 *
 * @author Saqib A Kakvi
 * @version 2.0A
 * @since 2.0A
 * @see SKCards.Card Card
 * @SK.bug system does not consider the case when the Cards are equal.
 */
public class HighLow{

    public final static int HIGH = 1;
    public final static int LOW = 0;

    double[] highv, highp;
    double[] lowv, lowp;
    double tau;
    Pack pack;
    int cscore, hscore;

    /**
     * The main method. Creates a new HighLow.
     */
    public static void main(String[] args){
        new HighLow();
    }

    /**
     * Default constructor. Creates a new learning agent, with no previous
     * knowledge. It begins learning for 10,000 episodes. After it has
     * learnt, it plays against the user. As part of the testing,
     * the values of the learning are written to separate files.
     */
    public HighLow(){
        pack = new Pack(2, true);
        highv = new double[14];
        highp = new double[14];
        lowv = new double[14];
        lowp = new double[14];
        tau = 0.1;
        for(int i = 0; i < highv.length; i++){
            highv[i] = 0.0;
            highp[i] = 0.0;
            lowv[i] = 0.0;
            lowp[i] = 0.0;
        }
        System.out.println("*****LEARNING*****");
        long start = System.currentTimeMillis();
        try{
            BufferedWriter[] wr = new BufferedWriter[14];
            for(int i = 0; i < wr.length; i++){
                String name = "values" + i + ".csv";
                wr[i] = new BufferedWriter(new FileWriter(name));
                wr[i].write("low" + Integer.toString(i));
                wr[i].write("\t");
                wr[i].write("high" + Integer.toString(i));
                wr[i].newLine();
            }
            for(int i = 0; i < 10000; i++){
                episode();
                for(int j = 0; j < wr.length; j++){
                    wr[j].write(Double.toString(lowv[j]));
                    wr[j].write("\t");
                    wr[j].write(Double.toString(highv[j]));
                    wr[j].newLine();
                }
            }
            for(int i = 0; i < wr.length; i++){
                wr[i].flush();
                wr[i].close();
            }
        }
        catch(Exception ex){
        }
        long stop = System.currentTimeMillis();
        long time = stop - start;
        System.out.println("*****DONE LEARNING*****");
        System.out.println("*****TOOK " + time + " ms TO LEARN*****");
        try{
            BufferedWriter bw = new BufferedWriter(new FileWriter("values.txt"));
            bw.write("num" + "\t" + "high" + "\t" + "low");
            bw.newLine();
            for(int i = 0; i < highv.length; i++){
                bw.write(i + "\t" + highv[i] + "\t" + lowv[i]);
                bw.newLine();
            }
            bw.flush();
            bw.close();
        }
        catch(Exception ex){
            ex.printStackTrace();
        }
        System.out.println();
        System.out.println("NOW LETS PLAY!!!");
        System.out.println("You need to guess if the next card is higher or " +
                           "lower than the current card");
        System.out.println("Enter 1 for higher, 0 for lower, " +
                           "equals is not an option!");
        cscore = 0;
        hscore = 0;
        try{
            BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
            for(int i = 1; i < 10; i++){
                Card card1 = pack.nextCard();
                Card card2 = pack.nextCard();
                int c1 = card1.getPoints();
                int c2 = card2.getPoints();
                int caxn;
                if(highv[c1] > lowv[c1]){
                    caxn = 1;
                }
                else{
                    caxn = 0;
                }
                System.out.println("Number " + i + ": " + card1.toString());
                System.out.print("Enter your guess: ");
                int haxn = Integer.parseInt(br.readLine());
                System.out.println("I guessed " + caxn);
                cplay(caxn, c1, c2);
                hplay(haxn, c1, c2);
                System.out.println("The actual card is: " + card2.toString());
            }
            System.out.println("YOU SCORED: " + hscore);
            System.out.println("I SCORED: " + cscore);
        }
        catch(Exception ex){
        }
    }

    /**
     * This method goes through a single episode of the learning. This
     * involves the agent picking an action based on the value of the
     * card. The selected action is then taken.
     */
    public void episode(){
        Card card1 = pack.nextCard();
        Card card2 = pack.nextCard();
        int c1 = card1.getPoints();
        int c2 = card2.getPoints();
        // Jo < A < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < T  < J  < Q  < K
        // 0  < 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10 < 11 < 12 < 13
        double hv = Math.exp(highv[c1] / tau);
        double lv = Math.exp(lowv[c2] / tau);
        double sum = hv + lv;
        highp[c1] = hv / sum;
        lowp[c1] = lv / sum;
        double r = Math.random();
        if(r > lowp[c1]){
            action(HIGH, c1, c2);
        }
        else{
            action(LOW, c1, c2);
        }
    }

    /**
     * Carries out a single action, high or low, based on the parameters
     * received from the episode method. The values of the two cards are
     * also passed for verification. The score is adjusted accordingly,
     * based on the values of the cards and the axn taken.
     *
     * @param axn the axn to be taken
     * @param c1 the value of Card1
     * @param c2 the value of Card2
     */
    public void action(int axn, int c1, int c2){
        if(axn == HIGH){
            if(c1 < c2){
                highv[c1] += 1;
            }
            else{
                highv[c1] -= 1;
            }
        }
        else if(axn == LOW){
            if(c1 > c2){
                lowv[c1] += 1;
            }
            else{
                lowv[c1] -= 1;
            }
        }
    }

    /**
     * This method is used for a single guess made by the computer in the
     * head to head game with the human. If the computer guessed correctly
     * it gets a point.
     *
     * @param axn the axn to be taken
     * @param c1 the value of Card1
     * @param c2 the value of Card2
     */
    public void cplay(int axn, int c1, int c2){
        if(axn == 1){
            if(c1 < c2){
                cscore++;
            }
        }
        else if(axn == 0){
            if(c1 > c2){
                cscore++;
            }
        }
    }

    /**
     * This method is used for a single guess made by the human in the
     * head to head game with the computer. If the human guessed correctly
     * it gets a point.
     *
     * @param axn the axn to be taken
     * @param c1 the value of Card1
     * @param c2 the value of Card2
     */
    public void hplay(int axn, int c1, int c2){
        if(axn == 1){
            if(c1 < c2){
                hscore++;
            }
        }
        else if(axn == 0){
            if(c1 > c2){
                hscore++;
            }
        }
    }
}
A.2 Results
This section contains the results of one run of the High/Low prototype. Several runs were conducted
and the results were found to be fairly consistent. The variations in the results arise because no formal
learning algorithm was used, so there is no supporting proof of convergence. The following graphs
illustrate the values of guessing high or low for a given card, with respect to the number of occurrences.
Figures A.1–A.14: Values for each card (Joker, Ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen and King), plotted against the number of occurrences.
VALUE    FINAL POLICY
Joker    HIGH
2        HIGH
3        HIGH
4        HIGH
5        HIGH
6        HIGH
7        LOW
8        LOW
9        LOW
10       LOW
Jack     LOW
Queen    LOW
King     LOW

Table A.1: Final Policy
As we can see, the agent learns policies at the extremities (Joker and King) very easily. However, it has
some problems with the middle values, and using a proper learning algorithm would help smooth out the
learning there. It can be seen from Table A.1 above that the agent has learnt a
fairly good policy.
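As an illustration of what such an algorithm might change (a sketch only, with hypothetical names, and not part of the prototype), the running sum of rewards could be replaced by an incremental sample-average update, which keeps the values bounded and comparable between cards seen different numbers of times:

    // Sketch of an incremental sample-average update that could replace the
    // prototype's running sum of rewards; class and field names are illustrative.
    public class AveragedValue {
        private final double[] value = new double[14];   // one entry per card, Joker..King
        private final int[] visits = new int[14];

        /** Update the estimated value of an action (e.g. guessing HIGH) on card c. */
        public void update(int c, double reward) {
            visits[c]++;
            value[c] += (reward - value[c]) / visits[c];  // converges to the mean reward
        }

        public double get(int c) { return value[c]; }
    }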
Appendix B
GridWorld prototype
B.1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
Source Code
/ / A d d i n g G r i d W o r l d t o SKCards
package SKCards ;
/∗
∗ importing the ArrayList class
∗ and t h e s w i n g / awt c o m p o n e n t s
∗ and t h e b o r d e r c o m p o n e n t s
∗ and t h e B u f f e r e d I m a g e c l a s s
∗ and t h e IO c o m p o n e n t s
∗ and t h e ImageIO c l a s s
∗/
import j a v a . u t i l . A r r a y L i s t ;
import j a v a . awt . ∗ ;
import j a v a x . s w i n g . ∗ ;
import j a v a x . s w i n g . b o r d e r . ∗ ;
import j a v a . awt . image . B u f f e r e d I m a g e ;
import j a v a . i o . ∗ ;
import j a v a x . i m a g e i o . ImageIO ;
/∗ ∗
∗ T h i s i s t h e s e c o n d p r o t o t y p e f o r t h e SKCards s y s t e m . T h i s
∗ c l a s s a t t e m p t s t o s o l v e t h e G r i d W o r l d problem , u s i n g t h e
∗ R e i n f o r c m e n t L e a r n i n g Monte C a r l o m e t h o d s .
∗
∗ @author S a q i b A K a k v i
∗ @version 2 . 0A
∗ @since 2 . 0 A
∗ @SK . bug none
∗/
p u b l i c c l a s s G r i d W o r l d e x t e n d s JFrame {
private
private
private
private
JLabel [ ] [ ] grid ;
Point loc ;
d o u b l e t a u , gamma , a l p h a ;
double [ ] [ ] val , prob ;
59
60
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
APPENDIX B. GRIDWORLD PROTOTYPE
private
private
private
private
private
private
b o o l e a n [ ] [ ] pen ;
A r r a y L i s t <P o i n t > p a t h ;
JFrame v a l u e s ;
JLabel [ ] [ ] vlab ;
String filename ;
i n t h , w, p ;
/∗ ∗
∗ main method . C r e a t e s a new i n s t a n c e o f GridWorld , u s i n g t h e d e f a u l t
∗ constructor .
∗
∗ @param a r g s n o t u s e d
∗/
p u b l i c s t a t i c v o i d main ( S t r i n g [ ] a r g s ) { new G r i d W o r l d ( ) . r u n ( ) ; }
/∗ ∗
∗ D e f a u l t c o n s t r u c t o r . C r e a t e s a GridWorld i n s t a n c e b p a s s i n g t h e
∗ p a r a m e t e r s 10 and 10 t o t h e main c o n s t r u c t o r .
∗
∗ @see G r i d W o r l d
∗/
p u b l i c GridWorld ( ) { t h i s ( 1 0 , 1 0 ) ; }
/∗ ∗
∗ Main C o n s t r u c t o r . C r e a t e s a g r i d o f s i z e <i >n </ i >x<i >m</ i >.
∗ A l l t h e g r i d s q u a r e s a r e c r e a t e d and p e n a l t i e s a s s i g n e d t o t h e
∗ a p p r o p r i a t e s q u a r e s . The s y s t e m i s t h e n r u n f o r 2500 e p i s o d e s .
∗
∗ @param n t h e l e n g t h o f t h e g r i d
∗ @param m t h e w i d t h o f t h e d r i d
∗/
p u b l i c G r i d W o r l d ( i n t n , i n t m) {
super ( ” GridWorld ” ) ;
Dimension d = ( T o o l k i t . g e t D e f a u l t T o o l k i t ( ) ) . g e t S c r e e n S i z e ( ) ;
h = ( int )( d . getHeight ( ) ) / 4 ∗ 3;
w = ( i n t ) ( d . getWidth ( ) ) / 2;
s e t S i z e (w, h ) ;
s e t D e f a u l t C l o s e O p e r a t i o n ( DISPOSE ON CLOSE ) ;
g r i d = new J L a b e l [ n ] [m ] ;
tau = 0.75;
gamma = 0 . 7 5 ;
alpha = 0.5;
v a l = new d o u b l e [ n ] [m ] ;
pen = new b o o l e a n [ n ] [m ] ;
for ( int i = 0 ;
for ( int j =
grid [ i ][
grid [ i ][
i < n ; i ++){
0 ; j < m; j ++){
j ] = new J L a b e l ( ) ;
j ] . s e t B o r d e r ( new L i n e B o r d e r ( C o l o r . BLUE ) ) ;
B.1. SOURCE CODE
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
61
g r i d [ i ] [ j ] . s e t H o r i z o n t a l A l i g n m e n t ( S w i n g C o n s t a n t s . CENTER ) ;
i n t r = new Double ( Math . random ( ) ∗ 1 0 ) . i n t V a l u e ( ) ;
i f ( ( i + j == ( ( n+m) / 2 ) − 1 ) && ( i ! = n / 2 ) ) {
pen [ i ] [ j ] = t r u e ;
g r i d [ i ] [ j ] . s e t F o r e g r o u n d ( C o l o r . RED ) ;
}
else{
pen [ i ] [ j ] = f a l s e ;
}
}
}
C o n t a i n e r cp = g e t C o n t e n t P a n e ( ) ;
cp . s e t L a y o u t ( new G r i d L a y o u t ( n ,m ) ) ;
f o r ( i n t i = 0 ; i < n ; i ++){
f o r ( i n t j = 0 ; j < m; j ++){
cp . add ( g r i d [ i ] [ j ] ) ;
}
}
g r i d [ 0 ] [ 0 ] . s e t F o r e g r o u n d ( C o l o r . GREEN ) ;
g r i d [ n −1][m− 1 ] . s e t F o r e g r o u n d ( C o l o r . GREEN ) ;
for ( int i = 0; i <
for ( int j = 0;
i f ( pen [ i ] [ j
val [ i ][
}
else{
val [ i ][
}
}
}
v a l . l e n g t h ; i ++){
j < v a l [ i ] . l e n g t h ; j ++){
]){
j ] = −100;
j ] = 0;
val [ 0 ] [ 0 ] = 100;
v a l [ n −1][m−1] = 1 0 0 ;
values
values
values
values
= new JFrame ( ”VALUES” ) ;
. s e t S i z e (w, h ) ;
. s e t L o c a t i o n (w , 0 ) ;
. s e t D e f a u l t C l o s e O p e r a t i o n ( DISPOSE ON CLOSE ) ;
Container c = values . getContentPane ( ) ;
c . s e t L a y o u t ( new G r i d L a y o u t ( n , m ) ) ;
v l a b = new J L a b e l [ n ] [m ] ;
f o r ( i n t i = 0 ; i < v l a b . l e n g t h ; i ++){
f o r ( i n t j = 0 ; j < v l a b [ i ] . l e n g t h ; j ++){
v l a b [ i ] [ j ] = new J L a b e l ( Double . t o S t r i n g ( v a l [ i ] [ j ] ) ) ;
v l a b [ i ] [ j ] . s e t H o r i z o n t a l A l i g n m e n t ( S w i n g C o n s t a n t s . CENTER ) ;
62
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
APPENDIX B. GRIDWORLD PROTOTYPE
vlab [ i ] [ j ] . setOpaque ( true ) ;
v l a b [ i ] [ j ] . s e t F o r e g r o u n d ( C o l o r . WHITE ) ;
v l a b [ i ] [ j ] . s e t B a c k g r o u n d ( C o l o r . BLACK ) ;
c . add ( v l a b [ i ] [ j ] ) ;
}
}
reset ();
t h i s . s e t V i s i b l e ( true ) ;
values . s e t V i s i b l e ( true ) ;
this . repaint ();
values . repaint ( ) ;
}
/∗ ∗
∗ Runs t h e G r i d W o r l d program .
∗/
public void run ( ) {
f o r ( i n t i = 1 ; i <= 2 5 0 0 ; i ++){
f i l e n a m e = new S t r i n g ( ”TEST” + I n t e g e r . t o S t r i n g ( i ) + ” . j p g ” ) ;
s e t T i t l e ( ” Episode ” + i ) ;
episode ( ) ;
reset ();
screenShot ( ) ;
}
System . o u t . p r i n t l n ( p + ” PENALTIES” ) ;
}
/∗ ∗
∗ S e t s t h e G r i d W o r l d and t h e v a l u e s f r a m e ’ s v i s i b l i t y u s i n g
∗ the value passed .
∗
∗ @param v i s t h e v i s i b i l i t y o f t h e windows .
∗/
public void s e t V i s i b i l i t y ( boolean v i s ){
this . setVisible ( vis );
values . setVisible ( vis ) ;
}
/∗ ∗
∗ T h i s method r u n s a s n i n g l e e p i s o d e t h r o u g h t h e s y s t e m . I t
∗ c o n t i n u e s t o move a s i n g l e s t e p u n t i l i t r e a c h e s a t e r m i n a l
∗ s q a u r e . When t h i s happens , a s c r e e n s h o t o f t h e windows i s t a k e n .
∗
∗ @see s t e p
∗/
private void episode ( ) {
i n t x = ( i n t ) ( ( Math . random ( ) ∗ 1 0 0 ) % g r i d . l e n g t h ) ;
i n t y = ( i n t ) ( ( Math . random ( ) ∗ 1 0 0 ) % g r i d [ 0 ] . l e n g t h ) ;
B.1. SOURCE CODE
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
63
l o c = new P o i n t ( x , y ) ;
while ( c h e c k l o c ( ) != 0){
x = ( i n t ) ( ( Math . random ( ) ∗ 1 0 0 ) % g r i d . l e n g t h ) ;
y = ( i n t ) ( ( Math . random ( ) ∗ 1 0 0 ) % g r i d [ 0 ] . l e n g t h ) ;
l o c = new P o i n t ( x , y ) ;
}
p a t h = new A r r a y L i s t <P o i n t > ( ) ;
g r i d [ x ] [ y ] . s e t T e x t ( ”X” ) ;
w h i l e ( c h e c k l o c ( ) == 0 ) {
step ( ) ;
}
reward ( ) ;
}
/∗ ∗
∗ T e s t i n g method . Used t o t a k e a s c r e e n s h o t o f t h e s y s t e m
∗ a f t e r e a c h e p i s o d e . T h i s a l l o w s t h e p r o g r e s s and r e s u l t s
∗ o f t h e s y s t e m t o be d o c u m e n t e d .
∗/
private void s cr ee nS ho t ( ) {
/ / CODE TAKEN FROM
/ / www . j a v a − t i p s . o r g / j a v a −se− t i p s / j a v a . awt / how−t o −c a p t u r e −s c r e e n s h o t . h t m l
try {
Robot r o b o t = new Robot ( ) ;
/ / Capture the screen s ho t of the area of the screen
/ / d e f i n e d by t h e r e c t a n g l e
B u f f e r e d I m a g e b i = r o b o t . c r e a t e S c r e e n C a p t u r e ( new R e c t a n g l e (w∗ 2 , h ) ) ;
ImageIO . w r i t e ( b i , ” j p g ” , new F i l e ( f i l e n a m e ) ) ;
}
c a t c h ( E x c e p t i o n ex ) {
ex . p r i n t S t a c k T r a c e ( ) ;
}
}
/∗ ∗
∗
∗/
private void r e s e t ( ) {
for ( int i = 0; i < grid
for ( int j = 0; j <
i f ( pen [ i ] [ j ] ) {
grid [ i ][ j ] .
}
else{
grid [ i ][ j ] .
. l e n g t h ; i ++){
g r i d [ i ] . l e n g t h ; j ++){
s e t T e x t ( ”PEN” ) ;
setText ( ”” );
64
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
APPENDIX B. GRIDWORLD PROTOTYPE
}
}
}
g r i d [ 0 ] [ 0 ] . s e t T e x t ( ”GOAL” ) ;
g r i d [ g r i d . l e n g t h −1][ g r i d [ 0 ] . l e n g t h − 1 ] . s e t T e x t ( ”GOAL” ) ;
f o r ( i n t i = 0 ; i < v l a b . l e n g t h ; i ++){
f o r ( i n t j = 0 ; j < v l a b [ i ] . l e n g t h ; j ++){
v l a b [ i ] [ j ] . s e t T e x t ( Double . t o S t r i n g ( v a l [ i ] [ j ] ) ) ;
i f ( val [ i ] [ j ] > 0){
i n t v = ( i n t ) v a l [ i ] [ j ] ∗ 255 / 1 0 0 ;
v l a b [ i ] [ j ] . s e t B a c k g r o u n d ( new C o l o r ( 0 , v , 0 ) ) ;
}
else{
i n t v = ( i n t ) v a l [ i ] [ j ] ∗ 255 / 100 ∗ −1;
v l a b [ i ] [ j ] . s e t B a c k g r o u n d ( new C o l o r ( v , 0 , 0 ) ) ;
}
}
}
repaint ();
}
/∗ ∗
∗
∗/
private int checkloc (){
i f ( g e t x ( ) == 0 && g e t y ( ) == 0 ) {
return 1;
}
e l s e i f ( g e t x ( ) == g r i d . l e n g t h −1 && g e t y ( ) == g r i d [ 0 ] . l e n g t h −1){
return 1;
}
e l s e i f ( pen [ g e t x ( ) ] [ g e t y ( ) ] ) {
r e t u r n −1;
}
else{
return 0;
}
}
/∗ ∗
∗ T h i s method moves t h e a g e n t a s i n g l e s t e p . The d i r e c t i o n
∗ i s d e c i d e d on u s i n g t h e s o f t m a x p r o b a b i l i t y d i s t r i b u t i o n s .
∗
∗ @see a c t i o n ( i n t )
∗/
private void s t e p ( ) {
double lim = 0 . 0 ;
B.1. SOURCE CODE
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
65
d o u b l e sum = 0 . 0 ;
int x = getx ( ) ;
int y = gety ( ) ;
d o u b l e [ ] p r o b = new d o u b l e [ 4 ] ;
d o u b l e up , down , l e f t , r i g h t ;
b o o l e a n uv = t r u e , l v = t r u e , dv = t r u e , r v = t r u e ;
i f ( g e t x ( ) == 0 ) {
uv = f a l s e ;
}
e l s e i f ( g e t x ( ) == g r i d . l e n g t h −1){
dv = f a l s e ;
}
i f ( g e t y ( ) == g r i d [ 0 ] . l e n g t h −1){
rv = f a l s e ;
}
e l s e i f ( g e t y ( ) == 0 ) {
lv = false ;
}
i f ( uv ) {
up = Math . exp ( v a l [ x −1][ y ] / t a u ) ;
sum += up ;
}
i f ( rv ){
r i g h t = Math . exp ( v a l [ x ] [ y + 1 ] / t a u ) ;
sum += r i g h t ;
}
i f ( dv ) {
down = Math . exp ( v a l [ x + 1 ] [ y ] / t a u ) ;
sum += down ;
}
i f ( lv ){
l e f t = Math . exp ( v a l [ x ] [ y − 1 ] / t a u ) ;
sum += l e f t ;
}
i f ( uv ) {
prob [ 0 ]
}
i f ( rv ){
prob [ 1 ]
}
i f ( dv ) {
prob [ 2 ]
}
i f ( lv ){
prob [ 3 ]
}
= Math . exp ( v a l [ x −1][ y ] / t a u ) / sum ;
= Math . exp ( v a l [ x ] [ y + 1 ] / t a u ) / sum ;
= Math . exp ( v a l [ x + 1 ] [ y ] / t a u ) / sum ;
= Math . exp ( v a l [ x ] [ y − 1 ] / t a u ) / sum ;
d o u b l e r = Math . random ( ) ;
66
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
APPENDIX B. GRIDWORLD PROTOTYPE
f o r ( i n t i = 0 ; i < p r o b . l e n g t h ; i ++){
l i m += p r o b [ i ] ;
i f ( r < lim ){
action ( i );
break ;
}
}
repaint ();
}
/∗ ∗
∗ T a k e s an a c t i o n , t h a t i s moving 1 s t e p i n any o f t h e
∗ valid directions .
∗
∗ @param axn t h e d i r e c t i o n t o move .
∗/
p r i v a t e v o i d a c t i o n ( i n t axn ) {
p a t h . add ( new P o i n t ( l o c ) ) ;
s w i t c h ( axn ) {
c a s e 0 : / / up
loc . t r a n s l a t e
break ;
case 1: / / r i g h t
loc . t r a n s l a t e
break ;
c a s e 2 : / / down
loc . t r a n s l a t e
break ;
case 3: / / l e f t
loc . t r a n s l a t e
break ;
}
( −1 ,0);
(0 ,1);
(1 ,0);
(0 , −1);
g r i d [ g e t x ( ) ] [ g e t y ( ) ] . s e t T e x t ( ”X” ) ;
}
/∗ ∗
∗
∗/
private void reward ( ) {
i f ( c h e c k l o c ( ) == −1) p ++;
double reward = v a l [ g e t x ( ) ] [ g e t y ( ) ] ;
f o r ( i n t i = p a t h . s i z e ( ) − 1 ; i >=0; i −−){
r e w a r d ∗= gamma ;
B.2. RESULTS
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
67
Point p = path . get ( i ) ;
i n t x = ( i n t ) p . getX ( ) ;
i n t y = ( i n t ) p . getY ( ) ;
v a l [ x ] [ y ] += a l p h a ∗ ( r e w a r d − v a l [ x ] [ y ] ) ;
reward = val [ x ] [ y ] ;
}
}
/∗ ∗
∗ R e t u r n s t h e x co−o r d i n a t e o f t h e a g e n t ’ s c u r r e n t
∗ p o s i t i o n , a s an i n t s o i t can be u s e d a s an
∗ index in the grid array .
∗
∗ @return t h e x co−o r d i n a t e
∗ @see g e t y
∗/
p r i v a t e i n t g e t x ( ) { r e t u r n ( i n t ) l o c . getX ( ) ; }
/∗ ∗
∗ R e t u r n s t h e y co−o r d i n a t e o f t h e a g e n t ’ s c u r r e n t
∗ p o s i t i o n , a s an i n t s o i t can be u s e d a s an
∗ index in the grid array .
∗
∗ @return t h e y co−o r d i n a t e
∗ @see g e t x
∗/
p r i v a t e i n t g e t y ( ) { r e t u r n ( i n t ) l o c . getY ( ) ; }
}
B.2 Results
The GridWorld problem can be said to be mathematically trivial, as the values can be calculated very
easily. Given the final reward r and the discount rate γ, the value v of a square n steps
away from the goal is given by the formula:

v = r × γ^n

Based on this formula, we can calculate the values of all the grid squares from their distance to
the goal square, as given in the table below.
Included below are some screenshots taken during the execution of the program. The screenshots
were taken before episodes 1, 25, 50, 100, 150, 200, 250, 500, 750, 1000, 1500, 2000 & 2500.
STEPS TO GOAL    VALUE
0                100
1                75
2                56.25
3                42.1875
4                31.6406
5                23.7305
6                17.7979
7                13.3484
8                10.0113
9                7.5085
10               5.6314
11               4.2235
12               3.1676

Table B.1: Predicted values for the GridWorld problem
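The table above can be reproduced directly from the formula; the short sketch below (illustrative only, using the prototype's r = 100 and γ = 0.75) prints the predicted value for each distance from the goal.

    // Prints the predicted GridWorld values v = r * gamma^n for n = 0..12,
    // using r = 100 and gamma = 0.75 as in the prototype.
    public class PredictedValues {
        public static void main(String[] args) {
            double r = 100.0, gamma = 0.75;
            for (int n = 0; n <= 12; n++) {
                System.out.printf("%2d steps to goal: %.4f%n", n, r * Math.pow(gamma, n));
            }
        }
    }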
Figures B.1–B.13: Screenshots of the GridWorld system before episodes 1, 25, 50, 100, 150, 200, 250, 500, 750, 1000, 1500, 2000 and 2500.
Appendix C
Blackjack AI
C.1 Source Code

C.1.1 Face

// Adding Face to SKCards
package SKCards;

/**
 * Defines an enum of face values. The values are 2-10 (the words),
 * Jack, Queen, King, Ace and Joker.
 *
 * @author SK
 * @version 2.0A
 * @since 1.0A
 * @see SKCards.Suit Suit values
 * @SK.bug none
 */
public enum Face {TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN,
                  JACK, QUEEN, KING, ACE, JOKER};

C.1.2 Suit

// Adding Suit to SKCards
package SKCards;

/**
 * Defines an enum of the playing card suits. The values are Diamonds, Clubs,
 * Hearts, Spades and Null for jokers.
 *
 * @author SK
 * @version 2.0A
 * @since 1.0A
 * @see SKCards.Face Face values
 * @SK.bug none
 */
public enum Suit {DIAMONDS, CLUBS, HEARTS, SPADES, NULL}
83
84
APPENDIX C. BLACKJACK AI
C.1.3
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
Card
/ / A d d i n g Card t o SKCards
package SKCards ;
/∗
∗ Importing the ArrayList class
∗ and t h e s w i n g / awt c o m p o n e n t s
∗/
import j a v a . u t i l . A r r a y L i s t ;
import j a v a . awt . ∗ ;
import j a v a x . s w i n g . ∗ ;
/∗ ∗
∗ Card r e p r e s e n t s a p l a y i n g c a r d and h a s two s a t e s − s u i t and f a c e v a l u e .
∗ S u i t i s d e f i n e d i n an enum and can be Diamonds , Clubs , H e a r t s , S p a d e s o r
∗ N u l l f o r J o k e r s . The f a c e v a l u e i s a l s o d e f i n e d i n and enum and i s t h e
∗ s t r i n g r e p r e s e n t a t i o n of the card ’ s f a c e value .
∗
∗ @author SK
∗ @version 2 . 0A
∗ @since 1 . 0 A
∗ @SK . bug none
∗/
p u b l i c c l a s s Card {
private Suit s u i t ;
p r i v a t e Face f a c e V a l u e ;
/∗ ∗
∗ C o n s t r u c t s a new c a r d w i t h t h e v a l u e s p a s s e d a s a r g u m e n t s .
∗ @param v t h e f a c e v a l u e o f t h e c a r d .
∗ @param s t h e s u i t o f t h e c a r d .
∗ @see SKCards . Face Face V a l u e s
∗ @see SKCards . S u i t S u i t V a u l e s
∗/
p u b l i c Card ( F a c e v , S u i t s ) {
faceValue = v ;
suit = s ;
}
/∗ ∗
∗ R e t u r n s t h e f a c e v a l u e o f t h e a c t i v e Card .
∗ @return t h e c o d e o f t h e Face v a l u e .
∗/
public String getValue (){
i f ( t h i s == n u l l ) {
return ” ” ;
}
C.1. SOURCE CODE
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
e l s e i f ( f a c e V a l u e ==
return ”2” ;
}
e l s e i f ( f a c e V a l u e ==
return ”3” ;
}
e l s e i f ( f a c e V a l u e ==
return ”4” ;
}
e l s e i f ( f a c e V a l u e ==
return ”5” ;
}
e l s e i f ( f a c e V a l u e ==
return ”6” ;
}
e l s e i f ( f a c e V a l u e ==
return ”7” ;
}
e l s e i f ( f a c e V a l u e ==
return ”8” ;
}
e l s e i f ( f a c e V a l u e ==
return ”9” ;
}
e l s e i f ( f a c e V a l u e ==
r e t u r n ”T” ;
}
e l s e i f ( f a c e V a l u e ==
return ” J ” ;
}
e l s e i f ( f a c e V a l u e ==
r e t u r n ”Q” ;
}
e l s e i f ( f a c e V a l u e ==
r e t u r n ”K” ;
}
e l s e i f ( f a c e V a l u e ==
r e t u r n ”A” ;
}
e l s e i f ( f a c e V a l u e ==
return ” Joker ” ;
}
else{
return ” ” ;
}
85
F a c e .TWO) {
F a c e . THREE) {
F a c e . FOUR) {
F a c e . FIVE ) {
F a c e . SIX ) {
F a c e . SEVEN) {
F a c e . EIGHT ) {
F a c e . NINE ) {
F a c e . TEN) {
F a c e . JACK ) {
F a c e . QUEEN) {
F a c e . KING ) {
F a c e . ACE) {
F a c e . JOKER ) {
}
/∗ ∗
∗ R e t u r n s t h e f a c e v a l u e o f t h e a c t i v e Card .
∗ @return t h e c o d e o f t h e S u i t v a l u e .
∗/
public String getSuit (){
86
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
APPENDIX C. BLACKJACK AI
i f ( t h i s == n u l l ) {
return ” ” ;
}
e l s e i f ( s u i t == S u i t . NULL) {
return ” ” ;
}
else{
r e t u r n ( new C h a r a c t e r ( s u i t . t o S t r i n g ( ) . c h a r A t ( 0 ) ) ) . t o S t r i n g ( ) ;
}
}
/∗ ∗
∗ G e t s t h e S t r i n g v a l u e o f t h e c a r d i n t t h e f o r m FS ( F a c e S u i t ) , e x c e p t t h e
∗ j o k e r . eg t h e 2 o f H e a r t s i s 2H, t h e 10 o f Diamonds i s TD , t h e King o f
∗ S p a d e s i s KS , t h e J o k e r i s J o k e r .
∗ @return t h e S t r i n g v a l u e o f t h e c a r d
∗/
public String t o S t r i n g (){
i f ( t h i s == n u l l ) {
return ” ” ;
}
e l s e i f ( f a c e V a l u e == F a c e . JOKER ) {
return ” Joker ” ;
}
else{
return ( getValue ( ) + g e t S u i t ( ) ) ;
}
}
/∗ ∗
∗ G e t s t h e image a s s o c i a t e d w i t h t h e g i v e n card , u s i n g t h e c a r d S e t p r o v i d e d .
∗ @param c s t h e c a r d S e t
∗ @return t h e c a r d ’ s image
∗/
public ImageIcon toImage ( S t r i n g cs ){
r e t u r n new I m a g e I c o n ( ” i m a g e s / c a r d s / ” + c s + ” / ”+ t o S t r i n g ( ) +” . j p g ” ) ;
}
/∗ ∗
∗ R e t u r n s t h e image o f t h e c a r d f a c e down , u s i n g t h e b a c k p r o v i d e d
∗ @param b t h e b a c k
∗ @return t h e c a r d ’ s b a c k image
∗/
public ImageIcon toBackImage ( S t r i n g b ){
r e t u r n new I m a g e I c o n ( ” i m a g e s / b a c k s / ” + b ) ;
}
C.1. SOURCE CODE
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
87
/∗ ∗
∗ P r i n t s out the d e s c r i p t i o n of the a c t i v e card .
∗/
public void p r i n t ( ) {
System . o u t . p r i n t l n ( t h i s . t o S t r i n g ( ) ) ;
}
/∗ ∗
∗ Returns t h e p o i n t value of t h e card f o r g e n e r a l uses . This methods c o u n t s
∗ Ace a s 1 , number c a r d s a t f a c e v a l u e , J a c k a s 1 1 , Queen a s 1 2 , King a s 13
∗ and J o k e r a s 0 .
∗ @return t h e p o i n t v a l u e o f t h e c a r d .
∗/
public int getPoints (){
i f ( f a c e V a l u e == F a c e . ACE) {
return 1;
}
e l s e i f ( f a c e V a l u e == F a c e .TWO) {
return 2;
}
e l s e i f ( f a c e V a l u e == F a c e . THREE) {
return 3;
}
e l s e i f ( f a c e V a l u e == F a c e . FOUR) {
return 4;
}
e l s e i f ( f a c e V a l u e == F a c e . FIVE ) {
return 5;
}
e l s e i f ( f a c e V a l u e == F a c e . SIX ) {
return 6;
}
e l s e i f ( f a c e V a l u e == F a c e . SEVEN) {
return 7;
}
e l s e i f ( f a c e V a l u e == F a c e . EIGHT ) {
return 8;
}
e l s e i f ( f a c e V a l u e == F a c e . NINE ) {
return 9;
}
e l s e i f ( f a c e V a l u e == F a c e . TEN) {
return 10;
}
e l s e i f ( f a c e V a l u e == F a c e . JACK ) {
return 11;
}
e l s e i f ( f a c e V a l u e == F a c e . QUEEN) {
return 12;
}
e l s e i f ( f a c e V a l u e == F a c e . KING ) {
88
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
APPENDIX C. BLACKJACK AI
return 13;
}
else{
return 0;
}
}
/∗ ∗
∗ Gets the p o i n t value of a fa c e value card .
∗ @param f a c e V a l u e t h e f a c e v a l u e o f t h e c a r d s .
∗/
p u b l i c s t a t i c i n t g e t P o i n t s ( Face f a c e V a l u e ){
i f ( f a c e V a l u e == F a c e . ACE) {
return 1;
}
e l s e i f ( f a c e V a l u e == F a c e .TWO) {
return 2;
}
e l s e i f ( f a c e V a l u e == F a c e . THREE) {
return 3;
}
e l s e i f ( f a c e V a l u e == F a c e . FOUR) {
return 4;
}
e l s e i f ( f a c e V a l u e == F a c e . FIVE ) {
return 5;
}
e l s e i f ( f a c e V a l u e == F a c e . SIX ) {
return 6;
}
e l s e i f ( f a c e V a l u e == F a c e . SEVEN) {
return 7;
}
e l s e i f ( f a c e V a l u e == F a c e . EIGHT ) {
return 8;
}
e l s e i f ( f a c e V a l u e == F a c e . NINE ) {
return 9;
}
e l s e i f ( f a c e V a l u e == F a c e . TEN) {
return 10;
}
e l s e i f ( f a c e V a l u e == F a c e . JACK ) {
return 11;
}
e l s e i f ( f a c e V a l u e == F a c e . QUEEN) {
return 12;
}
e l s e i f ( f a c e V a l u e == F a c e . KING ) {
return 13;
}
else{
return 0;
C.1. SOURCE CODE
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
}
}
/∗ ∗
∗ R e t u r n s t h e p o i n t s v a l u e o f t h e Card f o r B l a c k j a c k . T h i s
∗ method i s o n l y u s e d by t h e B l a c k J a c k p r o t o t y p e . The SKCards
∗ s y s t e m u s e s t h e B J R u l e s . g e t B J P o i n t s ( Card ) method , w h i c h i s
∗ almost i d e n t i c a l .
∗
∗ @return t h e B l a c k j a c k p o i n t s v a l u e o f t h e Card .
∗/
public int getBJPoints (){
int points = getPoints ( ) ;
i f ( p o i n t s == 10 | | p o i n t s == 11 | | p o i n t s == 12 | | p o i n t s == 1 3 ) {
return 10;
}
else{
return p o i n t s ;
}
}
}
C.1.4
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
89
Pack
/ / A d d i n g Pack t o SKCards
package SKCards ;
/∗
∗ Importing the ArrayList class
∗/
import j a v a . u t i l . A r r a y L i s t ;
/∗ ∗
∗ R e p r e s e n t s a p a c k o f c a r d s . A p a c k i s composed o f a number o f d e c k s , w i t h
∗ o r w i t h o u t j o k e r s , b o t h t o be s p e c i f i e d by t h e u s e r .
∗
∗ @author SK
∗ @version 2 . 0A
∗ @since 1 . 0 A
∗ @see SKCards . Card Card
∗ @SK . bug none
∗/
p u b l i c c l a s s Pack {
private final s t a t i c Suit [] s u i t s = Suit . values ( ) ;
p r i v a t e f i n a l s t a t i c Face [ ] v a l s = Face . v a l u e s ( ) ;
p r i v a t e A r r a y L i s t <Card> p a c k ;
p r o t e c t e d i n t decks , c u r r e n t C a r d ;
p r i v a t e boolean j o k e r s ;
90
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
APPENDIX C. BLACKJACK AI
/∗ ∗
∗ The d e f a u l t c o n s t r u c t e r . C r e a t e s a p a c k o f Cards , w i t h
∗ 1 d e c k and no j o k e r s
∗/
p u b l i c Pack ( ) {
this (1 , false ) ;
}
/∗ ∗
∗ C r e a t e s a new p a c k o f c a r d s . T h i s p a c k i s d e f i n e d by
∗ the parameters passed to i t .
∗
∗ @param d e c k s t h e number o f d e c k s i n t h e p a c k .
∗ @param j o k e r s w h e t h e r t h e d e c k s h a v e j o k e r s o r n o t .
∗ @see SKCards . Card # makeDeck
∗/
p u b l i c Pack ( i n t d , b o o l e a n j ) {
decks = d ; j o k e r s = j ;
p a c k = new A r r a y L i s t <Card > ( ) ;
A r r a y L i s t <Card> d e c k = new A r r a y L i s t <Card > ( ) ;
f o r ( i n t i = 1 ; i <= d e c k s ; i ++){
d e c k = makeDeck ( j o k e r s ) ;
deck = s h u f f l e ( deck ) ;
f o r ( Card c : d e c k ) {
p a c k . add ( c ) ;
}
}
p a c k = Pack . s h u f f l e ( p a c k ) ;
currentCard = 0;
}
/∗ ∗
∗ C r e a t e s a new o r d e r e d d e c k o f c a r d s , w i t h o r w i t h o u t j o k e r s d e p e n d i n g
∗ on t h e a r g u m e n t s p a s s e d .
∗ @param j o k e r s a b o o l e a n w h i c h d e t e r m i n e s i f t h e d e c k h a s j o k e s o r n o t .
∗ @return d e c k an A r r a y L i s t o f t h e c a r d s .
∗/
p u b l i c s t a t i c A r r a y L i s t <Card> makeDeck ( b o o l e a n j o k e r s ) {
A r r a y L i s t <Card> d e c k = new A r r a y L i s t <Card > ( ) ;
f o r ( i n t i = 0 ; i < s u i t s . l e n g t h − 1 ; i ++){
f o r ( i n t j = 0 ; j < v a l s . l e n g t h − 1 ; j ++){
C.1. SOURCE CODE
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
91
d e c k . add ( new Card ( v a l s [ j ] , s u i t s [ i ] ) ) ;
}
}
i f ( j o k e r s ){
d e c k . add ( new Card ( F a c e . JOKER , S u i t . NULL ) ) ;
d e c k . add ( new Card ( F a c e . JOKER , S u i t . NULL ) ) ;
}
return deck ;
}
/∗ ∗
∗ P r i n t s out a l l the cards in the deck .
∗/
public void p r i n t ( ) {
f o r ( Card c : p a c k ) {
c . print ();
}
}
/∗ ∗
∗ T a k e s a p a c k o f c a r d s and s h u f f l e s them .
∗ @param p a c k t h e p a c k o f c a r d s t o be s h u f f l e .
∗ @return t h e s h u f f l e d P a c k .
∗/
p u b l i c s t a t i c A r r a y L i s t <Card> s h u f f l e ( A r r a y L i s t <Card> p a c k ) {
A r r a y L i s t <Boolean > s h u f f l e d = new A r r a y L i s t <Boolean > ( ) ;
A r r a y L i s t <Card> s h u f f l e d P a c k = new A r r a y L i s t <Card > ( ) ;
f o r ( i n t i = 0 ; i < p a c k . s i z e ( ) ; i ++){
s h u f f l e d . add ( f a l s e ) ;
}
while ( s h u f f l e d . c o n t a i n s ( f a l s e ) ) {
Double numb = Math . random ( ) ;
numb ∗= 1 2 5 ;
i n t numbr = numb . i n t V a l u e ( ) ;
i n t num = numbr % p a c k . s i z e ( ) ;
i f ( ! s h u f f l e d . g e t ( num ) ) {
s h u f f l e d P a c k . add ( p a c k . g e t ( num ) ) ;
s h u f f l e d . s e t ( num , t r u e ) ;
}
92
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
APPENDIX C. BLACKJACK AI
}
return sh uffle dPac k ;
}
/∗ ∗
∗ R e g e n e r a t e s t h e Pack , by c r e a t i n g a new Pack . T h i s i s i d e n t i c a l t o t h e
∗ p r e v i o u s o n l y i n t h e number o f d e c k s and t h e p r e s e n c e o f j o k e r s , b u t
∗ t h e o r d e r o f t h e Cards i s d i f f e r e n t .
∗/
p r o t e c t e d v o i d newPack ( ) {
p a c k = new A r r a y L i s t <Card > ( ) ;
A r r a y L i s t <Card> d e c k = new A r r a y L i s t <Card > ( ) ;
f o r ( i n t i = 1 ; i <= d e c k s ; i ++){
d e c k = makeDeck ( j o k e r s ) ;
deck = s h u f f l e ( deck ) ;
f o r ( Card c : d e c k ) {
p a c k . add ( c ) ;
}
}
p a c k = Pack . s h u f f l e ( p a c k ) ;
currentCard = 0;
}
/∗ ∗
∗ G i v e s t h e n e x t Card i n t h e Pack . I f t h e Pack h a s b e e n u s e d
∗ up , a new Pack i s g e n e r e a t e d .
∗
∗ @return t h e n e x t Card
∗/
p u b l i c Card n e x t C a r d ( ) {
i f ( c u r r e n t C a r d >= s i z e ( ) ) {
newPack ( ) ;
}
c u r r e n t C a r d ++;
return pack . g e t ( c u r r e n t C a r d −1);
}
/∗ ∗
∗ R e t u r n s t h e s i z e o f t h e Pack
∗
∗ @return s i z e o f t h e Pack
∗/
public int size (){
C.1. SOURCE CODE
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
return pack . s i z e ( ) ;
}
/∗ ∗
∗ R e t u r n s t h e number o f d e c k s i n t h e Pack
∗
∗ @return t h e number o f d e c k s i n t h e Pack
∗/
public i n t decks ( ) {
return decks ;
}
/∗ ∗
∗ Returns a boolean to i n d i c a t e i f t h e r e are
∗ J o k e r s i n t h e Pack o r n o t
∗
∗ @rerurn t r u e i f t h e r e a r e J o e r s
∗ @return f a l s e i f t h e r e a r e no J o k e r s
∗/
public boolean j o k e r s ( ) {
return j o k e r s ;
}
}
C.1.5
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
93
Blackjack
/ / A d d i n g B l a c k J a c k t o SKCards
package SKCards ;
/∗
∗ I m p o r t i n g t h e s w i n g / awt c o m p o n e n t s and t h e e v e n t m o d e l s
∗ and t h e I / O c l a s s e s
∗/
import j a v a . awt . ∗ ;
import j a v a . awt . e v e n t . ∗ ;
import j a v a x . s w i n g . ∗ ;
import j a v a x . s w i n g . e v e n t . ∗ ;
import j a v a . i o . ∗ ;
/∗ ∗
∗ B l a c k j a c k game w i t h T e m p o r a l D i f f e r e n c e AI .
∗/
p u b l i c c l a s s B l a c k J a c k e x t e n d s JFrame {
/∗ ∗
∗ T h i s c l a s s r e p r e s e n t s a p l a y e r i n t h e B l a c k J a c k game . I t can be e i t h e r a human
∗ p l a y e r or a R e i n f o r c e m e n t Learning player , u s i n g Temporal D i f f e r a n c e .
∗ @author S a q i b A K a k v i ( 3 3 1 0 6 0 8 7 )
∗ @version 1 . 0
94
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
APPENDIX C. BLACKJACK AI
∗ @rubish y e s i t i s
∗/
private c l a s s BPlayer {
/ / ACTIONS
p r i v a t e s t a t i c f i n a l i n t HIT = 1 ;
p r i v a t e s t a t i c f i n a l i n t STAND = 0 ;
p r i v a t e s t a t i c f i n a l i n t DEAL = −1;
/ / CONSANTS
p r i v a t e f i n a l d o u b l e TAU = 0 . 5 ;
p r i v a t e f i n a l double G = 0 . 7 5 ;
p r i v a t e f i n a l double A = 0 . 6 ;
/ / t h e p l a y e r ’ s hand
p r i v a t e Card [ ] hand ;
/ / the player ’ s funds
private int funds ;
/ / t h e p l a y e r s s c o r e s c o u n t i n g low / h i g h when a p p l i c a b l e
private int score , hscore ;
/ / t h e v a l u e s a t t a c h e d o f t h e s t a t e −a c t i o n p a i r s
p r i v a t e double [ ] [ ] v a l ;
/ / ∗∗∗∗∗ TESTING ∗∗∗∗∗∗ t h e number o f t i m e s t h e a g e n t t a k e s a c e r t a i n a c t i o n
private int [ ] [ ] count ;
/ / t h e p l a y e r s l a s t a c t i o n and s c o r e
private int lastaxn , l a s t s c o r e ;
/ / f l a g s t o i n d i c a t e i f t h e p l a y e r i s p l a y i n g an a c e a s h i g h , i s a g r r e d y a g e n
/ / or i s p l a y i n g a f i x e d h e u r i s t i c p o l i c y
boolean high , greedy , f i x e d ;
/ / ∗∗∗∗∗ TESTING ∗∗∗∗∗∗ w r i t i n g a l l t h e a c t i o n d a t a t o f i l e
BufferedWriter p ;
/∗ ∗
∗ Constructor .
∗/
public BPlayer ( ){
hand = new Card [ 5 ] ;
funds = 1000000;
score = 0;
hscore = 10;
v a l = new d o u b l e [ 2 ] [ 2 1 ] ;
c o u n t = new i n t [ 2 ] [ 2 1 ] ;
l a s t a x n = DEAL ;
l a s t s c o r e = 0;
greedy = f a l s e ;
fixed = false ;
try {
p = new B u f f e r e d W r i t e r ( new F i l e W r i t e r ( ” p l a y e r d u m p . c s v ” ) ) ;
p . w r i t e ( ”SCORE ” ) ;
p . w r i t e ( ”HITPROB ” ) ;
p . w r i t e ( ”STANDPROB ” ) ;
p . w r i t e ( ”RNDM ” ) ;
p . w r i t e ( ”AXN ” ) ;
C.1. SOURCE CODE
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
95
p . newLine ( ) ;
}
c a t c h ( E x c e p t i o n ex ) {
ex . p r i n t S t a c k T r a c e ( ) ;
}
}
/∗ ∗
∗ T h i s method u s e s t h e s o f t m a x method o f e x p l o r a t i o n t o p i c k an a c t i o n i n t h
∗ current s t a t e . A f t e r the action i s picked , the previous s t a t e i s updated b
∗ on t h e Temporal−D i f f e r e n c e Q−L e a r n i n g a l g o r i t h m .
∗/
private void e x p l o r e ( ) {
greedy = f a l s e ;
fixed = false ;
i f ( h i g h && h s c o r e <= 2 1 ) {
score = hscore ;
}
e l s e i f ( h i g h && h s c o r e > 21 && s c o r e == h s c o r e ) {
high = f a l s e ;
s c o r e −= 1 0 ;
}
d o u b l e h i t = Math . exp ( v a l [ HIT ] [ s c o r e − 1 ] /TAU ) ;
d o u b l e s t a n d = Math . exp ( v a l [STAND ] [ s c o r e − 1 ] /TAU ) ;
d o u b l e sum = h i t + s t a n d ;
d o u b l e h p r o b = h i t / sum ;
d o u b l e s p r o b = s t a n d / sum ;
d o u b l e maxval = Math . max ( v a l [STAND ] [ s c o r e −1] , v a l [ HIT ] [ s c o r e − 1 ] ) ;
i f ( l a s t a x n ! = DEAL) {
d o u b l e r d = A ∗ ( ( G ∗ maxval ) − v a l [ l a s t a x n ] [ l a s t s c o r e − 1 ] ) ;
v a l [ l a s t a x n ] [ l a s t s c o r e −1] += r d ;
}
d o u b l e r = Math . random ( ) ;
lastscore = score ;
try {
p.
p.
p.
p.
if
}
write ( Integer . toString ( score ) ) ; p . write ( ” ” ) ;
w r i t e ( Double . t o S t r i n g ( h p r o b ) ) ; p . w r i t e ( ” ” ) ;
w r i t e ( Double . t o S t r i n g ( s p r o b ) ) ; p . w r i t e ( ” ” ) ;
w r i t e ( Double . t o S t r i n g ( r ) ) ; p . w r i t e ( ” ” ) ;
( r > hprob ){
stand ( ) ;
l a s t a x n = STAND ;
p . w r i t e ( ”STAND” ) ;
96
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
APPENDIX C. BLACKJACK AI
else{
hit ();
l a s t a x n = HIT ;
p . w r i t e ( ” HIT ” ) ;
}
p . newLine ( ) ;
}
c a t c h ( E x c e p t i o n ex ) {
ex . p r i n t S t a c k T r a c e ( ) ;
}
c o u n t [ l a s t a x n ] [ l a s t s c o r e −1]++;
}
/∗ ∗
∗ T h i s method p r o v i d e s a g r e e d y a g e n t . T h i s a g e n t i s d e f e n s i v e i n i t b e h a v i o
∗ a s i t s t a n d s when t h e v a l u e o f h i t t i n g i s e q u a l t o t h e v a l u e o f s t a n d i n g .
∗/
private void greedy ( ) {
greedy = true ;
fixed = false ;
i f ( h i g h && h s c o r e <= 2 1 ) {
score = hscore ;
}
e l s e i f ( h i g h && h s c o r e > 21 && s c o r e == h s c o r e ) {
high = f a l s e ;
s c o r e −= 1 0 ;
}
lastscore = score ;
i f ( v a l [ HIT ] [ s c o r e −1] > v a l [STAND ] [ s c o r e −1]){
hit ();
l a s t a x n = HIT ;
}
else{
stand ( ) ;
l a s t a x n = STAND ;
}
c o u n t [ l a s t a x n ] [ l a s t s c o r e −1]++;
}
/∗ ∗
∗ T h i s method p r o v i d e s a f i x e d h e u r i s t i c a g e n t f o r t h e R e i n f o r c e m e n t L e a r n i n
∗ agent to learn against .
∗/
private void f i x e d ( ) {
greedy = f a l s e ;
fixed = true ;
C.1. SOURCE CODE
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
97
i f ( h i g h && h s c o r e <= 2 1 ) {
score = hscore ;
}
e l s e i f ( h i g h && h s c o r e > 21 && s c o r e == h s c o r e ) {
high = f a l s e ;
s c o r e −= 1 0 ;
}
i f ( s c o r e < 17){
hit ();
}
else{
stand ( ) ;
}
}
/∗ ∗
∗ T h i s method i s u s e d t o a l l l o c a t e f i n a l r e w a r d s t o t h e l a s t s t a t e v i s i t e d .
∗ a g e n t s do n o t r e c i e v e r e w a r d s , a s t h e y a r e n o t l e a r n i n g . T h i s r e w a r d f u n c t
∗ b a s e d on t h e T e m p o r a l D i f f e r a n c e Q−L e a r n i n g a l g o r i t h m
∗/
p r i v a t e void reward ( double r ){
i f ( greedy | | fixed ){
return ;
}
d o u b l e maxval = Math . max ( v a l [STAND ] [ l a s t s c o r e −1] , v a l [ HIT ] [ l a s t s c o r e − 1 ] ) ;
i f ( l a s t a x n ! = DEAL) {
d o u b l e r d = A ∗ ( r + (G ∗ maxval ) − v a l [ l a s t a x n ] [ l a s t s c o r e − 1 ] ) ;
v a l [ l a s t a x n ] [ l a s t s c o r e −1] += r d ;
}
}
}
    // the playing agents
    private BPlayer player, dealer;
    // the pack of cards
    private Pack pack;
    // the players' hands
    private JLabel[] pcards, dcards;
    // display of the user's funds
    private JLabel fundsLabel;
    // command buttons
    private JButton hit, stand, betButton, clear;
    // panels used to display user components
    private JPanel buttonPanel, playerPanel, dealerPanel;
    // the content pane
    private Container cp;
    // text field for user to enter amount to bet
    private JTextField betField;
    // amount the user has bet
    private int bet;
    // used to indicate current position in the player's hand
    private int currentCard;
    // flag to indicate if the player is playing or not
    private boolean play;
    // *** TEST *** stores number of times each result is recorded.
    private int[] result;
    // *** TEST *** stores number of times each score is dealt.
    private int[] deals;
    // *** TEST *** records the scores of each player when they draw.
    private BufferedWriter push;
    // *** TEST *** stores number of times the player busts at each score.
    private int[] busts;
    /**
     * The main method. Creates a new BlackJack interface.
     * @param args not used
     */
    public static void main(String[] args){ new BlackJack(); }
    /**
     * Sole constructor. Creates the interface and the players. Begins the learning
     * for the agent(s).
     */
    public BlackJack(){
        super("BlackJack");
        setSize(600, 600);
        Dimension d = (Toolkit.getDefaultToolkit()).getScreenSize();
        int h = (int)(d.getHeight());
        int w = (int)(d.getWidth());
        int x = (w - 600) / 2;
        int y = (h - 600) / 2;
        setLocation(x, y);
        setDefaultCloseOperation(EXIT_ON_CLOSE);
        pack = new Pack();
        player = new BPlayer();
        dealer = new BPlayer();
        play = true;
        result = new int[6];
        deals = new int[21];
        busts = new int[21];
        currentCard = 0;
        player.score = 0;
        try {
            push = new BufferedWriter(new FileWriter("push.txt"));
        }
        catch (Exception ex){
            ex.printStackTrace();
        }
C.1. SOURCE CODE
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
99
        cp = getContentPane();
        cp.setLayout(new BorderLayout());
        hit = new JButton("HIT");
        hit.addMouseListener(new MouseAdapter(){
            public void mouseClicked(MouseEvent e){
                hit();
            }
        });
        hit.setEnabled(false);
        stand = new JButton("STAND");
        stand.addMouseListener(new MouseAdapter(){
            public void mouseClicked(MouseEvent e){
                stand();
            }
        });
        stand.setEnabled(false);
        betField = new JTextField("10");
        betButton = new JButton("BET");
        betButton.addMouseListener(new MouseAdapter(){
            public void mouseClicked(MouseEvent e){
                bet();
            }
        });
        clear = new JButton("CLEAR");
        clear.addMouseListener(new MouseAdapter(){
            public void mouseClicked(MouseEvent e){
                clear();
            }
        });
        buttonPanel = new JPanel();
        buttonPanel.setLayout(new GridLayout(5, 1));
        buttonPanel.add(hit);
        buttonPanel.add(stand);
        buttonPanel.add(betField);
        buttonPanel.add(betButton);
        buttonPanel.add(clear);
        pcards = new JLabel[5];
        dcards = new JLabel[5];
        fundsLabel = new JLabel(Integer.toString(player.funds));
        playerPanel = new JPanel();
        playerPanel.setLayout(new GridLayout(1, 6));
        dealerPanel = new JPanel();
        dealerPanel.setLayout(new GridLayout(1, 6));
        for (int i = 0; i < 5; i++){
            pcards[i] = new JLabel();
            dcards[i] = new JLabel();
            playerPanel.add(pcards[i]);
            dealerPanel.add(dcards[i]);
        }
        playerPanel.add(fundsLabel);
        dealerPanel.add(new JLabel());
        cp.add(dealerPanel, BorderLayout.NORTH);
        cp.add(buttonPanel, BorderLayout.EAST);
        cp.add(playerPanel, BorderLayout.SOUTH);
        setVisible(true);
        int c = 0;
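        // Training and evaluation schedule: the learning agent explores for
        // 1,000, then 9,000, then a further 40,000 hands against the fixed
        // dealer, with a results snapshot written after each phase (write(1)
        // to write(3)). The counters are then reset and 5,000 hands are played
        // with the purely greedy policy as an evaluation run (write(4)).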
        for (int i = 0; i < 1000; i++){
            bet();
            while (play){
                player.explore();
            }
            while (!play){
                dealer.fixed();
            }
            System.out.println("played " + c);
            c++;
        }
        write(1);
        for (int i = 0; i < 9000; i++){
            bet();
            while (play){
                player.explore();
            }
            while (!play){
                dealer.fixed();
            }
            System.out.println("played " + c);
            c++;
        }
        write(2);
        for (int i = 0; i < 40000; i++){
            bet();
            while (play){
                player.explore();
            }
            while (!play){
C.1. SOURCE CODE
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
101
                dealer.fixed();
            }
            System.out.println("played " + c);
            c++;
        }
        write(3);
        result = new int[6];
        busts = new int[21];
        player.count = new int[2][21];
        for (int i = 0; i < 5000; i++){
            bet();
            while (play){
                player.greedy();
            }
            while (!play){
                dealer.fixed();
            }
            System.out.println("played " + c);
            c++;
        }
        write(4);
        writecount();
        try {
            push.flush();
            push.close();
            player.p.flush();
            player.p.close();
        }
        catch (Exception ex){
            ex.printStackTrace();
        }
    }
    /**
     * ***** TESTING METHOD ***** <br />
     * Writes all the data collected to .csv files, which can be imported into a
     * spreadsheet package for further data analysis.
     * @param x the number tag for the file.
     */
    private void write(int x){
        try {
            BufferedWriter bw = new BufferedWriter(new FileWriter("results" + x + ".csv"));
            bw.write("Player Bust");
            bw.write("\t");
            bw.write("Dealer Bust");
            bw.write("\t");
            bw.write("Both Bust");
            bw.write("\t");
            bw.write("Player wins");
            bw.write("\t");
            bw.write("Player Loses");
            bw.write("\t");
            bw.write("Push");
            bw.newLine();
            for (int i = 0; i < result.length; i++){
                bw.write(Integer.toString(result[i]));
                bw.write("\t");
            }
            bw.flush();
            bw.close();
            bw = new BufferedWriter(new FileWriter("pvals" + x + ".csv"));
            bw.write("score");
            bw.write("\t");
            bw.write("stand");
            bw.write("\t");
            bw.write("hit");
            bw.write("\t");
            bw.newLine();
            for (int i = 0; i < player.val[0].length; i++){
                bw.write(Integer.toString(i));
                bw.write("\t");
                for (int j = 0; j < player.val.length; j++){
                    bw.write(Double.toString(player.val[j][i]));
                    bw.write("\t");
                }
                bw.newLine();
            }
            bw.flush();
            bw.close();
            bw = new BufferedWriter(new FileWriter("dvals" + x + ".csv"));
            bw.write("score");
            bw.write("\t");
            bw.write("stand");
            bw.write("\t");
            bw.write("hit");
            bw.write("\t");
            bw.newLine();
            for (int i = 0; i < player.count[0].length; i++){
                bw.write(Integer.toString(i));
                bw.write("\t");
                for (int j = 0; j < player.count.length; j++){
                    bw.write(Double.toString(player.count[j][i]));
C.1. SOURCE CODE
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
103
                    bw.write("\t");
                }
                bw.newLine();
            }
            bw.flush();
            bw.close();
            bw = new BufferedWriter(new FileWriter("busts" + x + ".csv"));
            for (int i = 0; i < busts.length; i++){
                bw.write(Integer.toString(i));
                bw.write("\t");
                bw.write(Integer.toString(busts[i]));
                bw.newLine();
            }
            bw.flush();
            bw.close();
        }
        catch (Exception ex){
            ex.printStackTrace();
        }
    }
    public void writecount(){
        try {
            BufferedWriter bw = new BufferedWriter(new FileWriter("count.txt"));
            for (int i = 0; i < 2; i++){
                for (int j = 0; j < 21; j++){
                    bw.write(new Integer(player.count[j][i]).toString());
                    bw.write(" ");
                }
                bw.newLine();
            }
            bw.flush();
            bw.close();
        }
        catch (Exception ex){
        }
    }
    /**
     * This method is used to update the GUI to the most current state. Every call of
     * this method redraws all elements on screen and then repaints the JFrame.
     * @see clear
     */
    private void refresh(){
        if (play){
            for (int i = 0; i < 5; i++){
                if (player.hand[i] != null){
                    pcards[i].setIcon(player.hand[i].toImage("Normal"));
                }
            }
            dcards[0].setIcon(dealer.hand[0].toImage("Normal"));
            dcards[1].setIcon(dealer.hand[1].toBackImage("Trickster"));
        }
        else{
            for (int i = 0; i < 5; i++){
                if (dealer.hand[i] != null){
                    dcards[i].setIcon(dealer.hand[i].toImage("Normal"));
                }
            }
        }
        fundsLabel.setText(Integer.toString(player.funds));
        ((JComponent)(cp)).revalidate();
        repaint();
    }
    /**
     * Bets money for the player. The amount to be bet is parsed from betField. betField
     * is set to 10 by default. This is so the AI does not have to be made to bet, as it
     * can simply call the method.
     */
    private void bet(){
        bet = Integer.parseInt(betField.getText());
        if (bet <= player.funds){
            player.funds -= bet;
            deal();
            hit.setEnabled(true);
            stand.setEnabled(true);
            betButton.setEnabled(false);
        }
        refresh();
    }
    /**
     * Deals a new hand. The player/dealer hands are reinitialised and the pointers are
     * all reset. Both scores are then updated.
     * @see hit
     * @see stand
     */
    private void deal(){
        dealer.hand[0] = pack.nextCard();
        dealer.score += dealer.hand[0].getBJPoints();
        if (dealer.hand[0].getValue().equals("A")){
            dealer.high = true;
            dealer.hscore = dealer.score + 10;
C.1. SOURCE CODE
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
105
        }
        dealer.hand[1] = pack.nextCard();
        dealer.score += dealer.hand[0].getBJPoints();
        if (dealer.hand[1].getValue().equals("A")){
            dealer.high = true;
            dealer.hscore = dealer.score + 10;
        }
        dealer.lastaxn = BPlayer.DEAL;
        if (dealer.high && dealer.hscore <= 21){
            dealer.score = dealer.hscore;
        }
        player.hand[0] = pack.nextCard();
        player.score += player.hand[0].getBJPoints();
        if (player.hand[0].getValue().equals("A")){
            player.high = true;
            player.hscore = player.score + 10;
        }
        player.hand[1] = pack.nextCard();
        player.score += player.hand[1].getBJPoints();
        if (player.hand[1].getValue().equals("A")){
            player.high = true;
            player.hscore = player.score + 10;
        }
        if (player.high && player.hscore <= 21){
            player.score = player.hscore;
        }
        deals[player.score - 1]++;
        player.lastaxn = BPlayer.DEAL;
        currentCard = 2;
    }
    /**
     * This method adds a Card to the active BPlayer's hand. The pointer is moved to the
     * next Card. The player is forced to stand if they exceed a score of 21 or have five
     * cards.
     * @see deal
     * @see stand
     */
    private void hit(){
        if (play){
            player.hand[currentCard] = pack.nextCard();
            player.score += player.hand[currentCard].getBJPoints();
            if (player.hand[currentCard].getValue().equals("A")){
                player.high = true;
                player.hscore += player.hand[currentCard].getBJPoints();
            }
            refresh();
            currentCard++;
            if (player.score > 21 || currentCard == 5){
                stand();
            }
        }
        else{
            dealer.hand[currentCard] = pack.nextCard();
            dealer.score += dealer.hand[currentCard].getBJPoints();
            if (dealer.hand[currentCard].getValue().equals("A")){
                dealer.high = true;
                dealer.hscore += dealer.hand[currentCard].getBJPoints();
            }
            refresh();
            currentCard++;
            if (dealer.score > 21 || currentCard == 5){
                stand();
            }
        }
    }
    /**
     * Stands for the agent that called the method. If it is the player, then the play
     * is passed on to the dealer. If it is the dealer, then the result is calculated
     * and the hand ends.
     * @see deal
     * @see hit
     */
    private void stand(){
        if (play){
            play = false;
            currentCard = 2;
            refresh();
        }
        else{
            if (player.score > 21 && dealer.score <= 21){
                // player bust
                player.reward(-20);
                result[0]++;
                busts[player.lastscore - 1]++;
            }
            else if (player.score <= 21 && dealer.score > 21){
                // dealer bust
                player.funds += bet * 2;
                player.reward(10);
                result[1]++;
            }
            else if (player.score > 21 && dealer.score > 21){
                // dealer and player bust
                player.reward(-10);
                result[2]++;
                busts[player.lastscore - 1]++;
            }
            else if (player.score > dealer.score){
                // player wins
                player.funds += bet * 2;
                player.reward(10);
                result[3]++;
C.1. SOURCE CODE
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
107
            }
            else if (player.score < dealer.score){
                // player loses
                player.reward(-10);
                result[4]++;
            }
            else{
                // push
                player.funds += bet;
                player.reward(5);
                result[5]++;
                try {
                    push.write(Integer.toString(player.score));
                    push.write(" ");
                    push.write(Integer.toString(dealer.score));
                    push.newLine();
                }
                catch (Exception ex){
                    ex.printStackTrace();
                }
            }
            bet = 0;
            clear();
            play = true;
        }
    }
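    // Reward scheme used in stand(): -20 when only the player busts, +10 when
    // the dealer busts or the player wins, -10 when both bust or the player
    // loses, and +5 for a push; a win returns double the bet to the player's
    // funds and a push returns the original stake.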
    /**
     * Clears all GUI elements and resets all values to their initial values.
     * @see refresh
     */
    private void clear(){
        playerPanel.removeAll();
        dealerPanel.removeAll();
        for (int i = 0; i < 5; i++){
            player.hand[i] = null;
            dealer.hand[i] = null;
            pcards[i] = new JLabel();
            dcards[i] = new JLabel();
            playerPanel.add(pcards[i]);
            dealerPanel.add(dcards[i]);
        }
        playerPanel.add(fundsLabel);
        bet = 0;
        currentCard = 0;
        player.score = 0;
        dealer.score = 0;
        dealer.high = false;
        player.high = false;
        fundsLabel.setText((new Integer(player.funds)).toString());
        betButton.setEnabled(true);
        hit.setEnabled(false);
        stand.setEnabled(false);
        repaint();
    }
}
Appendix D
Final System
D.1 Source Code
D.1.1 Face
See Appendix C.1.1
D.1.2 Suit
See Appendix C.1.2
D.1.3 Card
See Appendix C.1.3
D.1.4 Pack
See Appendix C.1.4
D.1.5 Shoe
// Adding Shoe to the SKCards package
package SKCards;
/**
 * This object represents a Dealer's shoe. This is a number of decks
 * of cards in a closed container. There is a special cut card, which
 * is inserted randomly into the Shoe. When this is dealt, the shoe is
 * discarded and a new one is used. Shoes are used to counter Card counting.
 *
 * @author Saqib A Kakvi
 * @version 2.0A
 * @since 2.0A
 * @see SKCards.Pack Pack
 * @SK.bug none
 */
public class Shoe extends Pack {
    // the position of the cutoff
int cutoff ;
/ / t h e s i z e o f t h e Shoe
int size ;
/∗ ∗
∗ C o n s t r u c t o r . C r e a t e s a p a c k w i t h t h e s p e c i f i e d p a r a m e t e r s by c a l l i n g
∗ s u p e r ( d , j ) . The c u t o f f Card ’ s p o s i t i o n i s c a l c u l a t e d and i s s t o r e d .
∗ No e x t r a Card i s added t o t h e Pack , s o t h e s i z e r e m a i n s t h e same .
∗
∗ @param d t h e number o f d e c k s
∗ @param j i f t h e r e a r e j o k e r s i n t h e d e c k
∗/
p u b l i c Shoe ( i n t d , b o o l e a n j ) {
super ( d , j ) ;
s i z e = super . s i z e ( ) ;
calculateCutoff ();
}
/∗ ∗
∗ C a l c u l a t e s t h e c u t o f f Card p o s i t i o n . The Card w i l l be i n t h e r a n g e o f
∗ 0−85% o f t h e Shoe s i z e . Hence a maximum o f o f 85% o f t h e Pack i s
∗ played .
∗/
private void c a l c u l a t e C u t o f f ( ) {
c u t o f f = new Double ( Math . random ( ) ∗ 100 ∗ 0 . 8 5 ) . i n t V a l u e ( ) ∗ s i z e ;
}
/∗ ∗
∗ C h e c k s i f t h e c u r r e n t Card i s t h e c u t o f f Card . I f i t i s t h e n a new Pack
∗ i s c r e a t e d and t h e c u t o f f Card ’ s p o s i t i o n i s r e c a l c u l a t e d . Then t h e
∗ method r e t u r n s t h e Card f r o m s u p e r . n e x t C a r d ( ) .
∗
∗ @ o v e r r i d e SKCards . Pack n e x t C a r d
∗ @return t h e n e x t Card i n t h e Shoe
∗/
p u b l i c Card n e x t C a r d ( ) {
i f ( c u r r e n t C a r d == c u t o f f ) {
newPack ( ) ;
calculateCutoff ();
}
return super . nextCard ( ) ;
}
}
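As a brief illustration (not part of the SKCards listings; the deck count and joker flag below are arbitrary values chosen for the example), a Shoe is used exactly like a Pack:

Shoe shoe = new Shoe(6, false);  // six decks, no jokers
Card next = shoe.nextCard();     // deals until the cut card is reached, then a new pack is built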
D.1.6 BJPlayer
// Adding Player to SKCards
package SKCards;
D.1. SOURCE CODE
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
/*
 * Importing the ArrayList class
 * and the Arrays class
 * and the swing/awt components
 */
import java.util.ArrayList;
import java.util.Arrays;
import java.awt.*;
import javax.swing.*;
/**
 * This is a player in the BlackJack game
 *
 * @author SK
 * @version 2.0A
 * @since 1.0C
 */
public class BJPlayer extends Player {
    protected int score, hscore;
    protected Card[] hand;
    protected BJRules rules;
    boolean high;
    /**
     * Creates a new player object with the given details.
     *
     * @param id the player name
     * @param cash the amount of money the player starts with
     * @param setPack the pack of cards to be used in the game
     * @param bck the card back the player uses
     * @param cs the card set used by the player
     */
    public BJPlayer(String id, int cash, String bck, String cs,
            BJRules r){
        rules = r;
        funds = cash;
        name = id;
        back = bck;
        cardSet = cs;
        high = false;
        score = 0;
        hscore = 10;
        hand = new Card[5];
    }
    /**
     * Basic constructor. Simply takes in the rules. Used for
     * BJRLPlayer
     *
     * @param r the rules
     */
    public BJPlayer(BJRules r){
        rules = r;
    }
    /**
     * Creates a new hand of cards for the current player and one for the computer
     * and then deals two cards to each hand
     */
    public void newHand(){
        hand = new Card[5];
        score = 0;
    }
    /**
     * Hits the player's hand
     *
     * @param currentCard the current card
     * @param nextCard the card to be added
     */
    public int hit(int currentCard, Card nextCard){
        hand[currentCard] = nextCard;
        score += rules.getBJPoints(hand[currentCard]);
        if (hand[currentCard].getValue().equals("A")){
            high = true;
            hscore += rules.getBJPoints(hand[currentCard]);
        }
        if (high && hscore <= 21){
            score = hscore;
        }
        else if (high && hscore > 21){
            if (hscore == score){
                score -= 10;
            }
            high = false;
        }
        return score;
    }
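    // Note on the soft-hand bookkeeping above: 'high' marks a hand containing an
    // ace counted high, hscore holds that higher total, and once the high total
    // exceeds 21 the hand falls back to the hard score and the flag is cleared.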
    /**
     * Returns the player's current score
     *
     * @return score
     */
D.1. SOURCE CODE
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
113
    public int getScore(){
        return score;
    }
    /**
     * Ends the current hand
     */
    public void endHand(){
        int comp = 0;
        int player = 0;
        System.out.println("player -->" + player);
        System.out.println("computer -->" + comp);
        if (comp > 0 || player > 0){
            if (comp > player){
                JOptionPane.showMessageDialog(null, "HOUSE WINS!!!");
                rules.display();
            }
            else if (player > comp){
                JOptionPane.showMessageDialog(null, "PLAYER WINS!!!");
                funds += bet;
                rules.display();
            }
            else if (comp == player){
                JOptionPane.showMessageDialog(null, "PUSH!!!");
                funds += bet;
                rules.display();
            }
            else{
                JOptionPane.showMessageDialog(null, "NO RESULT FOUND");
            }
        }
        else{
            JOptionPane.showMessageDialog(null, "NO RESULT FOUND");
            rules.display();
        }
    }
    /**
     * Clears all the hands and clears the interface
     */
    public void clear(){
        for (int i = 0; i < hand.length; i++){
            hand[i] = null;
        }
        rules.clear();
    }
    /**
     * Gets the player's current hands
     * @return the player's hands
     */
    public Card[] getHand(){
        return hand;
    }
}
D.1.7 BJUI
// adding BJUI to SKCards
package SKCards;
/*
 *
 */
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
import javax.swing.event.*;
/**
 * This is the User Interface for the Blackjack Game.
 * It allows the human player to access the functions
 * and shows the current state of the game.
 *
 * @author SK
 * @version 2.0A
 * @since 2.0A
 * @SK.bug none
 */
public class BJUI extends JFrame implements UI {
    // the players' hands
    private JLabel[] pcards, dcards;
    // display of the user's funds
    private JLabel fundsLabel;
    // command buttons
    private JButton hit, stand, bet, clear, exit;
    // panels used to display user components
    private JPanel buttonPanel, playerPanel, dealerPanel;
    // the content pane
    private Container cp;
    // text field for user to enter amount to bet
    private JTextField betField;
    // the game rules
    BJRules rules;
    /**
     * Creates a new UI. Takes in the rules for the current game.
     * This allows the two objects to communicate.
D.1. SOURCE CODE
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
     *
     * @param r the Rules for the game
     */
    public BJUI(BJRules r){
        super("BlackJack");
        setSize(600, 600);
        Dimension d = (Toolkit.getDefaultToolkit()).getScreenSize();
        int h = (int)(d.getHeight());
        int w = (int)(d.getWidth());
        int x = (w - 600) / 2;
        int y = (h - 600) / 2;
        setDefaultCloseOperation(DO_NOTHING_ON_CLOSE);
        setLocation(x, y);
        rules = r;
        cp = getContentPane();
        cp.setLayout(new BorderLayout());
        hit = new JButton("HIT");
        hit.addMouseListener(new MouseAdapter(){
            public void mouseClicked(MouseEvent e){
                rules.hit();
            }
        });
        hitEnabled(false);
        stand = new JButton("STAND");
        stand.addMouseListener(new MouseAdapter(){
            public void mouseClicked(MouseEvent e){
                rules.stand();
            }
        });
        standEnabled(false);
        betField = new JTextField("10");
        bet = new JButton("BET");
        bet.addMouseListener(new MouseAdapter(){
            public void mouseClicked(MouseEvent e){
                rules.bet(Integer.parseInt(betField.getText()));
            }
        });
        betEnabled(true);
        clear = new JButton("CLEAR");
        clear.addMouseListener(new MouseAdapter(){
            public void mouseClicked(MouseEvent e){
                clear();
            }
});
e x i t = new J B u t t o n ( ” EXIT ” ) ;
e x i t . a d d M o u s e L i s t e n e r ( new MouseAdapter ( ) {
p u b l i c v o i d m o u s e C l i c k e d ( MouseEvent e ) {
BJUI . t h i s . d i s p o s e ( ) ;
}
});
b u t t o n P a n e l = new J P a n e l ( ) ;
b u t t o n P a n e l . s e t L a y o u t ( new G r i d L a y o u t ( 6 , 1 ) ) ;
buttonPanel
buttonPanel
buttonPanel
buttonPanel
buttonPanel
buttonPanel
. add ( h i t ) ;
. add ( s t a n d ) ;
. add ( b e t F i e l d ) ;
. add ( b e t ) ;
. add ( c l e a r ) ;
. add ( e x i t ) ;
p c a r d s = new J L a b e l [ 5 ] ;
d c a r d s = new J L a b e l [ 5 ] ;
f u n d s L a b e l = new J L a b e l ( ) ;
playerPanel
playerPanel
dealerPanel
dealerPanel
= new J P a n e l ( ) ;
. s e t L a y o u t ( new G r i d L a y o u t ( 1 , 6 ) ) ;
= new J P a n e l ( ) ;
. s e t L a y o u t ( new G r i d L a y o u t ( 1 , 6 ) ) ;
f o r ( i n t i = 0 ; i < 5 ; i ++){
p c a r d s [ i ] = new J L a b e l ( ) ;
d c a r d s [ i ] = new J L a b e l ( ) ;
p l a y e r P a n e l . add ( p c a r d s [ i ] ) ;
d e a l e r P a n e l . add ( d c a r d s [ i ] ) ;
}
p l a y e r P a n e l . add ( f u n d s L a b e l ) ;
d e a l e r P a n e l . add ( new J L a b e l ( ) ) ;
cp . add ( d e a l e r P a n e l , B o r d e r L a y o u t . NORTH ) ;
cp . add ( b u t t o n P a n e l , B o r d e r L a y o u t . EAST ) ;
cp . add ( p l a y e r P a n e l , B o r d e r L a y o u t . SOUTH ) ;
s e t V i s i b l e ( true ) ;
}
/∗ ∗
∗ R e f r e s h e s t h e GUI t o show t h e c u r r e n t s i t u a t i o n .
∗ @override d i s p l a y
∗/
public void d i s p l a y ( ) {
boolean play = r u l e s . g e t P l a y ( ) ;
D.1. SOURCE CODE
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
117
        Card[] phand = rules.getPlayerHand(rules.PLAYER);
        Card[] chand = rules.getPlayerHand(rules.DEALER);
        String cs = rules.getPlayerCardSet(rules.PLAYER);
        String bk = rules.getPlayerBack(rules.PLAYER);
        for (int i = 0; i < phand.length; i++){
            if (phand[i] != null){
                pcards[i].setIcon(phand[i].toImage(cs));
            }
        }
        dcards[0].setIcon(chand[0].toImage(cs));
        if (play){
            dcards[1].setIcon(chand[1].toBackImage(bk));
        }
        else{
            for (int i = 1; i < chand.length; i++){
                if (chand[i] != null){
                    dcards[i].setIcon(chand[i].toImage(cs));
                }
            }
        }
        fundsLabel.setText(rules.getPlayerFunds(rules.PLAYER).toString());
        repaint();
    }
    /**
     * Displays a new hand on the GUI.
     * @override newDisplay
     */
    public void newDisplay(){
        Card[] phand = rules.getPlayerHand(rules.PLAYER);
        Card[] chand = rules.getPlayerHand(rules.DEALER);
        String cs = rules.getPlayerCardSet(rules.PLAYER);
        String bk = rules.getPlayerBack(rules.PLAYER);
        for (int i = 0; i < phand.length; i++){
            if (phand[i] != null){
                pcards[i].setIcon(phand[i].toImage(cs));
            }
        }
        dcards[0].setIcon(chand[0].toImage(cs));
        dcards[1].setIcon(chand[1].toBackImage(bk));
        fundsLabel.setText(rules.getPlayerFunds(rules.PLAYER).toString());
        repaint();
    }
    /**
     * Clears the GUI
     * @override clear
     */
    public void clear(){
        System.out.println();
        playerPanel.removeAll();
        dealerPanel.removeAll();
        for (int i = 0; i < 5; i++){
            pcards[i] = new JLabel();
            dcards[i] = new JLabel();
            playerPanel.add(pcards[i]);
            dealerPanel.add(dcards[i]);
        }
        playerPanel.add(fundsLabel);
        betEnabled(true);
        hitEnabled(false);
        standEnabled(false);
        repaint();
    }
    /**
     * Gets the amount bet from the betField.
     *
     * @return bet
     */
    public int getBet(){
        return (Integer.parseInt(betField.getText()));
    }
    /**
     * Enables or disables the hit button
     *
     * @boolean hit button state
     */
    public void hitEnabled(boolean en){
        hit.setEnabled(en);
    }
    /**
     * Enables or disables the stand button
     *
     * @boolean stand button state
     */
    public void standEnabled(boolean en){
        stand.setEnabled(en);
    }
D.1. SOURCE CODE
258
259
260
261
262
263
264
265
266
267
    /**
     * Enables or disables the bet button
     *
     * @boolean bet button state
     */
    public void betEnabled(boolean en){
        bet.setEnabled(en);
    }
}
D.1.8 BJRules
// adding BJRules to SKCards
package SKCards;
/*
 * importing the swing classes.
 */
import javax.swing.*;
/**
 * The rules of the Blackjack game.
 *
 * @author SK
 * @version 2.0A
 * @since 2.0A
 * @SK.bug none
 */
public class BJRules extends Rules {
    public static final int PLAYER = 0;
    public static final int DEALER = 1;
    private boolean play;
    int currentCard, pscore, dscore, bet;
    public BJRules(String id, int cash, Pack setPack, String bck, String cs,
            SKMain m){
        main = m;
        ui = new BJUI(this);
        pack = setPack;
        players = new Player[2];
        players[PLAYER] = new BJPlayer(id, cash, bck, cs, this);
        players[DEALER] = new BJRLPlayer(this);
        play = true;
    }
    public void newGame(){
        clear();
        ((BJUI)ui).hitEnabled(true);
        ((BJUI)ui).standEnabled(true);
        ((BJUI)ui).betEnabled(false);
        players[PLAYER].newHand();
        players[DEALER].newHand();
        ((BJPlayer)(players[PLAYER])).hit(0, pack.nextCard());
        pscore = ((BJPlayer)(players[PLAYER])).hit(1, pack.nextCard());
        ((BJPlayer)(players[DEALER])).hit(0, pack.nextCard());
        dscore = ((BJPlayer)(players[DEALER])).hit(1, pack.nextCard());
        currentCard = 2;
        display();
    }
    public void endGame(){
        Card[] phand = ((BJPlayer)players[PLAYER]).getHand();
        Card[] chand = ((BJPlayer)players[DEALER]).getHand();
        for (int i = 0; i < phand.length; i++){
            if (phand[i] != null){
                System.out.print(phand[i].toString());
            }
        }
        System.out.println(" PLAYER " + pscore);
        for (int i = 0; i < chand.length; i++){
            if (chand[i] != null){
                System.out.print(chand[i].toString());
            }
        }
        System.out.println(" DEALER " + dscore);
        if (pscore > 21 && dscore <= 21){
            JOptionPane.showMessageDialog(null, "Player is bust!");
            ((BJRLPlayer)(players[DEALER])).reward(10);
        }
        else if (dscore > 21 && pscore <= 21){
            JOptionPane.showMessageDialog(null, "Dealer is bust!");
            players[PLAYER].setFunds(bet * 2);
            ((BJRLPlayer)(players[DEALER])).reward(-10);
        }
        else if (pscore > 21 && dscore > 21){
            JOptionPane.showMessageDialog(null, "Both bust!");
            players[PLAYER].setFunds(bet);
            ((BJRLPlayer)(players[DEALER])).reward(-10);
        }
        else if (pscore > dscore){
            JOptionPane.showMessageDialog(null, "Player wins!");
            players[PLAYER].setFunds(bet * 2);
            ((BJRLPlayer)(players[DEALER])).reward(-10);
        }
D.1. SOURCE CODE
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
121
        else if (pscore == dscore){
            JOptionPane.showMessageDialog(null, "Push!");
            players[PLAYER].setFunds(bet);
            ((BJRLPlayer)(players[DEALER])).reward(5);
        }
        else{
            JOptionPane.showMessageDialog(null, "Player loses!");
        }
        clear();
        play = true;
    }
    public void hit(){
        if (play){
            pscore = ((BJPlayer)(players[PLAYER])).hit(currentCard, pack.nextCard());
            currentCard++;
            if (pscore > 21 || currentCard == 5){
                display();
                stand();
            }
        }
        else{
            dscore = ((BJPlayer)(players[DEALER])).hit(currentCard, pack.nextCard());
            currentCard++;
            if (pscore > 21 || currentCard == 5){
                display();
                stand();
            }
        }
        display();
    }
    public void stand(){
        System.out.println(play);
        if (play){
            play = false;
            display();
            currentCard = 2;
            while (!play){
                ((BJRLPlayer)players[DEALER]).learn();
            }
        }
        else{
            play = true;
            endGame();
        }
        display();
    }
    public void bet(int amt){
        if (amt <= players[PLAYER].getFunds()){
            bet = amt;
            players[PLAYER].setFunds(-1 * bet);
        }
        else{
            JOptionPane.showMessageDialog(null, "INSUFFICENT FUNDS!!!");
        }
        newGame();
        display();
    }
    public int getBJPoints(Card c){
        int points = c.getPoints();
        if (points == 10 || points == 11 || points == 12 || points == 13){
            return 10;
        }
        else{
            return points;
        }
    }
    public boolean getPlay(){
        return play;
    }
    public Card[] getPlayerHand(int p){
        return ((BJPlayer)players[p]).getHand();
    }
}
D.1.9 RLPlayer
// adding BJRLPlayer to SKCards
package SKCards;
/*
 * importing the IO classes
 */
import java.io.*;
/**
 * The RL Player for the Blackjack game.
 *
 * @author SK
 * @version 2.0A
 * @since 2.0A
 * @SK.bug none
 */
public class BJRLPlayer extends BJPlayer {
    private final double ALPHA = 0.6;
    private final double GAMMA = 0.75;
    private final double TAU = 0.5;
D.1. SOURCE CODE
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
123
    private static final int HIT = 1;
    private static final int STAND = 0;
    private static final int DEAL = -1;
    private double[][] val;
    private int lastaxn, lastscore;
    /**
     * Sole constructor. Takes in the rules and passes it to
     * super(r).
     *
     * @param r the rules
     */
    public BJRLPlayer(BJRules r){
        super(r);
        val = new double[2][21];
        try {
            BufferedReader br = new BufferedReader(new FileReader("rlvals.txt"));
            String data = br.readLine();
            int i = 0;
            while (data != null && i < val[0].length){
                System.out.println(data);
                String[] ds = data.split(",");
                val[STAND][i] = Double.parseDouble(ds[0]);
                val[HIT][i] = Double.parseDouble(ds[1]);
                data = br.readLine();
                i++;
            }
        }
        catch (Exception ex){
            ex.printStackTrace();
        }
    }
    /**
     * Starts a new hand.
     */
    public void newHand(){
        lastaxn = DEAL;
        super.newHand();
    }
    /**
     * Plays a greedy policy, but updates the values as it plays
     */
    public void learn(){
        if (score > 21){
            return;
        }
        System.out.println("HIT = " + val[HIT][score - 1]);
        System.out.println("STAND = " + val[STAND][score - 1]);
        double maxval = Math.max(val[STAND][score - 1], val[HIT][score - 1]);
        if (lastaxn != DEAL){
            double rd = ALPHA * ((GAMMA * maxval) - val[lastaxn][lastscore - 1]);
            val[lastaxn][lastscore - 1] += rd;
        }
        lastscore = score;
        if (val[HIT][score - 1] > val[STAND][score - 1]){
            rules.hit();
            lastaxn = HIT;
            System.out.println("HIT");
        }
        else{
            rules.stand();
            lastaxn = STAND;
            System.out.println("STAND");
        }
    }
    /**
     * Allocates the final reward to the agent.
     *
     * @param r the reward
     */
    public void reward(double r){
        double maxval = Math.max(val[STAND][lastscore - 1], val[HIT][lastscore - 1]);
        if (lastaxn != DEAL){
            double rd = ALPHA * (r + (GAMMA * maxval) - val[lastaxn][lastscore - 1]);
            val[lastaxn][lastscore - 1] += rd;
        }
    }
}
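For reference, the update applied in learn() and reward() above is, as far as it can be read from this listing, the standard one-step Q-learning rule with learning rate $\alpha = $ ALPHA $= 0.6$ and discount factor $\gamma = $ GAMMA $= 0.75$ (TAU is declared but not used in the code shown):
$$Q(s,a) \leftarrow Q(s,a) + \alpha \left( r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right)$$
Here the state $s$ is the current hand score, the actions are HIT and STAND, $Q(s,a)$ is stored in val[action][score - 1], and $r$ is zero for the intermediate updates made in learn() and equal to the terminal reward passed to reward() at the end of a hand.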
Appendix E
Sample Blackjack Game
The following notation is used in the sample game:
[] - a face down card
| - Separation of two hands
Cards - Face values - 2-9 ⇒ 2-9, T ⇒ 10, J ⇒ Jack, Q ⇒ Queen, K ⇒ King
Suits - D - Diamonds, C - Clubs, H - Hearts, S - Spades
e.g. KS ⇒ King of Spades, 2D ⇒ 2 of Diamonds, TH ⇒ 10 of Hearts
Dealer AC []
Bet
Player [] []
Player bets $10, and takes insurance
Dealer AC []
Bet $10 $10
Player AD AH
Player splits
Dealer AC []
Bet $10 $10 $10
Player AD QS | AH AS
(BLACKJACK)
Player stands on hand 1, splits hand 2
Dealer AC []
Bet $10 $10 $10 $10
Player AD QS | AH 2C | AS 4S
Player hits hand 2
Dealer AC []
Bet $10 $10 $10 $10
Player AD QS | AH 2C 4D | AS 4S
Player hits hand 2
Dealer AC []
Bet $10 $10 $10 $10
Player AD QS | AH 2C 4D KD | AS 4S
(Ace now counts “low” (1))
Player hits hand 2
Dealer AC []
Bet $10 $10 $10 $10
Player AD QS | AH 2C 4D KD 5S | AS 4S
(BUST!)
Player doubles down on hand 3 (counting ace “high”)
Dealer AC []
Bet $10 $10 $10 $20
Player AD QS | AH 2C 4D KD 5S | AS 4S 6C
Player stands on hand 3, dealer plays by turning over his second card
Dealer AC JC (BLACKJACK!)
Bet $10 $10 $10 $20
Player AD QS | AH 2C 4D KD 5S | AS 4S 6C
The dealer has hit a Blackjack, so the player receives their $10 for the insurance. The player loses the $10 from hand
2 because they went bust. As for hands 1 and 3, the dealer and player have tied. However, some casinos
enforce the rule that the “House wins a push”. Under this rule, the player would also lose the $30. If the house
treats pushes normally, then the player will get his $30 back. Although there are many different variations
on the rules, known as “house rules”, they are all well known and accepted by the players.
Bibliography
Parlett, D. (1994). Teach Yourself Card Games. Hodder Headline Plc, Euston Road, London, NW1 3BH.
Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge,
MA, USA.