A Tic Tac Toe Learning Machine Involving the Automatic
Generation and Application of Heuristics
Scott Doherty
SUNY Oswego
Abstract
Machine learning is a field of exciting prospects. While the current author has doubts about the goal of making machines “think” (Turing 1950), I recognize that much can be discovered about information and the way we think, and that very practical and efficient methods of doing work and research can be produced along the way. The tic tac toe heuristic learning machine is an educational experiment in machine learning in the rule-based paradigm. Its aim is for a heuristic player to show increased success in playing tic tac toe, measured as a greater proportion of wins to losses than was obtained without a rule base.
Keywords: Machine learning; rule bases; game playing agents.
1 Introduction
The tic tac toe heuristic learning machine project was an attempt at learning the basics of rule-based machine learning (see Freeman-Hargis). The project was implemented in Common Lisp in an object-oriented way. The heuristic player was allowed to play some number of games against a random player, from which it acquired a list of the games it won. After this learning period it again played against a random player, but this time it compared the current game with past winning games and made moves consistent with those winning games.
For both sets of play, the training and the testing, statistics were gathered on the proportions of wins, losses, and draws. The statistics were then compared to indicate whether the machine had indeed learned. Learning was indicated by an increase in the percentage of wins relative to losses and draws. The heuristic applied to the rule base was simply one of sequence similarity: if the current play matches the beginning sequence of a past winning game, use the next move from that winning game (as sketched below). The assumption is that we are more likely to get a winning game if we try to repeat past winning games. No sorting was applied to the set of rules the player had, so whichever winning game was encountered first was put first in the rule base. Likewise, when applying the rules, whichever rule was found to match the current play first was the rule that was chosen.
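To make the heuristic concrete, the sketch below assumes, purely for illustration, that each move is recorded as the board square (numbered 1 through 9) taken on that turn; the project's actual move representation may differ.

; Illustrative only: a stored winning game and a current play whose moves
; match the winning game's opening sequence (hypothetical square numbering 1-9).
(let ((winning-game '(5 1 3 2 7))  ; a rule: the full move list of a game X won
      (play-so-far  '(5 1)))       ; the current game repeats its first two moves
  ; the heuristic recommends the winning game's next move, here square 3
  (nth (length play-so-far) winning-game))  ; => 3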
2 Related Research
There seem to be a number of different approaches to game-playing AI. On one end of the spectrum is the “brute force” approach, which calculates every possible move and selects the best one it finds. On the other end are more ad hoc, randomized, make-your-best-guess-fast approaches. Each tries to address the central issue of computational complexity: exactness versus time and space. Since any game presents a multiplicity of possible moves, each permutation of which leads to a different outcome, an interest in exactness means seeking the best possible move out of all possible moves. It aims, if possible, to find the perfect game strategy by which one could win invariably. Since this requires looking at a great many possible moves and their respective outcomes, it can take a lot of time. Moreover, it can use a lot of resources to hold in memory all the data being processed.
The other approaches seem to be born out of this fact alone. If the brute force method took no time, it would be the only one used, since it produces the best possible move; one cannot do better than that. Because, however, computation takes time and data takes up space, and both are limited, we look for cheaper ways of finding good moves.
One research project that took the brute force approach created a tree of all possible moves and applied “minimax searches that include alpha-beta pruning, no alpha-beta pruning, move ordering, and heuristics that attempt to maximize the computer scores and minimize the opponents” (Hernandez 1999). The game utilizes varying approaches such that “When the computer makes a move, this function calls Best_No_Pruning, Best_No_ordering, Best_Move, or Heuristic_Move based upon the argument search and after the move is made displays statistic information” (Hernandez 1999). Hernandez notes that “this is a very computer intensive search algorithm” (see Wilson 2008 for definitions of these terms).
Another approach, which sought to create a generalized autonomous game player that could play any game given to it, used a more heuristic method and was even pitted against other agents using an exhaustive search method (Kuhlmann, Dresner, and Stone 2006). The heuristic machine performed better than the exhaustive machines on all but one game.
3 Approach
As noted above in the introduction, this project was implemented using Common Lisp and the CLOS object-oriented paradigm (Steele 1990). Classes were created representing a human player, a random player, and a heuristic learning player. The human player was used to test the heuristic player to see whether it was following its heuristics. The main focus of the project was to gain insight into how a machine may learn. The rule-based paradigm was chosen for this AI agent. The agent would engage in a learning period in which it played against a random player and collected winning games from the results.
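The class definitions themselves are not reproduced in this paper. The following is only a rough sketch of what such CLOS classes might look like; the slot layout and the plain player superclass are assumptions for illustration, though the class and accessor names follow those used in the code excerpts of Section 5.

; Rough sketch of the player classes (slot details are illustrative assumptions).
(defclass player () ())

(defclass human-player (player) ())

(defclass random-machine-player (player) ())

(defclass heuristic-machine-player (player)
  ((rules :accessor heuristic-machine-player-rules
          :initform nil)))  ; the rule base: a list of winning games

(defclass heuristic-learning-machine-player (heuristic-machine-player) ())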
This was implemented by storing the Lisp list for each winning game in a list of lists. I chose this representation for its simplicity and flexibility. It abstracted the knowledge of choice out of the rule base and placed it at a higher level: the rule base was essentially memorized data, and the choose-heuristic-play method was the intelligence. Lisp, however, allows programs to also be data, and that is how the teacher and other students implemented their rule bases.
4 Knowledge Representation
I will describe the knowledge representation used in this project in terms of the excellent definition given by Randall Davis (Davis 1993), which identifies five roles a representation plays:

• A knowledge representation (KR) is most fundamentally a surrogate, a substitute for the thing itself, used to enable an entity to determine consequences by thinking rather than acting, i.e., by reasoning about the world rather than taking action in it. We tried to represent the world of the game through a map of played games so that the agent could look at formerly successful games in order to know how to win, without having to fail in “real life” to get the knowledge of winning games. There was, however, a training period in which it was allowed to gather winning games through a trial-and-error approach.

• It is a set of ontological commitments, i.e., an answer to the question: In what terms should I think about the world? The world model or ontology chosen here was simple. We had classes representing the different kinds of players, as well as structures representing a play of the game and the game board itself. The world consisted simply of the tic tac toe game and its parts.

• It is a fragmentary theory of intelligent reasoning, expressed in terms of three components: (i) the representation's fundamental conception of intelligent reasoning; (ii) the set of inferences the representation sanctions; and (iii) the set of inferences it recommends.
i. The conception of intelligent thought assumed here was a set of if-thens such that if a move met our desired end, namely winning, we would choose it.
ii. The inferences the machine was allowed to gather were only those in which the moves by X produced a winning game. If no game it had learned matched the current play, it chose a move at random.
iii. The set of inferences, or if-thens, was what we gathered during the learning stage of the game in order to build up our rule base.

• It is a medium for pragmatically efficient computation, i.e., the computational environment in which thinking is accomplished. One contribution to this pragmatic efficiency is supplied by the guidance a representation provides for organizing information so as to facilitate making the recommended inferences. Computers, as finite state machines, are well suited to representing the structures involved in games and to computing successful game plays.

• It is a medium of human expression, i.e., a language in which we say things about the world. The approach here chose a rather simple structure for the world of tic tac toe, representing the game, its players, and its parts.
The knowledge represented in this case is “that which would be a good move to play in order to win.” This knowledge is stored as a list of winning games, each of which consists of a list of the moves that happened in a particular game where X won. The functions that make up the decision-making process could also be considered an abstraction of knowledge; thus the program itself is part of the representation of the knowledge of how to play tic tac toe.
Hernandez took a different approach. He did not use object orientation or classes; he used a purely algorithmic approach in which the best search method was chosen based on certain conditions (Hernandez 1999, code).
5 Program Abstractions
Selected excerpts from the program are shown below; the comments describe the major steps.
; IS THE GIVEN RULE APPLICABLE?
; This method receives a winning play of the game (a list) which it compares with the play so far.
(defmethod applicablep ((rule list) &aux len the-play dif)
  (setf len (list-length *play-so-far*))
  (setf dif (- 9 len))
  ;(format t "~%Checking rule: ~A~%" rule)
  (setf the-play (butlast rule dif)) ; get the part of the rule that is the same length as *play-so-far*
  (matches *play-so-far* the-play))
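The matches helper called above is not shown in the excerpt. One plausible reconstruction, assuming both arguments are lists of moves of the same length, is a simple structural comparison:

; Plausible reconstruction of the matches helper (not the project's actual code):
; true when the play so far is identical to the corresponding prefix of the rule.
(defun matches (play rule-prefix)
  (equal play rule-prefix))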
;METHOD TO SELECT A MOVE FROM THE RULE BASE
(defmethod select-from-rule-base ((p heuristic-machine-player) &aux rule-base)
  (setf rule-base (heuristic-machine-player-rules p)) ; get the h-player's rule base
  ;(format t "~%rule-base: ~A~%" rule-base)
  (dolist (rule rule-base) ; iterate over the list of stored rules, checking for applicability
    (cond
      ((eq *play-so-far* nil)
       ; if there is no play, return nil
       (return-from select-from-rule-base nil))
      ((applicablep rule) ; check if the rule in the list is applicable
       ; if it is applicable, break out of the loop and use the rule
       (return-from select-from-rule-base (apply-rule rule)))))
  nil)
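Likewise, apply-rule is not reproduced here. Under the prefix-matching scheme from the introduction, a minimal version would return the rule's next move, i.e. the element of the winning game immediately after the moves already played; the sketch below assumes that representation.

; Sketch of apply-rule (an illustrative assumption, not the project's actual code):
; the recommended move is the rule's element just past the current play.
(defun apply-rule (rule)
  (nth (list-length *play-so-far*) rule))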
;THE LEARNING MACHINE PLAYS A GAME AND STORES THE PLAY IF IT WINS
(defmethod random-play-and-learn ((x heuristic-learning-machine-player)
                                  (o random-machine-player)
                                  &aux p result)
  (setf p (random-play x o))  ; create a random play with x and o
  (setf result (analyze p))   ; find out if x won the game
  (cond
    ((eq result 'w)
     ; if x won, add this play to its rule base
     (add-rule x p))))
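The add-rule method used above is also not shown. Assuming the rule base is the list of winning games described in Section 3, it would amount to appending the new winning play to the player's rules, roughly as follows (the append-at-the-end behavior mirrors the statement that earlier wins sit earlier in the rule base):

; Sketch of add-rule, assuming a rules slot holding a list of winning games.
; New games go at the end, so games learned earlier keep priority when matching.
(defmethod add-rule ((x heuristic-learning-machine-player) (play list))
  (setf (heuristic-machine-player-rules x)
        (append (heuristic-machine-player-rules x) (list play))))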
6 Results
100 learning games, 10 analyzing games
(demo-hlm-vs-random 100 10 t)
stats before learning = ((W 20.0) (L 50.0) (D 30.000002))
stats after learning = ((W 60.000004) (L 40.0) (D 0.0))

100 learning games, 100 analyzing games
Gathering Statistics for 100 games.
stats before learning = ((W 55.0) (L 27.000002) (D 18.0))
Finished learning. Gathering statistics.
Gathering Statistics for 100 games.
stats after learning = ((W 67.0) (L 19.0) (D 14.0))

1,000 learning games, 100 analyzing games
(demo-hlm-vs-random 1000 100 nil)
Gathering Statistics for 100 games.
stats before learning = ((W 57.0) (L 28.0) (D 15.000001))
Finished learning. Gathering statistics.
Gathering Statistics for 100 games.
stats after learning = ((W 65.0) (L 26.0) (D 9.0))

10,000 learning games, 100 analyzing games
(demo-hlm-vs-random 10000 100 nil)
Gathering Statistics for 100 games.
stats before learning = ((W 51.0) (L 39.0) (D 10.0))
Finished learning. Gathering statistics.
Gathering Statistics for 100 games.
stats after learning = ((W 65.0) (L 26.0) (D 9.0))

100,000 learning games, 100 analyzing games
(demo-hlm-vs-random 100000 100 nil)
Gathering Statistics for 100 games.
stats before learning = ((W 64.0) (L 27.000002) (D 9.0))
Finished learning. Gathering statistics.
Gathering Statistics for 100 games.
stats after learning = ((W 74.0) (L 21.0) (D 5.0))
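The statistics-gathering code is not shown in this paper. As a rough sketch of what the percentages above represent, and assuming a play function such as random-play together with the analyze function from Section 5 (taken to return the symbols w, l, or d), the figures could be computed along these lines:

; Illustrative sketch only: play n games and report percentages of wins,
; losses, and draws for X, in the ((W ...) (L ...) (D ...)) form shown above.
(defun gather-statistics (play-fn x o n &aux (w 0) (l 0) (d 0))
  (dotimes (i n)
    (case (analyze (funcall play-fn x o))
      (w (incf w))
      (l (incf l))
      (d (incf d))))
  (flet ((pct (count) (* 100.0 (/ count n))))
    (list (list 'w (pct w)) (list 'l (pct l)) (list 'd (pct d)))))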
The first run posted only 20% wins before learning. This is due, I think, to the small number of statistical samples taken, namely 10. Since the 10 sample games are played at random, it is possible for the computer to encounter a handful of losses in one instance and a handful of wins in another. Increasing the sample size to 100 gave the expected ~55% of wins for X before learning.
In the cases with more learning and a greater sample size, the agent displays a significant increase in its wins after learning by acquiring a rule base from which to make future moves.

7 Discussion
The heuristic machine displayed a clear increase in the ratio of wins to losses after the training period. However, I did come upon some anomalous results in which the player actually decreased its number of wins after learning. This was probably a bug in my Lisp code, but I do wonder whether there could also be instances in which, among all the possible games, some beginning sequences have a predominance of losing endings for X. If the heuristic agent happened to use one of those rules because it sat at the top of its rule list, that could produce such unexpected results.
The fact that X has a statistically greater chance of winning tic tac toe than O was an interesting discovery for me. It is interesting to see that a game is not equal for both players, and it prompts me to look for such inequalities in other games.
8 Future Work
This project has added a tool to my toolbox that will shape the way I think about approaching programming problems. Applying machine learning techniques to design issues is now something I can do. I would like to keep the tic tac toe code, refine it, and go over it when I have more time to think about everything it is doing. As I approach the “learning agent” project (Doherty 2009), which teaches a machine a simple taxonomy, this project is entirely relevant and will serve as a pattern for my construction of classes and for how to add rules to an agent's rule base.
9 Conclusion
Using the simple game of tic tac toe, we learned some basic rule-based techniques for machine learning. The project provided important knowledge about the use and limits of artificial intelligence. While producing interesting and successful results, it also made known some of the limitations of AI and some of the constraints an engineer must overcome to produce machine learning. Seeing the agent learn is an exciting incentive to continue exploring the applications of AI and other techniques and paradigms of machine learning. Tic tac toe is a trivial game that allowed the project to focus on the AI parts of the program.

References
Hernandez, S. 1999. The Tic-Tac-Toe Game, A Conversion from "C" to Lisp. Lab located at: http://www.ecst.csuchico.edu/~amk/foo/csci322/labs/examples/seberio/lab02/tic-tac-toe.html

Hernandez, S. 1999. The Tic-Tac-Toe Game, A Conversion from "C" to Lisp (code). Located at: http://www.ecst.csuchico.edu/~amk/foo/csci322/labs/examples/seberio/lab02/tic-tac-toe.cl

Kuhlmann, G.; Dresner, K.; Stone, P. 2006. Automatic Heuristic Construction in a Complete General Game Player. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06), Boston, MA, July 2006.

Bottomley, H. 2002. How many Tic-Tac-Toe (noughts and crosses) games are possible? http://www.btinternet.com/~se16/hgb/tictactoe.htm

Davis, R.; Shrobe, H.; Szolovits, P. 1993. What is a Knowledge Representation? AI Magazine, 14(1):17-33.

Turing, A. M. 1950. Computing machinery and intelligence. Mind, 59, 433-460.

Freeman-Hargis, J. Rule-Based Systems and Identification Trees: Introduction to Rule-Based Systems. From http://ai-depot.com/Tutorial/RuleBased.html Accessed Mar 5, 2009.

Doherty, S. 2009. Teaching a Machine a Simple Taxonomy. From http://www.cs.oswego.edu/~sdoherty/CSC466/ Accessed Mar 5, 2009.

Wilson, B. 2008. The AI Dictionary. From http://www.cse.unsw.edu.au/~billw/aidict.html Accessed Mar 5, 2009.
Steele, G. 1990. Common Lisp the Language, 2nd edition. Digital Press.