
Report – Hw3
Question 2
Problem Description:
Develop a tic-tac-toe player using genetic programming. The player is a Scheme program that
receives the state of the board as an argument and returns a number ∈ [0..8], which is the
location on the board the player chooses to mark (see Figure 1).

0 | 1 | 2
3 | 4 | 5
6 | 7 | 8
Figure 1: Game board indexes
Individuals: Each individual represents a Tic Tac Toe strategy. The genome we used
is a Scheme program, built as a tree.
Terminal Set:
1. MY_SYMBOL: The symbol the player plays: X or O.
2. OTHER_SYMBOL: The other symbol, X or O.
3. RANDOM_FREE_SPACE: Returns the index of a random empty space on the board.
4. WIN: Checks whether the board has a row or column with two of my symbol. If the third place is empty,
returns it; else returns -1.
5. BLOCK: Checks whether the board has a row or column with two of the opponent's symbol and, if so, returns the
third place; else returns -1.
Function Set (number of children per node):
1. IFLTE (4): If arg0 <= arg1, returns arg2, else returns arg3.
2. CHECK_LINE, CHECK_COL (1): Returns the location [0..8] of the first empty space in row/column number
arg0. If the row/column is full, returns -1.
3. MINE_IN_LINE, MINE_IN_COL (1): Returns how many of my symbols are in row/column number arg0.
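
To make the representation concrete, the following Python sketch shows one way such a genome tree could be encoded (as nested tuples) and interpreted against a board. The tuple encoding, the integer encoding of the board (0 = empty, 1 = X, 2 = O), and all names here are illustrative assumptions rather than the actual implementation; CHECK_COL and MINE_IN_COL are analogous to their LINE counterparts and omitted for brevity.

import random

EMPTY, X, O = 0, 1, 2   # assumed board encoding: 0 = empty, 1 = X, 2 = O

def row_indices(i):
    return [3 * i, 3 * i + 1, 3 * i + 2]

def win_or_block(board, target):
    # Index completing a row/column that holds two `target` marks and one
    # empty cell, or -1 if there is none (the WIN / BLOCK terminals).
    groups = [row_indices(i) for i in range(3)] + [[j, j + 3, j + 6] for j in range(3)]
    for idxs in groups:
        marks = [board[i] for i in idxs]
        if marks.count(target) == 2 and marks.count(EMPTY) == 1:
            return idxs[marks.index(EMPTY)]
    return -1

def eval_tree(node, board, my_sym, other_sym):
    # Recursively interpret a genome tree against a 9-cell board, returning a number.
    if isinstance(node, int):                       # numeric constant terminal
        return node
    if node == "MY_SYMBOL":
        return my_sym
    if node == "OTHER_SYMBOL":
        return other_sym
    if node == "RANDOM_FREE_SPACE":
        free = [i for i, c in enumerate(board) if c == EMPTY]
        return random.choice(free) if free else -1
    if node == "WIN":
        return win_or_block(board, my_sym)
    if node == "BLOCK":
        return win_or_block(board, other_sym)
    op, *args = node                                # internal node: (op, child, ...)
    vals = [eval_tree(a, board, my_sym, other_sym) for a in args]
    if op == "IFLTE":
        return vals[2] if vals[0] <= vals[1] else vals[3]
    if op == "CHECK_LINE":                          # first empty cell in row arg0
        for i in row_indices(vals[0] % 3):
            if board[i] == EMPTY:
                return i
        return -1
    if op == "MINE_IN_LINE":                        # how many of my marks in row arg0
        return [board[i] for i in row_indices(vals[0] % 3)].count(my_sym)
    raise ValueError("unknown node: %r" % (op,))

# Example tree: if row 0 holds at most one of my marks, play a random free
# space; otherwise take the first empty cell of row 0.
example = ("IFLTE", ("MINE_IN_LINE", 0), 1, "RANDOM_FREE_SPACE", ("CHECK_LINE", 0))
print(eval_tree(example, [EMPTY] * 9, X, O))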

Evaluator: GP Tic Tac Toe Evaluator. This evaluator calculates fitness for each of the individuals in the
population. It simulates 100 games of Tic Tac Toe between the individual's strategy and different
opponents, as described in the experiments below. Each win adds +1 to the fitness, each loss -1, each
tie 0, and an illegal action by the individual (for example, trying to mark a space which is already
marked) incurs a penalty of -10.
The evaluator includes a partial Scheme interpreter, which defines the functions needed by the players.
It also provides a platform to run the game: a board, a mechanism to control the game flow, and a fix
that forces any illegal number received into the range [0..8].
Note: it does not handle all illegal operations. If it receives a location which is already marked, it ends
the game with a penalty of -10 points.
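
The scoring described above could be sketched as follows in Python. This is not the actual evaluator code: the board encoding, the assumption that the individual plays X and moves first, and the use of modulo to force moves into [0..8] are all illustrative choices.

EMPTY, X, O = 0, 1, 2   # assumed board encoding
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    # Return X or O if that symbol completed a line, else None.
    for a, b, c in LINES:
        if board[a] != EMPTY and board[a] == board[b] == board[c]:
            return board[a]
    return None

def play_one_game(player, opponent):
    # One game, scored from the individual's point of view:
    # +1 win, -1 loss, 0 tie, -10 for marking an occupied square.
    # `player` and `opponent` are callables (board, my_symbol) -> move.
    board = [EMPTY] * 9
    sides = [(player, X), (opponent, O)]
    for turn in range(9):
        strategy, symbol = sides[turn % 2]
        move = strategy(list(board), symbol) % 9    # force the result into [0..8]
        if board[move] != EMPTY:                    # square already marked
            return -10 if symbol == X else 0        # penalty only for the individual
        board[move] = symbol
        won = winner(board)
        if won is not None:
            return 1 if won == X else -1
    return 0                                        # board full: tie

def evaluate(player, opponent, games=100):
    # Fitness of the individual: sum of 100 game scores against one opponent.
    return sum(play_one_game(player, opponent) for _ in range(games))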

Selection: We used Tournament Selection: each time, 10 individuals are randomly picked, and the one
with the best fitness is chosen.
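
A minimal sketch of this selection step (the names are illustrative; `fitness` is assumed to be a list aligned with the population):

import random

def tournament_select(population, fitness, k=10):
    # Pick k individuals uniformly at random and return the one with the best fitness.
    contestants = random.sample(range(len(population)), k)
    best = max(contestants, key=lambda i: fitness[i])
    return population[best]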

Population size: 10,000. The population is initialized with random 5-level full trees.
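
A sketch of this "full" initialization under the tuple-based tree encoding used in the sketches above (the primitive lists and names are assumptions):

import random

TERMINALS = ["MY_SYMBOL", "OTHER_SYMBOL", "RANDOM_FREE_SPACE", "WIN", "BLOCK", 0, 1, 2, -1]
FUNCTIONS = {"IFLTE": 4, "CHECK_LINE": 1, "CHECK_COL": 1, "MINE_IN_LINE": 1, "MINE_IN_COL": 1}

def random_full_tree(depth=5):
    # "Full" method: function nodes at every level above the leaves,
    # terminals only at the deepest level.
    if depth <= 1:
        return random.choice(TERMINALS)
    op, arity = random.choice(list(FUNCTIONS.items()))
    return tuple([op] + [random_full_tree(depth - 1) for _ in range(arity)])

population = [random_full_tree(5) for _ in range(10_000)]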

Crossover: Exchange sub-trees crossover with Elites. The operator randomly chooses a node in each
tree and swaps the sub-trees rooted at those nodes. The two fittest individuals in the population are
preserved untouched. Pc = 0.8.
To prevent the trees from growing too big, if the size of a tree exceeds 300 nodes, it is removed
from the population and replaced with a new random 5-level tree.
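
One way this operator could look in Python (a sketch under the same tuple encoding; `new_random_tree` stands for the 5-level tree generator above, and all names are illustrative):

import random

MAX_NODES = 300                       # size limit from the report

def paths(tree, prefix=()):
    # Yield the path (sequence of child indices) of every node in the tree.
    yield prefix
    if isinstance(tree, tuple):       # internal node: (op, child, ...)
        for i, child in enumerate(tree[1:]):
            yield from paths(child, prefix + (i,))

def subtree_at(tree, path):
    for i in path:
        tree = tree[1 + i]
    return tree

def replace_at(tree, path, new_subtree):
    if not path:
        return new_subtree
    children = list(tree[1:])
    children[path[0]] = replace_at(children[path[0]], path[1:], new_subtree)
    return tuple([tree[0]] + children)

def crossover(parent_a, parent_b, new_random_tree, pc=0.8):
    # With probability pc, swap randomly chosen sub-trees between the parents.
    # A child exceeding MAX_NODES is discarded and replaced by a fresh random tree.
    if random.random() >= pc:
        return parent_a, parent_b
    pa = random.choice(list(paths(parent_a)))
    pb = random.choice(list(paths(parent_b)))
    child_a = replace_at(parent_a, pa, subtree_at(parent_b, pb))
    child_b = replace_at(parent_b, pb, subtree_at(parent_a, pa))
    if sum(1 for _ in paths(child_a)) > MAX_NODES:
        child_a = new_random_tree()
    if sum(1 for _ in paths(child_b)) > MAX_NODES:
        child_b = new_random_tree()
    return child_a, child_b

Elitism (copying the two fittest individuals unchanged) would be handled outside this operator when building the next generation.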

Mutation: Random sub-tree replace mutator with Elites. Randomly picks a node in the tree and replaces
it with a newly generated random terminal. The two fittest individuals in the population are preserved
untouched. Pm = 0.4
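
A matching sketch of the mutation operator (same tuple encoding and helper style as the crossover sketch; names are illustrative):

import random

TERMINALS = ["MY_SYMBOL", "OTHER_SYMBOL", "RANDOM_FREE_SPACE", "WIN", "BLOCK", 0, 1, 2, -1]

def paths(tree, prefix=()):
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:]):
            yield from paths(child, prefix + (i,))

def replace_at(tree, path, new_subtree):
    if not path:
        return new_subtree
    children = list(tree[1:])
    children[path[0]] = replace_at(children[path[0]], path[1:], new_subtree)
    return tuple([tree[0]] + children)

def mutate(tree, pm=0.4):
    # With probability pm, replace one uniformly chosen node (and the sub-tree
    # under it) with a newly generated random terminal.
    if random.random() >= pm:
        return tree
    target = random.choice(list(paths(tree)))
    return replace_at(tree, target, random.choice(TERMINALS))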

Generations: 51
Experiment A: basic functions and terminals
We try to develop a player without the special functions mentioned above, using only a few basic functions
and terminals that are not directly related to the game.
Terminals: 0, 1, 2, 3: constants. X1...X8: Xi returns the value of the board at index i.
Functions: +, -, IFLTE, +3, -3
Generations: 50
Population size: 10,000
Number of runs: 10
We ran the experiment 10 times, and each time the evolution did not succeed in creating a legal
player strategy (i.e., the player always tries to mark a place which is already marked).
Experiment B: complex functions and terminals
Terminals: RANDOM_FREE_SPACE, MY_SYMBOL, OTHER_SYMBOL, 0, 1, 2, -1, WIN, BLOCK
Functions: IFLTE, CHECK_LINE, CHECK_COL, MINE_IN_LINE, MINE_IN_COL
The main idea was to help the evolution by using functions and terminals more specific to the current problem.
We can see, as shown above, that this time the algorithm creates a legal player strategy and improves as the
generations progress.
Experiment C: 3-stages Evaluator
In this experiment we created two additional basic players to help train the population.
1. The first one is the random player based on the RANDOM_FREE_SPACE function.
Its code is:
(RANDOM_FREE_SPACE)
2. The second one is called the "good player" and it implements a slightly better strategy.
Its code is:
(IFLTE (WIN) (-1)
       (IFLTE (BLOCK) (-1)
              (RANDOM_FREE_SPACE)
              (BLOCK))
       (WIN))
The basic idea: if you can win, then win; else if you can block, then block; otherwise choose a random place.
After creating these two players, we used three stages of evaluation:
1. In generations 0 to 15: play 100 games against the random player.
2. In generations 16 to 40: play 100 games against the good player.
3. In generations 41 to 50: randomly choose 100 players from the population and play against them.
The concept was to train the individuals against different styles of opponents, to diversify their strategies.
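
The opponent schedule could be expressed roughly as follows (a sketch; `random_player` and `good_player` stand for the two fixed strategies above, and the generation cut-offs follow the list just given):

import random

def opponents_for_generation(generation, population, random_player, good_player, games=100):
    # Three-stage schedule of Experiment C: fixed random player, then the
    # "good player", then randomly drawn members of the current population.
    if generation <= 15:
        return [random_player] * games
    if generation <= 40:
        return [good_player] * games
    return random.sample(population, games)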
We can see that, with the evaluation schedule we used, the fitness drops significantly at generation 16 as the
opponent changes to the good player, and the population then adapts its game strategy to match the new opponent.
We can see that against the random player the results are a little better, and against the good player the
results in Experiment C are much better. We assume that this is because Experiment C includes training against
the good player, so the population develops strategies against it.
Conclusions and observations:
1. In the GP configuration we chose, we can see that evolution does not help unless we
help it by being specific to the problem, i.e., by adding nodes related to the
specific game rules.
2. It is hard for evolution to create new functions and complex strategies such as planning ahead. But if
we provide these functions in advance (as with WIN and BLOCK), evolution succeeds in using them and
in beating the players that use them.
3. In order to achieve high selection pressure, we set the tournament size to 10. This
ensures that the worst individuals in the population do not survive. Generally, the worst individuals are
those that make illegal moves.
4. As we can see from Experiments A and B, adding relevant functions (WIN, BLOCK,
MINE_IN_LINE/COL, etc.) had a crucial effect on the convergence of the population. Without them,
even the best individual in the population could not learn to avoid making illegal moves.