Report – HW3, Question 2

Problem Description:
Develop a Tic Tac Toe player using genetic programming. The player is a Scheme program: it receives the state of the board as an argument and returns a number ∈ [0..8], the location on the board that the player chooses to mark (see Figure 1).

0|1|2
3|4|5
6|7|8

Figure 1: Game Board Indexes

Individuals:
Each individual represents a Tic Tac Toe strategy. The genome we used is a Scheme program, built as a tree.

Terminal Set:
1. MY_SYMBOL: the symbol the player plays, X or O.
2. OTHER_SYMBOL: the other symbol, X or O.
3. RANDOM_FREE_SPACE: returns the index of a random empty space on the board.
4. WIN: checks the board for a row or column containing two of my symbol. If the third place is empty, it returns that index; otherwise it returns -1.
5. BLOCK: checks the board for a row or column containing two of the opponent's symbol and returns the third place; otherwise it returns -1.

Function Set (no. of children per node):
1. IFLTE (4): if arg0 <= arg1, returns arg2; otherwise returns arg3.
2. CHECK_LINE, CHECK_COL (1): returns the location [0..8] of the first empty space in row/column no. arg0. If the row/column is full, returns -1.
3. MINE_IN_LINE, MINE_IN_COL (1): returns how many of my symbols are in row/column no. arg0.
(A minimal Scheme sketch of these primitives appears after Experiment A below.)

Evaluator:
GP Tic Tac Toe Evaluator. This evaluator calculates a fitness value for each individual in the population. It simulates 100 games of Tic Tac Toe between the individual's strategy and different opponents, as described in the experiments below. Each win adds +1 to the fitness, each loss adds -1, a tie adds 0, and an illegal action by the individual (for example, trying to mark a space that is already marked) incurs a penalty of -10. The evaluator includes a partial Scheme interpreter, which defines the functions needed by the players. It also provides a platform to run the game: a board and a mechanism that controls the game flow and maps any illegal number it receives back into the range [0..8]. Note that it does not handle all illegal operations: if it receives a result pointing to a space that is already marked, it ends the game with a penalty of -10 points.

Selection:
We used tournament selection: each time, 10 individuals are picked at random, and the one with the best fitness is chosen. Population size: 10,000. The population is initialized with random 5-level full trees.

Crossover:
Sub-tree exchange crossover with elitism. The operator randomly chooses a node in each tree and swaps the sub-trees rooted at those nodes. The two fittest individuals in the population are preserved untouched. Pc = 0.8. To prevent the trees from growing too large, any tree whose size exceeds 300 nodes is removed from the population and replaced with a new random 5-level tree.

Mutation:
Random sub-tree replacement mutator with elitism. It randomly picks a node in the tree and replaces it with a newly generated random terminal. The two fittest individuals in the population are preserved untouched. Pm = 0.4.

Generations: 51

Experiment A: basic functions and terminals
We try to develop a player without the special functions mentioned above, using only a few basic functions and terminals not directly related to the game.
Terminals: 0, 1, 2, 3 (constants); X1...X8, where Xi returns the value of the board at index i.
Functions: +, -, IFLTE, +3, -3
Generations: 50
Population size: 10,000
Number of runs: 10
We ran the experiment 10 times, and in every run the evolution failed to create a legal player strategy (i.e., the player always tries to mark a place that is already marked).
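To make the game-specific primitives concrete before Experiment B, here is a minimal sketch of how the evaluator's partial Scheme interpreter could define them. The board representation (a 9-element vector holding X, O, or _ for an empty cell) and every definition below are our assumptions for illustration; the report does not specify the actual implementation.

(define board (make-vector 9 '_))        ; cell indexes as in Figure 1
(define MY_SYMBOL 'X)                    ; the symbol this player marks
(define OTHER_SYMBOL 'O)                 ; the opponent's symbol

;; IFLTE (4 children): if arg0 <= arg1 return arg2, else arg3.
;; Defined eagerly here; a real tree interpreter might evaluate only the chosen branch.
(define (IFLTE a b then-val else-val)
  (if (<= a b) then-val else-val))

;; CHECK_LINE (1 child): first empty cell in row n (0..2), or -1 if the row is full.
(define (CHECK_LINE n)
  (let loop ((c 0))
    (cond ((> c 2) -1)
          ((eq? (vector-ref board (+ (* 3 n) c)) '_) (+ (* 3 n) c))
          (else (loop (+ c 1))))))

;; CHECK_COL (1 child): first empty cell in column n (0..2), or -1 if the column is full.
(define (CHECK_COL n)
  (let loop ((r 0))
    (cond ((> r 2) -1)
          ((eq? (vector-ref board (+ (* 3 r) n)) '_) (+ (* 3 r) n))
          (else (loop (+ r 1))))))

;; MINE_IN_LINE (1 child): how many of MY_SYMBOL are in row n.
(define (MINE_IN_LINE n)
  (let loop ((c 0) (k 0))
    (if (> c 2)
        k
        (loop (+ c 1)
              (if (eq? (vector-ref board (+ (* 3 n) c)) MY_SYMBOL) (+ k 1) k)))))

;; MINE_IN_COL (1 child): the same count for column n.
(define (MINE_IN_COL n)
  (let loop ((r 0) (k 0))
    (if (> r 2)
        k
        (loop (+ r 1)
              (if (eq? (vector-ref board (+ (* 3 r) n)) MY_SYMBOL) (+ k 1) k)))))

;; RANDOM_FREE_SPACE: a uniformly random index among the empty cells, or -1 on a full board.
;; Uses the common (random n) procedure (available in Racket, Guile, etc.).
(define (RANDOM_FREE_SPACE)
  (let ((free (let collect ((i 8) (acc '()))
                (cond ((< i 0) acc)
                      ((eq? (vector-ref board i) '_) (collect (- i 1) (cons i acc)))
                      (else (collect (- i 1) acc))))))
    (if (null? free)
        -1
        (list-ref free (random (length free))))))

With these definitions, a genome such as (IFLTE (MINE_IN_LINE 1) 1 (CHECK_COL 0) (CHECK_LINE 1)) evaluates directly to a board index, which the evaluator then maps into [0..8] before applying the move.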
Experiment B: complex functions and terminals
Terminals: RANDOM_FREE_SPACE, MY_SYMBOL, OTHER_SYMBOL, 0, 1, 2, -1, WIN, BLOCK
Functions: IFLTE, CHECK_LINE, CHECK_COL, MINE_IN_LINE, MINE_IN_COL
The main idea was to help the evolution by being more specific to the problem at hand. As the results above show, this time the algorithm creates a legal player strategy and improves as the generations progress.

Experiment C: 3-stage Evaluator
In this experiment we created two more basic players to help train the population.
1. The first is the random player, based on the RANDOM_FREE_SPACE function. Its code is:
(RANDOM_FREE_SPACE)
2. The second is called the "good player", and it implements a somewhat better strategy. Its code is:
(IFLTE (WIN) (-1) (IFLTE (BLOCK) (-1) (RANDOM_FREE_SPACE) (BLOCK)) (WIN))
The basic idea: if a winning move exists, take it; otherwise, if a blocking move exists, take it; if not, choose a random free space. (A Scheme sketch of both players appears at the end of this report.)

After creating these two players, we evaluated in three stages:
1. Generations 0 to 15: play 100 games against the random player.
2. Generations 16 to 40: play 100 games against the good player.
3. Generations 41 to 50: randomly choose 100 players from the population and play against them.

The idea was to train the individuals against different styles of opponents, to diversify their strategies. Under this evaluation scheme, the fitness drops significantly at generation 16, when the opponent switches to the good player, and the population then adapts its game strategy to match it. Against the random player the results are slightly better than in Experiment B, and against the good player the results of Experiment C are much better. We assume this is because Experiment C includes training against the good player, so the population develops strategies specifically against it.

Conclusions and observations:
1. In the GP configuration we chose, evolution does not succeed unless we help it by being specific to the problem, i.e., by adding nodes related to the specific game rules.
2. It is hard for evolution to create new functions and complex strategies, such as planning ahead, on its own. But if we supply such functions in advance (as with WIN and BLOCK), evolution succeeds in using them and in beating the players that use them.
3. To achieve high selection pressure, we set the tournament size to 10. This ensures that the worst individuals in the population do not survive; generally, the worst individuals are those that make illegal moves.
4. As Experiments A and B show, adding relevant functions (WIN, BLOCK, MINE_IN_LINE\COL, etc.) had a crucial effect on the convergence of the population. Without them, even the best individual in the population could not learn to avoid making illegal moves.
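To make the Experiment C opponents concrete, the sketch below (continuing the assumed board representation from the earlier sketch) gives one possible definition of WIN and BLOCK and restates both opponents as ordinary Scheme. As in the descriptions above, WIN and BLOCK scan only rows and columns, not diagonals; all names and details here are illustrative assumptions. Note also that the genome notation parenthesizes every node, e.g., (-1), while in plain Scheme constant terminals are bare values.

;; Rows and columns only, matching the WIN/BLOCK descriptions above.
(define lines
  '((0 1 2) (3 4 5) (6 7 8)      ; rows
    (0 3 6) (1 4 7) (2 5 8)))    ; columns

;; If exactly two of the three cells hold SYM and the third is empty,
;; return the empty cell's index; otherwise -1.
(define (third-cell sym cells)
  (let loop ((cs cells) (mine 0) (empty -1))
    (cond ((null? cs) (if (and (= mine 2) (>= empty 0)) empty -1))
          ((eq? (vector-ref board (car cs)) sym) (loop (cdr cs) (+ mine 1) empty))
          ((eq? (vector-ref board (car cs)) '_) (loop (cdr cs) mine (car cs)))
          (else (loop (cdr cs) mine empty)))))

;; Scan all rows and columns for a two-plus-gap pattern of SYM.
(define (scan sym)
  (let loop ((ls lines))
    (if (null? ls)
        -1
        (let ((hit (third-cell sym (car ls))))
          (if (>= hit 0) hit (loop (cdr ls)))))))

(define (WIN)   (scan MY_SYMBOL))      ; my winning move, or -1
(define (BLOCK) (scan OTHER_SYMBOL))   ; the opponent's threat to block, or -1

;; The two hand-written Experiment C opponents:
(define (random-player) (RANDOM_FREE_SPACE))

;; Good player: if (WIN) <= -1 there is no winning move, so fall through
;; to the same test for BLOCK; otherwise take the winning move.
(define (good-player)
  (IFLTE (WIN) -1
         (IFLTE (BLOCK) -1 (RANDOM_FREE_SPACE) (BLOCK))
         (WIN)))

Because the sketched IFLTE is an ordinary eager procedure, all four arguments are evaluated before the choice is made; that is harmless for these pure, read-only terminals, but a tree interpreter such as the one embedded in the report's evaluator would typically evaluate only the selected branch.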