ChooseMove:
For all legal moves:
- simulate that move (SimulateMove)
- check the value V of the resulting board
- pick the best move

For example, LegalMoves might return "16 19", or "11 15", or "12 16", or …; simulating "16 19" might give a board that V scores at 30 points.

We'll make our computer program learn the function V:
- when it sees a board, it will assign a score to the board
- it has to learn how to assign the correct score to a board

First, we have to define how our function V is supposed to behave. Final boards are easy: V(b') = 100 for a win, V(b') = -100 for a loss, V(b') = 0 for a draw.

…but what about boards in the middle of a game? For a mid-game board b, follow successor(b) down to a final board b': if that final board is a win, then V(b) = V(b') = 100.

NB: if a board state occurs in 50% of wins and 50% of losses, then V(b) will eventually be 0; if it occurs in 75% of wins and 25% of losses, then V(b) will eventually be 50. So each board score will eventually be a mix of the number of games that you can win, lose or draw from that board (assuming an ideal scoring function; in fact we often only approximate this).

That's our definition of V: this is how our computer program's V should learn to behave.

But in the middle of a game, how can you calculate the path to the final board?
- generate all possibilities? Fine for tic-tac-toe; for chess it takes too long!
- so we approximate V and call it V' ("V-hat"), aiming for V'(b) ≈ V(b)

…so we'll learn an approximation to V.

Now we'll choose our representation of V' (what is a board, anyway?)
1) a big table? input: a board; output: its score (50, 90, …)
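Representation (1), the big table, would literally store one score per board. A minimal sketch, assuming some string encoding of boards (the entries and scores below are invented for illustration):

```python
# Representation 1: a literal lookup table from board -> score.
# The board encodings and the scores are made up for illustration.
score_table = {
    "board-A": 50,
    "board-B": 90,
}

def v_table(board):
    """Return the stored score for a board, or 0 if we have no entry yet."""
    return score_table.get(board, 0)

print(v_table("board-A"))  # -> 50
```

The catch is that the table needs one entry per reachable board, which for checkers is astronomically many, so more compact representations are worth considering.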
2) CLIPS rules?
IF my piece is near a side edge THEN score = 80
IF my piece is near the opponent's edge THEN score = 90
…

3) a polynomial function? We define some variables:
x1 = number of white pieces on the board
x2 = number of red pieces
x3 = number of white kings
x4 = number of red kings
x5 = number of white pieces threatened by red (can be captured on red's next turn)
x6 = number of red pieces threatened by white

e.g. x1 = 12, x2 = 11, x3 = 0, x4 = 0, x5 = 1, x6 = 0

Arrange these as a linear combination (a quadratic function would be another option):

V'(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6

The program will change the values of the weights: it will learn what the weights should be to give the correct score for each board.

The learning process (playing and learning at the same time): the computer will play against itself.
- initialise V' with random weights (w0 = 23 etc.), so that initially V'(b) = 23 + 0.5·x1 + …
- start a new game; for each board:
  (a) calculate V' on all possible legal moves (LegalMoves: "12 16", or "11 15", or …)

For example, one of the boards resulting from a legal move might be:
b = <x1=12, x2=12, …, x6=1>
V'(<x1=12, x2=12, …, x6=1>) = 23 + 0.5·x1 + … = 30 points

First time round, V' is essentially random (because we set the weights at random); as it learns, V' should pick better successors.

  (b) pick the successor with the highest score; suppose that is red playing 12-16, so at this point board 1 = b and board 2 = successor(b)
  (c) evaluate the error
  (d) modify each weight to correct the error:

      wi ← wi + η · (Vtrain(b) − V'(b)) · xi

      where Vtrain(b) − V'(b) is the error, wi is the weight to update, and η is the learning rate; each weight is modified in proportion to the size of its attribute value xi.
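Steps (a)-(d) can be sketched as follows. Only w0 = 23 and w1 = 0.5 come from the slides; the remaining weights, the learning rate, and the training value are invented placeholder values:

```python
# Evaluate V'(b) = w0 + w1*x1 + ... + w6*x6, then apply the weight-update
# rule w_i <- w_i + eta * error * x_i from the slide.
weights = [23.0, 0.5, -0.5, 2.0, -2.0, -1.0, 1.0]  # w0..w6, mostly invented

def v_hat(w, x):
    """V'(b) for a board described by features x = [x1..x6]."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def lms_update(w, x, v_train, eta=0.01):
    """(c) evaluate the error, then (d) modify each weight to correct it."""
    feats = [1.0] + list(x)  # x0 = 1 so that w0 is updated too
    error = v_train - sum(wi * xi for wi, xi in zip(w, feats))  # Vtrain(b) - V'(b)
    # each weight moves in proportion to the size of its attribute value
    return [wi + eta * error * xi for wi, xi in zip(w, feats)]

x = [12, 12, 0, 0, 0, 1]  # the board b from the example above
print(v_hat(weights, x))  # -> 24.0 with these placeholder weights
# (the slides' own weights score this board at 30)
new_w = lms_update(weights, x, v_train=100.0)  # pretend training value
```

Including the constant feature x0 = 1 in the update is what lets the bias weight w0 be corrected along with the others.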
(At this point the change probably won't be in a useful direction, because the weights are all still random.)

  (e) repeat (a)-(d) until end-of-game

Suppose red wins: then the last b is a final board state, and Vtrain(b) = 100, because we won. Now we're shifting our V' towards something useful (a little bit).
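The slides only give Vtrain for the final board. For the intermediate boards in steps (c)-(d), the usual choice in Mitchell's classic checkers example (which this material appears to follow) is to borrow the current estimate of the successor, Vtrain(b) ← V'(successor(b)). A sketch under that assumption:

```python
def training_values(trace, result, v_hat, w):
    """Assign a training value Vtrain to every board in one game's trace.

    trace  : boards of one game as feature vectors, in play order
    result : value of the final board: +100.0 (win), -100.0 (loss), 0.0 (draw)
    v_hat  : evaluation function, called as v_hat(w, x)
    w      : current weights
    """
    targets = []
    for i, board in enumerate(trace):
        if i + 1 < len(trace):
            # intermediate board: Vtrain(b) = V'(successor(b))
            targets.append(v_hat(w, trace[i + 1]))
        else:
            # final board: training value comes straight from the result
            targets.append(result)
    return targets
```

This may look circular (using V' to train V'), but the final-board values anchor the estimates, and they propagate backwards through the game as training continues.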
Then start a new game and repeat the whole process. 1 million games later, our V' might now predict a useful score for any board that we might see.
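Putting the whole learning loop together: a compact sketch of one training "game". The trace of boards is made up (a real implementation would need actual checkers move generation for steps (a)+(b)), the unspecified weights and learning rate are invented, and intermediate training values use the successor-estimate choice from Mitchell's checkers example:

```python
import random

def v_hat(w, x):
    """V'(b) = w0 + w1*x1 + ... + w6*x6."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def lms_update(w, x, v_train, eta=0.001):
    """Steps (c)+(d): evaluate the error, correct each weight in proportion."""
    feats = [1.0] + list(x)
    error = v_train - sum(wi * xi for wi, xi in zip(w, feats))
    return [wi + eta * error * xi for wi, xi in zip(w, feats)]

random.seed(0)
w = [random.uniform(-1.0, 1.0) for _ in range(7)]  # random initial weights

# Made-up trace of boards (as feature vectors) from one self-play game we won.
# In the real program these come from steps (a)+(b): score every legal
# successor with V' and pick the highest-scoring one.
trace = [
    [12, 12, 0, 0, 0, 0],
    [12, 11, 0, 0, 1, 0],
    [11, 10, 1, 0, 0, 2],
    [9, 4, 2, 0, 0, 3],   # final board: we win, so Vtrain = +100
]

for i, board in enumerate(trace):
    if i + 1 < len(trace):
        v_train = v_hat(w, trace[i + 1])  # Vtrain(b) = V'(successor(b))
    else:
        v_train = 100.0                   # step (e): end of game, we won
    w = lms_update(w, board, v_train)

print(w)  # weights nudged slightly after one game
```

Looping this over many self-played games is what "1 million games later" means; the learning rate has to stay small, or the weights overshoot and oscillate instead of settling.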