Slide 1: Learning Opponent-type Probabilities for PrOM Search

Jeroen Donkers, IKAT, Universiteit Maastricht
6th Computer Olympiad, August 20, 2001

Slide 2: Contents

• OM search and PrOM search
• Learning for PrOM search
• Off-line learning
• On-line learning
• Conclusions & future research

Slide 3: OM search

• The MAX player uses evaluation function V0
• The opponent uses a different evaluation function (Vop)
• At MIN nodes: predict which move the opponent will select (using standard search and Vop)
• At MAX nodes: pick the move that maximizes the search value (based on V0)
• At leaf nodes: use V0

Slide 4: PrOM search

• Extended opponent model:
  – a set of opponent types (e.g., evaluation functions)
  – a probability distribution over this set
• Interpretation: at every move, the opponent uses a random device to pick one of the opponent types, and plays according to the selected type.

Slide 5: PrOM search algorithm

• At MIN nodes: determine for every opponent type which move it would select
• Compute the MAX player's value for each of these moves
• Use the opponent-type probabilities to compute the expected value of the MIN node
• At MAX nodes: select the maximum child
• (A code sketch of this rule follows after the off-line learning slides.)

Slide 6: Learning in PrOM search

• How do we assess the probabilities of the opponent types?
  – Off-line: use games previously played by the opponent to estimate the probabilities (much time and, possibly, much data available)
  – On-line: use the moves observed during the game to adjust the probabilities (little time and few observations; prior probabilities are needed)

Slide 7: Off-line learning

• Ultimate learning goal: find P**(opp) for a given opponent and given opponent types such that PrOM search plays best against that opponent.
• Assumption: PrOM search plays best if P** = P*, where P*(opp) is the mixed strategy that best predicts the moves of the opponent.

Slide 8: Off-line learning

• How to obtain P*(opp)?
• Input: a set of positions, together with the moves that the given opponent and each of the given opponent types would select
• "Algorithm": P*(opp_i) = N_i / N
• But: leave out all ambiguous positions (e.g., positions where more than one opponent type agrees with the opponent)
• (See the estimator sketch after Slide 12.)

Slide 9: Off-line learning

• Case 1: the opponent uses a mixed strategy P#(opp) over the given opponent types
  – Effective learning is possible (P*(opp) → P#(opp))
  – Learning is more difficult if the opponent types are not independent

Slide 10: Not leaving out ambiguous events

• Experiment: 5 opponent types, P = (a, b, b, b, b), 20 moves, 100 to 100,000 runs, 100 samples

Slide 11: Leaving out ambiguous events

• Experiment: 5 opponent types, P = (a, b, b, b, b), 20 moves, 10 to 100,000 runs, 100 samples

Slide 12: Varying the number of opponent types

• Experiment: 2 to 20 opponent types, P = (a, b, …, b), 20 moves, 100,000 runs, 100 samples
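Code sketch: PrOM search (not in the original deck). A minimal Python rendering of the rule on Slide 5, assuming a position object with is_leaf(), children(), and to_move, and evaluation functions that score positions from the MAX player's viewpoint; all names are illustrative.

    def minimax(node, v, depth):
        # Plain fixed-depth minimax with evaluation function v, where v
        # scores positions from the MAX player's point of view.
        if depth == 0 or node.is_leaf():
            return v(node)
        vals = [minimax(c, v, depth - 1) for c in node.children()]
        return max(vals) if node.to_move == "max" else min(vals)

    def prom_value(node, v0, opp_types, probs, depth):
        # PrOM value of `node` for the MAX player.
        #   v0        : the MAX player's own evaluation function
        #   opp_types : evaluation functions, one per opponent type
        #   probs     : opponent-type probabilities (summing to 1)
        if depth == 0 or node.is_leaf():
            return v0(node)                      # leaf: own evaluation
        children = node.children()
        if node.to_move == "max":
            # MAX node: take the maximum PrOM value over the children.
            return max(prom_value(c, v0, opp_types, probs, depth - 1)
                       for c in children)
        # MIN node: for every opponent type, determine the move that type
        # would select (standard search with that type's evaluation; here
        # the opponent minimizes the MAX-viewpoint score), then combine
        # the MAX player's values of the selected moves into an expected
        # value, weighted by the opponent-type probabilities.
        value = 0.0
        for v_op, p in zip(opp_types, probs):
            chosen = min(children, key=lambda c: minimax(c, v_op, depth - 1))
            value += p * prom_value(chosen, v0, opp_types, probs, depth - 1)
        return value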
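Code sketch: the off-line estimator (not in the original deck). The counting rule P*(opp_i) = N_i / N from Slide 8, with ambiguous positions left out; the record layout and the uniform fallback are illustrative assumptions.

    def estimate_probs(records, n_types):
        # records: list of (observed_move, [move selected by each opponent type]).
        # Count, per opponent type, the positions in which only that type
        # agrees with the opponent, and set P*(opp_i) = N_i / N.
        counts = [0] * n_types
        total = 0
        for observed, type_moves in records:
            matching = [i for i, m in enumerate(type_moves) if m == observed]
            if len(matching) != 1:
                # Ambiguous position (several types agree) -- or, as an
                # extra assumption here, a position no type explains.
                continue
            counts[matching[0]] += 1
            total += 1
        if total == 0:
            return [1.0 / n_types] * n_types   # no usable data: uniform fallback
        return [c / total for c in counts]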
Slide 13: Off-line learning

• Case 2: the opponent uses a different strategy
  – The opponent types behave randomly but dependently (the distribution of type i depends on type i−1)
  – The real opponent selects a fixed move

Slide 14: [Figure: learning results for Case 2 – the learned probabilities of opponent types opp0 to opp4, and the learning error (−log(error)) for 10^1 to 10^5 runs, both plotted against the opponent's selection (0 to 18).]

Slide 15: Fast on-line learning

• At the principal MIN node, only the best move for every opponent type is needed
• Increase the probability of an opponent type slightly if the observed move equals the move selected by that opponent type only; then normalize all probabilities
• Drift towards a single opponent type is possible

Slide 16: Slower on-line learning: naive Bayesian (Duda & Hart, 1973)

• Compute the value of every move at the principal MIN node for every opponent type
• Transform these values into conditional probabilities P(move | opp)
• Compute P(opp | move_obs) from P*(opp) using Bayes' rule
• Take P*(opp) ← a · P*(opp) + (1 − a) · P(opp | move_obs)
• (A sketch of this update follows after the last slide.)

Slide 17: Naive Bayesian learning

• In the end, drifting towards 1-0 probabilities will almost always occur
• The parameter a is very important for the actual performance:
  – the amount of change in the probabilities
  – convergence
  – drifting speed
• It should be tuned in a real setting

Slide 18: Conclusions

• Effective off-line learning of the probabilities is possible when ambiguous events are disregarded.
• Off-line learning also works if the opponent does not use a mixed strategy over the known opponent types.
• On-line learning must be tuned precisely to the given situation.

Slide 19: Future research

• PrOM search and learning in real game playing:
  – Zanzibar Bao (8×4 mancala)
  – LOA (some experiments with OM search have been done)
  – Chess endgames
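Code sketch: the naive Bayesian update (not in the original deck). A minimal Python version of the update on Slide 16; how the move values become conditional probabilities P(move | opp) is not specified in the deck, so the softmax below (and its temperature parameter) is an illustrative choice.

    import math

    def bayes_update(priors, move_values, observed, a=0.9, temp=1.0):
        # priors      : current P*(opp) over the opponent types
        # move_values : move_values[k][m] = value of move m at the principal
        #               MIN node under opponent type k
        # observed    : index of the move the opponent actually played
        # a           : mixing parameter from Slide 16; temp and the softmax
        #               are assumptions, not from the deck.

        # Turn each type's move values into P(move | opp) via a softmax
        # over the negated values, so that moves a (minimizing) opponent
        # type prefers receive more probability mass.
        likelihoods = []
        for values in move_values:
            exps = [math.exp(-v / temp) for v in values]
            likelihoods.append(exps[observed] / sum(exps))

        # Bayes' rule: P(opp | observed move) is proportional to
        # P(move | opp) * P*(opp).
        joint = [l * p for l, p in zip(likelihoods, priors)]
        z = sum(joint)
        posterior = [j / z for j in joint]

        # Damped update: P*(opp) <- a * P*(opp) + (1 - a) * P(opp | move).
        return [a * p + (1 - a) * q for p, q in zip(priors, posterior)]

With a close to 1 the probabilities change slowly (slow convergence, slow drift); a smaller a reacts faster but drifts towards 1-0 probabilities sooner, which is the tuning trade-off mentioned on Slide 17.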