Learning Opponent-Type Probabilities for PrOM Search

Jeroen Donkers
IKAT Universiteit Maastricht
6th Computer Olympiad, August 20, 2001
Contents
• OM search and PrOM search
• Learning for PrOM search
• Off-line Learning
• On-line Learning
• Conclusions & Future Research
OM search
– MAX player uses evaluation function V0
– Opponent uses a different evaluation function (Vop)
– At MIN nodes: predict which move the opponent will select (using standard search and Vop)
– At MAX nodes: pick the move that maximizes the search value (based on V0)
– At leaf nodes: use V0
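To make the procedure concrete, here is a minimal Python sketch of OM search. The game-tree helpers is_leaf, is_max_node, and children are hypothetical, and all values are assumed to be from MAX's point of view, so the opponent minimizes Vop.

```python
# Minimal OM-search sketch (not the author's implementation).
# Hypothetical game-tree interface: is_leaf(node), is_max_node(node),
# children(node). All values are from MAX's point of view.

def minimax(node, v):
    """Standard minimax search with evaluation function v."""
    if is_leaf(node):
        return v(node)
    vals = [minimax(c, v) for c in children(node)]
    return max(vals) if is_max_node(node) else min(vals)

def om_search(node, v0, v_op):
    """OM-search value of node for the MAX player."""
    if is_leaf(node):
        return v0(node)                       # leaf nodes: use V0
    if is_max_node(node):
        # MAX nodes: maximize the search value (based on V0)
        return max(om_search(c, v0, v_op) for c in children(node))
    # MIN nodes: predict the opponent's move by standard search
    # with Vop, then continue evaluating that move with V0
    predicted = min(children(node), key=lambda c: minimax(c, v_op))
    return om_search(predicted, v0, v_op)
```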
PrOM search
• Extended opponent model:
– a set of opponent types (e.g. evaluation functions)
– a probability distribution over this set
• Interpretation: at every move, the opponent uses a random device to pick one of the opponent types, and plays using the selected type.
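A tiny sketch of this interpretation; the types and probabilities below are hypothetical example values.

```python
import random

# The opponent's "random device": before each move, one opponent
# type is drawn from the probability distribution and used to play.
opponent_types = ["opp0", "opp1", "opp2"]   # hypothetical types
probs = [0.5, 0.3, 0.2]                     # distribution over the set

def sample_opponent_type():
    return random.choices(opponent_types, weights=probs, k=1)[0]
```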
PrOM search algorithm
• At MIN nodes: determine for every opponent type which move would be selected.
• Compute the MAX player's value for these moves.
• Use the opponent-type probabilities to compute the expected value of the MIN node.
• At MAX nodes: select the maximum child.
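A Python sketch of these steps, reusing the hypothetical minimax and game-tree helpers from the OM-search sketch above; opponent types are represented by their evaluation functions.

```python
def prom_search(node, v0, opp_types, probs):
    """PrOM-search value of node; opp_types is a list of opponent
    evaluation functions, probs the matching type probabilities."""
    if is_leaf(node):
        return v0(node)
    if is_max_node(node):
        # MAX nodes: select the maximum child
        return max(prom_search(c, v0, opp_types, probs)
                   for c in children(node))
    # MIN nodes: determine each type's move, compute the MAX
    # player's value for it, and weight by the type probabilities
    expected = 0.0
    for v_op, p in zip(opp_types, probs):
        move = min(children(node), key=lambda c: minimax(c, v_op))
        expected += p * prom_search(move, v0, opp_types, probs)
    return expected
```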
Learning in PrOM search
• How do we assess the probabilities over the opponent types?
– Off-line: use games previously played by the opponent to estimate the probabilities (plenty of time and, possibly, data available).
– On-line: use the moves observed during a game to adjust the probabilities (little time and few observations; prior probabilities are needed).
Off-Line Learning
• Ultimate learning goal: find P**(opp) for a given opponent and given opponent types such that PrOM search plays best against that opponent.
• Assumption: PrOM search plays best if P** = P*, where P*(opp) is the mixed strategy that predicts the opponent's moves best.
Off-Line Learning
• How to obtain P*(opp)?
• Input: a set of positions together with the moves that the given opponent and all the given opponent types would select.
• "Algorithm": P*(opp_i) = N_i / N, where N_i counts the positions in which type i predicts the opponent's move and N is the total number of positions used.
• But: leave out all ambiguous positions (e.g. positions where more than one opponent type agrees with the opponent)!
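A sketch of this frequency estimator; the observation format is an assumption.

```python
def estimate_probabilities(observations, n_types):
    """Off-line estimator sketch: P*(opp_i) = N_i / N. Each
    observation is a pair (opponent_move, type_moves), where
    type_moves[i] is the move opponent type i would select."""
    counts = [0] * n_types
    total = 0
    for opponent_move, type_moves in observations:
        matches = [i for i, m in enumerate(type_moves)
                   if m == opponent_move]
        # leave out ambiguous positions: more than one type
        # (or no type at all) agrees with the opponent
        if len(matches) != 1:
            continue
        counts[matches[0]] += 1
        total += 1
    return [c / total for c in counts] if total else None
```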
Off-Line Learning
• Case 1: the opponent is using a mixed strategy P#(opp) of the given opponent types.
– Effective learning is possible (P*(opp) → P#(opp)).
– Learning is more difficult if the opponent types are not independent.
Not leaving out ambiguous events
[Figure: estimated probabilities with 5 opponent types, P = (a,b,b,b,b), 20 moves, 100 - 100,000 runs, 100 samples.]
Leaving out ambiguous events
[Figure: estimated probabilities with 5 opponent types, P = (a,b,b,b,b), 20 moves, 10 - 100,000 runs, 100 samples.]
Varying number of opponent types
[Figure: estimation results for 2-20 opponent types, P = (a,b,b,b,b), 20 moves, 100,000 runs, 100 samples.]
Off-Line Learning
• Case 2: the opponent is using a different strategy.
– The opponent types behave randomly but dependently (the distribution of type i depends on that of type i-1).
– The real opponent selects a fixed move.
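An illustrative simulation of this setup, under one reading of the slide; the bias parameter and move encoding are assumptions.

```python
import random

def dependent_type_moves(n_types, n_moves, bias=0.5):
    """Each opponent type picks a move at random, but with
    probability `bias` it copies type i-1's move, so the types'
    move distributions are dependent."""
    moves = []
    prev = random.randrange(n_moves)
    for _ in range(n_types):
        move = prev if random.random() < bias else random.randrange(n_moves)
        moves.append(move)
        prev = move
    return moves

# The real opponent selects a fixed move, e.g. always move 0.
OPPONENT_MOVE = 0
```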
[Figure: learned opponent-type probabilities (opp0-opp4, stacked to 100%) and the learning error as -log(error), plotted against the opponent's selection for 10^1 to 10^5 runs.]
Fast On-Line Learning
• At the principal MIN node, only the best moves for every opponent type are needed.
• Increase the probability of an opponent type slightly if the observed move matches the selected move of this opponent type only; then normalize all probabilities.
• Drifting to one opponent type is possible.
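A sketch of this update; the boost factor is a hypothetical parameter.

```python
def fast_update(probs, type_moves, observed_move, boost=1.1):
    """Fast on-line update sketch: slightly increase the
    probability of the opponent type that alone selected the
    observed move, then renormalize. `boost` is hypothetical."""
    matches = [i for i, m in enumerate(type_moves)
               if m == observed_move]
    new = list(probs)
    if len(matches) == 1:            # unique match only
        new[matches[0]] *= boost     # slight increase
    total = sum(new)
    return [p / total for p in new]  # normalize all probabilities
```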
Slower On-Line Learning
Naive Bayesian (Duda & Hart '73)
• Compute the value of every move at the principal MIN node for every opponent type.
• Transform these values into conditional probabilities P(move | opp).
• Compute P(opp | move_obs) from the current P*(opp) using Bayes' rule.
• Update: P*(opp) := a · P*(opp) + (1 - a) · P(opp | move_obs)
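A sketch of the update; turning move values into P(move | opp) with a softmax is an assumption, since the slides only say the values are transformed into conditional probabilities.

```python
import math

def bayes_update(probs, move_values, observed_move, a=0.9):
    """Naive-Bayesian on-line update sketch. move_values[i][m] is
    the value of move m at the principal MIN node under opponent
    type i, from MAX's view, so lower is better for the opponent
    (hence -v in the softmax)."""
    cond = []
    for vals in move_values:
        exps = [math.exp(-v) for v in vals]
        z = sum(exps)
        cond.append([e / z for e in exps])      # P(move | opp_i)
    # Bayes' rule: P(opp_i | move_obs) is proportional to
    # P(move_obs | opp_i) * P*(opp_i)
    post = [p * c[observed_move] for p, c in zip(probs, cond)]
    z = sum(post)
    post = [q / z for q in post]
    # P*(opp) := a * P*(opp) + (1 - a) * P(opp | move_obs)
    return [a * p + (1 - a) * q for p, q in zip(probs, post)]
```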
Naïve Bayesian Learning
• In the end, drifting to 1-0 probabilities will almost always occur.
• The parameter a is very important for the actual performance:
– the amount of change in the probabilities
– convergence
– drifting speed
• It should be tuned in a real setting.
Conclusions
• Effective off-line learning of probabilities is possible when ambiguous events are disregarded.
• Off-line learning also works if the opponent does not use a mixed strategy of the known opponent types.
• On-line learning must be tuned precisely to a given situation.
Future Research
• PrOM search and learning in real game playing:
– Zanzibar Bao (8x4 mancala)
– LOA (some experiments with OM search done)
– Chess endgames