Using Partial Order Bounding in Shogi

Game Programming Workshop 2003
Reijer Grimbergen, Kenji Hadano and Masanao Suetsugu
Department of Information Science
Saga University
Contents
Why Partial Order Bounding?
The problems of using a scalar evaluation function
What is Partial Order Bounding?
Using Partial Order Bounding in Shogi
Implementation issues
Results
Conclusions and Future Work
Why Partial Order Bounding?
Scalar evaluation function
Perfect play in two-player perfect information games
Minimax search until the game-theoretical value of the current position is known
Infeasible for most interesting games
Search needs to be cut off before the game-theoretical value is known
An evaluation function is needed to estimate the probability of winning when the search is terminated
The evaluation function
Contains most of the domain-dependent knowledge
Generally a weighted sum of feature values:
$$\mathit{eval} = \sum_{i=1}^{n} w_i f_i$$
Why Partial Order Bounding?
Problems of a scalar evaluation
Unstable positions
Position  Material  Attack  Evaluation
P1               0       0           0
P2             500    –500           0
P3            –500     500           0
Long term strategic features
Large weights will give tactical problems
Small weights make it impossible to follow long term plans
Close to terminal positions
Sometimes a single feature is enough for a conclusion
Possible solution: Partial Order Bounding
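The unstable-positions table can be reproduced with a few lines of code. This is a minimal sketch, assuming unit weights for the material and attack features (the slide does not state the weights); the names `WEIGHTS`, `POSITIONS` and `scalar_eval` are illustrative only.

```python
# Assumption: both features have weight 1; the slide does not give the weights.
WEIGHTS = {"material": 1, "attack": 1}

# Feature values taken from the table of positions P1 to P3.
POSITIONS = {
    "P1": {"material": 0, "attack": 0},
    "P2": {"material": 500, "attack": -500},
    "P3": {"material": -500, "attack": 500},
}

def scalar_eval(features, weights=WEIGHTS):
    """Weighted sum of feature values: sum over i of w_i * f_i."""
    return sum(weights[name] * value for name, value in features.items())

scores = {name: scalar_eval(features) for name, features in POSITIONS.items()}
print(scores)  # {'P1': 0, 'P2': 0, 'P3': 0}
```

Three quite different positions collapse to the same scalar value, so the search cannot tell the quiet position P1 from the two unbalanced ones.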
What is Partial Order Bounding?
Partial Order Evaluation
Keep the complete set of feature values
Compare the feature values to decide which position is better
[Figure: partial order over the features f1, f2, f3 and f4]

      P1                 P2
f1  f2  f3  f4     f1  f2  f3  f4     Comp
10  25  25  25     10  20  20  50     P1 > P2
10  25  25  25     20  20  20  20     P1 < P2
10  10  10  25     10  10  10  30     P1 < P2
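The comparisons in the table can be sketched in code. The ordering used here is an assumption inferred from the three rows and the diagram, not something the slides state: f1 is compared first, then f2 and f3 together (both must agree), then f4. The names `LEVELS` and `compare` are hypothetical.

```python
# Assumption: features are grouped into levels of decreasing importance,
# f1 first, then {f2, f3} together, then f4 (indices 0 to 3).
LEVELS = [(0,), (1, 2), (3,)]

def compare(p, q, levels=LEVELS):
    """Return '>', '<' or '=' for comparable positions, None otherwise."""
    for level in levels:
        diffs = [p[i] - q[i] for i in level]
        if all(d == 0 for d in diffs):
            continue  # tie on this level, look at the next one
        if all(d >= 0 for d in diffs):
            return ">"
        if all(d <= 0 for d in diffs):
            return "<"
        return None  # features of this level disagree: incomparable
    return "="

rows = [
    ((10, 25, 25, 25), (10, 20, 20, 50)),
    ((10, 25, 25, 25), (20, 20, 20, 20)),
    ((10, 10, 10, 25), (10, 10, 10, 30)),
]
for p1, p2 in rows:
    print(p1, p2, compare(p1, p2))
```

Under this assumed ordering the three rows give `>`, `<` and `<`, matching the Comp column. When f2 and f3 point in opposite directions the function returns `None`.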
What is Partial Order Bounding?
The problem
Why is partial order evaluation not enough?
[Figure: partial order over the features f1, f2, f3 and f4]

Pos  f1  f2  f3  f4
P1   10  50  20   0
P2   10  15  30   0

Which is better: P1 or P2?
The problem: Antichains
A subset of the partial order for which all pairs of distinct elements are incomparable
Example: {f2, f3} is an antichain
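The antichain definition can be checked mechanically. The sketch below assumes the feature order is f1 above f2 and f3, which are both above f4; this is read off the diagram and is not stated in the text. `COVERS`, `above`, `comparable` and `is_antichain` are illustrative names.

```python
from itertools import combinations

# Assumption: "more important than" relations inferred from the diagram.
COVERS = {("f1", "f2"), ("f1", "f3"), ("f2", "f4"), ("f3", "f4")}

def above(a, b, covers=COVERS):
    """True if a is above b, following the cover relations transitively."""
    return any(x == a and (y == b or above(y, b, covers)) for x, y in covers)

def comparable(a, b):
    return a == b or above(a, b) or above(b, a)

def is_antichain(elements):
    """True if every pair of distinct elements is incomparable."""
    return not any(comparable(a, b) for a, b in combinations(elements, 2))

print(is_antichain({"f2", "f3"}))  # True
print(is_antichain({"f1", "f2"}))  # False
```

Because neither f2 nor f3 takes precedence, a position that is better on f2 but worse on f3 cannot be ranked against the other.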
What is Partial Order Bounding?
Dealing with antichains
Simple approach: keep partially ordered values in
every node of the search tree
Leads to large sets of incomparable options
Reducing these sets leads to loss of information
Partial Order Bounding
Separate comparison and value back up
Define a target vector with targets for each of the
feature values in the antichain
Use search to determine if the target can be reached
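To illustrate why the simple approach leads to large sets, here is a minimal sketch of backing up partially ordered values: a node keeps every child value that no other child value dominates. The two-feature vectors are made-up numbers, and `dominates` and `non_dominated` are hypothetical helper names.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and differs from b."""
    return a != b and all(x >= y for x, y in zip(a, b))

def non_dominated(vectors):
    """Keep only the vectors that no other vector dominates."""
    candidates = set(vectors)
    return {v for v in candidates
            if not any(dominates(w, v) for w in candidates)}

# Hypothetical feature vectors of five child positions.
children = [(50, 20), (15, 30), (40, 25), (10, 10), (45, 22)]
front = non_dominated(children)
print(sorted(front))  # [(15, 30), (40, 25), (45, 22), (50, 20)]
```

Only one of the five values can be discarded, so the set passed to the parent is almost as large as the set of children. Partial Order Bounding avoids this by asking a yes/no question about a fixed target instead.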
What is Partial Order Bounding?
Example of partial order bounding
Targets: T1 = {5, 3}, T2 = {6, 4}
[Figure: search tree with root A, interior nodes B and C, and leaves D, E, F and G]

Node  Leaf value  T1  T2
A                 +   –
B                 +   –
C                 –   –
D     (11, 5)     +   +
E     (5, 7)      +   –
F     (6, 8)      +   +
G     (4, 3)      –   –
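The example can be replayed as a boolean search. This is a sketch under assumptions: A is taken to be a max node with children B and C, B is a min node over D and E, and C is a min node over F and G; a leaf meets a target when every component is at least the target value. The tree shape and the names `TREE`, `LEAF_VALUES`, `meets` and `pob_search` are inferred or invented for illustration.

```python
# Assumption: tree shape inferred from the node labels in the example.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
LEAF_VALUES = {"D": (11, 5), "E": (5, 7), "F": (6, 8), "G": (4, 3)}

def meets(value, target):
    """A leaf meets the target if every feature reaches its target value."""
    return all(v >= t for v, t in zip(value, target))

def pob_search(node, target, maximizing=True):
    """Return True if the player at the root can force the target."""
    if node in LEAF_VALUES:
        return meets(LEAF_VALUES[node], target)
    results = (pob_search(child, target, not maximizing)
               for child in TREE[node])
    # any/all stop early on a generator, which acts as a cutoff.
    return any(results) if maximizing else all(results)

T1, T2 = (5, 3), (6, 4)
print(pob_search("A", T1))  # True
print(pob_search("A", T2))  # False
```

Each search backs up a single boolean rather than a set of vectors, which is the separation of comparison and value back-up described earlier.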
Partial Order Bounding in Shogi
Implementation decisions
Which partial order evaluation to use?
How to set the search targets?
What to do if the search target is met or
fails?
What search depth should be used?
Partial Order Bounding in Shogi
Partial order evaluation
We have used the following antichain
Material
Strength of attack
Strength of defense
This partial order evaluation is
Representative
Has dominating features
Partial Order Bounding in Shogi
Setting the search targets
Setting the target too low
Many moves for which the target is met: which one to
choose?
Setting the target too high
No moves for which the target is met: no move can be
played
Our solution
Perform a shallow α–β search and use the result as the first target
Partial Order Bounding in Shogi
Success and failure
POB is a series of searches with different bounds
[Table: search target met (T) or failed (F) for moves M1 to M4 over POB iterations 1 to 5]
Problems in this approach
How to set the targets to minimize the number of iterations?
Which targets to increase or decrease?
No general solution: tuning problem
Partial Order Bounding in Shogi
Search depth
In POB there is no definite target check
A deeper search can reveal that the target is
unreachable
Optimization
The target counts as reached if the player to move has reached its target
This alone is not very likely to avoid a search explosion
Another tuning problem
Results
Implementation schemes
Target settings
Scheme A (equal weight): Increasing or decreasing all three search
targets by 250
Scheme B (more weight to material): Increasing or decreasing the
material feature by 400 and attack and defense by 100
Scheme C (more weight to attack): Increasing or decreasing the
attack feature by 400 and material and defense by 100
Note: the defense feature did not give good results
Really part of the antichain?
If the target fails or succeeds for all moves, the target
changes are halved
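The target update rules above can be sketched as a loop. Several details are assumptions, not taken from the slides: the target is raised when more than one move meets it and lowered when none does, the step is halved before it is applied, and the search itself is replaced by a toy lookup table `ACHIEVABLE`. All names are hypothetical; vectors are ordered (material, attack, defense).

```python
# Step sizes per scheme, as (material, attack, defense).
SCHEMES = {"A": (250, 250, 250), "B": (400, 100, 100), "C": (100, 400, 100)}

def pob_iterate(moves, reaches_target, target, step, max_iterations=20):
    """Adjust the target until exactly one move meets it (or give up)."""
    target, step = list(target), list(step)
    for _ in range(max_iterations):
        successes = [m for m in moves if reaches_target(m, target)]
        if len(successes) == 1:
            return successes[0], tuple(target)
        direction = 1 if successes else -1  # assumption: raise or lower
        if len(successes) in (0, len(moves)):
            step = [s // 2 for s in step]   # all failed or all succeeded
        target = [t + direction * s for t, s in zip(target, step)]
    return None, tuple(target)

# Toy stand-in for the POB search: what each move can guarantee.
ACHIEVABLE = {"M1": (0, 0, 0), "M2": (300, 300, 300), "M3": (600, 600, 600)}

def reaches_target(move, target):
    return all(v >= t for v, t in zip(ACHIEVABLE[move], target))

result = pob_iterate(list(ACHIEVABLE), reaches_target, (0, 0, 0), SCHEMES["A"])
print(result)  # ('M3', (375, 375, 375))
```

In this toy run the initial target is met by all moves, so the step is halved to 125 and the target is raised three times until only M3 still meets it.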
Results
Search depth
3-ply α–β search to determine the initial search
targets
3-, 4- and 5-ply searches for the POB iterations
50 test problems
The first (easiest) problem from each issue of Shukan Shogi, 750 to 799
Results
Test problem results
POB depth  Scheme  Solved  Avg. time per problem (min:sec)
3-ply      A       17       0:07
3-ply      B       17       0:10
3-ply      C       15       0:05
4-ply      A       23       1:00
4-ply      B       19       1:47
4-ply      C       27       0:48
5-ply      A       27      17:23
5-ply      B       23      26:13
5-ply      C       25      12:19
Results
Discussion
4-ply POB using scheme C gives the best results
27 solved problems in 48 seconds on average
Surprisingly, giving more weight to attack gives
better results than giving more weight to material
Increasing by 400 not the best?
Setting the search target has a big impact
For 4-ply POB there are only 6 problems that are
solved by all three implementation schemes
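The choice of 4-ply POB with scheme C can be checked against the reported numbers. The sketch below ranks the nine configurations by problems solved and breaks ties by average time; this ranking rule is an assumption made for illustration, and `RESULTS`, `seconds` and `best` are hypothetical names.

```python
# (depth in ply, scheme) -> (problems solved out of 50, avg. time as min:sec)
RESULTS = {
    (3, "A"): (17, "0:07"), (3, "B"): (17, "0:10"), (3, "C"): (15, "0:05"),
    (4, "A"): (23, "1:00"), (4, "B"): (19, "1:47"), (4, "C"): (27, "0:48"),
    (5, "A"): (27, "17:23"), (5, "B"): (23, "26:13"), (5, "C"): (25, "12:19"),
}

def seconds(min_sec):
    """Convert a 'min:sec' string to seconds."""
    minutes, secs = min_sec.split(":")
    return int(minutes) * 60 + int(secs)

# Assumption: more solved problems is better, less time breaks ties.
best = max(RESULTS, key=lambda k: (RESULTS[k][0], -seconds(RESULTS[k][1])))
print(best, RESULTS[best])  # (4, 'C') (27, '0:48')
```

5-ply POB with scheme A also solves 27 problems, but at 17:23 per problem instead of 0:48.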
Conclusions and Future Work
POB cannot be considered a general solution to
the problem of using scalar evaluation functions
Careful tuning is needed to use POB in a specific game
What to do if time runs out without finding a single best
move?
POB is an interesting search method for shogi
Searching different targets in parallel
Combining POB with a normal minimax search