A New Test Set for Shogi

What Shogi Programs Still Cannot Do
- A New Test Set for Shogi -
Reijer Grimbergen and Taro Muraoka
Department of Informatics
Yamagata University
2004/11/13
GPW2004
1
Outline
The importance of testing
Test sets for chess
Test sets for shogi
A new test set for shogi
Problem area analysis
Some new results
Differences between humans and computers
Conclusions and future work
The importance of testing
Game programming
A program should play strongly
More common is the reverse approach: minimize the
number of bad moves
Testing can help determine problem areas
Incremental testing
Save positions that the program did not handle well
Drawbacks
• Test set is program-specific
• Positions selected subjectively
The importance of testing
The requirements of a test set
Testing a wide variety of potential problem areas
Not specific for one program
Test design in games
Mainly done for chess
Current test sets for shogi have shortcomings
Shogi research is at a point where focusing the effort
could be a great help
Proposing a new test set for shogi
Test sets for chess
The Bratko-Kopec test set
12 tactical positions and 12 strategic positions
Designed to compare human and computer performance in chess
Thus far, no program can solve all positions
Reinfeld’s Win at Chess
300 tactical positions
Used as a first test for new programs
LCT II
35 positions
Good balance between strategic, tactical and endgame positions
An Elo rating can be calculated from the solved positions
The Lindner test set
A set of positions that are considered hard for computers to solve
Test sets for shogi
The Matsubara-Iida test set
48 positions taken from professional games
Selected by an expert player
Aims at judging the strength of shogi programs
First given to human players to establish a connection with playing
strength
Problems with the Matsubara-Iida test set
Program strength can be judged more accurately by playing on the
internet
No Elo calculation like in LCT II
Subjective selection leaves doubts about test balance
What is difficult for computers is not necessarily difficult for
humans and vice versa, so connection with playing strength is
unreliable
Test sets for shogi
Other test sets for shogi
Yamashita’s test set (10 positions)
Tanase’s test set (19 positions)
Problems with these test sets
Too small
Program-specific
Unclear if there is only one solution
A new test set for shogi
What do we want from a test set?
1. As general as possible
2. Points to as many problem areas as possible
Find positions that cannot be solved by the best programs
Finding weaknesses instead of measuring strength
A new test set for shogi
Positions selected from Shukan Shogi
Six next-move problems are published every week
Middle game positions and endgame positions
Different tactical themes: winning material, attack, defense and
mating
Our goal: create a test set of 100 positions
The programs we used
AI Shogi 2003
Todai Shogi 5
Gekisashi 2
Conditions
30 seconds per position on a 2 GHz Pentium 4
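The selection rule (keep only positions that none of the programs solve) can be sketched as a simple filter. This is a minimal illustration, not the authors' tooling: the function name `select_unsolved`, the data layout, and the "first move must match the solution" criterion are assumptions, and the entry `demo-1` is invented for contrast. The other two entries reuse solution moves and program replies from this presentation.

```python
from typing import Dict, List


def select_unsolved(solutions: Dict[str, str],
                    replies: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the IDs of problems for which no program played the solution move."""
    unsolved = []
    for problem_id, best_move in solutions.items():
        program_moves = replies[problem_id]
        # Assumption: a program "solves" a problem if its first move matches.
        if all(move != best_move for move in program_moves.values()):
            unsolved.append(problem_id)
    return unsolved


# First move of each solution. "demo-1" is a made-up entry.
solutions = {"750-3": "2四銀", "935-2": "1三歩不成", "demo-1": "7六歩"}

# First move played by each program within the time limit.
replies = {
    "750-3": {"Todai": "1五歩", "Gekisashi": "3二角成", "AI Shogi": "3五金"},
    "935-2": {"Todai": "5二と", "Gekisashi": "8四桂", "AI Shogi": "投了"},
    "demo-1": {"Todai": "7六歩", "Gekisashi": "2六歩", "AI Shogi": "7六歩"},
}

print(select_unsolved(solutions, replies))
```

Only positions that survive this filter for every program would be candidates for the test set. A position that even one program solves is dropped.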
A new test set for shogi
This was not easy!
More than 1500 positions needed to be checked
to find our test set
Additional feature
The percentage of respondents who solved the
problem is given
This shows differences between what is difficult for
humans and what is difficult for computers
Problem area analysis
Why are the positions difficult?
Using the analysis tools in Todai Shogi, Gekisashi and AI Shogi to
find problem areas
Our first analysis indicates seven problem areas
Horizon effect due to consecutive checks
Not calling the tsume shogi solver deep in the search tree
Inaccurate evaluation function
Incorrect forward pruning
Mate with unpromoted pieces
Insufficient hardware speed
Problems with time allocation
Problem area analysis
Horizon effect and tsume shogi
Problem 750-3
Solved: 16%
Solution
2四銀、1四玉(同歩、2三金、同玉、3二角成)、3五金
Program replies
Todai: 1五歩 (losing position)
Gekisashi: 3二角成 (white has the advantage)
AI Shogi: 3五金
Problem area analysis
Horizon effect and tsume shogi
The problem
Horizon checks after 2四銀、1四玉、3五金
The same position without horizon checks can be solved by all programs
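The horizon effect described above can be reproduced on a toy game tree. The sketch below is not taken from any of the tested programs: the tree, its scores, and the names `Node`, `negamax` and `best_move` are all invented for illustration. It uses check extensions, a common remedy in which a position with the side to move in check is searched one ply deeper. With a fixed depth, the defender's delaying check pushes the real outcome past the horizon and the attack is rejected. With the extension, the attack is found.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Node:
    static_eval: int                  # score from the side to move's point of view
    in_check: bool = False            # side to move is in check
    children: Dict[str, "Node"] = field(default_factory=dict)


def negamax(node: Node, depth: int, extend_checks: bool) -> int:
    if extend_checks and node.in_check:
        depth += 1                    # check extension: look one ply further
    if depth <= 0 or not node.children:
        return node.static_eval
    return max(-negamax(child, depth - 1, extend_checks)
               for child in node.children.values())


def best_move(root: Node, depth: int, extend_checks: bool) -> Tuple[str, int]:
    scores = {name: -negamax(child, depth - 1, extend_checks)
              for name, child in root.children.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Toy tree: after "attack" the defender can delay the loss with a spite check.
after_capture = Node(static_eval=-950)            # defender to move, lost
spite_check = Node(static_eval=-200, in_check=True,
                   children={"capture": after_capture})
quiet_defence = Node(static_eval=900)             # attacker to move, winning
after_attack = Node(static_eval=-100,
                    children={"defend": quiet_defence, "check": spite_check})
after_quiet = Node(static_eval=-50)
root = Node(static_eval=0,
            children={"attack": after_attack, "quiet": after_quiet})

print(best_move(root, 2, extend_checks=False))    # fixed depth
print(best_move(root, 2, extend_checks=True))     # with check extension
```

In a real shogi program the defender can often give several checks in a row, so an extension scheme needs a limit to keep the search from exploding. The finite toy tree sidesteps that issue.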
Problem area analysis
Horizon effect and tsume shogi
Another problem: tsume shogi deep in the
search tree
Gekisashi with more time
2四銀、1四玉、3五金、7九銀、同玉、2五桂、1五歩、同馬、同銀 (-1192)
White has mate in 9 after 同玉 and black has a mate in 3 after 2五桂!
Problem area analysis
Evaluation and forward pruning
Problem 755-3
Solved: 51%
Solution
2二金、同金、2三角成、3三金、同馬
Program replies
Todai: 2一角成、4一玉、6一金 (winning position)
Gekisashi: 6八銀、5六成銀、3七桂、6六銀、2五桂、5四歩、2一角成、4一玉 (black is winning)
AI Shogi: 6八銀、5八成銀、2一角成、4一玉
Problem area analysis
Evaluation and forward pruning
The problem: an incorrect evaluation
After 2一角成、4一玉 the white king can
escape, but the programs cannot assess this
Evaluating the chances of escaping an attack is
difficult?
Another problem: forward pruning
Consecutive sacrifices 2二金 and 2三角成
Multiple sacrifices not searched deep enough?
Problem area analysis
Unpromoted pieces
Problem 935-2
Solved: 95%
Solution
1三歩不成、2六銀直、(1四歩 is illegal) 1四玉
Program replies
Todai: 5二と (losing position)
Gekisashi: 8四桂 (white is winning)
AI Shogi: resigns (!)
Problem area analysis
Unpromoted pieces
The problem here seems a special case of forward pruning
Promoting a major piece or a pawn is almost always better than not
promoting
Non-promotions of these pieces are pruned to improve search
efficiency
Not a high priority problem, but could have consequences
for thinking in opponent time
When promoting and not promoting a piece make no difference, an
opponent's non-promotion makes the program's thinking in opponent
time useless
My advice: play the non-promotion to win some time!
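The pruning rule on this slide can be sketched in a toy move generator. This is an illustration under assumptions rather than code from any of the tested programs: the `Move` type, the piece letters, and the `prune_non_promotions` switch are hypothetical names. With pruning on, a move such as the pawn non-promotion 1三歩不成 from Problem 935-2 is never generated, so the search cannot find it.

```python
from typing import List, NamedTuple


class Move(NamedTuple):
    piece: str       # "P" pawn, "R" rook, "B" bishop, "S" silver, ...
    target: str      # destination square, e.g. "1三"
    promote: bool


# Pieces whose non-promotions are pruned: the pawn and the major pieces.
PRUNED_NON_PROMOTIONS = {"P", "R", "B"}


def promotion_candidates(piece: str, target: str,
                         prune_non_promotions: bool = True) -> List[Move]:
    """Moves generated for a piece that is allowed to promote on `target`."""
    moves = [Move(piece, target, promote=True)]
    if not (prune_non_promotions and piece in PRUNED_NON_PROMOTIONS):
        moves.append(Move(piece, target, promote=False))
    return moves


# With pruning, only the promotion is generated for the pawn.
print(promotion_candidates("P", "1三"))
# Without pruning, the non-promotion is generated as well.
print(promotion_candidates("P", "1三", prune_non_promotions=False))
# A silver keeps both options either way.
print(promotion_candidates("S", "1三"))
```

One possible compromise, not suggested in the presentation itself, would be to generate the pruned non-promotions only at the root or inside the mate search, where the extra moves are cheap.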
Problem area analysis
Other problem areas
Insufficient hardware speed
Some positions could be solved by giving the program
more time
Improved hardware speed will automatically solve
these positions
Time allocation
In some positions, the programs would play very
quickly
These positions were deleted from our test set
However, it might be a different problem area: when to
cut off the search?
Problem area analysis
Overview
Problem area                      Positions
Insufficient hardware speed       31
Inaccurate evaluation function    20
Incorrect forward pruning         19
Horizon effect                    18
Tsume shogi                       11
Mate using unpromoted pieces      6
Reason unclear                    7
Some new results
New program versions have been released
Todai Shogi 6 and 7, Gekisashi 3 and AI Shogi 2004
Results of Todai 6 on the test set
Solved 6 positions
The problem areas of these positions were different
• Inaccurate evaluation function (2 positions)
• Insufficient hardware speed (2 positions)
• Horizon effect (1 position)
• Reason unclear (1 position)
Differences between humans and computers
How difficult are the positions for human players?
Almost half of the positions (46) can be solved by more than 50% of
the human respondents
There are 14 positions that computers cannot solve but that more than
80% of the humans can
Human percentage    Positions
0 – 10%             0
11 – 20%            12
21 – 30%            18
31 – 40%            10
41 – 50%            13
51 – 60%            16
61 – 70%            7
71 – 80%            9
81 – 90%            9
91 – 100%           5
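The two counts quoted on this slide follow directly from the histogram. The short check below recomputes them. The variable and function names are illustrative only, and the bucket counts are copied from the table as listed in this presentation.

```python
# (lower bound %, upper bound %, number of positions), as listed in the table.
histogram = [
    (0, 10, 0), (11, 20, 12), (21, 30, 18), (31, 40, 10), (41, 50, 13),
    (51, 60, 16), (61, 70, 7), (71, 80, 9), (81, 90, 9), (91, 100, 5),
]


def positions_solved_by_more_than(threshold: int) -> int:
    """Number of positions whose human solving percentage exceeds `threshold`."""
    return sum(count for low, _high, count in histogram if low > threshold)


print(positions_solved_by_more_than(50))   # positions solved by more than 50%
print(positions_solved_by_more_than(80))   # positions solved by more than 80%
print(sum(count for _low, _high, count in histogram))  # total over all buckets
```

The first two results match the 46 and 14 positions quoted above. The buckets as listed add up to 99 positions.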
Conclusions and future work
We have proposed a set of 100 positions that is
general and points to specific problem areas in
computer shogi
As more positions get solved, we intend to replace
them with new positions
Further investigation of the unsolved positions for
which the problem area could not be determined
Making further comparisons between what is
difficult for humans and difficult for computers
Finally
Download the test set here
gamelab.yz.yamagata-u.ac.jp/RESEARCH/shogitestset.zip
Let me know about your results