The Alpha-Beta principles (2)

Game Playing
Mini-Max search
Alpha-Beta pruning
General concerns on games
Two suspects are arrested by the police. The police have
insufficient evidence for a conviction, and, having
separated the prisoners, visit each of them to offer the
same deal.
If one testifies for the prosecution against the other
(defects) and the other remains silent (cooperates), the
defector goes free and the silent accomplice receives
the full 10-year sentence.
If both remain silent, both prisoners are sentenced to only
six months in jail for a minor charge.
If each betrays the other, each receives a five-year
sentence.
Each prisoner must choose to betray the other or to
remain silent. Each one is assured that the other would
not know about the betrayal before the end of the
investigation. How should the prisoners act?
2
 5个海盗抢到了100颗宝石,每一颗都一样的大小和价
值连城,他们决定这分:
 1. 抽签决定自己的号码(1,2,3,4,5)
 2. 首先,由1号提出分配方案,然后大家5人进行表决,
当且仅当超过半数的人同意时,按照他的提案进行分配
,否则将被扔入大海喂鲨鱼。
 3. 如果1号死后,再由2号提出分配方案,然后大家4人
进行表决,当且仅当超过半数的人同意时,按照他的提
案进行分配,否则将被扔入大海喂鲨鱼。
 4. 以次类推......
 条件: 1.每个海盗都是极其聪明的人 2.每个海盗都是非
常残忍的人 3.每个海盗都能明确的判断得失然后作出明
智的选择问题: 第一个海盗提出怎样的分配方案才能
3
 标准答案是:1号强盗分给3号1枚金币,4号或5号强盗
2枚,独得97枚。分配方案可写成(97,0,1,2,0
)或(97,0,1,0,2)。
4
 推 理过程是这样的:从后向前推,
 如果1-3号强盗都喂了鲨鱼,只剩4号和5号的话,5号一定投反
对票让4号喂鲨鱼,以独吞全部金币。所以,4号惟有支持3号
才能保命。
 3号知道这一点,就会提(100,0,0)的分配方案,对4号、
5号一毛不拔而将全部金币归为已有,因为他知道4号一无所获
但还是会投赞成票, 再加上自己一票他的方案即可通过。
 不过,2号推知到3号的方案,就会提出(98,0,1,1)的方
案,即放弃3号,而给予4号和5号各一枚金币。由于该方 案对
于4号和5号来说比在3号分配时更为有利,他们将支持他而不
希望他出局而由3号来分配。这样,2号将拿走98枚金币。
 不过, 2号的方案会被1号所洞悉,1号并将提出(97 ,0,1,
2,0)或(97,0,1,0,2)的方案,即放弃2号,而给3号
一枚金币,同时给4号(或5号)2枚金币。由于1号的这一方案
对于3号和4号 (或5号)来说,相比2号分配时更优,他们将
投1号的赞成票,再加上1号自己的票,1号的方案可获通过,97
枚金币可轻松落入囊中。这无疑是1号能够获取 最大收益的方
案了!
5
Why study board games ?
 One of the oldest subfields of AI (Shannon and
Turing, 1950)
 Abstract and pure form of competition that
seems to require intelligence
 Easy to represent the states and actions
 Very little world knowledge required !
 Game playing is a special case of a search problem,
with some new requirements.
6
Types of games
Deterministic
Chance
Perfect
information
Chess, checkers, Backgammon,
go, othello
monopoly
Imperfect
information
Sea battle
Bridge, poker,
scrabble,
nuclear war
7
Why new techniques for games?
 “Contingency” problem:
 We don’t know the opponents move !
 The size of the search space:
Chess : ~15 moves possible per state, 80 ply
 1580 nodes in tree
Go : ~200 moves per state, 300 ply
 200300 nodes in tree
 Game playing algorithms:
 Search tree only up to some depth bound
 Use an evaluation function at the depth bound
 Propagate the evaluation upwards in the tree
8
MINI MAX
 Restrictions:
 2 players: MAX (computer) and MIN (opponent)
 deterministic, perfect information
 Select a depth-bound (say: 2) and evaluation function
MAX
MIN
MAX
Select
this move
3
2
2
3
1
5
3
1
4
4
3
- Construct the tree up till
the depth-bound
- Compute the evaluation
function for the leaves
- Propagate the evaluation
function upwards:
- taking minima in MIN
- taking maxima in MAX
9
The MINI-MAX algorithm:
Initialise depthbound;
Minimax (board, depth) =
IF depth = depthbound
THEN return static_evaluation(board);
ELSE IF maximizing_level(depth)
THEN FOR EACH child child of board
compute Minimax(child, depth+1);
return maximum over all children;
ELSE IF minimizing_level(depth)
THEN FOR EACH child child of board
compute Minimax(child, depth+1);
return minimum over all children;
Call: Minimax(current_board, 0)
10
Alpha-Beta Cut-off
 Generally applied optimization on Mini-max.
 Instead of:
 first creating the entire tree (up to depth-level)
 then doing all propagation
 Interleave the generation of the tree and the
propagation of values.
 Point:
 some of the obtained values in the tree will
provide information that other (non-generated)
parts are redundant and do not need to be
generated.
11
Alpha-Beta idea:
 Principles:
 generate the tree depth-first, left-to-right
 propagate final values of nodes as initial estimates
for their parent node.
MAX
MIN
MAX
2
2
=2
1
- The MIN-value (1) is already
smaller than the MAX-value of
the parent (2)
- The MIN-value can only
decrease further,
- The MAX-value is only allowed
to increase,
- No point in computing further
below this node
2
5
1
12
Terminology:
- The (temporary) values at MAX-nodes are ALPHA-values
- The (temporary) values at MIN-nodes are BETA-values
MAX
MIN
MAX
Alpha-value
2
2
2
=2
1
5
1
Beta-value
13
The Alpha-Beta principles (1):
- If an ALPHA-value is larger or equal than the Beta-value
of a descendant node:
stop generation of the children of the descendant
MAX
MIN
MAX
Alpha-value
2
2
2
=2
1
5
1
Beta-value
14
The Alpha-Beta principles (2):
- If an Beta-value is smaller or equal than the Alpha-value
of a descendant node:
stop generation of the children of the descendant
MAX
MIN
MAX
2
2
2
=2
5
1
3
Beta-value
Alpha-value
15
Mini-Max with  at work:
 4 16
8 6
 5 23
= 4 15
8 2
=8 5
9 8
= 5 30
 2 10  1 18
 4 12  3 20
= 4 14 = 5 22
8 7 3 9 1 6 2 4 1
1 3 4 7
 5 31 = 5 39
9 11 13
MAX
 3 38
 1 33
 2 35
 3 25
 9 27  6 29 = 3 37
1 3 5 3 9 2 6 5 2
17 19 21 24 26
MIN
28
MAX
1 2 3 9 7 2 8 6 4
32 34 36
11 static evaluations saved !!
16
“DEEP” cut-offs
- For game trees with at least 4 Min/Max layers:
the Alpha - Beta rules apply also to deeper levels.
4
4
4
4
4
2
2
17
The Gain: Best case:
- If at every layer: the best node is the left-most one
MAX
MIN
MAX
Only THICK is explored
18
Example of a perfectly
ordered tree
MAX
21
21
21
24
12
27
21 20 19 24 23 22 27 26 25
12
15
3
18
12 11 10 15 14 13 18 17 16
3
6
MIN
9
MAX
3 2 1 6 5 4 9 8 7
19
How much gain ?
- Alpha / Beta : best case :
# (static evaluations
saved) = 2 bd/2 - 1
(if d is even)
b(d+1)/2 + b(d-1)/2 - 1
(if d is odd)
- The proof is by induction.
- In the running example: d=3, b=3 :
11 !
20
Best case gain pictured:
# Static evaluations
b = 10
No pruning
100000
Alpha-Beta
Best case
10000
1000
100
10
1
2
3
4
5 6
7
Depth
- Note: algorithmic scale.
- Conclusion: still exponential growth !!
- Worst case??
For some trees alpha-beta does nothing,
For some trees: impossible to reorder to avoid cut-offs
21
The horinzon effect.
Queen lost
Pawn lost
horizon = depth bound
of mini-max
Queen lost
Because of the depth-bound
we prefer to delay disasters, although we don’t
prevent them !!
 solution: heuristic continuations
22
Heuristic Continuation
In situations that are identifies as strategically crucial
e.g: king in danger, imminent piece loss, pawn to
become as queens, ...
extend the search beyond the depth-bound !
depth-bound
23
Games of chance
Ex.: Backgammon:
Form of the game tree:
24
State of the art
Drawn from an article by Mathew Ginsberg,
Scientific American, Winter 1998, Special Issue
on Exploring Intelligence
25
State of the art (2)
26
State of the art (3)
27