BISC-SIG-ES Short Course Fuzzy Logic and GA

search
A* Examples:
CS 331/531 Dr M M Awais
8-Puzzle
f(N) = g(N) + h(N)
with h(N) = number of misplaced tiles
[Figure: 8-puzzle search tree in which each node is labelled with its g(N)+h(N) value, e.g. 0+4, 1+5, 1+3, 2+3, 3+2, 4+1, 5+0.]
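The evaluation f(N) = g(N) + h(N) with the misplaced-tiles heuristic can be sketched in Python as follows. This is only an illustration under stated assumptions: the goal layout `GOAL`, the start state and all function names are hypothetical and are not taken from the slides.

```python
import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 marks the blank; this goal layout is an assumption


def misplaced_tiles(state, goal=GOAL):
    """h(N): number of tiles (blank excluded) that are not in their goal position."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)


def neighbours(state):
    """Yield every state reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < 3 and 0 <= c < 3:
            target = r * 3 + c
            nxt = list(state)
            nxt[blank], nxt[target] = nxt[target], nxt[blank]
            yield tuple(nxt)


def a_star(start, goal=GOAL):
    """Return the number of moves on a cheapest path, or None if there is none."""
    frontier = [(misplaced_tiles(start, goal), 0, start)]  # entries are (f, g, state)
    best_g = {start: 0}
    while frontier:
        _, g, state = heapq.heappop(frontier)
        if state == goal:
            return g
        if g > best_g[state]:
            continue  # stale queue entry, a cheaper route was found later
        for nxt in neighbours(state):
            new_g = g + 1
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                f = new_g + misplaced_tiles(nxt, goal)  # f(N) = g(N) + h(N)
                heapq.heappush(frontier, (f, new_g, nxt))
    return None


start = (1, 2, 3, 4, 0, 6, 7, 5, 8)
print(misplaced_tiles(start))  # 2 (tiles 5 and 8 are out of place)
print(a_star(start))           # 2 (slide 5, then slide 8)
```

Because a misplaced tile needs at least one move, h(N) never overestimates the remaining cost, which is why the path length returned here is the shortest one.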
Robot Navigation
f(N) = h(N), with h(N) = Manhattan distance to the goal
(greedy best-first search, not A*)
[Figure: robot-navigation grid in which each free cell is labelled with its h(N) value, ranging from 0 to 8.]
Robot Navigation
f(N) = g(N)+h(N), with h(N) = Manhattan distance to goal
(A*)
[Figure: the same grid with cells labelled by the sum of the two cost terms, e.g. 7+0, 6+1, 8+1, 7+2, ..., 0+11.]
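The difference between ordering the frontier by h(N) alone and by g(N) + h(N) can be tried out on a small map. The sketch below is an assumption-laden illustration: `GRID` is a hypothetical map (not the one on the slide), and `best_first` is a made-up helper whose flag switches between the two evaluation functions.

```python
import heapq

# Hypothetical map: '#' is an obstacle, 'S' the start, 'G' the goal.
GRID = [
    ".....",
    ".###.",
    "S..#G",
    "##.#.",
    "##.#.",
    "##...",
]


def find(grid, symbol):
    """Return the (row, col) of the first cell holding `symbol`."""
    for r, row in enumerate(grid):
        c = row.find(symbol)
        if c != -1:
            return (r, c)
    raise ValueError(f"{symbol!r} is not on the map")


def manhattan(cell, goal):
    """h(N): Manhattan distance from `cell` to `goal`."""
    return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])


def best_first(grid, use_path_cost):
    """Return the number of steps on the path found from S to G, or None.

    use_path_cost=False orders the frontier by f(N) = h(N) (greedy);
    use_path_cost=True orders it by f(N) = g(N) + h(N) (A*).
    """
    start, goal = find(grid, "S"), find(grid, "G")
    frontier = [(manhattan(start, goal), 0, start)]  # entries are (f, g, cell)
    best_g = {start: 0}
    expanded = set()
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if cell in expanded:
            continue
        expanded.add(cell)
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue  # off the map
            if grid[nr][nc] == "#":
                continue  # obstacle
            nxt, new_g = (nr, nc), g + 1
            if nxt in expanded or new_g >= best_g.get(nxt, float("inf")):
                continue
            best_g[nxt] = new_g
            f = manhattan(nxt, goal) + (new_g if use_path_cost else 0)
            heapq.heappush(frontier, (f, new_g, nxt))
    return None


print(best_first(GRID, use_path_cost=False))  # 10: greedy follows h into the longer corridor
print(best_first(GRID, use_path_cost=True))   # 8: A* finds the shorter route over the top
```

On this map the greedy search is drawn along the row that starts toward the goal and ends up on the longer detour, while A* also accounts for the steps already taken and returns the shorter path.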
Adversarial Search (Games)
The aim is to move in such a way as to 'stop' the
opponent from making a good or winning move.
Game playing can use tree search.
The game tree alternates between the two players.
Games?
Games are a form of multi-agent environment
• What do other agents do?
• How do they affect our success?
• Cooperative vs. competitive multi-agent environments
• Competitive multi-agent environments give rise to adversarial problems (games)
Why study games?
• Fun; historically entertaining
• Interesting subject of study because they are hard
• Easy to represent, and agents are restricted to a small number of actions
Games vs. Search
Search – no adversary
• Solution is a (heuristic) method for finding a goal
• Heuristics and CSP techniques can find the optimal solution
• Evaluation function: estimate of the cost from start to goal through a given node
• Examples: path planning, scheduling activities
Games – adversary
• Solution is a strategy (a strategy specifies a move for every possible opponent reply)
• Time limits force an approximate solution
• Evaluation function: evaluates the "goodness" of a game position
• Examples: chess, checkers, Othello, backgammon
Types of Games
Game setup
• Two players: MAX and MIN
• MAX and MIN take turns until the game is over. The winner gets a reward, the loser gets a penalty.
• Games as search:
   - Initial state: e.g. board configuration of chess
   - Successor function: list of (move, state) pairs specifying legal moves
   - Terminal test: is the game finished?
   - Utility function: gives a numerical value for terminal states, e.g. win (+1), loss (-1) and draw (0) in tic-tac-toe (next)
• MAX uses a search tree to determine its next move.
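The four components listed above (initial state, successor function, terminal test, utility function) can be written down for tic-tac-toe roughly as follows. This is a sketch under assumptions that are not in the slides: the board is a tuple of nine cells, MAX plays 'X' and moves first, and all names are hypothetical.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

INITIAL_STATE = (" ",) * 9  # empty board, cells indexed 0..8 row by row


def player_to_move(state):
    """Assumption of this sketch: MAX plays 'X' and moves first."""
    return "X" if state.count("X") == state.count("O") else "O"


def successors(state):
    """Successor function: list of (move, state) pairs for the legal moves."""
    mark = player_to_move(state)
    return [(i, state[:i] + (mark,) + state[i + 1:])
            for i, cell in enumerate(state) if cell == " "]


def winner(state):
    """Return 'X' or 'O' if that player has completed a line, else None."""
    for a, b, c in LINES:
        if state[a] != " " and state[a] == state[b] == state[c]:
            return state[a]
    return None


def terminal_test(state):
    """The game is finished when someone has won or the board is full."""
    return winner(state) is not None or " " not in state


def utility(state):
    """Value of a terminal state for MAX: win +1, loss -1, draw 0."""
    return {"X": 1, "O": -1, None: 0}[winner(state)]


won_by_max = ("X", "X", "X", "O", "O", " ", " ", " ", " ")
print(len(successors(INITIAL_STATE)))                   # 9 legal opening moves
print(terminal_test(won_by_max), utility(won_by_max))   # True 1
```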
Things to Remember:
1. Every move is vital.
2. The opponent could win at the next move or at subsequent moves.
3. Keep track of the safest moves.
4. The opponent is well informed.
5. Consider how the opponent is likely to respond to your moves.
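Two move win
Player 1 = P1, Player 2 = P2
[Figure: game tree with nodes A to J; levels are labelled "P1 moves" and "P2 moves", and the leaves are marked with the winner: E = P1, F = P2, G = P1, H = P1, I = P2, J = P2.]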
Safest move for P1 is always A to C
Safest move for P2 is always A to D (if allowed 1st move)
MINIMAX Procedure for Games
Assumption: the opponent has the same knowledge of the state
space and makes a consistent effort to WIN.
MIN: label for the opponent, who tries to minimize the other
player's (MAX's) score.
MAX: the player trying to win (maximize advantage)
BOTH MAX AND MIN ARE EQUALLY INFORMED
Rules
[Figure: four-level tree whose levels are labelled MAX, MIN, MAX, MIN from the root down.]
1. Label levels MAX and MIN
2. Assign values to leaf nodes:
   0 if MIN wins
   1 if MAX wins
3. Propagate values up the graph.
   If the parent is MAX, assign it the max-value of its children.
   If the parent is MIN, assign it the min-value of its children.
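The three rules can be sketched as a short recursive function. The nested-list tree below is a hypothetical example (its shape is an assumption, not a transcription of the figure): an integer is a leaf and a list is an internal node.

```python
def minimax(node, is_max):
    """Return the backed-up value of `node`.

    A leaf is an int (1 if MAX wins, 0 if MIN wins); an internal node is a
    list of children. Levels alternate between MAX and MIN.
    """
    if isinstance(node, int):
        return node                      # rule 2: leaf value
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)   # rule 3


# Hypothetical tree: MAX root, two MIN nodes, two MAX nodes, then the leaves.
tree = [
    [[0, 1]],   # MIN node whose only child is a MAX node with leaves 0 and 1
    [[1]],      # MIN node whose only child is a MAX node with leaf 1
]
print(minimax(tree, is_max=True))  # 1
```

Here the two lower MAX nodes back up Max(0,1) = 1 and Max(1) = 1, each MIN node passes its single child's value on, and the root takes Max(1,1) = 1.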
Rules
(continued)
3. Propagate values up the graph.
[Figure sequence: the leaf values 0 and 1 are backed up one level at a time.]
   MAX level above the leaves: Max(0,1) = 1 and Max(1) = 1
   MIN level: Min(1) = 1 and Min(1) = 1
   Root (MAX): Max(1,1) = 1
Utility Values
• Leaf nodes represent the result of the game
• Results could be WIN or LOSE for either player
• WIN for MAX is 1, LOSE for MAX is 0
• These values are known as utility values (given by a utility function)
• A draw could be another result; in this case
   • WIN for MAX could be 1
   • LOSE for MAX could be -1
   • DRAW could be 0
Game tree (2-player, deterministic, turns)
MINIMAX for Unfinished Games
• Minimax is applied from the leaf nodes back to the start node
• So result (terminal) nodes must be in the search space
• What if you want to evaluate the game status at an intermediate level?
• E.g.,
   • The game finishes at level 5
   • We want to find the relative advantage of MAX up to level 3
• Solution: evaluate the intermediate nodes through a heuristic and then apply MINIMAX
Minimaxing to fixed ply depth
(Complex games)
Strategy: n-move look-ahead
- Suppose you start in the middle of the game.
- One cannot assign WIN/LOSE values at that stage.
- In this case a heuristic evaluation is applied.
- Values are then projected back to indicate a WINNING/LOSING trend.
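An n-move look-ahead can be sketched by adding a depth counter to minimax and calling a heuristic when the counter runs out. Everything concrete below is an assumption for illustration: the node names, the `TREE` and `SCORE` tables and the helper names are made up, and `evaluate` stands in for whatever heuristic the game provides.

```python
def minimax_cutoff(node, depth, is_max, children, evaluate):
    """Back up heuristic values from at most `depth` plies below `node`.

    children(node) returns the successor nodes (empty if there are none);
    evaluate(node) returns a heuristic estimate of MAX's advantage.
    """
    succ = children(node)
    if depth == 0 or not succ:
        return evaluate(node)            # cut off: use the heuristic value
    values = [minimax_cutoff(s, depth - 1, not is_max, children, evaluate)
              for s in succ]
    return max(values) if is_max else min(values)


# Hypothetical game fragment with made-up heuristic scores for every node.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"], "D": ["H", "I"]}
SCORE = {"A": 0, "B": 1, "C": -1, "D": 3, "E": 5, "F": 2, "G": 9,
         "H": -4, "I": 6}


def children(node):
    return TREE.get(node, [])


def evaluate(node):
    return SCORE[node]


for plies in (1, 2, 3):
    print(plies, minimax_cutoff("A", plies, True, children, evaluate))
# 1 1
# 2 3
# 3 5
```

The backed-up value at the root changes with the look-ahead depth, which is why it indicates a trend rather than a guaranteed result.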
HEURISTIC FUNCTION: TIC-TAC-TOE
M(n) = total of possible winning lines for MAX
O(n) = total of possible winning lines for the opponent
E(n) = M(n) - O(n)
[Figure: two example boards with X and O marks. For one board M(n) = 5, O(n) = 1, E(n) = 4; for the other M(n) = 4, O(n) = 2, E(n) = 2.]
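One common reading of "possible winning lines" is a row, column or diagonal that the other player has not yet blocked. Under that assumption, E(n) can be sketched as below. The boards used here are hypothetical examples, not the ones in the figure, and MAX is assumed to play X.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def open_lines(board, blocker):
    """Count the lines that contain no `blocker` mark."""
    return sum(1 for line in LINES
               if all(board[i] != blocker for i in line))


def evaluate(board):
    """E(n) = M(n) - O(n) for a 9-character board string ('.' = empty)."""
    m = open_lines(board, "O")   # M(n): lines not blocked by O
    o = open_lines(board, "X")   # O(n): lines not blocked by X
    return m - o


print(evaluate("........."))  # 0: all 8 lines are open to both players
print(evaluate("O...X...."))  # 1: M(n) = 5, O(n) = 4
print(evaluate(".O..X...."))  # 2: M(n) = 6, O(n) = 4
```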
Two-Ply Game Tree
The minimax decision
Minimax maximizes the worst-case outcome for MAX.
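For a two-ply tree the minimax decision reduces to "take the minimum over each reply set, then pick the move with the largest minimum". The sketch below assumes illustrative leaf utilities and move names (`a1`, `a2`, `a3`), since the figure itself is not reproduced here.

```python
# Assumed leaf utilities: MAX picks a move, then MIN picks one of the replies.
GAME = {
    "a1": [3, 12, 8],
    "a2": [2, 4, 6],
    "a3": [14, 5, 2],
}


def minimax_decision(game):
    """Return (best_move, value): the move whose worst-case reply is best for MAX."""
    worst_case = {move: min(replies) for move, replies in game.items()}
    best_move = max(worst_case, key=worst_case.get)
    return best_move, worst_case[best_move]


print(minimax_decision(GAME))  # ('a1', 3)
```

Move `a3` contains the largest single leaf (14), yet MIN would answer it with 2, so the guaranteed outcome of `a1` is better.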
Problem of minimax search
• The number of game states is exponential in the number of moves.
Solution
• Do not examine every node
• Alpha-beta pruning
   • Alpha = value of the best choice found so far at any choice point along the path for MAX
   • Beta = value of the best choice found so far at any choice point along the path for MIN
Alpha-Beta Procedures
• The minimax procedure pursues all
branches in the space. Some of them
could have been ignored or pruned.
• To improve efficiency, pruning is
applied to two-person games
Simple Idea
if A > 5 OR B < 0
If the first condition A > 5 succeeds, then B < 0 need not
be evaluated.
if A > 5 AND B < 0
If the first condition A > 5 fails, then B < 0 need not be
evaluated.
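Python's `or` and `and` behave in exactly this way, which makes the idea easy to observe. In the sketch below, `evaluated_conditions` and `check` are hypothetical helpers that simply record which conditions were actually evaluated.

```python
def evaluated_conditions(a, b, operator):
    """Return the names of the conditions that were evaluated."""
    calls = []

    def check(name, result):
        calls.append(name)   # record that this condition was evaluated
        return result

    if operator == "or":
        _ = check("A > 5", a > 5) or check("B < 0", b < 0)
    else:
        _ = check("A > 5", a > 5) and check("B < 0", b < 0)
    return calls


print(evaluated_conditions(7, 3, "or"))    # ['A > 5']
print(evaluated_conditions(2, 3, "or"))    # ['A > 5', 'B < 0']
print(evaluated_conditions(2, -1, "and"))  # ['A > 5']
print(evaluated_conditions(7, -1, "and"))  # ['A > 5', 'B < 0']
```

Alpha-beta pruning applies the same reasoning to a game tree: once the outcome at a node is settled, its remaining children are not examined.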
Implementation
FORWARD PASS:
APPLY DEPTH-FIRST SEARCH TO REACH A LEAF
NODE
BACKWARD PASS:
PROPAGATE THE VALUES BACK TOWARDS THE ROOT NODE
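The two passes map naturally onto a recursive function: the recursion descends depth-first to a leaf, and the return values carry the backed-up numbers towards the root. The following is a minimal sketch of alpha-beta pruning on a nested-list tree. The tree, its leaf values and the `visited` parameter are assumptions added for illustration.

```python
import math


def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf, visited=None):
    """Return the minimax value of `node`, skipping branches that cannot matter.

    A leaf is a number; an internal node is a list of children.
    `visited`, if given, collects the leaves that were actually evaluated.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # MIN higher up already has a better alternative
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if beta <= alpha:
            break  # MAX higher up already has a better alternative
    return value


# Hypothetical two-ply tree: MAX at the root, MIN below, numeric leaves.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
print(alphabeta(tree, True, visited=seen))  # 3
print(seen)                                 # [3, 12, 8, 2, 14, 5, 2]
```

In this run the leaves 4 and 6 are never evaluated: after the first leaf of the second MIN node returns 2, that node can be worth at most 2, which is already below the 3 that MAX has secured.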
[Figure, repeated on each slide of this sequence: MAX node a at the root; MIN nodes b = 0.4 and c below it; MAX node e below c; MIN node g = -0.2 below e. Nodes e and c each have a further child that has not yet been evaluated.]
Why is -0.2 the least value of node e?
• Suppose the unevaluated child of e takes a value less than -0.2: the value of node e does not change and remains at -0.2.
• Suppose it takes a value greater than -0.2, say v: the value of node e changes to v.
• What is the lower bound on v? The lower bound is the value at node g.
• The minimum advantage for the MAX node e is -0.2. This is called the ALPHA value of the MAX node: α = -0.2 (at least).
Why is -0.2 the AT MOST value for node c?
• Suppose the unevaluated child of c takes a value v less than -0.2: the value of node c changes to v.
• Suppose it takes a value greater than -0.2: the value of node c does not change and remains at -0.2.
• What is the upper bound on v? The upper bound is the value at node e.
• The maximum advantage for the MIN node c is -0.2. This is called the BETA value of the MIN node: β = -0.2 (at most).
Find the ALPHA value for node a.
• For node a, α = 0.4 (at least): the least advantage which MAX can get in this portion of the game is 0.4.
• Node c can be worth at most -0.2, so it cannot improve on 0.4. If this least advantage is acceptable, then expanding c and all the nodes that follow it can be neglected: prune away the link to c, with ALPHA = 0.4.
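That the pruned link cannot change the result can be checked numerically. The sketch below takes the values from the example (b = 0.4, e = -0.2) and tries several made-up values for the child of c that was never evaluated; the function name and the candidate list are assumptions.

```python
B = 0.4    # value of MIN node b
E = -0.2   # backed-up value of MAX node e


def value_of_a(unevaluated_child_of_c):
    """Minimax value of the root a for a given value of c's other child."""
    c = min(E, unevaluated_child_of_c)   # c is a MIN node, so c <= -0.2
    return max(B, c)                     # a is a MAX node


candidates = [-5.0, -0.2, 0.0, 0.4, 3.0]
print([value_of_a(v) for v in candidates])  # [0.4, 0.4, 0.4, 0.4, 0.4]
```

Whatever the unevaluated child turns out to be, the root keeps the value 0.4, which is why the branch below c can be skipped.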
- A MAX node neglects values <= α (the least it can score) at the MIN nodes below it.
- A MIN node neglects values >= β (the most it can score) at the MAX nodes below it.
[Figure: MAX node A with MIN children B = 10 and C; C has children G = 0 and H.]
Node C can score AT MOST 0, nothing above 0 (beta)
Node A can score AT LEAST 10, nothing less than 10 (alpha)
Alpha-Beta Example
Do DF-search until the first leaf. Each node is annotated with its range of possible values.
[Figure sequence: a MAX root with three MIN children; the ranges shown on successive slides are listed below.]
1. Root [-∞,+∞]; first MIN child [-∞,+∞]
2. Root [-∞,+∞]; first MIN child [-∞,3]
3. Root [3,+∞]; first MIN child [3,3]
4. Root [3,+∞]; first MIN child [3,3]; second MIN child [-∞,2]. This node is worse for MAX.
5. Root [3,14]; MIN children [3,3], [-∞,2], [-∞,14]
6. Root [3,5]; MIN children [3,3], [-∞,2], [-∞,5]
7. Root [3,3]; MIN children [3,3], [-∞,2], [2,2]