Week 4: A* continued

9/10
Name plates for everyone!
Blog question on Dijkstra's algorithm:
• What is the difference between Uniform Cost Search and Dijkstra's algorithm?
• Given the difference, which algorithm is better (and when)?
• Any ideas on the other question?
“Informing” Uniform search…
[Figure: the "Bait & Switch" graph. Edges: A--B (0.1), A--G (9), B--C (0.1), C--D (0.1), D--G (25).
Uniform cost search queue trace: N0:A(0); N1:B(.1), N2:G(9); N3:C(.2); N4:D(.3); N5:G(25.3).]
Admissibility
Informedness
Would be nice if we could tell that N2 is better than N1
-- Need to take into account not just the distance so far, but also the distance to the goal
-- Computing the true distance to the goal is as hard as the full search
-- So, try "bounds" h(n): prioritize nodes in terms of f(n) = g(n) + h(n)
Two bounds: h1(n) <= h*(n) <= h2(n). Which one guarantees optimality?
Given h1(n) <= h2(n) <= h*(n), which is the better function?
f(n) is the estimate of the length of the shortest path to the goal passing through n
(if there are multiple goal nodes, we consider the distance to the nearest goal node)
[Figure: search tree rooted at N0 with nodes N'' and N on the fringe.]
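To make the f(n) = g(n) + h(n) bookkeeping concrete, here is a minimal A* sketch in Python. It assumes the graph is given as an adjacency dict and h is a callable lower bound; the names (a_star, graph, h) are illustrative, not from the lecture.

```python
import heapq

def a_star(graph, h, start, goals):
    """Best-first search that prioritizes nodes by f(n) = g(n) + h(n).
    graph: dict mapping a node to a list of (neighbor, edge cost) pairs.
    h:     heuristic, a lower bound on the cost to the nearest goal.
    Returns (cost, path) for the cheapest goal found, or None."""
    frontier = [(h(start), 0.0, start, [start])]      # entries are (f, g, node, path)
    best_g = {start: 0.0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node in goals:
            return g, path        # with an admissible h, the first goal popped is optimal
        for neighbor, cost in graph.get(node, []):
            g2 = g + cost
            if g2 < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = g2
                heapq.heappush(frontier, (g2 + h(neighbor), g2, neighbor, path + [neighbor]))
    return None

# The Bait & Switch graph from the slide: the direct edge A--G costs 9, while the
# cheap-looking path A-B-C-D ends in a 25-cost edge to G.
graph = {"A": [("B", 0.1), ("G", 9)], "B": [("C", 0.1)], "C": [("D", 0.1)], "D": [("G", 25)]}
h = {"A": 0, "B": 8.8, "C": 0, "D": 25, "G": 0}.get   # example heuristic values
print(a_star(graph, h, "A", {"G"}))                   # -> (9, ['A', 'G'])
```

Checking for the goal when a node is popped (not when it is generated) is what lets the admissibility argument on the following slides go through.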
Several proofs:
1. Based on branch and bound
   -- f(N) = g(N) is no worse than f(N''), and f(N'') <= the cost of the best path through N''
2. Based on contours
   -- f() contours are more goal-directed than g() contours
3. Based on contradiction
A* Search
[Figure: A* on the Bait & Switch graph with heuristic values attached to the nodes.
With h(B) = 8.8: N0:A(0); N1:B(.1+8.8), N2:G(9+0); N3:C(max(.2+0, 8.8)); N4:D(.3+25).
With h(B) = 25.2: N0:A(0); N1:B(.1+25.2), N2:G(9+0).]
PathMax Adjustment
f(B) = .1 + 8.8 = 8.9
f(C) = .2 + 0 = 0.2
This doesn't make sense, since we are reducing the estimate of the actual cost of the path A--B--C--D--G.
To make f(.) monotonic along a path, we set
   f(n) = max( f(parent), g(n) + h(n) )
This is just enforcing the triangle inequality: the sum of two sides must be at least as long as the third.
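A sketch of how this adjustment would plug into a search like the one above; the function name is mine.

```python
def pathmax_f(parent_f, g_child, h_child):
    """PathMax: never let a child's f-value drop below its parent's, i.e.
    f(n) = max(f(parent), g(n) + h(n)), which keeps f monotonic along a path."""
    return max(parent_f, g_child + h_child)

# Slide example: f(B) = 0.1 + 8.8 = 8.9, while g(C) + h(C) = 0.2 + 0 = 0.2.
# PathMax reports f(C) = max(8.9, 0.2) = 8.9 instead of the misleading 0.2.
```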
Visualizing A* Search
[Figure: search contours of A* vs. uniform cost search; the f() contours are more goal-directed than the g() contours. Annotation: (h* - h)/h*.]
A* will not expand nodes with f > f* (f* is the f-value of the optimal goal, which is the same as g*, since the h-value is zero for goals).
IDA*: do iterative depth-first search, but set the threshold in terms of f (not depth).
IDA* to handle the A* memory problem
• Basically IDDFS, except instead of the iterations being defined in terms of depth, we define them in terms of f-value
  – Start with the f cutoff equal to the f-value of the root node
  – Loop
    • Generate and search all nodes whose f-values are less than or equal to the current cutoff
  – Use depth-first search to search the trees in the individual iterations
  – Keep track of the node N' which has the smallest f-value that is still larger than the current cutoff. Let this f-value be next-largest-f-value
  – If the search finds a goal node, terminate. If not, set cutoff = next-largest-f-value and go back to Loop

Properties:
  Linear memory.
  #Iterations in the worst case? = b^d !!
  (This happens when all nodes have distinct f-values. There is such a thing as too much discrimination…)
  Very similar to IDDUC discussed last class.
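A minimal IDA* sketch following the loop above. It assumes successors(state) yields (child, edge cost) pairs, h is the heuristic, and is_goal tests for a goal; all of these names are placeholders.

```python
import math

def ida_star(start, h, successors, is_goal):
    """Iterative-deepening A*: repeated depth-first searches whose cutoff is
    defined on f = g + h rather than on depth."""
    def dfs(state, g, cutoff, path):
        f = g + h(state)
        if f > cutoff:
            return f, None                   # report the smallest f above the cutoff
        if is_goal(state):
            return g, list(path)
        next_cutoff = math.inf
        for child, cost in successors(state):
            if child in path:                # skip cycles on the current path
                continue
            path.append(child)
            t, found = dfs(child, g + cost, cutoff, path)
            path.pop()
            if found is not None:
                return t, found
            next_cutoff = min(next_cutoff, t)
        return next_cutoff, None

    cutoff = h(start)                        # start with the f-value of the root node
    while True:
        t, found = dfs(start, 0, cutoff, [start])
        if found is not None:
            return t, found                  # (solution cost, path)
        if t == math.inf:
            return None                      # no goal reachable
        cutoff = t                           # next-largest-f-value becomes the new cutoff
```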
Using memory more effectively:
SMA*
• A* can take exponential space in the worst case
• IDA* takes linear space (in solution depth) always
• If A* is consuming too much space, one can argue that
IDA* is consuming too little
• Better idea is to use all the memory that is available, and
start cleaning up as memory starts filling up
– Idea: When the memory is about to fill up, remove the leaf node
with the worst f-value from the search tree
• But remember its f-value at its parent (which is still in the search
tree)
– Since the parent is now the leaf node, it too can get removed to make
space
• If ever the rest of the tree starts looking less promising than the
parent of the removed node, the parent will be picked up and
expanded again.
– Works quite well—but can thrash when memory is too low
• Not unlike your computer with too little RAM..
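SMA* itself involves a lot of bookkeeping, but the core step described above (drop the worst leaf, remember its f-value at its parent) can be sketched on its own. The Node class and function below are my own simplified illustration, not the full algorithm.

```python
import math

class Node:
    def __init__(self, state, g, h, parent=None):
        self.state, self.g, self.h = state, g, h
        self.parent = parent
        self.children = []             # children currently kept in memory
        self.forgotten_f = math.inf    # best f-value among children pruned away

    @property
    def f(self):
        return self.g + self.h

def prune_worst_leaf(leaves):
    """Remove the in-memory leaf with the worst f-value, but remember its
    f-value at its parent; the parent can then be re-expanded later if the
    rest of the tree starts looking less promising than what was forgotten."""
    worst = max(leaves, key=lambda n: n.f)
    leaves.remove(worst)
    parent = worst.parent
    if parent is not None:
        parent.children.remove(worst)
        parent.forgotten_f = min(parent.forgotten_f, worst.f)
        if not parent.children:        # the parent is now a leaf and can be pruned too
            leaves.append(parent)
    return worst
```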
Different levels of abstraction for shortest path problems on the plane
[Figure: a planar shortest-path problem from I to G with obstacles.]
The obstacles in the shortest path problem can be abstracted in a variety of ways.
-- The more the abstraction, the cheaper it is to solve the problem in abstract space
-- The less the abstraction, the more "informed" the heuristic cost (i.e., the closer the abstract path length is to the actual path length)
[Figure: the same problem under increasingly faithful abstractions, each giving a heuristic: hD from the "disappearing-act abstraction", hC from the "circular abstraction", hP from the "polygonal abstraction", and the actual cost h*.]
How informed should the heuristic be?
[Figure: total cost incurred in search = cost of computing the heuristic + cost of searching with the heuristic, plotted over heuristics of increasing informedness: h0, hD, hC, hP, h*.]
Not always clear where the total minimum occurs
• Old wisdom was that the global min was closer to cheaper heuristics
• Current insights are that it may well be far from the cheaper heuristics for many problems
  • E.g., pattern databases for the 8-puzzle
  • Polygonal abstractions for SP
  • Plan graph heuristics for planning
9/12
Admissibility/Informedness
[Figure: heuristics h1, h2, h3, h4, h5, Max(h2,h3), and h* ordered by informedness.]
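Since the max of admissible heuristics is itself admissible (each component is a lower bound on h*), combining them is a one-liner; the helper below is illustrative.

```python
def max_heuristic(*heuristics):
    """Pointwise max of several admissible heuristics: still admissible, and
    at least as informed as any single one of them."""
    def h(state):
        return max(h_i(state) for h_i in heuristics)
    return h

# e.g. h23 = max_heuristic(h2, h3)   # the Max(h2,h3) node in the figure
```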
On "predicting" the effectiveness of heuristics
Unfortunately, it is not the case that a heuristic h1 that is more informed than h2 will always do fewer node expansions than h2.
-- We can only guarantee that h1 will expand fewer nodes with f-value less than f* than h2 will.
• Consider the plot on the right… do you think h1 or h2 is likely to do better in actual search?
  – The "differentiation" ability of the heuristic, i.e., the ability to tell good nodes from bad ones, is also important. But it is harder to measure.
    • Some new work does a histogram characterization of the distribution of heuristic values [Korf, 2000]
Nevertheless, informedness of heuristics is a reasonable qualitative measure
• Let us divide the number of nodes expanded, nE, into two parts: nI, the number of expanded nodes whose f-values were strictly less than f* (i.e., the cost of the optimal goal), and nG, the number of expanded nodes with f-value greater than f*. So nE = nI + nG.
  A more informed heuristic is only guaranteed to have a smaller nI; all bets are off as far as the nG value is concerned. In many cases nG may be relatively large compared to nI, making nE wind up higher for the more informed heuristic!
[Plot: distribution of heuristic values of h1 and h2 over the nodes, with h* marked. Is h1 better or h2?]
Proof of Optimality of A* search
(Intuition: the lower-bound, optimistic estimate of the length of the path to N' through N'' is already longer than the path to N.)
Proof of optimality (by contradiction):
Let N be the goal node we output. Suppose there is another goal node N'.
We want to prove that g(N') >= g(N).
Suppose this is not true, i.e., g(N') < g(N). --Assumption A1
When N was picked up for expansion, either N' itself, or some ancestor of N', say N'', must have been on the search queue.
(Recall that f(n) is the estimate of the length of the shortest path to the goal passing through n.)
If we picked N instead of N'' for expansion, it was because
   f(N) <= f(N'') --Fact f1
i.e., g(N) + h(N) <= g(N'') + h(N'')
Since N is a goal node, h(N) = 0, so g(N) <= g(N'') + h(N'').
Now, g(N') = g(N'') + dist(N'', N'), and since h is a lower bound, h(N'') <= h*(N'') <= dist(N'', N').
So g(N') = g(N'') + dist(N'', N') >= g(N'') + h(N'') --Fact f2
From f1 and f2 we have g(N) <= g(N'), which contradicts assumption A1.
(Note: Fact f2 holds only because h(N'') is a lower bound on dist(N'', N').)
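The chain of inequalities in the proof, restated compactly (same facts, just in one line of algebra):

```latex
\begin{align*}
g(N) = f(N) &\le f(N'') = g(N'') + h(N'')
            && \text{($N$ popped before $N''$, and $h(N)=0$)}\\
            &\le g(N'') + \mathrm{dist}(N'', N')
            && \text{($h(N'') \le h^*(N'') \le \mathrm{dist}(N'', N')$)}\\
            &= g(N'),
\end{align*}
```

which contradicts the assumption g(N') < g(N).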
Where do heuristics (bounds) come from?
From relaxed problems (the more relaxed the problem, the cheaper the heuristic is to compute, but the less accurate it is)
For path planning on the plane (with obstacles)?
   Assume away the obstacles. The distance is then the straight-line distance (see next slide for other abstractions).
For the 8-puzzle problem?
   Assume the ability to move a tile directly to its place: distance = # misplaced tiles
   Assume the ability to move only one position at a time: distance = sum of Manhattan distances
   (Important: the "blank" is not counted as a tile.)
For the Traveling Salesperson problem?
   Relax the "circuit" requirement: minimum spanning tree
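A small sketch of the two relaxed-problem heuristics for the 8-puzzle. States are length-9 tuples read row by row, with 0 for the blank; this representation (and the function names) are my choice, not the lecture's.

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)                    # 0 is the blank

def misplaced_tiles(state, goal=GOAL):
    """Relaxation: a tile may jump directly to its place -> count misplaced tiles."""
    return sum(1 for tile, target in zip(state, goal) if tile != 0 and tile != target)

def manhattan(state, goal=GOAL, width=3):
    """Relaxation: a tile may slide one square at a time through other tiles ->
    sum of the Manhattan distances of the tiles from their goal squares."""
    where = {tile: i for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue                                   # the blank is not counted as a tile
        j = where[tile]
        total += abs(i // width - j // width) + abs(i % width - j % width)
    return total

# Both are lower bounds on h*, and manhattan(s) >= misplaced_tiles(s) for every state s,
# so Manhattan distance is the more informed of the two.
```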
Performance on 15 Puzzle
• Random 15 puzzle instances were first
solved optimally using IDA* with
Manhattan distance heuristic (Korf, 1985).
• Optimal solution lengths average 53
moves.
• 400 million nodes generated on average.
• Average solution time is about 50
seconds on current machines.
Limitation of Manhattan
Distance
• To solve a 24-Puzzle instance, IDA* with
Manhattan distance would take about
65,000 years on average.
• Assumes that each tile moves
independently
• In fact, tiles interfere with each other.
• Accounting for these interactions is the
key to more accurate heuristic functions.
Getting Fringe Pattern in Shape..
[Figure: three scrambled placements of the fringe tiles (3, 7, 11, 12, 13, 14, 15) of the Fifteen Puzzle, compared with their goal placement.]
-- M.d. is 19 moves, but 31 moves are needed.
-- M.d. is 20 moves, but 28 moves are needed.
-- M.d. is 17 moves, but 27 moves are needed.
Heuristics from Pattern Databases
[Figure: a scrambled Fifteen Puzzle state shown next to the goal state, with the fringe-pattern tiles (3, 7, 11, 12, 13, 14, 15) highlighted.]
31 moves is a lower bound on the total number
of moves needed to solve this particular state.
Pattern Database Heuristics
• Culberson and Schaeffer, 1996
• A pattern database is a complete set of
such positions, with associated number of
moves.
• The bigger the fringe pattern, the more
informed the heuristic; but the costlier it is
to compute and store..
– e.g. a 7-tile pattern database for the Fifteen
Puzzle contains 519 million entries.
Precomputing Pattern
Databases
• Entire database is computed with one
backward breadth-first search from goal.
• All non-pattern tiles are indistinguishable,
but all tile moves are counted.
• The first time each state is encountered,
the total number of moves made so far is
stored.
• Once computed, the same table is used
for all problems with the same goal state.
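A minimal sketch of that backward breadth-first precomputation, scaled down to the 8-puzzle (3x3) and a 3-tile pattern so it runs quickly; the state encoding and names are my own. As on the slide, non-pattern tiles are indistinguishable but every move is counted.

```python
from collections import deque

WIDTH = 3
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)                    # 0 is the blank

def adjacent(square):
    r, c = divmod(square, WIDTH)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < WIDTH and 0 <= nc < WIDTH:
            yield nr * WIDTH + nc

def build_pattern_db(pattern=(1, 2, 3)):
    """Backward breadth-first search from the goal over abstract states in which
    only the pattern tiles (plus the blank) are distinguishable.  The first time a
    placement of the pattern tiles is reached, the number of moves made so far is
    stored; that count is an admissible heuristic for any concrete state whose
    pattern tiles occupy those squares."""
    goal_placement = tuple(GOAL.index(t) for t in pattern)
    start = (goal_placement, GOAL.index(0))           # (pattern tile squares, blank square)
    db = {goal_placement: 0}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        (placement, blank), dist = queue.popleft()
        for square in adjacent(blank):
            # Slide the tile in `square` into the blank; if it is a pattern tile,
            # its recorded square changes, otherwise the placement is unchanged.
            new_placement = tuple(blank if p == square else p for p in placement)
            state = (new_placement, square)
            if state in seen:
                continue
            seen.add(state)
            if new_placement not in db:               # first visit = fewest moves (BFS)
                db[new_placement] = dist + 1
            queue.append((state, dist + 1))
    return db

def pattern_db_heuristic(db, pattern=(1, 2, 3)):
    """Heuristic lookup: index the table by where the pattern tiles sit."""
    return lambda state: db[tuple(state.index(t) for t in pattern)]
```

With pattern=(1, 2, 3) on the 8-puzzle the table has 9 * 8 * 7 = 504 entries; the 7-tile pattern database for the Fifteen Puzzle mentioned above has 519 million.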
How informed should the heuristic be?
[Figure: total cost incurred in search = cost of computing the heuristic + cost of searching with the heuristic, plotted over heuristics of increasing informedness: h0, h#misp, hmanhatt, hpat1, hpat2, h*.]
Not always clear where the total minimum occurs
• Old wisdom was that the global min was closer to cheaper heuristics
• Current insights are that it may well be far from the cheaper heuristics for many problems
  • E.g., pattern databases for the 8-puzzle
  • Polygonal abstractions for SP
  • Plan graph heuristics for planning