Figure 11.1 An 8-puzzle problem instance: (a) initial configuration; (b) final configuration; and (c)
a sequence of moves leading from the initial to the final configuration.
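To make the figure concrete, here is a minimal sketch of an 8-puzzle state representation and move generator, assuming states are stored as flat 9-tuples with 0 marking the blank; the encoding and helper names are illustrative rather than taken from the text.

```python
# Minimal 8-puzzle sketch: a state is a 9-tuple read row by row, 0 marking the blank.
# Encoding and names are illustrative; "up" means sliding the blank upward.

MOVES = {"up": -3, "down": 3, "left": -1, "right": 1}

def successors(state):
    """Yield (move, next_state) pairs obtained by sliding a tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    legal = {"up": row > 0, "down": row < 2, "left": col > 0, "right": col < 2}
    for move, delta in MOVES.items():
        if not legal[move]:
            continue                      # the move would fall off the board
        board = list(state)
        target = blank + delta
        board[blank], board[target] = board[target], board[blank]
        yield move, tuple(board)

# The final configuration of Figure 11.1(b): tiles 1-8 in order, blank in the corner.
final = (1, 2, 3, 4, 5, 6, 7, 8, 0)
```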
[Tree diagram: each level fixes one of the variables x1, x2, x3, x4 to 0 or 1; leaves are terminal nodes, marked as goal or non-goal, with the goal nodes shown here carrying objective values f(x) = 0 and f(x) = 2.]
Figure 11.2 The graph corresponding to the 0/1 integer-linear-programming problem.
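The tree of Figure 11.2 fixes one binary variable per level and tests feasibility at the leaves. A small sketch of that branching scheme follows; the objective vector c, the single constraint a·x ≥ b, and the function names are hypothetical stand-ins, since the figure's actual problem instance is not reproduced here.

```python
def solve_01_ilp(c, a, b):
    """Exhaustive DFS over x in {0,1}^n minimizing c.x subject to a.x >= b.

    Mirrors the tree of Figure 11.2: level i branches on x_i = 0 and x_i = 1, and a
    leaf is a goal node if the assignment is feasible.  Coefficients are hypothetical.
    """
    n = len(c)
    best_x, best_val = None, float("inf")

    def dfs(i, x):
        nonlocal best_x, best_val
        if i == n:                                         # terminal node
            if sum(ai * xi for ai, xi in zip(a, x)) >= b:  # goal test: feasible
                val = sum(ci * xi for ci, xi in zip(c, x))
                if val < best_val:
                    best_x, best_val = list(x), val
            return
        for v in (0, 1):                                   # branch on x_i
            x.append(v)
            dfs(i + 1, x)
            x.pop()

    dfs(0, [])
    return best_x, best_val

# Example call with made-up coefficients:
# solve_01_ilp(c=[2, 1, 3, 1], a=[1, 1, 1, 1], b=2)
```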
Figure 11.3 Two examples of unfolding a graph into a tree.
Figure 11.4 States resulting from the first three steps of depth-first search applied to an instance
of the 8-puzzle.
Figure 11.5 Representing a DFS tree: (a) the DFS tree; successor nodes shown with dashed lines
have already been explored; (b) the stack storing untried alternatives only; and (c) the stack storing
untried alternatives along with their parent. The shaded blocks represent the parent state and the
block to the right represents successor states that have not been explored.
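A possible realization of the stack of Figure 11.5(b), which keeps only the untried alternatives at each level of the current path; the successor and goal-test functions are placeholders.

```python
def depth_first_search(initial, successors, is_goal, depth_limit):
    """DFS that keeps, per level, only the untried alternatives (Figure 11.5(b)).

    `successors(state)` returns a list of child states and `is_goal(state)` tests for
    a goal; both are placeholders.  `path` is the current state sequence, and
    `stack[d]` holds the children of path[d] that have not yet been explored.
    """
    path = [initial]
    stack = [successors(initial)]            # untried alternatives, one list per level
    while stack:
        if is_goal(path[-1]):
            return path
        if stack[-1] and len(path) < depth_limit:
            child = stack[-1].pop(0)         # take the next untried alternative
            path.append(child)
            stack.append(successors(child))
        else:
            stack.pop()                      # level exhausted (or depth limit hit):
            path.pop()                       # backtrack to the parent state
    return None
```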
Figure 11.6 Applying best-first search to the 8-puzzle: (a) initial configuration; (b) final configuration; and (c) states resulting from the first four steps of best-first search. Each state is labeled with
its h-value (that is, the Manhattan distance from the state to the final state).
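A sketch of the heuristic and the search loop behind Figure 11.6: the h-value is the Manhattan distance summed over the eight tiles, and a priority queue orders states by that value. The interfaces reuse the earlier 8-puzzle sketch and are illustrative only.

```python
import heapq

def manhattan(state, goal):
    """h-value of Figure 11.6: summed Manhattan distances of tiles 1-8 (blank ignored)."""
    dist = 0
    for tile in range(1, 9):
        r1, c1 = divmod(state.index(tile), 3)
        r2, c2 = divmod(goal.index(tile), 3)
        dist += abs(r1 - r2) + abs(c1 - c2)
    return dist

def best_first_search(start, goal, successors):
    """Greedy best-first search: always expand the open state with the smallest h-value."""
    open_list = [(manhattan(start, goal), start, [])]   # (h, state, move sequence)
    closed = set()
    while open_list:
        _, state, path = heapq.heappop(open_list)
        if state == goal:
            return path
        if state in closed:
            continue
        closed.add(state)
        for move, child in successors(state):
            if child not in closed:
                heapq.heappush(open_list, (manhattan(child, goal), child, path + [move]))
    return None
```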
Figure 11.7 The unstructured nature of tree search and the imbalance resulting from static partitioning.
[State diagram: an active processor alternates between doing a fixed amount of work and servicing any pending messages; once it has finished the available work it becomes idle, selects a processor and requests work from it, and services pending messages until it either gets work (becoming active again) or gets a reject (issuing another request).]
Figure 11.8 A generic scheme for dynamic load balancing.
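The control flow of Figure 11.8, written out as a per-processor loop. All of the callables (have_work, do_fixed_work, service_messages, select_donor, request_work, terminated) are placeholders for local work management and message passing; this is a schematic sketch, not a complete implementation.

```python
def processor_loop(my_rank, p, have_work, do_fixed_work, service_messages,
                   select_donor, request_work, terminated):
    """Schematic per-processor loop for the dynamic load-balancing scheme of Figure 11.8.

    Every argument after `p` is a placeholder callable; the body only shows the
    alternation between the active and idle states of the figure.
    """
    while not terminated():
        if have_work():
            # Active state: alternate between a fixed quantum of search and
            # servicing pending work requests from other processors.
            do_fixed_work()
            service_messages()
        else:
            # Idle state: pick a donor (e.g., by ARR, GRR, or RP), ask it for work,
            # and service incoming messages until work or a reject arrives.
            donor = select_donor(my_rank, p)
            request_work(donor)
            service_messages()
```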
Figure 11.9 Splitting the DFS tree in Figure 11.5. The two subtrees along with their stack representations are shown in (a) and (b).
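One plausible splitting policy for the stacks of Figure 11.9: donate roughly half of the untried alternatives at each level up to a cutoff depth, since shallow nodes tend to root large subtrees while deep nodes represent little work. The function and its cutoff parameter are illustrative.

```python
def split_stack(stack, cutoff_depth):
    """Donate part of a DFS stack of untried alternatives (Figures 11.5 and 11.9).

    `stack[d]` is the list of untried siblings at level d.  Shallow untried nodes tend
    to root large subtrees, so roughly half of the alternatives at each level up to
    `cutoff_depth` are given away; deeper levels are kept, since they usually represent
    little work.  One possible policy, shown only as an illustration.
    """
    donated = []
    for depth, alternatives in enumerate(stack):
        if depth > cutoff_depth or len(alternatives) < 2:
            donated.append([])                     # keep deep or nearly empty levels
        else:
            half = len(alternatives) // 2
            donated.append(alternatives[:half])    # recipient's share of this level
            stack[depth] = alternatives[half:]     # donor keeps the rest
    return donated
```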
Figure 11.10 Tree-based termination detection. Steps 1–6 illustrate the weights at various processors after each work transfer.
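A sketch of the weight-based bookkeeping illustrated in Figure 11.10, assuming a donor attaches half of its weight to each piece of work it sends and an idle processor returns its weight to the processor it received work from; the class and method names are illustrative.

```python
class WeightTermination:
    """Weight-based bookkeeping for tree-based termination detection (Figure 11.10).

    Processor 0 starts with weight 1.0.  A donor attaches half of its weight to every
    piece of work it transfers; a processor that runs out of work returns its weight
    to the processor it last received work from.  The computation has terminated when
    processor 0 is idle and its weight is 1.0 again.  Floating-point halving is used
    for brevity; practical schemes avoid the precision problems this can cause.
    """

    def __init__(self, rank):
        self.rank = rank
        self.weight = 1.0 if rank == 0 else 0.0
        self.parent = None                     # processor that last sent this one work

    def weight_for_outgoing_work(self):
        """Called by a donor: split off the weight to attach to the transferred work."""
        sent = self.weight / 2
        self.weight -= sent
        return sent

    def on_work_received(self, from_rank, weight):
        self.parent = from_rank
        self.weight += weight

    def on_idle(self):
        """Return (destination, weight) to send back, or None for processor 0."""
        if self.rank == 0:
            return None                        # P0 simply waits for weight to return
        returned, self.weight = self.weight, 0.0
        return self.parent, returned

    def terminated(self):
        return self.rank == 0 and self.weight == 1.0
```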
[Plot: speedup (0-700) versus number of processors p (0-1200) for the ARR, GRR, and RP schemes.]
Figure 11.11 Speedups of parallel DFS using ARR, GRR and RP load-balancing schemes.
[Plot: number of work requests versus number of processors p (0-1200) for GRR and RP, together with the curves Expected (GRR) and Expected (RP).]
Figure 11.12 Number of work requests generated for RP and GRR and their expected values
(O(p log² p) and O(p log p), respectively).
[Plot: problem size W versus p log² p for efficiencies E = 0.64, 0.74, 0.85, and 0.90.]
Figure 11.13 Experimental isoefficiency curves for RP for different efficiencies.
[Plots (a)-(c): number of busy processors versus time.]
Figure 11.14 Three different triggering mechanisms: (a) a high triggering frequency leads to high
load-balancing cost, (b) the optimal frequency yields good performance, and (c) a low frequency
leads to high idle times.
Figure 11.15 Mapping idle and busy processors with the use of a global pointer.
[Diagram: a global list is maintained at a designated processor; each of P0, P1, ..., Pp-1 repeatedly locks the list, places the nodes it generated in the list, picks the best node from the list, unlocks the list, and expands that node to generate successors.]
Figure 11.16 A general schematic for parallel best-first search using a centralized strategy. The
locking operation is used here to serialize queue access by various processors.
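A thread-based stand-in for the centralized strategy of Figure 11.16: a single lock serializes access to the shared open list, and each worker inserts the nodes it generated, extracts the current best node, releases the lock, and expands the node. The interfaces and the deliberately simplified termination handling are assumptions of this sketch.

```python
import heapq
import itertools
import threading

def parallel_best_first(start, goal, successors, h, num_threads=4):
    """Centralized parallel best-first search in the spirit of Figure 11.16.

    A single lock serializes access to the shared open list.  Termination handling is
    simplified: a worker exits as soon as it sees an empty list or a recorded solution.
    """
    tie = itertools.count()            # tie-breaker so the heap never compares states
    open_list = [(h(start), next(tie), start, [])]
    closed = set()
    lock = threading.Lock()
    solution = []

    def worker():
        pending = []                   # successors produced by the last expansion
        while not solution:
            with lock:                                         # "lock the list"
                for item in pending:
                    heapq.heappush(open_list, item)            # place generated nodes
                pending = []
                if not open_list:
                    return
                _, _, state, path = heapq.heappop(open_list)   # pick the best node
                if state in closed:
                    continue
                closed.add(state)
            # The lock is released on leaving the with-block; now expand the node.
            if state == goal:
                solution.append(path)
                return
            for move, child in successors(state):
                pending.append((h(child), next(tie), child, path + [move]))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return solution[0] if solution else None
```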
Figure 11.17 A message-passing implementation of parallel best-first search using the ring communication strategy.
Figure 11.18 An implementation of parallel best-first search using the blackboard communication
strategy.
[Search trees (a) and (b): total number of nodes generated by the sequential formulation = 13; total number of nodes generated by the two-processor formulation of DFS = 9.]
Figure 11.19 The difference in number of nodes searched by sequential and parallel formulations
of DFS. For this example, parallel DFS reaches a goal node after searching fewer nodes than sequential DFS.
[Search trees (a) and (b): total number of nodes generated by sequential DFS = 7; total number of nodes generated by the two-processor formulation of DFS = 12.]
Figure 11.20 A parallel DFS formulation that searches more nodes than its sequential counterpart.
[Diagram: initially target = x; after the combined increment on the eight-processor hypercube, target = x + 5.]
Figure 11.21 Message combining and a sample implementation on an eight-processor hypercube.
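A sequential sketch of the combining semantics in Figure 11.21 for a shared fetch-and-add: requests are merged so that the target variable is updated once by the combined amount, while each requester is handed back a distinct previous value. The hypercube routing that actually combines the messages is abstracted away, and the function name is illustrative.

```python
def combined_fetch_and_add(target, requests):
    """Combining semantics of Figure 11.21 for increments of a shared variable.

    `requests[i]` is the amount processor i wants to add to `target` (1 for a plain
    fetch-and-add, 0 if it makes no request).  Instead of len(requests) separate
    accesses, the requests are merged so `target` is updated once by the combined
    total, and each requester receives a distinct previous value, exactly as if the
    increments had been applied one at a time.
    """
    results = {}
    running = target
    for rank, amount in enumerate(requests):
        if amount:
            results[rank] = running        # value this processor would have observed
            running += amount
    return running, results                # new value of target, per-processor results

# Example matching the figure: five of eight processors increment target by 1,
# so target grows by 5 and the requesters observe five consecutive prior values.
# combined_fetch_and_add(10, [1, 0, 1, 1, 0, 1, 0, 1])
#   -> (15, {0: 10, 2: 11, 3: 12, 5: 13, 7: 14})
```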
[Diagram: a subtask generator (manager) partitions the search tree into subtasks and sends them to processors in response to work requests.]
Figure 11.22 The single-level work-distribution scheme for tree search.