Dynamic Programming - Wayne State University

Dynamic Programming (DP)
Richard Bellman (1920 - 1984)
Technique for solving optimization problems:
Scheduling
Bioinformatics (Smith-Waterman algorithm)
Packaging and inventory management
Problem = set of interdependent subproblems
Solves subproblems and uses the results to solve larger
subproblems until the entire problem is solved.
Similar to divide and conquer except that there may be
interrelations across subproblems.
Solution to a subproblem is a function of solutions to one or
more subproblems at the preceding levels.
Daniel Grosu, Wayne State University
CSC 7220 / ECE 7610: Parallel Computing II
Example: Shortest-path problem
Problem: Find the shortest path between a pair of vertices in
an acyclic graph
G contains n nodes {0, 1, . . . , n − 1} and has an edge (i, j)
only if i < j.
Find the least-cost path between nodes 0 and n − 1
f(x): cost of the least-cost path from 0 to x
DP formulation:
f(x) = 0                                    if x = 0
f(x) = min_{0 ≤ j < x} { f(j) + c(j, x) }   if 1 ≤ x ≤ n − 1
Find f(4) for the example graph (in which only nodes 2 and 3 have edges into node 4):
f(4) = min{ f(3) + c(3, 4), f(2) + c(2, 4) }
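As an illustration, the recurrence can be evaluated bottom-up in Python; the graph and its edge costs below are hypothetical, chosen so that node 4 is reached only through nodes 2 and 3 as in the example.

```python
# Bottom-up evaluation of f(x) = min over predecessors j of f(j) + c(j, x),
# for a DAG on nodes 0..n-1 with edges (j, x) only when j < x.
INF = float("inf")

def shortest_path_dag(n, cost):
    """cost[(j, x)] = c(j, x); returns f, where f[x] is the least cost 0 -> x."""
    f = [INF] * n
    f[0] = 0  # base case: f(0) = 0
    for x in range(1, n):
        for j in range(x):  # only edges (j, x) with j < x exist
            if (j, x) in cost:
                f[x] = min(f[x], f[j] + cost[(j, x)])
    return f

# Hypothetical edge costs for illustration.
cost = {(0, 1): 2, (0, 2): 3, (1, 3): 4, (2, 3): 1, (2, 4): 6, (3, 4): 2}
f = shortest_path_dag(5, cost)
# f(4) = min{f(3) + c(3,4), f(2) + c(2,4)} = min{4 + 2, 3 + 6} = 6
```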
Composition function
Cost of a solution composed of subproblems x_1, x_2, ..., x_l:
r = g(f(x_1), f(x_2), ..., f(x_l))
g: composition function (depends on the problem)
A problem admits a DP formulation if its optimal solution can be determined by composing optimal solutions to subproblems and selecting the min or max.
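As a small illustration (the function name is hypothetical), the shortest-path example uses a min-of-sums composition g over subproblem solutions:

```python
# Hypothetical illustration of a composition function g: the solution is
# r = g(f(x_1), ..., f(x_l)) applied to subproblem solutions.
def g_min_of_sums(subproblem_costs, edge_costs):
    """Shortest-path style composition: r = min_j (f(x_j) + c_j)."""
    return min(f + c for f, c in zip(subproblem_costs, edge_costs))

# f(2) = 3 and f(3) = 4 composed with edge costs c(2,4) = 6 and c(3,4) = 2:
r = g_min_of_sums([3, 4], [6, 2])
```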
Example: Composition of subproblem solutions
Functional equation
Functional equation (optimization equation)
Recursive equation that represents the solution
Left side: an unknown quantity
Right side: a min or max expression.
Monadic DP formulation: functional equation contains a
single recursive term
Polyadic DP formulation: functional equation contains
multiple recursive terms
Serial & Nonserial DP
Dependencies between subproblems can be represented by a
graph.
If the graph is acyclic then the nodes can be organized into
levels such that subproblems at a particular level depend only
on subproblems at previous levels.
Serial DP formulation: subproblems at all levels depend only on results from the immediately preceding level.
Nonserial DP formulation: subproblems may also depend on levels other than the immediately preceding one.
Four categories of DP formulations (not an exhaustive
classification!):
Serial monadic
Serial polyadic
Nonserial monadic
Nonserial polyadic
Difficult to develop generic parallel algorithms!
Serial Monadic DP: The Shortest-Path Problem
Problem: Find the shortest path from S to R.
v_i^l: the i-th node at level l
c_{i,j}^l: cost of the edge connecting v_i^l to v_j^{l+1}
C_i^l: cost of reaching R from node v_i^l
C^l = [C_0^l, C_1^l, ..., C_{n−1}^l]^T: costs of reaching R from the nodes at level l
Shortest-path problem: compute C^0
The graph has only one starting node, thus C^0 = [C_0^0]
Any path from v_i^l to R includes a node v_j^{l+1} (0 ≤ j ≤ n − 1)
The cost of the shortest path between v_i^l and R:
C_i^l = min{ c_{i,j}^l + C_j^{l+1} | j is a node at level l + 1 }
At the last level: C_j^{r−1} = c_{j,R}^{r−1}
Monadic: only one recursive term on the right-hand side
Serial: requires solutions to subproblems at the immediately preceding level
The cost of reaching R from any node at level l (0 ≤ l ≤ r − 1):

C_0^l = min{(c_{0,0}^l + C_0^{l+1}), (c_{0,1}^l + C_1^{l+1}), ..., (c_{0,n−1}^l + C_{n−1}^{l+1})}
C_1^l = min{(c_{1,0}^l + C_0^{l+1}), (c_{1,1}^l + C_1^{l+1}), ..., (c_{1,n−1}^l + C_{n−1}^{l+1})}
...
C_{n−1}^l = min{(c_{n−1,0}^l + C_0^{l+1}), (c_{n−1,1}^l + C_1^{l+1}), ..., (c_{n−1,n−1}^l + C_{n−1}^{l+1})}
× is the matrix-vector operation where + is replaced by min and · is replaced by +.

C^l = M_{l,l+1} × C^{l+1}

where

M_{l,l+1} = [ c_{0,0}^l      c_{0,1}^l      ...  c_{0,n−1}^l   ]
            [ c_{1,0}^l      c_{1,1}^l      ...  c_{1,n−1}^l   ]
            [ ...            ...            ...  ...           ]
            [ c_{n−1,0}^l    c_{n−1,1}^l    ...  c_{n−1,n−1}^l ]
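A direct sketch of the modified matrix-vector product (min replacing +, + replacing ·); the small cost matrix below is hypothetical:

```python
INF = float("inf")

def min_plus_matvec(M, v):
    """C^l = M_{l,l+1} x C^{l+1}: entry i is min_j (M[i][j] + v[j])."""
    return [min(M[i][j] + v[j] for j in range(len(v))) for i in range(len(M))]

# Hypothetical 2-node levels: M holds edge costs between level l and l+1,
# v holds the costs of reaching R from the level l+1 nodes.
M = [[1, 4],
     [2, 1]]
v = [5, 3]
C = min_plus_matvec(M, v)  # C[0] = min(1+5, 4+3) = 6, C[1] = min(2+5, 1+3) = 4
```

Applying this product r − 1 times, starting from C^{r−1}, yields C^0.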
Sequential algorithm: a sequence of matrix-vector
multiplications
1. Compute C^{r−1}
2. Compute C^{r−k−1} for k = 1, 2, ..., r − 2
3. Compute C^0

Running time: Θ(rn^2)
Parallel algorithm:
Use the parallel matrix-vector multiplication algorithm.
Matrix-vector multiplication on Θ(n) processors takes Θ(n) time.
Parallel running time: Θ(rn)
For sparse graphs, use sparse matrix-vector multiplication algorithms.
Serial Monadic DP: The 0/1 Knapsack Problem
0/1 Knapsack Problem:
Given a knapsack of capacity c and a set of n objects
Object i has weight w_i and profit p_i
Find a set of objects to put in the knapsack such that

∑_{i=1}^{n} w_i v_i ≤ c

and

∑_{i=1}^{n} p_i v_i

is maximized, where v_i = 1 if object i is in the knapsack, and v_i = 0 otherwise.
F [i, x] - maximum profit for a knapsack of capacity x using
only objects {1, 2, . . . , i}.
F[n, c] is the solution.
DP formulation:

F[i, x] = 0                                           if i = 0 and x ≥ 0
F[i, x] = −∞                                          if x < 0
F[i, x] = max{ F[i − 1, x], F[i − 1, x − w_i] + p_i } if 1 ≤ i ≤ n
Two choices:
Object not included, capacity and profit do not change
Object included, capacity becomes x − wi , profit increases by pi
Sequential algorithm: construct the table in row-major order.
Running time: Θ(nc)
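The row-major construction can be sketched as follows; the weights, profits, and capacity are hypothetical example data, and the x < 0 case is handled by skipping objects that do not fit rather than storing −∞:

```python
# Row-major construction of the knapsack table F, following the recurrence
# F[i][x] = max(F[i-1][x], F[i-1][x - w_i] + p_i).
def knapsack(weights, profits, c):
    n = len(weights)
    F = [[0] * (c + 1) for _ in range(n + 1)]  # F[0][x] = 0 for x >= 0
    for i in range(1, n + 1):
        for x in range(c + 1):
            F[i][x] = F[i - 1][x]            # object i not included
            if weights[i - 1] <= x:          # include object i if it fits
                F[i][x] = max(F[i][x],
                              F[i - 1][x - weights[i - 1]] + profits[i - 1])
    return F[n][c]

best = knapsack([2, 3, 4], [3, 4, 5], 5)  # objects 1 and 2 fit: profit 3 + 4 = 7
```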
CREW PRAM algorithm:
c processors: P_0, ..., P_{c−1}.
P_{r−1} computes the r-th column of matrix F.
Computing F[j, r] requires constant time.
Parallel runtime: Θ(n).
Cost: Θ(nc) ⇒ cost-optimal.
Distributed memory machine:
c processors: P_0, ..., P_{c−1}.
F distributed column-wise.
Computing F[j, r] requires F[j − 1, r − w_j] from another processor.
Need to perform a w_j-circular shift, which takes time (t_s + t_w) log c.
Parallel runtime: Θ(n log c).
Cost: Θ(nc log c) ⇒ not cost-optimal.
Distributed memory machine with more columns per processor:
p processors each computing c/p elements of the table in
each iteration.
Need to perform a circular shift; total message size is c/p.
Total time for each iteration: 2t_s + t_w c/p + t_c c/p
Parallel runtime: Θ(nc/p).
Cost: Θ(nc) ⇒ cost-optimal.
Longest-Common-Subsequence Problem (LCS)
LCS Problem:
Given two sequences
A = <a_1, a_2, ..., a_n>
B = <b_1, b_2, ..., b_m>
Find the longest sequence that is a subsequence of both A and B.
Example:
A = <c, a, d, b, r, z>
B = <a, s, b, z>
LCS = <a, b, z>
Nonserial Monadic DP: LCS Problem
F [i, j] - length of the longest common subsequence of the first
i elements of A and the first j elements of B.
Objective: determine F [n, m].
DP formulation:

F[i, j] = 0                                if i = 0 or j = 0
F[i, j] = F[i − 1, j − 1] + 1              if i, j > 0 and a_i = b_j
F[i, j] = max{ F[i, j − 1], F[i − 1, j] }  if i, j > 0 and a_i ≠ b_j
Sequential Algorithm: compute F table in row-major order.
Running time: Θ(nm)
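The row-major table construction can be sketched directly from the recurrence; the example reuses the sequences from the previous slide:

```python
# F[i][j] = length of the LCS of the first i elements of A and first j of B.
def lcs_length(A, B):
    n, m = len(A), len(B)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # F[0][j] = F[i][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if A[i - 1] == B[j - 1]:           # a_i = b_j
                F[i][j] = F[i - 1][j - 1] + 1
            else:                              # a_i != b_j
                F[i][j] = max(F[i][j - 1], F[i - 1][j])
    return F[n][m]

length = lcs_length(list("cadbrz"), list("asbz"))  # LCS <a, b, z> has length 3
```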
Nonserial: each node depends on two subproblems at the
preceding level and one problem two levels earlier.
Monadic: equation has a single recursive term
Computing LCS of two amino-acid sequences
A: Alanine, E: Glutamic acid, G: Glycine, H: Histidine, P: Proline, W: Tryptophan
LCS: A W H E E
LCS: Parallel formulation
CREW PRAM algorithm:
For simplicity assume m = n
CREW PRAM with n processors
Pi computes the i-th column of matrix F .
Entries computed in a diagonal sweep.
Computing F [i, j] requires constant time.
Parallel runtime: Θ(n).
Cost: Θ(n^2) ⇒ cost-optimal.
Linear array algorithm:
For simplicity assume m = n
Linear array with n processors
Pi computes the i-th column of matrix F .
Entries computed in a diagonal sweep.
Computing F [i, j] requires constant time.
Parallel runtime: T_P = (2n − 1)(t_s + t_w + t_c).
Efficiency: E = n^2 t_c / (n(2n − 1)(t_s + t_w + t_c))
Maximum efficiency: E_max = 1/(2 − 1/n), which approaches 0.5 from above as n grows, so the efficiency is asymptotically bounded by 0.5.
LCS: Linear array parallel formulation
Serial Polyadic DP: Floyd’s Algorithm
All-pairs shortest paths algorithm
d_{i,j}^k: minimum cost of a path from i to j using only the nodes v_1, v_2, ..., v_k as intermediate nodes.
DP formulation:

d_{i,j}^k = c_{i,j}                                              if k = 0
d_{i,j}^k = min{ d_{i,j}^{k−1}, d_{i,k}^{k−1} + d_{k,j}^{k−1} }  if 1 ≤ k ≤ n

Polyadic formulation: two recursive terms in the equation.
Sequential algorithm running time: Θ(n^3).
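The sequential algorithm updates the table in place, one intermediate node k per outer iteration; the 3-node cost matrix below is hypothetical:

```python
# Floyd's algorithm: each outer iteration over k implements
# d^k_{i,j} = min(d^{k-1}_{i,j}, d^{k-1}_{i,k} + d^{k-1}_{k,j}).
INF = float("inf")

def floyd_apsp(c):
    n = len(c)
    d = [row[:] for row in c]  # d^0 = c
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

# Hypothetical cost matrix (0 on the diagonal, INF = no edge).
c = [[0,   4,   INF],
     [INF, 0,   1],
     [2,   INF, 0]]
d = floyd_apsp(c)  # e.g. d[0][2] = 4 + 1 = 5 via the intermediate node 1
```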
Floyd’s APSP: Parallel formulation
CREW PRAM algorithm:
CREW PRAM with n^2 processors
P_{i,j} computes d_{i,j}^k, for k = 1, 2, ..., n.
Computing d_{i,j}^k requires constant time.
Parallel runtime: Θ(n).
Cost: Θ(n^3) ⇒ cost-optimal.
Nonserial Polyadic DP: Optimal Matrix-Parenthesization
Multiplying n matrices, A_1, A_2, ..., A_n
A_i has r_{i−1} rows and r_i columns.
Problem: Determine a parenthesization that minimizes the
number of operations.
Example: A_1: 10 × 20; A_2: 20 × 30; A_3: 30 × 40
(A_1 × A_2) × A_3:
A_1 × A_2 requires 10 × 20 × 30 = 6000 operations.
The result matrix is 10 × 30.
Multiplying by A_3 requires an additional 10 × 30 × 40 = 12000 operations.
Total number of operations: 18000
A_1 × (A_2 × A_3):
requires 20 × 30 × 40 + 10 × 20 × 40 = 32000 operations.
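The two operation counts can be checked directly from the dimension vector:

```python
# Operation counts for the two parenthesizations of A1 x A2 x A3,
# where r = [10, 20, 30, 40] gives A_i the dimensions r[i-1] x r[i].
r = [10, 20, 30, 40]

left_first = r[0] * r[1] * r[2] + r[0] * r[2] * r[3]   # (A1 x A2) x A3
right_first = r[1] * r[2] * r[3] + r[0] * r[1] * r[3]  # A1 x (A2 x A3)
# left_first = 6000 + 12000 = 18000; right_first = 24000 + 8000 = 32000
```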
C[i, j]: optimal cost of multiplying matrices A_i, ..., A_j
Split into the product of two smaller chains: A_i, A_{i+1}, ..., A_k and A_{k+1}, A_{k+2}, ..., A_j
First chain ⇒ r_{i−1} × r_k matrix; second chain ⇒ r_k × r_j matrix
Cost of multiplying these two: r_{i−1} r_k r_j
DP formulation:

C[i, j] = 0                                                         if i = j, 1 ≤ i ≤ n
C[i, j] = min_{i ≤ k < j} { C[i, k] + C[k + 1, j] + r_{i−1} r_k r_j }   if 1 ≤ i < j ≤ n

Problem: Find C[1, n]
Nonserial Polyadic DP: Optimal Matrix-Parenthesization
Sequential algorithm:
Fills the table diagonally.
Computing C [i, j] requires evaluating j − i terms and selecting
their minimum.
Each entry on diagonal l can be computed in time l t_c.
The algorithm computes n − 1 chains of length 2, taking time (n − 1) t_c; n − 2 chains of length 3, taking time (n − 2) 2t_c; and so on.

T_S = ∑_{i=1}^{n−1} (n − i) i t_c ≈ (n^3/6) t_c
T_S = Θ(n^3)
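The diagonal-by-diagonal fill can be sketched as follows; the dimension vector reuses the earlier three-matrix example:

```python
# Diagonal-by-diagonal construction of C, following
# C[i][j] = min over i <= k < j of C[i][k] + C[k+1][j] + r[i-1]*r[k]*r[j],
# with 1-based matrix indices: A_i has dimensions r[i-1] x r[i].
def matrix_chain_cost(r):
    n = len(r) - 1  # number of matrices
    C = [[0] * (n + 1) for _ in range(n + 1)]  # C[i][i] = 0
    for l in range(1, n):  # diagonal l holds chains of l + 1 matrices
        for i in range(1, n - l + 1):
            j = i + l
            C[i][j] = min(C[i][k] + C[k + 1][j] + r[i - 1] * r[k] * r[j]
                          for k in range(i, j))
    return C[1][n]

cost = matrix_chain_cost([10, 20, 30, 40])  # = 18000, as in the example
```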
Nonserial Polyadic DP: Optimal Matrix-Parenthesization
Parallel formulation:
Logical ring of n processors
Step l: each processor computes a single element belonging to
the l-th diagonal.
After computing its value a processor sends it to all others by
all-to-all broadcast.
Total time to compute the entries on diagonal l: l t_c + t_s log n + t_w (n − 1)

T_P = ∑_{l=1}^{n−1} (l t_c + t_s log n + t_w (n − 1))
    = ((n − 1)n/2) t_c + t_s (n − 1) log n + t_w (n − 1)^2
T_P = Θ(n^2)
Cost: Θ(n^3) ⇒ cost-optimal.
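As a sanity check (the machine constants t_c, t_s, t_w below are hypothetical and a base-2 log is assumed), the closed form agrees with the summation:

```python
# Numerically checking the closed form of T_P against the sum
# T_P = sum_{l=1}^{n-1} (l*tc + ts*log n + tw*(n-1)).
import math

def tp_sum(n, tc, ts, tw):
    return sum(l * tc + ts * math.log2(n) + tw * (n - 1) for l in range(1, n))

def tp_closed(n, tc, ts, tw):
    return (n - 1) * n / 2 * tc + ts * (n - 1) * math.log2(n) + tw * (n - 1) ** 2

n, tc, ts, tw = 16, 1.0, 2.0, 0.5
# tp_sum and tp_closed agree (up to floating-point rounding).
```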
Parallel formulation (using p processors):
Logical ring of p processors; each processor storing n/p nodes.
After computing its corresponding C [i, j] values a processor
sends them to all others by all-to-all broadcast.
Time required for all-to-all broadcast of n/p words:
t_s log p + t_w n(p − 1)/p ≈ t_s log p + t_w n

T_P = ∑_{l=1}^{n−1} (l t_c n/p + t_s log p + t_w n)
    = ((n − 1)n^2/(2p)) t_c + t_s (n − 1) log p + t_w n(n − 1)
T_P = Θ(n^3/p) + Θ(n^2)