Dynamic Programming

Presenters:
Michal Karpinski
Eric Hoffstetter
Background
• "Dynamic programming" originates with Richard Bellman in the early 1950s, in work on multistage
decision process problems.
– While at RAND Corp, Bellman wanted his work to appear practical ("real work") rather than
theoretical. To shield it from scrutiny he chose the word "programming," which suggests fruitful,
deliberate effort, and embellished it with "dynamic" because, as he put it, "it's impossible to use
dynamic in a pejorative sense."
• Applications:
– String alignment problems
– Pattern recognition:
  • Image matching / image recognition (2D & 3D)
  • Speech recognition (Viterbi algorithm)
– Manufacturing: find the fastest way through a factory
– Ordering a chain of matrix multiplications to minimize cost
– Building an optimal binary search tree: minimize the expected number of nodes visited during a search
  • Language translator: most common words near the root of the tree
Used to solve problems exhibiting:
– Overlapping subproblems: the same subproblems "occur as subproblems of different problems."
– Optimal substructure: "An optimal solution to the problem contains within it optimal solutions to subproblems."
– Subproblem independence: "the solution to one subproblem does not affect the solution to another subproblem, i.e., they do not share resources." (This does not contradict overlap: subproblems of the same problem share no resources, yet the same subproblem can reappear in different branches of the recursion.)
Top-Down and Bottom-Up
– Top-down: the problem is broken into subproblems, which are solved recursively; memoization records the solutions of subproblems already solved.

Top-down:
function fib(n)
    if n = 0 return 0
    if n = 1 return 1
    else return fib(n − 1) + fib(n − 2)

Top-down with memoization (not memorization):
var m := map(0 → 0, 1 → 1)
function fib(n)
    if map m does not contain key n
        m[n] := fib(n − 1) + fib(n − 2)
    return m[n]
– Bottom-up: the smallest subproblems are solved first, and their solutions are combined to build solutions to larger problems.

function fib(n)
    if n = 0 return 0
    var previousFib := 0, currentFib := 1
    repeat n − 1 times
        var newFib := previousFib + currentFib
        previousFib := currentFib
        currentFib := newFib
    return currentFib
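The memoized and bottom-up versions in runnable Python (a minimal sketch; function names are illustrative):

from functools import lru_cache

@lru_cache(maxsize=None)
def fib_memo(n):
    # top-down with memoization: each fib(k) is computed once, then cached
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

def fib_bottom_up(n):
    # bottom-up: start from fib(0) and fib(1) and build upward
    if n == 0:
        return 0
    previous_fib, current_fib = 0, 1
    for _ in range(n - 1):
        previous_fib, current_fib = current_fib, previous_fib + current_fib
    return current_fib

assert fib_memo(10) == fib_bottom_up(10) == 55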
Biological Sequence Matching Problems 1
• DNA
– Two strands
– Four letter alphabet (four bases)
– Base pairing rules
– Strands are directional and, within a gene, only one strand is transcribed
• RNA
– Functional or intermediate step of protein
manufacturing
– Four letter alphabet
• Proteins
– 20 letter alphabet
Biological Sequence Matching Problems 2
• Applications
– Identify strains of viruses and bacteria
– Identify genes (hair, skin, eye color, height) and the genetic basis of diseases (lethal mutations, susceptibility to cancer, etc.)
– Identify evolutionary relationships
• Dynamic programming is the basis of BLAST (Basic Local Alignment Search Tool), among the top 3 most-cited papers in recent bioscience history (it was #1 in the 1990s)
Sequence Alignment Algorithm 1
AGGCGGATC
TAGCATCTAC

One possible alignment:
-AGGCGGATC---
TAG-C--ATCTAC
Given two strings
x = x1x2...xM,
y = y1y2…yN,
find the alignment with maximum score
F = (# matches)·m − (# mismatches)·s − (# gaps)·d
Sequence Alignment Algorithm 2
AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA
AGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTC
There are > 2^N possible alignments.
Sequence Alignment Algorithm 3
Note: the score of aligning
x1……xM
y1……yN
is additive.

Say that x1…xi aligns to y1…yj and xi+1…xM aligns to yj+1…yN. Add the two scores:

F(x1…xM, y1…yN) = F(x1…xi, y1…yj) + F(xi+1…xM, yj+1…yN)
Sequence Alignment Algorithm 4
• Original problem
– Align x1…xM to y1…yN
• Divide into a finite number of subproblems (non-overlapping for efficiency)
– Align x1…xi to y1…yj
• Subdivide the subproblem and construct the solution from smaller
subproblems
• Classic problem type for dynamic programming
Let
• F(i, j) = optimal score of aligning
  x1……xi
  y1……yj
F is the "matrix" or "table" or "program"; hence the term "dynamic programming."
Sequence Alignment Algorithm 5
F = (# matches)·m − (# mismatches)·s − (# gaps)·d
F(i, j) is calculated with the scoring function s(xi, yj) or the gap function (penalty d).

Three cases:
1. xi aligns to yj (diagonal move):
   x1……xi-1 xi
   y1……yj-1 yj
   F(i, j) = F(i – 1, j – 1) + s(xi, yj), where s(xi, yj) = m if xi = yj, and −s if not.
2. xi aligns to a gap (vertical move):
   x1……xi-1 xi
   y1……yj   -
   F(i, j) = F(i – 1, j) – d
3. yj aligns to a gap (horizontal move):
   x1……xi   -
   y1……yj-1 yj
   F(i, j) = F(i, j – 1) – d
Sequence Alignment Algorithm 6
How do we choose the case for each matrix position?
Assume that the subproblems F(i, j – 1), F(i – 1, j), F(i – 1, j – 1) are already solved optimally. Therefore,

F(i, j) = max { F(i – 1, j – 1) + s(xi, yj);  F(i – 1, j) – d;  F(i, j – 1) – d }

where s(xi, yj) = m if xi = yj, and −s if not.
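The same single-cell update in Python (a minimal sketch; F is assumed to be a 0-indexed 2-D list, x and y are ordinary strings, and the parameter names m, s, d follow the slides):

def score(xi, yj, m=1.0, s=0.5):
    # s(xi, yj): +m for a match, -s for a mismatch
    return m if xi == yj else -s

def fill_cell(F, i, j, x, y, d=1.0):
    # assumes F[i-1][j-1], F[i-1][j], F[i][j-1] already hold optimal subproblem scores;
    # x[i-1], y[j-1] because strings are 0-indexed while i, j are 1-based as in the recurrence
    return max(F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # case 1: xi aligns to yj
               F[i - 1][j] - d,                              # case 2: xi aligns to a gap
               F[i][j - 1] - d)                              # case 3: yj aligns to a gap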
Sequence Alignment Algorithm 7
Set d = 1, m = 1, s = 0.5.
F(i, j) = max { F(i – 1, j – 1) + s(xi, yj);  F(i – 1, j) – 1;  F(i, j – 1) – 1 }
where s(xi, yj) = 1 if xi = yj, and −0.5 if not.

        A     T     C     G
    0  -1    -2    -3    -4
A  -1   1     0    -1    -2
T  -2   0     2     1     0
G  -3  -1     1   1.5     2

Resulting alignment:
A T - G
A T C G
Needleman-Wunsch Algorithm 1:
Finds Global Optimal Alignment
1. Initialization
   a. F(0, 0) = 0
   b. F(0, j) = −j·d
   c. F(i, 0) = −i·d
2. Main iteration: filling in partial alignments
   For each i = 1……M
     For each j = 1……N
       F(i, j) = max { F(i – 1, j – 1) + s(xi, yj)   [case 1]
                       F(i – 1, j) – d               [case 2]
                       F(i, j – 1) – d               [case 3] }
       Ptr(i, j) = ↖ if [case 1], ↑ if [case 2], ← if [case 3]
3. Termination
   F(M, N) is the optimal score, and from Ptr(M, N) we can trace back the optimal alignment.
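A compact Python version of the three steps above (a sketch for illustration, not production code; default parameters match the worked example's m = 1, s = 0.5, d = 1):

def needleman_wunsch(x, y, m=1.0, s=0.5, d=1.0):
    M, N = len(x), len(y)
    # 1. Initialization: F(0,0) = 0, F(0,j) = -j*d, F(i,0) = -i*d
    F = [[0.0] * (N + 1) for _ in range(M + 1)]
    ptr = [[None] * (N + 1) for _ in range(M + 1)]
    for j in range(1, N + 1):
        F[0][j] = -j * d
    for i in range(1, M + 1):
        F[i][0] = -i * d
    # 2. Main iteration: fill in partial alignments
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            match = m if x[i - 1] == y[j - 1] else -s
            F[i][j], ptr[i][j] = max((F[i - 1][j - 1] + match, 'diag'),  # case 1
                                     (F[i - 1][j] - d, 'up'),            # case 2
                                     (F[i][j - 1] - d, 'left'))          # case 3
    # 3. Termination: F(M,N) is the optimal score; trace the pointers back to (0,0)
    ax, ay, i, j = [], [], M, N
    while i > 0 or j > 0:
        move = ptr[i][j] if (i > 0 and j > 0) else ('up' if j == 0 else 'left')
        if move == 'diag':
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif move == 'up':
            ax.append(x[i - 1]); ay.append('-'); i -= 1
        else:
            ax.append('-'); ay.append(y[j - 1]); j -= 1
    return F[M][N], ''.join(reversed(ax)), ''.join(reversed(ay))

For example, needleman_wunsch("ATG", "ATCG") returns (2.0, 'AT-G', 'ATCG'), the score and alignment from the worked example above.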
Needleman-Wunsch Algorithm 2
Initialization:
F(0, 0) = 0
F(0, j) = −j·d
F(i, 0) = −i·d

F(i, j) = max { (1) F(i – 1, j – 1) + s(xi, yj);  (2) F(i – 1, j) – d;  (3) F(i, j – 1) – d }
Ptr(i, j) = ↖ for (1), ↑ for (2), ← for (3)

Score matrix F:
        A     T     C     G
    0  -1    -2    -3    -4
A  -1   1     0    -1    -2
T  -2   0     2     1     0
G  -3  -1     1   1.5     2

Pointer matrix Ptr:
    A  T  C  G
A   ↖  ←  ←  ←
T   ↑  ↖  ←  ←
G   ↑  ↑  ↖  ↖

Tracing back from (M, N) gives the alignment:
A T - G
A T C G
Smith-Waterman Algorithm 1:
Finds local optimal alignment(s)
Ignore poorly aligned regions.
1. Initialization
   a. F(0, 0) = 0
   b. F(0, j) = 0
   c. F(i, 0) = 0
2. Main iteration: filling in partial alignments
   For each i = 1……M
     For each j = 1……N
       F(i, j) = max { 0
                       F(i – 1, j – 1) + s(xi, yj)   [case 1]
                       F(i – 1, j) – d               [case 2]
                       F(i, j – 1) – d               [case 3] }
       Ptr(i, j) = ↖ if [case 1], ↑ if [case 2], ← if [case 3]
3. Termination
   The best local alignment score is the maximum of F(i, j) over all cells; from that cell's Ptr we can trace back the optimal local alignment, stopping at a cell with score 0.
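Only the zero floor, the all-zero initialization, and the traceback start/stop change relative to Needleman-Wunsch. A minimal Python sketch (same illustrative scoring parameters; it traces back from the highest-scoring cell until a zero cell):

def smith_waterman(x, y, m=1.0, s=0.5, d=1.0):
    M, N = len(x), len(y)
    F = [[0.0] * (N + 1) for _ in range(M + 1)]       # F(i,0) = F(0,j) = 0
    ptr = [[None] * (N + 1) for _ in range(M + 1)]
    best, best_ij = 0.0, (0, 0)
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            match = m if x[i - 1] == y[j - 1] else -s
            F[i][j], ptr[i][j] = max((0.0, 'stop'),                       # never drop below 0
                                     (F[i - 1][j - 1] + match, 'diag'),   # case 1
                                     (F[i - 1][j] - d, 'up'),             # case 2
                                     (F[i][j - 1] - d, 'left'))           # case 3
            if F[i][j] > best:
                best, best_ij = F[i][j], (i, j)
    # trace back from the best cell until a zero-scoring cell is reached
    ax, ay = [], []
    i, j = best_ij
    while i > 0 and j > 0 and F[i][j] > 0:
        move = ptr[i][j]
        if move == 'diag':
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif move == 'up':
            ax.append(x[i - 1]); ay.append('-'); i -= 1
        else:
            ax.append('-'); ay.append(y[j - 1]); j -= 1
    return best, ''.join(reversed(ax)), ''.join(reversed(ay))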
Smith-Waterman Algorithm 2
Initialization:
F(0, 0) = 0
F(0, j) = 0
F(i, 0) = 0

F(i, j) = max { (1) F(i – 1, j – 1) + s(xi, yj);  (2) F(i – 1, j) – d;  (3) F(i, j – 1) – d }
Ptr(i, j) = ↖ for (1), ↑ for (2), ← for (3)

Score matrix F:
        A     T     C     G
    0   0     0     0     0
A   0   1     0  -0.5    -1
T   0   0     2     1     0
G   0 -0.5    1   1.5     2

Pointer matrix Ptr:
    A  T  C  G
A   ↖  ←  ↖  ↑
T   ↑  ↖  ←
G   ↖  ↑  ↖  ↖

Resulting alignment:
A T - G
A T C G
Smith-Waterman Algorithm 3
Score matrix F (columns: G G C G A C C T A C; rows: A G G C T A T C A C C T):

      G  G  C  G  A  C  C  T  A  C
   0  0  0  0  0  0  0  0  0  0  0
A  0  0  0  0  0  1  0  0  0  1  0
G  0  1  1  0  1  0  0  0  0  0  0
G  0  1  2  1  1  0  0  0  0  0  0
C  0  0  1  3  2  1  1  1  0  0  1
T  0  0  0  2  2  1  0  0  2  1  0
A  0  0  0  1  1  3  2  1  1  3  2
T  0  0  0  0  0  2  2  1  2  2  2
C  0  0  0  1  0  1  3  3  2  1  3
A  0  0  0  0  0  1  2  2  2  3  2
C  0  0  0  1  0  0  2  3  2  2  4
C  0  0  0  1  0  0  1  3  2  1  3
T  0  0  0  0  0  0  0  2  4  3  2
Smith-Waterman Algorithm 4
Same score matrix as on the previous slide, with the traceback marked. Resulting alignment:

AGGCTATCACCT--
-GGC---GACCTAC
Overlap Detection 1
• When searching for matches of a short string in a database of long strings, we don't want to penalize overhangs.

[Figure: two overlap configurations, x1……xM overhanging one end of y1……yN, and y1……yN overhanging one end of x1……xM]
Overlap Detection 2
[Figure: the same two overhang configurations of x1……xM against y1……yN]

F(i, 0) = max { F(i – 1, 0);  F(i – 1, m) – T }

F(i, j) = max { F(i – 1, j – 1) + s(xi, yj);  F(i – 1, j) – d;  F(i, j – 1) – d }
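The recurrence for F(i, j) itself is unchanged; only the boundary treatment changes. One common way to realize "free overhangs" is to make end gaps free: zero the first row and column and take the best score anywhere in the last row or column. The Python below is a hedged sketch of that formulation (an assumption about the intended variant; it does not reproduce the F(i – 1, m) – T rule above):

def overlap_score(x, y, m=1.0, s=0.5, d=1.0):
    # free end gaps: F(i,0) = F(0,j) = 0, so unaligned overhangs cost nothing
    M, N = len(x), len(y)
    F = [[0.0] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            match = m if x[i - 1] == y[j - 1] else -s
            F[i][j] = max(F[i - 1][j - 1] + match,
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)
    # termination: best score over the last row and last column (either string may overhang)
    return max(max(F[M]), max(F[i][N] for i in range(M + 1)))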
Overlap Detection 3
Needleman-Wunsch with Overlap Detection, and Smith-Waterman with Overlap Detection, using
F(i, 0) = max { F(i – 1, 0);  F(i – 1, m) – T }
F(i, j) = max { F(i – 1, j – 1) + s(xi, yj);  F(i – 1, j) – d;  F(i, j – 1) – d }

[Figure: side-by-side worked examples of the filled score matrices and the resulting alignments for the two variants on a pair of short DNA sequences]
Bounded Dynamic Programming
[Figure: only a band of half-width k(N) around the diagonal of the matrix for x1…xM vs. y1…yN is filled in]

Initialization:
F(i, 0), F(0, j) undefined for i, j > k

Iteration:
For i = 1…M
  For j = max(1, i – k)…min(N, i + k)
    F(i, j) = max { F(i – 1, j – 1) + s(xi, yj);
                    F(i, j – 1) – d, if j > i – k;
                    F(i – 1, j) – d, if j < i + k }

Termination: same
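A banded version in Python (a sketch under the same global scoring as Needleman-Wunsch; k is the band half-width, and cells outside the band are kept at minus infinity to mark them as undefined):

def banded_score(x, y, k, m=1.0, s=0.5, d=1.0):
    M, N = len(x), len(y)
    NEG = float('-inf')                       # "undefined" cells outside the band
    F = [[NEG] * (N + 1) for _ in range(M + 1)]
    F[0][0] = 0.0
    for j in range(1, min(N, k) + 1):
        F[0][j] = -j * d
    for i in range(1, min(M, k) + 1):
        F[i][0] = -i * d
    for i in range(1, M + 1):
        for j in range(max(1, i - k), min(N, i + k) + 1):
            match = m if x[i - 1] == y[j - 1] else -s
            best = F[i - 1][j - 1] + match
            if j > i - k:                     # left neighbour lies inside the band
                best = max(best, F[i][j - 1] - d)
            if j < i + k:                     # upper neighbour lies inside the band
                best = max(best, F[i - 1][j] - d)
            F[i][j] = best
    return F[M][N]        # -inf if (M, N) falls outside the band, i.e. |M - N| > k

Only about O(k·M) cells are touched instead of M·N.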
Longest Common Subsequence 1
1. Initialization
   a. F(0, 0) = 0
   b. F(0, j) = 0
   c. F(i, 0) = 0
2. Main iteration
   For each i = 1……M
     For each j = 1……N
       F(i, j) = max { F(i – 1, j – 1) + 1, if xi = yj     [case 1]
                       F(i – 1, j), if not(xi = yj)        [case 2]
                       F(i, j – 1), if not(xi = yj)        [case 3] }
       Ptr(i, j) = ↖ if [case 1], ↑ if [case 2], ← if [case 3]
3. Termination
   F(M, N) is the optimal score (the length of the longest common subsequence), and from Ptr(M, N) we can trace back an optimal alignment (the subsequence itself).
Longest Common Subsequence 2
Initialization:
F(0, 0) = 0
F(0, j) = 0
F(i, 0) = 0

F(i, j) = max { (1) F(i – 1, j – 1) + 1, if xi = yj;  (2) F(i – 1, j), if not(xi = yj);  (3) F(i, j – 1), if not(xi = yj) }
Ptr(i, j) = ↖ for (1), ↑ for (2), ← for (3)

Score matrix F:
      A  T  C  G
   0  0  0  0  0
A  0  1  1  1  1
T  0  1  2  2  2
G  0  1  2  2  3

Pointer matrix Ptr:
    A  T  C  G
A   ↖  ←  ←  ←
T   ↑  ↖  ←  ←
G   ↑  ↑  ←  ↖

Resulting alignment (longest common subsequence ATG):
A T - G
A T C G
Longest Common Subsequence 3
Cormen: error on page 353.

Score matrix c (columns: A B C B D A B; rows: B D C A B A):
      A  B  C  B  D  A  B
   0  0  0  0  0  0  0  0
B  0  0  1  1  1  1  1  1
D  0  0  1  1  1  2  2  2
C  0  0  1  2  2  2  2  2
A  0  1  1  2  2  2  3  3
B  0  1  2  2  3  3  3  4
A  0  1  2  2  3  3  4  4

Pointer matrix b:
    A  B  C  B  D  A  B
B   ←  ↖  ←  ↖  ←  ←  ↖
D   ←  ↑  ←  ←  ↖  ←  ←
C   ←  ↑  ↖  ←  ←  ←  ←
A   ↖  ←  ↑  ←  ←  ↖  ←
B   ↑  ↖  ←  ↖  ←  ←  ↖
A   ↖  ↑  ←  ↑  ←  ↖  ←
Corrected (to obtain figure 15.6):
m = length[X]
n = length[Y]
for i = 1 to m
    do c[i, 0] = 0
for j = 0 to n
    do c[0, j] = 0
for i = 1 to m
    for j = 1 to n
        if xi = yj then
            c[i, j] = c[i-1, j-1] + 1
            b[i, j] = "↖"
        else if c[i-1, j] > c[i, j-1] then
            c[i, j] = c[i-1, j]
            b[i, j] = "↑"
        else
            c[i, j] = c[i, j-1]
            b[i, j] = "←"
return c and b
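A runnable Python translation of the corrected pseudocode (the names c and b and the arrow symbols follow the slide; read_lcs is an added helper for the traceback):

def lcs_length(X, Y):
    m, n = len(X), len(Y)
    # c[i][j] = length of an LCS of X[:i] and Y[:j]; b holds the traceback arrows
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = '↖'
            elif c[i - 1][j] > c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = '↑'
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = '←'
    return c, b

def read_lcs(b, X, i, j):
    # follow the arrows back from (i, j) to recover one LCS
    if i == 0 or j == 0:
        return ''
    if b[i][j] == '↖':
        return read_lcs(b, X, i - 1, j - 1) + X[i - 1]
    if b[i][j] == '↑':
        return read_lcs(b, X, i - 1, j)
    return read_lcs(b, X, i, j - 1)

For Cormen's example, c, b = lcs_length("ABCBDAB", "BDCABA") gives c[7][6] == 4, the length of the longest common subsequence.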
Performance
• Running time: O(mn), plus O(m + n) for output
• Storage: O(mn)
  – Possible to eliminate the backpointer matrix for some problems (see the sketch after this list)
• Improvements
  – Overlap detection
  – Partitioning: find local alignments to seed a global alignment
  – Bounded DP
  – Gap opening vs. gap extension penalties
  – Biochemically significant scoring functions
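One concrete instance of the storage improvement, for the score-only case (a sketch; it keeps just two rows of the LCS table and therefore returns only the length, since a full traceback would need the whole table or extra work):

def lcs_length_two_rows(X, Y):
    # O(min(m, n)) extra space instead of O(mn)
    if len(Y) > len(X):
        X, Y = Y, X               # keep the shorter string on the inner loop
    prev = [0] * (len(Y) + 1)
    for xi in X:
        curr = [0] * (len(Y) + 1)
        for j, yj in enumerate(Y, start=1):
            curr[j] = prev[j - 1] + 1 if xi == yj else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]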
Sources
Altschul, S.F., et al. 1990. Basic local alignment search tool. Journal of Molecular Biology 215(3): 403-410.
Bellman, Richard. 1957. Dynamic Programming. Princeton University Press, Princeton.
Cormen, T.H., et al. 2001. Introduction to Algorithms. MIT Press, Cambridge.
Dreyfus, Stuart. 2002. Richard Bellman on the birth of dynamic programming. Operations Research 50: 48-51.
Durbin, R., et al. 1998. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, New York.
Gotoh, O. 1982. An improved algorithm for matching biological sequences. Journal of Molecular Biology 162: 705-708.
Gusfield, Dan. 1997. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, New York.
Needleman, S.B. and Wunsch, C.D. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48: 443-453.
Preiss, B.R. Data Structures and Algorithms with Object-Oriented Design Patterns in C#.
Smith, T.F. and Waterman, M.S. 1981. Identification of common molecular subsequences. Journal of Molecular Biology 147: 195-197.
Wikipedia.
Sequence Alignment Algorithm X
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC

One possible alignment:
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
Given two strings
x = x1x2...xM,
y = y1y2…yN,
find the alignment with maximum score
F = (# matches)·m − (# mismatches)·s − (# gaps)·d