Dynamic Programming
• Break up a problem into a series of overlapping sub-problems, and build up solutions to larger and larger sub-problems.
• History: Bellman pioneered the systematic study of dynamic programming in the 1950s.
• Dynamic programming = planning over time.
• The Secretary of Defense was hostile to mathematical research.
• Bellman sought an impressive name to avoid confrontation:
  – "it's impossible to use dynamic in a pejorative sense"
  – "something not even a Congressman could object to"
• Fibonacci sequence:
  F(n) = F(n-1) + F(n-2)
  F(0) = 0; F(1) = 1
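For instance, the overlapping sub-problems F(0), F(1), …, F(n-1) can each be solved once and reused. A minimal bottom-up sketch in Python (not part of the original slides):

```python
def fib(n):
    """Bottom-up dynamic programming for the Fibonacci sequence.

    F(0) = 0, F(1) = 1, F(n) = F(n-1) + F(n-2).
    Each sub-problem is solved once and reused, so this runs in O(n) time,
    versus the exponential time of naive recursion.
    """
    if n < 2:
        return n
    prev, curr = 0, 1                      # F(0), F(1)
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr     # build up larger sub-problems
    return curr

print(fib(10))   # 55
```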
Sequence Alignment
Evolution at the DNA level

Sequence edits (mutation, deletion):
…ACGGTGCAGTTACCA…
…AC----CAGTCCACCA…

Rearrangements: inversion, translocation, duplication
Sequence conservation implies function
Alignment is the key to
• Finding important regions
• Determining function
• Uncovering the evolutionary forces
Sequence Alignment

AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC

-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
Definition
Given two strings
x = x1x2...xM,
y = y1y2...yN,
an alignment is an assignment of gaps to positions
0,…, M in x, and 0,…, N in y, so as to line up each
letter in one sequence with either a letter, or a gap,
in the other sequence
Scoring Function
• Sequence edits:
  Mutation:   AGGCCTC → AGGACTC
  Insertion:  AGGCCTC → AGGGCCTC
  Deletion:   AGGCCTC → AGG.CTC
• Scoring Function:
  Match:    +m
  Mismatch: -s
  Gap:      -d

Score F = (# matches) × m - (# mismatches) × s - (# gaps) × d
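As an illustration of this scoring model, here is a small sketch (hypothetical helper name, not from the slides) that scores an already-gapped alignment with parameters m, s, d:

```python
def alignment_score(a, b, m=1, s=1, d=1):
    """Score two already-aligned, equal-length strings ('-' marks a gap).

    Score = (# matches)*m - (# mismatches)*s - (# gaps)*d
    """
    assert len(a) == len(b)
    score = 0
    for x, y in zip(a, b):
        if x == '-' or y == '-':
            score -= d            # gap
        elif x == y:
            score += m            # match
        else:
            score -= s            # mismatch
    return score

print(alignment_score("AGTA", "A-TA"))   # 3 matches, 1 gap -> 2
```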
How do we compute the best alignment?

AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA
AGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTC

Too many possible alignments: O(2^(M+N))
Dynamic Programming
• We will now describe a dynamic programming algorithm

Suppose we wish to align
  x1……xM
  y1……yN

Let F(i, j) = optimal score of aligning
  x1……xi
  y1……yj
Dynamic Programming (cont'd)
Notice three possible cases:

1. xi aligns to yj
   x1……xi-1 xi
   y1……yj-1 yj
   F(i, j) = F(i-1, j-1) + m, if xi = yj; F(i-1, j-1) - s, if not

2. xi aligns to a gap
   x1……xi-1 xi
   y1……yj   -
   F(i, j) = F(i-1, j) - d

3. yj aligns to a gap
   x1……xi   -
   y1……yj-1 yj
   F(i, j) = F(i, j-1) - d
Dynamic Programming (cont'd)
• How do we know which case is correct?

Inductive assumption:
F(i, j-1), F(i-1, j), F(i-1, j-1) are optimal

Then,
                F(i-1, j-1) + s(xi, yj)
F(i, j) = max   F(i-1, j) - d
                F(i, j-1) - d

where s(xi, yj) = m, if xi = yj; -s, if not
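The recurrence can be transcribed almost literally as a memoized recursion; the sketch below (illustrative function name and default parameters, not from the slides) computes just the optimal score, before the table-filling formulation that follows:

```python
from functools import lru_cache

def global_alignment_score(x, y, m=1, s=1, d=1):
    """Optimal global alignment score via the three-case recurrence."""

    @lru_cache(maxsize=None)
    def F(i, j):
        if i == 0:
            return -j * d                        # y1..yj aligned against gaps
        if j == 0:
            return -i * d                        # x1..xi aligned against gaps
        sij = m if x[i - 1] == y[j - 1] else -s  # s(xi, yj)
        return max(F(i - 1, j - 1) + sij,        # case 1: xi aligns to yj
                   F(i - 1, j) - d,              # case 2: xi aligns to a gap
                   F(i, j - 1) - d)              # case 3: yj aligns to a gap

    return F(len(x), len(y))

print(global_alignment_score("AGTA", "ATA"))    # 2, as in the example below
```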
Example
x = AGTA, y = ATA
m = 1, s = 1, d = 1   (match +1, mismatch -1, gap -1)

F(i, j)    i=0   1 (A)   2 (G)   3 (T)   4 (A)
j=0          0     -1      -2      -3      -4
1 (A)       -1      1       0      -1      -2
2 (T)       -2      0       0       1       0
3 (A)       -3     -1      -1       0       2

Optimal Alignment:
F(4, 3) = 2

AGTA
A-TA
The Needleman-Wunsch Matrix
x1 ………………………………… xM
y1 ……………………………… yN

Every nondecreasing path from (0,0) to (M, N) corresponds to an alignment of the two sequences

An optimal alignment is composed of optimal subalignments
The Needleman-Wunsch Algorithm

1. Initialization.
   a. F(0, 0) = 0
   b. F(0, j) = -j × d
   c. F(i, 0) = -i × d

2. Main Iteration. Filling in partial alignments
   a. For each i = 1……M
        For each j = 1……N
          F(i, j) = max of:
            F(i-1, j-1) + s(xi, yj)   [case 1]
            F(i-1, j) - d             [case 2]
            F(i, j-1) - d             [case 3]
          Ptr(i, j) = DIAG, if [case 1]
                      LEFT, if [case 2]
                      UP,   if [case 3]

3. Termination. F(M, N) is the optimal score, and from Ptr(M, N) we can trace back the optimal alignment.
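A compact sketch of the full algorithm, including the pointer matrix and the traceback of step 3 (function name and default parameters are illustrative, not from the slides):

```python
def needleman_wunsch(x, y, m=1, s=1, d=1):
    """Global alignment by Needleman-Wunsch with linear gap penalty d.

    Returns (optimal score, aligned x, aligned y).
    """
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]
    Ptr = [[None] * (N + 1) for _ in range(M + 1)]

    # 1. Initialization: first column/row correspond to leading gaps.
    for i in range(1, M + 1):
        F[i][0], Ptr[i][0] = -i * d, 'LEFT'   # xi aligns to a gap
    for j in range(1, N + 1):
        F[0][j], Ptr[0][j] = -j * d, 'UP'     # yj aligns to a gap

    # 2. Main iteration: fill in partial alignments.
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sij = m if x[i - 1] == y[j - 1] else -s
            cases = [(F[i - 1][j - 1] + sij, 'DIAG'),   # case 1
                     (F[i - 1][j] - d,       'LEFT'),   # case 2
                     (F[i][j - 1] - d,       'UP')]     # case 3
            F[i][j], Ptr[i][j] = max(cases)

    # 3. Termination: trace back from (M, N).
    ax, ay, i, j = [], [], M, N
    while i > 0 or j > 0:
        p = Ptr[i][j]
        if p == 'DIAG':
            ax.append(x[i - 1]); ay.append(y[j - 1]); i, j = i - 1, j - 1
        elif p == 'LEFT':
            ax.append(x[i - 1]); ay.append('-'); i -= 1
        else:  # 'UP'
            ax.append('-'); ay.append(y[j - 1]); j -= 1
    return F[M][N], ''.join(reversed(ax)), ''.join(reversed(ay))

print(needleman_wunsch("AGTA", "ATA"))   # (2, 'AGTA', 'A-TA'), matching the example
```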
Performance
• Time: O(NM)
• Space: O(NM)
Bounded Dynamic Programming

Assume we know that x and y are very similar.

Assumption: # gaps(x, y) < k(N)

Then, xi aligned to yj implies | i - j | < k(N)

We can align x and y more efficiently:

Time, Space: O(N × k(N)) << O(N²)
Bounded Dynamic Programming
y1 ………………………… yN
x1 ………………………… xM
[figure: only the band of width k(N) around the diagonal is filled]

Initialization:
F(i, 0), F(0, j) undefined for i, j > k

Iteration:
For i = 1…M
  For j = max(1, i - k)…min(N, i + k)
    F(i, j) = max of:
      F(i - 1, j - 1) + s(xi, yj)
      F(i - 1, j) - d, if j < i + k(N)
      F(i, j - 1) - d, if j > i - k(N)

Termination: same
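A score-only sketch of the banded fill under the assumption | i - j | < k (illustrative parameters as before; cells outside the band are left undefined):

```python
def banded_alignment_score(x, y, k, m=1, s=1, d=1):
    """Global alignment score restricted to the band |i - j| <= k.

    Only O((M+N)*k) cells are filled; cells outside the band stay None,
    matching the bounded-DP initialization above. Returns None if (M, N)
    itself lies outside the band.
    """
    M, N = len(x), len(y)
    F = [[None] * (N + 1) for _ in range(M + 1)]
    F[0][0] = 0
    for i in range(1, min(M, k) + 1):
        F[i][0] = -i * d
    for j in range(1, min(N, k) + 1):
        F[0][j] = -j * d

    for i in range(1, M + 1):
        for j in range(max(1, i - k), min(N, i + k) + 1):
            sij = m if x[i - 1] == y[j - 1] else -s
            best = F[i - 1][j - 1] + sij              # diagonal is always in the band
            if j < i + k:                             # gap move staying inside the band
                best = max(best, F[i - 1][j] - d)
            if j > i - k:                             # gap move staying inside the band
                best = max(best, F[i][j - 1] - d)
            F[i][j] = best
    return F[M][N]

print(banded_alignment_score("AGTA", "ATA", k=2))     # 2, same as the full table
```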
A variant of the basic algorithm:
• Maybe it is OK to have an unlimited # of gaps in
the beginning and end:
----------CTATCACCTGACCTCCAGGCCGATGCCCCTTCCGGC
GCGAGTTCATCTATCAC--GACCGC--GGTCG--------------
• Then, we don't want to penalize gaps at the ends
The local alignment problem

Given two strings
x = x1……xM,
y = y1……yN

Find substrings x', y' whose similarity (optimal global alignment value) is maximum

e.g. x = aaaacccccgggg
     y = cccgggaaccaacc
Why local alignment
• Genes are shuffled between genomes
• Portions of proteins (domains) are often conserved
The Smith-Waterman algorithm

Idea: Ignore badly aligning regions

Modifications to Needleman-Wunsch:

Initialization: F(0, j) = F(i, 0) = 0

Iteration:
F(i, j) = max of:
  0
  F(i - 1, j) - d
  F(i, j - 1) - d
  F(i - 1, j - 1) + s(xi, yj)
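A score-only sketch of the Smith-Waterman fill (default parameters are illustrative: match +2, mismatch -1, gap -2, as in the example that follows); the best local score is the maximum over all cells rather than F(M, N):

```python
def smith_waterman_score(x, y, m=2, s=1, d=2):
    """Best local alignment score between any substrings of x and y.

    Differences from Needleman-Wunsch: F(i,0) = F(0,j) = 0, every cell is
    clamped at 0, and the answer is the maximum entry of the whole matrix.
    """
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]
    best, best_cell = 0, (0, 0)
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sij = m if x[i - 1] == y[j - 1] else -s
            F[i][j] = max(0,
                          F[i - 1][j - 1] + sij,
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)
            if F[i][j] > best:
                best, best_cell = F[i][j], (i, j)
    # A traceback from best_cell until a 0 is reached would recover x', y'.
    return best, best_cell

print(smith_waterman_score("aaaacccccgggg", "cccgggaaccaacc"))
```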
Local Alignment example

Scoring matrix s (match +2, mismatch -1, gap -2):

s     -    A    G    C    T
-         -2   -2   -2   -2
A    -2    2   -1   -1   -1
G    -2   -1    2   -1   -1
C    -2   -1   -1    2   -1
T    -2   -1   -1   -1    2

[Figure: Smith-Waterman dynamic programming matrix for the example sequences]
Scoring the gaps more accurately: γ(n)

Current model: a gap of length n incurs penalty n×d

However, gaps usually occur in bunches.

Convex gap penalty function γ(n):
for all n, γ(n + 1) - γ(n) ≤ γ(n) - γ(n - 1)
Scoring schemes
• Commonly used scoring schemes
  BLAST – match +5; mismatch -4
• Handling gaps
  Affine gap scores
  Cost-free end gaps
    • Used in shotgun sequence assembly
• Scoring matrices for protein alignment
  PAM; BLOSUM
Compromise: affine gaps

γ(n) = d + (n - 1)×e, where d = gap open penalty, e = gap extend penalty

To compute the optimal alignment, at position i, j we need to "remember" the best score if a gap is open and the best score if a gap is not open:

F(i, j): score of alignment x1…xi to y1…yj, if xi aligns to yj
G(i, j): score if xi aligns to a gap after yj
H(i, j): score if yj aligns to a gap after xi
V(i, j): best score of alignment x1…xi to y1…yj
Needleman-Wunsch with affine gaps

Why do we need matrices F, G, H?

• xi aligns to yj
  x1……xi-1 xi xi+1
  y1……yj-1 yj  -
  Add -d:  G(i+1, j) = F(i, j) - d

• xi aligns to a gap after yj
  x1……xi-1 xi xi+1
  y1……yj …-   -
  Add -e:  G(i+1, j) = G(i, j) - e

Because, perhaps G(i, j) < V(i, j)
(it is best to align xi to yj if we were aligning only x1…xi to y1…yj and not the rest of x, y),
but on the contrary
G(i, j) - e > V(i, j) - d
(i.e., had we "fixed" our decision that xi aligns to yj, we could regret it at the next step when aligning x1…xi+1 to y1…yj)
Needleman-Wunsch with affine gaps

Initialization:
V(i, 0) = -d - (i - 1)×e
V(0, j) = -d - (j - 1)×e

Iteration:
V(i, j) = max{ F(i, j), G(i, j), H(i, j) }

F(i, j) = V(i - 1, j - 1) + s(xi, yj)

G(i, j) = max of:
  V(i - 1, j) - d
  G(i - 1, j) - e

H(i, j) = max of:
  V(i, j - 1) - d
  H(i, j - 1) - e

Termination:
V(M, N) has the best alignment

Time?
Space?
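A score-only sketch of the three-matrix recursion above (matrix names V, F, G, H as in the slides; the gap-open and gap-extend values are illustrative):

```python
NEG_INF = float('-inf')

def affine_gap_score(x, y, m=1, s=1, d=3, e=1):
    """Global alignment score with affine gap penalty gamma(n) = d + (n-1)*e.

    V(i,j) = best score; F = xi aligned to yj; G = xi aligned to a gap
    after yj; H = yj aligned to a gap after xi.
    """
    M, N = len(x), len(y)
    V = [[NEG_INF] * (N + 1) for _ in range(M + 1)]
    G = [[NEG_INF] * (N + 1) for _ in range(M + 1)]
    H = [[NEG_INF] * (N + 1) for _ in range(M + 1)]
    V[0][0] = 0
    for i in range(1, M + 1):
        G[i][0] = V[i][0] = -d - (i - 1) * e      # leading gap in y
    for j in range(1, N + 1):
        H[0][j] = V[0][j] = -d - (j - 1) * e      # leading gap in x
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sij = m if x[i - 1] == y[j - 1] else -s
            F_ij = V[i - 1][j - 1] + sij                        # xi aligns to yj
            G[i][j] = max(V[i - 1][j] - d, G[i - 1][j] - e)     # open vs. extend
            H[i][j] = max(V[i][j - 1] - d, H[i][j - 1] - e)     # open vs. extend
            V[i][j] = max(F_ij, G[i][j], H[i][j])
    return V[M][N]

print(affine_gap_score("AGTA", "ATA"))   # 3 matches - gap open d = 3 - 3 = 0
```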
Comments on Optimal Alignment Algorithms
• Runs in quadratic time
• Using Hirschberg's divide-and-conquer algorithm, optimal alignment can be performed in linear space
• Using Gotoh's (1982) modification, the S-W and N-W algorithms can handle affine gap scores
• Sensitive to the choice of scoring matrix