
Algorithms
Fundamental Techniques
Dynamic Programming
A Gentle Introduction to Dynamic
Programming – An Interesting History
Invented by the mathematician Richard
Bellman in the 1950s as a method for optimizing
multistage decision processes.
So the word "programming" refers to planning
(as in drawing up a program or schedule), not
computer programming.
Later computer scientists realized it was a
technique that could be applied to problems
that were not special types of optimization
problems.
The Basic Idea
The technique solves problems with
overlapping subproblems.
Typically the subproblems arise through a
recursive solution to a problem.
Rather than solve the subproblems
repeatedly, we solve each smaller subproblem
once and save the result in a table, from which
we can form a solution to the original problem.
Although this suggests a space-time tradeoff,
in reality when you see a dynamic
programming solution you often find you do
not need much extra space if you are careful.
Dynamic Programming
An algorithm design technique (like divide and
conquer)
Divide and conquer:
 Partition the problem into independent subproblems
 Solve the subproblems recursively
 Combine the solutions to solve the original problem
A Simple Example
Consider the calculation of the Fibonacci
numbers using the simple recurrence
F(n) = F(n-1) + F(n-2) for n ≥ 2 and the two
initial conditions
F(0) = 0 and
F(1) = 1.
If we blindly use recursion to solve this, we
will be recomputing the same values many
times.
In fact, the recursion tree suggests a simpler
solution:
[Recursion tree for F(5): F(4) and F(3) branch into F(3), F(2), F(2), F(1), and so on, down to F(1) and F(0). Subproblems such as F(3), F(2), F(1), and F(0) are computed multiple times.]
So, one solution, a dynamic programming one, would be to keep
an array and record each F(k) as it is computed.
But we notice we don't even need to maintain all of the entries,
only the last two. So, in truth, looking at the solution this other
way provides us with a very efficient solution that uses only two
variables for storage.
Not all problems that yield to dynamic programming are this
simple, but this is a good one to remember for how the technique
works.
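The two-variable version can be sketched in Python (an illustration, not part of the original slides):

```python
def fib(n):
    """Bottom-up Fibonacci using only the last two values."""
    if n < 2:
        return n
    prev, curr = 0, 1                   # F(0), F(1)
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr  # slide the two-value window forward
    return curr

print(fib(10))  # 55
```

Compared with the naive recursion, this runs in O(n) time and O(1) extra space.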
Dynamic Programming
Used for optimization problems:
 A set of choices must be made to get an optimal solution
 Find a solution with the optimal value (minimum or maximum)
 There may be many solutions that lead to an optimal value
 Our goal: find an optimal solution
Dynamic Programming Algorithm
1. Characterize the structure of an optimal
solution
2. Recursively define the value of an optimal
solution
3. Compute the value of an optimal solution
in a bottom-up fashion
4. Construct an optimal solution from
computed information (not always
necessary)
Longest Common Subsequence (LCS)
Application: comparison of two DNA strings
Ex: X= {A B C B D A B }, Y= {B D C A B A}
One Longest Common Subsequence: B C A B (length 4)
A brute-force algorithm would compare each
subsequence of X with the symbols in Y; since X
has 2^m subsequences, this takes exponential time.
LCS Algorithm
First we’ll find the length of LCS. Later we’ll modify
the algorithm to find LCS itself.
Define Xi, Yj to be the prefixes of X and Y of length i
and j respectively
Define c[i,j] to be the length of LCS of Xi and Yj
Then the length of LCS of X and Y will be c[m,n]
c[i, j] = c[i-1, j-1] + 1              if x[i] = y[j]
c[i, j] = max(c[i, j-1], c[i-1, j])    otherwise
LCS recursive solution
c[i, j] = c[i-1, j-1] + 1              if x[i] = y[j]
c[i, j] = max(c[i, j-1], c[i-1, j])    otherwise
When we calculate c[i,j], we consider two cases:
First case: x[i] = y[j]: one more symbol in strings X
and Y matches, so the length of the LCS of Xi and Yj
equals the length of the LCS of the smaller strings
Xi-1 and Yj-1, plus 1
LCS recursive solution
c[i, j] = c[i-1, j-1] + 1              if x[i] = y[j]
c[i, j] = max(c[i, j-1], c[i-1, j])    otherwise
Second case: x[i] != y[j]
As symbols don’t match, our solution is not improved,
and the length of LCS(Xi , Yj) is the same as before (i.e.
maximum of LCS(Xi, Yj-1) and LCS(Xi-1,Yj)
Why not just take the length of LCS(Xi-1, Yj-1) ?
LCS Length Algorithm
LCS-Length(X, Y)
1. m = length(X)               // get the # of symbols in X
2. n = length(Y)               // get the # of symbols in Y
3. for i = 1 to m: c[i,0] = 0  // special case: Y0 (empty prefix of Y)
4. for j = 0 to n: c[0,j] = 0  // special case: X0 (also sets c[0,0])
5. for i = 1 to m              // for all Xi
6.   for j = 1 to n            // for all Yj
7.     if x[i] == y[j]
8.       c[i,j] = c[i-1,j-1] + 1
9.     else c[i,j] = max(c[i-1,j], c[i,j-1])
10. return c
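As a sketch, LCS-Length translates directly into Python (illustrative code, not from the slides):

```python
def lcs_length(X, Y):
    """Fill the DP table c where c[i][j] = length of LCS of X[:i] and Y[:j]."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:           # symbols match
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c

print(lcs_length("ABCB", "BDCAB")[4][5])  # 3
```

The running time and space are both O(mn).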
LCS Example
We'll see how the LCS algorithm works on the following
example:
X = ABCB
Y = BDCAB
What is the Longest Common Subsequence
of X and Y?
LCS(X, Y) = BCB
LCS Example: filling the table
X = ABCB; m = |X| = 4
Y = BDCAB; n = |Y| = 5
Allocate array c[0..4, 0..5].
First set row 0 and column 0 to 0 (the LCS of an empty prefix with anything is empty), then fill the table row by row, left to right, using:
  if x[i] == y[j]: c[i,j] = c[i-1,j-1] + 1
  else: c[i,j] = max(c[i-1,j], c[i,j-1])

The completed table (built up one cell at a time in the original slides):

  j        0   1   2   3   4   5
  i   Yj       B   D   C   A   B
  0   Xi   0   0   0   0   0   0
  1   A    0   0   0   0   1   1
  2   B    0   1   1   1   1   2
  3   C    0   1   1   2   2   2
  4   B    0   1   1   2   2   3

For example, c[1,4] = c[0,3] + 1 = 1 because x[1] = y[4] = A, while c[2,2] = max(c[1,2], c[2,1]) = 1 because x[2] = B ≠ D = y[2].
The length of the LCS of X and Y is c[4,5] = 3.
How to find the actual LCS
So far, we have just found the length of the LCS, but
not the LCS itself.
We want to modify this algorithm to make it output the
Longest Common Subsequence of X and Y.
Each c[i,j] depends on c[i-1,j], c[i,j-1],
or c[i-1,j-1], so for each c[i,j] we can record which of
the three it was acquired from.
For example, the final entry c[4,5] = c[3,4] + 1 = 2 + 1 = 3,
because x[4] = y[5] = B.
Finding LCS
To recover the LCS itself, trace back from c[4,5]. At each cell, determine which case produced it: a diagonal step (c[i,j] = c[i-1,j-1] + 1, possible only when x[i] = y[j]) means the matching character belongs to the LCS; otherwise move to whichever of c[i-1,j] and c[i,j-1] supplied the maximum.

  j        0   1   2   3   4   5
  i   Yj       B   D   C   A   B
  0   Xi   0   0   0   0   0   0
  1   A    0   0   0   0   1   1
  2   B    0   1   1   1   1   2
  3   C    0   1   1   2   2   2
  4   B    0   1   1   2   2   3

Tracing back from c[4,5] = 3, the diagonal steps occur at the matches B, C, B.
LCS (reversed order): B C B
LCS (straight order):  B C B
(this string turned out to be a palindrome)
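The traceback can be sketched in Python on top of the filled table (illustrative code, not from the slides):

```python
def lcs(X, Y):
    """Recover an LCS by tracing the filled table back from c[m][n]."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:          # diagonal step: part of the LCS
            out.append(X[i - 1])
            i, j = i - 1, j - 1
        elif c[i - 1][j] >= c[i][j - 1]:  # value came from above
            i -= 1
        else:                             # value came from the left
            j -= 1
    return "".join(reversed(out))         # characters were collected in reverse

print(lcs("ABCB", "BDCAB"))  # BCB
```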
Other examples for using dynamic
programming: Edit Distance
When a spell checker encounters a possible
misspelling, or Google is given words it doesn't
recognize, they look in their dictionaries for other
words that are close by.
What is an appropriate notion of closeness in this
case?
The edit distance is the minimum number of
edits (insertions, deletions, and substitutions) of
characters needed to transform one string into a
second one.
Edit Distance
Define the cost of an alignment to be the number of
columns where the strings differ. We can place a gap, _,
in either string; a gap stands for a character inserted in
the other string (or deleted from this one).
Example 1: Cost is 3 (insert U, substitute O with N,
delete W):
 S _ N O W Y
 S U N N _ Y
Example 2: Cost is 5:
 _ S N O W _ Y
 S U N _ _ N Y
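The edit distance has the same table-filling structure as LCS; a sketch of the standard DP solution (illustrative, not from the slides):

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to transform string a into string b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                     # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j                     # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = a[i - 1] == b[j - 1]
            d[i][j] = min(d[i - 1][j] + 1,                      # delete a[i-1]
                          d[i][j - 1] + 1,                      # insert b[j-1]
                          d[i - 1][j - 1] + (0 if same else 1)) # substitute or match
    return d[m][n]

print(edit_distance("SNOWY", "SUNNY"))  # 3
```

This matches Example 1 above: SNOWY can be turned into SUNNY with 3 edits.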
The General Dynamic Programming
Technique
Applies to a problem that at first seems to
require a lot of time (possibly exponential),
provided we have:
 Simple subproblems: the subproblems can
be defined in terms of a few variables, such as
j, k, l, m, and so on.
 Subproblem optimality: the global optimum
value can be defined in terms of optimal
subproblems
 Subproblem overlap: the subproblems are
not independent, but instead they overlap
(hence, should be constructed bottom-up).
The Knapsack Problem
The 0-1 knapsack problem
 A thief robbing a store finds n items: the i-th item is worth vi dollars and weighs wi pounds (vi, wi integers)
 The thief can only carry W pounds in his knapsack
 Items must be taken entirely or left behind
 Which items should the thief take to maximize the value of his load?
The fractional knapsack problem
 Similar to above
 The thief can take fractions of items
The 0/1 Knapsack Problem
Given: A set S of n items, with each item i having
 bi - a positive benefit
 wi - a positive weight
Goal: Choose items with maximum total benefit but with
weight at most W.
If we are not allowed to take fractional amounts, then
this is the 0/1 knapsack problem.
 In this case, we let T denote the set of items we take
 Objective: maximize Σ_{i∈T} bi
 Constraint: Σ_{i∈T} wi ≤ W
Fractional Knapsack Problem
Knapsack capacity: W
There are n items: the i-th item has value vi and
weight wi
Goal:
 find xi such that 0 ≤ xi ≤ 1 for all i = 1, 2, ..., n,
 Σ wi·xi ≤ W, and
 Σ xi·vi is maximum
Fractional Knapsack Problem
Greedy strategy:
 Pick the item with the maximum value per pound vi/wi
 If the supply of that element is exhausted and the thief
can carry more: take as much as possible from the item
with the next greatest value per pound
 It is good to order items based on their value per pound:
  v1/w1 ≥ v2/w2 ≥ ... ≥ vn/wn
Fractional Knapsack - Example
E.g.: knapsack capacity 50 pounds.
 Item 1: 10 pounds, $60 ($6/pound)
 Item 2: 20 pounds, $100 ($5/pound)
 Item 3: 30 pounds, $120 ($4/pound)
Greedy solution: take all of item 1 ($60) and all of item 2 ($100),
then fill the remaining 20 pounds with 20/30 of item 3 ($80).
Total: 50 pounds, $60 + $100 + $80 = $240
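The greedy strategy can be sketched as (illustrative code, not from the slides):

```python
def fractional_knapsack(items, W):
    """Greedy fractional knapsack: items is a list of (value, weight) pairs.
    Returns the maximum total value carried in a knapsack of capacity W."""
    # Sort by value per pound, best first.
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    total = 0.0
    for value, weight in items:
        if W <= 0:
            break
        take = min(weight, W)            # take as much of this item as fits
        total += value * (take / weight)
        W -= take
    return total

# Items from the example: ($60, 10 lb), ($100, 20 lb), ($120, 30 lb); W = 50
print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))  # 240.0
```

This greedy approach is optimal for the fractional problem but, as the next slides show, not for the 0/1 problem, which needs dynamic programming.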
Example
Given: A set S of n items, with each item i having
 bi - a positive benefit
 wi - a positive weight
Goal: Choose items with maximum total benefit but with
weight at most W.
"Knapsack" capacity: W = 9 in

Items:    1      2      3      4      5
Weight:   4 in   2 in   2 in   6 in   2 in
Benefit:  $20    $3     $6     $25    $80

Solution:
• 5 ($80, 2 in)
• 3 ($6, 2 in)
• 1 ($20, 4 in)
Total: 8 in used, benefit $106
0-1 Knapsack - Dynamic
Programming
P(i, w) - the maximum profit that can be
obtained from items 1 to i, if the
knapsack has size w
Case 1: thief takes item i
 P(i, w) = vi + P(i - 1, w - wi)
Case 2: thief does not take item i
 P(i, w) = P(i - 1, w)
0-1 Knapsack - Dynamic
Programming
P(i, w) = max {vi + P(i - 1, w - wi), P(i - 1, w)}
(first term: item i was taken; second term: item i was not taken)
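The recurrence translates into a bottom-up table fill; a Python sketch (illustrative, not from the slides), using the four-item example from these slides (values 12, 10, 20, 15; weights 2, 1, 3, 2; W = 5):

```python
def knapsack(values, weights, W):
    """0-1 knapsack: P[i][w] = max profit using items 1..i with capacity w."""
    n = len(values)
    P = [[0] * (W + 1) for _ in range(n + 1)]   # row 0 and column 0 are 0
    for i in range(1, n + 1):
        for w in range(W + 1):
            P[i][w] = P[i - 1][w]               # case 2: do not take item i
            if weights[i - 1] <= w:             # case 1: take item i, if it fits
                P[i][w] = max(P[i][w],
                              values[i - 1] + P[i - 1][w - weights[i - 1]])
    return P[n][W]

print(knapsack([12, 10, 20, 15], [2, 1, 3, 2], 5))  # 37
```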
[Table sketch: rows i = 0, 1, ..., n and columns w = 0, 1, ..., W; row 0 and column 0 are initialized to 0. To compute P(i, w) we look in the previous row, first at P(i-1, w - wi) (the "item taken" term) and second at P(i-1, w) (the "item not taken" term).]
0-1 Knapsack problem:
a picture
This is a knapsack with max weight W = 20.

Item i:      1   2   3   4   5
Weight wi:   2   3   4   5   9
Benefit bi:  3   4   5   8   10
Example:
P(i, w) = max {vi + P(i - 1, w - wi), P(i - 1, w)}
W = 5

Item:    1    2    3    4
Weight:  2    1    3    2
Value:   12   10   20   15

Filling the table row by row:

  i\w   0   1   2   3   4   5
  0     0   0   0   0   0   0
  1     0   0   12  12  12  12
  2     0   10  12  22  22  22
  3     0   10  12  22  30  32
  4     0   10  15  25  30  37

Sample computations:
P(1, 1) = P(0, 1) = 0
P(1, 2) = max {12 + 0, 0} = 12
P(2, 1) = max {10 + 0, 0} = 10
P(2, 3) = max {10 + 12, 12} = 22
P(3, 3) = max {20 + 0, 22} = 22
P(3, 4) = max {20 + 10, 22} = 30
P(3, 5) = max {20 + 12, 22} = 32
P(4, 4) = max {15 + 12, 30} = 30
P(4, 5) = max {15 + 22, 32} = 37
Reconstructing the Optimal Solution

  i\w   0   1   2   3   4   5
  0     0   0   0   0   0   0
  1     0   0   12  12  12  12
  2     0   10  12  22  22  22
  3     0   10  12  22  30  32
  4     0   10  15  25  30  37

• Start at P(n, W)
• When you go left-up (diagonally)  item i has been taken
• When you go straight up  item i has not been taken
Items taken: Item 4, Item 2, Item 1
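The left-up / straight-up rule amounts to checking whether P(i, w) differs from P(i-1, w); a sketch (illustrative code, not from the slides):

```python
def knapsack_items(values, weights, W):
    """Return (best value, list of 1-based items taken), reconstructed
    by walking back from P(n, W)."""
    n = len(values)
    P = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            P[i][w] = P[i - 1][w]
            if weights[i - 1] <= w:
                P[i][w] = max(P[i][w],
                              values[i - 1] + P[i - 1][w - weights[i - 1]])
    taken, w = [], W
    for i in range(n, 0, -1):
        if P[i][w] != P[i - 1][w]:   # value changed: item i was taken
            taken.append(i)
            w -= weights[i - 1]      # move "left-up" in the table
        # else: move straight up, item i was not taken
    return P[n][W], taken

print(knapsack_items([12, 10, 20, 15], [2, 1, 3, 2], 5))  # (37, [4, 2, 1])
```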
Huffman Codes
Widely used technique for data compression
Assume the data is a sequence of
characters.
We are looking for an effective way of storing
the data.
Huffman Codes
Idea:
 Use the frequencies of occurrence of characters to
build an optimal way of representing each character

                         a    b    c    d    e    f
Frequency (thousands)    45   13   12   16   9    5

Binary character code
 Uniquely represents a character by a binary string
Fixed-Length Codes
E.g.: Data file containing 100,000 characters

                         a    b    c    d    e    f
Frequency (thousands)    45   13   12   16   9    5

3 bits needed:
a = 000, b = 001, c = 010, d = 011, e = 100, f = 101
Requires: 100,000 × 3 = 300,000 bits
Variable-Length Codes
E.g.: Data file containing 100,000 characters

                         a    b    c    d    e    f
Frequency (thousands)    45   13   12   16   9    5

Assign short codewords to frequent characters and
long codewords to infrequent characters:
a = 0, b = 101, c = 100, d = 111, e = 1101, f = 1100
Requires: (45×1 + 13×3 + 12×3 + 16×3 + 9×4 + 5×4) × 1,000
= 224,000 bits
Encoding with Binary Character
Codes
Encoding
 Concatenate the codewords representing
each character in the file
E.g.:
 a = 0, b = 101, c = 100, d = 111, e = 1101, f = 1100
 abc = 0 · 101 · 100 = 0101100
Decoding with Binary Character
Codes
Prefix codes simplify decoding
 No codeword is a prefix of another  the codeword
that begins an encoded file is unambiguous
Approach
 Identify the initial codeword
 Translate it back to the original character
 Repeat the process on the remainder of the file
E.g.:
 a = 0, b = 101, c = 100, d = 111, e = 1101, f = 1100
 001011101 = 0 · 0 · 101 · 1101 = aabe
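A sketch of this peel-off-one-codeword-at-a-time decoding in Python (illustrative, not from the slides):

```python
def decode(bits, code):
    """Decode a bit string using a prefix code: because no codeword is a
    prefix of another, each complete codeword we read is unambiguous."""
    inverse = {v: k for k, v in code.items()}  # codeword -> character
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:        # a complete codeword has been read
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

code = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}
print(decode("001011101", code))  # aabe
```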
Optimal Codes
An optimal code is always represented by a full
binary tree
 Every non-leaf has two children
 Fixed-length code is not optimal; variable-length is
How many bits are required to encode a file?
 Let C be the alphabet of characters
 Let f(c) be the frequency of character c
 Let dT(c) be the depth of c's leaf in the tree T
corresponding to a prefix code
The cost of tree T:
 B(T) = Σ_{c∈C} f(c) · dT(c)
Prefix Code Representation
Binary tree whose leaves are the given characters
Binary codeword

the path from the root to the character, where 0 means “go to
the left child” and 1 means “go to the right child”
Length of the codeword

Length of the path from root to the character leaf (depth of
node)
[Two prefix-code trees for the example alphabet; edge label 0 = left child, 1 = right child.
Fixed-length code: root 100 with internal nodes 86 and 14; all leaves a:45, b:13, c:12, d:16, e:9, f:5 at depth 3.
Variable-length (Huffman) code: root 100 with children a:45 and 55; 55 splits into 25 (c:12, b:13) and 30; 30 splits into 14 (f:5, e:9) and d:16.]
Constructing a Huffman Code
A greedy algorithm that constructs an optimal prefix code
called a Huffman code
Assume that:
 C is a set of n characters
 Each character has a frequency f(c)
 The tree T is built in a bottom-up manner
Idea:
 Start with a set of |C| leaves: f: 5, e: 9, c: 12, b: 13, d: 16, a: 45
 At each step, merge the two least frequent objects; the frequency
of the new node = sum of the two frequencies
 Use a min-priority queue Q, keyed on f, to identify the two least
frequent objects
Example
Merging the two least frequent nodes at each step:
1. Start: f: 5, e: 9, c: 12, b: 13, d: 16, a: 45
2. Merge f: 5 and e: 9 into a node 14. Now: c: 12, b: 13, 14, d: 16, a: 45
3. Merge c: 12 and b: 13 into 25. Now: 14, d: 16, 25, a: 45
4. Merge 14 and d: 16 into 30. Now: 25, 30, a: 45
5. Merge 25 and 30 into 55. Now: a: 45, 55
6. Merge a: 45 and 55 into the root, 100.
In each merger the first node extracted becomes the left (0) child and the second the right (1) child.
Building a Huffman Code
Running time: O(n lg n)
Alg.: HUFFMAN(C)
1. n ← |C|
2. Q ← C                               // build the min-heap: O(n)
3. for i ← 1 to n - 1                  // n - 1 mergers, each O(lg n)
4.   do allocate a new node z
5.      left[z] ← x ← EXTRACT-MIN(Q)
6.      right[z] ← y ← EXTRACT-MIN(Q)
7.      f[z] ← f[x] + f[y]
8.      INSERT(Q, z)
9. return EXTRACT-MIN(Q)               // the root of the tree
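A sketch of HUFFMAN in Python, using heapq as the min-priority queue (illustrative, not from the slides; tie-breaking may yield different but equally optimal codewords):

```python
import heapq

def huffman(freqs):
    """Build a Huffman code. freqs maps character -> frequency;
    returns a dict mapping character -> codeword."""
    # Heap entries are (frequency, tie-breaker, tree); a leaf is a character,
    # an internal node is a (left, right) pair.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        fx, _, x = heapq.heappop(heap)   # two least frequent trees
        fy, _, y = heapq.heappop(heap)
        heapq.heappush(heap, (fx + fy, count, (x, y)))
        count += 1
    code = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node
            walk(node[0], prefix + "0")  # 0 = left child
            walk(node[1], prefix + "1")  # 1 = right child
        else:
            code[node] = prefix
    walk(heap[0][2], "")
    return code

freqs = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code = huffman(freqs)
print(sum(len(code[ch]) * f for ch, f in freqs.items()))  # 224
```

The printed cost of 224 (thousand bits) matches the 224,000-bit total computed earlier for the variable-length code.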
Greedy Choice Property
Lemma: Let C be an alphabet in which each
character c ∈ C has frequency f[c]. Let x and
y be two characters in C having the lowest
frequencies.
Then, there exists an optimal prefix code for
C in which the codewords for x and y have
the same length and differ only in the last bit.
Discussion
Greedy choice property:
 Building an optimal tree by mergers can begin
with the greedy choice: merging the two
characters with the lowest frequencies
 The cost of each merger is the sum of frequencies
of the two items being merged
 Of all possible mergers, HUFFMAN chooses the
one that incurs the least cost