Data Structures and Algorithm Analysis
Algorithm Design Techniques
Lecturer: Ligang Dong
Email: [email protected]
Tel: 28877721,13306517055
Office: SIEE Building 305
The Greedy Approach

Greedy algorithms are usually designed to solve
optimization problems in which a quantity is to be
minimized or maximized.


Greedy algorithms typically consist of an iterative
procedure that tries to find a locally optimal solution.
In some instances, these locally optimal solutions
translate to globally optimal solutions; in others, they
fail to give optimal solutions.
The Greedy Approach



A greedy algorithm makes a correct guess on the
basis of little calculation, without worrying about the
future. Thus, it builds a solution step by step. Each
step increases the size of the partial solution and is
based on local optimization.
The choice made is the one that produces the largest
immediate gain while maintaining feasibility.
Since each step consists of little work based on a
small amount of information, the resulting algorithms
are typically efficient.
The Fractional Knapsack Problem

Given n items of sizes s1, s2, …, sn, and values v1,
v2, …, vn, and size C, the knapsack capacity, the
objective is to find nonnegative real numbers x1,
x2, …, xn that maximize the sum

    $\sum_{i=1}^{n} x_i v_i$

subject to the constraint

    $\sum_{i=1}^{n} x_i s_i \le C$
The Fractional Knapsack Problem

This problem can easily be solved using the
following greedy strategy:
• For each item, compute yi = vi/si, the ratio of its
value to its size.
• Sort the items by decreasing ratio, and fill the
knapsack with as much as possible from the first
item, then the second, and so forth.


This problem reveals many of the characteristics
of a greedy algorithm discussed above: the
algorithm consists of a simple iterative procedure
that selects the item that produces the largest
immediate gain while maintaining feasibility.
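As a concrete sketch of this strategy, here is a minimal Python implementation (the function name fractional_knapsack and the list-based item representation are assumptions for illustration, not from the slides):

def fractional_knapsack(sizes, values, capacity):
    # Compute y_i = v_i / s_i for each item and consider items by decreasing ratio.
    items = sorted(zip(sizes, values), key=lambda sv: sv[1] / sv[0], reverse=True)
    total_value = 0.0
    for size, value in items:
        if capacity == 0:
            break
        take = min(size, capacity)            # as much of this item as still fits
        total_value += value * (take / size)  # fraction x_i = take / size
        capacity -= take
    return total_value

# Example: fractional_knapsack([10, 30, 20], [50, 120, 60], 50) returns 200.0
# (all of the first two items by ratio, plus half of the third).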
File Compression


Suppose we are given a file, which is a string of
characters. We wish to compress the file as much as
possible in such a way that the original file can easily
be reconstructed.
Let the set of characters in the file be C={c1, c2, …,
cn}. Let also f(ci), 1 ≤ i ≤ n, be the frequency of
character ci in the file, i.e., the number of times ci
appears in the file.
File Compression


If a fixed number of bits is used to represent each
character (this representation is called the encoding
of the character), the size of the file depends only on
the number of characters in the file.
Since the frequency of some characters may be
much larger than that of others, it is reasonable to
use variable-length encodings.
File Compression



Intuitively, those characters with large frequencies
should be assigned short encodings, whereas long
encodings may be assigned to those characters with
small frequencies.
When the encodings vary in length, we stipulate that
the encoding of one character must not be the prefix
of the encoding of another character; such codes
are called prefix codes.
For instance, if we assign the encodings 10 and 101
to the letters “a” and “b”, there will be an ambiguity
as to whether 10 is the encoding of “a” or is the
prefix of the encoding of the letter “b”.
File Compression


Once the prefix constraint is satisfied, the decoding
becomes unambiguous; the sequence of bits is
scanned until an encoding of some character is
found.
One way to “parse” a given sequence of bits is to
use a full binary tree, in which each internal node
has exactly two branches, labeled 0 and 1. The
leaves in this tree correspond to the characters.
Each sequence of 0’s and 1’s on a path from the
root to a leaf corresponds to a character encoding.
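A minimal Python sketch of this decoding procedure, assuming the tree is represented with nested tuples (internal nodes are (zero-branch, one-branch) pairs and leaves are characters; this representation is our own choice):

def decode(bits, root):
    # Scan the bit string, following the branch labeled by each bit;
    # whenever a leaf is reached, one character has been decoded.
    out = []
    node = root
    for b in bits:
        node = node[0] if b == '0' else node[1]
        if not isinstance(node, tuple):  # leaf: emit the character
            out.append(node)
            node = root                  # restart from the root
    return ''.join(out)

# Example: with tree = ('a', ('b', 'c')) (codes a=0, b=10, c=11),
# decode('01011', tree) returns 'abc'.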
File Compression


The algorithm presented is due to Huffman.
The algorithm consists of repeating the following
procedure until C consists of only one character:
• Let ci and cj be two characters with minimum
frequencies.
• Create a new node c whose frequency is the sum
of the frequencies of ci and cj, and make ci and cj
the children of c.
• Let C = (C − {ci, cj}) ∪ {c}.
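A minimal Python sketch of Huffman's procedure, using the standard heapq module to repeatedly extract the two minimum-frequency nodes (the entry layout, with a counter used only to break frequency ties, is an assumption for illustration):

import heapq
import itertools

def huffman_codes(freq):
    # freq: dict mapping character -> frequency; returns dict character -> bit string.
    counter = itertools.count()  # tie-breaker so heapq never compares tree nodes
    heap = [(f, next(counter), c) for c, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two nodes with minimum frequencies
        f2, _, right = heapq.heappop(heap)
        # New node whose frequency is the sum, with the two nodes as children.
        heapq.heappush(heap, (f1 + f2, next(counter), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: branches 0 and 1
            walk(node[0], prefix + '0')
            walk(node[1], prefix + '1')
        else:                                # leaf: a character
            codes[node] = prefix or '0'      # single-character edge case
    walk(heap[0][2], '')
    return codes

For the frequencies in the example that follows, huffman_codes({'I1': 0.4, 'I2': 0.3, 'I3': 0.15, 'I4': 0.05, 'I5': 0.04, 'I6': 0.03, 'I7': 0.03}) produces codes of lengths 1, 2, 3, 5, 5, 5, 5, matching the encoding table below (the exact bit patterns depend on how ties are broken).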
File Compression

Example: frequencies = {0.4, 0.3, 0.15, 0.05, 0.04, 0.03, 0.03}
[Figure: the first merging step; the two minimum frequencies 0.03 and 0.03 are combined into a node of frequency 0.06.]
File Compression

Example (continued):
[Figure: the next merging steps; 0.04 and 0.05 are combined into 0.09, then 0.06 and 0.09 into 0.15.]
File Compression

Example (continued):
[Figure: the next merging steps; 0.15 and 0.15 are combined into 0.3, then 0.3 and 0.3 into 0.6.]
File Compression

Example (continued):
[Figure: the final merging step; 0.4 and 0.6 are combined into the root, of frequency 1.]
The resulting tree stored as an array of 13 nodes (nodes 1-7 are the leaves in decreasing order of frequency; nodes 8-13 are the internal nodes in order of creation; parent, LChild, and RChild are node indices, with 0 meaning none):

node   frequency   parent   LChild   RChild
 1       0.40        13        0        0
 2       0.30        12        0        0
 3       0.15        11        0        0
 4       0.05         9        0        0
 5       0.04         9        0        0
 6       0.03         8        0        0
 7       0.03         8        0        0
 8       0.06        10        6        7
 9       0.09        10        4        5
10       0.15        11        8        9
11       0.30        12        3       10
12       0.60        13        2       11
13       1.00         0        1       12
File Compression
Example (continued):
[Figure: the final Huffman tree with each branch labeled 0 or 1.]
The resulting encodings:

instruction   frequency   encoding
I1              0.40      0
I2              0.30      10
I3              0.15      110
I4              0.05      11100
I5              0.04      11101
I6              0.03      11110
I7              0.03      11111
The Greedy Approach

Other examples:
• Dijkstra’s algorithm for the shortest path problem
• Kruskal’s algorithm for the minimum-cost spanning tree problem
Divide and Conquer

A divide-and-conquer algorithm divides the
problem instance into a number of subinstances
(in most cases two), recursively solves each
subinstance separately, and then combines the
solutions to the subinstances to obtain the
solution to the original problem instance.
Divide and Conquer


Consider the problem of finding both the
minimum and maximum in an array of integers
A[1…n] and assume for simplicity that n is a
power of 2.
Divide the input array into two halves A[1…n/2]
and A[(n/2)+1…n], find the minimum and
maximum in each half and return the minimum of
the two minima and the maximum of the two
maxima.
Divide and Conquer

Input: An array A[1…n] of n integers, where n is a power of 2;
Output: (x, y): the minimum and maximum integers in A;

1. minmax(1, n);

minmax(low, high)
1. if high − low = 1 then
2.    if A[low] < A[high] then return (A[low], A[high]);
3.    else return (A[high], A[low]);
4.    end if;
5. else
6.    mid ← ⌊(low + high)/2⌋;
7.    (x1, y1) ← minmax(low, mid);
8.    (x2, y2) ← minmax(mid + 1, high);
9.    x ← min{x1, x2};
10.   y ← max{y1, y2};
11.   return (x, y);
12. end if;
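For reference, a direct transcription of the pseudocode above into runnable Python (0-based indexing instead of 1-based):

def minmax(A, low, high):
    # Returns (minimum, maximum) of A[low..high];
    # assumes the number of elements is a power of 2 and at least 2.
    if high - low == 1:                      # base case: exactly two elements
        if A[low] < A[high]:
            return A[low], A[high]
        return A[high], A[low]
    mid = (low + high) // 2
    x1, y1 = minmax(A, low, mid)             # min and max of the left half
    x2, y2 = minmax(A, mid + 1, high)        # min and max of the right half
    return min(x1, x2), max(y1, y2)          # combine the two partial solutions

# Example: minmax([5, 2, 9, 1], 0, 3) returns (1, 9).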
The Divide and Conquer Paradigm

• The divide step: the input is partitioned into p ≥ 1
parts, each of size strictly less than n.
• The conquer step: perform p recursive call(s) if
the problem size is greater than some predefined
threshold n0.
• The combine step: the solutions to the p recursive
call(s) are combined to obtain the desired output.
The Divide and Conquer Paradigm

(1) If the size of the instance I is “small”, then solve the
problem using a straightforward method and return the
answer. Otherwise, continue to the next step;
(2) Divide the instance I into p subinstances I1, I2, …, Ip of
approximately the same size;
(3) Recursively call the algorithm on each subinstance Ij,
1 ≤ j ≤ p, to obtain p partial solutions;
(4) Combine the results of the p partial solutions to obtain
the solution to the original instance I. Return the solution of
instance I.
Divide and Conquer

Example:
• Quicksort
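A minimal quicksort sketch in Python, using the last element as the pivot (the Lomuto partitioning scheme shown here is one common choice, not necessarily the textbook's):

def quicksort(A, low, high):
    # Sorts A[low..high] in place: partition around a pivot (divide),
    # then recursively sort the two subinstances (conquer); no combine
    # step is needed since the partition already places the pivot.
    if low >= high:
        return
    pivot = A[high]
    i = low
    for j in range(low, high):
        if A[j] <= pivot:                 # move elements <= pivot to the left part
            A[i], A[j] = A[j], A[i]
            i += 1
    A[i], A[high] = A[high], A[i]         # place the pivot between the two parts
    quicksort(A, low, i - 1)
    quicksort(A, i + 1, high)

# Example: A = [3, 1, 4, 1, 5]; quicksort(A, 0, len(A) - 1) leaves A == [1, 1, 3, 4, 5].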
Dynamic Programming


Dynamic programming resorts to evaluating the
recurrence in a bottom-up manner, saving
intermediate results that are used later on to
compute the desired solution.
This technique applies to many combinatorial
optimization problems to derive efficient
algorithms.
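As a minimal illustration of this bottom-up evaluation (the Fibonacci recurrence F(n) = F(n-1) + F(n-2) is our own example, not one from the slides): a direct recursive evaluation recomputes the same subinstances many times, while the bottom-up version saves each intermediate result and computes it exactly once:

def fib_bottom_up(n):
    # Evaluate the recurrence F(i) = F(i-1) + F(i-2) from small i to large,
    # saving each intermediate result; runs in O(n) instead of exponential time.
    table = [0, 1] + [0] * (n - 1)
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]

# Example: fib_bottom_up(10) returns 55.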
The Dynamic Programming Paradigm


The idea of saving solutions to subproblems in order
to avoid their recomputation is the basis of this
powerful method.
This is usually the case in many combinatorial
optimization problems in which the solution can be
expressed in the form of a recurrence whose direct
solution causes subinstances to be computed more
than once.
The Dynamic Programming Paradigm


An important observation about the working of
dynamic programming is that the algorithm computes
an optimal solution to every subinstance of the original
instance considered by the algorithm.
This argument illustrates an important principle in
algorithm design called the principle of optimality:
Given an optimal sequence of decisions, each
subsequence must be an optimal sequence of
decisions by itself.
Dynamic Programming

Example:
• Floyd’s algorithm for the all-pairs shortest path problem
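A minimal Python sketch of Floyd's algorithm, assuming the graph is given as an n x n distance matrix with float('inf') marking absent edges (this input layout is our own choice):

def floyd(dist):
    # dist[i][j] is the weight of edge (i, j), or float('inf') if absent.
    # After the loops, dist[i][j] is the shortest-path distance from i to j:
    # the subproblem "shortest paths using intermediates 0..k" is solved
    # bottom-up from the subproblem for intermediates 0..k-1.
    n = len(dist)
    for k in range(n):                     # allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist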
Backtracking


In many real-world problems, a solution can be
obtained by exhaustively searching through a large
but finite number of possibilities. Hence, the need
arose for systematic techniques of searching, with
the hope of cutting the search space down to a
much smaller one.
Here, we present a general technique for organizing
the search known as backtracking. This algorithm
design technique can be described as an organized
exhaustive search that often avoids searching all
possibilities.
Backtracking

Example:
• Depth-First Search
• Breadth-First Search
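A minimal Python sketch of depth-first search, assuming the graph is given as an adjacency list (a dict mapping each vertex to a list of neighbors; this representation is our own choice). The recursion backtracks from a vertex once all of its neighbors have been visited:

def dfs(graph, start, visited=None):
    # Visit start, then recursively explore each unvisited neighbor;
    # the recursion returns (backtracks) when no unvisited neighbor remains.
    if visited is None:
        visited = set()
    visited.add(start)
    order = [start]
    for neighbor in graph[start]:
        if neighbor not in visited:
            order.extend(dfs(graph, neighbor, visited))
    return order

# Example: dfs({'a': ['b', 'c'], 'b': ['a'], 'c': ['a']}, 'a') returns ['a', 'b', 'c'].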
Homework #11

11.1 A file contains only colons, spaces, newlines,
commas, and digits, with the following frequencies:
colon (100), space (605), newline (100), comma
(705), 0 (431), 1 (242), 2 (176), 3 (59), 4 (185), 5
(250), 6 (174), 7 (199), 8 (205), 9 (217). Construct
the Huffman code.