Data Structures and Algorithm Analysis

Lecturer: Ligang Dong
Email: [email protected]
Tel: 28877721, 13306517055
Office: SIEE Building 305

Analysis of Algorithms

Analysis of an algorithm gives insight into how long the program runs and how much memory it uses:
- time complexity
- space complexity

Why is this useful? It shows the efficiency of an algorithm.

Machine Independent Analysis

We assume that every basic operation takes constant time:
- Example basic operations: addition, subtraction, multiplication, memory access
- Non-basic operations: sorting, searching

The efficiency of an algorithm is the number of basic operations it performs.
- We do not distinguish between the basic operations.

Machine Independent Analysis

Input size is indicated by a number n
- sometimes there are multiple inputs, e.g. m and n

Running time is a function of n, e.g.:
    n,  n^2,  n log n,  18 + 3n(log n^2) + 5n^3

Simplifying the Analysis

Eliminate low-order terms:
    4n + 5 → 4n
    0.5 n log n − 2n + 7 → 0.5 n log n
    2^n + n^3 + 3n → 2^n

Eliminate constant coefficients:
    4n → n
    0.5 n log n → n log n
    log n^2 = 2 log n → log n
    log_3 n = (log_3 2) log n → log n

Order Notation

BIG-O: T(n) = O(f(n))
- Upper bound
- There exist constants c and n0 such that T(n) ≤ c f(n) for all n ≥ n0

OMEGA: T(n) = Ω(f(n))
- Lower bound
- There exist constants c and n0 such that T(n) ≥ c f(n) for all n ≥ n0

THETA: T(n) = θ(f(n))
- Tight bound: T(n) is both O(f(n)) and Ω(f(n))

Examples

n^2 + 100n = O(n^2) = Ω(n^2) = θ(n^2)
- (n^2 + 100n) ≤ 2n^2 for n ≥ 10
- (n^2 + 100n) ≥ 1·n^2 for n ≥ 0

n log n = O(n^2)
n log n = θ(n log n)
n log n = Ω(n)

More on Order Notation

Order notation is not symmetric; write
    2n^2 + 4n = O(n^2)
but never
    O(n^2) = 2n^2 + 4n
The right-hand side is a coarser description of the left.

Likewise
    O(n^2) = O(n^3)
    Ω(n^3) = Ω(n^2)

A Few Comparisons

Function #1            Function #2
n^3 + 2n^2             100n^2 + 1000
n^0.1                  log n
n + 100n^0.1           2n + 10 log n
5n^5                   n!
n^-15 · 2^(n/100)      1000n^15
8^(2 log n)            3n^7 + 7n

Race I:   n^3 + 2n^2         vs.  100n^2 + 1000
Race II:  n^0.1              vs.  log n
Race III: n + 100n^0.1       vs.  2n + 10 log n
Race IV:  5n^5               vs.  n!
Race V:   n^-15 · 2^(n/100)  vs.  1000n^15
Race VI:  8^(2 log n)        vs.  3n^7 + 7n

The Losers Win

Function #1            Function #2        Better algorithm!
n^3 + 2n^2             100n^2 + 1000      O(n^2)
n^0.1                  log n              O(log n)
n + 100n^0.1           2n + 10 log n      TIE: O(n)
5n^5                   n!                 O(n^5)
n^-15 · 2^(n/100)      1000n^15           O(n^15)
8^(2 log n)            3n^7 + 7n          O(n^6)

Common Names

constant:     O(1)
logarithmic:  O(log n)
linear:       O(n)
log-linear:   O(n log n)
superlinear:  O(n^(1+c))   (c is a constant > 0)
quadratic:    O(n^2)
polynomial:   O(n^k)       (k is a constant)
exponential:  O(c^n)       (c is a constant > 1)

Kinds of Analysis

Running time may depend on the actual input data, not just the length of the input.

Distinguish:
- worst case: your worst enemy is choosing the input
- best case
- average case: assumes some probabilistic distribution of inputs
- amortized: average time over many operations

Analyzing Code

C operations:        constant time
consecutive stmts:   sum of times
conditionals:        sum of branches, condition
loops:               sum of iterations
function calls:      cost of function body
recursive functions: solve a recursive equation

Above all, use your head!

Nested Loops

for i = 1 to n do
    for j = 1 to n do
        sum = sum + 1

Total work:  Σ_{i=1}^{n} Σ_{j=1}^{n} 1  =  Σ_{i=1}^{n} n  =  n · n  =  n^2
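
The iteration count can be checked by running the loop; a small Python sketch (the function name is illustrative):

```python
def nested_loop_count(n):
    """Run the doubly nested loop above and count executions of the body."""
    count = 0
    for i in range(1, n + 1):        # i = 1 to n
        for j in range(1, n + 1):    # j = 1 to n
            count += 1               # sum = sum + 1 counts as one operation
    return count
```

For any n, the body runs exactly n · n times.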

Nested Dependent Loops

for i = 1 to n do
    for j = i to n do
        sum = sum + 1

Σ_{i=1}^{n} Σ_{j=i}^{n} 1  =  Σ_{i=1}^{n} (n − i + 1)  =  Σ_{i=1}^{n} (n + 1) − Σ_{i=1}^{n} i
  =  n(n + 1) − n(n + 1)/2  =  n(n + 1)/2  ≈  n^2 / 2
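
The dependent count can also be verified directly; a sketch in the same style:

```python
def dependent_loop_count(n):
    """Run the dependent nested loop above and count executions of the body."""
    count = 0
    for i in range(1, n + 1):        # i = 1 to n
        for j in range(i, n + 1):    # j = i to n: n - i + 1 iterations
            count += 1
    return count
```

The result matches the closed form n(n + 1)/2.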

Conditionals

if C then S1 else S2

time ≤ time(C) + max( time(S1), time(S2) )

Recursion

Recursion: a subroutine which calls itself, with different parameters.
- Example: factorial(n) = n · factorial(n − 1)

Iteration:
- Example:
    prod = 1
    for j = 1 to m do
        prod = prod · j

In general, iteration is more efficient than recursion, because recursion must maintain state information for each pending call.
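
The two styles can be compared side by side; a Python sketch (function names are illustrative):

```python
def factorial_recursive(n):
    """factorial(n) = n * factorial(n - 1): one call frame per level."""
    if n <= 1:
        return 1
    return n * factorial_recursive(n - 1)

def factorial_iterative(m):
    """The iterative version from the slide: O(m) time, O(1) extra space."""
    prod = 1
    for j in range(1, m + 1):
        prod = prod * j
    return prod
```

Both compute the same values; the recursive version additionally pays for n stack frames.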

Recursion

A recursive procedure can often be analyzed by solving a recursive equation.

Basic form:
T(n) = if (base case) then some constant
       else ( time to solve subproblems + time to combine solutions )

The result depends upon:
- how many subproblems there are
- how costly it is to combine their solutions

Example: Sum of Integer Queue

sum_queue(Q){
    if (Q.length == 0) return 0;
    else return Q.dequeue() + sum_queue(Q);
}

- One subproblem
- Linear reduction in size (decrease by 1)
- Combining: constant cost c (the +) applied to 1 subproblem
- Equation:
    T(0) ≤ b
    T(n) ≤ c + T(n − 1)   for n > 0

Example: Sum of Integer Queue

Equation:  T(0) ≤ b
           T(n) ≤ c + T(n − 1)   for n > 0

Solution:
T(n) ≤ c + c + T(n − 2)
     ≤ c + c + c + T(n − 3)
     ≤ kc + T(n − k)   for all k
     ≤ nc + T(0)       for k = n
     ≤ cn + b = O(n)
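
The pseudocode above can be made runnable; a sketch using Python's collections.deque as a stand-in for the queue ADT:

```python
from collections import deque

def sum_queue(q):
    """Recursively sum and empty a queue. One subproblem of size n - 1 plus
    constant work per call gives T(n) <= c + T(n - 1), i.e. O(n)."""
    if len(q) == 0:
        return 0
    return q.popleft() + sum_queue(q)   # dequeue one element, recurse on the rest
```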

Example: Binary Search

Sorted array:  7 12 30 35 75 83 87 90 97 99

One subproblem, half as large.

Equation:  T(1) ≤ b
           T(n) ≤ T(n/2) + c   for n > 1

Solution:
T(n) ≤ T(n/2) + c
     ≤ T(n/4) + c + c
     ≤ T(n/8) + c + c + c
     ≤ T(n/2^k) + kc
     ≤ T(1) + c log n   where k = log n
     ≤ b + c log n = O(log n)
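
An iterative sketch of binary search over the array above (Python; the exact interface is illustrative):

```python
def binary_search(a, target):
    """Search a sorted list: each step discards half the range, so the
    number of steps satisfies T(n) <= T(n/2) + c = O(log n)."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid          # found: return the index
        elif a[mid] < target:
            lo = mid + 1        # discard the lower half
        else:
            hi = mid - 1        # discard the upper half
    return -1                   # not present
```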

Example: MergeSort

Split the array in half, sort each half, merge them together.
- 2 subproblems, each half as large
- linear amount of work to combine

T(1) ≤ b
T(n) ≤ 2T(n/2) + cn   for n > 1

T(n) ≤ 2T(n/2) + cn
     ≤ 2(2T(n/4) + cn/2) + cn = 4T(n/4) + cn + cn
     ≤ 4(2T(n/8) + c(n/4)) + cn + cn = 8T(n/8) + cn + cn + cn
     ≤ …
     ≤ 2^k T(n/2^k) + kcn
     = 2^k T(1) + cn log n   where k = log n
     = O(n log n)
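
A sketch of the split/sort/merge scheme (Python; returns a new sorted list rather than sorting in place):

```python
def merge_sort(a):
    """Split in half, sort each half, merge: T(n) = 2 T(n/2) + cn = O(n log n)."""
    if len(a) <= 1:
        return a                       # base case: already sorted
    mid = len(a) // 2
    left = merge_sort(a[:mid])         # sort each half recursively
    right = merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # linear-time merge
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])            # one side is exhausted; append the rest
    merged.extend(right[j:])
    return merged
```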

Example: Recursive Fibonacci

Recursive Fibonacci:
int Fib(n){
    if (n == 0 or n == 1) return 1;
    else return Fib(n - 1) + Fib(n - 2);
}

Running time: lower bound analysis
T(0), T(1) ≥ 1
T(n) ≥ T(n − 1) + T(n − 2) + c   if n > 1

Note: T(n) ≥ Fib(n)
Fact: Fib(n) ≥ (3/2)^n (for sufficiently large n), so the running time is Ω( (3/2)^n ). Why?

Proof of Recursive Fibonacci

Recursive Fibonacci:
int Fib(n){
    if (n == 0 or n == 1) return 1;
    else return Fib(n - 1) + Fib(n - 2);
}

Lower bound analysis:
T(0), T(1) ≥ b
T(n) ≥ T(n − 1) + T(n − 2) + c   if n > 1

Analysis:
Let φ = (1 + √5)/2, which satisfies φ^2 = φ + 1.
Show by induction on n that T(n) ≥ b·φ^(n−1).

Direct Proof Continued

Basis: T(0) ≥ b > b·φ^(−1), and T(1) ≥ b = b·φ^0

Inductive step: Assume T(m) ≥ b·φ^(m−1) for all m < n. Then
T(n) ≥ T(n − 1) + T(n − 2) + c
     ≥ b·φ^(n−2) + b·φ^(n−3) + c
     = b·φ^(n−3)(φ + 1) + c
     = b·φ^(n−3)·φ^2 + c
     ≥ b·φ^(n−1)

Fibonacci Call Tree

[Figure: the call tree for Fib(5). Fib(5) calls Fib(4) and Fib(3); Fib(4) calls Fib(3) and Fib(2); and so on down to Fib(1) and Fib(0). The same subproblems are recomputed many times.]

Learning from Analysis

To avoid repeated recursive calls:
- store all basis values in a table
- each time you calculate an answer, store it in the table
- before performing any calculation for a value n, check whether a valid answer for n is already in the table; if so, return it

This technique is called memoization, a form of dynamic programming.

How much time does the memoized version take?
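
A memoized version of the slide's Fib, keeping the convention Fib(0) = Fib(1) = 1 (a sketch; passing the table as an argument is one implementation choice):

```python
def fib_memo(n, table=None):
    """Memoized Fibonacci: each value is computed at most once, so the
    running time drops from exponential to O(n)."""
    if table is None:
        table = {0: 1, 1: 1}            # basis values stored in the table
    if n not in table:                  # check the table before computing
        table[n] = fib_memo(n - 1, table) + fib_memo(n - 2, table)
    return table[n]
```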

Kinds of Analysis

So far we have considered worst-case analysis. We may want to know how an algorithm performs "on average". There are several distinct senses of "on average":
- amortized: average time per operation over a sequence of operations
- average case: average time over a random distribution of inputs
- expected case: average time for a randomized algorithm over different random seeds, for any input

Amortized Analysis

Consider any sequence of operations applied to a data structure.
- Your worst enemy could choose the sequence!

Some operations may be fast, others slow.

Goal: show that the average time per operation,

    (total time for n operations) / n,

is still good.

Stack ADT

Stack operations:
- push
- pop
- is_empty

[Figure: elements A, B, C, D, E, F pushed onto a stack are popped off in reverse order.]

Stack property: if x is on the stack before y is pushed, then x will be popped after y is popped.

What is the biggest problem with an array implementation?

Stretchy Stack Implementation

int data[];
int maxsize;
int top;

Push(e){
    if (top == maxsize){
        temp = new int[2*maxsize];
        copy data into temp;
        deallocate data;
        data = temp;
    }
    data[++top] = e;
}

Best case Push = O(1)
Worst case Push = O(n)

Stretchy Stack Amortized Analysis

Consider a sequence of n operations:
    push(3); push(19); push(2); …

What is the max number of stretches? log n
What is the total time?
- Say a regular push takes time a, and stretching an array containing k elements takes time kb, for some constants a and b.

    an + b(1 + 2 + 4 + 8 + … + n) = an + b · Σ_{i=0}^{log n} 2^i
                                  = an + b(2^(1+log n) − 1) = an + b(2n − 1)

Amortized time = (an + b(2n − 1)) / n = O(1)
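
The bound can be checked by simulating the pushes, charging a per regular push and k·b per stretch (a sketch with illustrative cost constants):

```python
def stretchy_push_cost(n, a=1, b=1):
    """Total cost of n pushes onto a stretchy stack starting at capacity 1.
    A regular push costs a; stretching an array of k elements costs k*b."""
    size, capacity, total = 0, 1, 0
    for _ in range(n):
        if size == capacity:
            total += capacity * b       # copy all k = capacity elements
            capacity *= 2               # double the array
        total += a                      # the push itself
        size += 1
    return total
```

For n = 8 with a = b = 1 this gives 8 + (1 + 2 + 4) = 15, within the an + b(2n − 1) bound.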

Homework #3

3.1 Design an algorithm for the Towers of Hanoi.
- Source peg, destination peg, auxiliary peg
- k disks start on the source peg; a bigger disk can never be on top of a smaller disk
- Move all k disks to the destination peg, using the auxiliary peg, without ever placing a bigger disk on a smaller one

3.2 Analyze the time complexity of your algorithm from 3.1.
3.3 Write a program to solve the Towers of Hanoi.