Binary-search trees

Analysis of Algorithms
CS 477/677
Instructor: Monica Nicolescu
Lecture 10
Bucket Sort
• Assumption:
– the input is generated by a random process that distributes
elements uniformly over [0, 1)
• Idea:
– Divide [0, 1) into n equal-sized buckets
– Distribute the n input values into the buckets
– Sort each bucket
– Go through the buckets in order, listing elements in each one
• Input: A[1 . . n], where 0 ≤ A[i] < 1 for all i
• Output: elements in A sorted
• Auxiliary array: B[0 . . n - 1] of linked lists, each list
initially empty
CS 477/677 - Lecture 10
BUCKET-SORT
Alg.: BUCKET-SORT(A, n)
    for i ← 1 to n
        do insert A[i] into list B[⎣nA[i]⎦]
    for i ← 0 to n - 1
        do sort list B[i] with insertion sort
    concatenate lists B[0], B[1], . . . , B[n - 1]
        together in order
    return the concatenated lists
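The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not the course's reference code: the names `bucket_sort` and `insertion_sort` are hypothetical, Python lists stand in for the linked lists, and the `min(n - 1, ...)` guard is an added assumption to protect against floating-point rounding.

```python
def insertion_sort(bucket):
    """Sort a list in place with insertion sort and return it."""
    for j in range(1, len(bucket)):
        key = bucket[j]
        i = j - 1
        while i >= 0 and bucket[i] > key:
            bucket[i + 1] = bucket[i]   # shift larger elements right
            i -= 1
        bucket[i + 1] = key
    return bucket


def bucket_sort(a):
    """Return a sorted copy of a, assuming 0 <= x < 1 for every x in a."""
    n = len(a)
    buckets = [[] for _ in range(n)]        # B[0 .. n-1], initially empty
    for x in a:
        # B[floor(n * A[i])]; the min() guard is an assumption added here
        # in case n * x rounds up to n in floating point.
        buckets[min(n - 1, int(n * x))].append(x)
    result = []
    for b in buckets:
        result.extend(insertion_sort(b))    # sort each bucket, then concatenate
    return result
```

Calling `bucket_sort` on a list of floats in [0, 1) should return the same result as Python's built-in `sorted`.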
Correctness of Bucket Sort
• Consider two elements A[i], A[j]
• Assume without loss of generality that A[i] ≤ A[j]
• Then ⎣nA[i]⎦ ≤ ⎣nA[j]⎦
– A[i] belongs to the same bucket as A[j] or to a bucket
with a lower index than that of A[j]
• If A[i], A[j] belong to the same bucket:
– insertion sort puts them in the proper order
• If A[i], A[j] are put in different buckets:
– concatenation of the lists puts them in the proper
order
Analysis of Bucket Sort
Alg.: BUCKET-SORT(A, n)
    for i ← 1 to n                                      O(n)
        do insert A[i] into list B[⎣nA[i]⎦]
    for i ← 0 to n - 1                                  Θ(n) expected
        do sort list B[i] with insertion sort
    concatenate lists B[0], B[1], . . . , B[n - 1]      O(n)
        together in order
    return the concatenated lists
Total expected running time under the uniform-input assumption: Θ(n)
Conclusion
• Any comparison sort takes Ω(nlgn) comparisons in the worst
case to sort an array of n numbers
• We can achieve a better running time for sorting if
we can make certain assumptions on the input
data:
– Counting sort: each of the n input elements is an integer in
the range 0 to k
– Radix sort: the elements in the input are integers
represented with d digits
– Bucket sort: the numbers in the input are uniformly
distributed over the interval [0, 1)
A Job Scheduling Application
• Job scheduling
– The key is the priority of the jobs in the queue
– The job with the highest priority needs to be
executed next
• Operations
– Insert, remove maximum
• Data structures
– Priority queues
– Ordered array/list, unordered array/list
PQ Implementations & Cost
Worst-case asymptotic costs for a PQ with N items

                   Insert    Remove max
ordered array      N         1
ordered list       N         1
unordered array    1         N
unordered list     1         N
Can we implement both operations efficiently?
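To make the trade-off in the table concrete, here is a small sketch of the two array-based variants. The class names `OrderedArrayPQ` and `UnorderedArrayPQ` are hypothetical and chosen only for this illustration.

```python
import bisect


class OrderedArrayPQ:
    """Keys kept in ascending order: insert shifts up to N items, remove max is O(1)."""

    def __init__(self):
        self.items = []

    def insert(self, key):
        bisect.insort(self.items, key)   # finds the slot, then shifts elements

    def remove_max(self):
        return self.items.pop()          # the largest key is always last


class UnorderedArrayPQ:
    """Keys kept in arrival order: insert is O(1), remove max scans all N items."""

    def __init__(self):
        self.items = []

    def insert(self, key):
        self.items.append(key)

    def remove_max(self):
        # Linear scan for the index of the largest key.
        i = max(range(len(self.items)), key=self.items.__getitem__)
        # Move the maximum to the end so it can be popped in O(1).
        self.items[i], self.items[-1] = self.items[-1], self.items[i]
        return self.items.pop()
```

Both classes return keys in the same order; they differ only in which operation pays the linear cost.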
Background on Trees
• Def: Binary tree = structure composed of a
finite set of nodes that either:
– Contains no nodes, or
– Is composed of three disjoint sets of nodes: a root
node, a left subtree and a right subtree
[Figure: example binary tree with root 4, showing its left subtree and right subtree]
Special Types of Trees
• Def: Full binary tree = a binary tree in which each node is
either a leaf or has degree (number of children) exactly 2.
• Def: Complete binary tree = a binary tree in which all leaves
have the same depth and all internal nodes have degree 2.
[Figures: an example full binary tree and an example complete binary tree]
The Heap Data Structure
• Def: A heap is a nearly complete binary tree
with the following two properties:
– Structural property: all levels are full, except
possibly the last one, which is filled from left to right
– Order (heap) property: for any node x other than the root,
Parent(x) ≥ x
[Figure: example heap with keys 8, 7, 5, 4, 2]
Note: it doesn’t matter that 4 in level 1 is smaller than 5 in
level 2; the heap property only relates a node to its parent
Definitions
• Height of a node = the number of edges on a
longest simple path from the node down to a leaf
• Depth of a node = the length (number of edges) of the path
from the root to the node
• Height of tree = height of root node
= ⎣lgn⎦, for a heap of n elements
[Figure: example tree annotated with: height of root = 3, height of node (2) = 1, depth of node (10) = 2]
Array Representation of Heaps
• A heap can be stored as
an array A.
– Root of tree is A[1]
– Left child of A[i] = A[2i]
– Right child of A[i] = A[2i + 1]
– Parent of A[i] = A[ ⎣i/2⎦]
– Heapsize[A] ≤ length[A]
• The elements in the
subarray A[(⎣n/2⎦ + 1) .. n]
are leaves
• The root is the maximum
element of the heap
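The index arithmetic above can be sketched in Python as follows. To keep the slide's 1-based indexing, this sketch assumes slot 0 of the list is an unused placeholder; the helper names are hypothetical.

```python
def parent(i):
    return i // 2          # floor(i / 2)


def left(i):
    return 2 * i


def right(i):
    return 2 * i + 1


def is_leaf(i, n):
    """Node i is a leaf of an n-element heap when it has no left child."""
    return left(i) > n


def is_max_heap(a, n):
    """Check A[PARENT(i)] >= A[i] for every non-root node of a[1..n]."""
    return all(a[parent(i)] >= a[i] for i in range(2, n + 1))
```

For a 10-element heap, `is_leaf` holds exactly for indices 6 through 10, which matches the subarray A[(⎣n/2⎦ + 1) .. n].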
Heap Types
• Max-heaps (largest element at root) have the max-heap property:
– for all nodes i, excluding the root: A[PARENT(i)] ≥ A[i]
• Min-heaps (smallest element at root) have the min-heap property:
– for all nodes i, excluding the root: A[PARENT(i)] ≤ A[i]
Operations on Heaps
• Maintain the max-heap property
– MAX-HEAPIFY
• Create a max-heap from an unordered array
– BUILD-MAX-HEAP
• Sort an array in place
– HEAPSORT
• Priority queue operations
Operations on Priority Queues
• Max-priority queues support the following
operations:
– INSERT(S, x): inserts element x into set S
– EXTRACT-MAX(S): removes and returns element of
S with largest key
– MAXIMUM(S): returns element of S with largest key
– INCREASE-KEY(S, x, k): increases value of element
x’s key to k (assume k ≥ current key value at x)
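One possible heap-backed realization of these four operations is sketched below. It is an illustration under stated assumptions rather than the lecture's own code: the class name `MaxPriorityQueue` is hypothetical, the array is 0-based, and `increase_key` identifies the element x by its array index.

```python
class MaxPriorityQueue:
    def __init__(self):
        self.a = []                      # 0-based array-backed max-heap

    def _sift_up(self, i):
        # Move a[i] up while it is larger than its parent.
        while i > 0 and self.a[(i - 1) // 2] < self.a[i]:
            p = (i - 1) // 2
            self.a[p], self.a[i] = self.a[i], self.a[p]
            i = p

    def _sift_down(self, i):
        # Move a[i] down while one of its children is larger.
        n = len(self.a)
        while True:
            l, r, largest = 2 * i + 1, 2 * i + 2, i
            if l < n and self.a[l] > self.a[largest]:
                largest = l
            if r < n and self.a[r] > self.a[largest]:
                largest = r
            if largest == i:
                return
            self.a[i], self.a[largest] = self.a[largest], self.a[i]
            i = largest

    def insert(self, key):               # INSERT(S, x)
        self.a.append(key)
        self._sift_up(len(self.a) - 1)

    def maximum(self):                   # MAXIMUM(S)
        return self.a[0]

    def extract_max(self):               # EXTRACT-MAX(S)
        top = self.a[0]
        last = self.a.pop()
        if self.a:                       # re-seat the last leaf at the root
            self.a[0] = last
            self._sift_down(0)
        return top

    def increase_key(self, i, k):        # INCREASE-KEY(S, x, k), x given by index i
        if k < self.a[i]:
            raise ValueError("new key is smaller than the current key")
        self.a[i] = k
        self._sift_up(i)
```

With this layout, `insert`, `extract_max` and `increase_key` each follow a single root-to-leaf path, so their cost is bounded by the height of the heap.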
Building a Heap
• Convert an array A[1 … n] into a max-heap
(n = length[A])
• The elements in the subarray A[(⎣n/2⎦+1) .. n] are leaves
• Apply MAX-HEAPIFY to the internal nodes, from ⎣n/2⎦ down to 1
Alg: BUILD-MAX-HEAP(A)
1. n = length[A]
2. for i ← ⎣n/2⎦ downto 1
3.     do MAX-HEAPIFY(A, i, n)
[Figure: array A: 4 1 3 2 16 9 10 14 8 7 and its tree representation, nodes indexed 1 to 10]
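A runnable sketch of BUILD-MAX-HEAP is given below. The slides reference MAX-HEAPIFY without listing it here, so the `max_heapify` shown is the standard sift-down procedure supplied as an assumption; the code also uses 0-based indices, so the loop runs from n // 2 - 1 down to 0.

```python
def max_heapify(a, i, n):
    """Sift a[i] down within a[0..n-1] until the subtree rooted at i is a max-heap.

    Assumes the subtrees rooted at the children of i are already max-heaps.
    """
    while True:
        l, r, largest = 2 * i + 1, 2 * i + 2, i
        if l < n and a[l] > a[largest]:
            largest = l
        if r < n and a[r] > a[largest]:
            largest = r
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_max_heap(a):
    """Rearrange the list a in place into a max-heap."""
    n = len(a)
    # Indices n // 2 .. n - 1 are leaves, so start at the last internal node.
    for i in range(n // 2 - 1, -1, -1):
        max_heapify(a, i, n)
```

Running `build_max_heap` on the list `[4, 1, 3, 2, 16, 9, 10, 14, 8, 7]` rearranges it into `[16, 14, 10, 8, 7, 9, 3, 2, 4, 1]`.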
Example:
[Figure: BUILD-MAX-HEAP applied to the array A, showing the tree and the array after each call to MAX-HEAPIFY for i = 5, 4, 3, 2, 1]
Correctness of BUILD-MAX-HEAP
• Loop invariant:
– At the start of each iteration of the for loop, each
node i + 1, i + 2,…, n is the root of a max-heap
• Initialization:
– i = ⎣n/2⎦: Nodes ⎣n/2⎦ + 1, ⎣n/2⎦ + 2, …, n are leaves
⇒ they are the root of trivial max-heaps
[Figure: tree for the example array, nodes indexed 1 to 10]
Correctness of BUILD-MAX-HEAP
• Maintenance:
– The children of node i have indices larger than i, so by the
loop invariant they are roots of max-heaps
– MAX-HEAPIFY makes node i a max-heap root and preserves the property
that nodes i + 1, i + 2, …, n are roots of
max-heaps
– Decrementing i in the for loop
reestablishes the loop invariant
• Termination:
– i = 0 ⇒ each node 1, 2, …, n is the
root of a max-heap (by the loop
invariant)
Running Time of BUILD MAX HEAP
Alg: BUILD-MAX-HEAP(A)
1. n = length[A]
2. for i ← ⎣n/2⎦ downto 1            O(n) iterations
3.     do MAX-HEAPIFY(A, i, n)       O(lgn) per call
⇒ It would seem that running time is O(nlgn)
• This is not an asymptotically tight upper
bound
Running Time of BUILD MAX HEAP
• HEAPIFY takes O(h) ⇒ the cost of HEAPIFY on a node i
is proportional to the height of the node i in the tree
\[ T(n) = \sum_{i=0}^{h} n_i h_i = \sum_{i=0}^{h} 2^i (h - i) = O(n) \]

Level              Height               No. of nodes
i = 0              h_0 = 3 (⎣lgn⎦)      2^0
i = 1              h_1 = 2              2^1
i = 2              h_2 = 1              2^2
i = 3 (⎣lgn⎦)      h_3 = 0              2^3

h_i = h – i: height of the heap rooted at level i
n_i = 2^i: number of nodes at level i
Running Time of BUILD MAX HEAP
\[
\begin{aligned}
T(n) &= \sum_{i=0}^{h} n_i h_i \\
     &= \sum_{i=0}^{h} 2^i (h - i) \\
     &= \sum_{i=0}^{h} \frac{h - i}{2^{h-i}} \, 2^h \\
     &= 2^h \sum_{k=0}^{h} \frac{k}{2^k} \\
     &\le n \sum_{k=0}^{\infty} \frac{k}{2^k} \\
     &= O(n)
\end{aligned}
\]
Step 1: cost of HEAPIFY at level i × number of nodes at that level
Step 2: replace the values of n_i and h_i computed before
Step 3: multiply both the numerator and the denominator by 2^h, and write 2^i as 1/2^{-i}
Step 4: change variables: k = h - i
Step 5: the finite sum is smaller than the sum of all terms up to ∞, and 2^h ≤ n because h = ⎣lgn⎦
Step 6: the infinite sum equals 2
Running time of BUILD-MAX-HEAP: T(n) = O(n)
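As a numeric sanity check of the bound (a sketch, not part of the proof), the following adds up the height of every node in an n-element heap and compares the total with 2n. The helper names are hypothetical; indices are 1-based as in the slides.

```python
def node_height(i, n):
    """Height of node i in an n-element heap.

    Because the last level is filled from left to right, the longest
    downward path from any node follows left children.
    """
    h = 0
    while 2 * i <= n:
        i *= 2
        h += 1
    return h


def total_height(n):
    """Sum of the heights of all nodes: the total work bound for BUILD-MAX-HEAP."""
    return sum(node_height(i, n) for i in range(1, n + 1))
```

For n = 10 the total is 8, well below 2n = 20, which is consistent with T(n) = O(n).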
Binary Search Trees
• Tree representation:
– A linked data structure in which each node is an object
• Node representation:
– Key field
– Satellite data
– Left: pointer to left child
– Right: pointer to right child
– p: pointer to parent (p [root [T]] = NIL)
[Figure: node layout with fields parent, L (left child), key, data, R (right child)]
• Satisfies the binary search tree property
Binary Search Tree Example
• Binary search tree property:
– If y is in left subtree of x,
then key [y] ≤ key [x]
– If y is in right subtree of x,
then key [y] ≥ key [x]
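A minimal sketch of the node representation and of the property is shown below. The names `Node`, `tree_insert` and `inorder` are chosen for this illustration, `None` plays the role of NIL, and the insertion routine is the usual walk-down-and-attach procedure, included only so that a tree can be built.

```python
class Node:
    def __init__(self, key, data=None):
        self.key = key
        self.data = data     # satellite data
        self.left = None     # pointer to left child
        self.right = None    # pointer to right child
        self.p = None        # pointer to parent; None stands for NIL


def tree_insert(root, z):
    """Insert node z below root and return the (possibly new) root."""
    y, x = None, root
    while x is not None:                 # walk down to an empty position
        y = x
        x = x.left if z.key < x.key else x.right
    z.p = y
    if y is None:
        return z                         # the tree was empty
    if z.key < y.key:
        y.left = z
    else:
        y.right = z
    return root


def inorder(x):
    """Keys of the subtree rooted at x, visited left subtree, node, right subtree."""
    if x is None:
        return []
    return inorder(x.left) + [x.key] + inorder(x.right)
```

Because every key in a left subtree is ≤ the node's key and every key in a right subtree is ≥ it, `inorder` returns the keys in sorted order. The shape produced by `tree_insert` depends on insertion order and need not match any particular figure.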
[Figure: example binary search tree with keys 5, 3, 2, 7, 5, 9]
Binary Search Trees
• Support many dynamic set operations
– SEARCH, MINIMUM, MAXIMUM, PREDECESSOR,
SUCCESSOR, INSERT, DELETE
• Running time of basic operations on binary
search trees
– On average: Θ(lgn)
• The expected height of a randomly built tree is O(lgn)
– In the worst case: Θ(n)
• The tree is a linear chain of n nodes
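The gap between the two cases can be observed with a small experiment. In this sketch (names and representation are assumptions made for brevity) a tree is either `None` or a three-element list `[key, left, right]`.

```python
import random


def insert(tree, key):
    """Insert key into the tree and return the updated tree."""
    if tree is None:
        return [key, None, None]
    side = 1 if key < tree[0] else 2     # 1 = left subtree, 2 = right subtree
    tree[side] = insert(tree[side], key)
    return tree


def height(tree):
    """Number of edges on the longest downward path; -1 for an empty tree."""
    if tree is None:
        return -1
    return 1 + max(height(tree[1]), height(tree[2]))


def build(keys):
    tree = None
    for k in keys:
        tree = insert(tree, k)
    return tree


n = 200
sorted_keys = list(range(n))
shuffled_keys = sorted_keys[:]
random.Random(0).shuffle(shuffled_keys)  # fixed seed, for repeatability

chain_height = height(build(sorted_keys))      # keys inserted in sorted order
random_height = height(build(shuffled_keys))   # keys inserted in random order
```

Inserting keys in sorted order yields a chain of height n - 1, while a random insertion order typically gives a much shallower tree.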
Red-Black Trees
• “Balanced” binary trees guarantee an
O(lgn) running time on the basic dynamic-set operations
• Red-black tree
– Binary tree with an additional attribute for its
nodes: color which can be red or black
– Constrains the way nodes can be colored on
any path from the root to a leaf
• Ensures that no path is more than twice as long as
another ⇒ the tree is approximately balanced
– The nodes inherit all the other attributes from the
binary-search trees: key, left, right, p
Red-Black Trees Properties
1. Every node is either red or black
2. The root is black
3. Every leaf (NIL) is black
4. If a node is red, then both its children are black
• No two red nodes in a row on a simple path from the
root to a leaf
5. For each node, all paths from the node to
descendant leaves contain the same number
of black nodes
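The five properties can be checked mechanically. The sketch below is one way to do it, under stated assumptions: `RBNode`, `check` and `is_red_black` are hypothetical names, `None` stands for a black NIL leaf, and the node colors used in any example tree must be supplied by the caller.

```python
RED, BLACK = "red", "black"


class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right = left, right


def check(x):
    """Return the number of black nodes on every path from x down to a leaf,
    counting x itself, or None if a property is violated below x."""
    if x is None:
        return 1                                   # property 3: NIL is black
    if x.color not in (RED, BLACK):                # property 1
        return None
    if x.color == RED:
        for c in (x.left, x.right):
            if c is not None and c.color == RED:   # property 4
                return None
    l, r = check(x.left), check(x.right)
    if l is None or l != r:                        # property 5
        return None
    return l + (1 if x.color == BLACK else 0)


def is_red_black(root):
    if root is not None and root.color != BLACK:   # property 2
        return False
    return check(root) is not None
```

Recoloring a single node, for example turning the child of a red node red, is enough to make `is_red_black` return `False`.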
Example: RED-BLACK TREE
[Figure: example red-black tree with keys 26, 17, 41, 30, 47, 38, 50 and NIL leaves]
• For convenience we use a sentinel NIL[T] to
represent all the NIL nodes at the leaves
– NIL[T] has the same fields as an ordinary node
– Color[NIL[T]] = BLACK
– The other fields may be set to arbitrary values
Black-Height of a Node
[Figure: the example red-black tree annotated with height h and black-height bh:
26 (h = 4, bh = 2), 17 (h = 1, bh = 1), 41 (h = 3, bh = 2), 30 (h = 2, bh = 1),
47 (h = 2, bh = 1), 38 (h = 1, bh = 1), 50 (h = 1, bh = 1)]
• Height of a node: the number of edges in a longest
path to a leaf
• Black-height of a node x: bh(x) is the number of
black nodes (including NIL) on a path from x to a leaf,
not counting x
Readings
• Chapters 6, 7, 8