Lecture Notes for Chapter 21: Data Structures for Disjoint Sets

Lecture Notes for Chapter 21:
Data Structures for Disjoint Sets
Chapter 21 overview
Disjoint-set data structures
•
•
•
Also known as “union Þnd.”
Maintain collection S = {S1, . . . , Sk } of disjoint dynamic (changing over time)
sets.
Each set is identiÞed by a representative, which is some member of the set.
Doesn’t matter which member is the representative, as long as if we ask for the
representative twice without modifying the set, we get the same answer both
times.
[We do not include notes for the proof of running time of the disjoint-set forest
implementation, which is covered in Section 21.4.]
Operations
•
•
M AKE -S ET (x): make a new set Si = {x}, and add Si to S.
U NION (x, y): if x ∈ Sx , y ∈ S y , then S ← S − Sx − S y ∪ {Sx ∪ S y }.
•
•
•
Representative of new set is any member of Sx ∪ S y , often the representative
of one of Sx and Sy .
Destroys Sx and Sy (since sets must be disjoint).
F IND -S ET (x): return representative of set containing x.
Analysis in terms of:
•
•
n = # of elements = # of M AKE -S ET operations,
m = total # of operations.
21-2
Lecture Notes for Chapter 21: Data Structures for Disjoint Sets
Analysis:
•
•
•
Since M AKE -S ET counts toward total # of operations, m ≥ n.
Can have at most n − 1 U NION operations, since after n − 1 U NIONs, only 1
set remains.
Assume that the Þrst n operations are M AKE -S ET (helpful for analysis, usually
not really necessary).
Application: dynamic connected components.
For a graph G = (V, E), vertices u, v are in same connected component if and
only if there’s a path between them.
•
Connected components partition vertices into equivalence classes.
C ONNECTED -C OMPONENTS (V, E)
for each vertex v ∈ V
do M AKE -S ET (v)
for each edge (u, v) ∈ E
do if F IND -S ET (u) = F IND -S ET (v)
then U NION (u, v)
S AME -C OMPONENT (u, v)
if F IND -S ET (u) = F IND -S ET (v)
then return TRUE
else return FALSE
Note: If actually implementing connected components,
•
•
each vertex needs a handle to its object in the disjoint-set data structure,
each object in the disjoint-set data structure needs a handle to its vertex.
Linked list representation
•
•
Each set is a singly linked list.
Each list node has Þelds for
•
•
•
•
the set member
pointer to the representative
next
List has head (pointer to representative) and tail.
M AKE -S ET: create a singleton list.
F IND -S ET: return pointer to representative.
U NION: a couple of ways to do it.
1. U NION (x, y): append x’s list onto end of y’s list. Use y’s tail pointer to Þnd
the end.
Lecture Notes for Chapter 21: Data Structures for Disjoint Sets
•
•
21-3
Need to update the representative pointer for every node on x’s list.
If appending a large list onto a small list, it can take a while.
Operation
U NION (x 1 , x2 )
U NION (x 2 , x3 )
U NION (x 3 , x4 )
U NION (x 4 , x5 )
..
.
U NION (x n−1 , xn )
# objects updated
1
2
3
4
..
.
n−1
(n 2 ) total
Amortized time per operation = (n).
2. Weighted-union heuristic: Always append the smaller list to the larger list.
A single union can still take (n) time, e.g., if both sets have n/2 members.
Theorem
With weighted union, a sequence of m operations on n elements takes
O(m + n lg n) time.
Sketch of proof Each M AKE -S ET and F IND -S ET still takes O(1). How many
times can each object’s representative pointer be updated? It must be in the
smaller set each time.
times updated size of resulting set
1
≥2
2
≥4
3
≥8
..
..
.
.
k
..
.
lg n
≥ 2k
..
.
≥n
Therefore, each representative is updated ≤ lg n times.
Seems pretty good, but we can do much better.
Disjoint-set forest
Forest of trees.
•
•
1 tree per set. Root is representative.
Each node points only to its parent.
(theorem)
21-4
Lecture Notes for Chapter 21: Data Structures for Disjoint Sets
f
c
f
c
UNION(e,g)
h
e
d
b
•
•
•
h
g
d
e
g
b
M AKE -S ET: make a single-node tree.
U NION: make one root a child of the other.
F IND -S ET: follow pointers to the root.
Not so good—could get a linear chain of nodes.
Great heuristics
•
Union by rank: make the root of the smaller tree (fewer nodes) a child of the
root of the larger tree.
•
•
•
•
Don’t actually use size.
Use rank, which is an upper bound on height of node.
Make the root with the smaller rank into a child of the root with the larger
rank.
Path compression: Find path = nodes visited during F IND -S ET on the trip to
the root. Make all nodes on the Þnd path direct children of root.
d
c
d
b
a
a
M AKE -S ET (x)
p[x] ← x
rank[x] ← 0
U NION (x, y)
L INK (F IND -S ET (x), F IND -S ET (y))
b
c
Lecture Notes for Chapter 21: Data Structures for Disjoint Sets
21-5
L INK(x, y)
if rank[x] > rank[y]
then p[y] ← x
else p[x] ← y
If equal ranks, choose y as parent and increment its rank.
if rank[x] = rank[y]
then rank[y] ← rank[y] + 1
F IND -S ET (x)
if x = p[x]
then p[x] ← F IND -S ET ( p[x])
return p[x]
F IND -S ET makes a pass up to Þnd the root, and a pass down as recursion unwinds
to update each node on Þnd path to point directly to root.
Running time
If use both union by rank and path compression, O(m α(n)).
n
0–2
3
4–7
8–2047
2048–A4 (1)
α(n)
0
1
2
3
4
What’s A4 (1)? See Section 21.4, if you dare. It’s 1080 ≈ # of atoms in observable universe.
This bound is tight—there is a sequence of operations that takes (m α(n)) time.