Disjoint Sets

Chapter 21
CPTR 430 Algorithms
Disjoint Sets
■ A disjoint-set data structure maintains a collection $\mathcal{S} = \{S_1, S_2, \ldots, S_k\}$ of disjoint dynamic sets
■ Each set has a designated representative, which is an element of the set
❚ For some applications, the representative may be arbitrary
❚ For others, the “smallest” element is the representative (if the elements can be ordered)
Disjoint Set Operations
■
makeSet(x)—creates a new set whose only member is x
■
union(x,y)—combines the sets that contain elements x and y
❚ If $x \in S_x$ and $y \in S_y$, then union(x,y) returns a new set equal to $S_x \cup S_y$
❚ $S_x$ and $S_y$ are disjoint before the union() operation
❚ The representative of the resulting set can be any element in $S_x \cup S_y$, but usually we choose either the representative of $S_x$ or $S_y$
❚ The original sets, $S_x$ and $S_y$, are removed from $\mathcal{S}$
■
findSet(x)—returns a reference to the representative of the set
containing x
Skeleton Implementation
public class DisjointSet {
    public static void makeSet(DSElement element) { /* To be determined */ }
    public static void union(DSElement x, DSElement y) { /* To be determined */ }
    public static DSElement findSet(DSElement element) { /* To be determined */ }
}
Disjoint Set Analysis
■ n: the number of makeSet() operations
■ m: the total number of makeSet(), union(), and findSet() operations
■ The sets in $\mathcal{S}$ are disjoint, so
❚ Each union() operation reduces $|\mathcal{S}|$ by 1
❚ After $n - 1$ union() operations, $|\mathcal{S}| = 1$
❚ The number of union() operations is at most $n - 1$
■ $m \ge n$
■ For the purposes of analysis, assume the first n operations are makeSet() operations
Applications of Disjoint Sets
■
Determining the connected components of an undirected graph
■
Kruskal’s minimum spanning tree algorithm
■
In FORTRAN, handling the EQUIVALENCE(X,Y) statement
■
Type unification by compilers and interpreters of dynamically typed
programming languages
■
Image processing—blob coloring
■
Colorizing old movies
Sample Application: Detecting the Connected Components of an Undirected Graph
■
This undirected graph has four connected components:
[Figure: an undirected graph on the vertices a, b, c, d, e, f, g, h, i, j]
Connectivity Algorithm
public class ConnectedGraph {
    public static void connectedComponents(Graph g) {
        Vertex[] vertices = g.getVertices();
        Edge[] edges = g.getEdges();
        // Place every vertex in its own singleton set
        for ( int i = 0; i < vertices.length; i++ ) {
            DisjointSet.makeSet(vertices[i]);
        }
        // Merge the sets of the two endpoints of every edge
        for ( int i = 0; i < edges.length; i++ ) {
            DSElement fromSetRep = DisjointSet.findSet(edges[i].from),
                      toSetRep   = DisjointSet.findSet(edges[i].to);
            if ( fromSetRep != toSetRep ) {
                DisjointSet.union(fromSetRep, toSetRep);
            }
        }
    }
    public static boolean sameComponent(Vertex v1, Vertex v2) {
        return DisjointSet.findSet(v1) == DisjointSet.findSet(v2);
    }
}
Connectivity Algorithm
■
connectedComponents() initially places each vertex into its own set
■
Next, all edges are examined; an edge connecting two vertices implies
that the two vertices are to be unioned into one set
■
After all edges have been examined, two vertices are within the same
connected component if sameComponent() returns true
■
For things to work, a vertex object must reference an associated disjoint
set object and vice-versa
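The steps above can be tried out in a small self-contained sketch. It is not the course's DisjointSet class: vertices are encoded as the integers 0..9 (standing for a..j), the array `rep` plays the role of the disjoint-set structure, and the edge list in `main` is a hypothetical example chosen to give four components. The class name `ComponentsSketch` and its methods are illustrative only.

```java
import java.util.Arrays;

/** Sketch: connected components with a simple array-based disjoint set. */
final class ComponentsSketch {
    // rep[v] is the representative of the set that contains vertex v.
    private final int[] rep;

    ComponentsSketch(int vertexCount) {
        rep = new int[vertexCount];
        for (int v = 0; v < vertexCount; v++) {
            rep[v] = v;                 // makeSet(v): each vertex starts alone
        }
    }

    int findSet(int v) {
        return rep[v];
    }

    /** Merges the set containing y into the set containing x. */
    void union(int x, int y) {
        int from = rep[y], to = rep[x];
        if (from == to) {
            return;                     // already in the same set
        }
        for (int v = 0; v < rep.length; v++) {
            if (rep[v] == from) {
                rep[v] = to;            // relabel every member of y's set
            }
        }
    }

    boolean sameComponent(int u, int v) {
        return findSet(u) == findSet(v);
    }

    int componentCount() {
        return (int) Arrays.stream(rep).distinct().count();
    }

    /** Builds the sets by examining every edge once, as on the slide. */
    static ComponentsSketch fromEdges(int vertexCount, int[][] edges) {
        ComponentsSketch ds = new ComponentsSketch(vertexCount);
        for (int[] e : edges) {
            if (ds.findSet(e[0]) != ds.findSet(e[1])) {
                ds.union(e[0], e[1]);
            }
        }
        return ds;
    }

    public static void main(String[] args) {
        // Hypothetical edges over a..j encoded as 0..9
        int[][] edges = { {1, 3}, {4, 6}, {0, 2}, {7, 8}, {0, 1}, {4, 5}, {1, 2} };
        ComponentsSketch ds = fromEdges(10, edges);
        System.out.println(ds.componentCount());
        System.out.println(ds.sameComponent(0, 3));
        System.out.println(ds.sameComponent(0, 4));
    }
}
```

With this edge list the components are {a, b, c, d}, {e, f, g}, {h, i}, and {j}.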
Linked List Implementation
■
The linked list implementation is simple
■
The first element in the list is the set’s representative
■
Each element in the list contains:
❚ a data object
❚ a pointer to the next element in the list
❚ a pointer to the representative
[Figure: one list element with the fields data, next, and rep]
Linked List Implementation (cont.)
■
Pointers head and tail refer, respectively, to the first and last
elements in the list
❚ head points to the representative
❚ tail points to the position where a new element can be added and another set can be unioned
[Figure: a list of the elements a, b, c, and d with head and tail pointers; each element has the fields data, next, and rep]
Efficiency of the Linked Implementation
■ makeSet()
❚ Create a new list with one element
❚ $O(1)$
■ findSet()
❚ Return the pointer to the representative stored in each node
❚ $O(1)$
■ union()
❚ Attach one list to the end of the other
❚ The end can be found quickly via the tail pointer
❚ Updating the representative pointers in every node in the attached list takes time proportional to the length of the attached list
❚ $O(n)$
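A minimal sketch of the linked-list representation follows, to make the three costs concrete. It is an illustration under assumptions rather than the course's code: the names `ListNode` and `LinkedDisjointSet` are hypothetical, the tail pointer is stored on the representative node, and union() returns the number of representative pointers it rewrote so the O(n) cost is visible.

```java
/** One list element: data, next pointer, and pointer to the representative. */
final class ListNode<T> {
    final T data;
    ListNode<T> next;   // next element in the list, or null at the end
    ListNode<T> rep;    // first element of the list (the representative)
    ListNode<T> tail;   // last element; kept up to date only on the representative

    ListNode(T data) {
        this.data = data;
        this.rep = this;    // a new element represents its own one-element list
        this.tail = this;
    }
}

final class LinkedDisjointSet {
    /** O(1): create a new list with one element. */
    static <T> ListNode<T> makeSet(T data) {
        return new ListNode<>(data);
    }

    /** O(1): every node stores a pointer to its representative. */
    static <T> ListNode<T> findSet(ListNode<T> x) {
        return x.rep;
    }

    /**
     * Appends the list containing y to the end of the list containing x.
     * Returns the number of representative pointers that were updated,
     * which equals the length of the attached list.
     */
    static <T> int union(ListNode<T> x, ListNode<T> y) {
        ListNode<T> keep = x.rep;
        ListNode<T> attached = y.rep;
        if (keep == attached) {
            return 0;                       // same set, nothing to do
        }
        keep.tail.next = attached;          // O(1) thanks to the tail pointer
        keep.tail = attached.tail;
        int updated = 0;
        for (ListNode<T> n = attached; n != null; n = n.next) {
            n.rep = keep;                   // the O(length) part of union()
            updated++;
        }
        return updated;
    }
}
```

For example, after `union(a, b)` followed by `union(c, a)`, the second call rewrites two pointers and `findSet(b)` returns `c`.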
The Amortized Analysis
■ In the worst case, a sequence of m operations requires $O(n^2)$ time
■ Take objects $x_1, x_2, \ldots, x_n$ and perform the operations

Operation            Number of Objects Updated
makeSet(x1)          1
makeSet(x2)          1
..                   ..
makeSet(xn)          1
union(x1, x2)        1
union(x2, x3)        2
union(x3, x4)        3
..                   ..
union(xn-1, xn)      n - 1
The Amortized Analysis (cont.)
■ The operation sequence is n makeSet()s followed by $n - 1$ union()s such that the longer list is always appended to the shorter list
■ The n makeSet() operations take $\Theta(n)$ time
■ The ith union() operation updates i objects
■ Total number of objects updated by all $n - 1$ union() operations is $\sum_{i=1}^{n-1} i = \Theta(n^2)$
■ The total number of operations is $2n - 1$
■ Each operation on average requires $\Theta(n)$ time
■ By aggregate analysis, then, the amortized cost of each operation is $\Theta(n)$
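The arithmetic behind these bounds can be written out as a short derivation. It uses only the quantities on the slide: n makeSet() operations, $n - 1$ union() operations, and i objects updated by the ith union().

```latex
\begin{align*}
\sum_{i=1}^{n-1} i &= \frac{n(n-1)}{2} = \Theta(n^2)
  && \text{(objects updated by all unions)} \\
\Theta(n) + \Theta(n^2) &= \Theta(n^2)
  && \text{(makeSet cost plus union cost)} \\
\frac{\Theta(n^2)}{2n-1} &= \Theta(n)
  && \text{(average cost over all } 2n-1 \text{ operations)}
\end{align*}
```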
Weighted-union Heuristic
■ Ensure that the shorter list is always appended to the longer list
■ Fewer representative pointers to update
■ Maintain the length of each list (easy: add an extra integer field)
■ A union can still require $\Omega(n)$ time if both lists have $\Omega(n)$ elements
■ Helps a little?
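To see the effect of the heuristic, the sketch below adds a length field to a linked-list node and always attaches the shorter list to the longer one. The names `WeightedListSet`, `Node`, and `chainUpdates` are hypothetical, data fields are omitted for brevity, and `chainUpdates` simply replays the union(x1, x2), union(x2, x3), ... sequence while counting representative-pointer updates.

```java
/** Sketch of the linked-list representation with the weighted-union heuristic. */
final class WeightedListSet {
    static final class Node {
        Node next;          // next element, or null at the end of the list
        Node rep = this;    // representative (first element of the list)
        Node tail = this;   // last element; meaningful only on the representative
        int length = 1;     // list length; meaningful only on the representative
    }

    /** Attaches the shorter list to the longer one; returns pointers updated. */
    static int union(Node x, Node y) {
        Node longer = x.rep;
        Node shorter = y.rep;
        if (longer == shorter) {
            return 0;                           // already in the same set
        }
        if (longer.length < shorter.length) {   // swap so "longer" really is
            Node t = longer;
            longer = shorter;
            shorter = t;
        }
        longer.tail.next = shorter;
        longer.tail = shorter.tail;
        longer.length += shorter.length;
        int updated = 0;
        for (Node n = shorter; n != null; n = n.next) {
            n.rep = longer;
            updated++;
        }
        return updated;
    }

    /** Runs union(x1,x2), union(x2,x3), ..., and totals the pointer updates. */
    static long chainUpdates(int n) {
        Node[] x = new Node[n];
        for (int i = 0; i < n; i++) {
            x[i] = new Node();                  // makeSet
        }
        long total = 0;
        for (int i = 0; i + 1 < n; i++) {
            total += union(x[i], x[i + 1]);
        }
        return total;
    }
}
```

On that particular sequence every union attaches a one-element list, so the total is $n - 1$ updates instead of $n(n-1)/2$.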
It Does Better than $\Theta(n^2)$
■ Given: linked list representation with the weighted-union heuristic
■ Any sequence of m makeSet(), findSet(), and union() operations, n of which are makeSet() operations, takes $O(m + n \lg n)$ time (Theorem 21.1)
■ Why?
Proof of Theorem 21.1
■ Consider each object in a set of size n
■ For a given object, x, how many times has its representative been updated?
■ The first time x's representative was updated, x must have been an element of the smaller set, since the weighted-union heuristic always appends the smaller list to the larger one
■ After x’s representative was updated the first time, x’s resulting set must have had at least two elements (Why?)
■ The next time x’s representative is updated, x’s set must have at least four elements (Again, why?)
■ For all $k \le n$, the resulting set has at least k elements after x’s representative has been updated $\lceil \lg k \rceil$ times
Proof of Theorem 21.1 (cont.)
■ The largest set has at most n elements
■ For all $k \le n$, the resulting set has at least k elements after x’s representative has been updated $\lceil \lg k \rceil$ times (Why?)
■ Each element in that largest set has been updated at most $\lceil \lg n \rceil$ times
■ The time to update the n elements is $O(n \lg n)$
■ The time to adjust the head and tail pointers, as well as the length field, is constant
Proof of Theorem 21.1 (cont.)
■ The makeSet() and findSet() operations take $O(1)$ time
■ There are $O(m)$ makeSet() and findSet() operations
■ The time for the entire sequence of m operations is $O(m + n \lg n)$
Disjoint-set Forests
■
A set is represented by a rooted tree
■
The root is the set’s representative
■
Each node points to its parent (the root points to itself)
■
So, unlike trees we are used to seeing, the pointers point “up” instead
of “down”
[Figure: a disjoint-set forest on the elements b, c, d, e, f, g, and h, shown before and after union(e,g)]
Operation Implementations
■
makeSet()—create a tree containing one node
■
findSet()—follow parent pointers until the root is found
The path is called the find path
■
union()—redirect the parent pointer of one of the roots to point to the
other root
[Figure: a disjoint-set forest on the elements b, c, d, e, f, g, and h, shown before and after union(e,g)]
Efficiency of Disjoint-set Forests
■ The straightforward approach is no better than the linked list version
■ A sequence of $n - 1$ union() operations can create a tree of height $n - 1$, that is, a linear chain of n nodes
■ A couple of heuristics can tweak the implementation into the asymptotically fastest disjoint-set data structure known
❚ Union by rank
❚ Path compression
Union by Rank
■
Same idea as weighted-union for linked lists
■
The root of the tree with fewer nodes points to the root of the tree with
more nodes
■
We could have each node keep track of the number of nodes in its
subtree
■
Instead, each node maintains a rank that is an upper bound on its
height
■
union() then redirects the pointer of the root of the tree with smaller
rank to the root of the tree with larger rank
Path Compression
■
Simple in concept and to implement but very effective
■
Alter the findSet() operation so that each node on the find path points
directly to the root instead of its immediate parent
■
Path compression does not affect any ranks
(Why?)
[Figure: a tree on the elements b, c, d, e, f, g, and h, shown before and after findSet(g)]
Disjoint-set Forest Implementation
The node definition is slightly different:
public class Node {
    public DSElement data;    // Element to store
    public Node parent;       // Pointer to the parent node
    public int rank;
    public Node(DSElement d) {
        data = d;
        parent = this;        // No parent node yet
        rank = 0;             // No subtree for this new node
        data.setNode(this);
    }
}
Disjoint-set Forest Implementation (cont.)
public class DisjointSet {
    public static void makeSet(DSElement element) {
        new Node(element);
    }
    public static void union(DSElement x, DSElement y) {
        link(findSet(x), findSet(y));
    }
    // x and y must already be set representatives (roots)
    private static void link(DSElement x, DSElement y) {
        Node nx = x.getNode(),
             ny = y.getNode();
        if ( nx == ny ) {
            return;            // Already in the same set
        }
        if ( nx.rank > ny.rank ) {
            ny.parent = nx;
        } else {
            nx.parent = ny;
            if ( nx.rank == ny.rank ) {
                ny.rank++;     // Equal ranks: the new root's rank grows by one
            }
        }
    }
    . . .
}
Disjoint-set Forest Implementation (cont.)
public class DisjointSet {
    . . .
    public static DSElement findSet(DSElement element) {
        Node node = element.getNode();
        if ( node.parent != node ) {
            // Path compression: point this node directly at the root
            node.parent = findSet(node.parent.data).getNode();
        }
        return node.parent.data;
    }
}
■ Note the recursion in findSet()
■
findSet() uses two passes:
❚ The first pass up the tree to find the root (representative)
❚ The second pass is down the tree to update the parents of all the nodes in the find
path (to point directly to the root)
❚ The recursive calls make up the first pass
❚ The returns from the recursive calls make up the second pass
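For experimenting without the DSElement and Node classes, the same two heuristics can be sketched over plain integer arrays. This is an assumption-laden variant, not the slides' implementation: elements are the integers 0..n-1, and the class name `ForestSketch` and the helper `parentOf` (exposed only so path compression can be observed) are hypothetical.

```java
/** Sketch: disjoint-set forest with union by rank and path compression. */
final class ForestSketch {
    private final int[] parent;   // parent[x] == x means x is a root
    private final int[] rank;     // upper bound on the height of x's subtree

    ForestSketch(int n) {
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;        // makeSet(i): a one-node tree with rank 0
        }
    }

    /** Two passes: recurse up to the root, then repoint nodes on the way back. */
    int findSet(int x) {
        if (parent[x] != x) {
            parent[x] = findSet(parent[x]);
        }
        return parent[x];
    }

    /** Links the root of smaller rank under the root of larger rank. */
    void union(int x, int y) {
        int rx = findSet(x);
        int ry = findSet(y);
        if (rx == ry) {
            return;               // already in the same set
        }
        if (rank[rx] > rank[ry]) {
            parent[ry] = rx;
        } else {
            parent[rx] = ry;
            if (rank[rx] == rank[ry]) {
                rank[ry]++;       // ties make the surviving root one rank taller
            }
        }
    }

    /** Exposed only to observe the effect of path compression. */
    int parentOf(int x) {
        return parent[x];
    }
}
```

After `union(0, 1)`, `union(2, 3)`, and `union(0, 2)`, element 0 sits two links below the root 3; a single `findSet(0)` then repoints it directly at 3.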
Do the Heuristics Help?
■ Union by rank, by itself, yields a running time of $O(m \lg n)$
■ Path compression, by itself, yields a running time of $\Theta\bigl(n + f \cdot (1 + \log_{2 + f/n} n)\bigr)$
where
❚ n is the number of makeSet() operations (which means at most $n - 1$ union() operations)
❚ f is the number of findSet() operations
The Combined Effect
■ Together, they yield a running time of $O(m\,\alpha(n))$, where $\alpha(n)$ is a very slowly growing function
■ For all practical applications of disjoint sets, $\alpha(n) \le 4$
■ Thus, the running time is linear in m, for all practical purposes