Chapter 21: Disjoint Sets

Pierre Flener
(version of 2015-11-25)
DISJOINT SETS
Information Technology

Assume we have a collection of disjoint sets (sets whose pairwise intersections are empty), and we need an efficient implementation of only the following two operations:
• Find: given an element, find the set it belongs to.
• Union: given two sets, replace them by their union (so that all sets remain disjoint: this is a representation invariant).

Disjoint sets are used for computing the equivalence classes of an equivalence relation.
Course 1DL231: Algorithms and Data Structures II
RELATIONS
Given a binary relation R over a set S, the infix notation a R b indicates that a is related to b under R.
Some relations:
• is-the-mother-of
• are-connected-by-wire
• are-in-the-same-country
• is-less-than-or-equal-to
EQUIVALENCE RELATIONS
An equivalence relation R over a set S satisfies the following properties:
• Reflexivity: a R a, for all a in S
• Symmetry: a R b if and only if b R a, for all a, b in S
• Transitivity: a R b and b R c implies a R c, for all a, b, c in S
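For a small finite set, the three properties can be checked mechanically when the relation is given explicitly as a set of pairs. The following is a minimal sketch in Python; the function name `is_equivalence` and the encoding of a relation as a set of `(a, b)` tuples are assumptions made for illustration only.

```python
from itertools import product


def is_equivalence(elements, pairs):
    """Return True if `pairs` is an equivalence relation over `elements`.

    `pairs` is a set of (a, b) tuples meaning "a R b" (illustrative encoding).
    """
    reflexive = all((a, a) in pairs for a in elements)
    symmetric = all((b, a) in pairs for (a, b) in pairs)
    transitive = all(
        (a, d) in pairs
        for (a, b), (c, d) in product(pairs, repeat=2)
        if b == c
    )
    return reflexive and symmetric and transitive


S = {0, 1, 2, 3}
same_parity = {(a, b) for a in S for b in S if a % 2 == b % 2}
less_or_equal = {(a, b) for a in S for b in S if a <= b}
```

Here `same_parity` passes all three checks, whereas `less_or_equal` fails the symmetry check.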
is-the-mother-of:
• not reflexive
• not symmetric
• not transitive
It is not an equivalence relation: any single one of the three failures above would suffice.

are-connected-by-wire:
• reflexive
• symmetric
• transitive
It is an equivalence relation, as it has all three properties.

is-less-than-or-equal-to:
• reflexive
• not symmetric
• transitive
It is not an equivalence relation, as it is not symmetric.
DYNAMIC EQUIVALENCE PROBLEM
Given an explicitly tabulated equivalence relation, it is easy to decide whether two items a and b are equivalent: look up the value equiv[a,b] in a 2D array equiv of Boolean values.
• But the relation is often only intensionally (by a rule rather than by an explicit table) or dynamically defined!
• For example, given a set of four elements, {a1, a2, a3, a4}, and the explicit relationships a1 R a2, a2 R a3, and a3 R a4, we want to infer quickly that a1 R a4.
EQUIVALENCE CLASSES
An equivalence relation over a set S defines a partition of S into equivalence classes:
– Each element of S belongs to a single equivalence class.
– The union of all equivalence classes is S.
– The equivalence classes are disjoint sets.

[Figure: a set S of nine elements, a1 to a9, partitioned into equivalence classes.]
COMPUTING EQUIVALENCE CLASSES
Very important applications:
• Graph theory:
  – Finding connected components of (dynamically evolving) graphs.
• Finite state machines:
  – Minimising finite state machines: if M is a finite state machine with k states for recognising a given language, is there a simpler version of M with (much) fewer than k states?
OPERATIONS
We start with n disjoint sets, each containing a different element of S.
• We have two permissible operations:
  – Find the set a given element belongs to.
  – Union two given sets, with the result replacing the two given sets.
• Note that we do not need to compare the elements to each other.
• So we will assume the elements of S are labelled from 0 to n−1 (otherwise we can map them to such labels, for instance with a hash function).
STRATEGIES
We can make the find operation very fast:
• Keep an array of n set identifiers, one for each item.
• The set that an element belongs to can always be found in Θ(1) time.

Item     0   1   2   3   4   5   6   7   8   9  10  11
Set[i]   1   2   1   3   1   2   1   2   3   3   3   3

Set1 = {0, 2, 4, 6}
Set2 = {1, 5, 7}
Set3 = {3, 8, 9, 10, 11}

To union sets i and j, we traverse the array and replace all j by i (or vice-versa).
• This always takes Θ(n) time.

A sequence of m find & union operations thus takes O(mn) time.
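The array-of-identifiers strategy can be sketched in a few lines of Python. This is only an illustration of the idea described above; the names `set_id`, `find`, and `union` are chosen here and are not part of the course material.

```python
def find(set_id, x):
    """Return the identifier of the set containing item x: Theta(1)."""
    return set_id[x]


def union(set_id, i, j):
    """Merge set j into set i by relabelling every member of j: Theta(n)."""
    for item in range(len(set_id)):
        if set_id[item] == j:
            set_id[item] = i


# Set1 = {0, 2, 4, 6}, Set2 = {1, 5, 7}, Set3 = {3, 8, 9, 10, 11}
set_id = [1, 2, 1, 3, 1, 2, 1, 2, 3, 3, 3, 3]
union(set_id, 1, 2)  # replace every 2 by 1
```

After the call, the items 1, 5, and 7 carry the identifier 1, and the whole array had to be scanned to achieve this.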
Can we do any better? Yes!
• Constant time on average for both find and union? No...
• On average linear time in m for a sequence of m find & union operations, including the n initial creations of singleton sets? Yes, in practical cases!
BASIC DATA STRUCTURE
The basic data structure to represent the equivalence classes is a forest of trees.
• Instead of each tree node pointing to its arbitrarily many children, each non-root tree node here points to its unique parent.
• From now on, each set is identified by the label of the root element of its tree.

[Figure: a forest over the elements 0 to 4.]
[Figure sequence, starting from five single-node trees 0, 1, 2, 3, 4:]
• Union 0 and 4: node 4 becomes a child of root 0.
• Union 1 and 3: node 3 becomes a child of root 1.
• Union 0 and 1: root 1 (with its child 3) becomes a child of root 0; node 2 remains a root.
We represent a forest using an array of parent labels:
• p[i] = −1 if i is a root
• p[i] = label of the parent of i if i is not a root

We need no pointers as we know that only the elements 0 to n−1 will ever belong to the forest.

Example: the forest with roots 0 and 2, where 1 and 4 are children of 0, and 3 is a child of 1:

i       0   1   2   3   4
p[i]   −1   0  −1   1   0
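To make the representation concrete, here is a minimal Python sketch that reads the sets back out of a parent array with the values −1, 0, −1, 1, 0. The helper names `root_of` and `sets_of` are hypothetical and only serve this illustration.

```python
def root_of(p, i):
    """Follow parent labels from i until a root (marked by -1) is reached."""
    while p[i] != -1:
        i = p[i]
    return i


def sets_of(p):
    """Group the elements 0..n-1 by the root of their tree."""
    groups = {}
    for i in range(len(p)):
        groups.setdefault(root_of(p, i), []).append(i)
    return groups


p = [-1, 0, -1, 1, 0]  # roots 0 and 2; 1 and 4 below 0; 3 below 1
```

Calling `sets_of(p)` groups 0, 1, 3, and 4 under root 0, and leaves 2 alone under root 2.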
UNION
The destructive UNION of two distinct sets specified by their root elements can be performed by making the parent of one tree’s root be the root of the other tree.

Before:

i       0   1   2   3   4
p[i]   −1   0  −1   1   0

After the union of 0 and 2:

i       0   1   2   3   4
p[i]   −1   0   0   1   0

This operation takes Θ(1) time.
FIND
FIND-SET(x) on element x is performed by returning the root of the tree x is in.
• Time is proportional to the depth of the node for x:
  – Worst case: a tree of height n−1 can result, so it takes O(n) time.
  – Optimisations are possible: see below.
• Reminder:
  – Depth of a node = number of edges on the path from the root to that node
  – Height of a node = maximum number of edges on a path from that node down to a leaf
NAIVE ALGORITHMS
for i ← 0 to n−1
    do MAKE-SET(i)

MAKE-SET(x)
    p[x] ← −1    {CLRS uses x instead of −1}

UNION(x, y)    {pre: x and y are distinct roots}
    p[y] ← x

FIND-SET(x)
    if p[x] = −1
        then return x
        else return FIND-SET(p[x])
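The pseudocode above translates almost line by line into Python. The sketch below is one possible transcription (the snake_case names are chosen here), followed by a sequence of unions that produces the worst case of a single chain of height n−1.

```python
def make_set(p, x):
    p[x] = -1  # x becomes the root of a single-node tree


def union(p, x, y):
    """Precondition: x and y are distinct roots."""
    p[y] = x  # the tree rooted at y becomes a subtree of root x


def find_set(p, x):
    """Return the root of the tree containing x."""
    if p[x] == -1:
        return x
    return find_set(p, p[x])


n = 5
p = [None] * n
for i in range(n):
    make_set(p, i)

# Worst case: always hanging the old tree below a fresh root builds a chain.
for i in range(1, n):
    union(p, i, find_set(p, i - 1))
```

After these unions, FIND-SET on element 0 has to walk through all n−1 edges of the chain before reaching the root 4.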
ANALYSIS
A sequence of m find & union operations takes O(mn) time in the worst case, as each FIND-SET may take O(n) time.
• It is very hard to define the average case.
SMART UNION ALGORITHMS
The union operation above was implemented rather arbitrarily:
• We always made the second tree a subtree of the root of the first tree!

Simple improvements:
• Make the smaller tree a subtree of the root of the larger tree (break ties arbitrarily):
  – Union-by-size {not in CLRS}
• Make the (apparently) lower tree a subtree of the root of the (apparently) higher tree:
  – Union-by-rank (rank = overestimate of the height)
UNION-BY-SIZE
[Figure sequence, starting from eight single-node trees 0 to 7:]
• Union 0 and 4: equal sizes, so 4 becomes a child of 0.
• Union 3 and 2: equal sizes, so 2 becomes a child of 3.
• Union 6 and 5: equal sizes, so 5 becomes a child of 6.
• Union 1 and 3: the tree of 1 (size 1) is smaller than the tree of 3 (size 2), so 1 becomes a child of 3.
• Union 6 and 0: equal sizes (2 each), so 0 becomes a child of 6.
• Union 3 and 6: the tree of 3 (size 3) is smaller than the tree of 6 (size 4), so 3 becomes a child of 6.
If unions are done by size, then the depth of any node is never more than log₂ n:
• Initially, the depth of a node is 0.
• If the depth of a node increases (by 1) as the result of a union, then the resulting tree is at least twice as large as the tree of that node.
• As no tree can have more than n nodes, the depth of a node is thus increased (by 1) at most log₂ n times.

FIND-SET thus takes O(log n) time.
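The doubling argument can be condensed into one line. As a sketch, write d for the depth of a node and s for the size of the tree containing it (both symbols are introduced here for illustration): a node at depth 0 sits in a tree of size at least 1, each increase of d at least doubles s, and s can never exceed n.

```latex
\[
  2^{d} \le s \le n
  \quad\Longrightarrow\quad
  d \le \log_2 n
\]
```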
The implementation is very simple. We use the same array p as before, but:
• p[i] = label of the parent of i if i is not a root
• p[i] = −size of the tree with root i if i is a root:
  – The − indicates that i is a root
  – size is the number of nodes of the tree rooted at i
Example: the forest with roots 3 (children 1 and 2), 6 (children 0 and 5, where 4 is a child of 0), and 7:

i       0   1   2   3   4   5   6   7
p[i]    6   3   3  −3   0   6  −4  −1
UNION(x, y)    {pre: x and y are distinct roots}
    if p[y] < p[x]    {the tree of y is larger}
        then p[y] ← p[y] + p[x]
             p[x] ← y    {make y the new root}
        else p[x] ← p[x] + p[y]
             p[y] ← x    {make x the new root}

• This operation takes Θ(1) time.
• A sequence of m find & union-by-size operations takes O(m log n) time.
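As a hedged illustration, the union-by-size pseudocode can be transcribed into Python as follows. The iterative `find_set` and the returned new root are conveniences added here, not part of the pseudocode; the sequence of unions replays the first five steps of the union-by-size example.

```python
def find_set(p, x):
    """Return the root of the tree containing x (no path compression)."""
    while p[x] >= 0:
        x = p[x]
    return x


def union(p, x, y):
    """Union by size. Precondition: x and y are distinct roots.

    A root r stores -size, so a smaller p[r] means a larger tree.
    Returns the new root.
    """
    if p[y] < p[x]:  # the tree of y is larger
        p[y] += p[x]
        p[x] = y
        return y
    p[x] += p[y]     # the tree of x is at least as large
    p[y] = x
    return x


p = [-1] * 8  # eight singleton sets
for x, y in [(0, 4), (3, 2), (6, 5), (1, 3), (6, 0)]:
    union(p, find_set(p, x), find_set(p, y))
```

At this point p holds 6, 3, 3, −3, 0, 6, −4, −1; a further `union(p, 3, 6)` hangs the smaller tree of 3 below root 6.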
UNION-BY-RANK
[Figure sequence, starting from eight single-node trees 0 to 7:]
• Union 0 and 4: equal ranks, so 4 becomes a child of 0, whose rank becomes 1.
• Union 3 and 2: equal ranks, so 2 becomes a child of 3, whose rank becomes 1.
• Union 1 and 3: the tree of 1 (rank 0) is lower than the tree of 3 (rank 1), so 1 becomes a child of 3.
• Union 0 and 3: equal ranks (1 each), so 3 becomes a child of 0, whose rank becomes 2.

The last result is different from the result of “union-by-size 0 and 3”, which would actually be obtained with the union-by-rank algorithm hereafter for “union 3 and 0”!
If the union operations are done by rank, then the height of a tree increases (by 1) only when equally high trees are unioned:
• A tree whose root has rank r contains at least 2^r nodes, so this can only happen log₂ n times.

FIND-SET thus takes O(log n) time.
The implementation is very simple. We use the same array p as before, but:
• p[i] = label of the parent of i if i is not a root
• p[i] = −rank−1 of the tree with root i if i is a root:
  – The − indicates that i is a root
  – rank is an upper bound on the height of the tree rooted at i
  – The −1 is necessary for trees of height 0 (otherwise such a root would store 0, which is a valid parent label)
Example: the forest with root 0 (children 3 and 4, where 1 and 2 are children of 3) and the single-node trees 5, 6, and 7. Root 0 has rank 2 and is thus stored as −2−1 = −3:

i       0   1   2   3   4   5   6   7
p[i]   −3   3   3   0   0  −1  −1  −1
UNION(x, y)    {pre: x and y are distinct roots}
    if p[y] < p[x]    {y has the higher rank}
        then p[x] ← y    {make y the new root}
        else if p[y] = p[x]
                 then p[x] ← p[x]−1    {update rank}
             p[y] ← x    {make x the new root}

• This operation takes Θ(1) time.
• A sequence of m find & union-by-rank operations takes O(m log n) time.
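A possible Python transcription of the union-by-rank pseudocode is sketched below; returning the new root is an addition made here for convenience. The four unions replay the union-by-rank example, where every argument happens to be a root at the time of the call.

```python
def union(p, x, y):
    """Union by rank. Precondition: x and y are distinct roots.

    A root r stores -rank-1, so a smaller p[r] means a higher rank.
    Returns the new root.
    """
    if p[y] < p[x]:   # y has the higher rank
        p[x] = y
        return y
    if p[y] == p[x]:  # equal ranks: the rank of x grows by 1
        p[x] -= 1
    p[y] = x
    return x


p = [-1] * 8  # eight singleton sets, all of rank 0
for x, y in [(0, 4), (3, 2), (1, 3), (0, 3)]:
    union(p, x, y)
```

The resulting array is −3, 3, 3, 0, 0, −1, −1, −1: root 0 has rank 2, and only the two unions of equally ranked trees at 0 changed its stored value.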
WHAT ELSE CAN WE DO?
The union operation takes Θ(1) time, so there is not much we can do to make it any faster.
• Can we do something with the find operation? Yes!
PATH COMPRESSION
Path compression is a technique for dynamically changing the data structure during a find operation.
• When we perform FIND-SET(x), the parent of every node on the path from x to the root is changed to the root.
• So subsequent find operations on these nodes run faster. We are speculating though: the extra work only pays off if such find operations actually follow!
[Figure: find(5) on a tree rooted at 6, before and after path compression; the nodes on the path from 5 to the root end up 1 step and 2 steps closer to the root.]
The basic idea (and hope) behind path compression thus is:
• We do some extra work during a find.
• We hope that this will speed up future find operations.
FIND with Path-Compression
FIND-SET(x)
    if p[x] ≥ 0
        then p[x] ← FIND-SET(p[x]); return p[x]
        else return x

The ranks are not updated during path compression, hence the ranks really are overestimates of the heights of the trees.
A sequence of m find & union-by-rank operations now takes O(m α(n)) time, where α is a very slowly growing function, with α(n) ≤ 4 in practical cases.
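The recursive FIND-SET above can also be written as two loops, which avoids deep recursion on long paths. The sketch below is such an iterative variant, not the recursive formulation of the pseudocode; it assumes the same array p, with negative values at the roots.

```python
def find_set(p, x):
    """Return the root of x and make every node on the path point to it."""
    root = x
    while p[root] >= 0:  # first pass: locate the root
        root = p[root]
    while p[x] >= 0:     # second pass: compress the path
        parent = p[x]
        p[x] = root
        x = parent
    return root


# A chain 0 -> 1 -> 2 -> 3, where root 3 stores -rank-1 = -4 (rank 3).
p = [1, 2, 3, -4]
root = find_set(p, 0)
```

After the call, all of 0, 1, and 2 point directly to root 3, while the stored rank of 3 is unchanged: the tree now has height 1, so the rank 3 is indeed only an overestimate.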
APPLICATION OF DISJOINT SETS
Generating maze puzzles:
• Cells reachable from each other constitute an equivalence class.
• Initially, all cells are in equivalence classes by themselves.

[Figure: a grid of 25 cells with all walls intact:]

 0  1  2  3  4
 5  6  7  8  9
10 11 12 13 14
15 16 17 18 19
20 21 22 23 24
Randomly select a wall to knock down:
• Randomly select a cell.
• Randomly select one of its remaining walls.
• Do not select any wall on the outer boundary.
• Do not select a wall if the cells on both sides are already connected (are in the same equivalence class).

Keep on knocking down walls until all cells are reachable from each other.
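The maze procedure can be sketched in Python as follows. Note the simplification: instead of repeatedly picking a random cell and then one of its remaining walls, this sketch shuffles the list of all interior walls once and processes them in that order. The function name `generate_maze`, the row-by-row cell labelling, and the use of union-by-size with path compression are assumptions made for this illustration.

```python
import random


def find_set(p, x):
    """Root of x, with path compression (iterative variant)."""
    root = x
    while p[root] >= 0:
        root = p[root]
    while p[x] >= 0:
        parent = p[x]
        p[x] = root
        x = parent
    return root


def generate_maze(rows, cols, seed=None):
    """Return the knocked-down interior walls, as pairs of adjacent cells.

    Cells are labelled 0 .. rows*cols-1, row by row (an assumption).
    """
    rng = random.Random(seed)
    p = [-1] * (rows * cols)  # -size at roots, parent label elsewhere
    # Interior walls only: each separates two adjacent cells, so walls
    # on the outer boundary are never candidates.
    walls = []
    for r in range(rows):
        for c in range(cols):
            cell = r * cols + c
            if c + 1 < cols:
                walls.append((cell, cell + 1))
            if r + 1 < rows:
                walls.append((cell, cell + cols))
    rng.shuffle(walls)  # random order of wall selection
    removed = []
    for a, b in walls:
        ra, rb = find_set(p, a), find_set(p, b)
        if ra == rb:
            continue  # already connected: keep this wall
        if p[rb] < p[ra]:  # union by size: make ra the larger tree
            ra, rb = rb, ra
        p[ra] += p[rb]
        p[rb] = ra
        removed.append((a, b))
    return removed


maze = generate_maze(5, 5, seed=0)
```

Every removed wall merges two equivalence classes, so a grid of 25 cells ends up with exactly 24 removed walls and a unique path between any two cells.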
OTHER APPLICATIONS
Finding connected components in (dynamically evolving) graphs:
• Electrical circuit analysis:
  – Design VLSI circuit layout using a CAD tool.
  – Perform circuit extraction from layout.
  – Check if the right places are connected.
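As an illustration of the connectivity check, the sketch below labels every vertex of an undirected graph with a representative of its connected component, processing the edges (for instance, extracted wires) one at a time. The function name `component_labels` and the edge-list input format are hypothetical choices made here.

```python
def component_labels(n, edges):
    """Return a list whose entry v is the root of the component of v.

    Vertices are 0..n-1; edges is an iterable of (u, v) pairs.
    """
    p = [-1] * n  # -size at roots, parent label elsewhere

    def find_set(x):
        root = x
        while p[root] >= 0:
            root = p[root]
        while p[x] >= 0:  # path compression
            parent = p[x]
            p[x] = root
            x = parent
        return root

    for u, v in edges:
        ru, rv = find_set(u), find_set(v)
        if ru != rv:
            if p[rv] < p[ru]:  # union by size: make ru the larger tree
                ru, rv = rv, ru
            p[ru] += p[rv]
            p[rv] = ru
    return [find_set(v) for v in range(n)]


labels = component_labels(6, [(0, 1), (2, 3), (1, 3)])
```

Two vertices are connected exactly when their labels are equal: here 0 and 2 are connected, whereas 4 and 5 each form a component of their own.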
Finding minimal finite state machines:
• Design a finite state machine from a regular expression.
• Find a minimal equivalent machine (by merging equivalent states), so that it runs faster and uses less memory.