Chapter 21:
Disjoint Sets
Pierre Flener
(version of 2015-11-25)
DISJOINT SETS
Information Technology
Assume we have a collection of disjoint
sets (that is, sets with pairwise empty intersections),
and we need an efficient implementation
of only the following two operations:
– Given an element, find the set it belongs to.
– Given two sets, replace them by their union
  (so that all sets remain disjoint: this is a
  representation invariant).
Disjoint sets are used for computing the
equivalence classes of an equivalence
relation.
Course 1DL231: Algorithms and Data Structures II
RELATIONS
Given a binary relation R over a set S,
the infix notation a R b indicates that
a is related to b under R.
Some relations:
– is-the-mother-of
– are-connected-by-wire
– are-in-the-same-country
– is-less-than-or-equal-to
EQUIVALENCE RELATIONS
An equivalence relation R over a set S
satisfies the following properties:
– Reflexivity: a R a, for all a in S
– Symmetry: a R b if and only if b R a, for all a, b in S
– Transitivity: a R b and b R c implies a R c, for all a, b, c in S
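For a finite set, the three properties can be checked by brute force. The following is a minimal sketch, not part of the course material: the function name is_equivalence and the predicate related (standing for a R b) are assumptions made for illustration.

```python
from itertools import product

def is_equivalence(elements, related):
    """Check reflexivity, symmetry and transitivity of a finite relation.

    `related(a, b)` is a hypothetical predicate standing for a R b.
    """
    elements = list(elements)
    reflexive = all(related(a, a) for a in elements)
    symmetric = all(related(a, b) == related(b, a)
                    for a, b in product(elements, repeat=2))
    transitive = all(related(a, c)
                     for a, b, c in product(elements, repeat=3)
                     if related(a, b) and related(b, c))
    return reflexive and symmetric and transitive

# "Same remainder modulo 3" satisfies all three properties;
# "less than or equal to" fails symmetry.
same_remainder = is_equivalence(range(6), lambda a, b: a % 3 == b % 3)
less_or_equal = is_equivalence(range(6), lambda a, b: a <= b)
```

The check takes cubic time in the size of the set, so it is only meant to illustrate the definitions.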
EQUIVALENCE RELATIONS
is-the-mother-of:
– not reflexive
– not symmetric
– not transitive
It is not an equivalence relation, for any
single one of the three reasons above.
EQUIVALENCE RELATIONS
are-connected-by-wire:
– reflexive
– symmetric
– transitive
It is an equivalence relation, for all three
reasons above.
EQUIVALENCE RELATIONS
is-less-than-or-equal-to:
– reflexive
– not symmetric
– transitive
It is not an equivalence relation, for the
second reason above.
DYNAMIC EQUIVALENCE PROBLEM
Given an equivalence relation, it is easy
to decide whether two items a and b are
equivalent: look up the value equiv[a,b]
in a 2D-array equiv of Boolean values.
But the relation is often only intensionally
or dynamically defined!
For example, given a set of four
elements, {a1, a2, a3, a4}, and the explicit
relationships a1 R a2, a2 R a3, and a3 R a4,
we want to infer quickly that a1 R a4.
EQUIVALENCE CLASSES
An equivalence relation over a set S
defines a partition of S into equivalence classes:
– Each element of S belongs to a single
  equivalence class.
– The union of all equivalence classes is S.
– The equivalence classes are disjoint sets.
[Figure: the set S = {a1, ..., a9} partitioned into equivalence classes]
COMPUTING EQUIVALENCE CLASSES
Very important applications:
– Graph Theory:
  Finding connected components of (dynamically
  evolving) graphs.
– Finite State Machines:
  Minimising finite state machines:
  • If M is a finite state machine with k states for recognising
    a given language, is there a simpler version of M with
    (much) fewer than k states?
OPERATIONS
We start with n disjoint sets,
each containing a different element of S.
We have two permissible operations:
– Find the set a given element belongs to.
– Union two given sets, with the result replacing
  the two given sets.
Note that we do not need to compare the
elements to each other.
So we will assume the elements of S are
labelled from 0 to n−1
(otherwise we can use a hash function).
STRATEGIES
We can make the find operation very fast:
– Keep an array of n set identifiers,
  one for each item.
The set that an element belongs to can
always be found in Θ(1) time.
Item     0   1   2   3   4   5   6   7   8   9  10  11
Set[i]   1   2   1   3   1   2   1   2   3   3   3   3
Set1 = {0, 2, 4, 6}
Set2 = {1, 5, 7}
Set3 = {3, 8, 9, 10, 11}
To union sets i and j, we traverse the
array and replace all j by i (or vice-versa).
This always takes Θ(n) time.
A sequence of m find & union operations
thus takes O(mn) time.
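The array-of-identifiers strategy can be sketched as follows. This is an illustrative sketch only: the names make_sets, find, union, and set_id are assumptions, and each item initially uses its own label as set identifier.

```python
def make_sets(n):
    """Return the identifier array: item i starts in its own set i."""
    return list(range(n))

def find(set_id, x):
    """Theta(1): look up the identifier of the set containing x."""
    return set_id[x]

def union(set_id, i, j):
    """Theta(n): relabel every member of set j as a member of set i."""
    for item in range(len(set_id)):
        if set_id[item] == j:
            set_id[item] = i

set_id = make_sets(6)
union(set_id, find(set_id, 0), find(set_id, 2))   # merge the sets of 0 and 2
union(set_id, find(set_id, 2), find(set_id, 4))   # merge the result with the set of 4
```

The full traversal in union is what makes a sequence of m operations cost O(mn) time.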
STRATEGIES
Can we do any better? Yes!
Constant time on average for
both find and union? No...
On average linear time in m for a
sequence of m find & union operations,
including the n initial creations of
singleton sets? Yes, in practical cases!
BASIC DATA STRUCTURE
The basic data structure to represent the
equivalence classes is a forest of trees.
Instead of each tree node pointing to its
arbitrarily many children, each non-root
tree node here points to its unique parent:
[Figure: a forest over the elements 0 to 4, in which each non-root node points to its parent]
From now on, each set is identified by the
label of the root element of its tree.
BASIC DATA STRUCTURE
[Figure sequence, starting from the five single-node trees 0, 1, 2, 3, 4:
– Union 0 and 4: 4 becomes a child of 0.
– Union 1 and 3: 3 becomes a child of 1.
– Union 0 and 1: 1 (with its child 3) becomes a child of 0.
The resulting forest has the roots 0 and 2.]
BASIC DATA STRUCTURE
We represent a forest using an array of
parent labels:
– p[i] = −1 if i is a root
– p[i] = label of the parent of i if i is not a root
We need no pointers as we know that only the
elements 0 to n−1 will ever belong to the forest.
[Figure: the forest with roots 0 and 2, where 1 and 4 are children of 0, and 3 is a child of 1]
i      0   1   2   3   4
p[i]  −1   0  −1   1   0
UNION
The destructive UNION of two distinct
sets specified by their root elements can
be performed by making the parent of one
tree’s root be the root of the other tree.
Before:
i      0   1   2   3   4
p[i]  −1   0  −1   1   0
Union of 0 and 2
After:
i      0   1   2   3   4
p[i]  −1   0   0   1   0
This operation takes Θ(1) time.
FIND
FIND-SET(x) on element x is performed
by returning the root of the tree x is in.
Time is proportional to the depth of the
node for x:
– Worst case: A tree of height n−1 can result,
  so it takes O(n) time.
– Optimisations are possible: see below.
Reminder:
– Depth of a node = #edges from the root
– Height of a node = max #edges to a leaf
NAIVE ALGORITHMS
for i ← 0 to n−1
    do MAKE-SET(i)

MAKE-SET(x)
    p[x] ← −1    {CLRS uses x instead of −1}

UNION(x, y)    {pre: x and y are distinct roots}
    p[y] ← x

FIND-SET(x)
    if p[x] = −1
        then return x
        else return FIND-SET(p[x])
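The naive algorithms translate almost line by line into Python. The sketch below is one possible rendering, under two assumptions that are not in the pseudocode: the array p is passed explicitly, and FIND-SET is written as a loop instead of a recursion (to avoid Python's recursion limit on tall trees). It replays the unions of 0 and 4, 1 and 3, and 0 and 1.

```python
def make_set(p, x):
    p[x] = -1                      # -1 marks x as a root

def union(p, x, y):
    """Precondition: x and y are distinct roots."""
    p[y] = x                       # the second tree goes under the first root

def find_set(p, x):
    # Iterative form of the recursive FIND-SET: climb until a root is reached
    while p[x] != -1:
        x = p[x]
    return x

n = 5
p = [0] * n
for i in range(n):
    make_set(p, i)
union(p, 0, 4)
union(p, 1, 3)
union(p, 0, 1)
```

After these three unions, p holds −1, 0, −1, 1, 0 for the elements 0 to 4, so the forest has the roots 0 and 2.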
ANALYSIS
A sequence of m find & union operations
takes O(mn) time in the worst case.
It is very hard to define the average case.
SMART UNION ALGORITHMS
The union operation above was
implemented rather arbitrarily:
– We always made the second tree a subtree
  of the root of the first tree!
Simple improvements:
– Make the smaller tree a subtree of the root of
  the larger tree (break ties arbitrarily):
  Union-by-size {not in CLRS}
– Make the (apparently) lower tree a subtree of
  the root of the (apparently) higher tree:
  Union-by-rank (rank = overestimate of the height)
UNION-BY-SIZE
[Figure sequence, starting from the eight single-node trees 0 to 7:
– Union 0 and 4: 4 becomes a child of 0.
– Union 3 and 2: 2 becomes a child of 3.
– Union 6 and 5: 5 becomes a child of 6.
– Union 1 and 3: the smaller tree (1 alone) becomes a subtree of root 3.
– Union 6 and 0: the two trees have the same size 2; 0 (with its child 4) becomes a child of 6.
– Union 3 and 6: the smaller tree rooted at 3 (size 3) becomes a subtree of root 6 (size 4).]
UNION-BY-SIZE
If unions are done by size, then the depth
of any node is never more than log₂ n:
– Initially, the depth of a node is 0.
– If the depth of a node increases (by 1) as the
  result of a union, then the resulting tree is at
  least twice as large as the tree of that node.
– A tree never has more than n nodes, so the depth
  of a node is increased (by 1) at most log₂ n times.
FIND-SET thus takes O(log n) time.
UNION-BY-SIZE
The implementation is very simple.
We use the same array p as before, but:
– p[i] = label of the parent of i if i is not a root
– p[i] = −size of the tree with root i if i is a root:
  • The − indicates that i is a root
  • size is the number of nodes of the tree rooted at i
UNION-BY-SIZE
[Figure: the forest with roots 3, 6, and 7, where 2 and 1 are children of 3, 5 and 0 are children of 6, and 4 is a child of 0]
i      0   1   2   3   4   5   6   7
p[i]   6   3   3  −3   0   6  −4  −1
UNION-BY-SIZE
UNION(x, y)    {pre: x and y are distinct roots}
    if p[y] < p[x]
        then p[y] ← p[y] + p[x]
             p[x] ← y    {make y the new root}
        else p[x] ← p[x] + p[y]
             p[y] ← x    {make x the new root}
– This operation takes Θ(1) time.
– A sequence of m find & union-by-size
  operations takes O(m log n) time.
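The union-by-size pseudocode can be sketched in Python as follows. The function names and the explicit array argument are assumptions made for illustration; the encoding p[root] = −size is the one described above. The example replays the first five unions of the union-by-size slides (0 and 4, 3 and 2, 6 and 5, 1 and 3, 6 and 0).

```python
def find_set(p, x):
    while p[x] >= 0:           # negative entries mark roots
        x = p[x]
    return x

def union_by_size(p, x, y):
    """Precondition: x and y are distinct roots; p[root] = -size."""
    if p[y] < p[x]:            # y's tree is strictly larger
        p[y] += p[x]           # sizes add up (both entries are negative)
        p[x] = y               # make y the new root
    else:
        p[x] += p[y]
        p[y] = x               # make x the new root (also on ties)

p = [-1] * 8                   # eight singleton sets of size 1
for x, y in [(0, 4), (3, 2), (6, 5), (1, 3), (6, 0)]:
    union_by_size(p, x, y)
```

The resulting array is 6, 3, 3, −3, 0, 6, −4, −1 for the elements 0 to 7: root 3 has size 3, root 6 has size 4, and 7 is still a singleton.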
UNION-BY-RANK
[Figure sequence, starting from the eight single-node trees 0 to 7:
– Union 0 and 4: 4 becomes a child of 0.
– Union 3 and 2: 2 becomes a child of 3.
– Union 1 and 3: the lower tree (1 alone) becomes a subtree of root 3.
– Union 0 and 3: the two trees have the same rank 1; 3 (with its children 2 and 1) becomes a child of 0.]
The last result is different from the
result of “union-by-size 0 and 3”, which would
actually be obtained with the union-by-rank
algorithm hereafter for “union 3 and 0”!
UNION-BY-RANK
If the union operations are done by rank,
then the height of a tree increases (by 1)
only when equally high trees are unioned:
– A tree of height h built this way has at least 2^h nodes,
  so this can only happen log₂ n times.
FIND-SET thus takes O(log n) time.
UNION-BY-RANK
The implementation is very simple.
We use the same array p as before, but:
– p[i] = label of the parent of i if i is not a root
– p[i] = −rank − 1 if i is a root, where rank is the rank of the tree with root i:
  • The − indicates that i is a root
  • rank is an upper bound on the height of the tree
    rooted at i
  • The −1 is necessary for trees of height 0
UNION-BY-RANK
[Figure: the forest with roots 0, 5, 6, and 7, where 4 and 3 are children of 0, and 2 and 1 are children of 3]
i      0   1   2   3   4   5   6   7
p[i]  −3   3   3   0   0  −1  −1  −1
UNION-BY-RANK
UNION(x, y)    {pre: x and y are distinct roots}
    if p[y] < p[x]
        then p[x] ← y    {make y the new root}
        else if p[y] = p[x]
                 then p[x] ← p[x]−1    {update rank}
             p[y] ← x    {make x the new root}
– This operation takes Θ(1) time.
– A sequence of m find & union-by-rank
  operations takes O(m log n) time.
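A possible Python rendering of the union-by-rank pseudocode is sketched below. The function name and the explicit array argument are assumptions; the encoding p[root] = −rank − 1 is the one described above. The example replays the unions of the union-by-rank slides (0 and 4, 3 and 2, 1 and 3, 0 and 3).

```python
def union_by_rank(p, x, y):
    """Precondition: x and y are distinct roots; p[root] = -rank - 1."""
    if p[y] < p[x]:            # y has the strictly higher rank
        p[x] = y               # make y the new root
    else:
        if p[y] == p[x]:       # equal ranks: the rank of the new root grows by 1
            p[x] -= 1
        p[y] = x               # make x the new root

p = [-1] * 8                   # eight singleton sets of rank 0
for x, y in [(0, 4), (3, 2), (1, 3), (0, 3)]:
    union_by_rank(p, x, y)
```

The resulting array is −3, 3, 3, 0, 0, −1, −1, −1 for the elements 0 to 7: root 0 has rank 2, and 5, 6, 7 are still singletons of rank 0.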
WHAT ELSE CAN WE DO?
The union operation takes Θ(1) time, so
there is not much we can do to make it
any faster.
Can we do something with the find
operation? Yes!
PATH COMPRESSION
Path compression is a technique for
dynamically changing the data structure
during a find operation.
When we perform FIND-SET(x), the
parent of every node on the path from x to
the root is changed to the root.
So subsequent find operations run faster.
We are speculating though!
PATH COMPRESSION
[Figure: a tree before and after find(5) with path compression; the nodes on the find path are annotated “1 step closer” and “2 steps closer” to the root]
PATH COMPRESSION
The basic idea (and hope) behind path
compression thus is:
– We do some extra work during a find.
– We hope that this will speed up future find
  operations.
FIND with Path-Compression
FIND-SET(x)
    if p[x] ≥ 0
        then p[x] ← FIND-SET(p[x])
             return p[x]
        else return x
The ranks are not updated during path
compression, hence the ranks really are
overestimates of the heights of the trees.
A sequence of m find & union-by-rank
operations now takes O(m α(n)) time,
where α is the very slowly growing inverse Ackermann
function: α(n) ≤ 4 in practical cases.
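Path compression can be sketched in Python as follows. Note that this sketch swaps the recursion of the pseudocode for an equivalent two-pass loop (an assumption made here to stay clear of Python's recursion limit); the function name and the explicit array argument are also illustrative.

```python
def find_set(p, x):
    """Return the root of x and make every node on the path point to it."""
    root = x
    while p[root] >= 0:        # first pass: locate the root
        root = p[root]
    while p[x] >= 0:           # second pass: redirect each node to the root
        p[x], x = root, p[x]
    return root

# A chain 3 -> 2 -> 1 -> 0, where root 0 is encoded with rank 3 (entry -4)
p = [-4, 0, 1, 2]
root = find_set(p, 3)
```

After the call, the elements 1, 2, and 3 all point directly to the root 0, while the entry of the root is unchanged: the stored rank 3 is now an overestimate of the actual height 1.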
APPLICATION OF DISJOINT SETS
[Figure: a 5 × 5 grid of cells, labelled 0 to 24]
Generating maze puzzles:
– Cells reachable from each other constitute
  an equivalence class.
– Initially, all cells are in equivalence classes
  by themselves.
APPLICATION OF DISJOINT SETS
[Figure: the 5 × 5 grid of cells, labelled 0 to 24]
Randomly select a wall to knock down:
– Randomly select a cell.
– Randomly select one of its remaining walls.
– Do not select any wall on the outer boundary.
– Do not select a wall if the cells on both sides
  are already connected (are in the same
  equivalence class).
APPLICATION OF DISJOINT SETS
[Figure: the 5 × 5 grid of cells, labelled 0 to 24]
Keep on knocking
down walls until all
cells are reachable
from each other.
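The maze generation can be sketched with union-by-rank and path compression as follows. This is an illustrative variant, not the exact procedure of the slides: instead of repeatedly picking a random cell and then one of its walls, it shuffles the list of all interior walls once and scans it. The names generate_maze, rows, cols, and seed are assumptions.

```python
import random

def generate_maze(rows, cols, seed=None):
    """Return the knocked-down interior walls as (cell, cell) pairs."""
    rng = random.Random(seed)
    n = rows * cols
    p = [-1] * n                      # p[root] = -rank - 1

    def find_set(x):
        root = x
        while p[root] >= 0:
            root = p[root]
        while p[x] >= 0:              # path compression
            p[x], x = root, p[x]
        return root

    def union(x, y):                  # x and y are distinct roots
        if p[y] < p[x]:
            p[x] = y
        else:
            if p[y] == p[x]:
                p[x] -= 1
            p[y] = x

    # Interior walls only: each separates a cell from its right or lower neighbour
    walls = [(c, c + 1) for c in range(n) if (c + 1) % cols != 0]
    walls += [(c, c + cols) for c in range(n - cols)]
    rng.shuffle(walls)

    removed = []
    for a, b in walls:
        ra, rb = find_set(a), find_set(b)
        if ra != rb:                  # skip walls between already connected cells
            union(ra, rb)
            removed.append((a, b))
    return removed

removed = generate_maze(5, 5, seed=0)
```

Each knocked-down wall merges two equivalence classes, so a grid of 25 cells ends up with exactly 24 removed walls, and there is a unique path between any two cells.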
OTHER APPLICATIONS
Finding connected components in
(dynamically evolving) graphs:
– Electrical circuit analysis:
  • Design VLSI circuit layout using a CAD tool.
  • Perform circuit extraction from layout.
  • Check if the right places are connected.
OTHER APPLICATIONS
Finding minimal finite state machines:
– Design a finite state machine from a regular
  expression.
– Find a minimal equivalent machine (by
  merging equivalent states), so that it runs
  faster and uses less memory.