Disjoint Sets
Want to maintain a collection S = {S1, …, Sk} of disjoint dynamic
sets.
Each set has a representative member.
Operations:
Make-Set(x): Make new singleton set containing object x (x is
representative).
Union(x, y): Like before (x and y are objects in two sets to be
merged).
Find-Set(x): Returns a pointer to the representative of the set containing x.
Complexity: In terms of
n = no. of Make-Set operations.
m = total no. of operations.
Note: m ≥ n.
Using Linked Lists
Store set {a, b, c} as:
[Figure: linked list a → b → c, with a representative pointer and a tail pointer.]
Make-Set and Find-Set are O(1).
Union(x, y): Append x’s list onto the end of y’s list. Update
representative pointers in x’s list.
Time is linear in |x|.
Running time for a sequence of m ops can take Θ(m²) time.
(Not very good.)
Example
m = 2n – 1 operations.
Operation        “Time”
M-S(x1)          1
M-S(x2)          1
...
M-S(xn)          1
U(x1, x2)        1
U(x2, x3)        2
U(x3, x4)        3
...
U(xn-1, xn)      n–1
Total Time is:
• Θ(n) = Θ(m) for Make-Set ops.
• $\sum_{i=1}^{n-1} i = \Theta(n^2) = \Theta(m^2)$ for Union ops.
• Θ(m²) total.
• Θ(m) amortized.
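The quadratic cost in this example can be reproduced with a small Python sketch. It assumes a simplified model in which each set is kept as a Python list and one unit of "time" is charged per representative-pointer update; the names `NaiveListSets`, `updates`, and `worst_case_updates` are illustrative and not from the slides.

```python
class NaiveListSets:
    """Linked-list style disjoint sets WITHOUT the weighted-union heuristic."""

    def __init__(self):
        self.rep = {}      # object -> representative of its set
        self.members = {}  # representative -> list of members of that set
        self.updates = 0   # total number of representative-pointer updates

    def make_set(self, x):
        self.rep[x] = x
        self.members[x] = [x]

    def find_set(self, x):
        return self.rep[x]

    def union(self, x, y):
        """Append x's list onto y's list; cost is linear in the size of x's list."""
        rx, ry = self.rep[x], self.rep[y]
        if rx == ry:
            return
        for z in self.members[rx]:
            self.rep[z] = ry
            self.updates += 1
        self.members[ry].extend(self.members.pop(rx))


def worst_case_updates(n):
    """Run M-S(x1..xn) then U(x1,x2), U(x2,x3), ..., U(x(n-1),xn)."""
    s = NaiveListSets()
    for i in range(1, n + 1):
        s.make_set(i)
    for i in range(1, n):
        s.union(i, i + 1)  # the growing list is appended to a singleton
    return s.updates
```

Under these assumptions, `worst_case_updates(n)` returns 1 + 2 + ... + (n–1) = n(n–1)/2, matching the “Time” column above.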
Weighted-Union Heuristic
Keep track of list length in representative.
Modify Union so that smaller list is appended to
longer one.
Time for Union is now proportional to the
length of the smaller list.
Amortized Running Time of WUH
Theorem 21.1: Sequence of m operations takes
O(m + n lg n) time.
Proof:
M-S and F-S contribute O(m) total.
What about Union?
Time is dominated by the total no. of times we change a rep.
pointer.
A given object’s rep. pointer can change at most lg n times.
Proof of Theorem 21.1 (Continued)
Note: n = no. of M-S’s = no. of objects
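x’s rep. pointer changes only when x is in the smaller list, so each change at least doubles the size of x’s set.
After object x’s rep. ptr. has been changed once, its set has ≥ 2 members.
After it has been changed twice, its set has ≥ 4 members.
After it has been changed three times, its set has ≥ 8 members.
After it has been changed lg k times, its set has ≥ k members.
k ≤ n, so x’s rep. pointer can change at most lg n times.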
O(n lg n) for n objects.
O(m + n lg n) total.
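The lg n bound on pointer changes can be checked on small inputs with a Python sketch of the weighted-union heuristic. As an assumption for illustration, sets are stored as Python lists and merged pairwise in rounds; `WeightedListSets`, `changes`, and `max_pointer_changes` are hypothetical names.

```python
class WeightedListSets:
    """Linked-list style disjoint sets WITH the weighted-union heuristic."""

    def __init__(self):
        self.rep = {}      # object -> representative of its set
        self.members = {}  # representative -> list of members
        self.changes = {}  # object -> number of times its rep pointer changed

    def make_set(self, x):
        self.rep[x] = x
        self.members[x] = [x]
        self.changes[x] = 0

    def find_set(self, x):
        return self.rep[x]

    def union(self, x, y):
        rx, ry = self.rep[x], self.rep[y]
        if rx == ry:
            return
        if len(self.members[rx]) > len(self.members[ry]):
            rx, ry = ry, rx  # make rx the representative of the smaller list
        for z in self.members[rx]:
            self.rep[z] = ry
            self.changes[z] += 1
        self.members[ry].extend(self.members.pop(rx))


def max_pointer_changes(n):
    """Merge n singletons pairwise in rounds; return the largest change count."""
    s = WeightedListSets()
    for i in range(n):
        s.make_set(i)
    step = 1
    while step < n:
        for i in range(0, n - step, 2 * step):
            s.union(i, i + step)
        step *= 2
    return max(s.changes.values())
```

For n = 16 this merge order makes some object change representative in every round, so the count reaches lg 16 = 4, which is the bound in the proof.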
Disjoint-Set Forests
[Figure: a tree whose root a is the representative; the set is {a, b, c, d}.]
M-S, F-S: Easy
Union: As follows…
[Figure: two trees with roots x and y before Union; after Union, one root points to the other.]
Will speed up sequence of Union, M-S, and F-S operations by
means of two heuristics.
Two Heuristics
1) Union by Rank
• Store rank of tree in rep.
• Rank ≈ tree size.
• Make root with smaller rank point to root with larger rank.
2) Path Compression
• During Find-Set, “flatten” tree.
[Figure: before F-S(a), the path a → b → c → d; after F-S(a), the nodes a, b, and c all point directly to d.]
Operations
Make-Set(x)
    p[x] := x;
    rank[x] := 0

Find-Set(x)
    if x ≠ p[x] then
        p[x] := Find-Set(p[x])
    fi;
    return p[x]

Union(x, y)
    Link(Find-Set(x), Find-Set(y))

Link(x, y)
    if rank[x] > rank[y] then
        p[y] := x
    else
        p[x] := y;
        if rank[x] = rank[y] then
            rank[y] := rank[y] + 1
        fi
    fi

Note: rank = u.b. (upper bound) on height.
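The four operations above translate almost line for line into Python. This is a minimal sketch, assuming dictionaries for p and rank; the class name `DisjointSetForest` is illustrative, and the `rx != ry` guard in `union` is an addition not present in the pseudocode (it avoids bumping a rank when both arguments are already in the same set).

```python
class DisjointSetForest:
    """Disjoint-set forest with union by rank and path compression."""

    def __init__(self):
        self.p = {}     # parent pointers
        self.rank = {}  # rank = upper bound on node height

    def make_set(self, x):
        self.p[x] = x
        self.rank[x] = 0

    def find_set(self, x):
        # Path compression: every node on the find path ends up
        # pointing directly at the root.
        if x != self.p[x]:
            self.p[x] = self.find_set(self.p[x])
        return self.p[x]

    def link(self, x, y):
        # x and y must be distinct roots.
        if self.rank[x] > self.rank[y]:
            self.p[y] = x
        else:
            self.p[x] = y
            if self.rank[x] == self.rank[y]:
                self.rank[y] += 1

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx != ry:  # guard added in this sketch
            self.link(rx, ry)
```

A possible usage, with two unions on four objects:

```python
s = DisjointSetForest()
for x in "abcd":
    s.make_set(x)
s.union("a", "b")   # b becomes the root, rank[b] = 1
s.union("c", "d")   # d becomes the root, rank[d] = 1
s.union("a", "d")   # b is linked under d, rank[d] = 2
```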
Find-Set
[Figure: before the call, the chain a → b → c; afterwards, a and b both point directly to c.]
F-S(a)
    p[a] := F-S(b)
        p[b] := F-S(c)
            return c
        return c
    return c
Time Complexity
We cover the complexity analysis found in CLR
rather than CLRS.
– Note: This was Chapter 22 in CLR, which is why the
remaining lemmas etc. are numbered the way they are.
Tight upper bound on time complexity: O(m α(m,n)).
• α(m,n) = inverse of Ackermann’s function (almost a
constant).
– This bound, for a slightly different definition of α than that
given here, is shown in CLRS.
• A slightly easier bound of O(m lg*n) is established in
CLR.
Ackermann’s Function
A(1, j) = 2^j for j ≥ 1
A(i, 1) = A(i–1, 2) for i ≥ 2
A(i, j) = A(i–1, A(i, j–1)) for i, j ≥ 2
Grows very fast (inverse grows very slow).
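Notation: $2^{2^{\cdot^{\cdot^{2}}}}\big\}\,k$ denotes a 2 raised to a stack of k 2s (k + 1 twos in all); e.g., with k = 10 the tower contains eleven 2s.
$A(3,4) = 2^{2^{\cdot^{\cdot^{2}}}}\big\}\,2^{2^{\cdot^{\cdot^{2}}}}\big\}\,2^{2^{\cdot^{\cdot^{2}}}}\big\}\,16$
(a tower whose count is a tower whose count is a tower with count 16).
Note: This is one of several inequivalent but similar definitions of Ackermann’s function found in the literature. CLRS gives a different definition.
Please see the CLR handout. Powerpoint doesn’t do a great job with this notation.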
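To get a feel for the growth rate, the CLR-style recurrence (A(1, j) = 2^j; A(i, 1) = A(i–1, 2); A(i, j) = A(i–1, A(i, j–1))) can be evaluated directly for very small arguments. The Python sketch below is only an illustration; the function name `ackermann` is an assumption, and anything from A(3, 2) onward is far too large to compute.

```python
def ackermann(i, j):
    """Ackermann's function as defined in CLR, for integers i, j >= 1.

    Only tiny arguments are feasible: A(2, 4) already equals 2**65536,
    and A(3, 2) is a tower of seventeen 2s.
    """
    if i < 1 or j < 1:
        raise ValueError("i and j must be >= 1")
    if i == 1:
        return 2 ** j
    if j == 1:
        return ackermann(i - 1, 2)
    return ackermann(i - 1, ackermann(i, j - 1))
```

For instance, A(2, 1) = 4, A(2, 2) = 16, A(2, 3) = 65536, and A(3, 1) = 16 follow from the recurrence and are what this function returns.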
Inverse of Ackermann’s Function
α(m,n) = min{i ≥ 1 : A(i, ⌊m/n⌋) > lg n}
Note: Not a “true” mathematical inverse.
Intuition: Grows about as slowly as Ackermann’s function does fast.
How slowly?
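Let ⌊m/n⌋ = k.
m ≥ n ⇒ k ≥ 1.
We can show that A(i, k) ≥ A(i, 1) for all i ≥ 1.
Consider i = 4:
A(4, k) ≥ A(4, 1) = $2^{2^{\cdot^{\cdot^{2}}}}\big\}\,16$ ≫ 10^80 (a tower with sixteen 2s stacked above the base 2).
So, α(m,n) ≤ 4 if lg n < 10^80, i.e., if n < 2^(10^80).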
Bound We Establish
We establish O(m lg*n) as an upper bound.
Recall lg*n = min{i ≥ 0 : lg^(i) n ≤ 1}.
In particular: $\lg^{*}\left(2^{2^{\cdot^{\cdot^{2}}}}\big\}\,k\right) = k + 1$ (k 2s stacked above the base 2).
And hence: lg* 2^65536 = 5.
Thus, lg*n ≤ 5 for all practical purposes.
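The definition of lg* can be evaluated exactly for integers by noting that lg^(i) n ≤ 1 holds exactly when n is at most a tower of i 2s. The Python sketch below uses that equivalence; the function name `lg_star` and the cutoff at 2^65536 are assumptions made for this illustration (the next tower cannot be materialized).

```python
def lg_star(n):
    """Smallest i >= 0 such that applying lg to n i times gives a value <= 1.

    Works for integers 1 <= n <= 2**65536.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    i, tower = 0, 1  # tower takes the values 1, 2, 4, 16, 65536, 2**65536
    while n > tower:
        if tower > 65536:
            raise ValueError("n is too large for this sketch")
        tower = 2 ** tower  # lg^(i) n <= 1  iff  n <= tower of i 2s
        i += 1
    return i
```

With this function, the inputs 2, 4, 16, 65536, and 2^65536 map to 1, 2, 3, 4, and 5 respectively.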
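Example of Algorithm
(Notation: x/r is node x with rank r; edges are parent pointers.)
MS(a) ; MS(b) ; ... ; MS(i) ; MS(j)
[Figure: ten singleton trees a/0, b/0, c/0, d/0, e/0, f/0, g/0, h/0, i/0, j/0.]
U(a,b) ; U(c,d) ; U(e,f) ; U(g,h) ; U(i,j)
[Figure: five two-node trees: b/1 with child a/0, d/1 with child c/0, f/1 with child e/0, h/1 with child g/0, j/1 with child i/0.]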
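Example (Continued)
U(a,d)
[Figure: before, the five two-node trees rooted at b/1, d/1, f/1, h/1, j/1. After, d/2 is a root with children b/1 (whose child is a/0) and c/0; the trees rooted at f/1, h/1, and j/1 are unchanged. Nodes a, b, c, d are each labeled ‘d’, the labeling used in the proof of Lemma 22.4 with r = 2.]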
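Example (Continued)
U(f,h)
[Figure: before, the tree rooted at d/2 (children b/1 with child a/0, and c/0) and the two-node trees rooted at f/1, h/1, j/1. After, h/2 is a root with children f/1 (whose child is e/0) and g/0; the trees rooted at d/2 and j/1 are unchanged. Nodes a, b, c, d keep the label ‘d’; nodes e, f, g, h are each labeled ‘h’.]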
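Example (Continued)
U(d,h)
[Figure: before, the trees rooted at d/2, h/2, and j/1. After, h/3 is a root with children d/2, f/1, and g/0; d/2 has children b/1 (whose child is a/0) and c/0; f/1 has child e/0; the tree j/1 with child i/0 is unchanged. The ‘d’ and ‘h’ labels are unchanged.]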
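Example (Continued)
U(e,j)
[Figure: before, the tree rooted at h/3 and the tree j/1 with child i/0. After, Find-Set(e) has compressed the path so that e/0 is a child of h/3, and j/1 (with child i/0) has been linked under h/3. The root h/3 now has children d/2, e/0, f/1, g/0, and j/1; d/2 still has children b/1 (whose child is a/0) and c/0.]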
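Example (Continued)
FS(i)
[Figure: before, i/0 is a child of j/1, which is a child of h/3. After, path compression makes i/0 a child of h/3, alongside d/2, e/0, f/1, g/0, and j/1. The figure also annotates the path a, b, d, h with blocks and charges: h/3 is in Block 2 (BC), d/2 is in Block 1 (BC), b/1 is in Block 0 (BC), a/0 is in Block 0 (PC).]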
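Example (Continued)
FS(a)
[Figure: before, the path a/0 → b/1 → d/2 → h/3. After, path compression makes a/0 and b/1 children of h/3. The root h/3 now has children a/0, b/1, d/2, e/0, f/1, g/0, i/0, and j/1; d/2 has the single child c/0.]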
Properties of Ranks
Lemma 22.2:
(i) (∀x :: rank[x] ≤ rank[p[x]]).
(ii) (∀x : x ≠ p[x] : rank[x] < rank[p[x]]).
(iii) rank[x] is initially 0.
(iv) rank[x] does not decrease.
(v) Once x ≠ p[x] holds, rank[x] does not change.
(vi) rank[p[x]] is a monotonically increasing function of time.
Proof:
By induction on number of operations (see example).
Lemma 22.3
Lemma 22.3: For all tree roots x, size(x) ≥ 2^rank[x],
where size(x) = no. of nodes in tree rooted at x.
Proof:
Induction on number of Link operations
Basis:
Before first link, all ranks are 0 and each tree contains one node.
Step:
Consider Link(x,y).
Assume lemma holds before this operation.
We show it holds after.
2 cases.
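Case 1: rank[x] ≠ rank[y]
Assume rank[x] < rank[y].
[Figure: before Link(x,y), separate roots x and y with values rank(x), size(x) and rank(y), size(y); after Link(x,y), x is a child of y, with values rank′(x), size′(x) and rank′(y), size′(y). Primes denote values after the Link.]
Note: rank′(x) = rank(x)
      rank′(y) = rank(y)
size′(y) = size(x) + size(y)
         ≥ 2^rank(x) + 2^rank(y)
         ≥ 2^rank(y)
         = 2^rank′(y)
No ranks or sizes change for any nodes other than y.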
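Case 2: rank[x] = rank[y]
[Figure: before Link(x,y), separate roots x and y with values rank(x), size(x) and rank(y), size(y); after Link(x,y), x is a child of y, with values rank′(x), size′(x) and rank′(y), size′(y). Primes denote values after the Link.]
Note: rank′(x) = rank(x)
      rank′(y) = rank(y) + 1
size′(y) = size(x) + size(y)
         ≥ 2^rank(x) + 2^rank(y)
         = 2^(rank(y) + 1)
         = 2^rank′(y)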
Lemma 22.4
Lemma 22.4: For any integer r ≥ 0, there are at most n/2^r nodes of
rank r.
Proof:
Fix r. (r := 2 in example)
When rank r is assigned to some node x, label each node in the tree
rooted at x by ‘x’. (See example.)
By Lemma 22.3, ≥ 2^r nodes are labeled each time.
By Lemma 22.2, each node is labeled at most once, when its root is
first assigned rank r. (See example.)
If there were more than n/2^r nodes of rank r, then more than 2^r · (n/2^r)
= n nodes would be labeled by a node of rank r, a contradiction.
Corollary 22.5
Corollary 22.5: Every node has rank at most lg n.
Proof:
r > lg n ⇒ there are at most n/2^r < 1 nodes of rank r, i.e., none.
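Lemma 22.4 and Corollary 22.5 can be sanity-checked empirically. The Python sketch below runs random unions on a forest with union by rank and path compression and tallies how many nodes hold each rank; the function name `rank_histogram`, the random workload, and the fixed seed are assumptions of this illustration, not part of the proof.

```python
import random
from collections import Counter


def rank_histogram(n, num_unions, seed=0):
    """Return a Counter mapping rank r -> number of nodes with rank r."""
    rng = random.Random(seed)
    p = list(range(n))  # parent pointers; every node starts as a root
    rank = [0] * n

    def find_set(x):
        if p[x] != x:
            p[x] = find_set(p[x])  # path compression
        return p[x]

    for _ in range(num_unions):
        x = find_set(rng.randrange(n))
        y = find_set(rng.randrange(n))
        if x == y:
            continue  # already in the same set
        # Link(x, y) with union by rank
        if rank[x] > rank[y]:
            p[y] = x
        else:
            p[x] = y
            if rank[x] == rank[y]:
                rank[y] += 1
    return Counter(rank)
```

For n = 1024, every count in the returned histogram should be at most 1024 / 2^r for its rank r, and no rank should exceed lg 1024 = 10.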
Proving the Time Bound
Lemma 22.6: Suppose we convert a sequence S′ of m′ MS, U, and
FS operations into a sequence S of m MS, Link, and FS operations
by turning each Union into two FS operations followed by a Link.
Then, if sequence S runs in O(m lg*n) time, sequence S′ runs in
O(m′ lg*n) time.
Only have to consider MS, Link, FS operations.
Theorem 22.7
Theorem 22.7: A sequence of m MS, Link, and FS operations, n of which are MS
operations, can be performed in worst-case time O(m lg*n).
Proof:
MS and Link take O(1) time.
Key: Accurately charging FS.
Partition ranks into blocks.
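Put rank r into block lg*r for r = 0, 1, ..., ⌊lg n⌋ (this range of ranks follows from Corollary 22.5).
Highest-numbered block is lg*(lg n) = lg*n – 1.
Define:
B(j) = –1 if j = –1
B(j) = 1 if j = 0
B(j) = 2 if j = 1
B(j) = $2^{2^{\cdot^{\cdot^{2}}}}\big\}\,j-1$ if j ≥ 2 (j – 1 2s stacked above the base 2)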
Blocks
For j = 0, 1, ... , lg*n – 1,
Block j consists of the set of ranks
{B( j–1 ) + 1, B( j–1 ) + 2, … , B( j )}
B(–1) = –1
B(0) = 1
B(1) = 2
B(2) = 2^2 = 4
B(3) = 2^(2^2) = 2^4 = 16
B(4) = 2^(2^(2^2)) = 2^16 = 65536

Block   Ranks
0       0, 1
1       2
2       3, 4
3       5, … , 16
4       17, … , 65536
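The block boundaries can be generated mechanically. The Python sketch below assumes the equivalent recurrence B(j) = 2^B(j–1) for j ≥ 1, which reproduces the values B(1) = 2 through B(4) = 65536; the helper name `block_of` is hypothetical.

```python
def B(j):
    """Largest rank in block j: B(-1) = -1, B(0) = 1, B(j) = 2**B(j-1) for j >= 1."""
    if j < -1:
        raise ValueError("j must be >= -1")
    if j == -1:
        return -1
    if j == 0:
        return 1
    return 2 ** B(j - 1)


def block_of(r):
    """Index j of the block containing rank r, i.e. B(j-1) < r <= B(j).

    Intended for ranks 0 <= r <= 65536 (blocks 0 through 4).
    """
    if r < 0:
        raise ValueError("rank must be >= 0")
    j = 0
    while r > B(j):
        j += 1
    return j
```

Under these definitions, `block_of` sends ranks 0 and 1 to block 0, rank 2 to block 1, ranks 3 and 4 to block 2, ranks 5 through 16 to block 3, and ranks 17 through 65536 to block 4.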
Charging for Find-Sets
Two types of charges for FS: Block Charges and Path Charges.
Consider FS(x0)
[Figure: the find path x0, x1, ... , x_ℓ, from x0 up to the root x_ℓ. Charge each node as either a Block Charge or a Path Charge.]
For j = 0, 1, ... , lg*n – 1, assess one block charge to the last node
with rank in block j on the path x0, x1, ... , x_ℓ.
Also assess one block charge to the child of the root, i.e., x_{ℓ–1}.
Assess other nodes in x0, ... , x_ℓ a Path Charge. (See example.)
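The charging rule can be made concrete with a short Python sketch that labels each node on a find path as a block charge (BC) or path charge (PC), given only the ranks along the path. The names `block_of` and `assess_charges` are illustrative, and the block boundaries 1, 2, 4, 16, 65536 are hard-coded as an assumption to keep the example self-contained.

```python
def block_of(rank):
    """Block index of a rank, using upper boundaries 1, 2, 4, 16, 65536."""
    for j, bound in enumerate((1, 2, 4, 16, 65536)):
        if rank <= bound:
            return j
    raise ValueError("rank too large for this sketch")


def assess_charges(path_ranks):
    """Label the nodes x0, ..., xl on a find path (xl is the root).

    path_ranks[i] is rank[xi]. Returns a list of 'BC' / 'PC' labels.
    """
    last_in_block = {}
    for idx, r in enumerate(path_ranks):
        last_in_block[block_of(r)] = idx  # later nodes overwrite earlier ones
    block_charged = set(last_in_block.values())
    if len(path_ranks) >= 2:
        block_charged.add(len(path_ranks) - 2)  # the child of the root
    return ["BC" if i in block_charged else "PC"
            for i in range(len(path_ranks))]
```

For a path with ranks 0, 1, 2, 3, only the first node receives a path charge: the second is the last node in block 0, the third is the last in block 1 (and the child of the root), and the root is the last in block 2.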
Claim
Claim: Once a node other than a root or its child is assessed a B.C.,
it will never be assessed a P.C.
Proof of Claim:
rank[p[x]] – rank[x] is monotonically increasing.
So, lg*(rank[p[x]]) – lg*(rank[x]) is monotonically increasing.
Thus, once x and p[x] are in different blocks, they will always be in
different blocks.
Remaining Goal
Total cost of FS’s = Total B.C.’s + Total P.C.’s
Want to show:
Total B.C.’s + Total P.C.’s = O(m lg*n).
Bounding B.C.’s
This part is easy.
Block numbers range over 0, …, lg*n – 1.
⇒ ≤ lg*n + 1 B.C.’s per FS (one per block, plus one for the child of the root).
≤ m FS’s total
⇒ ≤ m (lg*n + 1) B.C.’s.
Bounding P.C.’s
Let N(j) = number of nodes whose ranks are in block j.
Claim: For all j ≥ 0, N(j) ≤ 3n / (2 B(j)).
Proof of Claim:
By Lemma 22.4, $N(j) \le \sum_{r=B(j-1)+1}^{B(j)} \frac{n}{2^{r}}$
For j = 0 :
N(0) ≤ n / 2^0 + n / 2^1
= 3n / 2
= 3n / (2 B(0))
Proof of Claim (Continued)
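For j ≥ 1:
$N(j) \le \frac{n}{2^{B(j-1)+1}} \sum_{r=0}^{B(j)-(B(j-1)+1)} \frac{1}{2^{r}}$
$< \frac{n}{2^{B(j-1)+1}} \sum_{r=0}^{\infty} \frac{1}{2^{r}}$
$= \frac{n}{2^{B(j-1)}}$
$= \frac{n}{B(j)}$ (since B(j) = 2^B(j–1) for j ≥ 1)
$\le \frac{3n}{2B(j)}$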
Bounding P.C.’s (Continued)
Let P(n) = overall number of path charges.
$P(n) \le \sum_{j=0}^{\lg^{*} n - 1}$ (max number of nodes with ranks in Block j) · (max number of path charges per node of Block j)
By claim, the first factor is upper-bounded by 3n / (2 B(j)).
If node x is assessed a P.C.:
[Figure: Path Compression on a find path through x; labels: rank r, rank > r; x gets new parent with increased rank.]
Note: Any node in Block j that is assessed a P.C. will be in Block j after all m operations.
Bounding P.C.’s (Continued)
So, every time x is assessed a P.C., it gets a new parent with increased
rank.
Note: x’s rank is not changed by Path Compression.
Suppose x has a rank in Block j.
Repeated P.C.’s to x will ultimately result in x’s parent having a rank in
a Block higher than j.
From that point onward, x is assessed B.C.’s, not P.C.’s.
Worst Case: x has lowest rank in Block j, i.e., B(j–1) + 1, and x’s
parents’ ranks successively take on the values
B(j–1) + 2, B(j–1) + 3, …, B(j).
Finally!
Hence, x can be assessed at most B(j) – B(j–1) – 1 P.C.’s.
Therefore,
$P(n) \le \sum_{j=0}^{\lg^{*} n - 1} \frac{3n}{2B(j)} \left( B(j) - B(j-1) - 1 \right)$
$\le \sum_{j=0}^{\lg^{*} n - 1} \frac{3n}{2B(j)} \, B(j)$
$= \frac{3}{2} \, n \lg^{*} n$
Thus, FS operations contribute O(m (lg*n + 1) + n lg*n) = O(m lg*n).
MS and Link contribute O(n).
Entire sequence takes O(m lg*n).