Data Structures for Disjoint Sets

Chapter 22 in CLR
Chapter 21 in CLRS (2nd and 3rd)
21.1 Disjoint-set operations
A disjoint-set data structure maintains a collection S = {S_1, S_2, …, S_k} of
disjoint dynamic sets. Each set is identified by a representative, which is
some member of the set. In some applications we will not be concerned with
the member that is used as the representative. All we care about is that if we
ask for the representative twice without modifying the set between the
requests, we get the same answer both times.
Note that this does not mean that the same set always gets the same
representative: the representative may depend on the sequence of operations
that built the set, not only on which members it ends up containing.
We will support the following operations:
1. MAKE-SET(x) creates a new set whose only member (and thus representative)
   is x. Since the sets are disjoint, x must not already be in any other set.
2. UNION(x,y) unites the dynamic sets that contain x and y, say S_x and S_y,
   into a new set that is the union of these two sets. The two sets are assumed
   to be disjoint prior to the operation. We “destroy” the sets S_x and S_y,
   removing them from the collection. The representative of the new set may be
   any of its members.
3. FIND-SET(x) returns a pointer to the representative of the (unique) set
   containing x.
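To make the semantics of the three operations concrete, here is a deliberately naive Python sketch. It is not one of the representations discussed in this chapter: the class name NaiveDisjointSets and the dictionary layout are assumptions made only for illustration, and UNION here takes time linear in the total number of elements.

```python
class NaiveDisjointSets:
    """Illustrative only: records, for every element, the representative of its set."""

    def __init__(self):
        self.rep = {}  # element -> representative of the set containing it

    def make_set(self, x):
        # The sets are disjoint, so x must not already belong to a set.
        if x in self.rep:
            raise ValueError(f"{x!r} is already in a set")
        self.rep[x] = x

    def find_set(self, x):
        return self.rep[x]

    def union(self, x, y):
        rx, ry = self.rep[x], self.rep[y]
        if rx == ry:
            return  # x and y are already in the same set
        # Relabel every member of x's set; the two old sets cease to exist.
        for elem in self.rep:
            if self.rep[elem] == rx:
                self.rep[elem] = ry


ds = NaiveDisjointSets()
for v in "abcd":
    ds.make_set(v)
ds.union("a", "b")
ds.union("c", "d")
print(ds.find_set("a") == ds.find_set("b"))  # True
print(ds.find_set("a") == ds.find_set("c"))  # False
```

Asking for the representative of "a" twice without an intervening UNION returns the same element both times, which is the only guarantee the definition requires.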
We shall analyze the running time in terms of two parameters: n, the number
of MAKE-SET operations, and m, the total number of operations. Since the
sets are disjoint, each UNION operation reduces the number of sets by one.
Therefore, the number of such operations is bounded by n-1. Note that we
have m≥n, and we assume that the n MAKE-SET operations are the first n
operations performed.
21.1 Disjoint-set operations - application
The operations we defined are well suited for determining the connected
components of an undirected graph.
First we construct sets of vertices that belong to the same connected
component:
CONNECTED-COMPONENTS(G)
    for each vertex v ∈ V[G]
        do MAKE-SET(v)
    for each edge (u,v) ∈ E[G]
        do if FIND-SET(u) ≠ FIND-SET(v)
            then UNION(u,v)
Then we can check for any two vertices if they belong to the same
component:
SAME-COMPONENT(u,v)
    if FIND-SET(u) = FIND-SET(v)
        then return TRUE
        else return FALSE
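The two procedures can be sketched in Python as follows. This is a minimal sketch, not a tuned implementation: the function names, the dictionaries rep and members, and the example graph are all assumptions introduced here, and the disjoint-set operations are inlined in their simplest form.

```python
def connected_components(vertices, edges):
    """Return a dict mapping each vertex to the representative of its component."""
    rep = {v: v for v in vertices}        # MAKE-SET(v) for every vertex
    members = {v: [v] for v in vertices}  # representative -> members of its set

    for u, v in edges:
        ru, rv = rep[u], rep[v]           # FIND-SET(u), FIND-SET(v)
        if ru != rv:                      # UNION(u, v): merge u's set into v's
            for w in members[ru]:
                rep[w] = rv
            members[rv].extend(members.pop(ru))
    return rep


def same_component(rep, u, v):
    """True if u and v have the same representative."""
    return rep[u] == rep[v]


# Hypothetical example graph with components {a, b, c}, {d, e} and {f}.
vertices = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
rep = connected_components(vertices, edges)
print(same_component(rep, "a", "c"))  # True
print(same_component(rep, "a", "d"))  # False
```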
21.2 Linked-list representation of disjoint sets
A simple way to implement a disjoint-set data structure is to represent each set by
a linked list. The first object in each list is the set’s representative. Each object in
the linked list contains a set member, a pointer to the object containing the next set
member, and a pointer back to the representative. Each list maintains pointers
head, to the representative, and tail, to the last object in the list. Within each list,
the objects may appear in any order (apart from the fact that the first object in the
list is the representative).
With this representation, both MAKE-SET and FIND-SET are easy, requiring O(1)
time.
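A minimal Python sketch of this representation is given here. One choice is an assumption of the sketch rather than something the text prescribes: the tail pointer is stored as a field of the representative object instead of in a separate list header. The names Node, make_set and find_set are likewise illustrative.

```python
class Node:
    """One list object: a set member, a next pointer, and a pointer to the representative."""

    def __init__(self, member):
        self.member = member
        self.next = None   # next object in the list
        self.rep = self    # pointer back to the representative (the head of the list)
        self.tail = self   # meaningful only on the representative: last object in the list


def make_set(member):
    """O(1): a one-object list whose only object is its own representative."""
    return Node(member)


def find_set(node):
    """O(1): follow the pointer back to the representative."""
    return node.rep


x = make_set("x")
print(find_set(x) is x)  # True
```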
The following figure is an example of two sets.
21.2 Linked-list representation of disjoint sets – union
The simplest implementation of UNION takes significantly more time than MAKE-SET
or FIND-SET. We perform UNION(x,y) by appending x’s list onto the end of y’s list,
using the tail pointer of y’s list to find quickly where to append. The
representative of the new set is the element that represented the set containing y.
Since we must update the pointer to the representative in every object of x’s list,
UNION takes time linear in the length of x’s list.
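The append step can be sketched in Python as below. This is a sketch under stated assumptions: the Node class and the choice to keep the tail pointer on the representative object are illustrative, and members is a hypothetical helper added only to display the result.

```python
class Node:
    def __init__(self, member):
        self.member = member
        self.next = None
        self.rep = self    # pointer back to the representative
        self.tail = self   # used only on the representative


def union(x, y):
    """Append the list containing x onto the list containing y; return the new representative."""
    rx, ry = x.rep, y.rep
    if rx is ry:
        return ry              # already the same set
    ry.tail.next = rx          # splice x's list after the tail of y's list
    ry.tail = rx.tail
    node = rx
    while node is not None:    # linear in the length of x's list
        node.rep = ry
        node = node.next
    return ry


def members(node):
    """Hypothetical helper: list the members of node's set in list order."""
    out, cur = [], node.rep
    while cur is not None:
        out.append(cur.member)
        cur = cur.next
    return out


a, b, c = Node("a"), Node("b"), Node("c")
union(a, b)        # {a} is appended onto {b}; b stays the representative
union(b, c)        # {b, a} is appended onto {c}; c becomes the representative
print(members(a))  # ['c', 'b', 'a']
```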
In fact, it is not difficult to produce a sequence of m operations on n objects that
requires Θ(n²) time, as the following example shows:
Suppose we have n objects x1,x2,x3,…,xn. We execute the sequence of n MAKE-SET
operations followed by n-1 UNION operations shown in the following figure, so
that m=2n-1. We spend Θ(n) time performing the n MAKE-SET operations. Because
the i-th UNION operation updates i objects, the n-1 UNION operations update
1 + 2 + … + (n-1) = n(n-1)/2 objects in total, so the total time is Θ(n²).
Since m = Θ(n), this also means that the average (amortized) time per operation
is Θ(n).
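The quadratic count can be checked with a short simulation. The sketch below assumes the UNION sequence is UNION(x1,x2), UNION(x2,x3), …, UNION(x(n-1),xn), which matches the statement that the i-th UNION updates i objects when x’s list is appended onto y’s. The function name and the plain Python lists standing in for linked lists are illustrative choices.

```python
def count_updates(n):
    """Count representative-pointer updates for UNION(x1,x2), ..., UNION(x(n-1),xn)."""
    rep = list(range(n))               # state after n MAKE-SET operations
    members = [[i] for i in range(n)]  # members[r]: the list headed by representative r
    updates = 0
    for i in range(n - 1):
        rx, ry = rep[i], rep[i + 1]    # append the list containing x_i onto x_(i+1)'s list
        for obj in members[rx]:
            rep[obj] = ry              # one pointer update per object in the appended list
            updates += 1
        members[ry].extend(members[rx])
        members[rx] = []
    return updates


for n in (10, 100, 1000):
    print(n, count_updates(n))  # 45, 4950 and 499500, i.e. n(n-1)/2
```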
21.2 Linked-list representation of disjoint sets – a weighted-union heuristic
The above implementation is easily improved by always appending the shorter list
onto the longer one, breaking ties arbitrarily. To do this, we store in each
representative an extra field holding the length of its list, which is easy to
maintain. With this simple weighted-union heuristic, a single UNION operation may
still require Ω(n) time (if both sets have Ω(n) members), yet a sequence of m
MAKE-SET, UNION, and FIND-SET operations, n of which are MAKE-SET operations,
takes only O(m + n log n) time.
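A possible Python sketch of the heuristic follows. As assumptions of this sketch, the length field and the tail pointer are both kept on the representative object, and weighted_union returns the number of pointer updates so that the effect can be observed; none of these names come from the text.

```python
class Node:
    def __init__(self, member):
        self.member = member
        self.next = None
        self.rep = self
        self.tail = self   # used only on the representative
        self.length = 1    # used only on the representative: length of its list


def weighted_union(x, y):
    """Append the shorter list onto the longer one; return the number of pointer updates."""
    rx, ry = x.rep, y.rep
    if rx is ry:
        return 0
    if rx.length > ry.length:
        rx, ry = ry, rx        # make rx the head of the shorter list
    ry.tail.next = rx
    ry.tail = rx.tail
    ry.length += rx.length
    updates = 0
    node = rx
    while node is not None:    # only the shorter list is relabelled
        node.rep = ry
        updates += 1
        node = node.next
    return updates


# The chain of unions that costs n(n-1)/2 updates without the heuristic
# now appends a one-object list each time.
n = 1000
xs = [Node(i) for i in range(n)]
total = sum(weighted_union(xs[i], xs[i + 1]) for i in range(n - 1))
print(total)  # 999
```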
Theorem: Using the linked-list representation of disjoint sets and the
weighted-union heuristic, a sequence of m MAKE-SET, UNION, and FIND-SET operations,
n of which are MAKE-SET operations, takes O(m + n log n) time.
Proof: There are n objects in total. For each object, we bound from above the number
of times its pointer to the representative can change. Consider an object x. Whenever
its pointer changed, x’s list was the one appended in a UNION operation, so it was no
longer than the other list, and the resulting set was at least twice as large as the
set that contained x before. Thus after the first change the resulting set had at
least 2 objects, after the second at least 4, and after the k-th at least 2^k. Since
no set has more than n objects, the pointer of x cannot have been updated more than
log n times (logarithm base 2).
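So the overall time spent updating pointers in the UNION operations is O(n log n);
apart from these updates, each UNION takes O(1) time. Since each MAKE-SET and
FIND-SET operation also takes O(1) time and there are m operations in total, the
O(m + n log n) bound for the entire sequence of operations is proven.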
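The per-object bound used in the proof can be observed empirically. The sketch below is an illustration and not part of the proof: it performs weighted unions on randomly chosen pairs until one set remains and reports the largest number of times any single object's representative pointer changed. The function name, the random choice of pairs, and the seed are assumptions of this sketch.

```python
import math
import random


def max_updates_per_object(n, seed=0):
    """Run random weighted unions until one set remains; return the max pointer changes per object."""
    rng = random.Random(seed)
    rep = list(range(n))
    members = [[i] for i in range(n)]
    changes = [0] * n                   # pointer changes seen by each object
    sets_left = n
    while sets_left > 1:
        rx, ry = rep[rng.randrange(n)], rep[rng.randrange(n)]
        if rx == ry:
            continue                    # same set: not a valid UNION, pick again
        if len(members[rx]) > len(members[ry]):
            rx, ry = ry, rx             # weighted union: relabel the shorter list
        for obj in members[rx]:
            rep[obj] = ry
            changes[obj] += 1
        members[ry].extend(members[rx])
        members[rx] = []
        sets_left -= 1
    return max(changes)


n = 1024
worst = max_updates_per_object(n)
print(worst, "<=", int(math.log2(n)))
```

Whatever pairs are chosen, the reported maximum cannot exceed log n, because each change at least doubles the size of the set containing the object.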