Based on J. Shewchuk's lecture on disjoint sets

Disjoint Sets

A disjoint sets data structure represents a collection of sets that are disjoint: that is, no item is found in more than one set. The collection of disjoint sets is called a partition, because the items are partitioned among the sets. Moreover, we work with a universe of items, made up of all of the items that can be a member of a set. Every item is a member of exactly one set. For example, suppose the items in our universe are nodes a, b, c, d, p, q, and r.

We will limit ourselves to two operations. The first is called a union operation, in which we merge two sets into one. The second is called a find query, in which we ask a question like, "What set does b belong to?" More generally, a "find" query takes an item and tells us which set it is in. We will not support operations that break a set up into two or more sets (not quickly, anyway). Data structures designed to support these operations are called partition or union/find data structures. Applications of union/find data structures include maze generation and Kruskal's algorithm for computing the minimum spanning tree of a graph.

Union/find data structures begin with every item in a separate set:

|a|b|c|d|p|q|r|

The query "find(q)" returns "q". Suppose we take the union of q and p, and call the resulting set p. The query "find(q)" now returns "p".

|a|b|c|d|pq|r|

Similarly, if (1) a, b, c, and d are united, and the resulting set is called a; and (2) r is united with the set p, we have a partition that looks like the following:

|abcd|pqr|

List-Based Disjoint Sets and the Quick-Find Algorithm

The obvious data structure for disjoint sets looks like this:

- Each set references a list of the items in that set.
- Each item references the set that contains it.

With this data structure, find operations take O(1) time; hence, we say that list-based disjoint sets use the quick-find algorithm.
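The notes do not reproduce code for the list-based structure, but it can be sketched as follows. This is an illustrative sketch only; the class and method names are assumptions, not from the original lecture. Each item maps to a set label, and each label maps to a list of its members.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A sketch of list-based disjoint sets (quick-find). Names are illustrative.
class QuickFind {
    private Map<String, String> setOf = new HashMap<>();          // item -> set label
    private Map<String, List<String>> members = new HashMap<>();  // set label -> items

    QuickFind(String... items) {
        for (String item : items) {
            setOf.put(item, item);                 // each item starts in its own set
            List<String> list = new ArrayList<>();
            list.add(item);
            members.put(item, list);
        }
    }

    // O(1): just look up the set label.
    String find(String item) {
        return setOf.get(item);
    }

    // Slow: walk through one set and relabel all its items.
    void union(String a, String b) {
        String keep = find(a), absorb = find(b);
        if (keep.equals(absorb)) return;
        for (String item : members.get(absorb)) {
            setOf.put(item, keep);                 // relabel each absorbed item
        }
        members.get(keep).addAll(members.remove(absorb));
    }
}
```

With this sketch, uniting q's set into p's set makes find("q") return "p", matching the example above, while each union pays a cost proportional to the size of the absorbed set.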
However, union operations are slow, because when two sets are united, we must walk through one set and relabel all the items so that they reference the other set.

Tree-Based Disjoint Sets and the Quick-Union Algorithm

In tree-based disjoint sets, union operations take O(1) time, but find operations are slower. However, for any sequence of union and find operations, the quick-union algorithm is faster overall than the quick-find algorithm.

To support fast unions, each set is stored as a general tree. The quick-union data structure comprises a forest (a collection of trees), in which each item is initially the root of its own tree; then trees are merged by union operations. The quick-union data structure is simpler than the general tree structures you have studied so far, because there are no child or sibling references. Every node knows only its parent, and you can only walk up the tree. The true identity of each set is recorded at its root.

Union is a simple O(1)-time operation: we simply make the root of one set become a child of the root of the other set. For example, when we form the union of |pq| and |r|, set p becomes a set containing three members.

However, finding the set to which a given item belongs is not a constant-time operation. The find operation is performed by following the chain of parent references from an item to the root of its tree. For example, find(r) will follow the path of references until it reaches p. The cost of this operation is proportional to the item's depth in the tree.

These are the basic union and find algorithms, but we'll consider two optimizations that make finds faster. One strategy, called union-by-size, helps the union operation build shorter trees. The second strategy, called path compression, gives the find operation the power to shorten trees.

Union-by-size is a strategy to keep items from getting too deep by uniting sets intelligently. At each root, we record the size of its tree (i.e., the number of nodes in the tree). When we unite two trees, we make the smaller tree a subtree of the larger one (breaking ties arbitrarily).

Implementing Quick-Union with an Array

Suppose the items are non-negative integers, numbered from zero. We'll use an array to record the parent of each item. If an item has no parent, we'll record the size of its tree instead. To distinguish a size from a parent reference, we record the size s as the negative number -s. Initially, every item is the root of its own tree, so we set every array element to -1. The forest illustrated at left below is represented by the array at right. This is a slightly inelegant way to implement tree-based disjoint sets, but it's fast (in terms of the constant hidden in the asymptotic notation).

Let root1 and root2 be two items that are roots of their respective trees. Here is code for the union operation with the union-by-size strategy.

Path Compression

The find() method is equally simple, but we need one more trick to obtain the best possible speed. Suppose a sequence of union operations creates a tall tree, and we perform find() repeatedly on its deepest leaf. Each time we perform find(), we walk up the tree from leaf to root, perhaps at considerable expense. When we perform find() the first time, why not move the leaf up the tree so that it becomes a child of the root? That way, the next time we perform find() on the same leaf, it will run much more quickly. Furthermore, why not do the same for every node we encounter as we walk up to the root? In the example above, find(7) walks up the tree from 7, discovers that 0 is the root, and then makes 0 the parent of 4 and 7, so that future find operations on 4, 7, or their descendants will be faster. This technique is called path compression.

Let x be an item whose set we wish to identify. Here is code for find, which returns the identity of the item at the root of the tree. Recall that items are numbered starting from zero.
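The code blocks referred to above are not reproduced in this copy of the notes; the following is a sketch consistent with the array convention just described (a negative entry at a root stores the tree's size, negated). The class name is an assumption.

```java
// A sketch of array-based quick-union with union-by-size and path compression.
// Convention from the notes: array[i] is i's parent, or -(size of i's tree)
// if i is a root.
class UnionFind {
    private int[] array;

    UnionFind(int numItems) {
        array = new int[numItems];
        java.util.Arrays.fill(array, -1);  // every item starts as a root of size 1
    }

    // Unite the trees rooted at root1 and root2 (union-by-size).
    // Sizes are stored negated, so "more negative" means "larger tree".
    void union(int root1, int root2) {
        if (array[root2] < array[root1]) {  // root2's tree is larger
            array[root2] += array[root1];   // update root2's (negated) size
            array[root1] = root2;           // root1 becomes a child of root2
        } else {
            array[root1] += array[root2];
            array[root2] = root1;
        }
    }

    // Return the root of x's tree, compressing the path as we go:
    // every node on the path from x to the root becomes a child of the root.
    int find(int x) {
        if (array[x] < 0) {
            return x;                       // x is a root
        }
        array[x] = find(array[x]);          // recursively find root; reparent x
        return array[x];
    }
}
```

Note that union must be called on roots; to unite the sets containing arbitrary items i and j, call union(find(i), find(j)).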