Deterministic Graph-Clustering in External-Memory with Applications to Breadth-First Search⋆
Andreas Beckmann and Ulrich Meyer
Institute for Computer Science,
Goethe University,
60325 Frankfurt am Main, Germany
{beckmann,umeyer}@cs.uni-frankfurt.de
Abstract. Given an integer parameter µ > 1 and an undirected connected graph G with n vertices in external memory, we would like to cluster the vertices of G as follows: (1) there are O(n/µ) clusters, (2) each
cluster comprises Ω(µ) vertices, and (3) for any two vertices u and v
belonging to the same cluster, their distance in G is O(µ). Each vertex
must be assigned to exactly one cluster. If an arbitrary spanning tree T
for G is provided, a recent approach [14] based on Euler tours computes
such a clustering with O(sort(n)) I/Os; however, due to randomization
(2) only holds in expectation.
We provide a simple deterministic top-down clustering approach that
achieves the same I/O-bound as the randomized Euler tour method. Our result also implies the first deterministic dynamic breadth-first search (BFS) algorithm with o(n/√B) amortized update cost in I/Os for sparse undirected graphs in external memory. Furthermore, we provide an external-memory graph class where our new clustering method facilitates deterministic static BFS to use a factor of nearly Θ(√B / log log B) less I/O than the best previous approach, where B is the block size.
We also consider a deterministic bottom-up clustering method which is
hierarchical and supports fast access to clusters for varying values of µ
without the need of reconstruction.
1 Introduction
Graph Clustering is typically defined as finding sets of “related” vertices in
graphs [18]. Obviously, the topic is extremely broad. A recent overview by Schaeffer [18] sheds some light on the various aspects of clustering. In this paper we
consider a specific distance-based type of clustering with a lower bound on the
number of vertices per cluster on a computation model for data sets too large
to fit in the computer’s main memory.
⋆ Partially supported by the DFG grant ME 3250/1-2, and by MADALGO – Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation.
1.1 Computation model
We consider the commonly accepted external-memory (EM) model of Aggarwal
and Vitter [1]. It assumes a two level memory hierarchy with faster internal memory having a capacity to store M data items (e.g., vertices or edges of a graph)
and a slow infinite disk. In an I/O operation, one block of data, which can store B
consecutive items, is transferred between disk and internal memory. The measure
of performance of an algorithm is the number of I/Os it performs. The number
of I/Os needed to read N contiguous items from disk is scan(N ) = Θ(N/B). The
number of I/Os required to sort N items is sort(N ) = Θ((N/B) logM/B (N/B)).
For all realistic values of N, B, and M, scan(N) < sort(N) ≪ N.
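For intuition, here is a toy computation of these quantities with illustrative parameter values (the concrete numbers N, B, M below are our own assumptions, not taken from the paper):

    import math

    # Illustrative (assumed) parameters: 10^9 items, blocks of 10^6 items,
    # main memory holding 10^8 items.
    N, B, M = 10**9, 10**6, 10**8

    scan_N = N / B                             # Theta(N/B)
    sort_N = (N / B) * math.log(N / B, M / B)  # Theta((N/B) log_{M/B}(N/B))

    print(scan_N, sort_N)  # 1000.0 vs. 1500.0 -- both far below N = 10^9

So scan(N) < sort(N) ≪ N indeed holds comfortably for such parameters.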
1.2 Clustering problem
Given an integer parameter µ > 1 and an undirected connected graph G(V, E)
with |V | = n vertices and |E| = m edges where n + m > M , we would like
to group the vertices of G in a way that each vertex is assigned to exactly one
cluster. Additionally, we require that: (1) there are O(n/µ) clusters, (2) each
cluster comprises Ω(µ) vertices, and (3) for any two vertices u and v belonging
to the same cluster, their distance in G is O(µ); here the distance denotes the
number of edges on a shortest path between u and v. A clustering with these properties – in particular (2) – is central to a recent external-memory algorithm for both static and dynamic breadth-first search [14]; a short review of this algorithm is given in the appendix.
We aim to produce such a clustering deterministically using close to O(sort(n+
m)) I/Os. Ideally, after some precomputation, it should be possible to report the
cluster of an arbitrary vertex v for varying values of µ in O(1) I/Os without explicit re-clustering or space blow-up.
1.3 Previous work
A large number of results have appeared in the area of external-memory computing. Vitter provides the most recent and extensive overview [19]. Concerning our
concrete clustering problem, we are not aware of any I/O-efficient deterministic
solution. Previous clustering methods typically rely on an arbitrary spanning
tree T of the graph G. In case T is not part of the input, it can be obtained
using O((1 + log log (B · n/m)) · sort(n + m)) I/Os [5].
In the Euler tour based clustering approach of [13] each undirected edge of T
is first replaced by two oppositely directed edges. Then, in order to construct the
Euler tour around this bi-directed tree, each node chooses a cyclic order [6] of its
neighbors. The successor of an incoming edge is defined to be the outgoing edge
to the next node in the cyclic order. The tour is then broken at a special node
(say the root of the tree) and the elements of the resulting list are then stored
in consecutive order using an external memory list-ranking algorithm; Chiang
et al. [9] showed how to do this in sorting complexity. Thereafter, the Euler tour
is chopped into clusters of µ nodes and duplicates are removed such that each
node only remains in the first cluster it originally occurs; again this requires
a couple of sorting steps. By construction, the distance in G between any two
vertices belonging to the same cluster is bounded by µ − 1. Unfortunately, this
method could produce very unbalanced clusters: in fact Ω(n/µ) clusters may
contain only a single vertex each.
A recent randomized approach [14] repairs this deficiency as follows: Each
vertex v in the spanning tree Ts is assigned an independent binary random
number r(v) with P[r(v) = 0] = P[r(v) = 1] = 1/2. When removing duplicates
from the Euler tour, instead of always storing v in the cluster related to the chunk with the first occurrence of v, we keep its first occurrence only if r(v) = 0 and otherwise (r(v) = 1) store v in the cluster that corresponds to the last chunk of the Euler tour in which v appears. For leaf nodes v, there is only
one occurrence on the tour, hence the value of r(v) is irrelevant. Obviously, each
vertex is assigned to a cluster only once. For chunk size µ > 1 and each but the
last chunk, it was shown that the expected number of kept vertices is at least
µ/8. It is unclear how this Euler tour based approach could be de-randomized.
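The two duplicate-removal policies are easy to contrast in a small in-memory sketch (our own illustration; the helper names are hypothetical, and the real algorithms of [13, 14] perform these steps I/O-efficiently via sorting rather than with dictionaries):

    import random

    def euler_tour_clustering(tour, mu, randomized=False, seed=0):
        """Chop an Euler tour (a list of vertices) into chunks of mu entries.
        Deterministic rule: keep each vertex in the chunk of its first
        occurrence. Randomized rule of [14]: a fair coin r(v) picks the
        first (r(v) = 0) or last (r(v) = 1) occurrence instead."""
        rng = random.Random(seed)
        first, last = {}, {}
        for pos, v in enumerate(tour):
            chunk = pos // mu
            first.setdefault(v, chunk)   # remember the first chunk v occurs in
            last[v] = chunk              # and keep updating the last one
        return {v: last[v] if randomized and rng.random() < 0.5 else first[v]
                for v in first}

With randomized=False this reproduces the unbalanced behavior described above; with randomized=True each vertex keeps its first or last occurrence with probability 1/2 each.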
In order to deal with varying values of µ, hierarchical variants of the deterministic Euler tour based clustering from [13] have been considered in the context of
cache-oblivious [10] breadth-first search (BFS) [7], cache-oblivious single-source
shortest-paths (SSSP) [3], and cache-aware (external-memory) SSSP [15, 16].
Unfortunately, as all these extensions are based on the Euler tour approach they
also inherit its deficiency concerning underfilled clusters.
1.4 New results
In Section 2 we provide a simple deterministic top-down clustering approach
that meets the requirements (1)-(3) listed in Section 1.2: for an undirected
connected graph G(V, E) and fixed integer parameter 1 < µ < n it takes
O((1 + log log (B · |V |/|E|)) · sort(|V | + |E|)) I/Os. If an arbitrary spanning tree
for G is provided then the clustering can be obtained by only considering this
tree using O(sort(|V |)) I/Os.
Our result also implies the first deterministic dynamic BFS algorithm with o(|V|/√B) amortized update cost in I/Os for sparse undirected graphs in external memory. Furthermore, we provide a graph class where our new clustering method facilitates deterministic static BFS to use a factor of nearly Θ(√B / log log B) less I/O than the best previous approach, where B is the block size.
In Section 3 we also consider a deterministic bottom-up clustering method
which is hierarchical and supports fast access to clusters for varying values of µ
without the need of reconstruction.
2 Top-Down Clustering
Our deterministic clustering approach uses an arbitrary spanning tree Ts for G,
rooted at an arbitrary vertex s. Before we dive into details let us review a few
known concepts.
For rooted trees with n vertices it has been shown [8] that the following
computations can be carried out using O(sort(n)) I/Os: directing the tree edges
(identifying parent and child), computing BFS levels, and computing BFS numbers. Furthermore, Euler tour techniques can be applied to compute the number
of descendants of each tree vertex [11, 12]: assign weight zero to edges leading
from parent to child, whereas edges from child to parent receive weight one.
Then for a tree vertex v (excluding the root) its subtree size (including v itself)
is given by the difference between the prefix sums of the edges (v, parent(v)) and
(parent(v), v) on the Euler tour.
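As a sanity check, the subtree-size computation can be mirrored in main memory as follows (a toy sketch under our own naming; the EM version obtains the prefix sums with list ranking, sorting and scanning instead of recursion):

    def subtree_sizes_via_euler(children, root):
        """Weight 0 on parent->child edges, 1 on child->parent edges; the
        subtree size of v equals the difference of the Euler tour prefix
        sums at (v, parent(v)) and (parent(v), v)."""
        prefix, total = {}, 0

        def tour(u):
            nonlocal total
            for c in children.get(u, ()):
                prefix[(u, c)] = total  # after the 0-weight edge u -> c
                tour(c)
                total += 1              # the edge c -> u carries weight 1
                prefix[(c, u)] = total

        tour(root)
        sizes = {root: total + 1}       # total = n - 1 edges, root subtree = n

        def collect(u):
            for c in children.get(u, ()):
                sizes[c] = prefix[(c, u)] - prefix[(u, c)]
                collect(c)

        collect(root)
        return sizes

For children = {0: [1, 2], 1: [3]} this yields {0: 4, 1: 2, 3: 1, 2: 1}, as expected.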
Time-forward processing (TFP) [12] is a technique to traverse a directed
acyclic graph from its sources to its sinks. Initially a topological ordering of the
nodes is chosen for their visit, assigning them unique consecutive “time stamps”.
Information that needs to be transmitted to nodes visited later is stored as
messages in a priority queue using the recipient’s time stamp as priority and is
retrieved during processing of the recipient node.
In order to compute our deterministic clustering we shall apply time-forward processing on the edges of Ts, using the BFS numbers with respect to source node s as time stamps. There is a global counter next_cluster_id for the next cluster number to be assigned, initialized with zero. Two kinds of messages can be sent from a node to its children: grow(cluster_id, credit) and finish(cluster_id). The process is initialized by sending grow(next_cluster_id++, µ − 1) to the root node s.
When processing a received grow(cluster_id, credit) signal at node v we first record a tuple (cluster_id, v) denoting that v belongs to the cluster with number cluster_id. If the credit is positive then a grow(cluster_id, credit − 1) signal is issued for every child of v (if any). Otherwise (i.e., credit zero), for each child w of v we do the following: if the subtree size of w is at least µ then a grow(next_cluster_id++, µ − 1) signal is issued for w; if the subtree size of w is less than µ then a finish(cluster_id) signal is sent for w.
A received finish(cluster_id) signal at node v is forwarded to all children of v (if any) and the tuple (cluster_id, v) is recorded to remember that v belongs to the cluster with number cluster_id.
Once the time-forward processing comes to an end, the recorded tuples can
be sorted in order to group the nodes according to their clusters.
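The signal flow translates almost literally into code. The following in-memory sketch (our own rendering; a real external-memory implementation would route the signals through an I/O-efficient priority queue such as a buffer tree [4]) assumes the precomputed children lists, subtree sizes and BFS numbers as inputs:

    import heapq

    def top_down_clustering(children, subtree_size, bfs_num, root, mu):
        """Process grow/finish signals in BFS-number order (time-forward
        processing) and return the cluster id of every vertex."""
        GROW, FINISH = 0, 1
        next_cluster_id = 1                       # cluster 0 goes to the root
        cluster = {}
        pq = [(bfs_num[root], root, GROW, 0, mu - 1)]
        while pq:
            _, v, kind, cid, credit = heapq.heappop(pq)
            cluster[v] = cid                      # record tuple (cluster_id, v)
            for w in children.get(v, ()):
                if kind == FINISH:                # forward finish to all children
                    heapq.heappush(pq, (bfs_num[w], w, FINISH, cid, 0))
                elif credit > 0:                  # keep growing the same cluster
                    heapq.heappush(pq, (bfs_num[w], w, GROW, cid, credit - 1))
                elif subtree_size[w] >= mu:       # start a new cluster at w
                    heapq.heappush(pq, (bfs_num[w], w, GROW, next_cluster_id, mu - 1))
                    next_cluster_id += 1
                else:                             # small subtree: attach via finish
                    heapq.heappush(pq, (bfs_num[w], w, FINISH, cid, 0))
        return cluster

Sorting the returned (cluster_id, v) pairs by cluster id then groups the vertices as described.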
Lemma 1. For an undirected connected graph G with n vertices and a given spanning tree Ts, the Top-Down Clustering with parameter 1 < µ < n assigns each
vertex to exactly one cluster such that (1) there are O(n/µ) clusters, (2) each
cluster comprises Ω(µ) vertices, and (3) for any two vertices u and v belonging
to the same cluster, their distance in G is O(µ). The clustering can be obtained
using O(sort(n)) I/Os.
Proof. Observe that the nodes in a cluster produced by the approach logically form a connected subtree of Ts: starting in a subtree node v with grow(next_cluster_id++, µ − 1), the cluster for v keeps on expanding via the respective children, grandchildren etc. until either distance level µ − 1 from v has been reached (credit zero) or all of v's descendants have been visited. Since new clusters are only started at nodes v with subtree size at least µ, in both cases the cluster already comprises Ω(µ) vertices. On the other hand, the distance between any two vertices in the cluster is bounded by 2 · (µ − 1) so far.
The purpose of the finish signals is to attach small subtrees (having at most
µ − 1 vertices which would be too small to form own clusters) at the leaves of
the clusters created by grow . This can increase the diameter of those clusters by
at most another 2 · (1 + (µ − 2)) = O(µ).
By now we have shown (2) and (3). Finally, observe that each node is visited
exactly once during the time-forward processing which is why it is assigned to
exactly one cluster. Together with (2) this implies (1) as well. Let us turn to the
I/O bound:
All precomputation steps on Ts (rooting, BFS levels, BFS numbers, and subtree sizes) take O(sort(n)) I/Os. After that the adjacency lists of Ts can be modified in a way that each original node index is augmented by a tuple consisting of its BFS number and its subtree size. Sorting the augmented adjacency lists according to the BFS numbers subsequently allows us to provide all children information required during the time-forward processing by just scanning the augmented adjacency lists. Other than that, the time-forward processing requires O(n) insert/delete-min operations on an I/O-efficient priority queue for the implementation of sending and receiving signals; using buffer trees [4] this amounts to another O(sort(n)) I/Os. ⊓⊔
In the description of the top-down clustering above the clusters only consist
of spanning tree vertices without any edges. In typical applications like BFS
the clusters should also comprise the complete adjacency lists of the underlying graph vertices and store all these data items consecutively. This is easily
accomplished using a constant number of sorting and scanning steps [8, 13].
2.1 Applications of the Top-Down Clustering
In this section we discuss two applications of the top-down deterministic clustering introduced in Section 2. The obvious application is of course the substitution of the randomized Euler tour based clustering (Section 1.3) in external-memory dynamic BFS [14]. As the clustering is the only part within dynamic BFS that previously required randomization (see the appendix), equivalent worst-case bounds can now be achieved deterministically. In fact, obtaining a spanning tree deterministically is still slightly more costly than randomized with high probability. However, the same spanning tree can be reused for changing values of the parameter µ, which is why the I/O cost for its construction is dominated by other larger terms:
Corollary 1. For general sparse undirected graphs of initially n nodes and O(n) edges and Θ(n) edge insertions, dynamic BFS can be deterministically solved using amortized O(n/B^{2/3} + sort(n) · log B) I/Os per update.
The second application area is static external-memory BFS. The MM BFS algorithm [13] (also reviewed in some more detail in the appendix) loads a cluster with its adjacency lists as soon as at least one vertex of this cluster belongs to the current BFS level. We demonstrate that there are graph classes and side conditions where the deterministic top-down clustering applied within MM BFS leads to considerably better performance than the deterministic Euler tour based clustering.

Fig. 1. Euler tour based clustering of T(k, l, µ) with µ = 6.
Consider the following tree graph class T (k, l, µ): it consists of a root node s
to which k chains each consisting of l · µ − 1 nodes are attached, i.e. n = k · (l ·
µ − 1) + 1. The deterministic Euler tour based clustering approach started in s
will produce l clusters on the first chain including s, each fully equipped with
µ nodes. However, as the Euler tour oscillates between the end vertices of the
chains and the root vertex s, whenever the tour enters a new chain an offset of
two is encountered for the cluster layout, see Figure 1. Hence, for BFS started
in node s, immediately the first cluster is loaded (due to s at BFS level 0). Then
another k − 1 clusters are loaded for level 1. Subsequently, at least every other level Θ(k/µ) new clusters need to be loaded (if µ is odd then for every BFS level i ≥ 2 MM BFS needs to load at least ⌊k/µ⌋ new clusters). We would like
to argue that for each cluster loaded, at least one I/O is needed. This argument
could be spoiled if several clusters that need to be fetched for the same BFS
level are actually co-located in a common disk block, especially if these clusters
happen to be much smaller than µ. Of course, due to the offset, the sizes of
the clusters (including adjacency lists) intersecting BFS levels 0, . . . , µ − 1 and
(l − 1) · µ, . . . , l · µ − 1 can range between Θ(1) and Θ(µ); and Θ(µ + k) for the
cluster containing source node s. However, all clusters “in the middle” of the
chains are of size Θ(µ), and the clusters for chain i are stored before those for
chain i + 1 etc. Thus, there is a choice for l with (l − 2) · µ = Ω(B) which makes
sure that the clusters loaded for the same BFS level all concern different disk
blocks and therefore cause at least one I/O each.
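For concreteness, T(k, l, µ) can be generated with a few lines (a hypothetical helper of ours, only meant to make the construction explicit):

    def make_T(k, l, mu):
        """Root 0 with k attached chains of l*mu - 1 vertices each,
        so n = k*(l*mu - 1) + 1. Returns an adjacency dict."""
        adj = {0: []}
        nxt = 1
        for _ in range(k):
            prev = 0                        # every chain starts at the root
            for _ in range(l * mu - 1):
                v, nxt = nxt, nxt + 1
                adj.setdefault(prev, []).append(v)
                adj[v] = [prev]
                prev = v
        return adj

For example, len(make_T(3, 2, 6)) == 3 * (2*6 - 1) + 1 == 34.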
For sparse graphs like T(k, l, µ), MM BFS chooses µ = √B in order to balance the worst-case Θ(n/µ) I/Os for loading clusters against the worst-case Θ(n · µ/B) I/Os to keep loaded adjacency lists in an external-memory sorted list (hot pool) until they are actually needed. We will focus on settings where the main memory size M is just big enough to keep the hot pool completely in main memory, i.e. k · µ = Θ(M), thus k = Θ(M/√B) = Ω(√B): as a result, there is no I/O cost for maintaining the adjacency lists except for loading them once with their clusters. On the other hand, there is also no main-memory capacity left to keep the data of extra clusters that are read anyway for future reference (like for example in the optimization presented in [2]). All other I/O costs of MM BFS are bounded by O((log log B) · sort(n)). From the discussion above we conclude:

Lemma 2. For T(k, l, √B) there are choices k = Θ(M/√B) and l = Θ(n/M) such that MM BFS with source node s and the deterministic Euler tour based clustering takes Θ(n/√B) + O((log log B) · sort(n)) I/Os.
Now we only have to argue that MM BFS for the same graph class and side conditions as above performs much better when the Euler tour based clustering is replaced by the new top-down clustering: the key difference is that in the top-down approach the clustering does not suffer from offsets. The cluster containing s comprises the complete BFS levels 0, . . . , µ − 1 = √B − 1 of T(k, l, √B) concerning source node s. All other clusters contain contiguous subchains of µ = √B vertices each. Furthermore, all these chain clusters are aligned, i.e. they comprise the adjacency lists for vertices at BFS levels i · √B, . . . , (i + 1) · √B − 1, for all 1 ≤ i < l. And even better, all cluster IDs for vertices of a common BFS level are consecutive. As a consequence, all the k = Θ(M/√B) clusters to be loaded at BFS levels i · √B for all i = 1, . . . , l − 1 can be accessed using just Θ(M/B) I/Os on each of these levels. Hence, besides the other O((log log B) · sort(n)) I/O costs of MM BFS we only face O(1 + k/B + l · (M/B)) I/Os to fetch the clusters. The latter bound amounts to O(n/B) I/Os with n = k · (l · √B − 1) + 1, k = Θ(M/√B), and l = Θ(n/M).
Lemma 3. If the Euler tour based clustering in Lemma 2 is replaced by top-down clustering, the I/O performance improves to O((log log B) · sort(n)).
3 Hierarchical Bottom-Up Clustering
Recent external-memory graph algorithms like the dynamic BFS approach in [14] apply several clusterings for different values of the parameter µ (say µ = 2^1, 2^2, 2^3, . . . , √B) for
the same input graph. Furthermore, these clusterings could often be re-used for
a number of subsequent iterations (especially for incremental dynamic BFS on
already connected graphs). Unfortunately, in the external-memory setting space
is by definition a scarce resource and hence chances are that it is practically
impossible to keep even a few different clusterings at the same time. On the
other hand, even though it may not harm the theoretical worst-case bounds,
recomputing the same clusterings over and over again could actually become the
dominating part of the I/O numbers we see in practice: for example in improved
static BFS of [2, 13], the preprocessing with graph clustering frequently takes
about as much time or even more than the actual BFS phase, although the latter
comes with a significantly higher asymptotic I/O-bound than the preprocessing
in the worst case.
The high-level idea for our hierarchical clustering is rather easy: interpret the bit representation ⟨b_r, . . . , b_{q+1}, b_q, . . . , b_1⟩ of the graph vertex numbers as a combination of prefix ⟨b_r, . . . , b_{q+1}⟩ and suffix ⟨b_q, . . . , b_1⟩. Different prefixes denote different clusters, and for a concrete prefix (cluster) its suffixes denote the vertices within this cluster. Depending on the choice of q we get the whole spectrum between few large clusters (q big) and many small clusters (q small). In particular we would like the following to hold: for any 1 ≤ µ = 2^q ≤ √B, (1) there are ⌈n/µ⌉ clusters, (2) each cluster comprises µ vertices (one cluster may have fewer vertices), and (3) for any two vertices u and v belonging to the same cluster, their distance in G is O(µ). In order to make this work the original vertex numbers will have to be carefully re-chosen. Additionally, a look-up table is built that allows us to find the sequence of disk blocks for the adjacency lists associated with a concrete prefix (cluster ID) using O(1) I/Os.
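In code, cluster membership then reduces to bit arithmetic on the re-chosen vertex numbers, for every admissible q at once (a minimal sketch under our own naming):

    def cluster_id(v, q):
        # prefix <b_r, ..., b_{q+1}>: the cluster of vertex v for mu = 2**q
        return v >> q

    def cluster_range(cid, q):
        # first and last vertex number of cluster cid for mu = 2**q
        return cid << q, ((cid + 1) << q) - 1

Halving µ (i.e. decrementing q) splits every cluster into exactly two, which is what makes the scheme hierarchical.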
Fig. 2. Sibling-Merge
Fig. 3. Parent-Merge
Fig. 4. Complex merge example
In order to group close-by vertices into clusters (such that an appropriate renumbering can take place) we start with the spanning tree Ts^0 rooted at source vertex s. Then we work in p = ⌈log √B⌉ phases, each of which transforms the current tree Ts^j into a new tree Ts^{j+1} having ⌈|Ts^j|/2⌉ vertices. (The external-memory BFS algorithms considered here only use clusters of up to √B vertices, so this construction is stopped after p phases. The hierarchical clustering approach itself imposes no limitations and can be applied for up to ⌈log n⌉ phases for other applications.) The tree shrinking is done using time-forward processing from the leaves toward the root (for example by using negated BFS numbers for the vertices of Ts^j). Consider the remaining leaves v1, . . . , vk with highest BFS numbers and common parent vertex u in Ts^j. If k is even then v1 and v2 will form a cluster (and hence a vertex in Ts^{j+1}), v3 and v4 will be combined, v5 and v6, etc. (sibling-merge, see Figure 2). In case k is odd, v1 will be combined with u (parent-merge, Figure 3) and (if k ≥ 3) v2 with v3, v4 with v5, etc. Merged vertices are removed from Ts^j and therefore any vertex is a leaf at the time it is reached by TFP; e.g. node w1 shown in Figure 4 was already consumed by vertex w2, so it is no longer available at the time v1, v2 get processed. Thus, each vertex of Ts^{j+1} is created out of exactly two vertices from Ts^j, except for the root which may consist only of the root of Ts^j. Note that the original graph vertices kept in a cluster are not necessarily direct neighbors but they do have short paths connecting them in the original graph. The following lemma makes this more formal:
Lemma 4. The vertices of Ts^j form clusters in the original graph having size size^(j) = 2^j (excluding the root vertex, which may be smaller), maximum depth depth^(j) = 2^j − 1, and maximum diameter diam^(j) = 2^{j+1} − 2.
Proof. The clusters defined by Ts^0 consist of exactly one vertex with diam^(0) = 0 and depth^(0) = 0. For j > 0, three ways of merging a vertex v1 have to be considered (with a sibling ({v1, v2}), with the parent ({v1, u}), or not merged at all ({v1})), resulting in

    depth^(j) = max{ depth^(j)({v1, v2}), depth^(j)({v1, u}), depth^(j)({v1}) }
              = max{ max{ depth^(j−1)(v1), depth^(j−1)(v2) }, depth^(j−1)(v1) + 1 + depth^(j−1)(u), depth^(j−1)(v1) }
              = 2 · depth^(j−1) + 1 = Σ_{i=0}^{j−1} 2^i = 2^j − 1

    diam^(j) = max{ diam^(j)({v1, v2}), diam^(j)({v1, u}), diam^(j)({v1}) }
             = max{ depth^(j−1)(v1) + 1 + diam^(j−1)(u) + 1 + depth^(j−1)(v2), depth^(j−1)(v1) + 1 + diam^(j−1)(u), diam^(j−1)(v1) }
             = 2 · depth^(j−1) + 2 + diam^(j−1) = 2^j + diam^(j−1) = Σ_{i=1}^{j} 2^i = 2^{j+1} − 2

Note that while diam^(j) and depth^(j) denote the maximum diameter and depth possible for a cluster with 2^j vertices, the actual values may be much smaller. ⊓⊔
The hierarchical approach produces a clustering with (1) Θ(n/µ) clusters, each having (2) size Θ(µ) (excluding the root cluster) and (3) diameter O(µ), for each 1 ≤ µ = 2^q ≤ √B.
Details on the construction of Ts^{j+1}. Two types of messages (connect(id) and merged(id)) are sent during the time-forward processing. When vertices are combined, the vertex visited first sends the ID of the new vertex in Ts^{j+1} to the other one in a merged() message. The connect() messages are used to generate the edges of Ts^{j+1} using the new IDs. The merged() message (if any) of a vertex is sorted before the connect() messages of that vertex, so checking whether the current minimal entry in the priority queue has received such a message can be done in O(1).
The three procedures ClusterTree(), ProcessVertex() and RemoveMergedVertices() show how information is passed by these messages during the TFP and how a set of edges is emitted. Out of this list of edges, Ts^{j+1} can be constructed in O(sort(|Ts^{j+1}|)) I/Os. ClusterTree() is called once for each tree Ts^j.
ClusterTree()
    foreach vertex v in Ts^j: insert v with priority (−bfsnum(v)) into the priority queue P
    while P is not empty:
        RemoveMergedVertices()
        retrieve smallest element v and all connect()-messages for v from P
        RemoveMergedVertices()
        ProcessVertex(v, messages(v))

ProcessVertex(current vertex v, messages(v))
    let v' = newid(v) be the ID of a new vertex in Ts^{j+1} that contains v
    foreach vertex w in messages(v): emit an edge (v', w) of Ts^{j+1}
    if v has an unmerged sibling u (u has minimal priority in P):
        send message merged(v') to u
    else if v has a parent:
        send message merged(v') to parent(v)
    else:
        // v is root of a tree with odd number of vertices

RemoveMergedVertices()
    while P is not empty and the smallest element has a merged() message:
        retrieve smallest element v, the merged(u) message and all connect()-messages for v from P
        foreach vertex w in connect()-messages(v): emit an edge (u, w) of Ts^{j+1}
        send message connect(u) to parent(v)
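An executable rendering of the three procedures may be helpful (our own in-memory sketch: a binary heap replaces the EM priority queue, and newid() is simplified to a running counter):

    import heapq
    from itertools import count

    def cluster_tree(parent, bfsnum):
        """One contraction phase Ts^j -> Ts^{j+1}. parent maps each vertex to
        its parent (root -> None); bfsnum holds BFS numbers in Ts^j. Returns
        (newid, edges): the Ts^{j+1} vertex containing each old vertex, and
        the edge list of Ts^{j+1}."""
        P = [(-bfsnum[v], v) for v in parent]  # deepest leaves first
        heapq.heapify(P)
        merged_msg, connect_msg = {}, {}       # pending merged()/connect() messages
        newid, edges, fresh = {}, [], count()

        def remove_merged():                   # RemoveMergedVertices()
            while P and P[0][1] in merged_msg:
                _, v = heapq.heappop(P)
                u = merged_msg[v]              # v was consumed by new vertex u
                newid[v] = u
                for w in connect_msg.pop(v, []):
                    edges.append((u, w))       # emit an edge of Ts^{j+1}
                if parent[v] is not None:
                    connect_msg.setdefault(parent[v], []).append(u)

        while P:                               # ClusterTree()
            remove_merged()
            if not P:
                break
            _, v = heapq.heappop(P)
            msgs = connect_msg.pop(v, [])
            remove_merged()                    # make the new top of P unmerged
            vp = next(fresh)                   # ProcessVertex(v, msgs): v' = newid(v)
            newid[v] = vp
            for w in msgs:
                edges.append((vp, w))
            top = P[0][1] if P else None
            if top is not None and parent[v] is not None and parent[top] == parent[v]:
                merged_msg[top] = vp           # sibling-merge: merged(v') to sibling
            elif parent[v] is not None:
                merged_msg[parent[v]] = vp     # parent-merge: merged(v') to parent(v)
            # else: v is root of a tree with odd number of vertices
        return newid, edges

Since the children of a common parent receive consecutive BFS numbers, the unmerged sibling of v (if one remains) is indeed the minimal element of P after purging merged vertices.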
Renumbering the vertices. The p phases of contracting the spanning tree each contribute one bit of the new vertex number ⟨b_r, . . . , b_{p+1}, b_p, . . . , b_1⟩. The construction of Ts^{j+1} defines the bit b_{j+1} for the vertices from Ts^j to be 0 for the left and 1 for the right child in case of a sibling-merge, to be 0 for the parent and 1 for the child in case of a parent-merge, and to be 1 for the root vertex (unless it was already merged with another vertex). After p contraction phases, Ts^p with c = ⌈n/2^p⌉ vertices/clusters remains. The remaining (r − p) bits are assigned by computing BFS numbers (starting with (2^{⌈log₂ c⌉} − c) at the root) for the vertices of Ts^p. These BFS numbers are inserted (in their binary representation) as the bits ⟨b_r, . . . , b_{p+1}⟩ of the new labels for the vertices of Ts^p.
To efficiently combine the label bits from different phases and propagate them to the vertices of G, time-forward processing can again be applied. The trees Ts^p, . . . , Ts^0 are revisited in that order; the vertices of Ts^j can be processed e.g. in BFS order, and each vertex v of Ts^j combines the label bits received from Ts^{j+1} with the bit assigned during the construction phase and then sends messages with its partial label to the vertices in Ts^{j−1} it comprises. The resulting new vertex numbering of G covers the integers from [n′ − n, n′) where n′ = 2^{⌈log₂ n⌉}. There will be a single gap [0, n′ − n) (unless n is a power of two) that can easily be excluded from storage by applying appropriate offsets when allocating and accessing arrays. Thereafter the new labeling has to be propagated to the adjacency lists of G and the adjacency lists have to be reordered (O(sort(n + m)) I/Os).
Assuming the adjacency lists are stored as an adjacency array sorted by vertex numbers (two arrays, the first with vertex information, e.g. offsets into the edge information; the second with edge information, e.g. destination vertices), there is no need for an extra index structure to retrieve any cluster in O(1 + x/B) I/Os where x is the number of edges in that particular cluster. This is possible because all vertices of a cluster are numbered contiguously (for all values of 1 ≤ µ = 2^q ≤ √B) and the numbers of the first and last vertex in a cluster can be computed directly from the cluster number (and cluster size µ).
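A sketch of the resulting retrieval (our own naming; offset is the vertex array with a sentinel entry behind the last vertex, dest the edge array):

    def fetch_cluster(cid, q, offset, dest):
        # all vertices of cluster cid for mu = 2**q are numbered contiguously
        first, last = cid << q, ((cid + 1) << q) - 1
        lo, hi = offset[first], offset[last + 1]
        return range(first, last + 1), dest[lo:hi]

Both ranges are contiguous, so reading them from disk touches O(1 + x/B) blocks.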
Lemma 5. For an undirected connected graph G with n vertices, m edges, and a given spanning tree Ts, the Hierarchical Bottom-Up Clustering for all cluster sizes 1 ≤ µ = 2^q ≤ n can be computed using O(sort(n)) I/Os for constructing the new vertex labeling and O(sort(n + m)) I/Os for the rearrangement of the adjacency lists of G.
4 Conclusions
We have presented two deterministic clustering methods that guarantee a lower
bound on the number of vertices per cluster. The top-down approach of Section 2
may actually produce clusters holding ω(µ) vertices; in extreme cases (e.g. stars) the whole graph may form just one cluster. This potential to overfill clusters is actually a welcome feature in applications like EM BFS. Hence, it would be desirable to apply the top-down approach recursively in order to come up with a deterministic hierarchical version. The problem here is that the clusters for parameter µ may have significantly varying numbers of subclusters for parameter µ/2, turning the vertex naming and cluster indexing problem into a serious issue.
While big clusters can be given small indices and concatenations of variable bit
lengths and stop bits can be used, we may still end up with an index range
considerably larger than |V |. Hence, in order to really access a cluster based
on a vertex prefix, one may have to use some kind of (external) search tree or
use hashing but that would reintroduce randomization. In order to avoid these
pitfalls, the bottom-up method of Section 3 always combines exactly two clusters for µ/2 into one for µ, thus making the indexing really easy. Unfortunately, we now lose the potential to create overfilled clusters. It would be very interesting to see whether (and if so, how) the advantages of both methods could be combined.
References
1. A. Aggarwal and J. S. Vitter. The Input/Output complexity of sorting and related
problems. Communications of the ACM, 31(9):1116–1127, 1988.
2. D. Ajwani, U. Meyer, and V. Osipov. Improved external memory BFS implementation. In Proc. 9th workshop on Algorithm Engineering and Experiments
(ALENEX), pages 3–12. SIAM, 2007.
3. L. Allulli, P. Lichodzijewski, and N. Zeh. A faster cache-oblivious shortest-path
algorithm for undirected graphs with bounded edge lengths. In Proc. 18th Ann.
Symposium on Discrete Algorithms (SODA), pages 910–919. SIAM, 2007.
4. L. Arge. The buffer tree: A technique for designing batched external data structures. Algorithmica, 37(1):1–24, 2003.
5. L. Arge, G. Brodal, and L. Toma. On external-memory MST, SSSP and multi-way
planar graph separation. J. Algorithms, 53(2):186–206, 2004.
6. M. Atallah and U. Vishkin. Finding Euler tours in parallel. Journal of Computer
and System Sciences, 29(30):330–337, 1984.
7. G. Brodal, R. Fagerberg, U. Meyer, and N. Zeh. Cache-oblivious data structures
and algorithms for undirected breadth-first search and shortest paths. In Proc.
9th Scandinavian Workshop on Algorithm Theory (SWAT), volume 3111 of LNCS,
pages 480–492. Springer, 2004.
8. A. Buchsbaum, M. Goldwasser, S. Venkatasubramanian, and J. Westbrook. On
external memory graph traversal. In Proc. 11th Ann. Symposium on Discrete
Algorithms (SODA), pages 859–860. ACM-SIAM, 2000.
9. Y. J. Chiang, M. T. Goodrich, E. F. Grove, R. Tamassia, D. E. Vengroff, and J. S.
Vitter. External memory graph algorithms. In Proc. 6th Ann. Symposium on
Discrete Algorithms (SODA), pages 139–149. ACM-SIAM, 1995.
10. M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious
algorithms. In Proc. 40th Annual Symposium on Foundations of Computer Science,
pages 285–297. IEEE Computer Society Press, 1999.
11. J. JáJá. An Introduction to Parallel Algorithms. Addison-Wesley, 1992.
12. A. Maheshwari and N. Zeh. A survey of techniques for designing I/O-efficient
algorithms. In Algorithms for Memory Hierarchies, volume 2625 of LNCS, pages
36–61. Springer, 2003.
13. K. Mehlhorn and U. Meyer. External-memory breadth-first search with sublinear
I/O. In Proc. 10th Ann. European Symposium on Algorithms (ESA), volume 2461
of LNCS, pages 723–735. Springer, 2002.
14. U. Meyer. On dynamic Breadth-First Search in external-memory. In Proc.
25th Ann. Symposium on Theoretical Aspects of Computer Science (STACS),
volume 08001 of Dagstuhl Seminar Proceedings, pages 551–560. IBFI, 2008.
http://drops.dagstuhl.de/opus/volltexte/2008/1316/.
15. U. Meyer and N. Zeh. I/O-efficient undirected shortest paths. In Proc. 11th Ann.
European Symposium on Algorithms (ESA), volume 2832 of LNCS, pages 434–445.
Springer, 2003.
16. U. Meyer and N. Zeh. I/O-efficient undirected shortest paths with unbounded edge
lengths. In Proc. 14th Ann. European Symposium on Algorithms (ESA), volume
4168 of LNCS, pages 540–551. Springer, 2006.
17. K. Munagala and A. Ranade. I/O-complexity of graph algorithms. In Proc. 10th
Ann. Symposium on Discrete Algorithms (SODA), pages 687–694. ACM-SIAM,
1999.
18. S. E. Schaeffer. Graph clustering. Computer Science Review, 1(1):27–64, 2007.
19. J. S. Vitter. Algorithms and Data Structures for External Memory. Foundations
and Trends in Theoretical Computer Science. now Publishers, 2008.
A Review of Static and Dynamic EM BFS Algorithms
In this section, we briefly review the external-memory MM BFS algorithm by Mehlhorn and Meyer [13] and its dynamic extension [14]. The MM BFS algorithm consists of two phases – a preprocessing phase and a BFS phase. In order
to keep the description simple, we only concentrate on sparse graphs and assume
|E| = O(|V |).
MM BFS. In the preprocessing phase, the algorithm has to produce a clustering.
This can be done using any of the Euler-tour based clustering approaches described in Section 1.3. Alternatively, the following randomized “parallel” cluster
growing approach can be applied. It first chooses |V |/µ random sources. Then
it iteratively constructs the local BFS levels around these sources concurrently.
In this process each non-source vertex is assigned to a source within closest distance. Since the parallel BFS fronts may touch each other but not overlap, the
process ends once all vertices have been assigned to a source, which happens
after O(µ · log |V |) iterations with high probability (whp). The new fringes of
the local BFS searches in iteration i are computed by sorting and scanning the
fringes of iterations i − 1 and i − 2 similarly to the BFS algorithm by Munagala and Ranade [17] with the difference that all required adjacency lists, i.e.
for all sources, are concurrently retrieved in a single scan of the graph representation. Once all vertices have been assigned to sources, this information is used
to actually form the clusters of adjacency lists and store them consecutively
on disk. Altogether, O(sort(|V |)) read- and write-I/Os are required to maintain
the fringes and eventually store the clusters whereas O(scan(|V |) · µ · log |V |)
read-I/Os suffice to retrieve the adjacency lists whp.
Thus, the randomized parallel cluster growing approach is actually worse than the Euler tour based approaches since the vertices within a cluster may have larger maximum distances: d̄ = O(µ · log |V|) whp versus d̄ = O(µ). Its only advantage is that it does not need a spanning tree for the graph.
For the BFS phase, the key idea is to load whole preprocessed clusters into
some efficient data structure (hot pool) at the expense of few I/Os, since the
clusters are stored contiguously on disk and contain vertices in neighboring BFS
levels. This way, the neighboring nodes N (l) of some BFS level l can be computed
by scanning only the hot pool. Similar to the preprocessing step, removing the
nodes visited in levels l − 1 and l from N (l) by parallel scanning yields the nodes
in the next BFS level. However, the nodes in l and consequently their neighbors
in N (l) may belong to different clusters and unstructured I/Os are required to
import them once into the hot pool, from where they are evicted again once
they have been used to create the respective BFS levels. Maintaining the hot pool itself requires O(scan(|V| + |E|) · d̄) I/Os, whereas importing the clusters into it accounts for O(|V|/µ + sort(|V| + |E|)) I/Os.
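The level-by-level step of the BFS phase can be pictured with a small in-memory sketch (the external version of [17, 13] realizes the set operations by sorting and parallel scanning, and draws the adjacency lists from the hot pool rather than from adj):

    def next_level(adj, level_l, level_lm1):
        """N(l) minus the vertices of levels l and l-1 yields level l+1."""
        neighbors = {w for v in level_l for w in adj[v]}  # N(l)
        return neighbors - set(level_l) - set(level_lm1)

Starting from level_lm1 = set() and level_l = {s}, iterating this until the level becomes empty enumerates all BFS levels.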
Dynamic BFS. In the following we review the high-level ideas for computing BFS on general undirected sparse graphs in an incremental or decremental setting. Using the randomized Euler tour based clustering that ensures well-filled clusters in expectation (Section 1.3), Meyer [14] proved an amortized high-probability bound of O(|V|/B^{2/3} + sort(|V|) · log B) I/Os per update under a sequence of either Θ(|V|) edge insertions, but no deletions, or Θ(|V|) edge deletions, but no insertions.
For the BFS phase, let us consider the insertion of the ith edge (u, v) in an
incremental setting and refer to the graph (and the shortest path distances from
the source in the graph) before and after the insertion of this edge as Gi−1 (di−1 )
and Gi (di ). We first run an external memory connected component algorithm
in order to check if the insertion of (u, v) enlarges the connected component Cs
of the source node s. If so, we run the Munagala/Ranade BFS algorithm on
the nodes in the new component starting from node v (assuming w.l.o.g. that
u ∈ Cs ) and add di (u) + 1 (di (u) = di−1 (u) in this case) to all the distances
obtained.
Otherwise, we run the BFS phase of MM BFS, with the difference that
the adjacency list for v is added to the hot pool H when creating BFS level
max{0, di−1 (v)−α} of Gi , for a certain advance α > 1. By keeping the adjacency
lists sorted according to node distances in Gi−1 this can be done I/O-efficiently
for all nodes v featuring di−1 (v) − di (v) ≤ α. For nodes with di−1 (v) − di (v) > α,
we import the whole clusters containing their adjacency lists into H using unstructured I/Os. If this would require more than α · n/B random cluster accesses, we increase α by a factor of two, compute a new clustering for Gi−1 with larger chunk size µ = Θ(α), and start a new attempt by repeating the whole approach with the increased parameters.
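The retry rule can be summarized by a small control loop (purely our own framing of the description above; run_attempt is a hypothetical callback executing one BFS phase with the given advance and reporting the number of random cluster accesses it would need):

    def incremental_bfs_update(n, B, run_attempt, alpha=2):
        while True:
            accesses = run_attempt(alpha)   # advance alpha, clusters of size Theta(alpha)
            if accesses <= alpha * n / B:   # within budget: attempt accepted
                return alpha
            alpha *= 2                      # otherwise recluster with doubled alpha

Each doubling at most doubles the cluster-size parameter µ = Θ(α), which is why only logarithmically many reclusterings can occur per update.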
The decremental version is similar, except that rather than advancing the adjacency lists, we leave them in the hot pool for α BFS levels. For nodes v with di(v) − di−1(v) > α, we use random I/Os to fetch the cluster containing v's adjacency list later on.
The analysis relies on the fact that there can only be very few updates in which the BFS levels change significantly for a large number of nodes. If it
can be guaranteed that each cluster loaded into the pool actually carries Ω(α)
vertices, most of the updates will require few cluster fetches in early attempts
with small advance. Unfortunately, up to now the applied modified Euler-tour
based clustering provided this guarantee only in an expected sense, which is
why randomized reclustering was used after each phase to facilitate high probability bounds. With the new deterministic Top-Down Clustering approach of
Section 2 the same spanning tree can be used as long as the connectivity of the
graph does not change. Hence, even a more expensive deterministic spanning
tree construction can be tolerated.