Practical Distance Sensitivity Oracles for Directed Graphs

Jong-Ryul Lee
Dept. of Computer Science, KAIST
291 Daehak-ro, Yuseong-gu, Daejeon, Korea
[email protected]

Chin-Wan Chung
Dept. of Computer Science, KAIST
291 Daehak-ro, Yuseong-gu, Daejeon, Korea
chung [email protected]
ABSTRACT

A distance sensitivity oracle is a data structure answering queries that ask for the shortest distance from one node to another in a directed graph in the presence of node/edge failures. It has been mainly studied in the theory literature, but all the existing oracles for an n-node, m-edge directed graph with real-valued edge weights suffer from ω(n²) cost for space and ω(mn) cost for preprocessing time. Due to these expensive costs, they cannot be used even on a graph with a few thousand nodes. Motivated by this, we develop two practical distance sensitivity oracles for directed graphs. The first oracle is an exact oracle for this problem. The core structure of this oracle is a distance graph, which is a small graph summarizing useful path information about the entire graph. The exact oracle efficiently computes the answer to a query using the distance graph. The second oracle is an approximate oracle that is derived by sparsifying the distance graph of the exact oracle while controlling the loss of accuracy. Compared with the existing distance sensitivity oracles, our oracles are more efficient in terms of preprocessing time and space.

We conduct extensive experiments to evaluate our oracles. Because none of the existing distance sensitivity oracles for directed graphs is applicable to real-life datasets, we use Dijkstra's algorithm and a recent fully dynamic distance oracle as competitors. The experiments demonstrate that our oracles outperform the competitors in terms of query time. Compared with the dynamic distance oracle, our exact oracle is more efficient in terms of preprocessing time and space. Our approximate oracle is more efficient than the dynamic distance oracle in terms of space, while it has comparable performance in terms of preprocessing time. Along with such efficiency, our approximate oracle has accuracy similar to that of the dynamic distance oracle. To the best of our knowledge, this is the first work that handles practical graphs with millions of nodes.

1. INTRODUCTION

Computing the shortest distance from one node to another is one of the fundamental problems in graph theory. In this paper, we deal with an interesting variation of shortest distance computation in a graph, called the distance sensitivity problem. Given a graph G = (V, E), where V is the set of nodes and E is the set of edges, the distance sensitivity problem is to answer queries that ask for the distance of the shortest path from one node to another avoiding the failed part of the network. A distance sensitivity oracle is a data structure designed to answer such queries.

The distance sensitivity oracle is particularly useful for a network in which a small number of recoverable failures (of nodes or edges) can occur simultaneously. For example, in road networks, a road construction, a demonstration, or a car accident can make a road (edge) or a road junction (node) temporarily unavailable. In general, such node or edge failures are recovered after the construction is finished, the demonstration is over, or the scene of the accident is cleared. In addition to such failures, a user may directly want to know the distance of the shortest path from one point to another avoiding user-specified roads, such as frequently clogged roads. In computer networks, network devices (nodes) and the links (edges) between them can break down for various reasons, such as a cut network cable. Such broken devices or links can be recovered by replacing the broken equipment with working equipment.

The main motivation for a distance sensitivity oracle is that it enables us to avoid stalling queries. Suppose that we have an algorithm answering a shortest distance query with an updatable data structure, which is called a fully dynamic distance oracle in the theory community. Because node/edge failures change the network structure, when they occur, we have to update the distance oracle to answer queries correctly. After the failures are recovered, we need to update it again for the same reason. Note that while a distance oracle is being updated, no query can be processed. That is, we have to stall queries that arrive during the update. In addition, the cost of an update can be very large. For example, given that an edge is deleted, the worst-case update time of a fully dynamic exact distance oracle with O(1) query time is trivially Ω(n²), because the distance information of O(n²) node pairs may need to be updated, and any distance oracle with O(1) query time must require at least Ω(n²) space for a directed graph [16]. In order to avoid such stalling during an expensive update operation, a distance sensitivity oracle is essential.
Existing oracles. Most existing works on distance sensitivity oracles deal with only a constant number of failures in a directed graph or a variable number of failures in an undirected graph [8, 2, 3, 6, 9, 1, 13]. Weimann and Yuster propose the only oracle that can answer queries containing a variable number of failures in a directed graph [19]. Their oracle is constructed with a randomized Monte-Carlo algorithm, and they provide a theoretical guarantee that it finds the exact answer of a given query with high probability. Their oracle has Õ(n^{4−α})¹ preprocessing time, Õ(n^{3−α}) space, and Õ(n^{2−2(1−α)/f}) query time, where f is a parameter for the maximum number of failures, f = o(log n / log log n), and α is a parameter in (0, 1) specifying the trade-off among preprocessing time, space, and query time. However, since the costs for space and preprocessing time are too expensive, their oracle is not applicable to real-life datasets. In fact, all existing distance sensitivity oracles for a directed graph with real-valued edge weights suffer from an expensive ω(n²) cost for space and ω(mn) cost for preprocessing time. Due to these expensive costs, the existing distance sensitivity oracles cannot be applied to a network with even several thousand nodes.
Contributions. The main contribution of this paper is to propose two practical distance sensitivity oracles for a variable number of edge failures: an exact distance sensitivity oracle and an approximate distance sensitivity oracle. Our exact oracle is based on two ingredients: a distance graph and a k-path cover. A distance graph is a small graph summarizing some useful path information about the entire graph. A k-path cover is a set of nodes such that every path of k consecutive nodes in the entire graph includes at least one of the nodes in this set. These ingredients were already applied to shortest path computation in [10], but not to the distance sensitivity problem. The core structure of our exact oracle is a distance graph constructed based on a k-path cover. While the idea of using a k-path cover to construct a distance graph appears in [10], the density of the distance graph is not considered there. We propose an effective method to construct a sparser distance graph by discovering a relationship between a k-path cover and an independent set in graph theory. Note that this relationship provides the theoretical foundation for using a distance graph based on an independent set in our exact oracle. Based on the distance graph, we design an efficient query algorithm for our exact oracle. This query algorithm is a Dijkstra-like algorithm, and it finds the correct answer for any query on the distance graph instead of G.

Even if our exact oracle is faster than Dijkstra's algorithm, its distance graph is usually very dense. Thus, we devise an effective way of sparsifying the distance graph with a theoretical guarantee for accuracy. By using this sparsified distance graph, given two parameters α ≥ 1 and f_S ≥ 0, we derive our approximate oracle, which guarantees an α-approximation ratio for any query having at most f_S failed edges. This oracle is more efficient than our exact oracle while achieving a bounded approximation ratio for such a query. For any query having more failed edges than f_S, it still shows good accuracy.

We conduct extensive experiments to evaluate our distance sensitivity oracles. In the experiments, we use two competitors: Dijkstra's algorithm and a recent fully dynamic approximate distance oracle in [17]. Because none of the existing distance sensitivity oracles is applicable to real-life datasets, we do not use any of them as a competitor. We demonstrate that our oracles are much faster than the competitors in terms of query time in most cases. Along with the efficient query time, the preprocessing time and the space of our exact oracle are better than those of the fully dynamic distance oracle. The space of our approximate oracle is much smaller than that of our exact oracle and the fully dynamic distance oracle, while its preprocessing time is somewhat behind that of the fully dynamic distance oracle. The accuracy of our approximate oracle is similar to that of the fully dynamic distance oracle. Meanwhile, our approximate oracle is faster than our exact oracle in terms of query time, but it requires more expensive preprocessing time. This paper presents the first distance sensitivity oracle that handles practical graphs with millions of nodes.

Outline. The rest of this paper is organized as follows. In Section 2, we review existing and applicable oracles for the distance sensitivity problem. The distance sensitivity problem is formulated in Section 3. We propose the exact distance sensitivity oracle in Section 4 and the approximate distance sensitivity oracle in Section 5. Then, we evaluate the efficiency and the accuracy of our oracles on various real-life datasets in Section 6 and conclude the paper in Section 7.

2. RELATED WORK
The distance sensitivity problem has been mainly studied in the theory literature. Demetrescu et al. [8] proposed two oracles for handling a single network failure. Their first oracle has O(1) query time, Õ(n²) space, and Õ(mn² + n³) preprocessing time. Their second oracle has O(1) query time, Õ(n^{2.5}) space, and Õ(mn^{1.5} + n^{2.5}) preprocessing time. Bernstein and Karger [2] improved the preprocessing time to Õ(n²√m) using random sampling, with O(1) query time and Õ(n²) space. They improved the preprocessing time again to Õ(mn) based on a deterministic construction algorithm [3], with the same query time and the same space. For integer edge weights in the interval [−M, M], Weimann and Yuster [18] proposed an oracle, constructed by a randomized algorithm, that has Õ(n^{1+α}) query time, Õ(n^{3−α}) space, and Õ(Mn^{ω+1−α}) preprocessing time, where ω < 2.373 is the matrix multiplication exponent and α is a parameter in (0, 1). By improving the oracle in [18], Grandoni and Williams [11] proposed an oracle that has Õ(Mn^{ω+1/2} + Mn^{ω+α(4−ω)}) preprocessing time, Õ(n^{1−α}) query time, and Õ(n^{2.5}) space. For dual network failures, Duan and Pettie [9] proposed an oracle that requires Õ(1) query time, Õ(n²) space, and polynomial preprocessing time that is ω(mn). For a variable number of failures, Weimann and Yuster proposed an oracle constructed by a randomized algorithm in [18, 19]. This oracle was first introduced in [18] and completed in [19]. As reviewed above, all the existing distance sensitivity oracles for a directed graph with real-valued edge weights suffer from ω(n²) space and ω(mn) preprocessing time.
There is only one study on the distance sensitivity problem that includes experiments. Qin et al. [13] proposed a distance sensitivity oracle that works only for undirected unweighted graphs with a single edge failure. This work can be applied only to a limited class of graphs, and the datasets used in its experiments are rather small.

¹ f(n) = Õ(g(n)) if f(n) = O(g(n) log^k(n)) for some constant k > 0.
One non-trivial way of handling the distance sensitivity problem is to use an existing fully dynamic distance oracle [12, 7, 15, 14, 17]. There is a recent fully dynamic approximate distance oracle, named LCA, based on landmarks, without any theoretical guarantee for accuracy [17]. The query time of this oracle is O(lD), where l is the number of landmarks and D is the diameter of the graph, while the update time is O(l(m + n log n)). We will compare our distance sensitivity oracles with this fully dynamic distance oracle in the experiments to show that a fully dynamic distance oracle is inappropriate for solving the distance sensitivity problem.
3. PROBLEM DEFINITION

To represent a general network, we consider a directed graph G = (V, E), where V is the set of nodes and E is the set of edges. We use a directed graph because there are many cases in which an edge failure can be unidirectional. For example, road networks can be represented as undirected graphs, but edge failures (road construction, a demonstration, a car accident, etc.) can be unidirectional. Given a graph G, for any node v in G, the set of inbound neighbors of v is denoted as n^in_G(v) and the set of outbound neighbors of v is denoted as n^out_G(v). In addition, we denote the empty set as ∅.

We define a path P as a sequence of edges in E. Even though P is a sequence, when we focus on its elements rather than their order, we handle it as a set of edges with set operations such as intersection. Every edge (u, v) ∈ E is associated with a weight, denoted as w(u, v), which is a non-negative real value. Given any path P, the distance of P is defined to be d(P) = Σ_{(u,v)∈P} w(u, v). In addition, the length of P, denoted as |P|, is defined to be the number of hops of P. For any two nodes u, v ∈ V and a set of edges E* ⊆ E, the shortest (distance) path from u to v in the graph (V, E \ E*) is denoted as P(u, v, E*) and its distance is denoted as d(u, v, E*).

Definition 3.1 (Distance Sensitivity Problem). Given a directed graph G = (V, E), the distance sensitivity problem is to answer a query (s, t, F), where s is the start node, t is the destination node, and F is a set of at most f failed edges, by computing d(s, t, F).

One trivial solution for the distance sensitivity problem is Dijkstra's algorithm. It does not utilize any preprocessed structure, and its running time is O(m + n log n) with a Fibonacci heap. Another trivial solution is storing the answer of every possible query. It takes O(n^{2+2f}) space, since O(n²) comes from all pairs of nodes in V and O(n^{2f}) comes from all possible combinations of at most f edges among O(n²) edges. Thus, a non-trivial distance sensitivity oracle should be faster than Dijkstra's algorithm with a data structure of size lower than O(n^{2+2f}).

4. EXACT ORACLE

This section describes our exact oracle, called the DIStance graph-based Oracle (DISO), for the distance sensitivity problem. DISO is based on a compact data structure called a distance graph D. At a high level, given a query (s, t, F), DISO uses D to compute d(s, t, F) with a Dijkstra-like algorithm. In the following subsections, we first formally define the distance graph and explain how to use it for answering queries. Then, we show an effective strategy to choose the node set of a distance graph based on the concept of the k-path cover. Note that the proofs of all theorems and lemmas in this section and the next section are given in the Appendix.

Table 1 summarizes frequently used notations in Section 4 and Section 5.

Table 1: Frequently used notations
  G           : the input graph
  D           : a distance graph
  D̃           : a sparsified distance graph
  n^out_G(v)  : the outbound neighbors of node v in graph G
  n^in_G(v)   : the inbound neighbors of node v in graph G
  P(s, t, F)  : the shortest path from s to t in the graph (V, E \ F)
  d(s, t, F)  : the distance of P(s, t, F)
  d(s, t)     : the distance of P(s, t, ∅) (= d(s, t, ∅))
  d(P)        : the distance of path P
  d_D(u, v)   : the shortest distance from u to v in D

4.1 Distance Graph-based Query Processing

4.1.1 Motivation and Definition

Consider a set of nodes C ⊆ V, a graph D having C as its node set, and a query (s, t, F). Suppose that for some 1 ≤ L ≤ |P(s, t, F)| − 1, there are nodes u_1, u_2, ..., u_L ∈ C which are included in P(s, t, F). Then, we can divide P(s, t, F) into L + 1 sub-paths: P(s, u_1, F), P(u_1, u_2, F), ..., P(u_L, t, F). Suppose that d(s, u_1, F) and d(u_L, t, F) are known and that for each sub-path P(u_i, u_j, F) such that 1 ≤ i < j ≤ L, there exists the edge (u_i, u_j) in D. Then, by letting the weight of every edge (x, y) in D be d(x, y, F), we can compute d(s, t, F) on D instead of G by using a Dijkstra-like algorithm. If we can efficiently compute d(s, u_1, F), d(u_L, t, F), and D, this entire procedure can be much faster than computing d(s, t, F) via Dijkstra's algorithm on G.

In order to make this idea concrete, we formally define the distance graph, which is the core structure of DISO, as follows:

Definition 4.1 (Distance Graph). Given any set of nodes C ⊆ V, a distance graph is defined to be a graph D = (V_D, E_D) where V_D = C and E_D ⊆ C × C. For any node pair (u, v) ∈ C × C, (u, v) is included in E_D if there exists at least one path from u to v in G which does not pass through any other node in C. Each edge (u, v) ∈ E_D is associated with a weight denoted as w_D(u, v). For any two nodes u, v ∈ V_D, the shortest distance from u to v in D is denoted as d_D(u, v).

For the rest of this section, consider that we have a set of nodes C ⊆ V and the distance graph D having C as its node set. Figure 1 shows an example input graph and the distance graph derived from it for a given node set. Intuitively, any path from u to v in G that does not pass through any other node in C is mapped to the edge (u, v) in the distance graph D. In Figure 1, there are three paths in the input graph mapped to the edge (4, 7) in the distance graph: ⟨(4, 7)⟩, ⟨(4, 8), (8, 7)⟩, and ⟨(4, 8), (8, 10), (10, 7)⟩.

[Figure 1: Examples of an input graph and a distance graph. (a) An input graph on nodes 1-10. (b) The distance graph of the input graph with {2, 4, 5, 7, 9} as the node set.]

Given a query (s, t, F), let us define the out-access nodes of s as the nodes v ∈ C such that v ≠ s and P(s, v, F) does not contain any node in C as an intermediate node. The set of such nodes is denoted as A_out(s). A property of the out-access nodes is that, by definition, all shortest paths in the graph (V, E \ F) from s to any node in C must pass through at least one of its out-access nodes. Similarly, we define the in-access nodes of t as the nodes v ∈ C such that v ≠ t and P(v, t, F) does not contain any node in C as an intermediate node. The set of such nodes is denoted as A_in(t). All shortest paths from any node in C to t in the graph (V, E \ F) must pass through at least one of its in-access nodes. For any two nodes u ∈ A_out(s) and v ∈ A_in(t), we define

  d_D^{u,v}(s, t, F) = d(s, u, F) + d_D(u, v) + d(v, t, F).    (1)

Then, we define

  d_D(s, t, F) = min_{(u,v)∈A(s,t)} d_D^{u,v}(s, t, F),    (2)

where A(s, t) is the set of node pairs (u, v) such that u ∈ A_out(s) and v ∈ A_in(t).

4.1.2 Construction

Let us discuss the construction of the distance graph D given a node set C. The topological structure of the distance graph can be computed by executing a graph traversal algorithm from each node of C in preprocessing. Next, we discuss how to give a weight w_D(u, v) to each edge (u, v) ∈ E_D. Given any two nodes x ∈ V and y ∈ C, for any edge set E* ⊆ E, let us denote the shortest path from x to y in the graph (V, E \ E*) that does not pass through any node in C \ {x, y} as P̂(x, y, E*) and its distance as d̂(x, y, E*). When E* = ∅, i.e., for the graph (V, E), P̂(x, y, E*) becomes P̂(x, y, ∅). P̂(x, y, E*) may not be P(x, y, E*). For example, in Figure 1, suppose that w(1, 3) > 1 and the other edge weights in the input graph are equal to 1. Then, P̂(1, 5, {(1, 5)}) is ⟨(1, 3), (3, 5)⟩ and P(1, 5, {(1, 5)}) is ⟨(1, 2), (2, 5)⟩. On the other hand, it is noteworthy that if y ∈ A_out(x), P̂(x, y, E*) equals P(x, y, E*) (i.e., d̂(x, y, E*) = d(x, y, E*)). Based on this, for any edge (u, v) ∈ E_D and any failed edge set F ⊆ E, we set w_D(u, v) to d̂(u, v, F). This weighting scheme gives the following lemma.

Lemma 4.1. For any two nodes u, v ∈ C and any failed edge set F ⊆ E, the edge weighting scheme guarantees that d_D(u, v) = d(u, v, F).

We will explain how to efficiently maintain the edge weighting scheme later in detail.

4.1.3 A Core Tool: Bounded Dijkstra's Algorithm

The procedure of Dijkstra's algorithm that computes shortest distances with a priority queue is called the Dijkstra process. In the Dijkstra process, a node popped from the priority queue is called a settled node.

Because the failed edge set is given at query time, we sometimes need to locally compute single source shortest distances in our query algorithm. For this, we simply modify Dijkstra's algorithm so that it does not traverse beyond a node x ∈ C if x is settled and is not the source. This modified version of Dijkstra's algorithm is called the bounded Dijkstra's algorithm. For any node v ∈ V, let us denote the set of nodes other than v that are included in C and are settled by the bounded Dijkstra's algorithm from v in the outbound direction as Â_out(v). Similarly, let us denote the set of nodes other than v that are included in C and are settled by the bounded Dijkstra's algorithm from v in the inbound direction as Â_in(v). For example, when {(9, 10)} is given as the failed edge set for the input graph of Figure 1, Â_out(1) = {2, 5} and Â_in(10) = {4, 7}. It is easy to see that the distance from a node u to a node v computed by the bounded Dijkstra's algorithm on the graph (V, E \ F) is actually d̂(u, v, F). Given a query (s, t, F), we will use the bounded Dijkstra's algorithm to maintain the edge weighting scheme for a distance graph and to get supersets of A_out(s) and A_in(t) based on the following lemma.

Lemma 4.2. For any node v ∈ V, A_out(v) ⊆ Â_out(v) and A_in(v) ⊆ Â_in(v).

4.1.4 The Query Algorithm

Consider that a query (s, t, F) is given. For ease of understanding, we first provide the outline of the query algorithm of DISO.

1. Access nodes computation: This step gets Â_out(s) and Â_in(t) using the bounded Dijkstra's algorithm on the graph (V, E \ F).

2. Distance computation: By using Â_out(s) and Â_in(t), this step computes d_D(s, t, F). It includes maintaining the edge weighting scheme for D.

Computing access nodes. Given a query (s, t, F), it is not trivial to exactly compute A_out(s) and A_in(t), unless we run Dijkstra's algorithm from s and from t in G until all nodes in C are settled. Instead, we use Â_out(s) and Â_in(t), which can be efficiently computed by running the bounded Dijkstra's algorithm from s in the outbound direction and from t in the inbound direction, respectively. Let us define Â(s, t) as the set of node pairs (u, v) such that u ∈ Â_out(s) and v ∈ Â_in(t). For any pair (u, v) ∈ Â(s, t), d̂(s, u, F) and d̂(v, t, F) are also computed when Â_out(s) and Â_in(t) are computed, respectively. Because A(s, t) ⊆ Â(s, t) by Lemma 4.2, we substitute A(s, t), d(s, u, F), and d(v, t, F) in (1) and (2) with Â(s, t), d̂(s, u, F), and d̂(v, t, F) to compute d_D(s, t, F). Note that this substitution does not harm the correctness of the query algorithm, as will be proved.

Maintaining the edge weighting scheme. Because it can be expensive to compute d̂(u, v, F) for every edge (u, v) ∈ E_D given a query (s, t, F), we propose an efficient maintenance technique for the edge weights in D, called lazy recomputation. For each edge (u, v) in E_D, w_D(u, v) is set to d̂(u, v, ∅) in preprocessing. w_D(u, v) is recomputed to d̂(u, v, F) at query time only if P̂(u, v, ∅) contains some failed edge and w_D(u, v) is necessary to compute the query answer. Otherwise, we can use d̂(u, v, ∅) as the weight of (u, v) in D, because d̂(u, v, F) = d̂(u, v, ∅) when P̂(u, v, ∅) does not contain any failed edge. For such recomputations, we use the bounded Dijkstra's algorithm on the graph (V, E \ F). Because the bounded Dijkstra's algorithm acts like a single source shortest path algorithm, we actually consider recomputations over nodes, not edges. In what follows, we explain how to efficiently implement the lazy recomputation.

Let us define that for any node u ∈ C, if u has an edge (u, v) in D such that P̂(u, v, ∅) contains at least one of the failed edges in F, u is called an affected node. The query algorithm of DISO is a Dijkstra-like algorithm, which means it includes the Dijkstra process. For any settled node u ∈ C in the query algorithm, if u is an affected node, the weights of all edges from u in D are recomputed by the bounded Dijkstra's algorithm from u in the outbound direction. That is, for any node v ∈ n^out_D(u), w_D(u, v) is set to d̂(u, v, F).

The lazy recomputation must be faster than recomputing the weights of the edges from all affected nodes. However, because finding the affected nodes itself can be expensive, we need a more advanced technique to implement the lazy recomputation efficiently. Motivated by this, we propose an efficient way of computing a node set containing the affected nodes.

Lemma 4.3. Given a failed edge set F ⊆ E, for any node v ∈ C, v is not an affected node if there is no edge (x, y) ∈ F such that there exists some path from v to x in G which does not contain any node in C \ {v, x}.

From Lemma 4.3, we define that a node v ∈ C is a suspicious node if there is some edge (x, y) ∈ F such that there exists some path from v to x in G which does not contain any node in C \ {v, x}. By definition, all affected nodes are also suspicious nodes, so we can use the suspicious nodes instead of the affected nodes in the query algorithm. Lemma 4.3 implies that we can efficiently compute all suspicious nodes by conducting a graph traversal method, such as depth-first search, multiple times. Based on this, the algorithm to compute the suspicious nodes is described in Algorithm 1. Let us denote the set of all paths from a node u to a node v in G as π(u, v). In Algorithm 1, S_x is the set of nodes each of which is a suspicious node due to (x, y) ∈ F, and it can be efficiently computed by using depth-first search from x in the inbound direction once. Note that in the depth-first search, if it visits some node v in C, it does not traverse beyond v, because we do not have to consider paths containing any node in C as an intermediate node.

Algorithm 1: Find_Suspicious(C, F)
input : C: the node set of a distance graph, F: the set of failed edges
output: S: the set of suspicious nodes
1  begin
2    S := ∅;
3    for (x, y) ∈ F do
4      S_x := {v ∈ C | ∃P ∈ π(v, x) s.t. P ∩ (C \ {v, x}) = ∅};
5      S := S ∪ S_x;
6    return S;

Putting it together. Based on the techniques we introduced, the entire query algorithm of DISO to compute d(s, t, F) is presented in Algorithm 2. For any node v ∈ C ∪ {t}, d_D(v) keeps the minimum among the observed distances from s to v. In Line 2, Â_out(s) and Â_in(t) are computed by the bounded Dijkstra's algorithm. After that, the suspicious nodes are computed in Line 3. Lines 4-22 describe the procedure to compute d_D(s, t, F) in the manner of Dijkstra's algorithm. This procedure can be efficiently implemented with a priority queue. While Dijkstra's algorithm starts with a single node, this procedure starts with the nodes in Â_out(s). In Lines 18-19, if v is a suspicious node, the edge weights associated with v are recomputed by the bounded Dijkstra's algorithm. The other parts of Lines 4-22 are identical to Dijkstra's algorithm. After t is settled, the algorithm returns the minimum between d_D(t) and d̂(s, t, F). Note that d̂(s, t, F) can be computed together when Â_out(s) and Â_in(t) are computed.

Algorithm 2: Distance Graph-based Query Algorithm
input : s: the start node, t: the destination node, F: the failed edge set
output: d(s, t, F): the shortest distance from s to t in the graph (V, E \ F)
1   begin
2     Compute Â_out(s) and Â_in(t);
3     S := Find_Suspicious(C, F);
4     for v ∈ C ∪ {t} do
5       d_D(v) := ∞;
6     for v ∈ Â_out(s) do
7       d_D(v) := d̂(s, v, F);
8     d_D(s) := 0;
9     Initialize a priority queue Q with the nodes in Â_out(s);
10    while Q is not empty do
11      v := arg min_{u∈Q} d_D(u);
12      Pop v from Q;
13      if v = t then
14        break;
15      if v ∈ Â_in(t) then
16        Push t into Q if d_D(t) = ∞;
17        d_D(t) := min{d_D(t), d_D(v) + d̂(v, t, F)};
18      if v ∈ S then
19        w_D(v, x) := d̂(v, x, F) for every (v, x) ∈ E_D;
20      for n ∈ n^out_D(v) do
21        Push n into Q if d_D(n) = ∞;
22        d_D(n) := min{d_D(n), d_D(v) + w_D(v, n)};
23    return min{d_D(t), d̂(s, t, F)};

For example, suppose that all edge weights of the graph in Figure 1 are equal to 1 and a query (1, 10, {(5, 7)}) is given. Then, the query algorithm computes Â_out(1) = {2, 5}, Â_in(10) = {4, 7, 9}, and S = {2, 4, 5}. After that, it finds P(1, 10, {(5, 7)}) = ⟨(1, 5), (5, 4), (4, 7), (7, 10)⟩. It should be noted that ∞ is assigned to w_D(5, 7) right after 5 is settled in the query algorithm.

For simplicity, Algorithm 2 is constructed as a variation of the classic (not bidirectional) Dijkstra's algorithm. If we construct this query algorithm based on a more efficient online shortest path algorithm, such as the bidirectional Dijkstra's algorithm, the query algorithm will run faster.

Correctness. We now prove the correctness of this query algorithm.

Lemma 4.4. For any query (s, t, F), if P(s, t, F) contains some node in C,

  d_D(t) = d_D(s, t, F) = d(s, t, F).    (3)

By definition, for any query (s, t, F), if P(s, t, F) does not contain any node in C, then d̂(s, t, F) = d(s, t, F). This proposition and the previous lemma directly imply the correctness of the query algorithm.
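To make the query procedure concrete, the following is a minimal Python sketch of the bounded Dijkstra search and the distance-graph query described above. It is our own illustration, not the authors' implementation: the graph representation (adjacency dicts), all names, and the simplifications (s, t ∉ C is assumed, and lazy recomputation of edge weights is elided, so the distance graph's static weights are used as-is) are ours.

```python
import heapq

def bounded_dijkstra(adj, source, C, F):
    """Dijkstra from `source` on (V, E \\ F) that never expands a settled node
    of C other than the source.  Returns d_hat(source, v, F) for every settled
    node v; the settled nodes of C (minus the source) form A_hat(source)."""
    dist, settled, heap = {source: 0.0}, set(), [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u in C and u != source:   # bounded: do not traverse beyond a cover node
            continue
        for v, w in adj.get(u, []):
            if (u, v) in F:          # failed edges are skipped
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return {v: dist[v] for v in settled}

def query(adj, radj, D, C, s, t, F):
    """Dijkstra-like search over the distance graph D (dict: edge -> weight),
    seeded with the out-access superset of s; `radj` is the reverse graph."""
    out_hat = bounded_dijkstra(adj, s, C, F)                         # d_hat(s, v, F)
    in_hat = bounded_dijkstra(radj, t, C, {(y, x) for (x, y) in F})  # d_hat(v, t, F)
    answer = out_hat.get(t, float("inf"))    # case: P(s, t, F) avoids C entirely
    dD = {v: d for v, d in out_hat.items() if v in C and v != s}
    heap = [(d, v) for v, d in dD.items()]
    heapq.heapify(heap)
    done = set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:
            continue
        done.add(v)
        if v in in_hat and v != t:           # v is an in-access candidate of t
            answer = min(answer, d + in_hat[v])
        # a suspicious v would have its outgoing weights recomputed here;
        # scanning all of D per pop is O(|E_D|) and kept only for brevity
        for (x, y), w in D.items():
            if x == v and d + w < dD.get(y, float("inf")):
                dD[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return answer
```

On a toy graph 1→2→3→4 with a costly shortcut 1→3 and C = {2, 3}, the query returns the true failure-avoiding distance both with and without a failed edge, while only ever searching locally around s, t, and the cover nodes.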
they do not consider the sparsity of the distance graph (the
overlay multigraph in [10]) derived from the k-path cover.
Thus, the distance graph derived from their k-path cover
may be very dense, and then it harms the efficiency of DISO.
Motivated by this, we propose an effective method to construct a distance graph considering its density, and it is
based on the independent set, which is a well-known concept in graph theory. An independent set I is a set of nodes
in V such that no two nodes in I are adjacent. Let us explain
how a distance graph for G is derived from I. Initially, we
have a graph GI = (VI , EI ) where VI = V and EI = E. For
each node v ∈ I, we remove v from GI and edges incident
out
to v. Then, we add edges (x, y) ∈ nin
G (v) × nG (v) into GI
if (x, y) 6∈ EI . After this process is finished, because I is an
independent set, it is easy to see that VI = V \ I and VI is
a minimal 2-path cover in G, and GI is a distance graph.
Given a maximal independent set computation method
GetIS (·), we construct a distance graph with a more elaborate way as follows. Consider a distance graph Di = (Vi , Ei )
for 0 ≤ i ≤ t − 1 where t is some parameter larger than 0.
Initially, D0 = G. We iteratively compute an independent
set ISi by calling GetIS (Di ) and construct a new distance
graph Di+1 for G with Vi+1 = Vi \ ISi . This procedure is
described in Algorithm 3 and gives the following lemma.
Theorem 4.1. For any query (s, t, F ), the query algorithm
of DISO correctly finds d(s, t, F ).
Cost analysis. We denote the average cost of the bounded Dijkstra's algorithm by cB; cB depends on the structure of G and C. It is easy to see that Lines 2-9 require O(cB + |C| log |C|) time, where the O(|C| log |C|) term comes from building the priority queue Q. The procedure described in Lines 10-22 corresponds to the Dijkstra process and uses O(|ED| + |C| log |C| + |S|avg cB) time, where |S|avg is the average size of S. Thus, the total query time is O(|ED| + |C| log |C| + |S|avg cB). Meanwhile, the preprocessing time to construct a distance graph is O(n|C|) when C is given. The total space complexity required by this query algorithm is O(|ED|).
4.2 Toward a Good Distance Graph

A good distance graph for DISO has a small number of nodes and edges. This subsection presents an effective method for constructing such a distance graph for DISO.

4.2.1 A k-Path Cover and its Effect

Because a set of nodes identifies a unique distance graph, we focus on choosing a node set for a good distance graph. For this purpose, we first define a k-path cover.

Definition 4.2 (k-Path Cover). For some integer k > 0, a k-path cover C is a set of nodes such that every simple path consisting of k nodes in G passes through at least one of the nodes in C.

For example, the node set of the distance graph in Figure 1 is a 3-path cover of the input graph: every path consisting of 3 nodes in the input graph passes through at least one of the nodes 2, 4, 5, 7, and 9.
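Definition 4.2 can be checked directly on small graphs. The following brute-force checker (illustrative only; the adjacency-dict representation and function name are ours) enumerates all simple paths with exactly k nodes and tests whether each one touches the candidate cover:

```python
# Brute-force verifier for the k-path cover property of Definition 4.2.

def is_k_path_cover(adj, cover, k):
    def extend(path):
        if len(path) == k:           # a simple path with exactly k nodes
            return [path]
        out = []
        for nxt in adj.get(path[-1], []):
            if nxt not in path:      # simple paths only
                out.extend(extend(path + [nxt]))
        return out
    for start in adj:
        for path in extend([start]):
            if not any(v in cover for v in path):
                return False         # found a k-node path missing the cover
    return True

# chain 1 -> 2 -> 3 -> 4: the 3-node paths are 1-2-3 and 2-3-4
adj = {1: [2], 2: [3], 3: [4], 4: []}
assert is_k_path_cover(adj, {2}, 3)      # both paths contain node 2
assert not is_k_path_cover(adj, {1}, 3)  # path 2-3-4 misses node 1
```

This exponential-time check is only practical for toy instances, which is consistent with the paper's point that computing good covers requires heuristics.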
There can be many k-path covers in G, but we focus on a
minimal k-path cover. Our strategy to get a good distance
graph starts with selecting a minimal k-path cover as the
node set of a distance graph.
Let us examine how a minimal k-path cover C affects DISO when the distance graph derived from C is used for DISO. The most important property of C is that the length of the longest path between any two nodes of C in G is bounded by k. It guarantees that the bounded Dijkstra's algorithm from a node v searches at most k hops from v. When k is much smaller than the diameter of G, cB, the average cost of the bounded Dijkstra's algorithm, must be smaller than the cost of the plain Dijkstra's algorithm; without such a guarantee, cB is asymptotically the same as the cost of the Dijkstra's algorithm. Meanwhile, it is easy to see that the size of C becomes small when k is large. Thus, there is a trade-off in the query time depending on k, so we find an appropriate value for k by increasing k until the query time converges to its minimum in the experiments.
4.2.2 Selecting a Better k-Path Cover based on an Independent Set

Computing the minimum k-path cover is NP-hard [4]. Funke et al. [10] introduced an efficient and effective heuristic method to compute a minimal k-path cover. However, their method considers only the number of nodes in the resulting cover, not the number of edges in the derived distance graph.

Algorithm 3: IS-based Path Cover
  input : G: the input graph, t: the parameter for the number of iterative steps
  output: Dt: the distance graph
  begin
    Copy G to D0;
    for i = 0 to t − 1 do
      ISi := GetIS(Di);
      Di+1 := the distance graph with node set Vi \ ISi;
    return Dt;

Lemma 4.5. For every integer parameter t ≥ 1, Vt is a minimal 2^t-path cover.

Independent Set Selection. The remaining part of the distance graph construction is computing the independent set in Algorithm 3. For each step 0 ≤ i ≤ t − 1, Algorithm 3 computes an independent set for Di that is used to derive Di+1. Thus, if we can find an independent set such that the numbers of nodes and edges in Di+1 are minimized, then the output distance graph eventually has a small number of nodes and edges. However, finding such an independent set is a hard problem, because even finding the maximum independent set, which only minimizes the number of nodes in Di+1, is NP-hard. Based on this background, we propose a greedy method to find a maximal independent set I such that the number of edges in the distance graph derived from I is effectively reduced.

Our greedy method incrementally constructs an independent set and its derived distance graph as follows. Given an input distance graph Di = (Vi, Ei), the method starts by initializing I as the empty set and constructing a graph DI = (VI, EI) where VI = Vi and EI = Ei. Then, it iteratively picks a node v from DI such that v is not adjacent to any node in I on Di and v minimizes

σ(v) = |{(x, y) ∈ NPair(v) \ EI}| − (|n^in_Di(v)| + |n^out_Di(v)|),

where NPair(v) denotes n^in_Di(v) × n^out_Di(v). It is easy to see that σ(v) is the net contribution of v to the number of edges in the distance graph derived from the resulting independent set. After removing v from DI, the new edges (x, y) ∈ NPair(v) \ EI are added into DI. This is repeated until every node in VI is adjacent to some node in I on Di (i.e., I is maximal). This method is described in Algorithm 4, in which VI* denotes the set of nodes in VI that are not adjacent to any node in I on Di.
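The greedy selection just described can be sketched as follows (our own representation: an edge-weight dictionary and an explicit `blocked` set for nodes adjacent to I on Di; bypass-edge weights are assumed additive, which the pseudocode leaves implicit):

```python
# Hedged sketch of the greedy maximal-independent-set selection: repeatedly
# pick the eligible node minimizing sigma(v) = bypass edges added minus
# incident edges removed, then contract it out of the evolving graph D_I.

def greedy_is(edges, nodes):
    g = dict(edges)                   # (u, v) -> weight; the evolving D_I
    chosen, blocked = set(), set()    # blocked: adjacent to I on D_i
    neighbors = {v: set() for v in nodes}
    for (u, v) in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    def sigma(v):
        n_in = {u for (u, x) in g if x == v}
        n_out = {y for (x, y) in g if x == v}
        new = sum(1 for x in n_in for y in n_out
                  if x != y and (x, y) not in g)
        return new - (len(n_in) + len(n_out))

    while True:
        eligible = [v for v in nodes if v not in chosen and v not in blocked]
        if not eligible:              # I is maximal
            break
        v = min(eligible, key=sigma)  # node with minimum net contribution
        chosen.add(v)
        blocked |= neighbors[v]
        n_in = [(u, w) for (u, x), w in g.items() if x == v]
        n_out = [(y, w) for (x, y), w in g.items() if x == v]
        g = {e: w for e, w in g.items() if v not in e}
        for x, wx in n_in:            # add the bypass edges NPair(v) \ E_I
            for y, wy in n_out:
                if x != y:
                    g[(x, y)] = min(g.get((x, y), float("inf")), wx + wy)
    return chosen, g
```

On the chain 1 → 2 → 3 → 4 with unit weights, the sketch picks the independent set {1, 3} and contracts the graph to the single edge (2, 4).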
Thus, we propose a heuristic sparsification algorithm to find an α-distance graph with a small number of edges.

Suppose that a sparsified distance graph D̃ satisfies the following: for any edge (x, y) ∈ ED and any failed edge set F such that |F| ≤ fS, d(x, y, F) ≤ d_D̃(x, y) ≤ αd(x, y, F). If we use D̃ for SDISO, it is easy to prove that SDISO guarantees the α-approximation. Based on this observation, we consider the following process to find an α-distance graph. Consider a graph D̃ = (V_D̃, E_D̃) where V_D̃ = VD and E_D̃ = ED. For each edge (u, v) ∈ ED, we test whether, if (u, v) is removed from D̃, it still holds for any edge (x, y) ∈ ED that d(x, y, F) ≤ d_D̃(x, y) ≤ αd(x, y, F). If it does, we remove (u, v) from D̃ and proceed to the next edge. After this process is finished, D̃ is an α-distance graph with a small number of edges. Our sparsification algorithm is an implementation of this process. In the subsequent subsections, we discuss how to efficiently determine whether an edge can be deleted from D̃ during this process.
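The test-and-remove process just described reduces to a simple loop once the α-approximation test is abstracted away; in the sketch below, the `violates` predicate is a hypothetical stand-in for the bound-based test developed in the following subsections:

```python
# Skeleton of the sparsification process: tentatively remove each edge and
# keep it only if removal would violate the (abstracted) alpha bound.

def sparsify_skeleton(edges, violates):
    kept = set(edges)
    for e in list(edges):       # test each edge of E_D once
        kept.discard(e)         # tentatively remove e
        if violates(kept, e):   # would removal break the alpha bound?
            kept.add(e)         # then keep the edge after all
    return kept

# toy predicate standing in for the real test: never let fewer than
# 2 edges remain
edges = [("a", "b"), ("b", "c"), ("a", "c")]
kept = sparsify_skeleton(edges, lambda k, e: len(k) < 2)
```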
Algorithm 4: GetIS(Di)
  input : Di = (Vi, Ei): the distance graph
  output: I: the independent set
  begin
    I := ∅;
    Construct DI = (VI, EI) where VI = Vi and EI = Ei;
    while I is not maximal do
      v := arg min_{v ∈ VI*} σ(v);
      Remove v from DI and add it to I;
      Add the new edges (x, y) ∈ NPair(v) \ EI into DI;
    return I;
The idea behind this method is to remove the node that minimizes its net contribution to the number of edges in the distance graph derived from the resulting independent set. Thus, the distance graph computed by Algorithm 3 has relatively fewer edges than the distance graph derived from a k-path cover in [10].
5.2 Disjoint Path-based Upper Bound

Consider an intermediate sparsified distance graph D̃ during the sparsification process. We want to determine whether removing an edge from D̃ violates the α-approximation, but this is hard to do during preprocessing, because a failed edge set is only known after a query is given. To handle this issue, we use the concepts of an upper bound and a lower bound.
For any edge (u, v) ∈ ED and any failed edge set F such that |F| ≤ fS, we propose an upper bound for d_D̃(u, v) and a lower bound for d(u, v, F). For d(u, v, F), we use d(u, v, ∅), abbreviated as d(u, v), as a lower bound, because it is trivial that d(u, v) ≤ d(u, v, F). We now construct an upper bound for d_D̃(u, v). For any path PG in G and any path P_D̃ in D̃ such that they have the same start node and the same destination node, we say that PG corresponds to P_D̃ if all nodes in P_D̃ are included in PG and PG does not include any other node in V_D̃. This means that if PG does not contain any failed edge, then it can be used in place of P_D̃ in the query algorithm. Note that any path in G corresponds to at most one path in D̃.
Complexity. Let us analyze the time complexity of Algorithm 3. The cost of GetIS(·) (Algorithm 4) is O(n(log n + deg²avg)), where degavg is the average degree over the observed distance graphs. In Line 5 of Algorithm 3, we can obtain Di+1 without any additional cost, because DI in Algorithm 4 becomes Di+1 once I is computed. Therefore, the total time complexity of constructing a distance graph for DISO is O(tn(log n + deg²avg)).
5. APPROXIMATE ORACLE

Given two parameters α ≥ 1 and fS ≥ 0, we propose an α-approximate distance sensitivity oracle for any query having at most fS failed edges, named Sparsification-based approximate DISO (SDISO). DISO and SDISO use the same query algorithm.
5.1 Background
Consider a distance graph D = (VD, ED) where VD is a k-path cover C. Let us define a sparsified distance graph D̃ = (V_D̃, E_D̃) as a subgraph of D such that V_D̃ = VD and E_D̃ ⊆ ED. All notations and functions for a distance graph are defined for a sparsified distance graph in the same way. Given a path P, V(P) denotes the set of nodes in P.
For any query (s, t, F) such that V(P(s, t, F)) ∩ C = ∅, because d̂(s, t, F) equals d(s, t, F), the query algorithm finds the correct answer with any distance graph, including D̃. Thus, we want to find a sparsified distance graph D̃ satisfying that for any two parameters α ≥ 1 and fS ≥ 0 and any query (s, t, F) such that V(P(s, t, F)) ∩ C ≠ ∅ and |F| ≤ fS, d(s, t, F) ≤ d_D̃(s, t, F) ≤ αd(s, t, F). We call such a sparsified distance graph an α-distance graph. For simplicity, when we mention the α-approximation, we mean the α-approximation for any query having at most fS failed edges. If we use D̃ for the query algorithm instead of D, then it guarantees the α-approximation. However, finding the α-distance graph with a minimum number of edges is equivalent to the minimum t-spanner problem, which is NP-hard [5].
Figure 2: A sparsified distance graph
Figure 2 presents a sparsified distance graph of the distance graph in Figure 1. For example, the path ⟨(5, 4), (4, 8), (8, 10), (10, 7)⟩ in the input graph of Figure 1 corresponds to the path ⟨(5, 4), (4, 7)⟩ in the sparsified distance graph in Figure 2.
Then, we define the following concept:

Definition 5.1 (Detour). Given an edge (u, v) ∈ ED, the detours of (u, v) are defined as the paths from u to v in G each of which corresponds to a path in D̃ that is not the path consisting of the single edge (u, v).
Figure 3: Example detours

Figure 3 presents some detours of edge (5, 7) of the sparsified distance graph in Figure 2. All the detours in Figure 3 pass through node 4, which is included in the node set of the sparsified distance graph. Note that the path ⟨(5, 6), (6, 9), (9, 10), (10, 7)⟩ in the input graph of Figure 1 cannot be a detour of (5, 7), because there is no path in the sparsified distance graph to which it corresponds.

Consider an edge (u, v) ∈ E_D̃ to be tested during the sparsification process. One noteworthy property of the detours of (u, v) is that even if (u, v) is removed from D̃, the paths in D̃ corresponding to the detours can still be used in the query algorithm. Based on this, we consider a set πU(u, v) of fS + 1 edge-disjoint detours from u to v in G. We denote the set of paths in D̃ to which the paths in πU(u, v) correspond as πU*(u, v), and the maximum distance among the paths in πU(u, v) as dU(u, v). Then, dU(u, v) is an upper bound for d_D̃(u, v), because at least one path in πU(u, v) does not contain any failed edge. dU(u, v) is called a disjoint path-based upper bound for d_D̃(u, v). We will discuss how to select an edge-disjoint detour set in Section 5.5.

Note that for any edge (u, v) ∈ ED, dU(u, v), πU(u, v), and πU*(u, v) are computed when we test whether (u, v) can be removed in the sparsification process. This guarantees that at the time of the test, all edges on the paths in πU*(u, v) are included in E_D̃. After removing (u, v), E_D̃ changes, and for the next edge (u′, v′) to be tested we compute dU(u′, v′), πU(u′, v′), and πU*(u′, v′).

Let us explain how to use the disjoint path-based upper bound dU(u, v) and the lower bound d(u, v) for an edge (u, v) ∈ D̃. Suppose that dU(u, v) ≤ αd(u, v). Then, even if we remove (u, v) from D̃, we can guarantee that d(u, v, F) ≤ d_D̃(u, v) ≤ αd(u, v, F). In this way, we can preserve the α-approximation when we remove the first single edge from D̃ = D. However, removing multiple edges by just applying this test repeatedly may violate the α-approximation. In the next subsection, we explain the reason and propose an elegant way to remove multiple edges while keeping the α-approximation.

5.3 Permitted Approximation Ratio

Suppose that there is an edge (x, y) already removed from D̃ during the sparsification process such that dU(x, y) ≤ αd(x, y). Suppose that there is an edge (u, v) ∈ E_D̃ such that some path P_D̃ ∈ πU*(x, y) includes (u, v), and now we want to test whether (u, v) can be removed. In this case, if (u, v) is removed from D̃, then P_D̃ cannot be used in the query algorithm, because P_D̃ contains (u, v). Consider the path, denoted as PG, in πU(x, y) corresponding to P_D̃. Then, we would need to find another path PG′ to replace PG in πU(x, y) and update dU(x, y) if d(PG′) > dU(x, y). However, computing such a path directly may be expensive. For this reason, we propose an efficient way of handling this without explicitly computing such a path and updating dU(x, y).

The key idea is to give a different approximation ratio to each edge in D̃. Consider alternative paths from x to y in G, each of which consists of the sub-path of PG from x to u, a path in πU(u, v), and the sub-path of PG from v to y. For any failed edge set F such that |F| ≤ fS, it is easy to see that at least one path among such alternative paths and the paths in πU(x, y) \ {PG} does not contain any failed edge. In addition, each alternative path PG′ satisfies the following:

d(PG′) ≤ d(PG) − d(u, v) + dU(u, v).

Suppose that dU(u, v) ≤ (αd(x, y)/d(PG)) d(u, v). Then, since d(u, v) ≤ d(PG) and d(PG) ≤ dU(x, y) ≤ αd(x, y),

d(PG) − d(u, v) + dU(u, v) ≤ d(PG) − d(u, v) + (αd(x, y)/d(PG)) d(u, v)
                           = d(PG) + ((αd(x, y) − d(PG))/d(PG)) d(u, v)
                           ≤ d(PG) + ((αd(x, y) − d(PG))/d(PG)) d(PG)
                           = αd(x, y).

Therefore, we can guarantee that

d(PG′) ≤ d(PG) − d(u, v) + dU(u, v) ≤ αd(x, y).

With this guarantee, we can avoid explicitly computing an alternative path for PG. We generalize this idea using the following concept, called the permitted approximation ratio.

Definition 5.2 (Permitted Approximation Ratio). For any edge (u, v) ∈ ED, let us define the dependent set λ(u, v) as the set of edges (x, y) ∈ ED \ E_D̃ (i.e., already removed edges) such that a path in πU*(x, y) contains (u, v). For any edge (u, v) ∈ ED, the permitted approximation ratio θ(u, v) is defined as

θ(u, v) = min_{(x,y) ∈ λ(u,v)} θ_{x,y}(u, v) if |λ(u, v)| > 0, and θ(u, v) = α otherwise,

where θ_{x,y}(u, v) = θ(x, y)d(x, y)/d_{u,v}(x, y) and d_{u,v}(x, y) is the distance of the path in πU(x, y) corresponding to the path in πU*(x, y) containing (u, v).

In the sparsification process, θ(u, v) is initially set to α for every edge (u, v) ∈ ED, because D̃ is identical to D at first. As the sparsification process goes along, θ(u, v) is updated for the edges (u, v) remaining in D̃. Note that once an edge (u, v) is removed from D̃, θ(u, v) is never changed again. Based on this, we derive the following theorem.

Theorem 5.1. After the sparsification process is finished, the α-approximation holds for any query having at most fS failed edges if, for any edge (u, v) ∈ ED \ E_D̃,

dU(u, v) ≤ θ(u, v)d(u, v).
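The inequality chain in this subsection can be sanity-checked numerically (the values below are illustrative, not taken from the paper):

```python
# Numeric check: if d_U(u,v) <= (alpha*d(x,y)/d(P_G)) * d(u,v), with
# d(u,v) <= d(P_G) <= alpha*d(x,y), then d(P_G) - d(u,v) + d_U(u,v)
# stays within alpha*d(x,y).

def check(alpha, d_xy, d_PG, d_uv, d_U):
    assert d_uv <= d_PG <= alpha * d_xy         # preconditions
    assert d_U <= (alpha * d_xy / d_PG) * d_uv  # premise on the upper bound
    return d_PG - d_uv + d_U <= alpha * d_xy    # conclusion of the chain

# alpha*d(x,y) = 10, premise bound = (10/8)*3 = 3.75 >= 3.5
assert check(alpha=2.0, d_xy=5.0, d_PG=8.0, d_uv=3.0, d_U=3.5)
```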
5.4 Sparsification Algorithm

Based on the above techniques, the sparsification algorithm is presented in Algorithm 5. Because D̃ is identical to D at first, θ(x, y) is set to α for all (x, y) ∈ ED in Line 3. In Lines 4-10, the sparsification process is executed over the edges (x, y) ∈ ED. In Line 5, the algorithm computes a disjoint path-based upper bound dU(x, y) for πU(x, y) and the path set πU*(x, y) that πU(x, y) corresponds to. According to Theorem 5.1, the algorithm tests whether dU(x, y) ≤ θ(x, y)d(x, y). If it is true, (x, y) is removed from D̃. In addition, for each edge (u, v) on the paths in πU*(x, y), if (u, v) is not yet removed from D̃, we update θ(u, v).

Algorithm 5: Sparsify(G, fS, α)
  input : G: the input graph, D: the distance graph, fS: the maximum number of failures, α: the approximation ratio we want to achieve
  output: D̃: the sparsified distance graph
  begin
    Copy D to D̃;
    θ(x, y) := α for all (x, y) ∈ ED;
    for (x, y) ∈ ED do
      Compute dU(x, y) and πU*(x, y);
      if dU(x, y) ≤ θ(x, y)d(x, y) then
        Remove (x, y) from D̃;
        for P ∈ πU*(x, y) do
          for (u, v) ∈ P ∩ E_D̃ do
            θ(u, v) := min{θ(u, v), θ(x, y)d(x, y)/d(P)};
    return D̃;

This sparsification algorithm finds an α-distance graph with a minimal number of edges in terms of the permitted approximation ratio. Regarding efficiency, the algorithm includes no optimization; nevertheless, it removes a sufficient number of unnecessary edges while keeping the α-approximation.

Let us analyze the time complexity of Algorithm 5. Let us denote the cost of computing a disjoint path-based upper bound by cU. The cost of removing (x, y) in Line 7 is negligible compared with that of updating θ(u, v) in Lines 8-10. Thus, the total time complexity is O(m(cU + fS|P|avg)), where |P|avg is the average size of P in Line 8.

5.5 Efficient Detour Selection

Let us explain how to efficiently select edge-disjoint detours for any edge (u, v) ∈ ED. In order to eventually remove many edges from D̃, we could look for the fS + 1 edge-disjoint detours of (u, v) such that the maximum distance among them is minimized or the sum of their distances is minimized. However, for any integer p > 1, finding p edge-disjoint paths from one node to another such that the maximum distance among them is minimized or the sum of their distances is minimized is NP-hard [20]. Because we can only consider detours, not general paths, finding such fS + 1 edge-disjoint detours would be even more complex. In addition, there are usually many edges in ED and we must compute an edge-disjoint detour set for each of them, so we cannot spend much time per edge.

With this background, we propose an efficient heuristic algorithm that iteratively selects fS + 1 edge-disjoint detours of the input edge (u, v) ∈ ED. Assume that for any edge (x, y) ∈ E_D̃, w_D̃(x, y) is set to d̂(x, y, ∅) (i.e., (x, y) in D̃ corresponds to P̂(x, y, ∅)). The idea of the algorithm is that, for efficiency, we compute edge-disjoint detours using D̃, not G. The algorithm is described in Algorithm 6. Note that it only returns dU(u, v) and πU*(u, v), because the actual detours are not used in the sparsification algorithm. ∆ is the set of edges in D̃ that are avoided when the shortest path is computed in Line 4; in Line 4, the shortest path P_D̃ from u to v in D̃ avoiding the edges in ∆ is computed.

Algorithm 6: DisjointDetour(G, D̃, (u, v), fS)
  input : G: the input graph, D̃: the intermediate sparsified distance graph, (u, v): the edge in ED, fS: the maximum number of failures
  output: dU(u, v): the disjoint path-based upper bound, πU*(u, v): the path set that πU(u, v) corresponds to
  begin
    πU*(u, v) := ∅, ∆ := {(u, v)};
    while |πU*(u, v)| < fS + 1 do
      Find the shortest path P_D̃ from u to v in D̃ avoiding the edges in ∆;
      πU*(u, v) := πU*(u, v) ∪ {P_D̃};
      ∆ := ∆ ∪ Ẽ(P_D̃); /* Ẽ(P_D̃): edges in D̃ mapped from paths which overlap the detour corresponding to P_D̃ */
    return max_{P ∈ πU*(u, v)} d(P) as dU(u, v), and πU*(u, v);

Then, there exists the detour corresponding to P_D̃. Ẽ(P_D̃) denotes the set of edges (x, y) in E_D̃ such that P̂(x, y, ∅) overlaps the detour. For example, for the graph in Figure 1 and the sparsified distance graph in Figure 2, if P_D̃ = ⟨(5, 4), (4, 7)⟩, P̂(9, 7, ∅) = ⟨(9, 10), (10, 7)⟩, and the detour corresponding to P_D̃ is ⟨(5, 4), (4, 8), (8, 10), (10, 7)⟩, then Ẽ(P_D̃) = {(5, 4), (4, 7), (9, 7)}. By adding the edges in Ẽ(P_D̃) into ∆, we can guarantee the edge-disjointness of the detours of (u, v) corresponding to the paths in πU*(u, v). Once |πU*(u, v)| = fS + 1, we obtain dU(u, v) as the maximum distance among the paths in πU(u, v).

Line 4 can be efficiently implemented by the Dijkstra's algorithm on D̃ modified to avoid the edges marked as included in ∆. Thus, computing the shortest path P_D̃ in Line 4 requires O(|C| log |C| + |E_D̃|) time. Therefore, the total time complexity is cU = O(fS(|C| log |C| + |ED| + |Ẽ(P_D̃)|avg)), where |Ẽ(P_D̃)|avg is the average size of Ẽ(P_D̃) in Line 6.

More efficient computation. D is sometimes extremely dense when G comes from a scale-free network. This may strongly affect the efficiency of Algorithm 6, especially the shortest path computation in Line 4. Thus, we propose a more efficient technique for Line 4 as follows.

Let us denote the shortest path from a node u to a node v in D̃ by P_D̃(u, v). Given the input edge (u, v) in Algorithm 6, for any parameter β > 0, we define the β-length shortest path tree Tv of v as the shortest path tree of v in the inbound direction consisting of the nodes w such that |P_D̃(w, v)| ≤ β. Consider that we want to find a path P′_D̃ from u to v in D̃ avoiding the edges in ∆ instead of P_D̃ in Line 4 of Algorithm 6. There are at most |n^out_D̃(u)| paths from the outbound neighbors of u to v in Tv. Based on them, we can utilize at most |n^out_D̃(u)| paths from u to v via an outbound neighbor of u in D̃ which do not contain any edge in ∆, and we use the shortest among them as P′_D̃. Because the number of edges in Tv is O(|C|), finding P′_D̃ requires O(|C|) time. This technique is called the shortest path tree-based detour computation.

Let us explain how to use the β-length shortest path tree efficiently. Suppose that in the sparsification algorithm, the edges of ED are tested in the order of the ID values of their target nodes. Then, for any node v ∈ VD, the inbound edges of v in D are tested consecutively. Thus, if we construct Tv once when the first inbound edge of v is tested, we can reuse Tv when the subsequent inbound edges of v are tested. In this way, we can efficiently use Tv to find edge-disjoint detours.

With this technique, the time complexity of Algorithm 6 becomes O(fS(|C| + |Ẽ(P′_D̃)|avg)), and the time complexity of Algorithm 5 gains an additional term |C|cβ, where cβ is the cost of computing a β-length shortest path tree. The benefit of the reduction of cU outweighs the cost of the new term |C|cβ.
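The control flow of Algorithm 5 can be sketched as follows, with DisjointDetour abstracted as a callback `upper` that returns an upper bound dU and the corresponding path set (the dictionary-based graph representation and the callback interface are our own simplifications):

```python
# Hedged sketch of the sparsification with permitted approximation ratios:
# drop an edge only when d_U <= theta * d, then tighten theta on the
# surviving edges of the witnessing paths.

def sparsify(D, alpha, upper):
    Dt = dict(D)                      # sparsified graph, initially D
    theta = {e: alpha for e in D}     # permitted approximation ratios
    for (x, y), dxy in D.items():
        d_U, paths = upper(Dt, (x, y))
        if d_U <= theta[(x, y)] * dxy:
            del Dt[(x, y)]            # safe to drop (x, y)
            for P in paths:           # tighten theta on surviving edges
                dP = sum(D[e] for e in P)
                for e in P:
                    if e in Dt:
                        theta[e] = min(theta[e],
                                       theta[(x, y)] * dxy / dP)
    return Dt, theta

# toy instance: (u, v) can be replaced by the 2-hop path u -> w -> v
D = {("u", "v"): 1.0, ("u", "w"): 1.0, ("w", "v"): 1.0}
def up(Dt, e):
    if e == ("u", "v"):
        return 2.0, [[("u", "w"), ("w", "v")]]
    return float("inf"), []           # other edges have no cheap detour
Dt, th = sparsify(D, 3.0, up)
```

On this instance the edge (u, v) is removed, and the permitted ratio of the surviving edges (u, w) and (w, v) tightens from α = 3 to 3·1/2 = 1.5.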
6. EXPERIMENTS

Our distance sensitivity oracles are evaluated with extensive experiments. We implement all algorithms in C++ and run the experiments on a machine with an Intel i7-3970X 3.50 GHz CPU and 64GB of RAM.

6.1 Experimental Environment

Datasets. In the experiments, we use six network datasets: Epinions, NY, DBLP, Youtube, CAL, and USA. Epinions, DBLP, and Youtube were published by Jure Leskovec (http://snap.stanford.edu/data/). NY, CAL, and USA were published in the 9th DIMACS Implementation Challenge (http://www.dis.uniroma1.it/challenge9/). More specifically, Epinions is an online social network representing who-trusts-whom; it is abbreviated as EPI. DBLP is a co-authorship network where two authors are connected if they have co-authored at least one paper. Youtube is the social network of a video-sharing web site where users can form friendships; it is abbreviated as YOU. NY is the road network of New York City, CAL is the road network of California and Nevada, and USA is the entire road network of the USA. In these datasets, there exist multiple edges defined over the same node pair; among such edges, we only take the minimum-weight edge, which does not affect correctness. In addition, DBLP and YOU are undirected graphs; we make them directed by adding an edge (v, u) for each edge (u, v) ∈ E. Table 2 shows the detailed statistics of all the datasets.

Table 2: Statistics of the real-life datasets

Dataset | Type   | |V|   | |E|   | Avg. deg. | Max deg.
EPI     | Social | 76K   | 509K  | 6.7       | 3.0K
NY      | Road   | 264K  | 734K  | 2.76      | 8
DBLP    | Social | 317K  | 1M    | 6.6       | 0.7K
YOU     | Social | 1.1M  | 3.0M  | 5.3       | 57.7K
CAL     | Road   | 1.9M  | 4.7M  | 2.45      | 7
USA     | Road   | 24.0M | 58.3M | 2.41      | 9

Edge weights. For the road network datasets, we use the travel time as the weight of each edge. The social network datasets do not provide edge weights, so we set the weight of each edge to a real value sampled uniformly at random from 0 to 1.

Query generation. Because there is no well-known public dataset including both a graph structure and failures, we synthetically generate queries in the following way. Consider a query (s, t, F) and an edge e ∈ F. If F is simply made by selecting f edges uniformly at random, there can be edges e such that P(s, t, F) = P(s, t, F \ {e}); we say that such edges are not essential to P(s, t, F). In order to guarantee that F only contains essential edges for P(s, t, F), we generate queries (s, t, F) with the following procedure. First, we select two nodes uniformly at random from V and assign them to s and t, respectively. Next, after initializing F = ∅, we iteratively pick an edge x uniformly at random from P(s, t, F), add x into F, and recompute P(s, t, F), until |F| = f. Because we pick an edge on P(s, t, F) at each iteration, every edge in F contributes to the change of the shortest path from s to t. Thus, this method makes P(s, t, F) very different from the shortest path from s to t in G.

Comparison methods. Recall that none of the existing distance sensitivity oracles for directed graphs can be used on practical datasets. We have four comparison methods. DISO is our exact oracle. SDISO is our approximate oracle. DI is the classic (not bidirectional) Dijkstra's algorithm with a binary heap. FDDO is a fully dynamic distance oracle without any theoretical guarantee on accuracy; it is the one of the four algorithms proposed in [17], named LCA (Lowest Common Ancestor), that has efficient query time with good accuracy. For this algorithm, we use 20 landmarks selected by the best coverage technique proposed in [17]. In addition, we revise their update algorithm to make it work for a weighted directed graph as described in [17].

We also compare the IS-based path cover algorithm (Algorithm 3), denoted as IS-Cover, with the k-path cover algorithm in [10], named Pruning. Pruning finds a minimal k-path cover considering the number of nodes in the resulting cover. In Pruning, we set the nodes in V to be visited in increasing order of the sum of the in-degree and the out-degree of a node; this setting is easy to implement and shows good effectiveness in [10].

6.2 Results

In the experiments, if |F| is not explicitly specified, we set |F| = 5 when generating queries. We generate 100 synthetic queries for each dataset, and the results are averaged over the queries. Due to space limitations, we sometimes present results for only some of the datasets; in such cases, the results for the other datasets show a similar tendency.

The effect of t (comparing with [10]). Recall that t is the parameter that determines k when we select a k-path cover, with k = 2^t for some t > 0. The experimental results for the effect of t are described in Table 3, which also includes the comparison between IS-Cover and Pruning. Let Q denote the query time. From the results, we can see that the distance graph from IS-Cover is sparser than that from Pruning, so the query time of DISO with IS-Cover is smaller than that with Pruning.

Table 3: Results for varying t (DISO)

NY:
t | IS-Cover: Q(ms), |C|, |ED|   | Pruning: Q(ms), |C|, |ED|
1 | 38.84, 142.24K, 578.57K      | 46.26, 140.83K, 579.45K
2 | 24.95, 90.10K, 508.00K       | 30.73, 89.05K, 546.31K
3 | 18.56, 62.97K, 479.05K       | 23.38, 59.67K, 593.58K
4 | 15.15, 47.20K, 472.53K       | 21.68, 44.74K, 740.94K
5 | 12.97, 37.25K, 476.80K       | 23.82, 37.93K, 1,013.56K
6 | 11.93, 30.63K, 483.32K       | -
7 | 11.50, 25.89K, 484.07K       | -
8 | 11.52, 22.43K, 489.14K       | -

EPI:
t | IS-Cover: Q(ms), |C|, |ED|   | Pruning: Q(ms), |C|, |ED|
1 | 17.0, 24.6K, 470.3K          | 18.27, 22.43K, 488.98K
2 | 13.6, 11.4K, 570.1K          | 14.88, 9.66K, 709.71K
3 | 14.3, 7.7K, 744.9K           | 23.05, 6.16K, 1,465.7K

As discussed earlier, manipulating k (i.e., t) affects the query times of DISO and SDISO. We can find an appropriate value for t by increasing t from 1 until the query time reaches its minimum; after the query time hits the minimum, it increases as t increases, so we stop increasing t when the query time increases for the first time. In the rest of the experiments, t is set to 7, 10, 15, 2, 5, and 3 for NY, CAL, USA, EPI, DBLP, and YOU, respectively.

Figure 4: Results over various datasets (|F| = 5); (a) query time, (b) approximation error.
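The query-generation procedure of Section 6.1 can be sketched as follows (our implementation; the adjacency-dict representation and the seeded random generator are our own choices):

```python
# Sketch of query generation: repeatedly fail a random edge on the current
# replacement shortest path so that every failure in F is essential.
import heapq
import random

def shortest_path_edges(adj, s, t, failed):
    """Dijkstra on an adjacency dict {u: [(v, w)]}, skipping failed edges."""
    dist, prev, pq = {s: 0.0}, {}, [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == t:
            break
        if d > dist.get(u, float("inf")):
            continue                     # stale heap entry
        for v, w in adj.get(u, []):
            if (u, v) in failed:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    if t not in dist:
        return None                      # t unreachable under the failures
    path, v = [], t
    while v != s:
        path.append((prev[v], v))
        v = prev[v]
    return path[::-1]

def make_query(adj, s, t, f, rng):
    failed = set()
    while len(failed) < f:
        path = shortest_path_edges(adj, s, t, failed)
        if path is None:
            break                        # s and t got disconnected
        failed.add(rng.choice(path))     # fail an edge on the current path
    return s, t, failed
```

Because each failed edge is drawn from the current shortest path, removing it necessarily changes that path, which is exactly the "essential" property the procedure aims for.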
It is easy to see that if β is sufficiently large, the shortest
paths from outbound neighbors of u to v in Tv may overlap each other. Otherwise, such shortest paths are likely to
be edge-disjoint of each other. For example, β = 1, for all
outbound neighbors of u that are included in Tv , the shortest paths from the neighbors to v are edge-disjoint of each
other. However, if β is too small, it is likely that Tv does not
include many of the outbound neighbors of u. Thus, β is
experimentally determined for efficiency and effectiveness.
For the rest of the experiments, we experimentally determine the parameters. As a result, fS is set to 1 for all
datasets and α is set to 4 for the road network datasets. For
the social network datasets, α is set to 8. β is set to 4, 6,
and 4, for EPI, DBLP, and YOU, respectively.
Table 4: The results of parameter testing for SDISO
fS
0
1
2
3
4
5
α
2
4
6
8
β
2
4
6
8
NY
EPI
Q(ms) Error (%) R(%) Q(ms) Error (%) R(%)
5.14
36.25
13.95
4.74
5.91
8.30
6.64
6.63
27.56
5.39
2.09
13.09
8.06
1.34
47.02
6.48
1.03
18.39
10.36
0.14
78.59
7.13
0.52
25.08
11.43
0.00
99.96
8.22
0.20
32.93
11.40
0.00
100.00
9.08
0.18
41.19
7.76
6.64
6.54
6.21
0.98
6.63
12.02
14.02
39.24
27.56
24.72
23.39
5.78
5.47
5.39
5.37
0.31
1.57
2.09
2.87
15.31
13.54
13.09
12.88
-
-
-
6.62
5.39
5.48
6.02
0.63
2.09
1.29
0.76
23.67
13.09
14.00
18.49
Overall performance and accuracy. We compare the
comparison methods in terms of query time and accuracy
over the datasets. Because the query time of FDDO is too
large for USA, we omit the result from FDDO for USA.
The query time of each comparison method is presented in
Figure 4(a). For all the datasets, the query times of DISO
and SDISO are much smaller than those of FDDO and DI.
SDISO is as much as 2.2 times faster than DISO. Note that
FDDO is slower than DI in many datasets, because the cost
for the update operation of FDDO is comparable to the cost
for running the Dijkstra’s algorithm multiple times. This
result shows that FDDO is inappropriate for the distance
sensitivity problem.
The approximation errors of SDISO and FDDO are shown
in Figure 4(b). For all the datasets, the approximation error
of SDISO is lower than 7.1% and this is much lower than
the value of α we set. Based on these results, we can see
that the graph sparsification algorithm for SDISO is very
effective.
Table 5: The results for varying |F | (NY)
Parameter testing for SDISO. We test the three parameters, f_S, α, and β, for SDISO. The results are presented
in Table 4. The default setting is f_S = 1 for both NY and
EPI, α = 4 for NY, α = 8 for EPI, and β = 4 for EPI.
From this setting, we vary each parameter. The approximation
error is defined as (APP − OPT)/OPT, where OPT is
the optimum and APP is an approximation. In this table, the approximation error is denoted as Error and the
ratio between |E_{D̃}| and |E_D| is denoted as R. Note that
the shortest path tree-based detour computation is applied
only to scale-free network datasets (i.e., social networks),
because for a bounded-degree network (i.e., a road network),
the original detour set selection is sufficiently efficient.
The results show that, when f_S ≥ 3, the number of edges removed
in the sparsification process for NY is very small. This
is because the number of edge-disjoint paths from a node u
to a node v in G is bounded by the minimum of the
out-degree of u and the in-degree of v. Thus, in NY, the
number of edge-disjoint paths between two nodes that can
be used for the sparsification is very small, because NY is
a bounded-degree graph. Nevertheless, even if f_S = 0, the
error is only 36.25% for NY and 5.91% for EPI. Next, because α is a parameter for an approximation ratio, as it
increases, the query time decreases and the error increases.
Finally, when β = 4, the query time is the smallest with reasonable accuracy. For explanation, consider a β-length shortest
path tree to compute an edge-disjoint detour set π_U(u, v).
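The error metric used throughout the experiments is straightforward to compute; the following is a minimal sketch (the function name is ours, not the paper's):

```python
def approximation_error(app, opt):
    """Relative approximation error (APP - OPT) / OPT used in the experiments.

    `app` is the approximate distance returned by an oracle and `opt` is
    the optimum (exact) distance; the result is 0.071 for a 7.1% error.
    """
    if opt <= 0:
        raise ValueError("optimum distance must be positive")
    return (app - opt) / opt
```

For example, an approximate distance of 107.1 against an optimum of 100 gives an error of 7.1%, i.e., well below an approximation guarantee of α = 4.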
            Query time(ms)                        Error(%)
|F|    DISO    SDISO   DI       FDDO         SDISO   FDDO
1      10.47   5.35    106.89   262.37       6.69    2.47
3      10.57   5.73    102.36   685.49       7.13    7.12
5      11.50   6.64    103.10   947.29       6.75    7.54
7      13.12   7.65    118.09   1289.57      6.01    6.46
9      14.66   9.20    125.80   1405.29      6.63    1.07
We also conduct experiments with different values
of |F|. As presented in Table 5, the query time of FDDO
is the most strongly affected by |F|. One interesting observation is
that while the approximation error of SDISO is very stable,
that of FDDO fluctuates somewhat. Because FDDO is
based on the landmark heuristic, its accuracy can be more
sensitive to queries.
Overall, our distance sensitivity oracles outperform the
competitors in terms of query time. SDISO is the most
Table 6: The preprocessing time and the index size of DISO,
SDISO, and FDDO

        Index size(MB)
Data    DISO    SDISO   FDDO
EPI     12.6    1.7     47.7
NY      10.6    3.0     232.6
DBLP    49.4    7.8     279.0
YOU     279.4   19.0    998.7
CAL     34.9    8.9     1,663.9
USA     455.7   77.4    -
[5] L. Cai and D. Corneil. Tree spanners. SIAM J.
Discrete Math, 8:359–387, 1995.
[6] S. Chechik, M. Langberg, D. Peleg, and L. Roditty.
F-sensitivity distance oracles and routing schemes. In
ESA, pages 84–96, 2010.
[7] C. Demetrescu and G. F. Italiano. A new approach to
dynamic all pairs shortest paths. J. ACM,
51(6):968–992, Nov. 2004.
[8] C. Demetrescu, M. Thorup, R. A. Chowdhury, and
V. Ramachandran. Oracles for distances avoiding a
failed node or link. SIAM J. Comput.,
37(5):1299–1318, Jan. 2008.
[9] R. Duan and S. Pettie. Dual-failure distance and
connectivity oracles. In SODA, pages 506–515, 2009.
[10] S. Funke, A. Nusser, and S. Storandt. On k-path
covers and their applications. Proc. VLDB Endow.,
7(10):893–902, June 2014.
[11] F. Grandoni and V. Williams. Improved distance
sensitivity oracles via fast single-source replacement
paths. In FOCS, pages 748–757, 2012.
[12] V. King. Fully dynamic algorithms for maintaining
all-pairs shortest paths and transitive closure in
digraphs. In FOCS, pages 81–89, 1999.
[13] Y. Qin, Q. Z. Sheng, and W. E. Zhang. Sief:
Efficiently answering distance queries for failure prone
graphs. In EDBT, pages 145–156, 2015.
[14] L. Roditty and U. Zwick. Dynamic approximate
all-pairs shortest paths in undirected graphs. In
FOCS, pages 499–508, 2004.
[15] M. Thorup. Fully-dynamic all-pairs shortest paths:
Faster and allowing negative cycles. In SWAT’04,
pages 384–396, 2004.
[16] M. Thorup and U. Zwick. Approximate distance
oracles. In STOC, pages 183–192, 2001.
[17] K. Tretyakov, A. Armas-Cervantes,
L. García-Bañuelos, J. Vilo, and M. Dumas. Fast fully
dynamic landmark-based estimation of shortest path
distances in very large graphs. In CIKM, pages
1785–1794, 2011.
[18] O. Weimann and R. Yuster. Replacement paths via
fast matrix multiplication. In FOCS, pages 655–662,
Oct 2010.
[19] O. Weimann and R. Yuster. Replacement paths and
distance sensitivity oracles via fast matrix
multiplication. ACM Trans. Algorithms,
9(2):14:1–14:13, Mar. 2013.
[20] P. Zhang and W. Zhao. On the complexity and
approximation of the min-sum and min-max disjoint
paths problems. In Combinatorics, Algorithms,
Probabilistic and Experimental Methodologies, volume
4614, pages 70–81. 2007.
        Preprocessing time(s)
Data    DISO     SDISO      FDDO
EPI     4.5      43.5       92.1
NY      6.0      23.4       122.5
DBLP    29.0     1552.2     308.6
YOU     1470.8   10917.3    1633.7
CAL     60.0     184.9      1091.1
USA     1785.5   24506.7    -
efficient method in terms of query time while achieving accuracy similar to that of FDDO. In addition, as described in Table 6, the index sizes of DISO and SDISO are
more compact than that of FDDO. Because SDISO is based
on graph sparsification, its index size is necessarily
smaller than that of DISO. In terms of preprocessing time,
DISO outperforms SDISO and FDDO. The preprocessing
time of SDISO is comparable to that of FDDO.
7. CONCLUSIONS
The distance sensitivity problem is an important variation
of the point-to-point shortest distance problem that arises
in many applications. It has mainly been studied
in the theory literature, but all existing solutions to this
problem for directed graphs suffer from ω(n^2) space and ω(mn)
preprocessing time.
This paper presents two practical distance sensitivity oracles that efficiently work for directed graphs. The first
oracle is the exact oracle, named DISO, based on the distance graph. In order to compute a good distance graph
for the exact oracle, we utilize the relationship between the
k-path cover and the independent set. Next, we propose an
efficient approximate distance oracle, named SDISO, with
reasonable accuracy. This oracle is derived by applying a
sparsification algorithm to the distance graph of the exact
oracle while controlling the loss of accuracy.
We conduct extensive experiments to evaluate our distance sensitivity oracles. The experimental results demonstrate that our distance sensitivity oracles outperform the
competitors for all cases in terms of query time. Compared
with the recent fully dynamic distance oracle in [17], DISO
is more efficient in terms of preprocessing time and space.
SDISO is more efficient than the dynamic distance oracle
in terms of space, while it has comparable performance in
terms of preprocessing time. In addition, SDISO has accuracy similar to that of the dynamic oracle. Compared with
DISO, SDISO is more efficient in terms of query time and
space, but it requires more preprocessing time and
incurs a small approximation error. This paper is the first work that
handles practical graphs with millions of nodes.
8. REFERENCES
[1] S. Baswana, U. Lath, and A. S. Mehta. Single source
distance oracle for planar digraphs avoiding a failed
node or link. In SODA, pages 223–232, 2012.
[2] A. Bernstein and D. Karger. Improved distance
sensitivity oracles via random sampling. In SODA,
pages 34–43, 2008.
[3] A. Bernstein and D. Karger. A nearly optimal oracle
for avoiding failed vertices and edges. In STOC, pages
101–110, 2009.
[4] B. Brešar, F. Kardoš, J. Katrenič, and G. Semanišin.
Minimum k-path vertex cover. Discrete Appl. Math.,
159(12):1189–1195, July 2011.
APPENDIX
Proof of Lemma 4.1
If P(u, v, F) contains any node in C except u and v, P(u, v, F)
can be divided into multiple sub-paths, each of which does
not contain any node in C as an intermediate node. For
each such sub-path P(x, y, F), there exists a corresponding edge (x, y)
in D such that w_D(x, y) = d̂(x, y, F) = d(x, y, F). Then,
d_D(u, v) is the sum of the weights of these edges, and it is the
same as d(u, v, F). Otherwise, there exists an edge (u, v) ∈
E_D by definition and w_D(u, v) = d̂(u, v, F) = d(u, v, F).
Therefore, the lemma is proved.
Proof of Lemma 4.5 (continued)
Suppose that there exists a path P_{u,v} from u to some node v ∈ V_i
in G such that its length is x and it does not contain any other
node in V_i. Note that the length of the longest path (in terms
of length) in G between two nodes in an x-path cover is x. In
addition, suppose that there also exists a path P_{v,w} from v to
some node w ∈ V_i in G such that its length is x and it does not
contain any other node in V_i. Then, there exist edges (u, v) and
(v, w) in D_i. Thus, if v is included in IS_i, u and w cannot be
included in IS_i, and v is not included in V_{i+1} either. Note that
the path consisting of P_{u,v} and P_{v,w} does not contain any node
in V_{i+1} except u and w, and its length is 2x. This implies
that this new path has the longest length among paths between
nodes in V_{i+1}. In addition, because IS_i is maximal,
if V_i is minimal, then for any node v ∈ V_{i+1}, there exists a path
P from or to v such that |P| = 2x and it does not contain
any node in V_{i+1} as an intermediate node. This means that
V_{i+1} is also minimal. Therefore, V_{i+1} is a minimal 2x-path
cover. Because the initial node set, V_0, is a minimal 1-path
cover, the resulting node set is a minimal 2^t-path cover.
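The doubling step above can be sketched in code. This is a simplified illustration under our own assumptions (an unweighted adjacency-list digraph, a greedy maximal independent set, and omission of the exact length-x bookkeeping; the function name is ours, not the paper's):

```python
from collections import deque

def double_path_cover(adj, cover):
    """One doubling step: given a minimal x-path cover `cover` (a set) of
    the digraph `adj` (dict: node -> list of successors), build the
    auxiliary graph D_i whose edges join cover nodes connected by a
    cover-free path, take a greedy maximal independent set IS_i, and
    return V_{i+1} = cover \\ IS_i."""
    d_edges = set()
    for u in cover:
        # BFS from u that records, but never expands, other cover nodes.
        seen, queue = {u}, deque([u])
        while queue:
            x = queue.popleft()
            for y in adj.get(x, []):
                if y in seen:
                    continue
                seen.add(y)
                if y in cover:
                    d_edges.add((u, y))  # u and y joined by a cover-free path
                else:
                    queue.append(y)
    # Greedy maximal independent set on the undirected version of D_i.
    neighbors = {v: set() for v in cover}
    for u, v in d_edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    independent = set()
    for v in sorted(cover):
        if neighbors[v].isdisjoint(independent):
            independent.add(v)
    return cover - independent
```

On the directed path 1 → 2 → 3 → 4 → 5 with V_0 = {1, ..., 5} (a 1-path cover), one step removes the independent set {1, 3, 5} and leaves {2, 4}, which covers every 2-length path.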
Proof of Lemma 4.2
Because the bounded Dijkstra's algorithm only avoids traversing
beyond nodes in C, for any two nodes v ∈ V
and w ∈ C, if P(v, w, F) does not contain any node in C
as an intermediate node, then P(v, w, F) must be found by the
bounded Dijkstra's algorithm. This means that w must be
included in Â_out(v), and hence A_out(v) ⊆ Â_out(v). In the same
way, we can prove that A_in(v) ⊆ Â_in(v).
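As an illustration, the bounded traversal can be sketched as follows (a minimal sketch with our own names and a plain weighted adjacency list; the paper's actual implementation details may differ):

```python
import heapq

def bounded_dijkstra(adj, source, cover):
    """Sketch of the bounded Dijkstra traversal behind Lemma 4.2 (names
    are ours): run Dijkstra from `source` on the weighted digraph `adj`
    (dict: node -> list of (successor, weight)), but never traverse
    beyond a node of `cover`; return the reached cover nodes with their
    distances, i.e., a superset of the access-node set A_out(source)."""
    dist = {source: 0}
    heap = [(0, source)]
    visited = set()
    reached = {}
    while heap:
        d, x = heapq.heappop(heap)
        if x in visited:
            continue
        visited.add(x)
        if x in cover and x != source:
            reached[x] = d  # record the access node ...
            continue        # ... but do not expand its out-edges
        for y, w in adj.get(x, []):
            nd = d + w
            if nd < dist.get(y, float("inf")):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return reached
```

Here a cover node is settled at its shortest cover-free distance but its out-edges are never relaxed, so the search stops exactly at the access nodes.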
Proof of Lemma 4.3
The sufficient condition means that for any failed edge (x, y) ∈
F, all paths from v to x must pass through some node in
C \ {v, x}. Thus, if the sufficient condition holds, then for any
edge (v, w) from v in D, all paths P from v to w that do
not pass through any node in C as an intermediate node
contain no failed edge in F. Therefore, v is not an
affected node.
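The contrapositive yields a simple reachability test: v can only be affected by a failed edge (x, y) if x is reachable from v without touching C \ {v, x}. A hedged sketch (the function name and the plain DFS are ours, not the paper's):

```python
def possibly_affected(adj, cover, v, failed_edges):
    """Contrapositive of the sufficient condition in Lemma 4.3 (sketch):
    return True iff, for some failed edge (x, y), the tail x is reachable
    from v in `adj` (dict: node -> successor list) without passing through
    any node of cover \\ {v, x} -- i.e., v may be an affected node."""
    for x, _y in failed_edges:
        blocked = cover - {v, x}
        stack, seen = [v], {v}
        while stack:
            u = stack.pop()
            if u == x:
                return True  # reached x while avoiding the cover: v may be affected
            for w in adj.get(u, []):
                if w not in seen and w not in blocked:
                    seen.add(w)
                    stack.append(w)
    return False
```

A return value of False certifies that v is unaffected; True only says the sufficient condition fails, so v must be treated as suspicious.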
Proof of Lemma 4.4
We first prove d_D(t) = d_D(s, t, F). Consider a query (s, t, F)
and the case that P(s, t, F) contains at least one node in C.
The query algorithm starts with the nodes in Â_out(s). Thus,
for any v ∈ C, if v is popped from Q, then

d_D(v) = min_{u ∈ Â_out(s)} { d̂(s, u, F) + d_D(u, v) }.

In Lines 18-19, the algorithm properly updates the weights
of edges (v, x) ∈ E_D if v is suspicious. Thus, by Lemma 4.3,
d_D(v) is also properly computed. In addition, d_D(t) keeps
min_{v ∈ POP(t)} { d_D(v) + d̂(v, t, F) }, where POP(t) is the set
of nodes popped from Q among the in-access nodes of t.
Therefore, by Lemma 4.2, after all,

d_D(t) = min_{(u,v) ∈ Â(s,t)} { d̂(s, u, F) + d_D(u, v) + d̂(v, t, F) }
       = d_D(s, t, F).

Next, we prove d_D(s, t, F) = d(s, t, F). P(s, t, F) can be
divided into P(s, u_1, F), P(u_1, u_2, F), ..., P(u_l, t, F), where
u_i is a node included in C for any 1 ≤ i ≤ l and l is the
number of nodes in P(s, t, F) ∩ C except s and t. It is easy
to see that u_1 and u_l are included in A_out(s) and A_in(t), respectively, because P(s, u_1, F) and P(u_l, t, F) do not contain
any other node in C. Thus, (u_1, u_l) is included in A(s, t).
By Lemma 4.1, d_D(u_1, u_l) = d(u_1, u_l, F). Therefore, we can
guarantee

d_D(s, t, F) = min_{(u,v) ∈ A(s,t)} d_D^{u,v}(s, t, F)
             = d_D^{u_1,u_l}(s, t, F)
             = d(s, t, F).

Proof of Theorem 5.1
Recall that for any edge (x, y) ∈ E_D, π_U^*(x, y) is computed
when (x, y) is tested for removal. Thus, for any edge
(u, v) ∈ E_D \ E_{D̃} (i.e., a removed edge), we can consider the
sequence <(u_1, v_1), (u_2, v_2), ..., (u_i = u, v_i = v)> such
that λ(u, v) = ∅ and, for any 1 ≤ j ≤ i,

(u_j, v_j) = arg min_{(x,y) ∈ λ(u_{j+1}, v_{j+1})} θ_{x,y}(u_{j+1}, v_{j+1}).

Because there is no case that (u_1, v_1) = (u_i, v_i) (i.e., a cyclic
sequence), suppose that all paths in π_U^*(u_i, v_i) do not contain
any edge in E_D \ E_{D̃}.
In order to prove the α-approximation for any query having at most f_S failed edges, we claim that for any edge
(u_j, v_j) in the sequence and any failed edge set F such
that |F| ≤ f_S, there exist a path P_j from u_j to v_j in
G and a path P_j^* in D̃ such that P_j corresponds to P_j^*,
P_j ∩ F = ∅, and d(P_j) ≤ θ(u_j, v_j)d(u_j, v_j). This guarantees
that the query algorithm will consider some distance lower
than θ(u_j, v_j)d(u_j, v_j) as the distance between u_j and v_j.
Consider the last edge (u_i, v_i) in the sequence. Because all
the paths in π_U^*(u_i, v_i) do not contain any edge in E_D \ E_{D̃},
we can guarantee that there is a path P_i from u_i to v_i in
π_U(u_i, v_i) such that P_i ∩ F = ∅ and d(P_i) ≤ θ(u_i, v_i)d(u_i, v_i).
Then, consider an intermediate edge (u_j, v_j) in the sequence.
Suppose that there is a path P_j from u_j to v_j in the graph
(V, E \ F) such that d(P_j) ≤ θ(u_j, v_j)d(u_j, v_j). Next, consider the previous edge (u_{j-1}, v_{j-1}). It is easy to see that there
exists a path P_{j-1} from u_{j-1} to v_{j-1} such that P_{j-1} includes
P_j, P_{j-1} ∩ F = ∅, and

d(P_{j-1}) ≤ ψ − d(u_j, v_j) + θ(u_j, v_j)d(u_j, v_j)
           ≤ ψ + ((θ(u_{j-1}, v_{j-1})d(u_{j-1}, v_{j-1}) − ψ) / ψ) · d(u_j, v_j)
           ≤ θ(u_{j-1}, v_{j-1})d(u_{j-1}, v_{j-1}),

where ψ = d^{u_j,v_j}(u_{j-1}, v_{j-1}). Thus, the claim is proved.
Because we picked (u, v) arbitrarily, the claim applies to
every such edge. This implies that for any edge (u, v) removed
from D̃, some path P from u to v in the graph
(V, E \ F) with d(P) ≤ θ(u, v)d(u, v) ≤ αd(u, v) can be
considered by the query algorithm. Therefore, it guarantees
that the answer of the query algorithm is at most αd(s, t, F).
Proof of Lemma 4.5
Consider an intermediate iteration i in Lines 4-5 of Algorithm 3. Suppose that V_i is a minimal x-path cover; we show
that V_{i+1} is then a minimal 2x-path cover. To prove this, consider a node u in D_i, and suppose that there exists a path P_{u,v}
from u to some node v ∈ V_i in G such that its length is x
and it does not contain any other node in V_i.