ABSTRACT
COMPUTING POINT-TO-POINT SHORTEST PATH USING AN APPROXIMATE
DISTANCE ORACLE
by Pawan Poudel
We propose an extremely simple and efficient shortest path algorithm that computes an
optimal shortest path between a pair of points in a metric space. Our algorithm works
similarly to Dijkstra’s algorithm, but uses heuristic information provided by an
approximate distance oracle to prune nodes that cannot be on the shortest path. Our
algorithm returns the exact shortest path in time $(CS^*)^{O(\mathrm{dim})}$ using this linear-size data
structure, where S* is the number of vertices in the shortest path, dim is the doubling
dimension of the input graph, and C is a constant. We prove that this is nearly optimal by
proving a lower bound of $(CS^*)^{\Omega(\mathrm{dim})}$. This paper presents theoretical and experimental
results showing that if there exist efficient distance oracles for road maps, then our
algorithm explores very few nodes compared to Dijkstra’s algorithm, the A* algorithm, and
Goldberg et al.’s ALT algorithms.
COMPUTING POINT-TO-POINT SHORTEST PATH USING AN APPROXIMATE
DISTANCE ORACLE
A Thesis
Submitted to the
Faculty of Miami University
in partial fulfillment of
the requirements for the degree of
Master of Computer Science
Department of Computer Science and Systems Analysis
by
Pawan Poudel
Miami University
Oxford, Ohio
2008
Advisor______________________________
Dr. William John Brinkman
Reader_______________________________
Dr. James D. Kiper
Reader_______________________________
Dr. Lukasz Opyrchal
Table of Contents
1. Introduction
2. Definitions
  2.1. Metric
  2.2. Metric Space
  2.3. Distance Oracle
  2.4. Shortest Path Oracle
  2.5. Ball
  2.6. Doubling Dimension
    2.6.1. Doubling constant
  2.7. ((1+ε)-short path)
3. Related Works
4. Background and Motivation
5. Thesis Research Problem
6. The Algorithm
  6.1. Assumptions
  6.2. Correctness of our Algorithm
7. Experimental Methodology
  7.1. Experimental setup
  7.2. Implementation
    7.2.1. A* algorithm
    7.2.2. Dijkstra’s Algorithm
    7.2.3. Our Algorithm
  7.3. Experimental Results
    7.3.1. Our Algorithm vs. Dijkstra’s Algorithm
    7.3.2. Our Algorithm vs. A* Algorithm
    7.3.3. Our algorithm vs. Goldberg, et al’s ALT algorithms
8. Time Complexity
9. Future Research
10. Conclusion
11. References
A1. Appendix
List of Tables
Table 1. Comparison between our algorithm and Dijkstra’s algorithm
Table 2. Comparison between our algorithm and A* algorithm
Table 3. Comparison between our algorithm and ALT algorithms
List of Figures
Figure 1. Ball (x, r)
Figure 2. Illustration of Doubling Dimension
Figure 3. Map of Chicago resembling ℓ1 plane
Figure 4. Pruning nodes
Figure 5. Proof of Theorem 6.2.1
Figure 6. Example run of Dijkstra’s algorithm
Figure 7. Example run of A* algorithm
Figure 8. Example run of our algorithm
Figure 9. Example run of our algorithm
Figure 10. Speed-up graph (against Dijkstra’s algorithm)
Figure 11. Speed-up graph (against A* algorithm)
Figure 12. Average ratio graph (against Goldberg’s ALT algorithms)
Figure 13. Graph Sm with 3 iterations
Figure 14. A binary tree
Figure 15. Destination along the edge
Figure 16. Vertex p with degree 2^dim M + 1
Figure 17. Binary tree with bounded degree but un-bounded doubling dimension
Figure 18. Star-like graph
Acknowledgement
I would like to sincerely thank Dr. Brinkman for his enormous help and guidance
throughout the process. This thesis could not have been written without Dr. Brinkman,
who not only served as my advisor, but also encouraged and motivated me throughout
my academic program. I would also like to thank Dr. Kiper and Dr. Opyrchal for their
willingness to serve as members of my thesis committee.
1. Introduction
The shortest path problem is one of the most extensively studied problems in the field of
algorithms. It has important applications, for example, finding the shortest path¹ between
two locations in GPS devices and online software such as Google Maps, MapQuest, and
Yahoo! Maps. Computer scientists and researchers have developed many powerful and
useful algorithms to compute the shortest path in a graph. Consider the problem of finding
the shortest path between a pair of points in a large network, such as the US road
network. This is a classic example of a point-to-point shortest path problem in a graph.
There are plenty of algorithms to compute such a path, for example, Dijkstra’s
algorithm [6]; the A* algorithm [16]; the Floyd-Warshall algorithm [8, 22]; the
Bellman-Ford algorithm [2, 9]; and Johnson’s algorithm [17], to name a few.
The biggest shortcoming of the aforementioned algorithms is that they traverse a big
portion of an input graph in order to find the shortest path. Thus, they perform poorly if
the given graph is too large. Their performance can be improved significantly if they
somehow preprocess the input graph, storing important information about the graph. This
information will then be very useful while computing the shortest path. Our algorithm
takes this approach of preprocessing the input graph and using a simple algorithm to
compute the shortest path based upon the information stored while preprocessing. Recent
work on the point-to-point shortest path problem is focused on preprocessing the input
graph in order to answer shortest path queries quickly. A more detailed discussion of
recent work related to our research appears in section 3.
A related area of research is the study of distance oracles. A distance oracle preprocesses
the input graph so that it can later answer distance queries between pairs of vertices. The
simplest distance oracle computes and stores the shortest distance between all pairs of
vertices v, v' in an n × n matrix, requiring storage space $O(n^2)$, where n is the number of
vertices in the input graph. It takes polynomial time to create the n × n matrix. Answering
any subsequent distance query then takes O(1) time. However, a data structure of size
$O(n^2)$ is infeasible for desktop or embedded applications, especially when the target
application is a huge graph such as the U.S. road network, which has over 30 million nodes.
With n² ≈ 9 × 10^14 entries, such a data structure would require hundreds of terabytes of
storage space or more. More efficient distance oracles
proposed by Thorup, et al. [21], Talwar [20], and Har-Peled, et al. [15] store important
information about the input graph rather than computing and storing the shortest distance
between all pairs of vertices. These distance oracles then compute and return the (approximate)
distance between two points in constant time as an answer to a distance query.
¹ Although we use “the” in front of “shortest path” in this paper, there could be more than one
shortest path with equal cost between a pair of points in a graph.
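The simplest oracle described above can be sketched in a few lines. This is only an illustrative sketch: the function name `build_matrix_oracle`, the integer vertex labels, and the tiny example graph are assumptions made here, and Floyd-Warshall is used as one convenient polynomial-time way to fill the matrix.

```python
import math

def build_matrix_oracle(n, edges):
    """Fill an n x n table of shortest distances with Floyd-Warshall.

    Preprocessing takes O(n^3) time and the table takes O(n^2) space;
    afterwards every distance query is a single table lookup.
    """
    dist = [[math.inf] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = 0.0
    for u, v, w in edges:  # undirected edges (u, v, weight)
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Hypothetical 4-vertex example graph.
edges = [(0, 1, 4.0), (1, 2, 1.0), (0, 2, 7.0), (2, 3, 2.0)]
oracle = build_matrix_oracle(4, edges)
print(oracle[0][3])  # O(1) query: shortest distance from vertex 0 to vertex 3
```

The quadratic table is exactly what makes this approach impractical for a graph with tens of millions of vertices, which is why the compact oracles cited above are of interest.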
We propose a simple and efficient point-to-point shortest path algorithm that uses
heuristic information provided by a distance oracle. Our algorithm issues distance queries
to the distance oracle. The answer obtained from a distance oracle helps us to prune a
considerable number of vertices that are not likely to be on the shortest path. No work has
been done until now to analyze the results of such pruning algorithms theoretically. We
present a pruning technique with theoretical and experimental results.
Although the concept of using a distance oracle to prune a significant number of nodes is
very simple, implementing a working distance oracle tends to be challenging and time
consuming. It is therefore natural to ask whether building an efficient distance oracle is
worth the effort. One of the main objectives of our research was to answer this question.
The results from our experiments provide enough evidence for us to move forward with
building such a distance oracle. We conducted a large number of experiments comparing
the performance of our algorithm, Dijkstra’s algorithm, the A* algorithm, and Goldberg’s
ALT algorithms [11] on the entire US road network, the largest road map accessible to
us. We conclude from these experiments that our
algorithm outperforms Dijkstra’s, A*, and Goldberg’s ALT algorithms if we have access
to efficient distance oracles for road maps. In the absence of an actual implementation of
a distance oracle, we formulated a technique to simulate the presence of one.
A detailed discussion of this technique appears in section 7.
As a result of our research, Yan Dai has begun an investigation of whether or not the
net-tree technique of Har-Peled and Mendel [15] can be applied to road maps. His
preliminary results (personal communication) seem to indicate that he will be successful
in developing the needed distance oracle.
This document is organized as follows: In section 2, we give definitions of some important
terms and concepts such as doubling dimension; in section 3, we discuss some of the
important works related to shortest path algorithms; in section 4, we give background
information and motivation for the research; in section 5, we define the thesis research
problem; in section 6, we explain the proposed algorithm; in section 7, we present our
experimental results; in section 8, we prove lower and upper bounds on the running time.
2. Definitions
2.1. Metric
For a given set X, a metric is a distance function d: X × X → R+ such that
1. ∀x, y ∈ X, d(x, y) ≥ 0 - positiveness,
2. ∀x, y ∈ X, d(x, y) = d(y, x) - symmetry,
3. ∀x ∈ X, d(x, x) = 0 - reflexivity,
4. ∀x, y, z ∈ X, d(x, y) + d(y, z) ≥ d(x, z) - triangle inequality,
and in most cases (if the following condition is not met, the function is a semimetric, not a metric)
5. ∀x, y ∈ X, x ≠ y ⇒ d(x, y) > 0 - strict positiveness.
(Chavez, et al. [3])
2.2. Metric Space
A metric space M = (X, d) is a set X together with a distance function d, where d is a
metric.
2.3. Distance Oracle
A distance oracle of a finite metric space X is a “compact” data-structure that can answer
distance queries for pairs of points. The performance of a distance oracle is measured
using four parameters (P, S, Q, κ ), where P is the preprocessing time, S is the space used
by the distance oracle (in terms of memory words), Q is the query time, and κ is the
approximation factor (Har-Peled, et al. [15]).
2.4. Shortest Path Oracle
A shortest path oracle of a finite metric space X is a data structure that can be used to
efficiently construct an actual shortest path between two points. In particular, it differs
from a distance oracle because it returns a path, and not just a distance. The performance
of a shortest path oracle is measured using four parameters (P, S, Q, κ), similar to the
definition for distance oracles. Note, however, that Q = Ω(S*), where S* is the minimum
number of edges in any κ-shortest path, because the oracle must at least output every edge
of the path it returns.
2.5. Ball
The ball B(x, r), with respect to a particular metric d, is the set { y ∈ X : d(x, y) ≤ r }
(Talwar [20]). In other words, x is the center of the ball, and r is the radius.
Figure 1: Ball(x, r)
2.6. Doubling Dimension
2.6.1. Doubling constant
The doubling constant of a metric space (X, d) is the smallest value λ such that every ball
in X can be covered by λ balls of half the radius. A ball B(x, r) is said to be covered
by λ balls of half the radius (r/2) if there exists a set S ⊆ X such that |S| = λ and
$$B(x, r) \subseteq \bigcup_{s \in S} B\left(s, \frac{r}{2}\right).$$
The doubling dimension of X is then defined as dim(X) = log₂ λ (Gupta, et al. [13]).
Figure 2: Illustration of Doubling Dimension. The left-hand figure shows that a grid-like
graph has doubling constant 4. The right-hand figure demonstrates that the Euclidean
plane has doubling constant 7.
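The definitions of a ball and of a covering by half-radius balls can be tried out on a small finite metric. The sketch below is an illustration under assumptions made here: the point set is a 5 × 5 integer grid with the ℓ1 distance, and the helper names `ball` and `greedy_cover_size` are hypothetical. Finding a minimum cover is a set cover problem, so the greedy routine only gives an upper bound on the number of half-radius balls needed for one particular ball, not the doubling constant itself.

```python
def l1(p, q):
    """l1 (Manhattan) distance between two points in the plane."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def ball(points, dist, x, r):
    """B(x, r): all points y with dist(x, y) <= r."""
    return {y for y in points if dist(x, y) <= r}

def greedy_cover_size(points, dist, x, r):
    """Greedy upper bound on the number of radius-r/2 balls covering B(x, r)."""
    uncovered = ball(points, dist, x, r)
    count = 0
    while uncovered:
        # Pick the center whose half-radius ball covers the most uncovered points.
        best = max(points, key=lambda s: len(uncovered & ball(points, dist, s, r / 2)))
        uncovered -= ball(points, dist, best, r / 2)
        count += 1
    return count

grid = [(x, y) for x in range(5) for y in range(5)]
center = (2, 2)
print(len(ball(grid, l1, center, 2)))          # number of grid points in B(center, 2)
print(greedy_cover_size(grid, l1, center, 2))  # greedy bound on the cover size
```

For this particular ball, no radius-1 ball can contain two of the four extreme points of the radius-2 diamond, so at least 4 half-radius balls are needed, in line with the grid illustration in Figure 2.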
2.7. ((1+ε)-short path)
Let $l_{s,t}$ denote the length of the shortest path from s to t, and l(P) denote the length of a
path P. Then a path P is said to be (1+ε)-short if l(P) ≤ (1+ε) $l_{s,t}$, where s and t are the
endpoints of P. Thus, the length of P is within a factor (1+ε) of the length of the shortest path.
3. Related Works
There has been a considerable amount of work on finding efficient shortest path
algorithms in a graph. Dijkstra [6] proposed one of the most important shortest path
algorithms called Dijkstra’s algorithm. A lot of work has been done to implement
Dijkstra’s algorithm efficiently. Ahuja, et al. [1] implement Dijkstra’s algorithm on a
graph with n vertices, m edges, and non-negative integer arc costs bounded by C using
two-level radix heap. The implementation gives time bound of O(m + nlogC / log log C).
Hart, et al. [16] use heuristic information in graph searching to define the well-known
path finding algorithm A*. Goldberg [10] presents a simple shortest path algorithm with
running time O(m + nlogC) where C is the ratio of the largest and the smallest nonzero
arc length. Cherkassky, et al. [4] developed several natural shortest paths problem
generators in order to study practical performance of several shortest path algorithms
such as, Bellman-Ford-Moore Algorithm, Dijkstra’s Algorithm, and Incremental Graph
Algorithms.
Current research on shortest path algorithms is focused on answering distance and path
queries as quickly as possible by preprocessing the input graph. Goldberg, et al. [11]
propose a set of algorithms called ALT algorithms, which combine A* search
with graph-theoretic lower-bounding techniques based on landmarks and the triangle
inequality. ALT algorithms preprocess the input graph to answer shortest path queries
quickly. While preprocessing, they select a small number of landmarks, then compute
and store shortest path distances between all vertices and each of these landmarks. If the
landmarks chosen are the best possible set of landmarks, then ALT algorithms work well.
However, luck still plays a role in selecting good landmarks. Thus, ALT algorithms do
not guarantee a good running time even though an optimal path is guaranteed (Goldberg,
et al. [11]). Additionally, Goldberg, et al. were unable to prove any theoretical bounds on
the running time of ALT algorithms. A detailed comparison between the ALT algorithms and
our algorithm appears in section 7.
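The role of landmarks and the triangle inequality can be illustrated with a small sketch. In an undirected graph, for any landmark L the triangle inequality gives d(u, t) ≥ |d(L, u) − d(L, t)|, so the stored landmark distances yield a lower bound that A* can use as its heuristic. The function name `landmark_lower_bound`, the 4-cycle example, and the hard-coded distance tables below are assumptions made for illustration, not part of the ALT implementation of Goldberg, et al.

```python
def landmark_lower_bound(landmark_dists, u, t):
    """Best triangle-inequality lower bound on d(u, t).

    landmark_dists holds one table per landmark L, with table[v] = d(L, v).
    """
    return max(abs(table[u] - table[t]) for table in landmark_dists)

# Hypothetical example: the unit-weight cycle 0 - 1 - 2 - 3 - 0.
from_landmark_0 = [0, 1, 2, 1]  # d(0, v) for v = 0..3
from_landmark_1 = [1, 0, 1, 2]  # d(1, v) for v = 0..3

# The true distance d(1, 3) is 2.
print(landmark_lower_bound([from_landmark_0], 1, 3))                   # landmark 0 alone
print(landmark_lower_bound([from_landmark_0, from_landmark_1], 1, 3))  # both landmarks
```

With landmark 0 alone the bound for the pair (1, 3) is 0, which carries no information, while adding landmark 1 makes the bound tight. This is a small instance of the dependence on landmark choice described above.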
The performance of ALT algorithms is measured based on the number of vertices visited.
We take a similar approach (output-dependent running time) to measure the performance
of our algorithm. The number of vertices visited gives the running time of our algorithm.
In general, modern point-to-point (P2P) shortest path algorithms scan only a small
portion of the entire input graph, unlike classic algorithms such as Dijkstra’s algorithm.
We may assume that P2P algorithms scan at least all the vertices along the shortest path,
because we are required to give this list of vertices as output. It therefore makes more sense to
measure the running time of P2P algorithms by the number of vertices visited rather
than by the size of the graph. Some algorithms (for example, Sanders and Schultes’
“Highway Hierarchies” [19]) calculate distances exactly and very quickly, but do not return
the full list of nodes on the path. Instead, they return only a subset of the nodes on the
shortest path, namely the nodes where the driver needs to change direction. For that reason,
Sanders and Schultes’ algorithm can achieve much faster running times than is possible
for any algorithm that returns the actual shortest path. Gutman [14] has also proposed an
algorithm similar to Goldberg’s, but with a different approach. His algorithm gives the
exact shortest path, but only for a restricted class of graphs. Gutman’s algorithm,
optimized for road networks, uses the concept of reach. His algorithm is further
improved by Goldberg, et al. [12], who introduce a bidirectional version that uses implicit
lower bounds.
Thorup, et al. [21] propose an approximate distance oracle that can answer distance
queries for pairs of nodes. Their distance oracle preprocesses the input graph in
$O(kmn^{1/k})$ expected time and constructs a data structure of size $O(kn^{1+1/k})$,
where k ≥ 1 is an integer. The approximate distance returned has stretch at most 2k−1. The
technique used by Thorup, et al. is to construct a collection of trees that form a tree cover
of the graph. Each node is contained only in a small number of trees. For any pair of
nodes there is a tree in the cover containing a path between them. Such a tree can be
found in constant time. Their approximate distance oracles, however, are still super-linear
in the size of the input graph. Thorup, et al. also argue that, assuming a long-standing
conjecture of Erdős, the storage space cannot be decreased for general graphs.
In the case of doubling metrics, Talwar [20] significantly improves upon Thorup, et al.’s
distance oracles by proposing an algorithm which returns a (1+ε)-approximate distance
between pairs of nodes using a data structure of size
$$O\left(n \left(\frac{c}{\varepsilon}\right)^{k} k \log\frac{\Delta}{\varepsilon}\right),$$
where k is the doubling dimension of the input graph, $\Delta = d_{\max}/d_{\min}$ is the aspect
ratio of the metric, ε is a parameter with 0 < ε ≤ 1, and c is a constant. The size of the data
structure is linear in the size of the input graph, as long as the doubling dimension of the
graph is constant. Har-Peled and
Mendel [15] further improve Talwar’s distance oracle by proposing a compact
representation scheme based on the construction of hierarchical nets. Har-Peled and
Mendel’s compact representation scheme has preprocessing time $P = 2^{O(\mathrm{dim})}\,\mathrm{poly}(n)$,
storage space $S = \varepsilon^{-O(\mathrm{dim})}\, n$, query time Q = O(dim), and approximation factor κ = 1+ε,
where dim is the doubling dimension and 0 < ε ≤ 1. For the case of metrics with bounded
doubling dimension, the result of Har-Peled, et al. is the best known.
Talwar [20] has also proposed an algorithm to compute a (1+ε)-approximate shortest
path in time $l(\log n)^{o(1)}$ using a data structure of size $n(\log n)^{o(1)}$, where l is the
number of hops in the output path. We prove in this thesis that our algorithm runs in
$(CS^*)^{O(\mathrm{dim})}$ time, where S* is the number of vertices on the shortest path, and C is a
constant. Note that S* can be different from l, the number of hops in the output path (not
necessarily the shortest path). Our data structure (which is just Har-Peled and Mendel’s
distance oracle) has size O(n). Our algorithm improves upon Talwar [20] and Goldberg, et al.
[11] by returning the exact shortest path in $(CS^*)^{O(\mathrm{dim})}$ time with only linear storage space.
4. Background and Motivation
The problem of finding the shortest path in huge road networks is, by its nature, very
expensive. The existing algorithms are not sufficiently fast to find the exact shortest route
between a pair of points in a large road map. Although Dijkstra’s algorithm always
computes the optimal shortest path if there exists one, it is too slow for common use, for
example, in online route-finding, or for an implementation in a GPS device. It scans the
entire set of vertices in an input graph in the worst case.
The A* algorithm is the most popular choice for computing the shortest path because of
its flexibility and its use in a wide range of contexts. A* works much better than
Dijkstra’s algorithm. If an estimate from current point to goal (h(n)), returned by a
heuristic, is exactly equal to the cost of reaching the goal from the current point, then A*
will only expand points on the shortest path and never expand anything else. This is the
highest level of efficiency any exact shortest path algorithm can attain. However, finding
such an exact heuristic for road networks is extremely difficult. The common
implementations of A* include the use of heuristics that either underestimate or
overestimate the cost of moving from current point to the goal. A* will expand more
nodes if h(n) is lower than the exact cost, making it slower. On the other hand, A* will
expand very few nodes if h(n) is higher than the exact cost, making it extremely fast, but
inaccurate. The same is true of other approximate and heuristic algorithms. Their
results can easily be off by a few percent, which makes them a poor choice. In this time of
high gas prices, one cannot afford to take a longer route to the destination; this is
especially true for corporations with large shipping fleets, such as United Parcel Service
(UPS), Inc. Goldberg’s ALT algorithms do give the exact shortest path using a heuristic
approach; however, there is no provably good way to pick the landmarks, so good
performance cannot be guaranteed.
The bottom line is that there is a need for an algorithm that can solve the shortest path
problem in huge road networks with greater efficiency. In this thesis, we provide such an
algorithm, with experimental and theoretical results that demonstrate its
efficiency. Section 6 describes our algorithm in detail.
5. Thesis Research Problem
Our first goal was to find the exact shortest path in a road map using very fast queries to a
data structure that uses small storage space. Section 8 proves that our current algorithm
has running time $(CS^*)^{O(\mathrm{dim})}$, where S* is the number of vertices on the shortest path, C is
a constant, and dim is the doubling dimension of the input graph. Our second goal was to
carry out an experimental analysis of maps to justify our claim that the use of an approximate
distance oracle in our algorithm yields significantly better results, in terms of the number
of vertices visited, than Dijkstra’s algorithm, the A* algorithm, and Goldberg’s ALT
algorithms. Experimental results in section 7 show that our algorithm is even better in
practice than our theoretical bound would indicate.
Our research was focused particularly on creating a shortest path oracle with better
running time than Goldberg’s ALT algorithms and Talwar’s shortest path oracle, assuming we
are given an efficient distance oracle. One class of graphs for which good distance
oracles exist is doubling metrics. A metric is called a doubling metric if the doubling
dimension of the metric is constant with respect to growth in size of the metric. We say a
family of metrics is doubling if there is a constant C such that dim ≤ C for every metric
in the family. Intuitively, we expect that road maps have low doubling
dimension: for example, within a city it is probably close to 2 (see figure 3). This is
because city maps (which tend to be grids) look something like ℓ1 on the plane. The
doubling dimension of the ℓ1 plane is 2.
Figure 3: A portion of a map of Chicago resembling the ℓ1 plane with bounded doubling
dimension (dim = 2)
6. The Algorithm
Our algorithm solves the point-to-point shortest path problem on a weighted, undirected
graph G = (V, E). It takes a graph, a source vertex, a target vertex, and ε as inputs, where ε
is a parameter with 0 ≤ ε ≤ 1. The algorithm works very similarly to Dijkstra’s algorithm
except that it uses an efficient pruning technique to explore very few nodes while
computing the shortest path. Our algorithm is exceedingly simple. The challenge lies in
proving that it is efficient in both theory and practice. In this thesis, we support
its efficiency with both a theoretical proof and experimental results.
For each vertex v in the graph, we maintain an attribute d[v], which is an upper bound on the
weight of the shortest path from source s to v. We call d[v] a shortest-path estimate. The
algorithm initializes the shortest-path estimates, predecessors, and the ClosedList (a set of
fully explored vertices) using the following procedure.
InitializeGraph (G, s)
{
    for each vertex v ∈ V[G]
    {
        d[v] = ∞;
        parent[v] = null;
    }
    d[s] = 0;
    ClosedList = null;
}
The algorithm also maintains a priority queue Q, a set of visited vertices whose final
shortest-path weight has not been determined yet. The priority queue Q is initialized with
all the vertices in the graph². The algorithm then repeatedly extracts the vertex u from Q with the
minimum d[u], and inserts u into the ClosedList if (h(s, u) + h(u, t)) / h(s, t) ≤ (1 + ε),
where s is the source, t is the destination, and h() is the distance between two vertices
as reported by the distance oracle (section 6.1 makes this precise).
² Although initializing the priority queue with all the vertices in the graph is not efficient, it
makes the proof of correctness of the algorithm simpler. The better approach is to initialize the
priority queue with just the source vertex ‘s’. Other vertices are added to the queue only after
they are discovered for the first time during the search.
By enforcing the condition (h(s, x) + h(t, x)) / h(s, t) ≤ (1 + ε), our algorithm prunes
a considerable number of vertices that are not on any (1+ε)-short path. For each expanded
vertex u, it also checks whether the shortest path from s to v found so far, where v is a
neighbor of u, can be improved.
If so, it decreases the value of the shortest-path estimate d[v] and updates v’s predecessor
field parent[v] using the following procedure.
Relax (u, v)
{
    if (d[v] > d[u] + dist_between(u,v))
    {
        d[v] = d[u] + dist_between(u,v);
        parent[v] = u;
    }
}
The complete algorithm is listed below.
OurAlgorithm (G, s, t, ep)
{
    InitializeGraph (G, s);
    Q = V[G];
    while (Q is not empty)
    {
        u = ExtractMin (Q);
        if (u == t) return;
        if ((u ∉ ClosedList) AND (((h(s,u)+h(t,u))/h(s,t)) <= (1+ep)))
        {
            Add u to the ClosedList;
            for each vertex v ∈ AdjList[u]
                Relax (u,v);
        }
    }
}
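The pseudocode above can be turned into a short runnable sketch. The version below is an illustration under several assumptions made here: the graph is an adjacency dictionary, the queue starts with only the source vertex (the alternative mentioned in the footnote), the pruning test is written in the multiplied form h(s,u) + h(u,t) ≤ (1+ε)·h(s,t) to avoid dividing by zero when s = t, and the oracle `h` is simulated with exact distances from a plain Dijkstra search. The names `dijkstra_all` and `pruned_search` and the example graph are hypothetical.

```python
import heapq
import math

def dijkstra_all(graph, src):
    """Exact distances from src; used here only to simulate a distance oracle."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, w in graph[u]:
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def pruned_search(graph, s, t, h, eps):
    """Dijkstra-style search that expands u only if it passes the oracle test."""
    bound = (1.0 + eps) * h(s, t)
    d = {s: 0.0}
    parent = {s: None}
    closed = set()
    heap = [(0.0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if u == t:
            break
        if u in closed or du > d[u]:
            continue
        if h(s, u) + h(u, t) > bound:
            continue  # pruned: u fails the (1 + eps) test
        closed.add(u)
        for v, w in graph[u]:  # relax every edge leaving u
            if du + w < d.get(v, math.inf):
                d[v] = du + w
                parent[v] = u
                heapq.heappush(heap, (d[v], v))
    if t not in d:
        return math.inf, [], closed
    path, node = [], t
    while node is not None:
        path.append(node)
        node = parent[node]
    return d[t], path[::-1], closed

# Hypothetical example: three s-t routes of lengths 2, 4, and 8.
graph = {
    "s": [("a", 1.0), ("b", 2.0), ("c", 4.0)],
    "a": [("s", 1.0), ("t", 1.0)],
    "b": [("s", 2.0), ("t", 2.0)],
    "c": [("s", 4.0), ("t", 4.0)],
    "t": [("a", 1.0), ("b", 2.0), ("c", 4.0)],
}
exact = {u: dijkstra_all(graph, u) for u in graph}
h = lambda u, v: exact[u][v]  # stand-in for a real approximate oracle
length, path, expanded = pruned_search(graph, "s", "t", h, eps=0.5)
print(length, path, sorted(expanded))
```

In this example the vertices b and c fail the test for ε = 0.5 and are never expanded, so only s and a enter the closed set. Vertex labels must be mutually comparable, because the heap compares them when two queue entries have equal distance.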
6.1. Assumptions
We make the following assumptions about the type of graph that can arise from a road
map. We believe all road networks will obey these assumptions.
1. Input graph has constant degree. During our research, we found that the
highest degree of any node was 16, and this only happened in cases where
two-way streets were represented by two edges instead of a single edge.
2. All edge weights are nonnegative. Therefore, w(u,v) ≥ 0 for each edge
(u,v) ∈ E.
3. The aspect ratio of the graph, ∆ G, is bounded. (Since the graph comes from
real data, there cannot be infinitesimally small or infinitely large distances.)
4. No vertex has degree 2. A vertex with degree two can be removed, and the
incident edges combined into a single edge.
While computing the exact shortest path between source (s) and target (t), we restrict our
attention to only the (1+ ε )-short paths between s and t. To do this, we use a distance
oracle to check if a prospective path is (1+ ε )-short. Pruning the nodes that are unlikely to
be in the shortest path is not a new technique. The A* algorithm does pruning based on a
heuristic function, which estimates the shortest distance between the current node and the
destination node. Goldberg’s ALT algorithms prune unlikely nodes using the data
structure which stores shortest distance between landmarks and every other node in a
graph. The difference between their approaches and ours is that we use a distance oracle
that is provably good, and that has a user-tunable level of precision.
The distance oracle works as follows. We are given a description of a large network,
such as the U.S. road network, with n nodes and m edges. Each edge has a weight
associated with it. Let us assume the weight is the length of the edge. In any graph
$m < n^2$, but it is most likely that m = O(n) in real road networks. To reduce the total
amount of time taken while computing the shortest distance between any two nodes,
distance oracles preprocess the road network and store distance information in a data
structure (in the simplest case, the distance between all pairs of nodes). Given a pair of
points, the oracle then returns a (1+ε)-approximation to the shortest distance between them.
Our algorithm queries the distance oracle for an approximate shortest distance between a
pair of points under consideration. We then use the answer received from distance oracle
to make intelligent decisions while computing the exact shortest path. Our algorithm
never expands a node which is not in any (1+ ε )-short path. We define h() as the
approximate shortest distance returned by the distance oracle, and d() as the exact
shortest distance. The approximate distance h() satisfies the inequality
d (u, v) ≤ h(u, v) ≤ (1 + ε )d (u, v) . If h( s, u ) + h(u, t ) > (1 + ε )h( s, t ) , the node u cannot
be on the shortest path from s to t. Thus, u is never expanded. On the other hand, we
might expand some nodes with d(s,u) + d(u,t) = (1+ ε )d(s,t), because of the imprecision
of the distance oracle. Hence, we never skip any node that is on the shortest path, and we
might visit some nodes that are on (1+ ε )-short paths, but at least we never visit a node
unless it is on a (1+ ε )-short path. Figure 4 shows two different paths from s to t. Path 1
contains x as an intermediate node between s and t. Path 2 contains y as an intermediate
node between s and t. Let us assume we have h( s, x) + h( x, t ) ≤ (1 + ε )h( s, t ) and
h( s, y) + h( y, t ) > (1 + ε )h( s, t ) . Our algorithm will not expand y because it is not on a
(1+ ε )-short path, but x gets expanded because it is on a (1+ ε )-short path.
Figure 4: Pruning nodes that are not in (1+ ε )-short path
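The pruning test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name can_prune, the toy distance table, and the stand-in oracle h (exact distances inflated by a fixed factor of 1.1) are all hypothetical and are not part of the thesis implementation.

```python
def can_prune(h, s, u, t, eps):
    """Return True if u fails the (1 + eps)-short test and can be skipped."""
    return h(s, u) + h(u, t) > (1 + eps) * h(s, t)


# Hypothetical exact distances d() on a tiny example (not measured data).
exact = {
    ("s", "x"): 4.0, ("x", "t"): 6.0,   # x lies on the shortest path
    ("s", "y"): 9.0, ("y", "t"): 8.0,   # y is a long detour
    ("s", "t"): 10.0,
}
eps = 0.25


def h(u, v):
    # Stand-in oracle: d <= h <= (1 + eps) * d holds because 1.1 <= 1.25.
    return 1.1 * exact[(u, v)]


print(can_prune(h, "s", "x", "t", eps))  # False: x is kept
print(can_prune(h, "s", "y", "t", eps))  # True: y is pruned
```

Here x plays the role of the node on Path 1 in Figure 4 and y the node on Path 2.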
6.2. Correctness of our Algorithm
Theorem 6.2.1 If we run our algorithm on a weighted graph G = (V, E) with source s,
target t, distance oracle error ε , and nonnegative weight function w, then it returns the
optimal shortest path from s to t.
Proof 6.2.1 We prove theorem 6.2.1 by contradiction. Our approach is very similar to the
one used by Cormen, et al [5] to prove the correctness of Dijkstra’s algorithm. The
following theorem is due to Cormen, et al [5].
Theorem 6.2.2 Let G = (V, E) be a weighted graph with weight function w : E → R+. Let
s ∈ V be the source vertex, and let the graph be initialized by InitializeGraph(G, s). Then
d [v] ≥ δ ( s, v) for all v ∈ V, and this invariant is maintained over any sequence of
relaxation steps on the edges of G. Moreover, once d[v] achieves its lower bound δ ( s, v) ,
it never changes.
Definition 6.2.1 (Shortest-path weight)
The shortest-path weight from u to v is defined by
\[
\delta(u, v) =
\begin{cases}
\min\{\, w(p) : u \overset{p}{\leadsto} v \,\} & \text{if there is a path from } u \text{ to } v, \\
\infty & \text{otherwise.}
\end{cases}
\]
We show that for each vertex u ∈ ClosedList, we have d[u] = δ ( s, u ) at the time when u
is inserted into set ClosedList and that this equality never changes thereafter. For the
purpose of contradiction, let u be the first vertex for which d[u] ≠ δ ( s, u ) when it is
inserted into set ClosedList. We will derive the contradiction that d[u] = δ ( s, u ) by
examining a shortest path from s to u. Since s is the first vertex into set ClosedList and
d[s] = δ ( s, s ) = 0, we must have u ≠ s. Our assumption d[u] ≠ δ ( s, u ) implies that there
must be some path from s to u, otherwise d[u] = δ ( s, u ) = ∞ which would violate the
assumption. Therefore, there is a shortest path p from s to u. Path p connects a vertex in
ClosedList, namely s, to a vertex in V – ClosedList, namely u. Let us consider the first
vertex y along p such that y ∈ V – ClosedList, and let x ∈ V be y’s predecessor. Thus, as
shown in Figure 5 below, path p can be decomposed as
$s \overset{p_1}{\leadsto} x \to y \overset{p_2}{\leadsto} u$.
Figure 5: The proof of Theorem 6.2.1.
We must have had d[x] = δ(s, x) when x was inserted into ClosedList, because u was
chosen as the first vertex for which d[u] ≠ δ(s, u) and x was inserted into ClosedList
before u (see Figure 5). Edge (x, y) was relaxed after x was inserted into ClosedList.
Because y follows x on the shortest path p, the prefix of p that ends at y is itself a
shortest path from s to y, so this relaxation set d[y] = δ(s, y). Since y occurs before u
on a shortest path from s to u and all edge weights in G are nonnegative, we have
δ(s, y) ≤ δ(s, u), and thus
\[
d[y] = \delta(s, y) \le \delta(s, u) \le d[u],
\]
where the last inequality is the invariant of Theorem 6.2.2.
We also have d[u] ≤ d[y] because both vertices u and y were in V – ClosedList when u
was chosen in line 7. Combining the two inequalities gives
d[y] = δ(s, y) = δ(s, u) = d[u]. By Theorem 6.2.2 this equality holds thereafter.
Consequently, we arrive at d[u] = δ(s, u), which contradicts the assumption we started with.
7. Experimental Methodology
We counted how many vertices Dijkstra’s algorithm, the A* algorithm, and our algorithm
fully explored while computing the shortest path from source node s to target node t.
Through our experiments we attempt to show that our algorithm outperforms Dijkstra’s
algorithm and the A* algorithm by exploring fewer points. We also present a comparison
between our experimental results and the results of Goldberg, et al.
7.1. Experimental setup
All experiments were run using ArcGIS Desktop 9.2 under the Windows XP
Professional x64 operating system on a desktop workstation with 4 GB of RAM
and a 2.4 GHz Intel Core2 Quad processor. However, due to the 32-bit nature of ArcGIS
and the limitations of the .NET 2.0 Common Language Runtime (CLR), only a little over
800 MB of memory was accessible to an individual process.
We ran our experiments on the entire US road network (StreetMap USA) provided by
ESRI. StreetMap USA includes all of the roads, local and highway, in the United States.
We opted for StreetMap USA because it is the largest, the most complete, the most
accurate, and the most widely used road map we could obtain. Therefore, we
believe the results from our experiments are widely applicable to other road networks too.
Our experimental approach is explained below in steps.
1. Define a distance – First we defined the distance between two random points (e.g.
320 km).
2. Pick a random point in the 2-D plane overlaid on the US map. We used Donald E.
Knuth’s subtractive random number generator algorithm (Knuth [18]) to generate
pseudo-random numbers between 1 and the maximum number of points in an
input map. Since a definite mathematical algorithm was used to generate the
random numbers, the chosen numbers were not completely random. However, the
numbers were sufficiently random for practical purpose.
3. Pick a random point at distance D (e.g. 320 km) from the first point (measured by
a straight line in the plane).
4. Find the two graph nodes closest to these two points on the map.
5. Count the number of vertices on (1+ ε )-short paths for the two selected graph
nodes.
6. Count the number of vertices visited by Dijkstra’s algorithm while finding the
shortest path between the two selected nodes.
7. Count the number of vertices visited by the A* algorithm with the straight-line
heuristic.
8. Run this experiment (with 15 pairs of random points) for D values of 5km, 10km,
20km, 40km, 80km, 160km, 320km, and 640km; and for values of epsilon ( ε ) 1,
1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128, 1/256, 1/512, 1/1024, 1/2048, 1/4096,
1/8192, 1/16384, 1/32768, 1/65536, 1/131072, and 0.
Our experiments were run on the full graph and not on sub-graphs. Luck did not play any
role during the course of experiments. For each distance D, we ran 285
experiments, and we used eight different distances. Due to the high volume of computation
and the limitations of ArcGIS Desktop 9.2, we found it infeasible to run the experiments
using pairs of points separated by more than 640 km. We believe it would be
feasible to run experiments using pairs of points more than 640 km apart
with ArcGIS Desktop 9.3, which supports the x64
architecture. ArcGIS Desktop 9.3 is due for release in late 2008.
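As a quick arithmetic check on the experimental grid, the following Python sketch enumerates the ε values from step 8 and multiplies them by the 15 random pairs. Reading the figure of 285 experiments per distance as 15 pairs times 19 values of ε is our interpretation of the text, and the variable names are illustrative only.

```python
from fractions import Fraction

# Error levels from step 8: 1, 1/2, ..., 1/131072, and finally 0.
epsilons = [Fraction(1, 2 ** i) for i in range(18)] + [Fraction(0)]
distances_km = [5, 10, 20, 40, 80, 160, 320, 640]
pairs_per_distance = 15

runs_per_distance = pairs_per_distance * len(epsilons)
total_runs = runs_per_distance * len(distances_km)

print(len(epsilons))      # 19
print(runs_per_distance)  # 285
print(total_runs)         # 2280
```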
7.2. Implementation
We implemented all three algorithms in C#.NET using the ArcObjects library from ArcGIS.
We chose to implement the algorithms with the Network Analyst extension of ArcGIS
from ESRI because of the rich ArcObjects COM library, which can be used to build
custom plug-ins for ArcGIS to analyze maps. We used a standard heap implementation of
priority queues for all three algorithms and a generic Dictionary<TKey, TValue> for the
distance oracle.
7.2.1. A* algorithm
The A* algorithm makes use of a given heuristic function to compute the lowest-cost path
between two points. It calculates g(x), the lowest cost to reach node x from source node s,
and h(x), the estimated cost to reach target node t from x. A node with the lowest
f(x) = g(x) + h(x) is picked by A* to explore further. A* guarantees the optimal shortest
path only if the heuristic function never overestimates the distance between two nodes.
We implemented A* using the straight-line distance heuristic. Since the straight-line
distance between any two nodes is the lowest possible distance, the straight-line heuristic
never overestimates the distance between two nodes. We chose the straight-line
heuristic over the Manhattan distance heuristic because A* is not guaranteed to find
the exact shortest path unless h(u,t) <= d(u,t), and the Manhattan distance can exceed
the true distance. The straight-line heuristic is also the most
commonly used in practice (Goldberg, et al. [11]).
For our implementation of A*, we used two sets, openList and closedList. The standard
heap implementation of a priority queue was used as openList. The closedList was
implemented as a generic Dictionary<TKey, TValue>. The openList set contains nodes
that are candidates for examination by the algorithm. Initially, the openList set contains
just the start node s. The openList might contain every node in a graph depending on how
the algorithm is implemented. The closedList contains fully explored nodes. Initially, the
closedList set is empty. There is a main loop that repeatedly extracts the best node u from
openList and examines it. If the node u is the target node, then A* is done searching
nodes. Otherwise, the node u is added to the closedList and is explored by adding its
neighbor nodes to the openList if they are not fully explored yet. Each node also keeps a
pointer to its parent node so that we can determine how it was found. This pointer is used
to draw the shortest path on a graph.
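The main loop described above can be sketched as follows. This is a minimal Python illustration, not the C# ArcObjects implementation used in the experiments; the function name a_star, the adjacency-list format, the coords table, and the five-node example graph are assumptions made for the sketch. It returns the number of fully explored nodes, which is the quantity compared in the experiments.

```python
import heapq
import math


def a_star(graph, coords, s, t, heuristic=None):
    """Return (path, cost, number of fully explored nodes).

    graph:  dict node -> list of (neighbor, edge_length)
    coords: dict node -> (x, y), used by the straight-line heuristic
    """
    if heuristic is None:
        def heuristic(u):
            return math.dist(coords[u], coords[t])

    g = {s: 0.0}                     # best known cost from s
    parent = {s: None}               # parent pointers for path recovery
    open_list = [(heuristic(s), s)]  # heap ordered by f = g + h
    closed_list = {}                 # fully explored nodes

    while open_list:
        _, u = heapq.heappop(open_list)
        if u in closed_list:
            continue                 # stale heap entry, already explored
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1], g[t], len(closed_list)
        closed_list[u] = True
        for v, w in graph[u]:
            if v in closed_list:
                continue
            new_g = g[u] + w
            if new_g < g.get(v, math.inf):
                g[v] = new_g
                parent[v] = u
                heapq.heappush(open_list, (new_g + heuristic(v), v))
    return None, math.inf, len(closed_list)


# Hypothetical example: edge lengths are at least the straight-line distances.
coords = {"s": (0, 0), "a": (1, 0), "b": (1, 1), "c": (-1, 0), "t": (2, 0)}
edges = [("s", "a", 1.0), ("a", "t", 1.0), ("s", "b", 1.5),
         ("b", "t", 1.5), ("s", "c", 1.0)]
graph = {v: [] for v in coords}
for u, v, w in edges:
    graph[u].append((v, w))
    graph[v].append((u, w))

print(a_star(graph, coords, "s", "t"))
print(a_star(graph, coords, "s", "t", heuristic=lambda u: 0.0))
```

Passing a heuristic that always returns zero turns the same loop into Dijkstra's algorithm; on this toy graph it fully explores four nodes before reaching t, while the straight-line heuristic explores two.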
7.2.2. Dijkstra’s Algorithm
Dijkstra’s algorithm uses a greedy approach to compute the lowest-cost path between two
nodes in a graph. It always explores the unexplored node closest to source node s.
Dijkstra’s algorithm always computes the exact shortest path if one exists. It can be
viewed as a special case of the A* algorithm with h(u,v) = 0. Therefore, our
implementation of Dijkstra is very similar to A*. Dijkstra extracts the node with minimum
g(x) from the openList and adds it to the closedList after fully exploring it. The
neighbors of this node are added to
the openList if they haven’t been fully explored yet. The algorithm stops its execution
when it extracts the target node from the openList.
7.2.3. Our Algorithm
Our algorithm computes the shortest path using a distance oracle. The distance oracle
pre-computes (1+ ε )-approximate shortest distances between all pairs of points in an
input graph. It then returns the approximate distance h(u, v), where u, v ∈ V for the
input graph G = (V, E), in response to distance queries issued by our algorithm.
It is important to note that the major focus of our research was to find out whether or not
using the distance oracle was a good idea, and, if it was, how much better our algorithm
would perform compared to Dijkstra’s and A* algorithms. We did not require an actual
implementation of a distance oracle to do this analysis. Therefore, we decided not to
implement one given the scope of our research. Since we did not implement the distance
oracle, we could not actually run the algorithm we wanted to test. However, we could
still identify which vertices our algorithm would visit if it were
implemented. The basis of comparison was the number of vertices fully explored by each
algorithm. By using the technique explained below, we were able to compute the total
number of vertices our algorithm would explore if we had access to the distance oracle.
Our algorithm does not explore any vertex that is not on a (1+ ε )-short path. It does that
by looking at each node and checking whether the node is on a (1+ ε )-short path from the
source s to destination t. To check if a node x was on a (1+ ε )-short path, we computed
\[
\frac{d(s, x) + d(x, t)}{d(s, t)}
\quad\text{instead of}\quad
\frac{h(s, x) + h(x, t)}{h(s, t)},
\]
where d(u, v) is the distance returned by our implementation of Dijkstra’s algorithm and
h(u, v) is the approximate distance a true oracle would return. The value
(d(s, x) + d(x, t)) / d(s, t) is always greater than or equal to 1 because of the triangle
inequality. Now if
\[
\frac{d(s, x) + d(x, t)}{d(s, t)} \le 1 + \varepsilon,
\]
then x is on a (1+ ε )-short path from s to t. So, for each
point x in the graph, we calculated the smallest ε so that x lies on some (1+ ε )-short
path. To achieve this, starting from source s we computed the shortest distance from s to
every vertex using our implementation of Dijkstra’s algorithm. Then we did the same
starting from target node t. An actual distance oracle would compute a
(1+ ε )-approximate distance between all pairs of points in a graph, but we stopped once we
reached points x such that d(s, x) > 2 × d(s, t) or d(t, x) > 2 × d(s, t). By stopping at that
point, we still get every point that is on a (1+ ε )-short path with ε <= 1.
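The counting technique above can be sketched in Python as two full Dijkstra runs followed by a per-node ratio. This is an illustration under assumptions, not the thesis code: the function names, the adjacency-list format, and the toy graph are hypothetical, the graph is assumed to be undirected, and the early stop at 2 × d(s, t) is omitted for brevity.

```python
import heapq
import math


def dijkstra_all(graph, source):
    """Exact shortest distances from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def smallest_eps(graph, s, t):
    """Map each node x to the smallest eps with x on a (1 + eps)-short path."""
    ds = dijkstra_all(graph, s)
    dt = dijkstra_all(graph, t)  # valid as d(x, t) because the graph is undirected
    d_st = ds[t]
    return {x: (ds[x] + dt[x]) / d_st - 1.0 for x in ds if x in dt}


def count_short_path_nodes(eps_of, eps):
    """Number of nodes an oracle with error eps would leave unpruned."""
    return sum(1 for e in eps_of.values() if e <= eps)


# Hypothetical undirected example graph.
edges = [("s", "a", 1.0), ("a", "t", 1.0), ("s", "b", 1.5),
         ("b", "t", 1.5), ("s", "c", 1.0)]
graph = {}
for u, v, w in edges:
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, []).append((u, w))

eps_of = smallest_eps(graph, "s", "t")
for eps in (0.0, 0.5, 1.0):
    print(eps, count_short_path_nodes(eps_of, eps))
```

Computing the per-node threshold once lets the same two Dijkstra runs answer the count for every value of ε in the experiment grid.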
7.3. Experimental Results
We ran all three algorithms many times on the US road network. The results showed that
our algorithm outperformed Dijkstra’s algorithm for every value of ε we tested, and
outperformed A* when ε was sufficiently small.
Figure 6: Example run of Dijkstra’s algorithm. The nodes in green are fully explored by
Dijkstra. The nodes in blue are on the shortest path.
Figure 7: Example run of A* algorithm. The nodes in green are fully explored by A*. The
nodes in blue are on the shortest path.
Figure 8: Example run of our algorithm using distance oracle that has 1.5% built-in error.
The nodes in green are fully explored by our algorithm. The nodes in blue are on the
shortest path.
Figure 9: Example runs of our algorithm using distance oracle that has no built-in error. If
we have access to a distance oracle with 0% error, our algorithm will visit only those
vertices that are on the shortest path.
7.3.1. Our Algorithm vs. Dijkstra’s Algorithm
Our algorithm explored fewer vertices than Dijkstra’s algorithm in all cases. We ran
some experiments to see if our algorithm would outperform Dijkstra’s algorithm when
ε = 1, and it did. This shows that even with a distance oracle that has 100% error
built into it, our algorithm would still outperform Dijkstra’s algorithm. We used the
median speed-up as the basis of comparison between our algorithm, Dijkstra’s, and A*.
The speed-up is the number of nodes visited by Dijkstra’s or A* divided by
the number of vertices visited by our algorithm. For example, if Dijkstra’s algorithm
visits 40,000 vertices and our algorithm visits 1,000 vertices then the speed-up value is
40.
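The speed-up measure can be written out as a short Python sketch. The three (baseline, ours) pairs below are hypothetical counts chosen for illustration, not measured results; only the first pair mirrors the worked example in the text.

```python
from statistics import median


def speed_up(baseline_visited, ours_visited):
    """Vertices visited by the baseline divided by vertices visited by ours."""
    return baseline_visited / ours_visited


# Hypothetical (baseline, ours) vertex counts for three runs.
runs = [(40_000, 1_000), (30_000, 1_500), (90_000, 1_500)]
ratios = [speed_up(b, o) for b, o in runs]

print(ratios)          # [40.0, 20.0, 60.0]
print(median(ratios))  # 40.0
```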
Distance Oracle   Distance (KM)
Error %           5       10      20      40      80      160     320     640
100               1.4     1.6     1.3     1.5     1.4     1.5     1.4     X
50                2.2     2.9     2.4     2.9     2.9     3.2     2.3     X
25                3.4     4.5     4.8     5.3     5.2     5.7     4.6     4.4
12.5              5.2     7.7     8.7     9.1     8.3     9.3     7.9     7.8
6.25              8.9     13.9    13.6    14.6    17.2    14.8    13.7    11.9
3.125             13.3    22.1    22.4    27      28.3    25.8    26.9    23.7
1.5625            16      33      38.1    43      53.4    45      43.9    36.2
0.78125           23      36.7    48.6    71.4    77.2    78      65.8    57.9
0.390625          23      37.8    55.1    86.5    99.5    138.3   115     95.9
0.1953125         23      38.4    59.2    122.6   161.1   212.5   254.4   153.9
0.09765625        23.8    43.6    77.4    132     184.4   331.2   385.4   255.9
0.048828125       23.8    44.6    91.3    139.8   192.8   416.8   475.6   424.6
0.024414063       26.2    44.6    91.3    158.2   196.2   513.6   512     637.6
0.012207031       26.2    44.6    91.3    163.3   197.2   570.8   582.2   872
0.006103516       26.2    44.6    91.3    164.1   197.2   574.7   607.4   972.9
0.003051758       26.2    44.6    91.3    165     203.8   577.6   625.5   1026.2
0.001525879       26.2    44.6    91.3    165     203.8   579     635.9   1046.6
0.000762939       26.2    44.6    91.3    165     203.8   579     647.6   1067.7
0                 26.2    44.6    91.3    165     203.8   586.4   750.2   1355.5
Table 1: Comparison between our algorithm and Dijkstra’s algorithm in terms of median
speed-up.
[Plot: Median, First Quartile, and Third Quartile series; vertical axis from 0 to 1200;
logarithmic horizontal axis from 1 down to 0.00000001.]
Figure 10: First quartile, median and third quartile of speed-up for the comparison
between our algorithm and Dijkstra’s algorithm. Distance = 320KM.
7.3.2. Our Algorithm vs. A* Algorithm
In every experiment we ran, our algorithm outperformed A* when ε <= 0.0625. However,
it performed worse than A* when ε ≥ 0.125. In other words, our algorithm outperforms the
A* algorithm if we have a distance oracle that has at most 6.25% error.
Distance Oracle   Distance (KM)
Error %           5       10      20      40      80      160     320     640
100               0.4     0.2     0.3     0.2     0.2     0.2     0.2     X
50                0.6     0.4     0.4     0.4     0.4     0.3     0.3     X
25                1       0.8     0.7     0.7     0.7     0.6     0.6     0.6
12.5              1.4     1.3     1.4     1.2     1.3     1       1       1
6.25              2.1     2       2.2     2.4     2.6     1.6     1.6     1.7
3.125             3       3.6     3.8     3.8     5.3     2.8     2.8     2.8
1.5625            3.7     4.4     5.8     6.4     8.5     5       5       4.7
0.78125           5       4.7     8.4     9.9     12      8.9     8.5     7.6
0.390625          5.1     5.2     9.3     14.3    14      16.6    15.3    11.5
0.1953125         5.6     5.8     10.3    19.3    19.5    25.7    25.2    20.6
0.09765625        5.6     7.2     11.2    21.2    27.8    37.7    33.9    37
0.048828125       5.6     7.2     12.4    22.5    31.1    52.8    50.3    63.2
0.024414063       5.6     7.2     12.5    22.7    32.8    60.9    67.8    90.4
0.012207031       5.6     7.2     12.6    23.9    33.2    65.1    77.3    116.8
0.006103516       5.6     7.2     12.6    23.9    33.2    67.6    82.5    136.9
0.003051758       5.6     7.2     12.8    24.2    33.2    74.5    86.2    145.2
0.001525879       5.6     7.2     12.8    24.2    33.5    74.5    86.4    157.2
0.000762939       5.6     7.2     12.8    24.2    33.5    74.5    86.4    159.2
0                 5.6     7.2     12.8    24.2    34.3    74.5    92.1    173.6
Table 2: Comparison between our algorithm and A* algorithm in terms of median
speed-up.
[Plot: Median, First Quartile, and Third Quartile series; vertical axis from 0 to 140;
logarithmic horizontal axis from 1 down to 0.00000001.]
Figure 11: First quartile, median and third quartile of speed-up for the comparison
between our algorithm and A* algorithm. Distance = 320KM.
7.3.3. Our algorithm vs. Goldberg’s ALT algorithms
Goldberg, et al’s experimental results show that for their best ALT algorithms, running
on road graphs, the average number of vertices scanned varies between 4 and 30 times
the number of vertices on the shortest path. For example, if the given graph contains
3,000,000 vertices and if there are 1,000 vertices on the shortest path then ALT
algorithms scan 10,000 vertices (10 scanned vertices for every shortest path vertex). To
compare our results with Goldberg, et al’s results, we computed the ratio between nodes
on (1+ ε )-short path and nodes on the shortest path for our algorithm. We did not
implement Goldberg’s ALT algorithms since they already provided the experimental
results in their paper [11]. Table 3 below shows the ratio for all our different values of ε
and distance. The comparison shows that our algorithm scans fewer vertices than
Goldberg’s ALT algorithms if we have access to an efficient distance oracle.
Distance Oracle   Distance (KM)
Error %           5       10      20      40      80      160     320     640
100               20      45      90      131     149     377     508     X
50                13      24      52      75      73      200     298     X
25                8       13      30      45      41      112     163     309
12.5              5       8       17      27      24      65      96      191
6.25              3       5       10      16      15      40      54      118
3.125             2       3       5       9       9       24      32      63
1.5625            2       2       3       4       5       13      18      35
0.78125           1       2       2       2       3       7       10      22
0.390625          1       1       2       2       2       4       6       14
0.1953125         1       1       2       1       2       3       4       9
0.09765625        1       1       1       1       1       2       3       5
0.048828125       1       1       1       1       1       1       2       3
0.024414063       1       1       1       1       1       1       2       3
0.012207031       1       1       1       1       1       1       2       2
0.006103516       1       1       1       1       1       1       2       2
0.003051758       1       1       1       1       1       1       1       2
0.001525879       1       1       1       1       1       1       1       2
0.000762939       1       1       1       1       1       1       1       2
0                 1       1       1       1       1       1       1       1
Table 3: Average ratio between nodes on (1+ ε )-short path and nodes on the shortest
path.
[Plot: Average Ratio on the vertical axis, from 0 to 600; logarithmic horizontal axis
from 1 down to 1E-08.]
Figure 12: Average ratio between nodes on (1+ ε )-short path and nodes on the shortest
path. Distance = 320KM.
8. Time Complexity
We now prove lower and upper bounds on the running time of our algorithm for the
special case of metrics with bounded doubling dimension. We show that there exist
graphs with doubling dimension dim which require (S*)^Ω(dim) time for an exact
shortest-path computation, if the algorithm is only allowed to use edge lengths and a
distance oracle. S* is the number of links on the shortest path from s to t, and C is a
constant.
Consider the graph Sm as shown in figure 13.
Figure 13: The graph Sm with 3 iterations
Sm consists of m+4 vertices with 2m+2 edges. The middle stage of the graph contains m
parallel paths from the left end-point to the right end-point. Sm is really just K2,m with
two extra edges and vertices, where m is the iteration.
Consider the following recursive construction of Gk,m. Let G0,m be a single edge with
end-points s and t. In order to make Gk,m, simply take Gk−1,m, and replace each edge with a
copy of Sm, where the two vertices of Sm that have degree 1 become the two end-points of
the edge. The graph has exactly (2m+2)^k edges, and
\[
2 + (m + 2)\,\frac{(2m + 2)^{k} - 1}{(2m + 2) - 1}
\]
vertices. The length of any shortest path from s to t is exactly
\[
4^{k} \approx n^{2/(\dim + 1)}.
\]
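The edge and vertex counts of Gk,m can be checked with a short Python sketch. It assumes the closed form 2 + (m+2)((2m+2)^k − 1)/((2m+2) − 1) for the vertex count, which follows from the fact that each replacement step turns every edge into 2m+2 edges and adds m+2 interior vertices per edge; the function names are illustrative only.

```python
def counts_by_recurrence(k, m):
    """Vertices and edges of G_{k,m}, built up one level at a time."""
    vertices, edges = 2, 1            # G_{0,m}: a single edge between s and t
    for _ in range(k):
        vertices += (m + 2) * edges   # each edge gains m + 2 interior vertices
        edges *= 2 * m + 2            # each edge becomes the 2m + 2 edges of S_m
    return vertices, edges


def counts_closed_form(k, m):
    """The same counts from the closed-form expressions."""
    edges = (2 * m + 2) ** k
    vertices = 2 + (m + 2) * (edges - 1) // (2 * m + 1)
    return vertices, edges


print(counts_by_recurrence(1, 2))  # (6, 6): S_2 has m + 4 vertices, 2m + 2 edges
print(counts_closed_form(2, 2))    # (30, 36)
```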
Now consider any deterministic algorithm that uses a (1+ ε )-oracle to search for the
shortest path. Also, consider an adversarially constructed version of Sm and an
adversarially constructed (1+ ε )-oracle. The (1+ ε )-oracle always answers the distance
queries as if all edges have length 1. For any edge of Sm that is actually inspected by the
algorithm, set its length to 1. If, at the end of the algorithm, there is any edge not on the
selected shortest path that was not directly inspected, then set its length to (1 − ε/2).
Hence, if the algorithm is to be correct, it cannot allow there to be any uninspected edges,
except on its own selected path. Thus, at least n – S* edges must be explored, and
\[
n - S^{*} \cong (S^{*})^{\dim/2}.
\]
If a path has one edge of length (1 − ε/2), then the (1+ ε ) error in the
oracle could end up canceling out the (1 − ε/2). Hence, the unmodified graph and the
modified graph could look identical from the point of view of the oracle. Therefore, no
deterministic algorithm can find the short edge unless it examines all the edges. Every
node has degree between 1 and 3; there are at least n edges, but S* could be as small
as 4^k = n^(2/dim). Hence, we may need (S*)^(dim/2) time just to visit all the edges.
We proved that our algorithm’s lower bound on running time is (S*)^Ω(dim). We now
proceed to find the upper bound on running time. The number of vertices in a ball of
radius l(1+ ε ) centered at source s, where l is the length of the shortest path, gives the
upper bound on running time. Let V(l(1+ ε )) be the total number of
vertices included within the ball of radius l(1+ ε ). We will prove upper bounds on
V(l(1+ ε )). First note that, for k ≥ 0,
\[
V(l(1+\varepsilon)) \le \left(2^{\dim}\right)^{k} V\!\left(\frac{l(1+\varepsilon)}{2^{k}}\right).
\]
This is from the definition of doubling metrics. Let K be the smallest value of k such that
\[
\frac{e_{\min}}{2} \ge \frac{l(1+\varepsilon)}{2^{k}},
\]
where e_min is the length of the shortest edge in the shortest path. Hence,
K ≥ lg(l(1+ ε )/e_min). Therefore,
\begin{align*}
V(l(1+\varepsilon))
&\le \left(2^{\dim}\right)^{k} V\!\left(\frac{l(1+\varepsilon)}{2^{k}}\right) \\
&\le \left(2^{\dim}\right)^{\lg\left(\frac{l(1+\varepsilon)}{e_{\min}}\right)}
     V\!\left(\frac{l(1+\varepsilon)}{2^{\lg\left(\frac{l(1+\varepsilon)}{e_{\min}}\right)}}\right) \\
&\le \left(2^{\dim}\right)^{\lg\left(\frac{l(1+\varepsilon)}{e_{\min}}\right)}
     V\!\left(\frac{l(1+\varepsilon)}{\;\frac{l(1+\varepsilon)}{e_{\min}}\;}\right) \\
&\le \left(2^{\dim}\right)^{\lg\left(\frac{l(1+\varepsilon)}{e_{\min}}\right)} V(e_{\min}).
\end{align*}
We know that V(e_min/2) = 1. Thus,
\[
V(l(1+\varepsilon))
\le \left(2^{\dim}\right)^{\lg\left(\frac{l(1+\varepsilon)}{e_{\min}}\right)}
\le \left(\frac{l(1+\varepsilon)}{e_{\min}}\right)^{\dim}
\le O\!\left(\left(\frac{l}{e_{\min}}\right)^{\dim}\right).
\]
The second step uses the identity (2^dim)^(lg x) = x^dim, and the last step treats
(1+ ε )^dim as a constant for fixed ε and dim.
Since S* is the number of vertices in the shortest path, we have l/e_min = CS*, where C is
the aspect ratio of the graph. The aspect ratio C = e_max/e_min, where e_max is the length
of the longest edge in the shortest path, and e_min is the length of the shortest edge in
the shortest path. If e_min = 1 (which we can assume without loss of generality by scaling
all distances), we have
\[
V(l(1+\varepsilon)) = (C S^{*})^{O(\dim)}.
\]
Note that S* = Θ(l/e_min) is not always true for general graphs. Figure 14 below shows an
example of the graph where it does not work.
Figure 14: A binary tree. The length of edges on level k ≥ 1 is double the length of edges
on level k+1.
The graph shown in figure 14 above has total number of vertices (n) = 2^k – 1, where k is
the number of levels. The length of the shortest distance from s to t (l) = 2^k – 1 = n, but
S* = lg(n) = k, i.e. l = 2^(S*) – 1 = Ω(2^(S*)). Therefore, the length of the shortest path
is exponential in the number of vertices in the shortest path. However, it is very unlikely
to have such a graph in a real map network since the leaves are very far away. In a general
map we can always expect
\[
\frac{l}{e_{\min} \cdot C} \le S^{*} \le \frac{l}{e_{\min}},
\]
where C is the aspect ratio of the graph.
9. Future Research
The running time of our algorithm could be improved in two main ways. First, the
constant C is very large, around 3000. Since we expect dim to be in the range 2-4, we
need to remove the power of dim from C, or remove C altogether. Secondly, the
(CS*)^O(dim) bound on our algorithm only applies to metrics with bounded
doubling dimension. It is possible that we could remove this exponential dependence on
dim using some other technique. In particular, augmenting our algorithm with other data
structures, and exploring other geometric insights about a map might be useful in
improving our bounds. Our eventual goal is to have a running time bound of O(S*),
where S* is the number of vertices in the shortest path. This would be optimal, because
the output of the algorithm must contain a list of length at least S*. However, our current
algorithm takes (CS*)^O(dim) time. This is because it expands some of the nodes that are on
(1+ ε )-short paths but not on the exact shortest path.
Our target application is a general map. The location of the destination t might
correspond to any point along a road, and not necessarily only to the intersections (see
figure 15). For further research we can use the concept of a map metric to calculate the
shortest path distance between points along a road by treating an input graph as a
1-D simplicial complex. A map metric contains every point along every road, not only the
intersections. Since the degree of any map is bounded, the choice of map metrics makes
more sense than normal metrics. We present some interesting lemmas related to the map
metric and doubling dimension in the appendix.
Figure 15: Destination along the edge
10. Conclusion
We proposed our shortest path algorithm to find the exact shortest path in a graph using
an approximate distance oracle. Our algorithm is a simple and efficient extension of
Dijkstra’s algorithm. For the special case of metrics that have bounded doubling
dimension, our algorithm uses Har-Peled’s distance oracle to prune a large number of
vertices that are not on any (1+ ε )-short path from source vertex s to the target vertex t.
Our algorithm has a good running time bound of (CS*)^O(dim) and it uses linear space. The
results of our experiments show that by using a distance oracle the shortest path between
two points in a large graph can be computed very efficiently.
11. References
[1] R. K. AHUJA, K. MEHLHORN, J. B. ORLIN AND R. E. TARJAN. 1990. Faster Algorithms for
the Shortest Path Problem. Journal of the Association for Computing Machinery 37, 213-223.
[2] R. BELLMAN. 1958. On a Routing Problem. Quarterly of Applied Mathematics 16, 87-90.
[3] E. CHAVEZ, G. NAVARRO, R. BAEZA-YATES AND J. E. MARROQUÍN. 2001. Searching in
Metric Spaces. ACM Computing Surveys 33, 273-321.
[4] B. V. CHERKASSKY, A. V. GOLDBERG AND T. RADZIK. 1996. Shortest Path Algorithms:
Theory and Experimental Evaluation. Math. Prog. 73, 129-174.
[5] T. H. CORMEN, C. E. LEISERSON, R. L. RIVEST AND C. STEIN. 2001. Chapters 24:
Single-Source Shortest Paths, and 25: All-Pairs Shortest Paths. In Introduction to Algorithms,
MIT Press and McGraw-Hill, 580-642.
[6] E. W. DIJKSTRA. 1959. A Note on Two Problems in Connexion with Graphs. Numerische
Mathematik 1, 269-271.
[7] ESRI. 2006. Data & Maps and StreetMap USA.
[8] R. W. FLOYD. 1962. Algorithm 97: Shortest path. Communications of the ACM 9, 11-12.
[9] L. R. FORD JR. AND D. R. FULKERSON. 1962. Flows in Networks. Princeton University
Press.
[10] A. V. GOLDBERG. 2001. A Simple Shortest Path Algorithm with Linear Average Time. In
Proc. 9th ESA, Lecture Notes in Computer Science LNCS 2161, 230-241.
[11] A. V. GOLDBERG AND C. HARRELSON. 2005. Computing the Shortest Path: A* Search
Meets Graph Theory. In Proc. 16th ACM-SIAM Symposium on Discrete Algorithms 156-165.
[12] A. V. GOLDBERG, H. KAPLAN AND R. F. WERNECK. 2005. Reach for A*: Efficient
Point-to-Point Shortest Path Algorithms. MSR-TR-2005-132, 1-41.
[13] A. GUPTA, R. KRAUTHGAMER AND J. R. LEE. 2003. Bounded geometries, fractals, and
low-distortion embeddings. Foundations of Computer Science 1-10.
[14] R. GUTMAN. 2004. Reach-based Routing: A New Approach to Shortest Path Algorithms
Optimized for Road Networks. In Proc. 6th International Workshop on Algorithm Engineering
and Experiments 100-111.
[15] S. HAR-PELED AND M. MENDEL. 2006. Fast construction of Nets in Low Dimensional
Metrics, and Their Applications. SICOMP 35, 1148-1184.
[16] P. E. HART, N. J. NILSSON AND B. RAPHAEL. A Formal Basis for the Heuristic
Determination of Minimum Cost Paths.
[17] D. B. JOHNSON. 1977. Efficient Algorithms for Shortest Paths in Sparse Networks.
Journal of the ACM (JACM) 24, 1-13.
[18] D. E. KNUTH. 1981. The Art of Computer Programming, Volume 2: Seminumerical
Algorithms. Second edition. Addison-Wesley, Reading, MA.
[19] P. SANDERS AND D. SCHULTES. 2007. Engineering Fast Route Planning Algorithms. 6th
Workshop on Experimental Algorithms (WEA), LNCS 4525, pp. 23-36.
[20] K. TALWAR. 2004. Bypassing the Embedding Algorithms for Low Dimensional Metrics.
Annual ACM Symposium on Theory of Computing 281-290.
[21] M. THORUP AND U. ZWICK. 2005. Approximate distance oracles. ACM Press 52, 1-24.
[22] S. WARSHALL. 1962. A Theorem on Boolean Matrices. Journal of the ACM (JACM) 9,
11-12.
A1. Appendix
Definition A1.1 (Map metric) Consider a graph G with vertex set V, an edge set
E ⊆ V × V, and a weight function ω : E → R+. A map metric is then defined as the set X
= V ∪ {v | v = α s + (1 − α )t for some (s,t) ∈ E and some 0 < α < 1 } with the distance
function being the length of the shortest path.
The following lemmas state important properties of metrics with respect to doubling
dimension. We also present proof sketch for every lemma.
Lemma A1.1 Let M_G be the map metric of a graph G. Then if M_G has doubling
dimension dim_M, the degree of the vertices of G is bounded by deg_G ≤ 2^(dim_M). Hence, if
the map metric is doubling, it also has bounded degree.
Proof A1.1 Let m be the length of the shortest edge in the graph. Then consider
any point p which has degree at least 2^(dim_M) + 1. A ball of radius m around the
point cannot be covered by fewer than 2^(dim_M) + 1 balls of radius m/2. This is a
contradiction.
Figure 16: Vertex p with degree 2^(dim_M) + 1
Figure 16 depicts a small portion of the input graph where m is the length of the shortest
edge from node p to its neighbors, which lie along the border of Ball(p, m). Extra node 5
makes deg(p) = 2^(dim_M) + 1. Each ball of radius m/2 cannot cover more than one node.
If node p has degree ≤ 2^(dim_M), all its neighbors contained within Ball(p, m) can be
covered by 2^(dim_M) balls. Since deg(p) = 2^(dim_M) + 1, node 5 cannot be covered by
2^(dim_M) balls of radius m/2. This contradicts the notion of doubling dimension.
Lemma A1.2 Bounded degree does not imply bounded doubling dimension.
Proof A1.2 Build a complete binary tree. Any edge that goes between two internal nodes
has length 1/ ∆ , and any edge that goes to a leaf has length 1. When ∆ = log2 n, this
gives doubling dimension Ω (log2 n).
Figure 17: Binary tree with bounded degree (3) but un-bounded doubling dimension
(O(n))
Lemma A1.3 There are shortest-path metrics in graphs which have bounded doubling
dimension, but un-bounded degree.
Proof A1.3 Let the graph Gn consist of a star-like configuration: There is a main vertex
x0 to which all the other vertices, x1 to xn−1, are connected. Let ω_i be the weight of the
edge from x0 to xi, and let ω_i = 2^i. Then any ball of radius r can be covered by at most 3
balls of radius r/2, but the degree of x0 is Ω (n). This proof fails if we use our notion of a
map metric, because the previous statement now fails.
Figure 18: Star-like graph
Figure 18 shows a star-like configuration of graph Gn. Here x0 is the main vertex to which
vertices x1, x2, x3, and x4 are connected. We need only 2 balls of radius 4 to cover all
vertices within the ball of radius 8. Therefore, the doubling dimension of graph Gn is 1.
However, the degree is Ω (n).