TEDI: Efficient Shortest Path Query Answering on Graphs

Fang Wei
University of Freiburg, Germany
ACM SIGMOD 2010
June 25, 2010
Outline
Introduction
Problem Definition
Tree Decomposition
Algorithms and Complexity
Experimental Results
Conclusion
Application
A shortest path query on a graph finds a shortest path between a given
source vertex and a given target vertex. Applications include:
- Ranked keyword search
- XML databases
- Social networks
- Bioinformatics
Motivation
Current techniques pre-compute compressed breadth-first search (BFS)
trees of the graph (EDBT ’09). Their main problem is scalability.
For a graph with 20,000 vertices:
- Space: the compact BFS tree takes 744 MB.
- Time: index construction takes more than 30 minutes.
Contribution
TEDI (TreE Decomposition based Indexing)
- Solid theoretical background based on tree decomposition.
- Linear-time tree decomposition algorithm. Finding an optimal tree
  decomposition is NP-hard, so the linear-time algorithm relies on
  heuristics.
- Flexibility in balancing time and space efficiency.
Problem Definition
Let G = (V, E), E ⊆ V × V, be an undirected graph where
n = |V| and m = |E|. Given u, v ∈ V, the shortest path problem asks for
a path from u to v of minimum length among all paths from u to v.
- One solution is to run a breadth-first search (BFS) on the graph at
  query time. The time complexity is O(n + m).
- Another solution is to pre-compute the shortest path for each vertex
  pair (u, v). The space overhead is at least O(n²), which is not
  affordable for large graphs.
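The first option can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation evaluated in the talk. The function name `bfs_shortest_path`, the adjacency-list dictionary, and the small example graph are assumptions introduced here.

```python
from collections import deque


def bfs_shortest_path(adj, source, target):
    """Return one shortest path from source to target as a list, or None."""
    parent = {source: None}          # doubles as the visited set
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:     # follow parent pointers back to the source
                path.append(u)
                u = parent[u]
            return path[::-1]
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    return None                      # target is not reachable


# Hypothetical undirected example graph (not the graph from the slides).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (4, 5)]
adj = {v: [] for v in range(6)}
for a, b in edges:
    adj[a].append(b)
    adj[b].append(a)

path = bfs_shortest_path(adj, 0, 5)
```

Every query may touch all vertices and edges, which is the per-query cost an index tries to avoid.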
Outline
Introduction
Problem Definition
Tree Decomposition
Algorithms and Complexity
Experimental Results
Conclusion
Tree Decomposition
Definition (Tree Decomposition)
A tree decomposition of G = (V, E), denoted as TG is a pair
({Xi |i ∈ I}, T), where {Xi |i ∈ I} is a collection of subsets of V and
T = (I, F) is a tree such that:
1. ⋃ {Xi | i ∈ I} = V
2. for every (u, v) ∈ E, there is an i ∈ I such that u, v ∈ Xi
3. for all v ∈ V, the set {i | v ∈ Xi} induces a subtree of T
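The three conditions can be checked mechanically. The sketch below assumes that bags are given as a dictionary from tree node to vertex set and that T is already known to be a tree. The function name `is_tree_decomposition` and the example data are hypothetical and not taken from the slides.

```python
def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check conditions 1-3 of the definition; T is assumed to be a tree."""
    # Condition 1: the union of all bags is exactly V.
    if set().union(*bags.values()) != set(vertices):
        return False
    # Condition 2: both endpoints of every edge share at least one bag.
    for u, v in edges:
        if not any(u in X and v in X for X in bags.values()):
            return False
    # Condition 3: the tree nodes whose bags contain v are connected in T.
    tree_adj = {i: set() for i in bags}
    for i, j in tree_edges:
        tree_adj[i].add(j)
        tree_adj[j].add(i)
    for v in vertices:
        nodes = {i for i, X in bags.items() if v in X}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:                      # search restricted to `nodes`
            i = stack.pop()
            for j in tree_adj[i]:
                if j in nodes and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen != nodes:
            return False
    return True


# Hypothetical example: a 6-vertex graph and a decomposition with 3 bags.
vertices = range(6)
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (4, 5)]
bags = {0: {0, 1, 2}, 1: {2, 3, 4}, 2: {4, 5}}
tree_edges = [(0, 1), (1, 2)]
valid = is_tree_decomposition(vertices, edges, bags, tree_edges)
```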
Tree Decomposition
For each tree node i, there is a bag Xi consisting of graph vertices.
Given any graph G, there may exist many tree decompositions which
fulfill all the conditions in the definition. We are interested in those
tree decompositions with smaller bag sizes.
Definition (Width, Treewidth)
Let G = (V, E) be a graph.
- The width of a tree decomposition ({Xi |i ∈ I}, T) is defined as
  max{|Xi | : i ∈ I}.
- The treewidth of G is the minimum width over all tree
  decompositions of G. It is denoted as tw(G) or simply tw.
Tree Decomposition
Definition (Inner Edge)
Let G = (V, E) be a graph and TG = ({Xi |i ∈ I}, T) be a tree
decomposition of G. The inner edges of TG are the pairs of tree
vertices defined as follows:
{({u, i}, {v, i})|(u, v) ∈ E, u, v ∈ Xi , i ∈ I}
Definition (Inter Edge)
Let G = (V, E) be a graph and TG = ({Xi |i ∈ I}, T) be a tree
decomposition of G. Let v ∈ Xi and v ∈ Xj , where either (i, j) ∈ F or
(j, i) ∈ F holds. We call the edge from vertex {v, i} to {v, j} the inter
edge and denote it as ({v, i}, {v, j}).
Tree Decomposition
- Inner edges: ({0, 2}, {5, 2}), ({1, 3}, {2, 3}), ({2, 1}, {3, 1}),
  ({3, 2}, {0, 2})
- Inter edges: ({5, 0}, {5, 2})
Tree Decomposition
Definition (Tree Path)
Let G = (V, E) be a graph and TG = ({Xi |i ∈ I}, T) be a tree
decomposition of G. Let u, v ∈ V. Let further {u, i} and {v, j} be tree
vertices in TG . A tree path from {u, i} to {v, j} is a sequence of tree
vertices connected with either inter or inner edges.
Lemma
Let G = (V, E) be a graph and TG = ({Xi |i ∈ I}, T) be a tree
decomposition of G. Let u, v ∈ V. Let further {u, i} and {v, j} be tree
vertices in TG . There is a path from u to v in G if and only if there is a
tree path from {u, i} to {v, j}.
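The definitions of inner edges, inter edges, and tree paths can be made concrete with a small sketch. It assumes a tree vertex {v, i} is represented as the tuple `(v, i)`. The helper names `tree_vertex_graph` and `reachable` and the example decomposition are hypothetical and do not come from the slides.

```python
def tree_vertex_graph(edges, bags, tree_edges):
    """Adjacency over tree vertices (v, i), built from inner and inter edges."""
    adj = {(v, i): set() for i, X in bags.items() for v in X}
    # Inner edges: a graph edge (u, v) whose endpoints both lie in bag i.
    for i, X in bags.items():
        for u, v in edges:
            if u in X and v in X:
                adj[(u, i)].add((v, i))
                adj[(v, i)].add((u, i))
    # Inter edges: the same vertex v in two bags adjacent in the tree.
    for i, j in tree_edges:
        for v in bags[i] & bags[j]:
            adj[(v, i)].add((v, j))
            adj[(v, j)].add((v, i))
    return adj


def reachable(adj, start, goal):
    """True if a tree path connects the two tree vertices."""
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        if x == goal:
            return True
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


# Hypothetical example graph and tree decomposition.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (4, 5)]
bags = {0: {0, 1, 2}, 1: {2, 3, 4}, 2: {4, 5}}
tree_edges = [(0, 1), (1, 2)]
tv = tree_vertex_graph(edges, bags, tree_edges)
connected = reachable(tv, (0, 0), (5, 2))
```

In this example, vertex 0 reaches vertex 5 in the graph, and correspondingly the tree vertex (0, 0) reaches (5, 2) through alternating inner and inter edges.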
Tree Decomposition
Example
Consider the above graph. Vertex 4 reaches vertex 0 via the path
(4, 1, 2, 3, 0). Correspondingly, there is a tree path from {4, 1} to
{0, 2}: ({4, 1}, {4, 3}, {1, 3}, {2, 3}, {2, 1}, {3, 1}, {3, 0},
{3, 2}, {0, 2})
Tree Decomposition
Lemma
Let {u, i} and {v, j} be two tree vertices, and SP_{i,j} be the simple
path between tree nodes i and j. Let P be a tree path from {u, i} to
{v, j}. Then P visits every node in SP_{i,j}.
Theorem
Let G = (V, E) be a graph and TG = ({Xi |i ∈ I}, T) be a tree
decomposition of G. Let u, v ∈ V, and r_u (r_v) be the root node of the
induced subtree of u (v). Then for every node w in SP_{r_u,r_v}, there
is a vertex t ∈ Xw such that d(u, v) = d(u, t) + d(t, v), where d
denotes the shortest-path distance in G.
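The theorem can be checked numerically on a small instance. In the sketch below, the example graph, its bags, and the list `path_nodes` (the tree nodes between the root bags of u and v, read off by hand for a tree rooted at node 0) are assumptions made for illustration. The distances come from plain BFS rather than from any index.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source to all reachable vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


# Hypothetical example graph and tree decomposition (tree: 0 - 1 - 2).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (4, 5)]
adj = {x: [] for x in range(6)}
for a, b in edges:
    adj[a].append(b)
    adj[b].append(a)
bags = {0: {0, 1, 2}, 1: {2, 3, 4}, 2: {4, 5}}

u, v = 0, 5
path_nodes = [0, 1, 2]           # simple tree path from r_u = 0 to r_v = 2
d_u = bfs_distances(adj, u)
d_v = bfs_distances(adj, v)      # equals d(t, v) because G is undirected

# For each bag on the path, the best split vertex t gives exactly d(u, v).
via_bag = {w: min(d_u[t] + d_v[t] for t in bags[w]) for w in path_nodes}
```

Because every bag on the path contains such a vertex t, a query only needs distances to and from bag members, not a search over the whole graph.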
Index Construction
Input: G = (V, E)
Output: TG and local shortest paths.
1. GraphReduction(k, G): output vertex stack S and reduced graph G′.
2. TreeDecomposition(S, G, G′): output the tree decomposition TG.
3. LocalShortestPaths(G, TG): compute local shortest paths in TG.
Graph Reduction
- Remove vertices one by one in order of increasing degree, up to
  degree k.
- Push each removed vertex onto a stack.
- The tree is then constructed from the stack.
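The reduction and the stack-based construction can be sketched as follows. This is a minimum-degree elimination heuristic written under explicit assumptions, not the author's code. It assumes that the neighbours of a removed vertex are made pairwise adjacent (the usual fill-in step, which the slides do not spell out), that the stack stores each vertex together with its neighbours at removal time, and that the reduced graph becomes the root bag. The names `graph_reduction` and `build_tree` are hypothetical.

```python
def graph_reduction(adj, k):
    """Remove minimum-degree vertices while that degree is at most k."""
    g = {v: set(ns) for v, ns in adj.items()}        # working copy
    stack = []
    while g:
        v = min(g, key=lambda x: (len(g[x]), x))     # ties broken by vertex id
        if len(g[v]) > k:
            break
        nbrs = g.pop(v)
        for a in nbrs:
            g[a].discard(v)
            g[a] |= nbrs - {a}       # assumed fill-in: neighbours form a clique
        stack.append((v, frozenset(nbrs)))
    return stack, g


def build_tree(stack, reduced):
    """Pop the stack; attach each bag {v} | nbrs below a bag containing nbrs."""
    bags = [frozenset(reduced)]      # node 0: root bag (the reduced graph)
    tree_edges = []
    for v, nbrs in reversed(stack):
        parent = next(i for i, X in enumerate(bags) if nbrs <= X)
        bags.append(frozenset({v}) | nbrs)
        tree_edges.append((parent, len(bags) - 1))
    return bags, tree_edges


# Hypothetical example graph, stored as adjacency sets.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (4, 5)]
adj = {v: set() for v in range(6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

stack, reduced = graph_reduction(adj, 1)     # only vertex 5 has degree <= 1
bags, tree_edges = build_tree(stack, reduced)
```

A larger k removes more vertices and yields a smaller root bag at the price of larger non-root bags, which is one way to read the time and space trade-off mentioned earlier.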
Tree Decomposition
Complexity
- Index construction time: use BFS at each tree node; O(n²).
- Index size: l̄|X|² for each tree node, l̄|R|² for the root.
- Query time: O(k² h), where k is the number of reductions and h is the
  height of the tree.
Environment
- CPU: Intel 2.4 GHz
- Memory: 2 GB
- Language: C++ with STL
Dataset
Query Time
Index Construction Time
Index Size
Conclusion
- Introduced an indexing and query answering scheme for shortest path
  queries, based on tree decomposition.
- Extensive experiments demonstrated that TEDI improves performance.
- The algorithm scales well to large datasets.
Thank You!