Graph Reachability Query
--Survey Report
By Xiangling Zhang@DBIIR RUC
Content
• Background
• Methods
• New trends
Background
• Huge amounts of graph data are being generated in real-world applications:
Route network
Social network
Biological network
Semantic web…
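• Graph reachability queries have become an important research topic.
• Problem definition:
Given two vertices u and v in a directed graph G (usually assumed to be a DAG, since each strongly connected component can be collapsed into a single vertex without changing reachability), a reachability query asks whether there is a path from u to v.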
Two possible solutions
• Traverse G(V, E) with DFS/BFS to answer each reachability query
Low query performance: O(|E|) query time
• Precompute and store the transitive closure TC
Fast query processing
Large storage requirement: O(|V|^2)

                     Construction time   Query time   Index size
DFS/BFS              O(1)                O(m)         O(1)
Transitive closure   O(nm)               O(1)         O(n^2)

(n = |V|, m = |E|; the indexing methods below trade off between these two extremes.)
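The two extremes in the table can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the graph is a hypothetical dict of successor lists, and the function names are illustrative. `bfs_reach` builds no index and pays a full traversal per query, while `transitive_closure` runs one BFS per vertex up front and then answers each query with a set lookup.

```python
from collections import deque


def reachable_from(succ, u):
    """BFS from u; returns every vertex reachable from u (including u)."""
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        for y in succ.get(x, []):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def bfs_reach(succ, u, v):
    """Online query: no index, O(n + m) time per query in the worst case."""
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True  # stop as soon as v is found
        for y in succ.get(x, []):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


def transitive_closure(succ, nodes):
    """Offline precomputation: one BFS per vertex, up to n^2 stored pairs."""
    return {u: reachable_from(succ, u) for u in nodes}


# Hypothetical example graph: a -> b -> c, d -> c
succ = {"a": ["b"], "b": ["c"], "d": ["c"]}
tc = transitive_closure(succ, ["a", "b", "c", "d"])
print(bfs_reach(succ, "a", "c"), "c" in tc["a"])  # True True
print(bfs_reach(succ, "c", "a"), "a" in tc["c"])  # False False
```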
Methods classification
Type                                      Methods
Structure decomposition & Interval code   Tree cover, Chain cover, Dual labeling, path-tree, Tree+SSPI, GRIPP, GRAIL
Central node path                         2-hop cover
Hybrid                                    3-hop cover
Single Interval Tree Coding Scheme (SIT)
• For tree-structured data
Assign each node one interval [start, end], e.g., the timestamps at which DFS enters and leaves the node (sorting nodes by start gives pre-order; sorting by end gives post-order)
u reaches v iff u's interval contains v's interval
Example tree: a → {b, c, d}, b → {e, f, g}, d → {h}
a [1,16]; b [2,9]; e [3,4]; f [5,6]; g [7,8]; c [10,11]; d [12,15]; h [13,14]
Preorder: abefgcdh
Postorder: efgbchda
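A minimal Python sketch of the single-interval scheme, assuming the tree is given as a dict of child lists in left-to-right order (names are illustrative). One counter is incremented on both DFS entry and exit, which reproduces the intervals of the example tree (a → b, c, d; b → e, f, g; d → h).

```python
def interval_labels(children, root):
    """Assign [start, end] from one counter incremented on DFS entry and exit."""
    labels = {}
    counter = 0

    def dfs(u):
        nonlocal counter
        counter += 1
        start = counter  # entry time: sorting by start gives pre-order
        for c in children.get(u, []):
            dfs(c)
        counter += 1
        labels[u] = (start, counter)  # exit time: sorting by end gives post-order

    dfs(root)
    return labels


def tree_reaches(labels, u, v):
    """u reaches v iff u's interval contains v's interval."""
    (su, eu), (sv, ev) = labels[u], labels[v]
    return su <= sv and ev <= eu


children = {"a": ["b", "c", "d"], "b": ["e", "f", "g"], "d": ["h"]}
labels = interval_labels(children, "a")
print(labels["a"], labels["b"], labels["h"])  # (1, 16) (2, 9) (13, 14)
print(tree_reaches(labels, "d", "h"), tree_reaches(labels, "b", "h"))  # True False
```

Each query is just two comparisons, which is why the scheme is attractive; the difficulty addressed by the following methods is that a DAG node may need more than one interval.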
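Tree cover [Agrawal et al. SIGMOD89]
• Extend interval labeling to DAGs:
Find a spanning tree T of the given graph G.
Assign each node its postorder number and an interval [lowest postorder number in its subtree, its own postorder number].
Propagate interval information backward through each non-tree edge to its source node and that node's predecessors, dropping intervals already subsumed by existing ones.
Query: u reaches v iff v's postorder number falls within one of u's intervals.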
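Example (tree edges a → {b, c, d}, b → e, e → {f, g}, d → h; non-tree edge d → b):
a [1,8]; b [1,4]; c [5,5]; d [6,7] [1,4]; e [1,3]; f [1,1]; g [2,2]; h [6,6]
d inherits [1,4] from b through the non-tree edge.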
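A minimal Python sketch of tree-cover labeling, under these assumptions: the input is a DAG with a single root, the spanning tree is whatever DFS produces for the given child order, and the example graph (including the non-tree edge d → b, inferred from d's extra interval [1,4]) should be treated as hypothetical. Function names are illustrative. Intervals are propagated in DFS postorder, which is a reverse topological order in a DAG, and subsumed intervals are discarded.

```python
def tree_cover_labels(succ, root):
    post, low, order, visited = {}, {}, [], set()
    counter = 0

    def dfs(u):  # edges to already-visited nodes are non-tree edges
        nonlocal counter
        visited.add(u)
        lo = None
        for v in succ.get(u, []):
            if v not in visited:  # tree edge u -> v
                dfs(v)
                lo = low[v] if lo is None else min(lo, low[v])
        counter += 1
        post[u] = counter
        low[u] = counter if lo is None else lo  # lowest postorder in subtree
        order.append(u)

    dfs(root)
    labels = {}
    for u in order:  # DFS postorder = reverse topological order of a DAG
        # own tree interval plus every interval inherited from successors
        cand = {(low[u], post[u])}
        for v in succ.get(u, []):
            cand.update(labels[v])
        kept = []
        for s, e in sorted(cand, key=lambda iv: (iv[0], -iv[1])):
            if kept and kept[-1][0] <= s and e <= kept[-1][1]:
                continue  # subsumed by an interval already kept
            kept.append((s, e))
        labels[u] = kept
    return post, labels


def tc_reaches(post, labels, u, v):
    return any(s <= post[v] <= e for s, e in labels[u])


succ = {"a": ["b", "c", "d"], "b": ["e"], "e": ["f", "g"], "d": ["b", "h"]}
post, labels = tree_cover_labels(succ, "a")
print(labels["d"])  # [(1, 4), (6, 7)]
print(tc_reaches(post, labels, "d", "f"), tc_reaches(post, labels, "c", "f"))  # True False
```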
Tree cover
Figure: the example DAG labeled using three different spanning trees; the choice of spanning tree changes how many intervals must be stored, and one choice is marked "The best one".
Tree cover
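Choosing the spanning tree: process nodes in topological order and make each node's tree parent the immediate predecessor with the largest predecessor set Pred(·).
Example: TopoOrder {a, b, c}
Pred(a) = { }, Pred(b) = {a}, Pred(c) = {a, b}

Query time   Index construction time   Index size
O(logn)      O(nm)                     O(n^2)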
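A minimal Python sketch of the parent-selection rule, assuming the DAG is given as a dict from each node to its immediate predecessors; the edges a → b, a → c, b → c are a hypothetical reading of the figure, and the function name is illustrative. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to obtain the topological order.

```python
from graphlib import TopologicalSorter


def choose_tree_parents(preds):
    """preds maps each node to its immediate predecessors (a DAG)."""
    order = list(TopologicalSorter(preds).static_order())
    pred_sets, parent = {}, {}
    for v in order:
        immediate = preds.get(v, [])
        # Pred(v): every node with a path to v
        pred_sets[v] = set().union(*({p} | pred_sets[p] for p in immediate))
        # tree parent: the immediate predecessor with the largest Pred set
        parent[v] = max(immediate, key=lambda p: len(pred_sets[p]), default=None)
    return order, pred_sets, parent


preds = {"b": ["a"], "c": ["a", "b"]}  # hypothetical edges a->b, a->c, b->c
order, pred_sets, parent = choose_tree_parents(preds)
print(order, parent["c"])  # ['a', 'b', 'c'] b
```

Here c attaches to b rather than a, so the tree path a → b → c covers both of c's predecessors and the edge a → c needs no extra interval.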
Chain cover [Jagadish, ACM Trans. Database Syst. 1990]
• A graph is split into node-disjoint chains. A node u can reach node v if they are in the same chain and u precedes v.
Chain0: abdf
Chain1: gce
• Assign a code to every node: [p, j] means the node is at position p in the j-th chain.
a [0,0], b [1,0], d [2,0], f [3,0]; g [0,1], c [1,1], e [2,1]
• Chaincode(v) stores successors' codes: for each chain j, the code of the earliest node on chain j that v can reach. Then u reaches v iff u's entry for v's chain is at or before v's position.
Figure: chain-code table for nodes a-g.
• Query: can b reach e?

Query time   Index construction time   Index size
O(logn)      O(n^2 + kn√k)             O(nk)
(k = number of chains)
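A minimal Python sketch of chain codes, assuming the chain decomposition is already given (computing a minimum decomposition is the expensive part and is not shown). The chains match the example, but the cross-chain edges b → c and g → d are hypothetical, since the slide's edges are not recoverable. Nodes are processed sinks-first so that each successor's codes are ready when needed.

```python
import math
from graphlib import TopologicalSorter


def chain_codes(succ, chains):
    """For every node u and chain j, the smallest position on chain j reachable from u."""
    pos = {v: (p, j) for j, chain in enumerate(chains) for p, v in enumerate(chain)}
    # Passing successors where TopologicalSorter expects predecessors
    # yields sinks first, i.e. a reverse topological order.
    order = TopologicalSorter({u: succ.get(u, []) for u in pos}).static_order()
    codes = {}
    for u in order:
        row = [math.inf] * len(chains)
        p, j = pos[u]
        row[j] = p  # u reaches itself
        for v in succ.get(u, []):
            row = [min(a, b) for a, b in zip(row, codes[v])]
        codes[u] = row
    return pos, codes


def chain_reaches(pos, codes, u, v):
    p, j = pos[v]
    return codes[u][j] <= p  # u enters v's chain at or before v


chains = [list("abdf"), list("gce")]
succ = {"a": ["b"], "b": ["d", "c"], "d": ["f"], "g": ["c", "d"], "c": ["e"]}
pos, codes = chain_codes(succ, chains)
print(codes["b"], chain_reaches(pos, codes, "b", "e"))  # [1, 1] True
```

Each node stores at most k codes, which is where the O(nk) index size comes from.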
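GRAIL [Yildirim et al. VLDB2010]
• GRAIL (Graph Reachability indexing via randomized interval labeling)
• The main idea of GRAIL is to traverse the graph d times with randomized DFS and generate d intervals for every node. If, in any of the d labelings, v's interval is not contained in u's interval, then u cannot reach v; if all d intervals are contained, the answer may be a false positive and must be confirmed (e.g., with exception lists or a pruned DFS).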
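Figure: example DAG with nodes 0-9 and one interval label per node (d = 1), together with exception lists, i.e., nodes whose intervals are contained in a node's interval even though that node cannot reach them:
node   exceptions
2      1, 4
4      3, 7, 9
5      1, 3, 4, 7, 9
6      1, 3, 4, 7, 9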
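A minimal Python sketch of GRAIL-style randomized interval labeling. It assumes the graph is a DAG given as successor lists and uses a small hypothetical DAG rather than the 10-node graph in the figure; `grail_labels`, `grail_reach` and the seed are illustrative names. When the labels cannot rule out reachability, this sketch falls back to a label-pruned DFS instead of exception lists.

```python
import random


def grail_labels(succ, nodes, d, seed=0):
    """d randomized post-order traversals; label L_u = [min rank below u, rank(u)]."""
    rng = random.Random(seed)
    labels = {u: [] for u in nodes}
    for _ in range(d):
        rank, low, visited = 0, {}, set()

        def visit(u):
            nonlocal rank
            visited.add(u)
            children = list(succ.get(u, []))
            rng.shuffle(children)  # random child order gives a different labeling
            lo = None
            for c in children:
                if c not in visited:
                    visit(c)
                lo = low[c] if lo is None else min(lo, low[c])
            rank += 1
            low[u] = rank if lo is None else min(lo, rank)
            labels[u].append((low[u], rank))

        roots = list(nodes)
        rng.shuffle(roots)
        for r in roots:
            if r not in visited:
                visit(r)
    return labels


def contains(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def grail_reach(succ, labels, u, v):
    def may_reach(x):  # necessary condition: every label of x contains v's label
        return all(contains(a, b) for a, b in zip(labels[x], labels[v]))

    if u == v:
        return True
    if not may_reach(u):
        return False  # one labeling already proves u cannot reach v
    stack, seen = [u], {u}
    while stack:  # label-pruned DFS rules out false positives
        x = stack.pop()
        for c in succ.get(x, []):
            if c == v:
                return True
            if c not in seen and may_reach(c):
                seen.add(c)
                stack.append(c)
    return False


succ = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5]}  # hypothetical DAG
labels = grail_labels(succ, range(6), d=2)
print(grail_reach(succ, labels, 0, 5), grail_reach(succ, labels, 1, 4))  # True False
```

Because the fallback search is exact, the answers do not depend on the random seed; the seed only affects how often the search is needed.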
GRAIL
Figure: the same DAG labeled by d = 2 random traversals (two intervals per node); the exception table shrinks to a single entry: node 4 → 3, 7, 9.
GRAIL
Figure: the same DAG with two intervals per node.
• Can 2 reach 4?
• Can 4 reach 9?
2-Hop Cover [Cohen et al. SODA2002]
• For each node a, maintain two sets of labels (nodes): Lin(a) and Lout(a)
• For each connection (a, b):
– choose a node c on a path from a to b (the center node)
– add c to Lout(a) and to Lin(b)
Figure: a → … → c → … → b
2-Hop Cover [Cohen et al. SODA2002]
• Then $(u, v) \in TC \iff L_{out}(u) \cap L_{in}(v) \neq \emptyset$
Figure: example graph on nodes 1-6 with its Lin/Lout label table; query: can 1 reach 6?
2-Hop Cover
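• The optimal 2-hop cover problem, i.e., finding a 2-hop cover of minimum size, is NP-hard. The cover is therefore built greedily, repeatedly selecting the 2-hop cluster (a center node with an in-set I of nodes that reach it and an out-set O of nodes it reaches) of highest density: the number of still-uncovered connections it covers divided by |I| + |O|.
Figure: a candidate cluster with initial density 8/6 ≈ 1.33 (we can cover 8 connections with 6 nodes).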
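One greedy step can be made concrete with a small Python sketch. The DAG below is hypothetical: it was chosen so that the cluster with center 4, in-set {1, 2, 4} and out-set {4, 5, 6} covers 8 connections with 6 nodes, matching the density 8/6 above, but it is not claimed to be the slide's graph. Function names are illustrative, and the sketch assumes the usual convention that every node implicitly belongs to its own Lin and Lout. It scores and applies a single cluster; the full algorithm would keep choosing the densest cluster until every connection is covered.

```python
from fractions import Fraction
from itertools import product


def transitive_closure_pairs(succ):
    """All connections (a, b) with a path from a to b in a DAG."""
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    tc = set()
    for a in nodes:
        stack = list(succ.get(a, []))
        while stack:
            b = stack.pop()
            if (a, b) not in tc:
                tc.add((a, b))
                stack.extend(succ.get(b, []))
    return tc


def cluster_density(tc, covered, c_in, c_out):
    """Connections newly covered through the cluster's center, and its density."""
    new = {(a, b) for a, b in product(c_in, c_out)
           if (a, b) in tc and (a, b) not in covered}
    return new, Fraction(len(new), len(c_in) + len(c_out))


def add_cluster(l_in, l_out, center, c_in, c_out):
    for a in c_in:
        l_out.setdefault(a, set()).add(center)
    for b in c_out:
        l_in.setdefault(b, set()).add(center)


def two_hop_reach(l_in, l_out, u, v):
    # assumption: every node implicitly belongs to its own Lin and Lout
    hops_out = l_out.get(u, set()) | {u}
    hops_in = l_in.get(v, set()) | {v}
    return bool(hops_out & hops_in)


succ = {1: [4], 2: [4], 3: [4], 4: [5, 6]}  # hypothetical DAG
tc = transitive_closure_pairs(succ)
new, dens = cluster_density(tc, set(), {1, 2, 4}, {4, 5, 6})
print(len(new), dens)  # 8 4/3
l_in, l_out = {}, {}
add_cluster(l_in, l_out, 4, {1, 2, 4}, {4, 5, 6})
# (3, 5) is not covered yet; a later greedy step would add another cluster
print(two_hop_reach(l_in, l_out, 1, 6), two_hop_reach(l_in, l_out, 3, 5))  # True False
```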
2-Hop Cover
Figure: a larger candidate cluster with initial density 12/7 ≈ 1.71 (12 connections covered with 7 nodes).
2-Hop Cover
Figure: the resulting Lin/Lout label tables for nodes 1-6.
2-Hop Cover
• The computational cost is high. First, it needs to compute the full transitive closure. Second, it needs to rank all 2-hop clusters S by density.

Query time   Index construction time   Index size
O(m^1/2)     O(n^3|TC|)                O(n m^1/2)
3-Hop Cover [Jin et al. SIGMOD 2009]
• The three hops are:
1) the first hop, from the starting vertex to the entry point of some chain;
2) the second hop, from the entry point in the chain to the exit point of the chain;
3) the third hop, from the exit point of the chain to the destination vertex.
3-Hop Cover
Figure: chain decomposition whose vertices carry in-labels (I) and out-labels (O); query: can vertex 2 reach vertex 20?
Step1: In chain C1, which contains vertex 2, collect the smallest vertex on each other chain that vertex 2 can reach: X = {6, 15}.
Step2: In chain C4, which contains vertex 20, collect the largest vertex on each other chain that can reach vertex 20: Y = {9, 13}.
Step3: Check whether there is a pair (x, y), x ∈ X, y ∈ Y, such that x.chainId = y.chainId and x <= y.
Yes! 6 and 9 are in the same chain and 6 < 9, so 2 can reach 20.
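Step 3 amounts to a per-chain comparison, sketched here in Python under stated assumptions: X and Y are given as (chain id, vertex) pairs produced by Steps 1 and 2, vertex ids increase along each chain (as in the example, where 6 < 9 on the same chain), and the chain ids "Ca", "Cb", "Cc" are hypothetical because the slide only tells us that 6 and 9 share a chain. Steps 1 and 2 themselves are not shown.

```python
def three_hop_step3(x_entries, y_exits):
    """x_entries: (chain_id, vertex) pairs from Step 1 (smallest vertex u reaches on a chain).
    y_exits: (chain_id, vertex) pairs from Step 2 (largest vertex on a chain that reaches v).
    Assumes vertex ids increase along each chain."""
    first_entry = {}
    for chain, x in x_entries:
        first_entry[chain] = min(x, first_entry.get(chain, x))
    # u reaches v if, on some chain, the entry point is at or before the exit point
    return any(chain in first_entry and first_entry[chain] <= y for chain, y in y_exits)


# Slide example with hypothetical chain ids
X = [("Ca", 6), ("Cb", 15)]
Y = [("Ca", 9), ("Cc", 13)]
print(three_hop_step3(X, Y))  # True
```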
Summary
Algo.         Index construction time   Query time        Index size
Tree cover    O(nm)                     O(n)              O(n^2)
Chain cover   O(n^2 + kn√k)             O(logn)           O(nk)
GRAIL         O(d(n+m))                 O(d) ~ O(n+m)     O(dn)
2-hop cover   O(n^3|TC|)                O(m^1/2)          O(n m^1/2)
3-hop cover   O(kn^2)                   O(logn+k)         O(nk)
Open Problems
• The methods above do not take edge labels into consideration;
• They cannot be updated dynamically as the graph changes.
New trends
I/O Cost Minimization: Reachability Queries Processing over Massive Graphs
Scaling Reachability Computation on Large Graphs
The Exact Distance to Destination in Undirected World
Label-constraint Reachability Queries
K-Reach
Label-constraint Reachability Queries
Some graphs, such as social networks and semantic networks, are edge-labeled to indicate different types of relations.
Label-Constraint Reachability Query: Can u reach v through a path whose edge labels satisfy certain membership constraints (e.g., every edge label on the path belongs to a given label set)?
Figure: example edge-labeled graph over vertices A-E with edge labels parent-of, sister-of, friend-of and employee-of.
Label-constraint Reachability Queries
Q1: Can vertex 0 reach vertex 9 using only edge labels {a, b, c}?
Yes
Q2: Can vertex 0 reach vertex 9 using only edge labels {a, b}?
No
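The baseline for a label-constraint query is an online search that simply ignores edges whose label is not allowed. The Python sketch below shows only that baseline, not an index; the labeled graph is hypothetical (the slide's graph is not recoverable) but answers Q1 and Q2 the same way, and `lcr_reach` is an illustrative name.

```python
from collections import deque


def lcr_reach(edges, s, t, allowed):
    """BFS that only follows edges whose label is in `allowed`."""
    seen, queue = {s}, deque([s])
    while queue:
        x = queue.popleft()
        if x == t:
            return True
        for y, label in edges.get(x, []):
            if label in allowed and y not in seen:
                seen.add(y)
                queue.append(y)
    return False


# Hypothetical labeled graph: node -> list of (successor, edge label)
edges = {0: [(1, "a"), (2, "b")], 1: [(9, "c")], 2: [(3, "a")], 3: [(9, "d")]}
print(lcr_reach(edges, 0, 9, {"a", "b", "c"}))  # True: 0 -a-> 1 -c-> 9
print(lcr_reach(edges, 0, 9, {"a", "b"}))       # False
```

Because the answer depends on the label set, a single classic reachability index cannot be reused across queries with different constraints.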
K-Reach
The query asks whether there exists a path from s to t whose length is at most k.
k-hop reachability cannot be answered with a classic reachability index alone; in fact, classic reachability is a special case of k-hop reachability (take k >= n - 1, so that the length bound never applies).
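A level-by-level BFS answers a k-hop query directly, since level i holds exactly the vertices at distance i from s. The Python sketch below is an online baseline under that reading (path length counted in edges), not a K-Reach index; the graph and the name `k_reach` are illustrative.

```python
def k_reach(succ, s, t, k):
    """True iff some path from s to t has length (number of edges) <= k."""
    if s == t:
        return True
    frontier, seen = {s}, {s}
    for _ in range(k):  # after i rounds, all vertices within distance i are seen
        nxt = set()
        for x in frontier:
            for y in succ.get(x, []):
                if y == t:
                    return True
                if y not in seen:
                    seen.add(y)
                    nxt.add(y)
        if not nxt:
            return False  # nothing new to expand: t is unreachable
        frontier = nxt
    return False


succ = {0: [1, 4], 1: [2], 2: [3], 4: [3]}  # hypothetical graph
print(k_reach(succ, 0, 3, 2), k_reach(succ, 0, 2, 1))  # True False
```

With k >= n - 1 the bound never cuts off a simple path, so the function degenerates to ordinary reachability, matching the special-case remark above.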
Thanks All!