An Analysis of the Feasibility of Graph Compression Techniques for Indexing Regular Path Queries
Frank Tetzel, Hannes Voigt, Marcus Paradies, Wolfgang Lehner
May 19, 2017
Regular Path Queries (RPQs)
Match paths whose edge labels conform to a regular expression
Result set contains only distinct (start, end) vertex pairs
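[Figure: Automaton representing (ab)+, with start state s, state q1, and final state f; a data graph over vertices v0 to v6 with edges labeled a and b; the search in the data graph from start vertices v2, v3, and v6, each reaching end vertices v0, v2, and v3; and the final result set of distinct (start, end) pairs.]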
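A minimal Python sketch of automaton-guided RPQ evaluation: a breadth-first search over (vertex, automaton state) pairs, started from every vertex, that records each distinct (start, end) pair reaching a final state. The transition table is one possible automaton for (ab)+ (s -a-> q1 -b-> f, f -a-> q1), and the edge list is a small hypothetical graph, not the data graph from the figure; the function and constant names are hypothetical as well.

```python
from collections import deque

# One automaton for (ab)+: s --a--> q1 --b--> f, and f --a--> q1 to repeat "ab".
DELTA = {("s", "a"): "q1", ("q1", "b"): "f", ("f", "a"): "q1"}
START_STATE = "s"
FINAL_STATES = {"f"}


def evaluate_rpq(edges, delta=DELTA, start=START_STATE, final=FINAL_STATES):
    """Return the distinct (start, end) vertex pairs joined by a path whose
    edge labels are accepted by the automaton (BFS on the product graph)."""
    out_edges = {}
    for u, label, v in edges:
        out_edges.setdefault(u, []).append((label, v))
    vertices = {u for u, _, _ in edges} | {v for _, _, v in edges}

    result = set()  # a set keeps only distinct (start, end) pairs
    for source in vertices:
        seen = {(source, start)}
        queue = deque([(source, start)])
        while queue:
            vertex, state = queue.popleft()
            if state in final:
                result.add((source, vertex))
            for label, target in out_edges.get(vertex, []):
                next_state = delta.get((state, label))
                if next_state is None:
                    continue  # the automaton has no transition for this label
                if (target, next_state) not in seen:
                    seen.add((target, next_state))
                    queue.append((target, next_state))
    return result


# Hypothetical labeled graph: 0 -a-> 1 -b-> 2 -a-> 3 -b-> 0
EXAMPLE_EDGES = [(0, "a", 1), (1, "b", 2), (2, "a", 3), (3, "b", 0)]
print(sorted(evaluate_rpq(EXAMPLE_EDGES)))  # [(0, 0), (0, 2), (2, 0), (2, 2)]
```

Tracking visited (vertex, state) pairs per start vertex keeps each search finite on cyclic graphs such as this example.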
Processing strategies
Baseline
Automaton-guided search on the data graph
Adjacency lists stored in a column store
MR-Index
Store results of RPQs for future use
Treat vertex pairs as edges of a reachability graph
Use graph compression for reachability graph
[Figure: Reachability graph over v0, v2, v3, and v6, whose edges are the result pairs of the RPQ]
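A minimal Python sketch of the MR-Index idea: evaluate a query once, keep its result pairs as the edges of a reachability graph, and answer later lookups from that graph. It assumes results are keyed by the query string and stored as a CSR-style adjacency list (offsets plus sorted end vertices), which is only a guess at what an uncompressed adjacency-list variant could look like; the class and method names are hypothetical. The example pairs {v2, v3, v6} × {v0, v2, v3} are read off the example figure.

```python
from bisect import bisect_left


class AdjacencyReachability:
    """Result pairs of one RPQ kept as a reachability graph in CSR form:
    offsets[u]..offsets[u + 1] delimits the sorted end vertices of u."""

    def __init__(self, pairs, num_vertices):
        unique = sorted(set(pairs))
        self.offsets = [0] * (num_vertices + 1)
        for u, _ in unique:
            self.offsets[u + 1] += 1
        for i in range(num_vertices):  # prefix sums turn counts into offsets
            self.offsets[i + 1] += self.offsets[i]
        self.targets = [v for _, v in unique]

    def successors(self, u):
        return self.targets[self.offsets[u]:self.offsets[u + 1]]

    def reachable(self, u, v):
        succ = self.successors(u)
        i = bisect_left(succ, v)  # binary search in the sorted target list
        return i < len(succ) and succ[i] == v


class MRIndex:
    """Hypothetical store that evaluates each RPQ once and reuses the result."""

    def __init__(self, num_vertices):
        self.num_vertices = num_vertices
        self.stored = {}  # query string -> reachability graph

    def lookup_or_evaluate(self, query, evaluate):
        if query not in self.stored:
            pairs = evaluate(query)
            self.stored[query] = AdjacencyReachability(pairs, self.num_vertices)
        return self.stored[query]


index = MRIndex(num_vertices=7)
graph = index.lookup_or_evaluate(
    "(ab)+", lambda q: {(s, e) for s in (2, 3, 6) for e in (0, 2, 3)})
print(graph.successors(2), graph.reachable(6, 3), graph.reachable(0, 2))
# [0, 2, 3] True False
```

Swapping `AdjacencyReachability` for a compressed structure such as a K2-tree changes only how each stored reachability graph is encoded, not how the index is used.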
K2-tree graph compression
Compact representation of a binary relation, e.g., an adjacency matrix
Hierarchical graph compression
Adjacency matrix (rows: start vertex, columns: end vertex; row and column 7 pad the matrix to 8 × 8 for k = 2):
      0 1 2 3 4 5 6 7
  0   0 0 0 0 0 0 0 0
  1   0 0 0 0 0 0 0 0
  2   1 0 1 1 0 0 0 0
  3   1 0 1 1 0 0 0 0
  4   0 0 0 0 0 0 0 0
  5   0 0 0 0 0 0 0 0
  6   1 0 1 1 0 0 0 0
  7   0 0 0 0 0 0 0 0
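[Figure: Conceptual K2-tree for k = 2: the root has the children 1 0 1 0; each of its two non-empty children has the children 0 0 1 1; the four non-empty 2 × 2 submatrices at the last level store the leaves 1010, 1111, 1000, and 1100.]
Bitvector representation:
T1 = 1010
T2 = 0011 0011
L = 1010 1111 1000 1100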
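The bitvectors above can be reproduced with a short Python sketch. It assumes the result pairs from the example, {v2, v3, v6} × {v0, v2, v3}, with rows as start vertices, zero padding to the next power of k, and row-major child order inside each node; the function names are hypothetical. Lookup follows the usual K2-tree navigation rule: the children of a 1 at position x of T = T1 T2 start at rank1(T, x) · k², and positions past the end of T index into L. Rank is computed naively here; a practical implementation would use a bitvector with constant-time rank support.

```python
def build_k2tree(edges, num_vertices, k=2):
    """Build K2-tree bitvectors level by level (breadth-first).
    Returns (n, internal_levels, leaves); n is the padded matrix size."""
    n = k
    while n < num_vertices:  # pad to the smallest power of k >= num_vertices
        n *= k
    levels, frontier, size = [], [(0, 0)], n
    while size > 1:
        sub = size // k
        bits, next_frontier = [], []
        for r0, c0 in frontier:  # top-left corners of non-empty nodes
            for i in range(k):  # child submatrices in row-major order
                for j in range(k):
                    rs, cs = r0 + i * sub, c0 + j * sub
                    nonempty = any(rs <= r < rs + sub and cs <= c < cs + sub
                                   for r, c in edges)
                    bits.append(int(nonempty))
                    if nonempty:
                        next_frontier.append((rs, cs))
        levels.append(bits)
        frontier, size = next_frontier, sub
    return n, levels[:-1], levels[-1]


def has_edge(n, internal, leaves, row, col, k=2):
    """Check cell (row, col) by descending from the root."""
    T = [b for level in internal for b in level]  # T1 T2 ... concatenated
    pos = 0  # position of the root's first child in T
    while True:
        n //= k
        idx = pos + k * (row // n) + (col // n)
        row, col = row % n, col % n
        if idx >= len(T):
            return leaves[idx - len(T)] == 1
        if T[idx] == 0:
            return False  # empty submatrix
        pos = sum(T[:idx + 1]) * k * k  # rank1(T, idx) * k^2


pairs = {(s, e) for s in (2, 3, 6) for e in (0, 2, 3)}
n, internal, leaves = build_k2tree(pairs, num_vertices=7)
for name, bits in (("T1", internal[0]), ("T2", internal[1]), ("L", leaves)):
    print(name, "".join(map(str, bits)))
# T1 1010, T2 00110011, L 1010111110001100 (as on the slide)
print(has_edge(n, internal, leaves, 6, 3), has_edge(n, internal, leaves, 0, 2))
# True False
```

Only 1-bits are expanded, which is where the compression comes from: the three empty 4 × 4 quadrants cost one bit each.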
Measurements
LDBC social network dataset, scale factor 1 (3 million vertices, 17 million edges)
Generated all possible RPQs with path lengths of up to 3 hops, plus their recursive variants
[Figure: Runtime in ms of base, ADJ, K2, and K2C versus selectivity in 10^x, for x from -11 to -4]
(ADJ: adjacency list, K2: K2-tree, K2C: K2-tree with leaf compression)
Space consumption
[Figure: Space consumption of queries: space consumption in MB versus selectivity for ADJ, K2, and K2C]
Measurements
Batch of 300 queries, memory budget of 10 GB
[Figure: Batch processing with sampled query sets: time in 10^3 s for base, ADJ, K2, K2C, and best; number of uses for ADJ, K2, and K2C]
Conclusion and Future Work
Graph compression is promising for storing reachability information
K2-trees are not beneficial for all result sets:
  Too much overhead for tiny result sets
  No good compression for huge result sets
Future Work
Experiment with other K2-tree variants that also compress uniform 1-regions
Improve access time by providing specialized range queries and submatrix extraction
Compare memory consumption and access time with other compact reachability indices, like FERRARI
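One plausible illustration of why huge result sets compress poorly, and why compressing uniform 1-regions is listed as future work: if every pair in an n × n block is a result, a plain K2-tree must expand every node, storing 4 + 16 + ... + n² = (4n² − 4)/3 bits for k = 2, about a third more than an uncompressed n²-bit matrix. The Python sketch below counts bits under that simple model only; it ignores rank structures and the leaf compression used by K2C, so it shows the trend rather than the measured sizes. The function name is hypothetical.

```python
def k2tree_bits(edges, n, k=2):
    """Bits in the internal levels plus leaves of a plain K2-tree for an
    n x n matrix (n must be a power of k); rank structures are ignored."""
    edges = set(edges)
    total, frontier, size = 0, [(0, 0)], n
    while size > 1:
        sub = size // k
        next_frontier = []
        for r0, c0 in frontier:
            for i in range(k):
                for j in range(k):
                    rs, cs = r0 + i * sub, c0 + j * sub
                    total += 1  # one bit per child submatrix
                    if any(rs <= r < rs + sub and cs <= c < cs + sub
                           for r, c in edges):
                        next_frontier.append((rs, cs))  # only 1-bits expand
        frontier, size = next_frontier, sub
    return total


for n in (8, 16):
    full = {(r, c) for r in range(n) for c in range(n)}  # every pair is a result
    print(n, k2tree_bits(full, n), n * n)  # K2-tree bits vs. plain bitmap bits
# prints 8 84 64, then 16 340 256
```

A variant that marks a fully populated submatrix with a single code instead of expanding it would reduce the full block to a handful of bits, which is the kind of saving the first future-work item points at.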
Thank You