An Analysis of the Feasibility of Graph Compression Techniques for Indexing Regular Path Queries

Frank Tetzel, Hannes Voigt, Marcus Paradies, Wolfgang Lehner
May 19, 2017

Regular Path Queries (RPQs)

- Matching paths that conform to a regular expression
- Only distinct (start, end) vertex pairs in the result set

[Figure: automaton representing (ab)+ with start state s, intermediate state q1, and final state f; search in a data graph with vertices v0 to v6 and edges labeled a and b]

Final result set:

    start   end
    v2      v0, v2, v3
    v3      v0, v2, v3
    v6      v0, v2, v3

Processing strategies

Baseline
- Guided search with automaton on data graph
- Adjacency list on column store

MR-Index
- Store results of RPQs for future use
- Treat vertex pairs as edges of a reachability graph
- Use graph compression for the reachability graph

[Figure: reachability graph over v0, v2, v3, and v6]

K²-tree graph compression

- Compact representation of a binary relation, e.g., an adjacency matrix
- Hierarchical graph compression

Adjacency matrix of the reachability graph (row = start vertex, column = end vertex; padded with a zero row and column to 8 x 8 for the tree):

        0 1 2 3 4 5 6
    0   0 0 0 0 0 0 0
    1   0 0 0 0 0 0 0
    2   1 0 1 1 0 0 0
    3   1 0 1 1 0 0 0
    4   0 0 0 0 0 0 0
    5   0 0 0 0 0 0 0
    6   1 0 1 1 0 0 0

Conceptual K²-tree for k = 2:

    level 1:   1 0 1 0
    level 2:   0 0 1 1     0 0 1 1
    leaves:    1010 1111   1000 1100

Bitvector representation:

    T1 = 1010
    T2 = 0011 0011
    L  = 1010 1111 1000 1100

Measurements

- LDBC social network dataset with scale factor 1 (3 million vertices and 17 million edges)
- Generate all possible RPQs with path length up to 3 hops, and recursions of them

[Figure: runtime in ms over selectivity (10^x, with x from -11 to -4) for base, ADJ, K2, and K2C]
(ADJ: adjacency list, K2: K²-tree, K2C: K²-tree with leaf compression)

Space consumption

[Figure: space consumption in MB over selectivity (x-axis scaled by 10^-4) for ADJ, K2, and K2C. Caption: Space consumption of queries]

Measurements

- Batch of 300 queries, memory budget of 10 GB

[Figure: time in 10^3 s for base, ADJ, K2, K2C, and best; number of uses for ADJ, K2, and K2C. Caption: Batch processing with sampled query sets]

Conclusion and Future Work

- Graph compression is promising for storing reachability information
- K²-trees are not beneficial for all result sets
  - Too much overhead for tiny result sets
  - No good compression for huge result sets

Future Work
- Experiment with other K²-tree variants that compress uniform 1-regions as well
- Improve access time by providing specialized range queries that extract submatrices
- Compare memory consumption and access time with other compact reachability indices like FERRARI

Thank You
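The K²-tree slides give the bitvectors T1, T2, and L for the example result set but not the construction itself. The following is a minimal sketch of one way to derive them, assuming a level-order traversal with k = 2 and a matrix padded to 8 x 8. The names `build_k2_tree`, `contains`, and `PAIRS` are illustrative and not taken from the talk, and the rank operation is computed naively by counting bits rather than with the succinct rank structure a real implementation would use.

```python
# Result pairs from the example: start vertices {2, 3, 6}, end vertices {0, 2, 3}.
PAIRS = {(r, c) for r in (2, 3, 6) for c in (0, 2, 3)}


def build_k2_tree(pairs, size, k=2):
    """Return (levels, leaves) for a size x size binary relation.

    levels holds one bit string per internal tree level (T1, T2, ...),
    leaves is the bit string L. size is assumed to be a power of k.
    """
    cells = set(pairs)
    levels, leaves = [], ""
    frontier = [(0, 0, size)]  # non-empty submatrices: (row0, col0, side)
    while frontier:
        side = frontier[0][2] // k  # side length of the child submatrices
        bits, next_frontier = [], []
        for row0, col0, _ in frontier:
            for i in range(k):
                for j in range(k):
                    r, c = row0 + i * side, col0 + j * side
                    nonempty = any(
                        r <= pr < r + side and c <= pc < c + side
                        for pr, pc in cells
                    )
                    bits.append("1" if nonempty else "0")
                    # Only non-empty submatrices larger than one cell are expanded.
                    if nonempty and side > 1:
                        next_frontier.append((r, c, side))
        if side == 1:
            leaves = "".join(bits)
        else:
            levels.append("".join(bits))
        frontier = next_frontier
    return levels, leaves


def contains(levels, leaves, size, row, col, k=2):
    """Check whether (row, col) is set, walking from the root to a leaf bit."""
    tree = "".join(levels)
    bits = tree + leaves
    base, side = 0, size  # base: index of the current node's first child bit
    while True:
        side //= k
        pos = base + (row // side) * k + (col // side)
        if bits[pos] == "0":
            return False
        if side == 1:
            return True
        row, col = row % side, col % side
        # Naive rank1: children of the node at pos start at rank1(pos) * k^2.
        base = tree[: pos + 1].count("1") * k * k


levels, leaves = build_k2_tree(PAIRS, 8)
print(levels)  # ['1010', '00110011']
print(leaves)  # 1010111110001100
print(contains(levels, leaves, 8, 6, 3))  # True
print(contains(levels, leaves, 8, 6, 1))  # False
```

For the example relation the sketch reproduces T1 = 1010, T2 = 0011 0011, and L = 1010 1111 1000 1100 from the slide. Empty quadrants cost a single 0 bit, while a region filled with ones is expanded down to the leaves, which matches the observation in the conclusion that huge result sets do not compress well.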
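The baseline strategy, a guided search with the automaton on the data graph, can be illustrated with a breadth-first search over (vertex, automaton state) pairs. This is a minimal sketch under stated assumptions: the automaton encodes (ab)+ as on the RPQ slide, but the edge list `TOY_EDGES` is a hypothetical toy graph because the edges of the talk's data graph cannot be recovered from the slides, and `evaluate_rpq` is an illustrative name rather than the authors' implementation. A plain dictionary stands in for the adjacency list on a column store.

```python
from collections import deque

# Automaton for (ab)+ : s --a--> q1 --b--> f --a--> q1, with f as final state.
TRANSITIONS = {("s", "a"): "q1", ("q1", "b"): "f", ("f", "a"): "q1"}
START_STATE = "s"
FINAL_STATES = {"f"}

# Hypothetical toy graph (not the graph from the slides): (source, label, target).
TOY_EDGES = [("x", "a", "y"), ("y", "b", "z"), ("z", "a", "y")]


def evaluate_rpq(edges, transitions=TRANSITIONS,
                 start_state=START_STATE, final_states=FINAL_STATES):
    """Return the set of distinct (start, end) vertex pairs matching the automaton."""
    adjacency = {}
    vertices = set()
    for src, label, dst in edges:
        adjacency.setdefault(src, []).append((label, dst))
        vertices.update((src, dst))

    result = set()
    for origin in vertices:
        # Search the product of data graph and automaton, starting at (origin, start state).
        seen = {(origin, start_state)}
        queue = deque(seen)
        while queue:
            vertex, state = queue.popleft()
            if state in final_states:
                result.add((origin, vertex))
            for label, target in adjacency.get(vertex, []):
                next_state = transitions.get((state, label))
                # The automaton guides the search: edges without a transition are skipped.
                if next_state is not None and (target, next_state) not in seen:
                    seen.add((target, next_state))
                    queue.append((target, next_state))
    return result


print(sorted(evaluate_rpq(TOY_EDGES)))  # [('x', 'z'), ('z', 'z')]
```

Tracking visited (vertex, state) pairs keeps the search finite on cyclic graphs and yields each (start, end) pair only once, which corresponds to the distinct-pairs semantics of the result set. The returned pairs are exactly what the MR-Index would store as edges of a reachability graph.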