Mosaic: Processing a Trillion-Edge Graph on a Single Machine Steffen Maass, Changwoo Min, Sanidhya Kashyap, Woonhak Kang, Mohan Kumar, Taesoo Kim Georgia Institute of Technology Best Student Paper April 26, 2017 Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 1 / 21 Large-scale graph processing is ubiquitous Social networks Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 2 / 21 Large-scale graph processing is ubiquitous Social networks Genome analysis Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 2 / 21 Large-scale graph processing is ubiquitous Social networks Genome analysis Graphs enable Machine Learning Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 2 / 21 Powerful, heterogeneous machines Terabytes of RAM on multiple sockets Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 3 / 21 Powerful, heterogeneous machines Terabytes of RAM on multiple sockets Powerful many-core coprocessors Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 3 / 21 Powerful, heterogeneous machines Terabytes of RAM on multiple sockets Powerful many-core coprocessors Fast, large-capacity Non-volatile Memory Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 3 / 21 Powerful, heterogeneous machines Take advantage of heterogeneous machine to process tera-scale graphs Terabytes of RAM on multiple sockets Powerful many-core coprocessors Fast, large-capacity Non-volatile Memory Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 3 / 21 Table of contents 1 Graph Processing: Sample Application 2 Design Mosaic Architecture Graph Encoding API 3 Evaluation Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 4 / 21 Graph Processing: Applications Community Detection Find Common Friends Find Shortest Paths Estimate Impact of Vertices (webpages, users, . . . ) ... Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 5 / 21 Mosaic: Design space Graph Processing has many faces: Single Machine Out-of-core In memory Cluster Out-of-core In memory Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 6 / 21 Mosaic: Design space Graph Processing has many faces: Single Machine Out-of-core ⇒ Cheap, but potentially slow In memory ⇒ Fast, but limited graph size Cluster Out-of-core ⇒ Large graphs, but expensive & slow In memory ⇒ Large graphs & fast, but very expensive Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 6 / 21 Mosaic: Design space Graph Processing has many faces: Single Machine Out-of-core ⇒ Cheap, but potentially slow In memory ⇒ Fast, but limited graph size Cluster Out-of-core ⇒ Large graphs, but expensive & slow In memory ⇒ Large graphs & fast, but very expensive ⇒ Single machine, out-of-core is most cost-effective ⇒ Goal: Good performance and large graphs! Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 6 / 21 Mosaic: Design goals Goal Run algorithms on very large graphs on a single machine using coprocessors Enabled by: Common, familiar API (vertex/edge-centric) Encoding: Lossless compression Cache locality Processing on isolated subgraphs Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 7 / 21 Architecture of Mosaic Usage of Xeon Phi & NVMe Involvement of Host Global vertex state <current state> ... Host Processors (Xeon) fetch <next state> ... receive ... per Xeon Phi (×4) Meta ...transfer (×6) stripped PCIe I1 I2 ... T1 T2 ... NVMe Steffen Maass Tile transfer edge processing ... (×61 cores) Xeon Phi Mosaic: Trillion Edges on a Single Machine April 26, 2017 8 / 21 Graph encoding: Idea Compression Split graph into subgraphs, use local (short) identifiers Cache locality Inside subgraphs: Sort by access order Between subgraphs: Overlap vertex sets Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 9 / 21 Background: Column first Locality for write Multiple sequential reads Target vertex 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 P11 4 5 6 P21 7 8 9 P31 10 11 12 P41 Source vertex P12 P13 P14 P22 P23 P24 P32 P33 P34 P42 P43 P44 Global adjacency Partition matrix (S = 3) Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 10 / 21 Background: Row first Locality for read Multiple sequential writes Target vertex 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 P11 4 5 6 P21 7 8 9 P31 10 11 12 P41 Source vertex P12 P13 P14 P22 P23 P24 P32 P33 P34 P42 P43 P44 Global adjacency Partition matrix (S = 3) Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 11 / 21 Background: Hilbert order Space-filling curve Provides locality between adjacent data points Target vertex 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 P11 4 5 6 P21 7 8 9 P31 10 11 12 P41 Source vertex P12 P13 P14 P22 P23 P24 P32 P33 P34 P42 P43 P44 Global adjacency Partition matrix (S = 3) Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 12 / 21 From global to local: Tiles Convert graph to set of tiles 1) Start with adjacency Matrix: ➋ ⑥ ➏ ⑤ ① ➊ ② ➑ ➍ ➒ ➎ ➌④➐ Steffen Maass ③ Target vertex 1 2 3 4 5 6 7 8 9 10 11 12 1 ➊ ➋ 2 ➍ 3 P11 P13 P12 P14 4 ➌➎ ➐ 5 ➒ ➏ ➑ P22 P23 6 P21 P24 7 8 P34 P32 P33 9 P31 10 11 P43 P42 P44 12 P41 Source Global adjacency Partition vertex matrix (S = 3) Mosaic: Trillion Edges on a Single Machine April 26, 2017 13 / 21 From global to local: Tiles Convert graph to set of tiles 2) Use first edge in tile T1 : ➋ ⑥ ➏ ⑤ ① ➊ ② ➑ ➍ ➒ ➎ ➌④➐ Steffen Maass ③ Target vertex (global) 1 2 3 4 5 6 7 8 9 10 11 12 Tile-1 (T1) ①➊ 1 ➊ ➋ 2 ➍ ④ ② 3 P11 P13 P12 P14 meta (I1) 4 ➐ ➌➎ ③ (①,1) 5 (local) (②,2) ➒ ➏ ➑ P22 P23 6 P21 P24 7 8 P34 P32 P33 9 P31 10 11 P43 P42 P44 12 P41 Source ① : local vertex id Global adjacency Partition ( ,1) : local → global id vertex ① matrix (global) (S = 3) ➊ : local edge store order Mosaic: Trillion Edges on a Single Machine April 26, 2017 13 / 21 From global to local: Tiles Convert graph to set of tiles 3) Consume as many edges as possible: ➋ ⑥ ➏ ⑤ ① ➊ ② ➑ ➍ ➒ ➎ ➌④➐ Steffen Maass ③ Target vertex (global) 1 2 3 4 5 6 7 8 9 10 11 12 Tile-1 (T1) ①➊ 1 ➊ ➋ ➋ 2 ➍ ④ ② ➍ 3 P11 P13 P12 P14 meta (I1) ➌ 4 ➐ ➌➎ ③ (①,1) (③,5) 5 (local) (②,2) (④,4) ➒ ➏ ➑ P22 P23 6 P21 P24 7 8 P34 P32 P33 9 P31 10 11 P43 P42 P44 12 P41 Source ① : local vertex id Global adjacency Partition ( ,1) : local → global id vertex ① matrix (global) (S = 3) ➊ : local edge store order Mosaic: Trillion Edges on a Single Machine April 26, 2017 13 / 21 From global to local: Tiles Convert graph to set of tiles 4) Next edges do not fit in T1 , construct T2 : ➋ ⑥ ➏ ⑤ ① ➊ ② ➑ ➍ ➒ ➎ ➌④➐ Steffen Maass ③ Target vertex (global) 1 2 3 4 5 6 7 8 9 10 11 12 Tile-1 (T1) ①➊ 1 ➊ ➋ ➋ 2 ➍ ④ ② ➍ 3 P11 P13 P12 P14 meta (I1) ➌ 4 ➐ ➌➎ ③ (①,1) (③,5) 5 (local) (②,2) (④,4) ➒ ➏ ➑ P22 P23 6 P21 P24 Tile-2 (T2) 7 ①➎ ➐ 8 ④ ② P 34 P32 P33 ➑ 9 P31 ➏ ➒ 10 ③ meta (I2) (①,4) (③,5) 11 (local) (②,6) (④,3) P43 P42 P44 12 P41 Source ① : local vertex id Global adjacency Partition ( ,1) : local → global id vertex ① matrix (global) (S = 3) ➊ : local edge store order Mosaic: Trillion Edges on a Single Machine April 26, 2017 13 / 21 Locality with Hilbert-ordered tiles Overlapping sets of sources and targets Target vertex 1 2 3 4 5 6 7 8 9 10 11 12 1 ➊ ➋ 2 ➍ 3 P11 P13 P12 P14 4 ➐ ➌➎ 5 ➒ ➏ ➑ P22 P23 6 P21 P24 7 8 P34 P32 P33 9 P31 10 11 P43 P42 P44 12 P41 Source Global adjacency Partition vertex matrix (S = 3) ⇒ Better locality than row-first or column-first Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 14 / 21 API: Pagerank example Pull: Gather per edge information Reduce: Combine results from multiple subgraphs Apply: Calculate non-associative regularization 1 // On edge processor (co-processor) 2 // Edge e = (Vertex src, Vertex tgt) 3 def Pull(Vertex src, Vertex tgt): 4 return src.val / src.out_degree 5 // On edge processor/global reducers (both) 6 def Reduce(Vertex v1, Vertex v2): 7 return v1.val + v2.val 8 // On global reducers (host) 9 def Apply(Vertex v): 10 v.val = (1 - α) + α × v.val Global graph processing Local graph processing on Tile Edge-centric operation Vertex-centric operation Formula: Pagerankv = α ∗ Steffen Maass P u∈Neighborhood(v ) Pageranku degreeu Mosaic: Trillion Edges on a Single Machine + (1 − α) April 26, 2017 15 / 21 Evaluation: Preprocessing Mosaic needs explicit preprocessing step 2-4 min for small datasets, 51 minutes for webgraph, 31 hours for trillion edges But: Can be amortized during execution: GridGraph: Mosaic faster after twitter: 20 iterations uk2007: 8 iterations X-Stream: Mosaic faster after twitter: 8 iterations uk2007: 5 iterations Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 16 / 21 Evaluation: Size of datasets Hilbert-ordered tiles allow efficient encoding of local graphs Effect: up to 68% reduction in data size Graph #vertices #edges Raw data rmat24 twitter ? rmat27 uk2007-05 hyperlink14 16.8 M 41.6 M 134.2 M 105.8 M 1,724.6 M 0.3 B 1.5 B 2.1 B 3.7 B 64.4 B 2.0 GB 10.9 GB 16.0 GB 27.9 GB 480.0 GB ? 4,294.9 M 1,000.0 B 8,000.0 GB ? rmat-trillion Steffen Maass Mosaic: Trillion Edges on a Single Machine Mosaic size (red.) 1.1 GB 7.7 GB 11.1 GB 8.7 GB 152.4 GB (−45.0%) (−29.4%) (−30.6%) (−68.8%) (−68.3%) 4,816.7 GB (−39.8%) April 26, 2017 17 / 21 Hilbert-ordered tiles: Cache locality Cache misses and execution times for three different strategies 100 35 Hilbert Row-First Column-First 30 25 Runtime (s) Cache Misses (%) 80 60 40 20 15 10 20 5 0 Pagerank BFS WCC 0 Pagerank BFS WCC ⇒ Hilbert-ordered tiles have up to 45% better cache locality, up to 43% reduction in runtime Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 18 / 21 Performance comparison Comparison to other single machine engines with Pagerank: Runtime (seconds) Mosaic 100 GridGraph X-Stream GraphChi 80 60 40 20 0 Steffen Maass rmat24 twitter rmat27 Mosaic: Trillion Edges on a Single Machine uk2007-05 April 26, 2017 19 / 21 Performance comparison Comparison to other single machine engines with Pagerank: log Runtime (seconds) 100 Mosaic GridGraph X-Stream GraphChi 10 1 0.1 rmat24 twitter rmat27 uk2007-05 ⇒ Mosaic outperforms other system by 2.7×to 58.6× Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 19 / 21 Conclusion Mosaic, a graph processing engine for trillion edge graphs on a single machine Hilbert-ordered tiles allow: Enable localized processing on coprocessors Optimizes cache locality Enables compression Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 20 / 21 Thank you! Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 21 / 21
© Copyright 2026 Paperzz