Mosaic: Processing a Trillion-Edge Graph on a Single

Mosaic: Processing a Trillion-Edge Graph on a Single
Machine
Steffen Maass, Changwoo Min, Sanidhya Kashyap, Woonhak Kang,
Mohan Kumar, Taesoo Kim
Georgia Institute of Technology
Best Student Paper
April 26, 2017
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
1 / 21
Large-scale graph processing is ubiquitous
Social networks
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
2 / 21
Large-scale graph processing is ubiquitous
Social networks
Genome analysis
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
2 / 21
Large-scale graph processing is ubiquitous
Social networks
Genome analysis
Graphs enable Machine Learning
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
2 / 21
Powerful, heterogeneous machines
Terabytes of RAM on multiple
sockets
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
3 / 21
Powerful, heterogeneous machines
Terabytes of RAM on multiple
sockets
Powerful many-core coprocessors
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
3 / 21
Powerful, heterogeneous machines
Terabytes of RAM on multiple
sockets
Powerful many-core coprocessors
Fast, large-capacity Non-volatile Memory
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
3 / 21
Powerful, heterogeneous machines
Take advantage of heterogeneous machine to process tera-scale graphs
Terabytes of RAM on multiple
sockets
Powerful many-core coprocessors
Fast, large-capacity Non-volatile Memory
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
3 / 21
Table of contents
1
Graph Processing: Sample Application
2
Design
Mosaic Architecture
Graph Encoding
API
3
Evaluation
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
4 / 21
Graph Processing: Applications
Community Detection
Find Common Friends
Find Shortest Paths
Estimate Impact of Vertices (webpages, users, . . . )
...
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
5 / 21
Mosaic: Design space
Graph Processing has many faces:
Single Machine
Out-of-core
In memory
Cluster
Out-of-core
In memory
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
6 / 21
Mosaic: Design space
Graph Processing has many faces:
Single Machine
Out-of-core ⇒ Cheap, but potentially slow
In memory ⇒ Fast, but limited graph size
Cluster
Out-of-core ⇒ Large graphs, but expensive & slow
In memory ⇒ Large graphs & fast, but very expensive
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
6 / 21
Mosaic: Design space
Graph Processing has many faces:
Single Machine
Out-of-core ⇒ Cheap, but potentially slow
In memory ⇒ Fast, but limited graph size
Cluster
Out-of-core ⇒ Large graphs, but expensive & slow
In memory ⇒ Large graphs & fast, but very expensive
⇒ Single machine, out-of-core is most cost-effective
⇒ Goal: Good performance and large graphs!
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
6 / 21
Mosaic: Design goals
Goal
Run algorithms on very large graphs on a single machine using coprocessors
Enabled by:
Common, familiar API (vertex/edge-centric)
Encoding: Lossless compression
Cache locality
Processing on isolated subgraphs
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
7 / 21
Architecture of Mosaic
Usage of Xeon Phi & NVMe
Involvement of Host
Global
vertex state
<current state>
...
Host
Processors
(Xeon)
fetch
<next state>
...
receive
...
per Xeon Phi
(×4)
Meta
...transfer
(×6)
stripped
PCIe
I1
I2
...
T1 T2
...
NVMe
Steffen Maass
Tile
transfer
edge
processing
...
(×61 cores)
Xeon Phi
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
8 / 21
Graph encoding: Idea
Compression
Split graph into subgraphs, use local (short) identifiers
Cache locality
Inside subgraphs: Sort by access order
Between subgraphs: Overlap vertex sets
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
9 / 21
Background: Column first
Locality for write
Multiple sequential reads
Target vertex
1 2 3 4 5 6 7 8 9 10 11 12
1
2
3 P11
4
5
6 P21
7
8
9 P31
10
11
12 P41
Source
vertex
P12
P13
P14
P22
P23
P24
P32
P33
P34
P42
P43
P44
Global adjacency
Partition
matrix
(S = 3)
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
10 / 21
Background: Row first
Locality for read
Multiple sequential writes
Target vertex
1 2 3 4 5 6 7 8 9 10 11 12
1
2
3 P11
4
5
6 P21
7
8
9 P31
10
11
12 P41
Source
vertex
P12
P13
P14
P22
P23
P24
P32
P33
P34
P42
P43
P44
Global adjacency
Partition
matrix
(S = 3)
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
11 / 21
Background: Hilbert order
Space-filling curve
Provides locality between adjacent data points
Target vertex
1 2 3 4 5 6 7 8 9 10 11 12
1
2
3 P11
4
5
6 P21
7
8
9 P31
10
11
12 P41
Source
vertex
P12
P13
P14
P22
P23
P24
P32
P33
P34
P42
P43
P44
Global adjacency
Partition
matrix
(S = 3)
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
12 / 21
From global to local: Tiles
Convert graph to set of tiles
1) Start with adjacency Matrix:
➋
⑥
➏
⑤
①
➊
②
➑
➍
➒
➎
➌④➐
Steffen Maass
③
Target vertex
1 2 3 4 5 6 7 8 9 10 11 12
1
➊
➋
2
➍
3 P11
P13
P12
P14
4
➌➎
➐
5
➒
➏
➑ P22
P23
6 P21
P24
7
8
P34
P32
P33
9 P31
10
11
P43
P42
P44
12 P41
Source
Global adjacency Partition
vertex
matrix
(S = 3)
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
13 / 21
From global to local: Tiles
Convert graph to set of tiles
2) Use first edge in tile T1 :
➋
⑥
➏
⑤
①
➊
②
➑
➍
➒
➎
➌④➐
Steffen Maass
③
Target vertex (global)
1 2 3 4 5 6 7 8 9 10 11 12
Tile-1 (T1)
①➊
1
➊
➋
2
➍
④
②
3 P11
P13
P12
P14
meta (I1)
4
➐
➌➎
③ (①,1)
5
(local) (②,2)
➒
➏
➑ P22
P23
6 P21
P24
7
8
P34
P32
P33
9 P31
10
11
P43
P42
P44
12 P41
Source
① : local vertex id
Global adjacency Partition ( ,1) : local → global id
vertex
①
matrix
(global)
(S = 3)
➊ : local edge store order
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
13 / 21
From global to local: Tiles
Convert graph to set of tiles
3) Consume as many edges as possible:
➋
⑥
➏
⑤
①
➊
②
➑
➍
➒
➎
➌④➐
Steffen Maass
③
Target vertex (global)
1 2 3 4 5 6 7 8 9 10 11 12
Tile-1 (T1)
①➊
1
➊
➋
➋
2
➍
④
②
➍
3 P11
P13
P12
P14
meta (I1)
➌
4
➐
➌➎
③ (①,1) (③,5)
5
(local) (②,2) (④,4)
➒
➏
➑ P22
P23
6 P21
P24
7
8
P34
P32
P33
9 P31
10
11
P43
P42
P44
12 P41
Source
① : local vertex id
Global adjacency Partition ( ,1) : local → global id
vertex
①
matrix
(global)
(S = 3)
➊ : local edge store order
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
13 / 21
From global to local: Tiles
Convert graph to set of tiles
4) Next edges do not fit in T1 , construct T2 :
➋
⑥
➏
⑤
①
➊
②
➑
➍
➒
➎
➌④➐
Steffen Maass
③
Target vertex (global)
1 2 3 4 5 6 7 8 9 10 11 12
Tile-1 (T1)
①➊
1
➊
➋
➋
2
➍
④
②
➍
3 P11
P13
P12
P14
meta (I1)
➌
4
➐
➌➎
③ (①,1) (③,5)
5
(local) (②,2) (④,4)
➒
➏
➑ P22
P23
6 P21
P24
Tile-2 (T2)
7
①➎
➐
8
④
②
P
34
P32
P33
➑
9 P31
➏
➒
10
③ meta (I2)
(①,4) (③,5)
11
(local) (②,6) (④,3)
P43
P42
P44
12 P41
Source
① : local vertex id
Global adjacency Partition ( ,1) : local → global id
vertex
①
matrix
(global)
(S = 3)
➊ : local edge store order
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
13 / 21
Locality with Hilbert-ordered tiles
Overlapping sets of sources and targets
Target vertex
1 2 3 4 5 6 7 8 9 10 11 12
1
➊
➋
2
➍
3 P11
P13
P12
P14
4
➐
➌➎
5
➒
➏
➑ P22
P23
6 P21
P24
7
8
P34
P32
P33
9 P31
10
11
P43
P42
P44
12 P41
Source
Global adjacency Partition
vertex
matrix
(S = 3)
⇒ Better locality than row-first or column-first
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
14 / 21
API: Pagerank example
Pull: Gather per edge information
Reduce: Combine results from multiple subgraphs
Apply: Calculate non-associative regularization
1 // On edge processor (co-processor)
2 // Edge e = (Vertex src, Vertex tgt)
3 def Pull(Vertex src, Vertex tgt):
4
return src.val / src.out_degree
5 // On edge processor/global reducers (both)
6 def Reduce(Vertex v1, Vertex v2):
7
return v1.val + v2.val
8 // On global reducers (host)
9 def Apply(Vertex v):
10
v.val = (1 - α) + α × v.val
Global graph
processing
Local graph
processing on Tile
Edge-centric operation
Vertex-centric operation
Formula: Pagerankv = α ∗
Steffen Maass
P
u∈Neighborhood(v )
Pageranku
degreeu
Mosaic: Trillion Edges on a Single Machine
+ (1 − α)
April 26, 2017
15 / 21
Evaluation: Preprocessing
Mosaic needs explicit preprocessing step
2-4 min for small datasets, 51 minutes for webgraph, 31 hours for
trillion edges
But: Can be amortized during execution:
GridGraph: Mosaic faster after
twitter: 20 iterations
uk2007: 8 iterations
X-Stream: Mosaic faster after
twitter: 8 iterations
uk2007: 5 iterations
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
16 / 21
Evaluation: Size of datasets
Hilbert-ordered tiles allow efficient encoding of local graphs
Effect: up to 68% reduction in data size
Graph
#vertices
#edges
Raw data
rmat24
twitter
?
rmat27
uk2007-05
hyperlink14
16.8 M
41.6 M
134.2 M
105.8 M
1,724.6 M
0.3 B
1.5 B
2.1 B
3.7 B
64.4 B
2.0 GB
10.9 GB
16.0 GB
27.9 GB
480.0 GB
?
4,294.9 M
1,000.0 B
8,000.0 GB
?
rmat-trillion
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
Mosaic size (red.)
1.1 GB
7.7 GB
11.1 GB
8.7 GB
152.4 GB
(−45.0%)
(−29.4%)
(−30.6%)
(−68.8%)
(−68.3%)
4,816.7 GB (−39.8%)
April 26, 2017
17 / 21
Hilbert-ordered tiles: Cache locality
Cache misses and execution times for three different strategies
100
35
Hilbert
Row-First
Column-First
30
25
Runtime (s)
Cache Misses (%)
80
60
40
20
15
10
20
5
0
Pagerank
BFS
WCC
0
Pagerank BFS
WCC
⇒ Hilbert-ordered tiles have up to 45% better cache locality,
up to 43% reduction in runtime
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
18 / 21
Performance comparison
Comparison to other single machine engines with Pagerank:
Runtime (seconds)
Mosaic
100 GridGraph
X-Stream
GraphChi
80
60
40
20
0
Steffen Maass
rmat24
twitter
rmat27
Mosaic: Trillion Edges on a Single Machine
uk2007-05
April 26, 2017
19 / 21
Performance comparison
Comparison to other single machine engines with Pagerank:
log Runtime (seconds)
100
Mosaic
GridGraph
X-Stream
GraphChi
10
1
0.1
rmat24
twitter
rmat27
uk2007-05
⇒ Mosaic outperforms other system by 2.7×to 58.6×
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
19 / 21
Conclusion
Mosaic, a graph processing engine for trillion edge graphs on a single
machine
Hilbert-ordered tiles allow:
Enable localized processing on coprocessors
Optimizes cache locality
Enables compression
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
20 / 21
Thank you!
Steffen Maass
Mosaic: Trillion Edges on a Single Machine
April 26, 2017
21 / 21