Massive Graph Triangulation by X. Hu, Y. Tao, and C. Chung, SIGMOD’13 Ilias Giechaskiel Cambridge University, R212 [email protected] February 21, 2014 Conclusions Takeaway Messages I Triangle listing important input for graph properties I I/O becomes bottleneck for massive graphs I I Obvious approach doesn’t work MGT algorithm I I I Total order of vertices guarantees unique triangle orientation Near optimal asymptotic I/O + CPU performance Much faster than alternatives in practice Ilias Giechaskiel [email protected] Massive Graph Triangulation 2 / 19 Triangle Listing Definition Given a graph G = (V , E ), list exactly once all ∆v1 v2 v3 = {v1 , v2 , v3 } such that vi ∈ V and (vi , vj ) ∈ E Motivation I Triangle = shortest non-trivial cycle and clique I Various metrics I I I I Dense neighborhood discovery Triangular connectivity k-truss Clustering coefficient Ilias Giechaskiel [email protected] Massive Graph Triangulation 3 / 19 In-Memory Triangle Listing [CC12] The Algorithm procedure list(G ) ∆(G ) ← ∅ loop u ∈ V loop v ∈ adjG (u) & v > u loop w ∈ adjG (u) ∩ adjG (v ) & w > v ∆(G ) ← ∆(G ) ∪ {∆uvw } return ∆(G ) The Problem I Random access to adjG (v ) for v ∈ adjG (u) I O (|E | · scan(dmax )) I/Os in the worst case I I When it doesn’t fit in the memory of size M Recall: scan(N) = Θ(N/B) where B is the disk block size Ilias Giechaskiel [email protected] Massive Graph Triangulation 4 / 19 Motivation Previous Approaches I External Memory Compact Forward (EM-CF) I I I I External Memory Node Iterator (EM-NI) I I I I O |E | + |E |1.5 /B I/Os |E | I/O reads Output insensitive O |E |1.5 /B · logM/B (|E |/B) I/Os Almost insensitive to M Output insensitive Graph Partition [CC12] I I I I O |E |2 /(MB) + K /B I/Os where K triangles p In practice, M > |E | If M = c|E |, asymptotically optimal But under a set of assumptions... Ilias Giechaskiel [email protected] Massive Graph Triangulation 5 / 19 Contributions This Approach I I O |E |2 /(MB) + K /B I/Os in all settings O |E | log |E | + |E |2 /M + α|E | CPU time I α is the arboricity of the graph I Both optimal up to constants I Key idea: total order for unique triangle orientation I Side note: also improves analysis of previous work Ilias Giechaskiel [email protected] Massive Graph Triangulation 6 / 19 Orienting G Defining G ∗ I Define ≺ on V by u ≺ v iff I I I G ∗ is G with edges oriented by ≺ I I I d(u) < d(v ) or d(u) = d(v ) and id(u) < id(v ) Is a total order Takes O (sort(|E |)) I/Os Recall: sort(N) = Θ N/BlogM/B N/B Every triangle {u, v , w } has unique orientation u ≺ v ≺ w Ilias Giechaskiel [email protected] Massive Graph Triangulation 7 / 19 The Algorithm Initial Idea 1. Load next cM edges of G ∗ into memory (Emem ) I All-or-nothing requirement (small-degree assumption) 2. Find all triangle with pivot edges in Emem Ilias Giechaskiel [email protected] Massive Graph Triangulation 8 / 19 The Algorithm Step 2 (Initial) procedure list(G , Emem ) loop u ∈ V Vmem (u) ← N + (u) ∩ Vmem Find triangles with u cone in Emem (u) ∪ Emem Ilias Giechaskiel [email protected] Massive Graph Triangulation 9 / 19 The Algorithm Step 2 (Details) procedure list(G ∗ , Emem ) Build hash structures loop u ∈ V Vmem (u) ← N + (u) ∩ Vmem + (u) loop v ∈ Vmem loop w ∈ Vmem (u) if v 6= w & (v , w ) ∈ Emem then Output ∆uvw Ilias Giechaskiel [email protected] Massive Graph Triangulation 10 / 19 The Algorithm Analysis I O |E |2 /(MB) + K /B I/O I I I I O |E | log |E | + |E |2 /M + α|E | CPU I I I I I I Θ (|E |/M) iterations O (|E |/B) I/Os for scanning O (K /B) for listing O (|E | log |E |) for G ∗ sorting Θ (|E |/M) iterations + O (|N + (u)| + |N + (u)| · |Vmem (u)|) + Σ|N (u)| = |E | Σv ∈V d + (v )2 = O(α|E |) Optimality comes from considering the complete graph Ilias Giechaskiel [email protected] Massive Graph Triangulation 11 / 19 The Algorithm Small-Degree Assumption I What if ∃v such that d + (v ) > cM/2? 1. 2. 3. 4. 5. Find one Load a set S of cM/2 of its out-edges Report all triangles involving one of the edges in S Remove S from the graph Repeat Ilias Giechaskiel [email protected] Massive Graph Triangulation 12 / 19 The Algorithm Small-Degree Assumption I How to implement step 3 I I I I Create hash table of loaded vertices Scan all |E | edges Also scan N(v ) for each v 6= u with u ∈ N(v ) Does not change complexity Ilias Giechaskiel [email protected] Massive Graph Triangulation 13 / 19 Evaluation Experimental Setup I 8GB memory (but memory conscious) I Graphs unoriented Real data I I I I I I I 364MB to 7.5GB 4.8 to 165 million vertices 28 to 938 million edges |E |/|V | from 1.2 to 15.1 Varied M from 5% to 25% of disk size Synthetic data I I I Random, Recursive Matrix, Small World m = 16n, n from 16 to 80 million 2.1GB to 10.6GB Ilias Giechaskiel [email protected] Massive Graph Triangulation 14 / 19 Evaluation Real Data I MGT always better for CPU I MGT almost always better for I/O I RGP higher hidden constant in complexity! Ilias Giechaskiel [email protected] Massive Graph Triangulation 15 / 19 Evaluation Ilias Giechaskiel [email protected] Massive Graph Triangulation 16 / 19 Evaluation Criticism I I/O analysis excludes cost of sorting I Algorithm does not exploit parallelism I I I I Is inherently sequential Not applicable to distributed environment Or across cores RGP ideas applied in this case [PC13] I Block I/O model for SSDs and parallel environment? I Behavior for large-degree vertices I Experiments lacking when M bigger percentage of graph Ilias Giechaskiel [email protected] Massive Graph Triangulation 17 / 19 Conclusions Key Insights I Total order of vertices guarantees unique triangle orientation I Key idea simple, but multiple tricks I Near optimal asymptotic I/O + CPU performance I Much faster than alternatives in practice Key Questions I Can you parallelize the algorithms non-trivially on a single PC? I How can you extend the I/O model to different environments? I How can you minimize data transfers in a distr. environment? I Your questions? Ilias Giechaskiel [email protected] Massive Graph Triangulation 18 / 19 Bibliography I Shumo Chu and James Cheng, Triangle listing in massive networks, ACM Trans. Knowl. Discov. Data 6 (2012), no. 4, 17:1–17:32. Xiaocheng Hu, Yufei Tao, and Chin-Wan Chung, Massive graph triangulation, Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (New York, NY, USA), SIGMOD ’13, ACM, 2013, pp. 325–336. Ha-Myung Park and Chin-Wan Chung, An efficient mapreduce algorithm for counting triangles in a very large graph, Proceedings of the 22Nd ACM International Conference on Conference on Information & Knowledge Management (New York, NY, USA), CIKM ’13, ACM, 2013, pp. 539–548. Ilias Giechaskiel [email protected] Massive Graph Triangulation 19 / 19
© Copyright 2026 Paperzz