Triangle Counting in Large Sparse Graph
Meng-Tsung Tsai

Problem Setting

Problem Setting (1/3)
- Goal: calculate the cluster coefficient of a given graph $G(V, E)$, where $|V| = n$ and $|E| = m$.
- The cluster coefficient indicates the probability that a friend of one's friend is also one's friend.
- The cluster coefficient is one of the important features for examining whether a man-made graph fits the real one.
- In terms of graph theory,
  $$CC(G) = \frac{3 \times \text{number of triangles in } G}{\text{number of triples in } G}.$$
[Figure: a triple (three vertices connected by two edges) and a triangle (three vertices connected by three edges).]

Problem Setting (2/3)
- Example: [Figure: an example graph on four vertices.]
- Number of triangles = 2
- Number of triples = 8
- Cluster coefficient = 3 × 2 / 8 = 0.75
- Triple counting is easy; therefore, the main difficulty in calculating $CC(G)$ is triangle counting.

Problem Setting (3/3)
- Requirement: find an efficient algorithm that counts the number of triangles in $O(m)$ space and $o(n^3)$ time.
- We focus on social network graphs, for which the cluster coefficient is especially important.
- In social networks, $m = o(n^2)$ usually holds.
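The numbers in the Problem Setting (2/3) example can be checked with a short sketch. The function name `cluster_coefficient` and the edge list are illustrative assumptions: the slide's figure did not survive extraction, so the graph below is assumed to be four vertices with every pair adjacent except one, which matches the stated counts. The triangle count here is brute force and is only meant for toy inputs.

```python
from itertools import combinations


def cluster_coefficient(n, edges):
    """Return (triangles, triples, CC) for a simple undirected graph on vertices 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # A triple is a two-edge path centred at a vertex: choose 2 of its neighbours.
    triples = sum(len(a) * (len(a) - 1) // 2 for a in adj)
    # Brute-force triangle count over all vertex triples (toy graphs only).
    triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    cc = 3 * triangles / triples if triples else 0.0
    return triangles, triples, cc


# Assumed example graph: four vertices, all pairs adjacent except (0, 2).
EXAMPLE_EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
print(cluster_coefficient(4, EXAMPLE_EDGES))  # (2, 8, 0.75)
```

Counting triples needs only the degree of each vertex, which is why the slides treat triangle counting as the hard part.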
Triangle Counting (Trivial Algorithm)

Trivial Algorithm
[Figure: a two-edge path between vertices u and v, plus the edge (u, v), equals a triangle.]
- Let $M$ be a matrix such that $M_{i,j}$ is 1 if and only if an edge connecting vertices $i$ and $j$ exists.
- Let $M^2$ be $M \cdot M$. What does $M^2_{i,j}$ mean?
- $$\triangle = \frac{1}{6} \sum_{i,j} M^2_{i,j} \cdot M_{i,j}$$
- Simple matrix multiplication, the Strassen Algorithm, and the Winograd Algorithm all require $O(n^2)$ space to obtain $M^2$. Not acceptable!
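A minimal sketch of the matrix formula above, using plain nested lists and schoolbook multiplication. The name `count_triangles_matrix` is hypothetical. Entry $M^2_{i,j}$ counts the two-edge paths between $i$ and $j$, so multiplying by $M_{i,j}$ keeps only the paths that an edge closes into a triangle; each triangle is then counted six times (three edges, two directions).

```python
def count_triangles_matrix(n, edges):
    """Count triangles as (1/6) * sum_{i,j} (M^2)[i][j] * M[i][j]."""
    M = [[0] * n for _ in range(n)]
    for u, v in edges:
        M[u][v] = M[v][u] = 1
    # M2[i][j] = number of two-edge paths between i and j (O(n^2) space).
    M2 = [
        [sum(M[i][k] * M[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]
    total = sum(M2[i][j] * M[i][j] for i in range(n) for j in range(n))
    return total // 6


# Assumed toy graph: four vertices, all pairs adjacent except (0, 2).
print(count_triangles_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]))  # 2
```

The dense `M` and `M2` tables are exactly the $O(n^2)$ space cost that the slide rejects for large sparse graphs.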
Triangle Counting (Forward Algorithm)

Forward Algorithm (1/2)
[Figure: a four-vertex graph with vertices indexed 1 to 4. Each vertex $v$ is annotated with $N_v$, the set of its lower-indexed neighbors: $N_1 = \{\}$, $N_2 = \{1\}$, $N_3 = \{2\}$, $N_4 = \{1, 2, 3\}$. For the edge (2, 4), $\{1\} \cap \{1, 2, 3\} = \{1\}$.]
[Figure: a second indexing, with $N_1 = \{\}$, $N_2 = \{1\}$, $N_3 = \{1, 2\}$, $N_4 = \{1, 2\}$.]
- $$\triangle = \sum_{(u,v) \in E} |N_u \cap N_v|$$
- All triangles can be found.
- All found objects are triangles.
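The sum over edges can be sketched as follows. This is an illustration under assumptions, not the author's implementation: `count_triangles_forward` is a hypothetical name, $N_v$ is taken to be the set of neighbors with a smaller index (as the figure annotations suggest), and the input is assumed to be a simple graph with each edge listed once. The edge lists are read off the two sets of annotations in the figures.

```python
def count_triangles_forward(edges):
    """Sum |N_u & N_v| over all edges, where N_v holds v's lower-indexed neighbours."""
    lower = {}
    for u, v in edges:
        lower.setdefault(u, set())
        lower.setdefault(v, set())
        hi, lo = max(u, v), min(u, v)
        lower[hi].add(lo)  # only the higher-indexed endpoint records the edge
    # Each triangle a < b < c is found exactly once, at edge (b, c) via vertex a.
    return sum(len(lower[u] & lower[v]) for u, v in edges)


# Edges implied by N_2 = {1}, N_3 = {2}, N_4 = {1, 2, 3}.
FIRST_INDEXING = [(1, 2), (1, 4), (2, 4), (2, 3), (3, 4)]
# Edges implied by N_2 = {1}, N_3 = {1, 2}, N_4 = {1, 2}.
SECOND_INDEXING = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4)]

print(count_triangles_forward(FIRST_INDEXING))   # 2
print(count_triangles_forward(SECOND_INDEXING))  # 2
```

Storing each edge only at its higher-indexed endpoint is what keeps the space linear in $m$ and prevents a triangle from being counted more than once.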
Complexity of the Forward Algorithm: $O(m \cdot d(G))$ time, $\Theta(m)$ space.

Forward Algorithm (2/2)
- Assign indices to vertices according to their degree: the higher the degree of a vertex, the lower its index.
- If $\deg(v) \le \sqrt{2m}$, then $|N_v| \le \sqrt{2m}$.
- If $\deg(v) \ge k$, at most $2m/k$ vertices have higher degree. Thus $|N_v| \le \sqrt{2m}$ when $\deg(v) \ge \sqrt{2m}$.
- Another algorithm finds the optimum solution of $d(G)$ in $O(m)$ time.

Triangle Counting (Four-Russians' Algorithm)

Four-Russians' Algorithm
- Bit vectors: {1, 0, 1, 1, ...}, {0, 1, 0, 0, ...}, ...
- Grouping the bits into sectors (here of length 2) and reading each sector as an integer gives {2, 3, ...}, {1, 0, ...}, ...
- Lookup table of sector inner products:

      | 0  1  2  3
    --+-----------
    0 | 0  0  0  0
    1 | 0  1  0  1
    2 | 0  0  1  1
    3 | 0  1  1  2
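The sector lookup on the Four-Russians' slide can be sketched as below. The helper names (`build_table`, `to_sectors`, `inner_product`) are hypothetical, and the sector length of 2 is chosen only to reproduce the 4-by-4 table shown on the slide. Both vectors are assumed to have the same length.

```python
def build_table(bits):
    """table[a][b] = inner product of the two `bits`-long sectors encoded by a and b."""
    size = 1 << bits
    return [[bin(a & b).count("1") for b in range(size)] for a in range(size)]


def to_sectors(vector, bits):
    """Split a 0/1 list into sectors of `bits` entries, each read as a binary integer."""
    sectors = []
    for start in range(0, len(vector), bits):
        chunk = vector[start:start + bits]
        chunk = chunk + [0] * (bits - len(chunk))  # pad the last sector with zeros
        value = 0
        for bit in chunk:
            value = (value << 1) | bit
        sectors.append(value)
    return sectors


def inner_product(x, y, table, bits):
    """One table lookup per sector instead of one multiplication per bit."""
    return sum(table[a][b] for a, b in zip(to_sectors(x, bits), to_sectors(y, bits)))


TABLE = build_table(2)
print(to_sectors([1, 0, 1, 1], 2))                          # [2, 3]
print(to_sectors([0, 1, 0, 0], 2))                          # [1, 0]
print(inner_product([1, 0, 1, 1], [0, 1, 0, 0], TABLE, 2))  # 0
```

With longer sectors the table grows exponentially in the sector length, which is the trade-off between lookup count and table space.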
The table used in the Four-Russians' Algorithm is $2^{\log n}$ by $2^{\log n}$. Thus, its speedup is $O(\log n)$.

Triangle Counting (FFR Algorithm)

FFR Algorithm
- The $|N_u \cap N_v|$ part (shown in red on the slide) of $\triangle = \sum_{(u,v) \in E} |N_u \cap N_v|$ in the Forward Algorithm can be sped up with the Four-Russians' Algorithm.
- Let the length of sectors be $\frac{1}{2} \log m$; the additional space for the table is $\Theta(m)$.
- The number of non-all-zero sectors in $N_v$ is $O(\sqrt{m / \log m})$ when $\deg(v) \le \sqrt{m / \log m}$.
- The number of non-all-zero sectors in $N_v$ is $O(\sqrt{m / \log m})$ when $\deg(v) \ge \sqrt{m / \log m}$.
- FFR needs $O(m^{3/2} / \log^{1/2} m)$ time.
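The space and time figures on the FFR slide follow from the sector length by short arithmetic. The derivation below is a sketch that assumes base-2 logarithms and one table lookup per non-all-zero sector for each edge.

```latex
% Table size with sectors of length (1/2) log m:
\[
  2^{\frac{1}{2}\log m} \times 2^{\frac{1}{2}\log m}
  = \sqrt{m} \times \sqrt{m}
  = m \ \text{entries}, \quad \text{i.e. } \Theta(m) \text{ additional space.}
\]
% Time: m edges, each needing O(sqrt(m / log m)) sector lookups:
\[
  m \cdot O\!\left(\sqrt{\frac{m}{\log m}}\right)
  = O\!\left(\frac{m^{3/2}}{\log^{1/2} m}\right).
\]
```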
CPU Instruction versus Memory Access

Instruction versus Memory (1/3)
- The inner product in the Four-Russians' Algorithm can be accomplished by two CPU instructions. CPU instructions are known to execute much faster than memory accesses.
- "Logical and": $C = A \mathbin{\mathring{\wedge}} B$, where $C_i = \min(A_i, B_i)$.
- "Population count": $d = \mathring{\sigma} A$, where $d = \sum_{i=1}^{g} A_i$.

Instruction versus Memory (2/3)
[Plot: wall time (seconds per 10,000 runs) versus bit density (x out of 64 bits are 1) for ALGO 5, ALGO 2 with p = 8, and ALGO 2 with p = 16.]
[Plot: wall time (seconds per 10,000 runs) versus bit density (x out of 64 bits are 1) for ALGO 2 with p = 8, p = 16, and p = 22.]
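The two-instruction inner product from Instruction versus Memory (1/3) can be imitated in Python, where an integer stands in for a register. This is only a sketch: `word_from_set` and `intersection_size` are hypothetical names, and `bin(...).count("1")` plays the role of the hardware population count rather than being a single instruction.

```python
def word_from_set(members):
    """Encode a set of small non-negative integers as a bit mask (bit i set iff i is a member)."""
    word = 0
    for i in members:
        word |= 1 << i
    return word


def intersection_size(a_word, b_word):
    """|A & B| via one bitwise AND followed by one population count."""
    both = a_word & b_word          # "logical and": C_i = min(A_i, B_i)
    return bin(both).count("1")     # "population count": sum of the bits of C


# N_2 = {1} and N_4 = {1, 2, 3} from the Forward Algorithm figure.
print(intersection_size(word_from_set({1}), word_from_set({1, 2, 3})))  # 1
```

No lookup table is touched here, which is the point of comparing instruction speed against memory access.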
Instruction versus Memory (3/3)
- CPU instructions can handle sectors of size $g$, where $g$ is the length of a CPU register.
- Is $g$ a constant in the analysis of algorithms?
- Are all instructions $O(1)$-executable?

Is g a constant?
- Assume a program is executed on $M$, a random access machine, using $\Theta(S)$ memory space.
- $\Theta(S)$ memory addresses are required.
- The length of the registers in $M$ is $\Omega(\log S)$.

Are all instructions O(1)-executable?
- AC$^0$ instructions are those that can be realized with a polynomial-size, constant-depth circuit.
- Multiplication is not an AC$^0$ instruction.
- To access a multi-dimensional array in constant time, multiplication must be constant-time executable.
- We suggest that instructions that can be implemented faster than multiplication be treated as constant-time executable.

Population Count

Population Count (1/3)
- $\mathring{\sigma}$ is not supported by all types of CPU. Is there an alternative?
- Previous work shows a bit-twiddling method to realize the population count. The method needs $O(\log^{(2)} g)$ basic instructions, where $\log^{(k)}$ denotes the logarithm iterated $k$ times. Hence, the speedup is $O(g^{1/2} / \log^{(2)} g) = \Omega(\log^{1/2} m / \log^{(3)} m)$, since $g = \Omega(\log m)$.
- Is there a faster solution?
- To calculate a collection of population counts, must we execute each population count exactly?
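For readers unfamiliar with bit-twiddling population counts, here is one widely used variant for 64-bit words, built only from shifts, masks, additions, and one subtraction. It is an assumption that this resembles the method the slide refers to; the cited previous work and its instruction count may differ. The name `popcount64` is hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def popcount64(x):
    """Count the 1 bits of a 64-bit word by summing neighbouring fields in parallel."""
    x &= MASK64
    x = x - ((x >> 1) & 0x5555555555555555)                          # 2-bit field sums
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)   # 4-bit field sums
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F                          # 8-bit field sums
    x = (x + (x >> 8)) & 0x00FF00FF00FF00FF                          # 16-bit field sums
    x = (x + (x >> 16)) & 0x0000FFFF0000FFFF                         # 32-bit field sums
    x = (x + (x >> 32)) & 0x00000000FFFFFFFF                         # whole-word sum
    return x


print(popcount64(0b1011))  # 3
```

Each line doubles the width of the fields being summed, so this variant uses a number of stages logarithmic in the word length.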
Population Count (2/3)
[Figure: three bit vectors are added column by column into a $2^0$ vector and a $2^1$ vector.]

            { 1 1 0 0 }
            { 1 0 1 0 }
    +       { 1 1 0 0 }
    -------------------
      2^0   { 1 0 1 0 }
      2^1   { 1 1 0 0 }

- This method reduces $2^d - 1$ $\mathring{\sigma}$ operations to $d$ $\mathring{\sigma}$ operations.
- The speedup is $\Omega(\log^{1/2} m / \log^{(4)} m)$.

Instruction versus Memory (2/3)
[Plot: elapsed wall time (seconds) versus rewiring probability for ALGO 3, ALGO 7[12 <- ALGO 10], and ALGO 7[12 <- ALGO 12].]
[Plot: speedup relative to ALGO 3 (%) versus rewiring probability for ALGO 7[12 <- ALGO 10] and ALGO 7[12 <- ALGO 12].]
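The column-wise addition on the Population Count (2/3) slide can be sketched for the smallest case, $d = 2$, where three words are compressed into a "ones" word and a "twos" word. The function names are hypothetical, and treating the step as a bitwise full adder is an assumption about how the slide's method is realized.

```python
def popcount(x):
    """Reference population count for a non-negative integer."""
    return bin(x).count("1")


def carry_save(a, b, c):
    """Add three words bit-column by bit-column into a 2^0 word and a 2^1 word."""
    ones = a ^ b ^ c                      # column sum modulo 2
    twos = (a & b) | (a & c) | (b & c)    # column carry: at least two bits set
    return ones, twos


def total_popcount_3(a, b, c):
    """popcount(a) + popcount(b) + popcount(c) using two population counts instead of three."""
    ones, twos = carry_save(a, b, c)
    return popcount(ones) + 2 * popcount(twos)


# The three vectors from the slide, written with the leftmost entry as the highest bit.
A, B, C = 0b1100, 0b1010, 0b1100
print([bin(w) for w in carry_save(A, B, C)])  # ['0b1010', '0b1100']
print(total_popcount_3(A, B, C))              # 6
```

The output words match the $2^0$ and $2^1$ rows of the figure, and the weighted sum of their counts equals the sum of the three original counts.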
Conclusion
- The previous efficient algorithm, the Forward Algorithm, needs $O(m^{3/2})$ time and $O(m)$ space.
- To develop algorithms on random access machines, we put forward two arguments.
- Based on these arguments, our algorithm has an $\Omega(\log^{1/2} m / \log^{(4)} m)$ speedup.
- Though it may be slightly worse than the FFR Algorithm in the analysis of speedup, it performs better in practice.

Future Work
- Some graph features may be more suitable to analyze than degeneracy when the algorithm for calculating the intersection of two given sets changes.
- The same arguments about random access machines can be applied to many other algorithms.

Thanks for your attention! Any questions?