Triangle Counting in Large Sparse Graph

Meng-Tsung Tsai
Triangle Counting in Large Sparse Graph – p.1/31
Problem Setting
Problem Setting (1/3)
Goal:
Calculate the cluster coefficient of a given graph G(V, E), where |V| = n and |E| = m.
The cluster coefficient indicates the probability that a friend of one’s friend is also one’s friend.
The cluster coefficient is one of the important features for examining whether a man-made graph fits the real one.
In terms of graph theory,
$$CC(G) = \frac{3 \times \text{number of triangles in } G}{\text{number of triples in } G}.$$
[Figure: a triple (three vertices joined by two edges) and a triangle (three vertices joined by three edges).]
Problem Setting (2/3)
Example:
[Figure: example graph on four vertices.]
number of triangles = 2
number of triples = 8
cluster coefficient = 3 × 2 / 8 = 0.75
Triple counting is easy; therefore, the main difficulty in calculating CC(G) is triangle counting.
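To illustrate why triple counting is easy: a triple is a vertex together with two of its neighbours, so the number of triples is the sum of C(deg(v), 2) over all vertices. The sketch below is only an illustration. The edge list is an assumption (the slide's figure did not survive extraction), chosen as a four-vertex graph that reproduces the counts above, and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical edge list: a 4-vertex graph chosen to match the slide's counts.
EDGES = [(1, 2), (1, 4), (2, 3), (2, 4), (3, 4)]

def adjacency(edges):
    """Build a vertex -> set-of-neighbours map from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def count_triples(adj):
    """A triple is a vertex plus two of its neighbours: sum of C(deg, 2)."""
    return sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())

def count_triangles_brute_force(adj):
    """Check every 3-subset of vertices; acceptable only for tiny graphs."""
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )

def cluster_coefficient(edges):
    """CC(G) = 3 * triangles / triples."""
    adj = adjacency(edges)
    return 3 * count_triangles_brute_force(adj) / count_triples(adj)
```

The brute-force triangle count is cubic in n and is used here only to check the arithmetic of the example.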
Problem Setting (3/3)
Requirement:
Seek an efficient algorithm that counts the number of triangles using $O(m)$ space and $o(n^3)$ time.
We focus on social network graphs, for which the cluster coefficient is especially important.
In social networks, $m = o(n^2)$ usually holds.
Triangle Counting (Trivial Algorithm)
Trivial Algorithm
[Figure: a two-edge path from u to v, plus the edge (u, v), equals a triangle.]
Let M be a matrix such that $M_{i,j}$ is 1 if and only if an edge connecting vertices i and j exists.
Let $M^2$ be $M \cdot M$. What does $M^2_{i,j}$ mean? It is the number of two-edge paths between i and j, that is, the number of their common neighbors.
$$\triangle = \frac{1}{6} \sum_{i,j} M_{i,j} \cdot M^2_{i,j}$$
Simple matrix multiplication, the Strassen algorithm, and the Winograd algorithm all require $\Omega(n^2)$ space to obtain $M^2$. Not acceptable!
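A minimal sketch of the matrix formula, assuming a dense list-of-lists matrix and the same hypothetical four-vertex example graph (vertices renumbered from 0). Each triangle is counted six times in the sum, once per ordered pair of its vertices, which is where the factor 1/6 comes from. The quadratic storage of `m` and `m2` is exactly the cost the slide objects to.

```python
def adjacency_matrix(n, edges):
    """Dense 0/1 matrix for an undirected graph on vertices 0..n-1."""
    m = [[0] * n for _ in range(n)]
    for u, v in edges:
        m[u][v] = m[v][u] = 1
    return m

def mat_square(m):
    """Naive O(n^3) product M * M; entry (i, j) counts common neighbours."""
    n = len(m)
    return [
        [sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

def triangles_via_matrix(n, edges):
    """Triangles = (1/6) * sum over i, j of M[i][j] * (M^2)[i][j]."""
    m = adjacency_matrix(n, edges)
    m2 = mat_square(m)
    total = sum(m[i][j] * m2[i][j] for i in range(n) for j in range(n))
    return total // 6

# Hypothetical example graph (not taken from the slides' lost figure).
EXAMPLE_EDGES = [(0, 1), (0, 3), (1, 2), (1, 3), (2, 3)]
```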
Triangle Counting (Forward Algorithm)
Forward Algorithm (1/2)
[Figure: example graph with vertices indexed 1 to 4, each annotated with its set of lower-indexed neighbors: 1 {}, 2 {1}, 3 {2}, 4 {1, 2, 3}. For the edge (2, 4): {1} ∩ {1, 2, 3} = {1}.]
[Figure: a second indexing of the graph with the sets 1 {}, 2 {1}, 3 {1, 2}, 4 {1, 2}.]
$$\triangle = \sum_{\text{edge}(u,v) \in E} |N_u \cap N_v|$$
All triangles can be found, and all found objects are triangles.
time: O(m · d(G)), space: Θ(m)
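The formula above can be sketched as follows. This is an illustration under two assumptions, not the author's implementation: N_v is taken to be the set of neighbours of v with a smaller index (as the figure's annotations suggest), and Python sets stand in for whatever adjacency structure the original uses. Each undirected edge must appear exactly once in the input.

```python
def forward_triangle_count(edges):
    """Sum over edges (u, v) of |N_u & N_v|, where N_v holds the
    neighbours of v whose index is smaller than v."""
    lower = {}
    for u, v in edges:
        lo, hi = min(u, v), max(u, v)
        lower.setdefault(lo, set())
        lower.setdefault(hi, set()).add(lo)
    # The smallest vertex of a triangle lies in the lower sets of both
    # other vertices, so each triangle is counted exactly once.
    return sum(len(lower[u] & lower[v]) for u, v in edges)

# The two indexings shown on the slide, written out as edge lists.
FIRST_INDEXING = [(1, 2), (1, 4), (2, 3), (2, 4), (3, 4)]
SECOND_INDEXING = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4)]
```

Both indexings give the same count; they differ only in how large the sets N_v become, which is what the d(G) factor in the time bound measures.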
Forward Algorithm (2/2)
Assign indices to vertices according to their degree: the higher the degree of a vertex, the lower its index.
If $\deg(v) \le \sqrt{2m}$, then $|N_v| \le \sqrt{2m}$.
If $\deg(v) \ge k$, at most $2m/k$ vertices have higher degree, because the degrees sum to $2m$. Thus $|N_v| \le \sqrt{2m}$ where $\deg(v) \ge \sqrt{2m}$.
There exists another algorithm that finds the optimum solution of d(G) in O(m) time.
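A small sketch of the degree ordering, assuming N_v is the set of lower-indexed neighbours after re-indexing. The tie-breaking rule and the function names are choices made for this illustration only.

```python
import math

def degree_order_lower_sets(edges):
    """Re-index vertices by non-increasing degree (1 = highest degree)
    and return N_v, keyed by the new index."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    # Ties are broken by the original label so the result is deterministic.
    order = sorted(deg, key=lambda x: (-deg[x], x))
    index = {x: i + 1 for i, x in enumerate(order)}
    lower = {i: set() for i in index.values()}
    for u, v in edges:
        a, b = sorted((index[u], index[v]))
        lower[b].add(a)
    return lower

def max_lower_set_size(edges):
    """Largest |N_v| under the degree ordering."""
    return max(len(s) for s in degree_order_lower_sets(edges).values())

# Hypothetical inputs: the four-vertex example and a star with 10 leaves.
EXAMPLE_EDGES = [(1, 2), (1, 4), (2, 3), (2, 4), (3, 4)]
STAR_EDGES = [(0, leaf) for leaf in range(1, 11)]
```

On the star, the centre receives index 1, so every leaf has a single lower-indexed neighbour; an ordering that placed the centre last would instead give it ten.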
Triangle Counting (Four Russians’ Algorithm)
Four-Russians’ Algorithm
Each bit vector is cut into sectors (two bits per sector in this example), and each sector is read as an integer:
{1, 0, 1, 1, . . .} becomes {2, 3, . . .}
{0, 1, 0, 0, . . .} becomes {1, 0, . . .}
The inner product of two sectors is looked up in a precomputed table:
      0  1  2  3
  0   0  0  0  0
  1   0  1  0  1
  2   0  0  1  1
  3   0  1  1  2
The table utilized in the Four-Russians’ Algorithm is $2^{\log n}$ by $2^{\log n}$. Thus, its speedup is O(log n).
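The table lookup can be sketched as below, using the two-bit sectors of the example. The names `SECTOR_BITS`, `TABLE`, `to_sectors`, and `inner_product` are hypothetical, and reading each sector most-significant-bit first is an assumption that matches the slide's {1, 0} → 2 encoding.

```python
SECTOR_BITS = 2  # sector length used in the slide's example

# TABLE[a][b] = number of bit positions where sectors a and b are both 1.
# It is built once, so its construction cost is amortised over all lookups.
TABLE = [
    [bin(a & b).count("1") for b in range(1 << SECTOR_BITS)]
    for a in range(1 << SECTOR_BITS)
]

def to_sectors(bits):
    """Pack a 0/1 list into integers, SECTOR_BITS bits per sector (MSB first)."""
    padded = bits + [0] * (-len(bits) % SECTOR_BITS)
    sectors = []
    for i in range(0, len(padded), SECTOR_BITS):
        value = 0
        for bit in padded[i:i + SECTOR_BITS]:
            value = (value << 1) | bit
        sectors.append(value)
    return sectors

def inner_product(x_sectors, y_sectors):
    """One table lookup per sector instead of one multiplication per bit."""
    return sum(TABLE[a][b] for a, b in zip(x_sectors, y_sectors))
```

With sectors of s bits the table has 2^s rows and columns, so the sector length trades table space against the number of lookups.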
Triangle Counting (FFR Algorithm)
FFR Algorithm
The intersection term $|N_u \cap N_v|$ of $\triangle = \sum_{\text{edge}(u,v) \in E} |N_u \cap N_v|$ in the Forward Algorithm can be sped up with the Four-Russians’ Algorithm.
Let the length of sectors be $\frac{1}{2} \log m$; the additional space for the table is Θ(m).
The number of non-all-zero sectors in $N_v$ is $O(\sqrt{m / \log m})$ where $\deg(v) \le \sqrt{m / \log m}$.
The number of non-all-zero sectors in $N_v$ is $O(\sqrt{m / \log m})$ where $\deg(v) \ge \sqrt{m / \log m}$.
FFR needs $O(m^{3/2} / \log^{1/2} m)$ time.
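The time bound follows from the two sector counts. The derivation below is a reconstruction of the arithmetic, not taken from the slides: it writes $t = \sqrt{m / \log m}$ for the degree threshold, assumes the degree-ordered indexing of the Forward Algorithm, and introduces its own constants.

```latex
\begin{align*}
\deg(v) \le t:&\quad
  \#\text{sectors} \le |N_v| \le \deg(v) \le t = \sqrt{m/\log m},\\
\deg(v) \ge t:&\quad
  \text{every index in } N_v \text{ is at most } \frac{2m}{t} = 2\sqrt{m \log m}
  \;\Rightarrow\;
  \#\text{sectors} \le \left\lceil \frac{2\sqrt{m \log m}}{\tfrac{1}{2}\log m} \right\rceil
  = O\!\left(\sqrt{m/\log m}\right),\\
\text{total time}:&\quad
  m \cdot O\!\left(\sqrt{m/\log m}\right)
  = O\!\left(\frac{m^{3/2}}{\log^{1/2} m}\right).
\end{align*}
```

The second case uses the same counting argument as the Forward Algorithm: at most $2m/t$ vertices can have degree at least $t$, and under the degree ordering they occupy the lowest indices.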
CPU Instruction versus Memory Access
Instruction versus Memory (1/3)
The inner product in the Four-Russians’ Algorithm can be accomplished by two CPU instructions. It is known that executing a CPU instruction is much faster than accessing memory.
"logical and": $C = A \mathbin{\mathring{\wedge}} B$, $C_i = \min(A_i, B_i)$
"population count": $d = \mathring{\sigma} A$, $d = \sum_{i=1}^{g} A_i$
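A minimal sketch of the two-instruction inner product. Python integers stand in for a g-bit register and `bin(x).count("1")` stands in for a hardware population-count instruction; both substitutions, and the helper names, are assumptions made for illustration.

```python
def pack_bits(bits):
    """Pack a 0/1 list into one integer (bit i of the word = bits[i])."""
    word = 0
    for i, bit in enumerate(bits):
        word |= bit << i
    return word

def inner_product(a_word, b_word):
    """Logical AND followed by population count."""
    c_word = a_word & b_word        # C_i = min(A_i, B_i)
    return bin(c_word).count("1")   # d = sum of C_i
```

Unlike the table lookup, this touches no memory beyond the two operands, which is the contrast the slide draws between instructions and memory access.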
Instruction versus Memory (2/3)
[Figure: wall time (seconds per 10,000 runs) versus bit density (x out of 64 bits are 1) for ALGO 5, ALGO 2 with p=8, and ALGO 2 with p=16.]
[Figure: wall time (seconds per 10,000 runs) versus bit density (x out of 64 bits are 1) for ALGO 2 with p=8, p=16, and p=22.]
Instruction versus Memory (3/3)
CPU instructions can handle sectors of size g, where g is the length of a CPU register.
Is g a constant in the analysis of algorithms?
Are all instructions O(1)-executable?
Is g a constant?
Assume a program is executed on M, a random access machine, using Θ(S) memory space.
Θ(S) memory addresses are required.
A register must be able to hold an address, so the length of the registers in M is Ω(log S).
Are all instructions O(1)-executable?
$AC^0$ instructions are those that can be realized with a polynomial-size, constant-depth circuit.
Multiplication is not an $AC^0$ instruction.
To access a multi-dimensional array in constant time, multiplication must be executable in constant time.
We suggest that any instruction that can be implemented faster than multiplication be treated as constant-time executable.
Population Count
Population Count (1/3)
σ̊ is not supported by all types of CPU.
Is there an alternative?
Previous work shows a bit-twiddling method to realize the population count. The method needs $O(\log^{(2)} g)$ basic instructions, where $\log^{(k)}$ denotes the logarithm iterated k times. Hence, the speedup is $O(g^{1/2} / \log^{(2)} g) = \Omega(\log^{1/2} m / \log^{(3)} m)$ due to $g = \Omega(\log m)$.
Is there a faster solution?
To calculate a collection of population counts, must we execute each population count exactly?
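For readers unfamiliar with bit twiddling, here is the textbook shift-mask-add population count for a 64-bit word. It is shown only as an example of the style of technique; it is not necessarily the method from the previous work cited above, and it uses log2(64) = 6 rounds, so it does not by itself demonstrate the instruction bound stated on the slide.

```python
MASK64 = (1 << 64) - 1

def popcount64_swar(x):
    """Population count of a 64-bit word using only shifts, masks, and adds."""
    x &= MASK64
    x = (x & 0x5555555555555555) + ((x >> 1) & 0x5555555555555555)   # 2-bit sums
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)   # 4-bit sums
    x = (x & 0x0F0F0F0F0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F0F0F0F0F)   # 8-bit sums
    x = (x & 0x00FF00FF00FF00FF) + ((x >> 8) & 0x00FF00FF00FF00FF)   # 16-bit sums
    x = (x & 0x0000FFFF0000FFFF) + ((x >> 16) & 0x0000FFFF0000FFFF)  # 32-bit sums
    x = (x & 0x00000000FFFFFFFF) + (x >> 32)                         # 64-bit sum
    return x
```

Each round adds neighbouring fields in parallel, doubling the field width, so no multiplication is needed.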
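Population Count (2/3)
Three words are added bit position by bit position; each column sum is written in binary as a $2^0$ word and a $2^1$ word:
      { 1 1 0 0 }
      { 1 0 1 0 }
  +   { 1 1 0 0 }
  2^0 { 1 0 1 0 }
  2^1 { 1 1 0 0 }
Use this method to reduce $2^d - 1$ σ̊ operations to d σ̊ operations.
The speedup is $\Omega(\log^{1/2} m / \log^{(4)} m)$.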
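The three-word case of this reduction (d = 2) can be sketched with a bitwise full adder. This is an illustration of the idea under the assumption that the slide's figure shows exactly this column-wise addition; the function names are hypothetical, and `bin(x).count("1")` again stands in for σ̊.

```python
def popcount(x):
    """Stand-in for a hardware population-count instruction."""
    return bin(x).count("1")

def compress3(a, b, c):
    """Bitwise full adder: at every bit position, a + b + c = ones + 2 * twos."""
    ones = a ^ b ^ c                     # the 2^0 word
    twos = (a & b) | (a & c) | (b & c)   # the 2^1 word
    return ones, twos

def popcount_sum3(a, b, c):
    """popcount(a) + popcount(b) + popcount(c) with two popcounts, not three."""
    ones, twos = compress3(a, b, c)
    return popcount(ones) + 2 * popcount(twos)
```

For the slide's three words, `compress3(0b1100, 0b1010, 0b1100)` yields the 2^0 word `0b1010` and the 2^1 word `0b1100`. Only the total is recovered, not the individual counts, which is sufficient when the counts are summed anyway.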
Instruction versus Memory (2/3)
[Figure: elapsed wall time (seconds) versus rewiring probability for ALGO 3, ALGO 7[12 <- ALGO 10], and ALGO 7[12 <- ALGO 12].]
[Figure: speedup relative to ALGO 3 (%) versus rewiring probability for ALGO 7[12 <- ALGO 10] and ALGO 7[12 <- ALGO 12].]
Conclusion
The previous efficient algorithm, the Forward Algorithm, needs $O(m^{3/2})$ time and O(m) space.
To develop algorithms on random access machines, we put forward two arguments.
Based on these arguments, our algorithm has an $\Omega(\log^{1/2} m / \log^{(4)} m)$ speedup.
Though it may be slightly worse than the FFR Algorithm in the analysis of speedup, it performs better in practice.
Future Work
Some graph features may be more appropriate to analyze than degeneracy when the algorithm for computing the intersection of two given sets changes.
The same arguments on random access machines can be applied to many other algorithms.
Thanks for your attention!
Any Questions?