Jaccard filtering on two Network Datasets

Minghao Tian
Final Report for CSE5559
May 2, 2016
Minghao Tian (Final Report for CSE5559)
Jaccard filtering
May 2, 2016
Jaccard Index
Given an arbitrary graph $G$, let $N_u^G$ denote the set of immediate
neighbors of $u$ (i.e., the nodes connected to $u \in V(G)$ by edges in $E(G)$).
Given any edge $(u, v) \in E(G)$, the Jaccard index $\rho_{u,v}$ of this edge is
defined as
$$\rho_{u,v}(G) = \frac{|N_u^G \cap N_v^G|}{|N_u^G \cup N_v^G|}.$$
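As a minimal sketch of this definition, the Jaccard index of an edge can be computed directly from neighbor sets. The adjacency-dictionary representation, the function name `jaccard_index` and the toy graph are assumptions made for illustration; they do not come from the slides.

```python
def jaccard_index(adj, u, v):
    """Jaccard index of the edge (u, v): |N_u ∩ N_v| / |N_u ∪ N_v|.

    `adj` maps each node to the set of its immediate neighbors
    (hypothetical representation, undirected graph assumed).
    """
    n_u, n_v = adj[u], adj[v]
    union = n_u | n_v
    # For an actual edge the union is never empty, since v ∈ N_u and u ∈ N_v;
    # the guard only protects against calls on isolated node pairs.
    return len(n_u & n_v) / len(union) if union else 0.0


# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(jaccard_index(adj, 0, 1))  # edge inside the triangle
print(jaccard_index(adj, 2, 3))  # pendant edge, no common neighbor
```

Edges inside densely connected neighborhoods get a larger index than edges whose endpoints share no neighbor.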
τ-Jaccard filtering
Given a graph $\hat{G}$, for each edge $(u, v) \in E(\hat{G})$, we insert the edge $(u, v)$
into $E(\tilde{G}_\tau)$ if and only if $\rho_{u,v}(\hat{G}) \ge \tau$. That is, $V(\tilde{G}_\tau) = V(\hat{G})$ and
$$E(\tilde{G}_\tau) := \{(u, v) \in E(\hat{G}) \mid \rho_{u,v}(\hat{G}) \ge \tau\}.$$
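The filtering rule can be sketched in a few lines. This is an illustration under assumptions, not the implementation used for the experiments: the graph is stored as a symmetric adjacency dictionary, and the name `jaccard_filter` and the toy graph are hypothetical.

```python
def jaccard_filter(adj, tau):
    """Keep the vertex set; keep an edge only if its Jaccard index is >= tau.

    `adj` maps each node to the set of its neighbors and is assumed symmetric.
    """
    filtered = {u: set() for u in adj}  # same vertices as the input graph
    for u in adj:
        for v in adj[u]:
            common = len(adj[u] & adj[v])
            total = len(adj[u] | adj[v])  # never 0 here, since v is in adj[u]
            if common / total >= tau:
                filtered[u].add(v)
    return filtered


# Two triangles (0-1-2 and 3-4-5) joined by the bridge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
filtered = jaccard_filter(adj, tau=0.2)
print(filtered[2], filtered[3])  # the bridge has index 0 and is dropped
```

Because the Jaccard index is symmetric in u and v, the result stays symmetric whenever the input is.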
The conclusion I want to derive
Jaccard filtering may be “robust” to Erdős-Rényi type perturbation for
some datasets.
Erdős-Rényi type perturbation: each existing edge of G is deleted with
probability p, and each non-existing edge is inserted into G with probability q.
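A small sketch of such a perturbation follows. It assumes that every node pair is treated independently, that the graph is an undirected adjacency dictionary with sortable node labels, and that a quadratic loop over all pairs is acceptable; the name `er_perturb` and the `seed` parameter are hypothetical. For graphs of the size used later, the insertions would have to be sampled more cleverly than by enumerating all pairs.

```python
import itertools
import random


def er_perturb(adj, p, q, seed=None):
    """Delete each existing edge with probability p and insert each
    non-existing edge with probability q (pairs assumed independent)."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    out = {u: set() for u in nodes}
    for u, v in itertools.combinations(nodes, 2):
        if v in adj[u]:
            keep = rng.random() >= p  # existing edge survives with prob. 1 - p
        else:
            keep = rng.random() < q   # missing edge appears with prob. q
        if keep:
            out[u].add(v)
            out[v].add(u)
    return out


# Path 0-1-2 plus an isolated node 3.
adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
print(er_perturb(adj, p=0.2, q=0.01, seed=0))
```

Setting p = q = 0 returns the input graph unchanged, which is a convenient sanity check.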
Experiment Design
For a network $G$,
1. Apply Erdős-Rényi type perturbation to $G$ to obtain $\hat{G}$.
2. Apply τ-Jaccard filtering to $G$ and $\hat{G}$ to obtain $G_\tau$ and $\hat{G}_\tau$.
3. Uniformly sample $N$ points in $G$ and compute the $N \times N$ shortest-path
distance matrix $M_G$ induced by the whole graph $G$. Do the same for the
corresponding $N$ points in $\hat{G}$ to obtain $M_{\hat{G}}$. Also uniformly sample
$N$ points in $G_\tau$ and do the same for $G_\tau$ and $\hat{G}_\tau$, which gives
$M_{G_\tau}$ and $M_{\hat{G}_\tau}$.
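Step 3 can be sketched as one breadth-first search per sampled node, with distances measured in the whole graph rather than in the subgraph induced by the sample. The sketch assumes an unweighted, undirected adjacency dictionary and reports unreachable pairs as infinity; that convention and the function names are assumptions, since the slides do not say how disconnected pairs are handled.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sampled_distance_matrix(adj, sample):
    """N x N shortest-path matrix between sampled nodes, using all of `adj`."""
    inf = float("inf")  # assumed convention for unreachable pairs
    matrix = []
    for s in sample:
        dist = bfs_distances(adj, s)
        matrix.append([dist.get(t, inf) for t in sample])
    return matrix


# Path 0-1-2-3 plus an isolated node 4.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}, 4: set()}
sample = random.Random(0).sample(sorted(adj), 3)  # uniform sample, N = 3
print(sample, sampled_distance_matrix(adj, sample))
```

Reusing the same `sample` list on the perturbed graph yields the matrix for the corresponding points.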
The distance-matrix-to-rips-complexes Algorithm
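The algorithm shown on this slide is not preserved in the text. As a stand-in, here is a minimal sketch of the standard Vietoris-Rips construction from a distance matrix, which may differ from the author's procedure: every vertex enters at filtration value 0, and every higher simplex enters at the length of its longest edge. The function name `rips_simplices` and the cutoff at dimension 2 (enough for dimension-1 barcodes) are assumptions, and the code is pure Python rather than a call into Gudhi.

```python
from itertools import combinations


def rips_simplices(dist, max_dim=2):
    """Return (simplex, filtration value) pairs, sorted by filtration value.

    `dist` is a symmetric matrix given as a list of lists. Simplices whose
    longest edge is infinite are skipped.
    """
    n = len(dist)
    simplices = []
    for k in range(1, max_dim + 2):  # k = number of vertices in the simplex
        for simplex in combinations(range(n), k):
            # A simplex appears as soon as all of its edges are present.
            value = max((dist[a][b] for a, b in combinations(simplex, 2)),
                        default=0)
            if value != float("inf"):
                simplices.append((simplex, value))
    # Sort by value, then by dimension, so faces precede their cofaces.
    simplices.sort(key=lambda item: (item[1], len(item[0]), item[0]))
    return simplices


dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
for simplex, value in rips_simplices(dist):
    print(simplex, value)
```

Enumerating all triples is cubic in the number of sampled points, which is one reason to work with a sample of N points instead of the full graph.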
Experiment Design (cont.)
4. Use the distance-matrix-to-rips-complexes algorithm and Gudhi to
get the barcodes up to dimension 1: $B_G$, $B_{\hat{G}}$, $B_{G_\tau}$ and $B_{\hat{G}_\tau}$.
5. Compute the 1-Wasserstein distances $D_{w,1}(B^i_G, B^i_{\hat{G}})$,
$D_{w,1}(B^i_{G_\tau}, B^i_{\hat{G}_\tau})$ and the bottleneck distances $D_{w,\infty}(B^i_G, B^i_{\hat{G}})$,
$D_{w,\infty}(B^i_{G_\tau}, B^i_{\hat{G}_\tau})$, where $i \in \{0, 1\}$ indicates the dimension of the
topological features.
6. We expect $D_{w,1}(B^i_{G_\tau}, B^i_{\hat{G}_\tau})$ and $D_{w,\infty}(B^i_{G_\tau}, B^i_{\hat{G}_\tau})$ to be
significantly smaller than the corresponding distances between $G$ and $\hat{G}$.
Also, we do not want Jaccard filtering to kill too many edges. One way to
check this is to look at the persistence diagrams of these graphs: the
ones obtained after Jaccard filtering should still retain many topological
features.
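To make the two distances in step 5 concrete, here is a brute-force sketch for very small diagrams. It assumes the usual convention that points may be matched to the diagonal, uses the L∞ norm as ground metric (implementations differ on this choice), and tries every matching, so it is only usable on a handful of points. The function names are hypothetical, and this is not the routine used to produce the results in this report.

```python
from itertools import permutations


def _cost(a, b):
    """L-infinity cost of matching a to b; None stands for the diagonal."""
    if a is None and b is None:
        return 0.0
    if a is None:
        return (b[1] - b[0]) / 2  # b is sent to its diagonal projection
    if b is None:
        return (a[1] - a[0]) / 2
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def diagram_distances(diag_a, diag_b):
    """Return (1-Wasserstein, bottleneck) distance between two diagrams,
    each given as a list of (birth, death) pairs with finite deaths."""
    # Pad each side with diagonal slots so that any point may go unmatched.
    left = list(diag_a) + [None] * len(diag_b)
    right = list(diag_b) + [None] * len(diag_a)
    best_sum = best_max = float("inf")
    for perm in permutations(range(len(right))):
        costs = [_cost(left[i], right[j]) for i, j in enumerate(perm)]
        best_sum = min(best_sum, sum(costs))
        best_max = min(best_max, max(costs, default=0.0))
    return best_sum, best_max


diag_a = [(0, 4), (1, 2)]
diag_b = [(0, 5)]
print(diagram_distances(diag_a, diag_b))
```

In this example the best matching pairs (0, 4) with (0, 5) and sends the short bar (1, 2) to the diagonal. The 1-Wasserstein distance sums the costs of all matched pairs, while the bottleneck distance keeps only the largest one, so the former reacts to many small changes and the latter only to the worst one.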
Facebook combined¹ (Undirected, Nodes: 4039, Edges: 88234)
We choose p = 0.2, q = 0.01 and τ = 0.04.

Comparison   Distance       1      2      3      4      5
O vs. P      D_W,∞         26     26     26     26     26
O vs. P      D_W,1      19546  15055  15050  11274  15665
DJ vs. JAP   D_W,∞         23     22     21     23     23
DJ vs. JAP   D_W,1       8623   2369   1173   5511   7738

Table: Dimension-1 results for Facebook combined with p = 0.2, q = 0.01,
τ = 0.04 and N = 100. “O” denotes the original graph, “P” the perturbed
one, “DJ” the graph after direct Jaccard filtering, and “JAP” the graph
obtained by Jaccard filtering after perturbation.
¹ http://snap.stanford.edu/data/egonets-Facebook.html
Facebook combined (Undirected, Nodes: 4039, Edges: 88234) (cont.)
[Two persistence diagrams, dim = 1; axes: Birth Time, Death Time. Panel titles: barcode_N_1000_original_facebook (left), barcode_N_1000_perturb_facebook_p_0.2_q_0.01 (right).]
Figure: Left: the persistence diagram of the Facebook combined dataset.
Right: the one after Erdős-Rényi type perturbation.
Facebook combined (Undirected, Nodes: 4039, Edges: 88234) (cont.)
[Two persistence diagrams, dim = 1; axes: Birth Time, Death Time. Panel titles: barcode_N_1000_Jaccard_filtering_directly_facebook_J_0.04 (left), barcode_N_1000_Jaccard_filtering_after_perturb_facebook_p_0.2_q_0.01_J_0.04 (right).]
Figure: Left: the diagram obtained by Jaccard filtering directly. Right: the
one obtained by Jaccard filtering after perturbation.
Twitter combined² (Directed, Nodes: 81306, Edges: 1768149)
We view this directed graph as an undirected one and choose p = 0.2,
q = 0.005 and τ = 0.01.

Comparison   Distance       1      2      3      4      5
O vs. P      D_W,∞         26     25     26     26     25
O vs. P      D_W,1      16641   5818  17013  15195   6769
DJ vs. JAP   D_W,∞         26     25     25     25     25
DJ vs. JAP   D_W,1      11133  15704  14030  14160   4588

Table: Dimension-1 results for Twitter combined with p = 0.2, q = 0.005,
τ = 0.01 and N = 100.
² http://snap.stanford.edu/data/egonets-Twitter.html
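One simple way to read the two tables side by side is to average the five columns of D_W,1 for each comparison. The snippet below does only that arithmetic on the values copied from the Facebook and Twitter tables; treating the five columns as comparable and averaging them is an assumption of this sketch, not something the slides prescribe.

```python
from statistics import mean

# D_W,1 values copied from the two tables (columns 1 to 5).
facebook = {
    "O vs. P": [19546, 15055, 15050, 11274, 15665],
    "DJ vs. JAP": [8623, 2369, 1173, 5511, 7738],
}
twitter = {
    "O vs. P": [16641, 5818, 17013, 15195, 6769],
    "DJ vs. JAP": [11133, 15704, 14030, 14160, 4588],
}

for name, table in (("Facebook", facebook), ("Twitter", twitter)):
    before = mean(table["O vs. P"])
    after = mean(table["DJ vs. JAP"])
    print(f"{name}: mean D_W,1 {before:.1f} -> {after:.1f} "
          f"(ratio {after / before:.2f})")
```

With only five columns per row and a wide spread within each row, the averages are a rough summary rather than a statistical test.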
Twitter combined (Directed, Nodes: 81306, Edges: 1768149) (cont.)
[Two persistence diagrams, dim = 1; axes: Birth Time, Death Time. Panel titles: barcode_N_1000_original_twitter_combined (left), barcode_N_1000_perturb_twitter_combined_p_0.2_q_0.005 (right).]
Figure: Left: the persistence diagram of the Twitter combined dataset.
Right: the one after Erdős-Rényi type perturbation.
Twitter combined (Directed, Nodes: 81306, Edges: 1768149) (cont.)
[Two persistence diagrams, dim = 1; axes: Birth Time, Death Time. Panel titles: barcode_N_1000_Jaccard_filtering_directly_twitter_combined_J_0.01 (left), barcode_N_1000_Jaccard_filtering_after_perturb_twitter_combined_p_0.2_q_0.005_J_0.01 (right).]
Figure: Left: the diagram obtained by Jaccard filtering directly. Right: the
one obtained by Jaccard filtering after perturbation.
The End