Jaccard filtering on two Network Datasets
Minghao Tian
Final Report for CSE5559
May 2, 2016
Minghao Tian (Final Report for CSE5559)
Jaccard filtering
May 2, 2016
1 / 15
Jaccard Index
Given an arbitrary graph $G$, let $N_u^G$ denote the set of immediate
neighbors of $u$ (i.e., nodes connected to $u \in V(G)$ by edges in $E(G)$).
Given any edge $(u, v) \in E(G)$, the Jaccard index $\rho_{u,v}$ of this edge is
defined as
\[
\rho_{u,v}(G) = \frac{|N_u^G \cap N_v^G|}{|N_u^G \cup N_v^G|}.
\]
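As a quick worked example on a hypothetical graph (not one of the datasets in this report), suppose the edge $(u, v)$ has neighbor sets $N_u^G = \{v, a, b\}$ and $N_v^G = \{u, b, c\}$. Then the ratio of the two set sizes is:

```latex
\[
\rho_{u,v}(G)
  = \frac{|\{v, a, b\} \cap \{u, b, c\}|}{|\{v, a, b\} \cup \{u, b, c\}|}
  = \frac{|\{b\}|}{|\{u, v, a, b, c\}|}
  = \frac{1}{5}.
\]
```

Because $u$ and $v$ are adjacent, each lies in the other's neighbor set, so both are counted in the union but not in the intersection.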
τ -Jaccard filtering
Given graph $\hat{G}$, for each edge $(u, v) \in E(\hat{G})$, we insert the edge $(u, v)$
into $E(\tilde{G}_\tau)$ if and only if $\rho_{u,v}(\hat{G}) \ge \tau$. That is,
$V(\tilde{G}_\tau) = V(\hat{G})$ and
\[
E(\tilde{G}_\tau) := \{(u, v) \in E(\hat{G}) \mid \rho_{u,v}(\hat{G}) \ge \tau\}.
\]
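The filtering rule can be sketched in a few lines of Python. This is a minimal illustration, assuming an undirected simple graph given as a list of edges; the function names `neighbor_sets`, `jaccard_index` and `jaccard_filter` and the toy graph are hypothetical and not taken from the report.

```python
from collections import defaultdict


def neighbor_sets(edges):
    """Build the neighbor set N_u of every node of an undirected graph."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs


def jaccard_index(nbrs, u, v):
    """rho_{u,v} = |N_u & N_v| / |N_u | N_v| for an edge (u, v)."""
    # The union is never empty for an edge: u is in N_v and v is in N_u.
    return len(nbrs[u] & nbrs[v]) / len(nbrs[u] | nbrs[v])


def jaccard_filter(edges, tau):
    """Keep exactly the edges whose Jaccard index is at least tau."""
    nbrs = neighbor_sets(edges)
    return [(u, v) for u, v in edges if jaccard_index(nbrs, u, v) >= tau]


# Toy graph (an assumption for illustration): triangle a-b-c plus pendant edge c-d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
filtered = jaccard_filter(toy_edges, tau=0.3)
```

In the toy graph the edge a-b has index 1/3, the edges b-c and a-c have index 1/4, and the pendant edge c-d has index 0, so only a-b survives at the threshold 0.3. The vertex set is left unchanged, matching the definition above.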
The conclusion I want to derive
Jaccard-filtering may be “robust” to Erdős-Rényi type perturbation for
some datasets.
Erdős-Rényi type perturbation: with probability p we delete existing
edges of G, and with probability q we insert non-existing edges into G.
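A minimal sketch of this perturbation, assuming that every existing edge is deleted independently with probability p and every absent node pair is inserted independently with probability q (the independence and the function name `er_perturb` are assumptions, not stated in the report):

```python
import random
from itertools import combinations


def er_perturb(nodes, edges, p, q, seed=None):
    """Delete each existing edge w.p. p; insert each absent pair w.p. q."""
    rng = random.Random(seed)
    existing = {frozenset(e) for e in edges}
    perturbed = set()
    for u, v in combinations(sorted(nodes), 2):
        pair = frozenset((u, v))
        if pair in existing:
            if rng.random() >= p:  # edge survives with probability 1 - p
                perturbed.add(pair)
        elif rng.random() < q:     # absent pair is inserted with probability q
            perturbed.add(pair)
    return sorted(tuple(sorted(pair)) for pair in perturbed)


# Toy example (hypothetical graph): a path 0-1-2 plus an isolated node 3.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2)]
perturbed = er_perturb(nodes, edges, p=0.2, q=0.01, seed=0)
```

The loop visits all node pairs, which is quadratic in the number of nodes. That is fine for a toy graph, but for a graph with tens of thousands of nodes one would more likely draw the number of inserted pairs first and then sample them.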
Experiment Design
For a network $G$,
1. Apply Erdős-Rényi type perturbation to $G$ and derive $\hat{G}$.
2. Apply τ-Jaccard filtering to $G$ and $\hat{G}$ and derive $G_\tau$ and $\hat{G}_\tau$.
3. Uniformly sample $N$ points in $G$ and get the $N \times N$ shortest-path
   distance matrix $M_G$ induced by the whole graph $G$. Do the same
   thing to the corresponding $N$ points in $\hat{G}$ and derive $M_{\hat{G}}$. Also
   uniformly sample $N$ points in $G_\tau$ and do the same to $G_\tau$
   and $\hat{G}_\tau$, thus obtaining $M_{G_\tau}$ and $M_{\hat{G}_\tau}$.
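Step 3 can be sketched as follows, assuming unweighted edges (so the shortest-path distance is the hop count found by breadth-first search) and assuming that unreachable pairs get distance infinity. Neither assumption is stated in the report, and the names `bfs_distances` and `sampled_distance_matrix` are hypothetical.

```python
import math
import random
from collections import defaultdict, deque


def bfs_distances(nbrs, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in nbrs[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def sampled_distance_matrix(edges, sample):
    """N x N matrix of distances between sampled nodes; paths may use any node."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    matrix = []
    for s in sample:
        dist = bfs_distances(nbrs, s)
        matrix.append([dist.get(t, math.inf) for t in sample])
    return matrix


# Toy example (hypothetical graph): a path 0-1-2-3 plus a separate edge 4-5.
edges = [(0, 1), (1, 2), (2, 3), (4, 5)]
nodes = sorted({x for e in edges for x in e})
sample = random.Random(0).sample(nodes, 3)  # uniform sample of N = 3 nodes
M = sampled_distance_matrix(edges, sample)
```

Passing the same `sample` to the perturbed edge list gives the matrix for the corresponding points, which is what makes the two matrices comparable entry by entry.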
The distance-matrix-to-rips-complexes Algorithm
Experiment Design (conti.)
4. Use the distance-matrix-to-rips-complexes algorithm and Gudhi to
   get the barcodes up to dimension 1: $B_G$, $B_{\hat{G}}$, $B_{G_\tau}$ and $B_{\hat{G}_\tau}$.
5. Compute the 1-Wasserstein distances $D_{w,1}(B_G^i, B_{\hat{G}}^i)$ and
   $D_{w,1}(B_{G_\tau}^i, B_{\hat{G}_\tau}^i)$, and the bottleneck distances
   $D_{w,\infty}(B_G^i, B_{\hat{G}}^i)$ and $D_{w,\infty}(B_{G_\tau}^i, B_{\hat{G}_\tau}^i)$,
   where $i \in \{0, 1\}$ indicates the dimension of the topological features.
6. Expect to see that $D_{w,1}(B_{G_\tau}^i, B_{\hat{G}_\tau}^i)$ and
   $D_{w,\infty}(B_{G_\tau}^i, B_{\hat{G}_\tau}^i)$ are significantly smaller than
   the corresponding distances between $G$ and $\hat{G}$.
Also, we do not want Jaccard-filtering to kill too many edges. One way to
check this is to look at the persistence diagrams of these graphs: the
ones after Jaccard-filtering should still retain many topological
features.
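To illustrate what a barcode computed from a distance matrix records, here is a dependency-free sketch of the dimension-0 part only. It is not the distance-matrix-to-rips-complexes algorithm of this report and does not use Gudhi; it assumes the convention that an edge enters the Rips filtration at its length, and the function name `h0_barcode` is hypothetical. Dimension 1 needs a full persistence computation, for which a library such as Gudhi is the practical choice.

```python
import math


def h0_barcode(dist):
    """Dimension-0 Rips barcode of a finite metric space given as a matrix.

    Every point is born at 0. A bar dies at the length of the edge that
    merges its component into another one (Kruskal-style single linkage).
    """
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = sorted(
        (dist[i][j], i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if dist[i][j] < math.inf
    )
    bars = []
    for d, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:           # this edge joins two components
            parent[ri] = rj
            bars.append((0.0, float(d)))
    # Components that never merge give bars that never die.
    bars.extend([(0.0, math.inf)] * (n - len(bars)))
    return bars


# Toy matrix (hypothetical): two points at distance 3 and one unreachable point.
toy = [[0, 3, math.inf], [3, 0, math.inf], [math.inf, math.inf, 0]]
bars = h0_barcode(toy)
```

On the toy matrix this gives one finite bar dying at 3 and two infinite bars, one per connected component.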
Facebook combined¹ (Undirected, Nodes: 4039, Edges: 88234)
We choose p = 0.2, q = 0.01 and τ = 0.04.

Comparison   Distance   1       2       3       4       5
O vs. P      D_{W,∞}    26      26      26      26      26
O vs. P      D_{W,1}    19546   15055   15050   11274   15665
DJ vs. JAP   D_{W,∞}    23      22      21      23      23
DJ vs. JAP   D_{W,1}    8623    2369    1173    5511    7738

Table: 1-dimension Facebook combined with p = 0.2, q = 0.01, τ = 0.04
and N = 100. “O” represents the original graph, “P” represents the perturbed
one, “DJ” represents the one after directly Jaccard-filtering and “JAP”
represents the one doing Jaccard-filtering after perturbation.
¹ http://snap.stanford.edu/data/egonets-Facebook.html
Facebook combined (Undirected, Nodes: 4039, Edges: 88234) (conti.)
[Two persistence diagrams, dim = 1, axes: Birth Time and Death Time. Panel titles: barcode_N_1000_original_facebook (left), barcode_N_1000_perturb_facebook_p_0.2_q_0.01 (right).]
Figure: Left: The persistence diagram of the Facebook combined dataset;
Right: The one after Erdős-Rényi type perturbation.
Facebook combined (Undirected, Nodes: 4039, Edges: 88234) (conti.)
[Two persistence diagrams, dim = 1, axes: Birth Time and Death Time. Panel titles: barcode_N_1000_Jaccard_filtering_directly_facebook_J_0.04 (left), barcode_N_1000_Jaccard_filtering_after_perturb_facebook_p_0.2_q_0.01_J_0.04 (right).]
Figure: Left: The one using Jaccard-filtering directly; Right: The one using
Jaccard-filtering after perturbation.
Twitter combined² (Directed, Nodes: 81306, Edges: 1768149)
We view this directed graph as an undirected one and choose p = 0.2,
q = 0.005 and τ = 0.01.

Comparison   Distance   1       2       3       4       5
O vs. P      D_{W,∞}    26      25      26      26      25
O vs. P      D_{W,1}    16641   5818    17013   15195   6769
DJ vs. JAP   D_{W,∞}    26      25      25      25      25
DJ vs. JAP   D_{W,1}    11133   15704   14030   14160   4588

Table: 1-dimension Twitter combined with p = 0.2, q = 0.005, τ = 0.01 and
N = 100
² http://snap.stanford.edu/data/egonets-Twitter.html
Twitter combined³ (Directed, Nodes: 81306, Edges: 1768149) (conti.)
[Two persistence diagrams, dim = 1, axes: Birth Time and Death Time. Panel titles: barcode_N_1000_original_twitter_combined (left), barcode_N_1000_perturb_twitter_combined_p_0.2_q_0.005 (right).]
Figure: Left: The persistence diagram of the Twitter combined dataset; Right:
The one after Erdős-Rényi type perturbation.
³ http://snap.stanford.edu/data/egonets-Twitter.html
Twitter combined⁴ (Directed, Nodes: 81306, Edges: 1768149) (conti.)
[Two persistence diagrams, dim = 1, axes: Birth Time and Death Time. Panel titles: barcode_N_1000_Jaccard_filtering_directly_twitter_combined_J_0.01 (left), barcode_N_1000_Jaccard_filtering_after_perturb_twitter_combined_p_0.2_q_0.005_J_0.01 (right).]
Figure: Left: The one using Jaccard-filtering directly; Right: The one using
Jaccard-filtering after perturbation.
⁴ http://snap.stanford.edu/data/egonets-Twitter.html
The End