Robust Local Community Detection: On Free Rider Effect and Its Elimination
Yubao Wu (1), Ruoming Jin (2), Jing Li (1), Xiang Zhang (1)
(1) Case Western Reserve University
(2) Kent State University
Generic Local Community Detection Problem
Input:
a) Graph $G(V, E)$
b) A set of query nodes $Q$
c) A goodness metric $f(S)$
Output: Subgraph $G[S]$ such that:
1) $S$ contains $Q$ ($Q \subseteq S$)
2) $f(S)$ is maximized
[1] M. Sozio, et al. KDD'10.
[2] W. Cui, et al. SIGMOD'14.
[3] L. Ma, et al. DaWak'13.
[4] B. Saha, et al. RECOMB'10.
[5] C. Tsourakakis, et al. SIGMOD'14.
[6] A. Clauset, PRE'05.
[7] F. Luo, et al. WIAS'08.
[8] R. Andersen, et al. FOCS'06.
Community Goodness Metrics

Intuition                                   Goodness metric        Ref.    Formula $f(S)$
Internal denseness                          Classic density        [1]     $e(S) / |S|$
                                            Edge-surplus           [2]     $e(S) - \alpha h(|S|)$, concave $h(x)$
                                            Minimum degree         [3,4]   $\min_{u \in S} w_S(u)$
Internal denseness & external sparseness    Subgraph modularity    [5]     $e(S) / e(S, \bar{S})$
                                            Density-isolation      [6]     $e(S) - \alpha e(S, \bar{S}) - \beta |S|$
                                            External conductance   [7]     $e(S, \bar{S}) / \min\{\phi(S), \phi(\bar{S})\}$
Boundary sharpness                          Local modularity       [8]     $e(\delta S, S) / e(\delta S, V)$

For edge-surplus, $h(x) = \binom{x}{2}$.

[1] B. Saha, et al. RECOMB'10.
[2] C. Tsourakakis, et al. SIGMOD'14.
[3] M. Sozio, et al. KDD'10.
[4] W. Cui, et al. SIGMOD'14.
[5] F. Luo, et al. WIAS'08.
[6] K. J. Lang, CIKM'07.
[7] R. Andersen, et al. FOCS'06.
[8] A. Clauset, PRE'05.
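As a rough illustration of how some of the formulas above can be evaluated, here is a minimal Python sketch. The adjacency-map representation, the function names, and the toy graph are assumptions made for this example. The slide does not define $w_S(u)$; the sketch assumes it is the degree of $u$ inside $G[S]$.

```python
def make_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def internal_edges(adj, S):
    """e(S): number of edges with both endpoints in S."""
    return sum(len(adj[u] & S) for u in S) // 2

def cut_edges(adj, S):
    """e(S, S-bar): number of edges with exactly one endpoint in S."""
    return sum(len(adj[u] - S) for u in S)

def classic_density(adj, S):
    return internal_edges(adj, S) / len(S)

def edge_surplus(adj, S, alpha):
    # Uses h(x) = x * (x - 1) / 2, i.e. "x choose 2".
    n = len(S)
    return internal_edges(adj, S) - alpha * n * (n - 1) / 2

def minimum_degree(adj, S):
    # Assumption: w_S(u) is the degree of u within G[S].
    return min(len(adj[u] & S) for u in S)

def subgraph_modularity(adj, S):
    # Undefined (division by zero) when S has no outgoing edges.
    return internal_edges(adj, S) / cut_edges(adj, S)

def density_isolation(adj, S, alpha, beta):
    return internal_edges(adj, S) - alpha * cut_edges(adj, S) - beta * len(S)

# Toy graph: a 4-clique {0, 1, 2, 3} with one pendant node 4 attached to node 3.
toy = make_graph([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)])
S = {0, 1, 2, 3}

print(classic_density(toy, S))          # 6 edges / 4 nodes = 1.5
print(edge_surplus(toy, S, 0.5))        # 6 - 0.5 * 6 = 3.0
print(minimum_degree(toy, S))           # 3
print(subgraph_modularity(toy, S))      # 6 internal / 1 outgoing = 6.0
print(density_isolation(toy, S, 1, 1))  # 6 - 1 - 4 = 1
```

External conductance and local modularity are left out because the slide does not spell out $\phi$ or $\delta S$.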
Free Rider Effect

Goodness metric         $A$     $A \cup B$   $A \cup C$
Classic density         2.50    2.95         2.83
Edge-surplus            15.3    26.5         22.8
Minimum degree          4       4            4
Subgraph modularity     2.0     3.6          4.6
Density-isolation       -2.6    3.8          1.5
Ext. conductance        0.25    0.14         0.11
Local modularity        0.63    0.70         0.78
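The effect in the table can be reproduced on a small synthetic example. The sketch below is a toy construction, not the graph from the slide: a query community A (a 4-clique) is joined by a single edge to a denser, unrelated group B (a 6-clique). All node names are hypothetical.

```python
from itertools import combinations

def make_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def classic_density(adj, S):
    """e(S) / |S| for a node set S."""
    e = sum(len(adj[u] & S) for u in S) // 2
    return e / len(S)

A = {"a0", "a1", "a2", "a3"}              # community of the query node a0
B = {"b0", "b1", "b2", "b3", "b4", "b5"}  # denser group, unrelated to a0
edges = (
    list(combinations(sorted(A), 2))      # A is a 4-clique
    + list(combinations(sorted(B), 2))    # B is a 6-clique
    + [("a3", "b0")]                      # single bridge between A and B
)
adj = make_graph(edges)

density_A = classic_density(adj, A)       # 6 edges / 4 nodes
density_AB = classic_density(adj, A | B)  # 22 edges / 10 nodes
print(density_A, density_AB)
```

On this toy graph the union scores higher than A alone, so maximizing classic density pulls in B even though B has little to do with the query node.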
Free Rider Effect in Real Networks
(a) Co-author network
(b) Biological network
One existing method: classic density
Barna Saha, et al. Dense subgraphs with restrictions and
applications to gene annotation graphs. RECOMB, 2010.
Query Biased Node Weighting
Node weight: $\pi(u) = \frac{1}{r(u)}$
$r(u)$: proximity value w.r.t. the query
Query biased density: $\rho(S) = \frac{e(S)}{\pi(S)}$
$\pi(S) = \sum_{u \in S} \pi(u)$: sum of node weights
Subgraph A becomes the query biased densest subgraph
Yubao Wu, Ruoming Jin, Jing Li, and Xiang Zhang. Robust local
community detection: on free rider effect and its elimination.
PVLDB, 8(7):798-809, 2015.
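A minimal sketch of the weighting scheme, under a loud assumption: the slide does not say how the proximity $r(u)$ is computed, so the code uses a stand-in, $r(u) = 1 / (1 + \text{hops from the query})$, purely for illustration. The toy graph (a 4-clique A holding the query, bridged to a 6-clique B) and all names are hypothetical, and the graph is assumed to be connected.

```python
from collections import deque
from itertools import combinations

def make_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def hop_distances(adj, query):
    """Multi-source BFS distance from the query nodes."""
    dist = {q: 0 for q in query}
    queue = deque(query)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def node_weights(adj, query):
    """pi(u) = 1 / r(u), with the stand-in proximity r(u) = 1 / (1 + hops)."""
    dist = hop_distances(adj, query)
    return {u: 1.0 / (1.0 / (1 + d)) for u, d in dist.items()}

def query_biased_density(adj, pi, S):
    """rho(S) = e(S) / pi(S)."""
    e = sum(len(adj[u] & S) for u in S) // 2
    return e / sum(pi[u] for u in S)

A = {"a0", "a1", "a2", "a3"}
B = {"b0", "b1", "b2", "b3", "b4", "b5"}
adj = make_graph(
    list(combinations(sorted(A), 2))
    + list(combinations(sorted(B), 2))
    + [("a3", "b0")]
)
pi = node_weights(adj, {"a0"})

rho_A = query_biased_density(adj, pi, A)       # 6 / 7
rho_AB = query_biased_density(adj, pi, A | B)  # 22 / 30
print(rho_A, rho_AB)
```

Nodes far from the query receive large weights, so adding B raises the denominator faster than the edge count, and A scores higher than the union of A and B on this toy graph.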
QDC Problem
Query biased densest connected subgraph (QDC) problem:
Input:
a) Graph $G(V, E)$
b) A set of query nodes $Q$
Output: Subgraph $G[S]$ such that:
1) $S$ contains $Q$ ($Q \subseteq S$)
2) Query biased density $\rho(S)$ is maximized
3) $G[S]$ is connected
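To make the three output conditions concrete, the following sketch solves the QDC problem by exhaustive enumeration on a tiny graph. It is exponential in the number of nodes and is meant only to illustrate the definition, not as a practical method. The node weights are hard-coded hypothetical values (small near the query node a0, larger farther away), and all names are assumptions of this example.

```python
from itertools import combinations

def make_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def is_connected(adj, S):
    """Check whether the induced subgraph G[S] is connected (S non-empty)."""
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u] & S:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(S)

def rho(adj, pi, S):
    """Query biased density e(S) / pi(S)."""
    e = sum(len(adj[u] & S) for u in S) // 2
    return e / sum(pi[u] for u in S)

def brute_force_qdc(adj, pi, query):
    """Try every S that contains the query and keep the best connected one."""
    query = set(query)
    others = sorted(set(adj) - query)
    best, best_rho = None, float("-inf")
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            S = query | set(extra)
            if is_connected(adj, S):
                r = rho(adj, pi, S)
                if r > best_rho:
                    best, best_rho = S, r
    return best, best_rho

A = ["a0", "a1", "a2", "a3"]
B = ["b0", "b1", "b2", "b3", "b4", "b5"]
adj = make_graph(
    list(combinations(A, 2)) + list(combinations(B, 2)) + [("a3", "b0")]
)
# Hypothetical node weights pi(u) = 1 / r(u).
pi = {"a0": 1.0, "a1": 2.0, "a2": 2.0, "a3": 2.0, "b0": 3.0}
pi.update({b: 4.0 for b in B[1:]})

best, best_rho = brute_force_qdc(adj, pi, {"a0"})
print(sorted(best), best_rho)
```

With these weights the enumeration returns the 4-clique A, with query biased density 6/7.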
QDC Problem and Two Related Problems

                 QDC                           QDC'                          QDC''
Input            1) $G(V, E)$                  1) $G(V, E)$                  $G(V, E)$
                 2) query $Q$                  2) query $Q$
Output $G[S]$    1) $S$ contains $Q$           1) $S$ contains $Q$           $\rho(S)$ is maximized
                 2) $\rho(S)$ is maximized     2) $\rho(S)$ is maximized
                 3) $G[S]$ is connected
Complexity       NP-hard                       Polynomial                    Polynomial

A QDC' solution is optimal for QDC if $G[S]$ is connected.
A QDC'' solution is optimal for QDC' if $S$ contains $Q$.
Finding the QDC''
Finding the QDC'
1. Removing Low Degree Nodes
• Reduce the search space
• Retain the densest subgraph
Subgraph contraction
2. Detect the Densest Subgraph
• On the reduced search space
Finding the QDC
Greedy Node Deletion
1) Delete low degree nodes
2) Maintain the connectivity
Local Expansion
1) Connect the query nodes with a Steiner tree
2) Greedy local expansion
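The greedy node deletion heuristic listed above might look roughly like the sketch below. It is not the authors' implementation. Two details are assumptions of this example: "low degree" is interpreted as a low ratio of current degree to node weight, and connectivity is maintained by skipping any node whose removal would disconnect the remaining subgraph. The node weights and all names are hypothetical, and the input graph is assumed to be connected.

```python
from itertools import combinations

def make_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def is_connected(adj, S):
    """Check whether the induced subgraph G[S] is connected (S non-empty)."""
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u] & S:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(S)

def rho(adj, pi, S):
    """Query biased density e(S) / pi(S)."""
    e = sum(len(adj[u] & S) for u in S) // 2
    return e / sum(pi[u] for u in S)

def greedy_node_deletion(adj, pi, query):
    """Peel nodes one at a time and remember the best subgraph seen."""
    query = set(query)
    S = set(adj)
    best, best_rho = set(S), rho(adj, pi, S)
    while True:
        # Non-query nodes, lowest degree-to-weight ratio first (assumption).
        candidates = sorted(
            (u for u in S if u not in query),
            key=lambda u: (len(adj[u] & S) / pi[u], str(u)),
        )
        for u in candidates:
            if is_connected(adj, S - {u}):  # keep G[S] connected
                S.remove(u)
                break
        else:
            break                           # nothing can be removed anymore
        r = rho(adj, pi, S)
        if r > best_rho:
            best, best_rho = set(S), r
    return best, best_rho

A = ["a0", "a1", "a2", "a3"]
B = ["b0", "b1", "b2", "b3", "b4", "b5"]
adj = make_graph(
    list(combinations(A, 2)) + list(combinations(B, 2)) + [("a3", "b0")]
)
# Hypothetical node weights pi(u) = 1 / r(u): small near the query node a0.
pi = {"a0": 1.0, "a1": 2.0, "a2": 2.0, "a3": 2.0, "b0": 3.0}
pi.update({b: 4.0 for b in B[1:]})

best, best_rho = greedy_node_deletion(adj, pi, {"a0"})
print(sorted(best), best_rho)
```

On this toy input the heuristic first peels away the distant 6-clique B and then records the 4-clique A (density 6/7) as the best subgraph before continuing to shrink it. Re-checking connectivity from scratch for every candidate is simple but slow; it is used here only to keep the sketch short.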
Experiments: Datasets

Dataset        # Nodes       # Edges          # Communities
Amazon         334,863       925,872          151,037
DBLP           317,080       1,049,866        13,477
Youtube        1,134,890     2,987,624        8,385
Orkut          3,072,441     117,185,083      6,288,363
LiveJournal    3,997,962     34,681,189       287,512
Friendster     65,608,366    1,806,067,135    957,154
[1] J. Yang and J. Leskovec. Defining and evaluating network
communities based on ground-truth. In ICDM, 2012.
[2] snap.stanford.edu
Experiments: State-of-the-Art Methods

Class                                       Abbr.   Ref.   Key idea
Internal denseness                          DS      [1]    Densest subgraph with query constraint
                                            OQC     [2]    Optimal quasi-clique; edge-surplus
                                            MDG     [3]    Minimum degree
Internal denseness & external sparseness    PRN     [4]    External conductance
                                            LS      [5]    Local spectral
                                            EMC     [6]    More internal edges than external edges
                                            SM      [7]    Subgraph modularity
Boundary sharpness                          LM      [8]    Local modularity

[1] B. Saha, et al. RECOMB'10.
[2] C. Tsourakakis, et al. SIGMOD'14.
[3] M. Sozio, et al. KDD'10.
[4] R. Andersen, et al. FOCS'06.
[5] M. W. Mahoney, et al. JMLR'12.
[6] G. W. Flake, KDD'00.
[7] F. Luo, et al. WIAS'08.
[8] A. Clauset, PRE'05.
Experiments: Effectiveness Evaluation Metrics

Metric                         Formula
F-score                        $F(S, T) = 2 \cdot \frac{\mathrm{precision}(S, T) \cdot \mathrm{recall}(S, T)}{\mathrm{precision}(S, T) + \mathrm{recall}(S, T)}$
Community goodness metrics:
  Density                      $\frac{e(S)}{|S|}$
  Cohesiveness                 $\min_{S' \subset S} \frac{e(S', S \setminus S')}{\min\{\phi_S(S'), \phi_S(S \setminus S')\}}$
  Separability                 $\frac{e(S)}{e(S, \bar{S})}$
Consistency                    $1 - \frac{1}{\binom{|S|}{|Q|}} \sum_{Q' \subseteq S, |Q'| = |Q|} \left( F(S, S') - F_{\mathrm{mean}} \right)^2$

[1] J. Yang and J. Leskovec. Defining and evaluating network communities
based on ground-truth. In ICDM, pages 745-754, 2012.
[2] Ma, Lianhang, et al. GMAC: A seed-insensitive approach to local
community detection. In DaWak, pages 297-308, 2013.
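A short sketch of the F-score computation for node sets. The slide does not define precision and recall; the code assumes the usual definitions, with $S$ the detected community and $T$ the ground-truth community. The function names and example sets are hypothetical.

```python
def precision(S, T):
    """Fraction of detected nodes that belong to the ground truth."""
    return len(S & T) / len(S)

def recall(S, T):
    """Fraction of ground-truth nodes that were detected."""
    return len(S & T) / len(T)

def f_score(S, T):
    """Harmonic mean of precision and recall; 0.0 if the sets are disjoint."""
    p, r = precision(S, T), recall(S, T)
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)

detected = {1, 2, 3, 4}
truth = {3, 4, 5, 6, 7, 8}

# Overlap is {3, 4}: precision = 2/4, recall = 2/6, F-score = 0.4.
score = f_score(detected, truth)
print(round(score, 3))
```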
Effectiveness Evaluation: F-Score

F-score           QDC    DS     OQC    MDG    PRN    LS     EMC    SM     LM
Amazon            0.83   0.52   0.54   0.46   0.69   0.66   0.61   0.60   0.58
DBLP              0.46   0.31   0.33   0.32   0.48   0.42   0.34   0.36   0.37
Youtube           0.43   0.23   0.22   0.17   0.26   0.24   0.21   0.21   0.22
Orkut             0.47   0.15   0.16   0.13   0.21   0.17   0.19   0.16   0.18
LiveJournal       0.64   0.48   0.47   0.40   0.52   0.51   0.47   0.48   0.49
Friendster        0.32   --     0.14   0.12   0.17   0.16   --     0.14   0.13
Avg. F-score      0.53   0.30   0.31   0.27   0.39   0.36   0.33   0.33   0.33
Avg. Precision    0.65   0.46   0.45   0.29   0.51   0.41   0.34   0.38   0.48
Avg. Recall       0.78   0.61   0.58   0.69   0.67   0.64   0.66   0.63   0.59
Effectiveness Evaluation: Goodness Metrics
[Figure: community goodness metrics on the LiveJournal graph]
Effectiveness Evaluation: Consistency

Consistency    QDC    DS     OQC    MDG    PRN    LS     EMC    SM     LM
Amazon         0.94   0.77   0.76   0.58   0.79   0.69   0.74   0.67   0.61
DBLP           0.88   0.62   0.64   0.37   0.65   0.53   0.56   0.43   0.56
Youtube        0.85   0.61   0.54   0.46   0.71   0.41   0.57   0.37   0.36
Orkut          0.83   0.56   0.52   0.32   0.68   0.43   0.51   0.54   0.47
LiveJournal    0.93   0.74   0.67   0.43   0.84   0.64   0.73   0.58   0.52
Friendster     0.78   --     0.56   0.45   0.65   0.49   --     0.32   0.39
Average        0.87   0.64   0.62   0.44   0.72   0.53   0.61   0.49   0.49
Conclusions
1) The free rider effect is a serious problem for existing goodness metrics;
2) The query biased node weighting scheme effectively eliminates
the free rider effect and thus improves accuracy.