Lecture 6-1
Community Detection
Weili Wu Ding-Zhu Du
University of Texas at Dallas
Outline
 Community Structure
 Connection-Based Detection
 LP-formulation
Community
• People in the same community share common interests in clothes, music, beliefs, movies, food, etc.
• They influence each other strongly.
Community Structure
Community without overlap
Community with overlap
* same color, same community
Community Structure
In the same community,
• Any two nodes can reach each other within three steps.
• A few closely tied key persons: C, D
• Member A reaches member B via A-C-D-B
Community Structure
For different communities,
• Two nodes may be more than three steps apart.
Community Structure
For two overlapping communities,
• Two nodes can reach each other in at most six steps.
[Figure: nodes labeled A, C, and B]
Question?
How do we find a community?
The definition is ambiguous.
So we can only do model-based detection.
Model-Based Detection
Community Detection
Accurate or not?
Formulation (Model)
Solve formulated problem
Model-Based Physics
The Real World
Accurate or not?
Newton Model
Solve physics problem
No Satisfactory Community Model!
Question?
How do we find a community?
The simplest way is
• Connection-Based Detection
Outline
 Community Structure
 Connection-Based Detection
 LP-formulation
Basic Fact
• More connections inside each community.
• Fewer connections between different communities.
• There are several ways to understand this property.
Connection-Based Condition 1
Consider a graph $G = (V, E)$ with adjacency matrix $(a_{ij})$.
A node subset $U$ is a community in the weak sense if
$$L(U, U) > L(U, \bar{U}),$$
where $\bar{U} = V \setminus U$ and
$$L(U, W) = \sum_{i \in U,\, j \in W} a_{ij}$$
(Radicchi et al. 2004).
(1) Each community has more connections inside
than connections to outside.
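Condition (1) can be checked directly from the adjacency matrix. The following is a minimal sketch, not the authors' code: the helper names `adjacency`, `L`, and `is_weak_community` and the example graph (two triangles joined by one edge) are assumptions made for illustration.

```python
def adjacency(n, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    return a

def L(a, U, W):
    """L(U, W): sum of a[i][j] over i in U and j in W."""
    return sum(a[i][j] for i in U for j in W)

def is_weak_community(a, U):
    """Condition (1): L(U, U) > L(U, complement of U)."""
    U = set(U)
    rest = set(range(len(a))) - U
    return L(a, U, U) > L(a, U, rest)

# Illustrative graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = adjacency(6, EDGES)

print(L(A, {0, 1, 2}, {0, 1, 2}))        # every internal edge is counted twice
print(is_weak_community(A, {0, 1, 2}))   # a triangle
print(is_weak_community(A, {2, 3}))      # the two endpoints of the joining edge
```

Note that $L(U, U)$ counts each internal edge twice (once from each endpoint), while $L(U, \bar{U})$ counts each crossing edge once.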
Connection-Based Condition 1
Inside red > outside blue + outside green
(1) Each community has more connections inside
than connections to outside.
Connection-Based Condition 2
Consider a graph $G = (V, E)$ with adjacency matrix $(a_{ij})$.
Given a partition $(V_1, V_2, \ldots, V_k)$ of $V$, each $V_s$ induces
a community in the most weak sense if
$$L(V_s, V_s) > \max_{t : t \neq s} L(V_s, V_t),$$
where
$$L(U, W) = \sum_{i \in U,\, j \in W} a_{ij}$$
(Hu et al. 2008).
(2) Each community has more connections inside
than connections to any other community.
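A small sketch of condition (2) for a whole partition, under the assumption that the graph is given as a 0/1 adjacency matrix. The function name `is_most_weak_partition` and the six-node example are hypothetical; the example is chosen so that the part {0, 1} satisfies (2) but not (1).

```python
def L(a, U, W):
    """L(U, W): sum of a[i][j] over i in U and j in W."""
    return sum(a[i][j] for i in U for j in W)

def is_most_weak_partition(a, parts):
    """Condition (2): each part has more weight inside than toward any single other part."""
    for s, Vs in enumerate(parts):
        toward_others = [L(a, Vs, Vt) for t, Vt in enumerate(parts) if t != s]
        if toward_others and L(a, Vs, Vs) <= max(toward_others):
            return False
    return True

# Illustrative graph: three linked pairs; the pair {0, 1} touches both other pairs.
EDGES = [(0, 1), (2, 3), (4, 5), (0, 2), (1, 4)]
A = [[0] * 6 for _ in range(6)]
for i, j in EDGES:
    A[i][j] = A[j][i] = 1

PARTS = [{0, 1}, {2, 3}, {4, 5}]
print(is_most_weak_partition(A, PARTS))                   # condition (2) for the partition
print(L(A, {0, 1}, {0, 1}), L(A, {0, 1}, {2, 3, 4, 5}))   # inside vs. total outside of {0, 1}
```

For the part {0, 1} the inside weight equals the total outside weight, so the strict inequality of condition (1) fails even though no single other part receives as much as the inside.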
Connection-Based Condition 2
Inside red > outside blue
Inside red > outside green
(2) Each community has more connections inside
than connections to any other community.
Connection-Based Condition 3
Consider a graph $G = (V, E)$ with adjacency matrix $(a_{ij})$.
A node subset $U$ is a community if for any $x \in U$,
$$L(x, U) > L(x, \bar{U}),$$
where $L(x, W)$ abbreviates $L(\{x\}, W)$ and
$$L(U, W) = \sum_{i \in U,\, j \in W} a_{ij}.$$
(3) Each node in a community has more connections
inside than connections to outside.
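Condition (3) is a per-node test. The sketch below assumes a 0/1 adjacency matrix; `is_strong_community` is a hypothetical helper name, and the example graph (a triangle whose node 2 has three extra outside neighbours) is made up to show a set that passes condition (1) but fails condition (3).

```python
def L(a, U, W):
    """L(U, W): sum of a[i][j] over i in U and j in W."""
    return sum(a[i][j] for i in U for j in W)

def is_strong_community(a, U):
    """Condition (3): every x in U has L({x}, U) > L({x}, complement of U)."""
    U = set(U)
    rest = set(range(len(a))) - U
    return all(L(a, {x}, U) > L(a, {x}, rest) for x in U)

# Illustrative graph: triangle {0, 1, 2}; node 2 also has outside neighbours 3, 4, 5.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (2, 5)]
A = [[0] * 6 for _ in range(6)]
for i, j in EDGES:
    A[i][j] = A[j][i] = 1

U = {0, 1, 2}
print(L(A, U, U) > L(A, U, set(range(6)) - U))   # condition (1) for the triangle
print(is_strong_community(A, U))                  # condition (3) for the triangle
```

Node 2 has two neighbours inside the triangle and three outside, so the per-node inequality fails at that node even though the aggregate inequality holds.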
Connection-Based Condition 3
At each red node:
inside red > outside blue + outside green
(3) Each node in a community has more connections
inside than connections to outside.
Connection-Based Condition 4
Consider a graph $G = (V, E)$ with adjacency matrix $(a_{ij})$.
Given a partition $(V_1, V_2, \ldots, V_k)$ of $V$, each $V_s$ induces
a community if for any $x \in V_s$,
$$L(x, V_s) > \max_{t : t \neq s} L(x, V_t),$$
where $L(x, W)$ abbreviates $L(\{x\}, W)$ and
$$L(U, W) = \sum_{i \in U,\, j \in W} a_{ij}.$$
(4) Each node in a community has more connections
inside than connections to any other community.
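Condition (4) compares each node's inside connections with its connections to every other part separately. This is an illustrative sketch; `satisfies_condition_4` and the nine-node example (three triangles, with node 0 linked to one node in each other triangle) are assumptions, not taken from the lecture.

```python
def satisfies_condition_4(a, parts):
    """Condition (4): each node has more neighbours in its own part than in any other part."""
    for s, Vs in enumerate(parts):
        for x in Vs:
            inside = sum(a[x][j] for j in Vs)
            for t, Vt in enumerate(parts):
                if t != s and inside <= sum(a[x][j] for j in Vt):
                    return False
    return True

# Illustrative graph: three triangles; node 0 is also linked to nodes 3 and 6.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5),
         (6, 7), (6, 8), (7, 8), (0, 3), (0, 6)]
A = [[0] * 9 for _ in range(9)]
for i, j in EDGES:
    A[i][j] = A[j][i] = 1

PARTS = [{0, 1, 2}, {3, 4, 5}, {6, 7, 8}]
inside_0 = sum(A[0][j] for j in PARTS[0])
outside_0 = sum(A[0][j] for j in range(9) if j not in PARTS[0])
print(satisfies_condition_4(A, PARTS))   # condition (4) for the partition
print(inside_0, outside_0)               # node 0: inside vs. total outside
```

Node 0 has as many neighbours outside its triangle as inside, so it would fail the per-node test of condition (3), yet it passes condition (4) because no single other part receives as many of its edges as its own part.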
Connection-Based Condition 4
At each red node:
inside red > outside blue
inside red > outside green
(4) Each node in a community has more connections
inside than connections to any other community.
Relationship of Conditions
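[Diagram relating conditions (3), (4), (1), and (2); (1) is the weak sense and (2) is the most weak sense.]
Applied to every part of a partition, (3) implies both (1) and (4), and each of (1) and (4) implies (2).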
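One relationship between the conditions, that the per-node condition (3) implies the aggregate condition (1), can be checked exhaustively on a small graph. The sketch below is only a sanity check on one made-up example (two triangles joined by an edge); the names `cond1`, `cond3`, and `inside_outside` are hypothetical.

```python
from itertools import combinations

# Illustrative graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
N = 6
NEIGHBORS = {v: set() for v in range(N)}
for i, j in EDGES:
    NEIGHBORS[i].add(j)
    NEIGHBORS[j].add(i)

def inside_outside(x, U):
    """Number of neighbours of x inside U and outside U."""
    inside = len(NEIGHBORS[x] & U)
    return inside, len(NEIGHBORS[x]) - inside

def cond3(U):
    """Condition (3): the inequality holds at every node of U."""
    return all(i > o for i, o in (inside_outside(x, U) for x in U))

def cond1(U):
    """Condition (1): the inequality holds after summing over the nodes of U."""
    pairs = [inside_outside(x, U) for x in U]
    return sum(i for i, _ in pairs) > sum(o for _, o in pairs)

subsets = [frozenset(c) for r in range(1, N) for c in combinations(range(N), r)]
violations = [U for U in subsets if cond3(U) and not cond1(U)]
weak_only = [U for U in subsets if cond1(U) and not cond3(U)]
print(len(violations))                          # sets where (3) holds but (1) fails
print(frozenset({0, 1, 2, 3}) in weak_only)     # a set satisfying (1) but not (3)
```

Summing the per-node inequalities over all nodes of $U$ gives the aggregate inequality, which is why no violation can occur; the converse fails, for example on {0, 1, 2, 3}, where node 3 has more neighbours outside than inside.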
Max Community Partition
Given a graph $G = (V, E)$, find a partition $(V_1, \ldots, V_k)$ with the
maximum number of parts $k$ such that every part satisfies the chosen
condition, one of (1), (2), (3), or (4).
Theorem (Lu et al. 2013)
For every $i = 1, 2, 3, 4$, the Max Community
Partition problem under condition ($i$) is
NP-hard.
Qualified Cut
Given a graph $G = (V, E)$, is there a subset $S$ of $V$
such that $(S, \bar{S})$ satisfies community condition (1),
which coincides with (2) for a two-part partition
(or condition (3), which coincides with (4))?
Approximation for Max Community Partition
Apply Qualified Cut to each part of the current
partition until no part can be cut.
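The repeated-cut idea can be sketched as follows. This is a brute-force illustration for tiny graphs only, under two assumptions that the slides do not spell out: the cut is tested with condition (1), and both sides of a cut are evaluated against the whole graph. The names `qualified_cut` and `split_until_stuck` are hypothetical.

```python
from itertools import combinations

def L(a, U, W):
    """L(U, W): sum of a[i][j] over i in U and j in W."""
    return sum(a[i][j] for i in U for j in W)

def is_weak(a, U):
    """Condition (1), evaluated against the whole graph (an assumption of this sketch)."""
    rest = set(range(len(a))) - set(U)
    return L(a, U, U) > L(a, U, rest)

def qualified_cut(a, part):
    """Return (S, part - S) with both sides satisfying condition (1), or None."""
    nodes = sorted(part)
    for size in range(1, len(nodes) // 2 + 1):   # exponential search, tiny graphs only
        for chosen in combinations(nodes, size):
            S = set(chosen)
            T = set(part) - S
            if is_weak(a, S) and is_weak(a, T):
                return S, T
    return None

def split_until_stuck(a):
    """Start from the whole node set and cut parts until no part can be cut."""
    parts = [set(range(len(a)))]
    changed = True
    while changed:
        changed = False
        for idx, part in enumerate(parts):
            cut = qualified_cut(a, part)
            if cut:
                parts[idx:idx + 1] = list(cut)   # replace the part by its two sides
                changed = True
                break
    return parts

# Illustrative graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in EDGES:
    A[i][j] = A[j][i] = 1

print(split_until_stuck(A))
```

On this example the procedure separates the two triangles and then stops, because a single node has no internal connections and so can never be one side of a qualified cut.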
Outline
 Community Structure
 Connection-Based Detection
 LP-formulation
Indicator
$x$ is an indicator of an event if $x = 0$ or $1$, and
$x = 1 \Leftrightarrow$ the event occurs.
For example:
$x_{ik}$: node $v_i$ belongs to the $k$th community $V_k$
$z_{lk}$: edge $e_l$ belongs to the $k$th community $V_k$
$y_k$: the $k$th community exists
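$$
\begin{aligned}
\max\ & \sum_{k=1}^{n} y_k \\
\text{s.t.}\ & \sum_{k=1}^{n} x_{ik} = 1, \quad i = 1, 2, \ldots, n; \\
& z_{lk} \le x_{ik}, \quad z_{lk} \le x_{jk}, \quad x_{ik} + x_{jk} - 1 \le z_{lk} \quad \text{for } e_l = (v_i, v_j); \\
& 4 \sum_{l=1}^{m} z_{lk} \ge \sum_{j=1}^{n} \sum_{i=1}^{n} x_{ik} a_{ij} + y_k, \\
& \frac{1}{n} \sum_{i=1}^{n} x_{ik} \le y_k \le \sum_{i=1}^{n} x_{ik}, \\
& z_{lk}, x_{ik}, y_k \in \{0, 1\}, \\
& 1 \le i, j, k \le n, \quad 1 \le l \le m.
\end{aligned}
$$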
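For a very small graph the 0-1 program can be solved by plain enumeration, which is a convenient way to sanity-check the formulation. The sketch below is an assumption-laden illustration, not a practical solver: it enumerates every assignment of nodes to labels (this fixes $x$), derives $z$ and $y$ as the constraints force them, and keeps the feasible assignment with the most nonempty communities. The function name `solve_by_enumeration` and the 4-node path example are hypothetical.

```python
from itertools import product

def solve_by_enumeration(n, edges):
    """Maximize sum_k y_k by enumerating every assignment of nodes to labels.

    Each assignment fixes x; z and y are then forced by the constraints:
    z_lk = 1 exactly when both endpoints of e_l have label k, and
    y_k = 1 exactly when community k is nonempty.
    """
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    best_value, best_labels = -1, None
    for labels in product(range(n), repeat=n):        # x_ik = 1 iff labels[i] == k
        feasible, value = True, 0
        for k in range(n):
            members = [i for i in range(n) if labels[i] == k]
            y = 1 if members else 0
            z_sum = sum(1 for i, j in edges if labels[i] == k and labels[j] == k)
            degree_sum = sum(deg[i] for i in members)  # sum_j sum_i x_ik a_ij
            if 4 * z_sum < degree_sum + y:             # linear form of condition (1)
                feasible = False
                break
            value += y
        if feasible and value > best_value:
            best_value, best_labels = value, labels
    return best_value, best_labels

# Illustrative graph: the path 0 - 1 - 2 - 3.
value, labels = solve_by_enumeration(4, [(0, 1), (1, 2), (2, 3)])
print(value, labels)
```

The search space has $n^n$ assignments, so this only works for a handful of nodes; it is meant to make the roles of $x$, $z$, and $y$ concrete.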
Linear Constraints
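$x_{ik}$: node $v_i$ belongs to the $k$th community $V_k$
$z_{lk}$: edge $e_l$ belongs to the $k$th community $V_k$
For $e_l = (v_i, v_j)$:
$z_{lk} \le x_{ik}$ ($e_l \in V_k \Rightarrow v_i \in V_k$)
$z_{lk} \le x_{jk}$ ($e_l \in V_k \Rightarrow v_j \in V_k$)
$x_{ik} + x_{jk} - 1 \le z_{lk}$ ($e_l \notin V_k \Rightarrow v_i \notin V_k$ or $v_j \notin V_k$)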
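The three inequalities together force $z_{lk}$ to equal the product $x_{ik} x_{jk}$, that is, an edge belongs to a community exactly when both endpoints do. A short enumeration makes this visible; `feasible_z` is a hypothetical helper name.

```python
from itertools import product

def feasible_z(x_i, x_j):
    """All z in {0, 1} with z <= x_i, z <= x_j, and x_i + x_j - 1 <= z."""
    return [z for z in (0, 1) if z <= x_i and z <= x_j and x_i + x_j - 1 <= z]

for x_i, x_j in product((0, 1), repeat=2):
    print(x_i, x_j, feasible_z(x_i, x_j))
```

In every row exactly one value of $z$ survives, and it equals $x_i \cdot x_j$, so the nonlinear product never has to appear in the program.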
Linear Constraints
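$x_{ik}$: node $v_i$ belongs to the $k$th community $V_k$
$z_{lk}$: edge $e_l$ belongs to the $k$th community $V_k$
Community condition (1):
$$2 \sum_{l=1}^{m} z_{lk} > \sum_{j=1}^{n} \sum_{i=1}^{n} x_{ik} a_{ij} - 2 \sum_{l=1}^{m} z_{lk},$$
where $m$ = # of edges, $n$ = # of nodes.
The left side is $L(V_k, V_k)$, since each internal edge is counted twice; the right side is $L(V_k, \bar{V}_k)$, the total degree of $V_k$ minus its internal part.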
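The linear form rests on the identity that the degree sum of a node set equals twice its internal edges plus its crossing edges. The sketch below checks that identity, and the resulting equivalence with condition (1), on every subset of one made-up graph; the helper `counts` and the example graph are assumptions.

```python
from itertools import combinations

# Illustrative graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
N = 6

def counts(U):
    """Return (internal edges, degree sum, crossing edges) for the node set U."""
    internal = sum(1 for i, j in EDGES if i in U and j in U)       # sum_l z_lk
    degree_sum = sum((i in U) + (j in U) for i, j in EDGES)        # sum_j sum_i x_ik a_ij
    crossing = sum(1 for i, j in EDGES if (i in U) != (j in U))    # L(U, complement of U)
    return internal, degree_sum, crossing

all_subsets = [set(c) for r in range(N + 1) for c in combinations(range(N), r)]
identity_holds = all(d == 2 * z + c for z, d, c in map(counts, all_subsets))
same_test = all((2 * z > c) == (4 * z > d) for z, d, c in map(counts, all_subsets))
print(identity_holds, same_test)
```

Because of the identity, comparing $2 \sum_l z_{lk}$ with the crossing weight is the same as comparing $4 \sum_l z_{lk}$ with the degree sum, which involves only the variables $x$ and $z$.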
$x_{ik}$: node $v_i$ belongs to the $k$th community $V_k$
$z_{lk}$: edge $e_l$ belongs to the $k$th community $V_k$
Community condition (1):
$$4 \sum_{l=1}^{m} z_{lk} \ge \sum_{j=1}^{n} \sum_{i=1}^{n} x_{ik} a_{ij} + y_k,$$
where $m$ = # of edges, $n$ = # of nodes, and
$$\frac{1}{n} \sum_{i=1}^{n} x_{ik} \le y_k \le \sum_{i=1}^{n} x_{ik}.$$
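The bounds on $y_k$ force it to be 1 exactly when community $k$ is nonempty, and adding $y_k$ on the right turns the strict inequality into a non-strict one over integers while leaving empty labels feasible. A minimal sketch, with hypothetical helper names `forced_y` and `condition_1_linear`:

```python
def forced_y(x_column):
    """Values y in {0, 1} with (1/n) * sum(x) <= y <= sum(x), using integer arithmetic."""
    n, total = len(x_column), sum(x_column)
    return [y for y in (0, 1) if total <= n * y and y <= total]

def condition_1_linear(z_sum, degree_sum, y):
    """4 * sum_l z_lk >= sum_j sum_i x_ik a_ij + y_k."""
    return 4 * z_sum >= degree_sum + y

print(forced_y([0, 0, 0, 0]))        # empty community
print(forced_y([0, 1, 0, 0]))        # one member
print(forced_y([1, 1, 1, 1]))        # every node is a member
print(condition_1_linear(0, 0, 0))   # an unused label
print(condition_1_linear(1, 4, 1))   # inside weight 2, outside weight 2
```

In the last call the community has equal inside and outside weight, so the strict condition (1) fails, and the linear constraint with $y_k = 1$ rejects it as well.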
References
1. Zaixin Lu et al., The maximum community partition problem in networks,
Discrete Mathematics, Algorithms and Applications 5 (2013).
2. Xiangsun Zhang et al., A combinatorial model and algorithm for globally
searching community structure in complex networks, J Comb Optim 23 (2012):
425-442.
THANK YOU!