Modularity Maximization - The University of Texas at Dallas

Lecture 6-2
Modularity Maximization
Ding-Zhu Du
University of Texas at Dallas
Model-Based Detections
• Connection-based detection
• Modularity maximization
• Influence-based detection
• Overlapping community detection
• Hierarchical community detection
Model-Based Detection
Modularity maximization is the most popular model-based detection method.
Outline
• Modularity Function
• Greedy
• Spectral Method and MP
• Hybrid Method
Modularity Function
(Newman 2006)
Consider a graph $G = (V, E)$ with adjacency matrix $(a_{ij})$.
Given a partition $C$ of $V$, define
\[
Q = \frac{1}{2|E|} \sum_{i,j \in V} \left( a_{ij} - \frac{k_i k_j}{2|E|} \right) \delta_{C_i, C_j},
\]
where $k_i$ is the degree of node $i$, $C_i$ is the community containing $i$, and $\delta_{C_i, C_j}$ is the Kronecker delta symbol.
$Q$ is the fraction of edges that lie within communities minus the expected value of that fraction if edges were distributed at random.
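The definition can be evaluated directly from the adjacency matrix. The following is a minimal sketch, not taken from the lecture: the function names `modularity` and `adjacency` and the six-node example graph (two triangles joined by one edge) are assumptions chosen for illustration.

```python
from itertools import product

def modularity(adj, community):
    """Q = 1/(2|E|) * sum_{i,j} (a_ij - k_i k_j / (2|E|)) * delta(C_i, C_j)."""
    n = len(adj)
    degree = [sum(row) for row in adj]       # k_i
    two_m = sum(degree)                      # 2|E| for an undirected graph
    total = 0.0
    for i, j in product(range(n), repeat=2):
        if community[i] == community[j]:     # Kronecker delta
            total += adj[i][j] - degree[i] * degree[j] / two_m
    return total / two_m

def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected simple graph."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj

# Hypothetical example: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
ADJ = adjacency(6, EDGES)
q_split = modularity(ADJ, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(ADJ, [0] * 6)             # all nodes in one community
print(q_split, q_single)
```

For this graph the two-triangle partition gives $Q = 5/14$, while putting every node into a single community gives $Q = 0$, since then the observed and expected fractions are both 1.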
Consider a graph $G = (V, E)$ with adjacency matrix $(a_{ij})$.
Given a partition $C$ of $V$, define
\[
Q = \sum_{i,j \in V} \left( \frac{a_{ij}}{2|E|} - \frac{k_i}{2|E|} \cdot \frac{k_j}{2|E|} \right) \delta_{C_i, C_j},
\]
where $k_i$ is the degree of node $i$.
If an edge were placed at random, it would have endpoint $i$ with probability $\frac{k_i}{2|E|}$ and endpoint $j$ with probability $\frac{k_j}{2|E|}$. Hence, it lies at $(i, j)$ with probability $\frac{k_i}{2|E|} \cdot \frac{k_j}{2|E|}$.
Modularity Function
(Newman 2006)
Consider a graph $G = (V, E)$ with adjacency matrix $(a_{ij})$.
Given a partition $C$ of $V$, define
\[
Q = \frac{\sum_{i,j \in V} a_{ij}\, \delta_{C_i, C_j}}{2|E|} - \sum_{i,j \in V} \frac{k_i}{2|E|} \cdot \frac{k_j}{2|E|}\, \delta_{C_i, C_j},
\]
where $k_i$ is the degree of node $i$, $C_i$ is the community containing $i$, and $\delta_{C_i, C_j}$ is the Kronecker delta symbol.
The first term is the fraction of edges within communities; the second term is the expected value of that fraction if edges were distributed at random.
Newman 2006
• M. E. J. Newman: Modularity and community structure in networks, Proceedings of the National Academy of Sciences, vol. 103, no. 23 (2006), pp. 8577-8582.
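Modularity Function
Consider a graph $G = (V, E)$ with adjacency matrix $(a_{ij})$.
Given a partition $C$ of $V$, define
\[
\begin{aligned}
Q &= \frac{1}{2|E|} \sum_{i,j \in V} \left( a_{ij} - \frac{k_i k_j}{2|E|} \right) \delta_{C_i, C_j} \\
  &= \frac{1}{2|E|} \sum_{C_i \in C} \sum_{u,v \in C_i} \left( a_{uv} - \frac{k_u k_v}{2|E|} \right) \\
  &= \frac{1}{2|E|} \sum_{C_i \in C} \left( 2|E_{C_i}^{in}| - \frac{\left( 2|E_{C_i}^{in}| + |E_{C_i}^{out}| \right)^2}{2|E|} \right) \\
  &= \sum_{C_i \in C} \left[ \frac{|E_{C_i}^{in}|}{|E|} - \left( \frac{2|E_{C_i}^{in}| + |E_{C_i}^{out}|}{2|E|} \right)^2 \right].
\end{aligned}
\]
From the second line on, $C_i$ ranges over the communities of $C$ and $u, v$ over the nodes of $C_i$. $E_{C_i}^{in}$ is the set of edges with both endpoints in $C_i$ and $E_{C_i}^{out}$ is the set of edges with exactly one endpoint in $C_i$, so that $\sum_{u,v \in C_i} a_{uv} = 2|E_{C_i}^{in}|$ and $\sum_{u \in C_i} k_u = 2|E_{C_i}^{in}| + |E_{C_i}^{out}|$.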
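The per-community form needs only two edge counts per community. A minimal sketch under that reading follows; the function name `modularity_by_community` and the example graph are assumptions, not part of the lecture.

```python
def modularity_by_community(edges, communities):
    """Q = sum over communities of |E_in|/|E| - ((2|E_in| + |E_out|) / (2|E|))^2."""
    m = len(edges)                                   # |E|
    label = {v: c for c, members in enumerate(communities) for v in members}
    e_in = [0] * len(communities)                    # edges inside each community
    e_out = [0] * len(communities)                   # edges leaving each community
    for u, v in edges:
        if label[u] == label[v]:
            e_in[label[u]] += 1
        else:
            e_out[label[u]] += 1
            e_out[label[v]] += 1
    return sum(e_in[c] / m - ((2 * e_in[c] + e_out[c]) / (2 * m)) ** 2
               for c in range(len(communities)))

# Hypothetical example: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
q_two = modularity_by_community(EDGES, [{0, 1, 2}, {3, 4, 5}])
q_three = modularity_by_community(EDGES, [{0, 1}, {2, 3}, {4, 5}])
print(q_two, q_three)
```

Here each triangle has 3 internal edges and 1 outgoing edge, so $Q = 2\,(3/7 - (7/14)^2) = 5/14$; the three-pair partition scores lower, at $4/49$.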
Modularity Function
(Newman 2006)
Consider a graph $G = (V, E)$ with adjacency matrix $(a_{ij})$.
Given a partition $(V_1, V_2, \ldots, V_k)$ of $V$, define
\[
Q = \sum_{s=1}^{k} \left[ \frac{L(V_s, V_s)}{L(V, V)} - \left( \frac{L(V_s, V_s) + L(V_s, \bar{V}_s)}{L(V, V)} \right)^2 \right],
\]
where $L(U, W) = \sum_{i \in U,\, j \in W} a_{ij}$ and $\bar{V}_s = V \setminus V_s$.
Modularity Function
(digraph)
Consider a directed graph $G = (V, E)$ with adjacency matrix $(a_{ij})$.
Given a partition $C$ of $V$, define
\[
Q = \frac{1}{2|E|} \sum_{i,j \in V} \left( a_{ij} - \frac{k_i^{in} k_j^{out}}{2|E|} \right) \delta_{C_i, C_j},
\]
where $k_i^{in}$ and $k_i^{out}$ are the in-degree and out-degree of node $i$ and $\delta_{C_i, C_j}$ is the Kronecker delta symbol.
This is the fraction of edges within communities minus the expected value of that fraction if edges were distributed at random.
Why Is It Called Modularity?
• Module = community in some complex networks
• The function describes the quality of modules.
Modularity Maximization is NP-hard
• U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, and D. Wagner: On modularity clustering, IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 20, no. 2 (2008), pp. 172-188.
Outline
• Modularity Function
• Greedy
• Spectral Method
• Hybrid Method
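Increment
Consider a graph $G = (V, E)$ with adjacency matrix $(a_{ij})$.
Given a partition $C$ of $V$, the modularity function is
\[
Q = \sum_{C_i \in C} \left[ \frac{|E_{C_i}^{in}|}{|E|} - \left( \frac{2|E_{C_i}^{in}| + |E_{C_i}^{out}|}{2|E|} \right)^2 \right].
\]
When communities $C_i$ and $C_j$ are merged, the increment of $Q$ is
\[
\Delta_{C_i C_j} Q = 2 \left( \frac{|E_{C_i, C_j}|}{2|E|} - \frac{|E_{C_i}|\,|E_{C_j}|}{4|E|^2} \right),
\]
where $E_{C_i, C_j}$ is the set of edges between $C_i$ and $C_j$, and $|E_{C_i}|$ denotes the total degree $2|E_{C_i}^{in}| + |E_{C_i}^{out}|$ of $C_i$.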
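The increment can be checked numerically against the difference of $Q$ before and after a merge. The sketch below is illustrative only; the helper names (`degree_sum`, `edges_between`, `delta_q`, `modularity`) and the example graph are assumptions, and $|E_{C_i}|$ is read as the total degree of $C_i$.

```python
def degree_sum(edges, community):
    """Total degree of a community: 2|E_in| + |E_out|."""
    return sum((u in community) + (v in community) for u, v in edges)

def edges_between(edges, ci, cj):
    """Number of edges with one endpoint in ci and the other in cj."""
    return sum(1 for u, v in edges
               if (u in ci and v in cj) or (u in cj and v in ci))

def delta_q(edges, ci, cj):
    """Increment of Q when the disjoint communities ci and cj are merged."""
    m = len(edges)
    return 2 * (edges_between(edges, ci, cj) / (2 * m)
                - degree_sum(edges, ci) * degree_sum(edges, cj) / (4 * m * m))

def modularity(edges, partition):
    """Per-community form of Q, used here only to cross-check delta_q."""
    m = len(edges)
    q = 0.0
    for c in partition:
        e_in = sum(1 for u, v in edges if u in c and v in c)
        q += e_in / m - (degree_sum(edges, c) / (2 * m)) ** 2
    return q

# Hypothetical example: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A, B = {0, 1, 2}, {3, 4, 5}
before = modularity(EDGES, [A, B])
after = modularity(EDGES, [A | B])
print(delta_q(EDGES, A, B), after - before)
```

Merging the two triangles lowers $Q$ by $5/14$, while merging the adjacent singletons $\{0\}$ and $\{1\}$ raises it by $5/49$, so the sign of the increment tells the greedy step which merges are worthwhile.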
Greedy Algorithm
input a graph $G = (V, E)$;
$U_1 \leftarrow \{\{v\} \mid v \in V\}$;
for $k = 1$ to $n - 1$ do
    choose $C_i$ and $C_j$ from $U_k$ to maximize $\Delta_{C_i C_j} Q$ and
    set $U_{k+1} \leftarrow (U_k \setminus \{C_i, C_j\}) \cup \{C_i \cup C_j\}$;
$k^* \leftarrow \arg\max_{1 \le k \le n} Q(U_k)$;
output $U_{k^*}$
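The greedy procedure can be sketched in a few lines. This is a deliberately naive illustration, not the lecture's implementation: it recomputes $Q$ from scratch for every candidate merge, and the names `modularity` and `greedy_modularity` and the example graph are assumptions.

```python
def modularity(edges, partition):
    """Per-community form: sum of |E_in|/|E| - (degree sum / (2|E|))^2."""
    m = len(edges)
    q = 0.0
    for c in partition:
        e_in = sum(1 for u, v in edges if u in c and v in c)
        deg = sum((u in c) + (v in c) for u, v in edges)
        q += e_in / m - (deg / (2 * m)) ** 2
    return q

def greedy_modularity(nodes, edges):
    partition = [frozenset([v]) for v in nodes]          # U_1: singletons
    best_q, best = modularity(edges, partition), list(partition)
    while len(partition) > 1:
        # A merge changes only two terms of Q, so picking the merged
        # partition with the largest Q is the same as maximizing delta Q.
        candidates = []
        for a in range(len(partition)):
            for b in range(a + 1, len(partition)):
                merged = [c for k, c in enumerate(partition) if k not in (a, b)]
                merged.append(partition[a] | partition[b])
                candidates.append((modularity(edges, merged), merged))
        q, partition = max(candidates, key=lambda t: t[0])
        if q > best_q:                                   # track arg max over all U_k
            best_q, best = q, partition
    return best_q, best

# Hypothetical example: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
best_q, best = greedy_modularity(range(6), EDGES)
print(best_q, sorted(sorted(c) for c in best))
```

On this small graph the merges rebuild the two triangles before the final merge drops $Q$ back to 0, so the best intermediate partition is the pair of triangles. A practical implementation would update $\Delta Q$ incrementally instead of recomputing $Q$.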
Outline
• Modularity Function
• Greedy
• Spectral Method and MP
• Hybrid Method
Qualified Cut
Given a graph $G = (V, E)$, find a subset $S$ of $V$ to maximize $Q(S, \bar{S})$.
Community Partition
Apply the Qualified Cut to each part of the current partition until the value of $Q$ cannot be increased.
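Quadratic Form
For a partition into two groups, let $s_i = +1$ if $i$ is in group 1 and $s_i = -1$ if $i$ is in group 2, so that $\delta_{C_i, C_j} = \frac{1}{2}(s_i s_j + 1)$. Then
\[
\begin{aligned}
Q &= \frac{1}{2|E|} \sum_{i,j \in V} \left( a_{ij} - \frac{k_i k_j}{2|E|} \right) \delta_{C_i, C_j} \\
  &= \frac{1}{4|E|} \sum_{i,j \in V} \left( a_{ij} - \frac{k_i k_j}{2|E|} \right) (s_i s_j + 1) \\
  &= \frac{1}{4|E|} \sum_{i,j \in V} \left( a_{ij} - \frac{k_i k_j}{2|E|} \right) s_i s_j \\
  &= \frac{1}{4|E|}\, s^T B s,
\end{aligned}
\]
where $B = (B_{ij})$ with $B_{ij} = a_{ij} - \frac{k_i k_j}{2|E|}$. The third equality holds because $\sum_{i,j \in V} \left( a_{ij} - \frac{k_i k_j}{2|E|} \right) = 2|E| - \frac{(2|E|)^2}{2|E|} = 0$.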
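The quadratic form is easy to verify numerically. The sketch below builds the matrix $B_{ij} = a_{ij} - k_i k_j / (2|E|)$ and evaluates $s^T B s / (4|E|)$ for a given $\pm 1$ vector; the function names and the example graph are assumptions made for illustration.

```python
def modularity_matrix(adj):
    """B_ij = a_ij - k_i k_j / (2|E|)."""
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    n = len(adj)
    return [[adj[i][j] - degree[i] * degree[j] / two_m for j in range(n)]
            for i in range(n)]

def quadratic_form_q(adj, s):
    """Q = s^T B s / (4|E|) for a +1/-1 membership vector s."""
    B = modularity_matrix(adj)
    two_m = sum(map(sum, adj))                  # 2|E|
    n = len(s)
    stbs = sum(B[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    return stbs / (2 * two_m)                   # divide by 4|E|

# Hypothetical example: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
ADJ = [[0] * 6 for _ in range(6)]
for u, v in EDGES:
    ADJ[u][v] = ADJ[v][u] = 1

q = quadratic_form_q(ADJ, [1, 1, 1, -1, -1, -1])
row_sums = [sum(row) for row in modularity_matrix(ADJ)]
print(q, row_sums)
```

Every row of $B$ sums to zero, which is the reason the constant term drops out of the derivation; for the two-triangle split the form evaluates to $5/14$, matching the definition of $Q$.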
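Spectral Method
\[
Q = \frac{1}{4|E|}\, s^T B s
\]
achieves its maximum over real vectors $s$ of fixed length when $s$ is parallel to the eigenvector of $B$ with the largest eigenvalue. Since each $s_i$ must be $\pm 1$, Newman (2006) takes $s_i$ to be the sign of the $i$-th entry of that eigenvector.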
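A minimal sketch of the spectral bisection follows. To stay within the standard library it finds the leading eigenvector by shifted power iteration rather than a linear-algebra package; that choice, the function names, the fixed iteration count, and the example graph are all assumptions made for illustration.

```python
import math

def modularity_matrix(adj):
    """B_ij = a_ij - k_i k_j / (2|E|)."""
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    n = len(adj)
    return [[adj[i][j] - degree[i] * degree[j] / two_m for j in range(n)]
            for i in range(n)]

def leading_eigenvector(B, iterations=500):
    """Power iteration on B + shift*I, whose eigenvalues are all non-negative."""
    n = len(B)
    shift = max(sum(abs(x) for x in row) for row in B)   # Gershgorin bound
    v = [float(i + 1) for i in range(n)]                 # arbitrary start vector
    for _ in range(iterations):
        w = [sum(B[i][j] * v[j] for j in range(n)) + shift * v[i]
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def spectral_bisection(adj):
    """Assign each node to a group by the sign of its eigenvector entry."""
    v = leading_eigenvector(modularity_matrix(adj))
    return [1 if x >= 0 else -1 for x in v]

# Hypothetical example: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
ADJ = [[0] * 6 for _ in range(6)]
for u, v in EDGES:
    ADJ[u][v] = ADJ[v][u] = 1

s = spectral_bisection(ADJ)
print(s)
```

The shift is needed because plain power iteration converges to the eigenvalue of largest magnitude, which may be negative for $B$. On the example the signs separate the two triangles; which triangle receives $+1$ depends on the start vector.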
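Linear Program
\[
\begin{aligned}
\max\ & \frac{1}{2|E|} \sum_{i,j} B_{ij} (1 - x_{ij}) \\
\text{s.t. } & x_{ik} \le x_{ij} + x_{jk} \quad \text{for all } i, j, k \\
& x_{ij} \in \{0, 1\} \quad \text{for all } i, j
\end{aligned}
\]
where
\[
x_{ij} = \begin{cases} 0 & \text{if } i \text{ and } j \text{ are in the same community} \\ 1 & \text{if } i \text{ and } j \text{ are in different communities} \end{cases}
\]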
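The role of the triangle constraints can be illustrated without a solver. The sketch below, with assumed function names and an assumed example graph, derives $x_{ij}$ from a partition, checks that the constraints hold, and evaluates the objective; it does not solve the program.

```python
from itertools import product

def cut_variables(labels):
    """x_ij = 0 if i and j share a community, 1 otherwise."""
    n = len(labels)
    return [[0 if labels[i] == labels[j] else 1 for j in range(n)]
            for i in range(n)]

def satisfies_triangle(x):
    """Check x_ik <= x_ij + x_jk for all i, j, k."""
    n = len(x)
    return all(x[i][k] <= x[i][j] + x[j][k]
               for i, j, k in product(range(n), repeat=3))

def objective(adj, x):
    """1/(2|E|) * sum_{i,j} B_ij (1 - x_ij), with B_ij = a_ij - k_i k_j/(2|E|)."""
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    n = len(adj)
    return sum((adj[i][j] - degree[i] * degree[j] / two_m) * (1 - x[i][j])
               for i in range(n) for j in range(n)) / two_m

# Hypothetical example: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
ADJ = [[0] * 6 for _ in range(6)]
for u, v in EDGES:
    ADJ[u][v] = ADJ[v][u] = 1

x_good = cut_variables([0, 0, 0, 1, 1, 1])
x_bad = cut_variables([0] * 6)
x_bad[0][2] = x_bad[2][0] = 1    # 0~1 and 1~2 together, but 0 and 2 apart
print(satisfies_triangle(x_good), objective(ADJ, x_good), satisfies_triangle(x_bad))
```

Any assignment derived from a partition satisfies the constraints, and its objective value equals $Q$ of that partition. The second assignment is rejected because "same community" would not be transitive.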
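Vector Program
\[
\begin{aligned}
\max\ & \frac{1}{4|E|} \sum_{i,j} B_{ij} (1 + s_i s_j) \\
\text{s.t. } & s_i^2 = 1 \quad \text{for all } i
\end{aligned}
\]
Semi-definite Program
Replacing each scalar $s_i$ by a unit vector and each product $s_i s_j$ by an inner product relaxes the vector program to a semi-definite program.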
Outline
• Modularity Function
• Greedy
• Spectral Method and MP
• Hybrid Method
Resolution limit
• Misidentification: some derived communities do not satisfy the weak community definition, or even the weakest community definition.
• In other words, the communities obtained may have sparser connections within them than between them.
Hybrid Detection:
a Possible Research Direction
Max Q s.t. condition (1)
• This may give an improvement.
• Is it possible to do?
• (1) can be written as linear constraints.
• Q can be written as a quadratic function.
• Thus, Max Q s.t. (1) can be formulated as a quadratic program, which can be transformed into a semi-definite program.
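Linear Constraints
$x_{ik}$: node $v_i$ belongs to the $k$th community $V_k$
$z_{lk}$: edge $e_l$ belongs to the $k$th community $V_k$
For each edge $e_l = (v_i, v_j)$:
$z_{lk} \le x_{ik}$   ($e_l \in V_k \Rightarrow v_i \in V_k$)
$z_{lk} \le x_{jk}$   ($e_l \in V_k \Rightarrow v_j \in V_k$)
$x_{ik} + x_{jk} \le 1 + z_{lk}$   ($e_l \notin V_k \Rightarrow v_i$ or $v_j \notin V_k$)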
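For 0/1 variables, the three constraints $z \le x_i$, $z \le x_j$ and $x_i + x_j \le 1 + z$ leave exactly one feasible value of $z$, namely the product $x_i x_j$: the edge is in the community exactly when both endpoints are. A small truth-table check, with the hypothetical helper name `feasible_z`, makes this explicit.

```python
from itertools import product

def feasible_z(xi, xj):
    """All z in {0, 1} satisfying z <= xi, z <= xj and xi + xj <= 1 + z."""
    return [z for z in (0, 1)
            if z <= xi and z <= xj and xi + xj <= 1 + z]

# Enumerate the four 0/1 assignments of the two endpoint variables.
table = {(xi, xj): feasible_z(xi, xj) for xi, xj in product((0, 1), repeat=2)}
for (xi, xj), zs in table.items():
    print(xi, xj, zs)
```

The constraints therefore linearize the product $x_{ik} x_{jk}$, which is what allows the edge-membership variables to be used in a linear formulation.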
Linear Constraints
$x_{ik}$: node $v_i$ belongs to the $k$th community $V_k$
$z_{lk}$: edge $e_l$ belongs to the $k$th community $V_k$
Community condition (1):
\[
2 \sum_{l=1}^{m} z_{lk} \ge \sum_{j=1}^{n} \sum_{i=1}^{n} x_{ik} a_{ij} - 2 \sum_{l=1}^{m} z_{lk},
\]
where $m$ = # of edges and $n$ = # of nodes.
Modularity Density
Modularity Density function (Li et al. 2008)
Consider a graph $G = (V, E)$ with adjacency matrix $(a_{ij})$.
Given a partition $(V_1, V_2, \ldots, V_k)$ of $V$, define
\[
D = \sum_{s=1}^{k} \frac{L(V_s, V_s) - L(V_s, \bar{V}_s)}{|V_s|},
\]
where $L(U, W) = \sum_{i \in U,\, j \in W} a_{ij}$ and $\bar{V}_s = V \setminus V_s$.
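The modularity density can be computed directly from $L(U, W)$. The following sketch is an illustration only; the function names `link_weight` and `modularity_density` and the example graph are assumptions, and $\bar{V}_s$ is taken to be the complement of $V_s$ in $V$.

```python
def link_weight(adj, U, W):
    """L(U, W) = sum of a_ij over i in U and j in W."""
    return sum(adj[i][j] for i in U for j in W)

def modularity_density(adj, parts):
    """D = sum over parts of (L(V_s, V_s) - L(V_s, complement of V_s)) / |V_s|."""
    nodes = set(range(len(adj)))
    return sum((link_weight(adj, p, p) - link_weight(adj, p, nodes - p)) / len(p)
               for p in parts)

# Hypothetical example: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
ADJ = [[0] * 6 for _ in range(6)]
for u, v in EDGES:
    ADJ[u][v] = ADJ[v][u] = 1

d_split = modularity_density(ADJ, [{0, 1, 2}, {3, 4, 5}])
d_single = modularity_density(ADJ, [set(range(6))])
print(d_split, d_single)
```

Each triangle contributes $(6 - 1)/3$, so the split scores $10/3$, compared with $14/6 = 7/3$ for a single community; dividing by $|V_s|$ is what rewards the smaller, denser parts.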
Opt D s.t. condition (1)
• This may give an improvement.
• Is it possible to do?
• (1) can be written as linear constraints.
• D can be written as a fractional function.
• Thus, Max D s.t. (1) can be formulated as a geometric program.
Outline
• Community Structure
• Connection-Based Detection
• Influence-Based Detection
• Remarks
Remark 1
How to evaluate the method
for finding a community?
Clustering
Community Detection
Remark 2
How to do hierarchical
community detection?
Survey
• Introductory review: Communities in
networks by M. A. Porter, J.-P. Onnela, and P. J.
Mucha, Notices of the American Mathematical
Society 56, 1082 (2009)
• Comprehensive review: Community
detection in graphs by Santo Fortunato, Physics
Reports 486, 75 (2010)
THANK YOU!