3.3 Network-Centric Community Detection
A Unified Process
Comparison
– Spectral clustering essentially tries to minimize the number of edges between groups.
– Modularity instead considers whether the number of edges between groups is smaller than expected if edges were placed at random.
– Spectral partitioning is effectively forced to split the network into approximately equal-size clusters.
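The sketch below makes the contrast concrete in plain Python with no libraries. For a given partition it counts the edges that are cut, which is the quantity spectral clustering keeps small (up to its normalization by cluster size). It then compares the cut fraction with the fraction expected if edges were placed at random with the same community degree shares $a_i$, which are defined with modularity in Section 3.4. The expected cut fraction minus the actual one is exactly the modularity $Q$. The toy graph and the function name `cut_versus_expected` are hypothetical.

```python
def cut_versus_expected(edges, label):
    """Return (#cut edges, actual cut fraction, expected cut fraction) for a partition.

    `edges` is a list of undirected (u, v) pairs; `label[v]` is the community of node v.
    """
    m = len(edges)
    cut = sum(1 for u, v in edges if label[u] != label[v])
    # a_i: fraction of edge endpoints that fall in community i
    a = {}
    for u, v in edges:
        for w in (u, v):
            a[label[w]] = a.get(label[w], 0.0) + 0.5 / m
    # under random wiring with the same a_i, an edge lands inside i with probability a_i^2
    expected = 1.0 - sum(ai ** 2 for ai in a.values())
    return cut, cut / m, expected


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
label = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
cut, actual, expected = cut_versus_expected(edges, label)
print("cut edges:", cut)
print("actual vs expected cut fraction:", round(actual, 3), round(expected, 3))
print("difference (= modularity Q):", round(expected - actual, 3))
```

Spectral clustering only looks at the first number. Modularity looks at the gap between the last two, so it can prefer a partition with more cut edges if the expected cut grows even more.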
3.4 Hierarchy-Centric Community Detection
Hierarchy-centric methods
– build a hierarchical structure of communities based on network
topology
– two types of hierarchical clustering
• Divisive
• Agglomerative
Divisive Clustering
– 1. Put all objects in one cluster
– 2. Repeat until all clusters are singletons
• a) Choose a cluster to split (what criterion?)
• b) Replace the chosen cluster with its sub-clusters (split into how many?)
Divisive Clustering
– A Method: Cut the “weakest” tie
• At each iteration, find the weakest edge.
This kind of edge is most likely to be a tie connecting two communities.
• Remove the edge.
Once a network is decomposed into two connected components, each component
is considered a community.
• Update the strength of links.
• This iterative process is applied to each community to find sub-communities.
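The weakest-tie procedure can be sketched as follows. The slides do not say how tie strength is measured; the edge-betweenness choice of Newman and Girvan comes next. This sketch therefore assumes a simple stand-in: the strength of edge (u, v) is the number of neighbours u and v have in common, with ties broken by node id. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict


def adjacency(edges):
    """Adjacency map for a set of undirected edges stored as frozensets {u, v}."""
    adj = defaultdict(set)
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    return adj


def components(nodes, edges):
    """Connected components of (nodes, edges), as a list of sets ordered by smallest node."""
    adj = adjacency(edges)
    seen, comps = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u] - comp:
                comp.add(w)
                stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps


def tie_strength(edge, adj):
    """ASSUMED strength measure: number of neighbours the two endpoints share."""
    u, v = tuple(edge)
    return len(adj[u] & adj[v])


def split_once(nodes, edges):
    """Remove the weakest tie, one at a time, until the graph falls apart."""
    edges = set(edges)
    while edges and len(components(nodes, edges)) == 1:
        adj = adjacency(edges)  # strengths are updated after every removal
        weakest = min(edges, key=lambda e: (tie_strength(e, adj), sorted(e)))
        edges.discard(weakest)
    return components(nodes, edges), edges


def divisive(nodes, edges):
    """Apply split_once recursively to each community; returns nested lists."""
    if len(nodes) == 1:
        return sorted(nodes)
    parts, remaining = split_once(nodes, edges)
    return [divisive(p, {e for e in remaining if e <= p}) for p in parts]


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2, 3)
nodes = set(range(6))
edges = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]}
print(split_once(nodes, edges)[0])
print(divisive(nodes, edges))
```

Strengths are recomputed inside the loop, which mirrors the "update the strength of links" step. To try a different strength measure, only `tie_strength` needs to change.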
Divisive Clustering
– “Finding and evaluating community structure in networks,” M. Newman and M. Girvan, Physical Review E, 2004
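• Find the weak ties based on “edge betweenness”
• Edge betweenness
the number of shortest paths between pairs of nodes that pass along the edge (when a pair has several shortest paths, only the fraction of them passing along the edge is counted)
used to find the “weakest” tie for hierarchical clustering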
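$$C_B(e(v_i, v_j)) = \begin{cases} \sum_{v_s, v_t \in V,\ s < t} \dfrac{\sigma_{st}(e(v_i, v_j))}{\sigma_{st}} & \text{if } i < j \\ 0 & \text{if } i = j \\ C_B(e(v_j, v_i)) & \text{if } i > j \end{cases}$$
• where
$\sigma_{st}$ is the total number of shortest paths between nodes $v_s$ and $v_t$
$\sigma_{st}(e(v_i, v_j))$ is the number of shortest paths between nodes $v_s$ and $v_t$ that pass along the edge $e(v_i, v_j)$.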
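For a small unweighted graph, the definition can be transcribed almost literally. A breadth-first search from every node gives distances and shortest-path counts. An edge (u, v) lies on a shortest path from $v_s$ to $v_t$ exactly when $d(v_s, u) + 1 + d(v, v_t) = d(v_s, v_t)$, and in that case it carries $\sigma_{su}\,\sigma_{vt}$ of those paths. This sketch is meant for checking the formula by hand, not as an efficient implementation. The names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, s):
    """Distances from s and sigma[u], the number of shortest s-u paths."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:           # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def edge_betweenness(nodes, edge_list):
    """C_B(e(v_i, v_j)) = sum over pairs s < t of sigma_st(e) / sigma_st."""
    adj = {v: set() for v in nodes}
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    info = {s: bfs_counts(adj, s) for s in nodes}
    cb = {}
    for i, j in edge_list:
        i, j = min(i, j), max(i, j)  # stored under i < j; C_B(e(v_j, v_i)) is the same value
        total = 0.0
        for s, t in combinations(sorted(nodes), 2):
            dist_s, sig_s = info[s]
            dist_t, sig_t = info[t]  # sig_t[v] = number of shortest v-t paths (undirected)
            if t not in dist_s:
                continue  # s and t are disconnected: no shortest paths at all
            through = 0
            for u, v in ((i, j), (j, i)):  # traverse the edge as s ... u -> v ... t
                if u in dist_s and v in dist_t and dist_s[u] + 1 + dist_t[v] == dist_s[t]:
                    through += sig_s[u] * sig_t[v]
            total += through / sig_s[t]
        cb[(i, j)] = total
    return cb


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
cb = edge_betweenness(range(6), edges)
print(cb[(2, 3)], cb[(0, 2)], cb[(0, 1)])
# On a 4-cycle, opposite corners have two shortest paths, so each edge collects halves
square = edge_betweenness(range(4), [(0, 1), (1, 2), (2, 3), (3, 0)])
print(square)
```

In the toy graph the bridge (2, 3) scores highest, because all 9 pairs split across it must use it. In the 4-cycle every edge scores 2, made up of its own pair plus two half-paths.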
Divisive Clustering
– An edge with high betweenness tends to be a bridge between two communities
– The method therefore progressively removes the edge with the highest betweenness.
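Here is a sketch of this divisive loop, in the style of Girvan and Newman, for an unweighted graph. To keep each recomputation cheap, it replaces the pair-by-pair formula with Brandes-style dependency accumulation. That gives the same edge betweenness values with one breadth-first search per node. The function names and the toy graph are hypothetical, and only the first split is shown.

```python
from collections import deque


def edge_betweenness(nodes, edges):
    """Edge betweenness of an unweighted, undirected graph via Brandes-style accumulation."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    eb = {e: 0.0 for e in edges}
    for s in nodes:
        # BFS from s: distances, shortest-path counts, predecessors, visiting order
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[u] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # walk back from the farthest nodes, pushing dependencies to predecessors
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[(v, w) if (v, w) in eb else (w, v)] += share
                delta[v] += share
    # each unordered pair {s, t} was counted once from s and once from t
    return {e: value / 2 for e, value in eb.items()}


def components(nodes, edges):
    """Connected components as a list of sets, ordered by smallest node."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for s in sorted(nodes):
        if s not in seen:
            comp, stack = {s}, [s]
            while stack:
                for w in adj[stack.pop()] - comp:
                    comp.add(w)
                    stack.append(w)
            seen |= comp
            comps.append(comp)
    return comps


def girvan_newman_split(nodes, edges):
    """Remove highest-betweenness edges until the number of components grows."""
    edges = list(edges)
    start = len(components(nodes, edges))
    while edges:
        eb = edge_betweenness(nodes, edges)  # recomputed after every single removal
        edges.remove(max(edges, key=eb.get))
        comps = components(nodes, edges)
        if len(comps) > start:
            return comps, edges
    return components(nodes, edges), edges


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(edge_betweenness(range(6), edges)[(2, 3)])
communities, remaining = girvan_newman_split(range(6), edges)
print(communities)
```

Calling `girvan_newman_split` again on each returned component gives the next level of the hierarchy.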
Divisive Clustering
– “Finding and evaluating community structure in networks,” M. Newman and M. Girvan, Physical Review E, 2004
• Example
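– Drawbacks of divisive clustering
• The edge-betweenness-based scheme is computationally expensive
• Each edge removal requires recomputing the betweenness of all remaining edges (one full recomputation costs $O(mn)$ for a network with $n$ nodes and $m$ edges, so removing all $m$ edges costs $O(m^2 n)$ in total)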
Agglomerative Clustering
– begins with base (singleton) communities
– merges them into larger communities according to a certain criterion.
• One example criterion: modularity
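Let $e_{ij}$ be the fraction of edges in the network that connect nodes in community $i$ to those in community $j$ (for $i \neq j$, half of each such edge is counted in $e_{ij}$ and half in $e_{ji}$, so the matrix is symmetric and its entries sum to 1)
Let $a_i = \sum_j e_{ij}$; then the modularity is $Q = \sum_i \left(e_{ii} - a_i^2\right)$
$e_{ii}$: fraction of edges inside community $i$; $a_i^2$: fraction of edges that would fall inside community $i$ if edges were placed at random with the same $a_i$
Values approaching $Q = 1$ indicate networks with strong community structure
Values for real networks typically fall in the range from 0.3 to 0.7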
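This is a direct transcription of the formula. It assumes communities are given as a node-to-label map and edges as an undirected edge list; both inputs are hypothetical. Each edge between communities i and j is counted half in $e_{ij}$ and half in $e_{ji}$ (the Newman and Girvan convention), so the matrix is symmetric and its entries sum to 1.

```python
def mixing_matrix(edges, community):
    """Symmetric matrix e: e[i][j] is half the fraction of edges joining communities i and j."""
    labels = sorted(set(community.values()))
    index = {c: k for k, c in enumerate(labels)}
    m = len(edges)
    e = [[0.0] * len(labels) for _ in labels]
    for u, v in edges:
        i, j = index[community[u]], index[community[v]]
        e[i][j] += 0.5 / m
        e[j][i] += 0.5 / m  # for i == j this adds the second half, giving 1/m in total
    return e


def modularity(e):
    """Q = sum_i (e_ii - a_i^2) with a_i = sum_j e_ij."""
    a = [sum(row) for row in e]
    return sum(e[i][i] - a[i] ** 2 for i in range(len(e)))


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_groups = {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2}
one_group = {v: 1 for v in range(6)}
print(modularity(mixing_matrix(edges, two_groups)))
print(modularity(mixing_matrix(edges, one_group)))
```

Putting every node into one community always gives $Q = 0$, because then $e_{11} = a_1 = 1$. The two-triangle split gives $Q = 5/14$.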
Agglomerative Clustering
– At each step, the two communities whose merge gives the largest increase in overall modularity are merged
– Merging continues until no merge can be found that improves the modularity.
Figure: dendrogram produced by agglomerative clustering based on modularity
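Below is a brute-force sketch of this greedy procedure, in the spirit of Newman's fast algorithm. With the half-counted $e_{ij}$ convention, merging communities i and j changes Q by $\Delta Q = 2(e_{ij} - a_i a_j)$. At each step the linked pair with the largest $\Delta Q$ is merged, and the loop stops when no gain is positive. For clarity, the sketch recomputes $e$ and $a$ from scratch at every step instead of maintaining them incrementally. The names and the toy graph are hypothetical.

```python
def greedy_modularity(nodes, edges):
    """Start from singletons; repeatedly merge the linked pair with the largest modularity gain."""
    m = len(edges)
    label = {v: v for v in nodes}  # node -> community id (initially every node alone)
    merges = []                    # dendrogram, bottom-up: (absorbed id, kept id, delta Q)
    while True:
        a = {c: 0.0 for c in set(label.values())}
        e = {}                     # (i, j) with i < j  ->  e_ij (= e_ji)
        for u, v in edges:
            i, j = sorted((label[u], label[v]))
            a[i] += 0.5 / m
            a[j] += 0.5 / m
            if i != j:
                e[(i, j)] = e.get((i, j), 0.0) + 0.5 / m
        # delta Q of merging i and j is 2 * (e_ij - a_i * a_j); unlinked pairs
        # (e_ij = 0) can only lower Q, so only pairs joined by an edge are considered
        gains = {pair: 2 * (eij - a[pair[0]] * a[pair[1]]) for pair, eij in e.items()}
        if not gains:
            break
        (i, j), dq = max(sorted(gains.items()), key=lambda item: item[1])
        if dq <= 0:
            break                  # no merge improves modularity any more
        for v in label:
            if label[v] == j:
                label[v] = i
        merges.append((j, i, dq))
    return label, merges


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
label, merges = greedy_modularity(range(6), edges)
print(label)
for absorbed, kept, dq in merges:
    print(f"merge {absorbed} into {kept}: delta Q = {dq:.3f}")
```

The recorded `merges` list is exactly the bottom-up order that a dendrogram draws.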
Agglomerative Clustering
– In the dendrogram, the circles at the bottom represent the individual
nodes of the network.
– As we move up the tree, the nodes join together to form larger and
larger communities, as indicated by the lines, until we reach the top,
where all are joined together in a single community.
– Alternatively, the dendrogram depicts an initially connected network
splitting into smaller and smaller communities as we go from top to
bottom.
– A cross section of the tree at any level, such as the one indicated by a dotted line, will give the communities at that level.
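Taking a cross section of the dendrogram amounts to replaying the merges bottom-up and stopping after the first k of them. The sketch below does this with a minimal union-find. The merge list is a hypothetical dendrogram for two triangles {0, 1, 2} and {3, 4, 5} joined by one edge. Each pair (x, y) means "merge the community containing x into the one containing y".

```python
def cut_dendrogram(nodes, merges, k):
    """Communities obtained by applying only the first k merges of a bottom-up dendrogram."""
    parent = {v: v for v in nodes}

    def find(v):
        # follow parent links up to the representative of v's community
        while parent[v] != v:
            v = parent[v]
        return v

    for x, y in merges[:k]:
        parent[find(x)] = find(y)
    groups = {}
    for v in sorted(nodes):
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=min)


# Hypothetical dendrogram: each pair merges the community of x into that of y
merges = [(1, 0), (2, 0), (5, 4), (4, 3), (3, 0)]
for k in range(len(merges) + 1):
    print(k, cut_dendrogram(range(6), merges, k))
```

A cut at k = 0 gives the singleton leaves and a cut after all merges gives the single top community. The two triangles appear after the fourth merge.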
Divisive vs. Agglomerative Clustering
– Zachary's karate club study
Zachary observed 34 members of a karate club
over a period of two years. During the course
of the study, a disagreement developed
between the administrator (34) of the club and
the club's instructor (1), which ultimately
resulted in the instructor's leaving and starting
a new club, taking about half of the original
club's members with him
Divisive vs. Agglomerative Clustering
– Divisive
• “Community structure in social and biological networks”, Michelle Girvan and M. E. J. Newman, 2001, using edge betweenness
– Agglomerative
• “Fast algorithm for detecting community structure in networks”, M. E. J. Newman, 2003, using modularity
Figure panels: Divisive | Agglomerative
Summary of Community Detection
Node-Centric Community Detection
– cliques, k-cliques, k-clubs
Group-Centric Community Detection
– quasi-cliques
Network-Centric Community Detection
– Clustering based on vertex similarity
– Latent space models, block models, spectral clustering, modularity maximization
Hierarchy-Centric Community Detection
– Divisive clustering
– Agglomerative clustering
3.5 Community Evaluation
Here, we consider a “Social Network with Ground Truth”
– Community membership for each actor is known (an ideal case)
– For example,
• Synthetic networks generated from predefined community structures
L. Tang and H. Liu. “Graph mining applications to social network analysis.” In C. Aggarwal and H. Wang, editors, Managing and Mining Graph Data, chapter 16, pages 487–513. Springer, 2010.
• Some well-studied tiny networks like Zachary’s karate club with 34 members
M. Newman. “Modularity and community structure in networks.” PNAS, 103(23):8577–8582, 2006.
Simple comparison between the ground truth and the identified community structure
– Visualization
– One-to-one mapping
The number of communities after clustering can differ from the number in the ground truth
There may be no clear correspondence between the communities of the clustering result and those of the ground truth
How to measure the clustering quality?
Each number denotes a node, and each circle or block denotes a community
1) Both communities {1, 3} and {2} map to the community {1, 2, 3} in the ground truth
2) Node 2 is wrongly assigned
Normalized Mutual Information (NMI) can be used
Entropy
– A measure of the uncertainty of a random variable
– Measure of disorder
– The information volume contained in a random variable X (or in a distribution X)
$$H(X) = -\sum_{x \in X} p(x) \log_b p(x)$$
• The entropy of X is the sum, over all possible outcomes x of X, of the probability of x multiplied by the logarithm of the reciprocal of that probability.
• The base b is usually 2, Euler's number e, or 10. With b = 2 the unit of entropy is the bit, with b = e the nat, and with b = 10 the digit.
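Here is the formula in code, as a minimal sketch. `entropy` takes a list of probabilities and a base b. The helper `partition_entropy` has a hypothetical name; it looks ahead to comparing partitions later in this section. It treats the community label of a randomly chosen node as the random variable.

```python
import math
from collections import Counter


def entropy(probs, base=2):
    """H(X) = -sum_x p(x) log_b p(x); outcomes with p(x) = 0 contribute nothing."""
    return sum(-p * math.log(p, base) for p in probs if p > 0)


def partition_entropy(labels, base=2):
    """Entropy of the label of a randomly chosen node (p = community size / n)."""
    n = len(labels)
    return entropy([count / n for count in Counter(labels).values()], base)


print(entropy([0.5, 0.5]))           # base 2: bits
print(entropy([0.5, 0.5], math.e))   # base e: nats
print(entropy([0.5, 0.5], 10))       # base 10: digits
print(partition_entropy([1, 1, 1, 2, 2, 2]))
```

Changing the base only rescales the value by a constant factor. For example, 1 bit equals ln 2 nats.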
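Entropy and coin tossing [from Wikipedia]
– Consider tossing a coin for which heads and tails are equally likely. There are only two outcomes, H and T, and they are equally likely, so the entropy is 1 (bit):
$$H(X) = -\sum_{x \in X} p(x) \log_2 p(x) = -\left(\frac{1}{2} \log_2 \frac{1}{2} + \frac{1}{2} \log_2 \frac{1}{2}\right) = 1$$
– For an unfair coin, one face comes up with higher probability, so the entropy is less than 1. Because we can predict the outcome correctly more often, the amount of information, i.e., the entropy, is smaller. For coin tossing, the coin with probability 1/2 for both heads and tails has the largest entropy.
– Entropy can be regarded as the same concept as uncertainty.
– The higher the uncertainty, the more information an outcome carries and the larger the entropy.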
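The coin-toss argument can be checked numerically. The sketch below computes the entropy, in bits, of a coin with heads probability p. Entropy is largest, exactly 1 bit, for the fair coin. It shrinks as the coin becomes more predictable and reaches 0 when the outcome is certain.

```python
import math


def coin_entropy(p_heads):
    """Entropy in bits of a coin that lands heads with probability p_heads."""
    return sum(-q * math.log2(q) for q in (p_heads, 1.0 - p_heads) if q > 0)


for p in (0.5, 0.6, 0.7, 0.9, 0.99, 1.0):
    print(f"P(heads) = {p:<4}  H = {coin_entropy(p):.3f} bits")
```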
Mutual Information
– It measures the volume of information shared by two random variables (or two distributions) X and Y, i.e., how closely X and Y are related, or how much they depend on each other
– Korean-language references
• http://shineware.tistory.com/7
• http://www.dbpia.co.kr/Journal/ArticleDetail/339089
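For discrete variables, the standard definition is $I(X;Y) = \sum_x \sum_y p(x,y) \log_b \frac{p(x,y)}{p(x)\,p(y)}$. The minimal sketch below estimates it from two label lists. Entry i of each list describes the same object i, for example the community of node i in two partitions. The function name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y, base=2):
    """I(X;Y) estimated from paired observations x[i], y[i] (empirical frequencies)."""
    n = len(x)
    joint = Counter(zip(x, y))        # n(x, y)
    px, py = Counter(x), Counter(y)   # n(x), n(y)
    # p(x,y) / (p(x) p(y)) simplifies to n(x,y) * n / (n(x) * n(y))
    return sum((nxy / n) * math.log(nxy * n / (px[lx] * py[ly]), base)
               for (lx, ly), nxy in joint.items())


print(mutual_information([1, 1, 1, 2, 2, 2], [1, 2, 1, 3, 3, 3]))
print(mutual_information([1, 1, 2, 2], [1, 2, 1, 2]))
```

The two partitions from the NMI example of this section share 1 bit of information. The second pair of label lists is independent, so it shares none.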
Normalized Mutual Information (NMI)
– It measures how much information two random variables (or two distributions) X and Y share, i.e., how closely they are related, normalized by their entropies
– Its value lies between 0 and 1 (1 when the two partitions are identical up to relabeling)
Considering a partition as a random variable (the community label of a randomly chosen node), we can compute the matching quality between the ground truth and the identified clustering
NMI Example (1/2)
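– Partition a ($\pi_a$): [1, 1, 1, 2, 2, 2], i.e., communities {1, 2, 3} and {4, 5, 6}
– Partition b ($\pi_b$): [1, 2, 1, 3, 3, 3], i.e., communities {1, 3}, {2} and {4, 5, 6}
– The i-th entry of each list is the community label of node i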
NMI Example (2/2)
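– Partition a ($\pi_a$): [1, 1, 1, 2, 2, 2]
– Partition b ($\pi_b$): [1, 2, 1, 3, 3, 3]
– With $n = 6$ nodes, $n_h^a$ the size of community $h$ in $\pi_a$, $n_l^b$ the size of community $l$ in $\pi_b$, and $n_{h,l}$ the number of nodes in both:
$$NMI(\pi_a, \pi_b) = \frac{\sum_{h}\sum_{l} n_{h,l} \log \frac{n \cdot n_{h,l}}{n_h^a\, n_l^b}}{\sqrt{\left(\sum_h n_h^a \log \frac{n_h^a}{n}\right)\left(\sum_l n_l^b \log \frac{n_l^b}{n}\right)}}$$
– Here $n_1^a = n_2^a = 3$; $n_1^b = 2$, $n_2^b = 1$, $n_3^b = 3$; and the nonzero overlaps are $n_{1,1} = 2$, $n_{1,2} = 1$, $n_{2,3} = 3$
– Result: $NMI = 0.8278$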
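The value 0.8278 can be reproduced with a short sketch. It computes the count-based NMI: the mutual information of the two label lists divided by the geometric mean of the two partition entropies, built from the counts $n_h^a$, $n_l^b$ and $n_{h,l}$. Natural logarithms are used, since the base cancels in the ratio. The function name `nmi` is hypothetical.

```python
import math
from collections import Counter


def nmi(pi_a, pi_b):
    """NMI of two partitions given as label lists (entry i = community of node i).

    Undefined (division by zero) if either partition puts all nodes in one community.
    """
    n = len(pi_a)
    n_a, n_b = Counter(pi_a), Counter(pi_b)  # community sizes n_h^a and n_l^b
    n_ab = Counter(zip(pi_a, pi_b))          # nonzero overlaps n_{h,l}
    num = sum(c * math.log(n * c / (n_a[h] * n_b[l])) for (h, l), c in n_ab.items())
    den_a = sum(c * math.log(c / n) for c in n_a.values())  # both sums are <= 0,
    den_b = sum(c * math.log(c / n) for c in n_b.values())  # so their product is >= 0
    return num / math.sqrt(den_a * den_b)


pi_a = [1, 1, 1, 2, 2, 2]
pi_b = [1, 2, 1, 3, 3, 3]
print(round(nmi(pi_a, pi_b), 4))
print(nmi(pi_a, [7, 7, 7, 9, 9, 9]))  # same partition, different label names
```

Only the grouping matters, not the label names. That is why a relabelled copy of a partition scores 1.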
Accuracy of Pairwise Community Memberships
– Consider all the possible pairs of nodes and check whether they reside
in the same community
– An error occurs if
• Two nodes belonging to the same community are assigned to different
communities after clustering
• Two nodes belonging to different communities are assigned to the same community after clustering
– Construct a contingency table
Accuracy of Pairwise Community Memberships
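– Ground truth: {1, 2, 3}, {4, 5, 6}
– Clustering result: {1, 3}, {2}, {4, 5, 6}
– Contingency table over all 15 node pairs:
• a = 4: same community in both the ground truth and the clustering result: (1, 3), (4, 5), (4, 6), (5, 6)
• b = 0: different communities in the ground truth, same community in the clustering result
• c = 2: same community in the ground truth, different communities in the clustering result: (1, 2), (2, 3)
• d = 9: different communities in both
Accuracy = (a + d) / (a + b + c + d) = (4 + 9) / (4 + 2 + 9 + 0) = 13/15 ≈ 0.87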
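The pair counting takes only a few lines. Each node pair is classified by whether the two nodes share a community in the ground truth and in the clustering result. That gives the four cells of the contingency table: a (same in both), b (different in truth, same in clustering), c (same in truth, different in clustering) and d (different in both). Using label lists as inputs and the function name are assumptions of this sketch.

```python
from itertools import combinations


def pair_counts(truth, pred):
    """Contingency counts (a, b, c, d) over all node pairs; entry i = community of node i."""
    a = b = c = d = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            a += 1   # correctly kept together
        elif same_pred:
            b += 1   # wrongly put together
        elif same_truth:
            c += 1   # wrongly split apart
        else:
            d += 1   # correctly kept apart
    return a, b, c, d


truth = [1, 1, 1, 2, 2, 2]   # {1, 2, 3}, {4, 5, 6}
pred = [1, 2, 1, 3, 3, 3]    # {1, 3}, {2}, {4, 5, 6}
a, b, c, d = pair_counts(truth, pred)
print(a, b, c, d, (a + d) / (a + b + c + d))
```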
Accuracy of Pairwise Community Memberships
– Balanced Accuracy (BAC) = 1 – Balanced Error Rate (BER)
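• $BAC = \frac{1}{2}\left(\frac{a}{a+c} + \frac{d}{b+d}\right) = 1 - BER$
• $BER = \frac{1}{2}\left(\frac{c}{a+c} + \frac{b}{b+d}\right)$
• This measure assigns equal importance to “false positives” and “false negatives”, so that trivial or random predictions incur an error of 0.5 on average (e.g., putting all nodes into one community gives c = d = 0, hence BER = 1/2 (0 + 1) = 0.5).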
– For the example above: $BAC = \frac{1}{2}\left(\frac{4}{6} + \frac{9}{9}\right) \approx 0.83$
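This minimal sketch computes both measures from the four contingency counts. It assumes the cell naming a (same in both), b (different in truth, same in clustering), c (same in truth, different in clustering) and d (different in both). The second call illustrates the trivial prediction mentioned above. Putting all six nodes of the example into one community makes every pair "same", so a = 6, b = 9 and c = d = 0.

```python
def balanced_scores(a, b, c, d):
    """Return (BAC, BER) from the pairwise contingency counts a, b, c, d."""
    ber = 0.5 * (c / (a + c) + b / (b + d))
    return 1.0 - ber, ber


# counts from the pairwise example: a = 4, b = 0, c = 2, d = 9
print(balanced_scores(4, 0, 2, 9))
# trivial prediction for the same ground truth: all six nodes in one community
print(balanced_scores(6, 9, 0, 0))
```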
Evaluation without Ground Truth
– This is the most common situation
– Quantitative evaluation functions: modularity
• Once we have a network partition, we can compute its modularity
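• The method whose partition has the higher modularity wins
• modularity
Let $e_{ij}$ be the fraction of edges in the network that connect nodes in community $i$ to those in community $j$ (for $i \neq j$, half of each such edge is counted in $e_{ij}$ and half in $e_{ji}$, so the matrix is symmetric and its entries sum to 1)
Let $a_i = \sum_j e_{ij}$; then the modularity is $Q = \sum_i \left(e_{ii} - a_i^2\right)$
$e_{ii}$: fraction of edges inside community $i$; $a_i^2$: fraction of edges that would fall inside community $i$ if edges were placed at random with the same $a_i$
Values approaching $Q = 1$ indicate networks with strong community structure
Values for real networks typically fall in the range from 0.3 to 0.7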
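Evaluating by modularity needs only the graph and each method's partition. This minimal sketch assumes an unweighted, undirected edge list and a node-to-community map. The graph and both partitions are hypothetical stand-ins for the outputs of two detection methods.

```python
from collections import defaultdict


def modularity(edges, label):
    """Q = sum_i (e_ii - a_i^2) for an unweighted, undirected edge list."""
    m = len(edges)
    inside = defaultdict(float)  # e_ii: fraction of edges with both ends in community i
    a = defaultdict(float)       # a_i: fraction of edge endpoints in community i
    for u, v in edges:
        a[label[u]] += 0.5 / m
        a[label[v]] += 0.5 / m
        if label[u] == label[v]:
            inside[label[u]] += 1.0 / m
    return sum(inside[c] - a[c] ** 2 for c in a)


# Hypothetical graph and two candidate partitions from two (unnamed) methods
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
method_1 = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
method_2 = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C", 5: "C"}
scores = {
    name: modularity(edges, partition)
    for name, partition in [("method_1", method_1), ("method_2", method_2)]
}
print(scores, "winner:", max(scores, key=scores.get))
```

The partition that keeps the two triangles intact scores $Q = 5/14 \approx 0.357$. It wins over the three-pair split, which scores $Q = 4/49 \approx 0.082$.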
Book Available at
• Morgan & Claypool Publishers
• Amazon
If you have any comments,
please feel free to
contact:
• Lei Tang, Yahoo! Labs,
[email protected]
• Huan Liu, ASU
[email protected]