Lecture 18
Community structures
Slides modified from Huan Liu, Lei Tang, Nitin Agarwal
Community Detection
A community is a set of nodes between which the
interactions are (relatively) frequent
a.k.a. group, subgroup, module, cluster
Community detection
a.k.a. grouping, clustering, finding cohesive subgroups
Given: a social network
Output: community membership of (some) actors
Applications
Understanding the interactions between people
Visualizing and navigating huge networks
Forming the basis for other tasks such as data mining
2
Visualization after Grouping
4 Groups:
{1,2,3,5}
{4,8,10,12}
{6,7,11}
{9,13}
(Nodes colored by
Community Membership)
3
Classification
User Preference or Behavior can be represented as
class labels
• Whether or not clicking on an ad
• Whether or not interested in certain topics
• Subscribed to certain political views
• Like/Dislike a product
Given
A social network
Labels of some actors in the network
Output
Labels of remaining actors in the network
4
Visualization after Prediction
: Smoking
: Non-Smoking
: ? Unknown
Predictions
6: Non-Smoking
7: Non-Smoking
8: Smoking
9: Non-Smoking
10: Smoking
5
Link Prediction
Given a social network, predict which nodes are likely to
get connected
Output a list of (ranked) pairs of nodes
Example: Friend recommendation in Facebook
(2, 3)
(4, 12)
(5, 7)
(7, 13)
6
Viral Marketing/Outbreak Detection
Users have different social capital (or network values)
within a social network, hence, how can one make best
use of this information?
Viral Marketing: find out a set of users to provide
coupons and promotions to influence other people in the
network so my benefit is maximized
Outbreak Detection: monitor a set of nodes that can help
detect outbreaks or interrupt the infection spreading
(e.g., H1N1 flu)
Goal: given a limited budget, how to maximize the
overall benefit?
7
An Example of Viral Marketing
Find the coverage of the whole network of nodes with
the minimum number of nodes
How to realize it – an example
Basic Greedy Selection: Select the node that maximizes the
utility, remove the node and then repeat
• Select Node 1
• Select Node 8
• Select Node 7
Node 7 is not a node with
high centrality!
8
PRINCIPLES OF
COMMUNITY DETECTION
Communities
Community: “subsets of actors among whom there are
relatively strong, direct, intense, frequent or positive
ties.”
-- Wasserman and Faust, Social Network Analysis, Methods and Applications
Community is a set of actors interacting with each other
frequently
A set of people without interaction is NOT a community
e.g. people waiting for a bus at station but don’t talk to each
other
10
Example of Communities
Communities from
Facebook
Communities from
Flickr
11
Community Detection
Community Detection: “formalize the strong social
groups based on the social network properties”
Some social media sites allow people to join groups
Not all sites provide community platform
Not all people join groups
Network interaction provides rich information about the
relationship between users
Is it necessary to extract groups based on network topology?
Groups are implicitly formed
Can complement other kinds of information
Provide basic information for other tasks
12
Subjectivity of Community Definition
A densely-knit
community
Each component is
a community
Definition of a community
can be subjective.
13
Taxonomy of Community Criteria
Criteria vary depending on the tasks
Roughly, community detection methods can be divided
into 4 categories (not exclusive):
Node-Centric Community
Each node in a group satisfies certain properties
Group-Centric Community
Consider the connections within a group as a whole. The group
has to satisfy certain properties without zooming into node-level
Network-Centric Community
Partition the whole network into several disjoint sets
Hierarchy-Centric Community
Construct a hierarchical structure of communities
14
Node-Centric Community Detection
NodeCentric
HierarchyCentric
Community
Detection
NetworkCentric
GroupCentric
Node-Centric Community Detection
Nodes satisfy different properties
Complete Mutuality
cliques
Reachability of members
k-clique, k-clan, k-club
Nodal degrees
k-plex, k-core
Relative frequency of Within-Outside Ties
LS sets, Lambda sets
Commonly used in traditional social network analysis
16
Complete Mutuality: Clique
A maximal complete subgraph of three or more nodes all
of which are adjacent to each other
NP-hard to find the maximal clique
Recursive pruning: To find a clique
of size k, remove those nodes with
less than k-1 degrees
Normally use cliques as a core or
seed to explore larger communities
17
Geodesic
Reachability is calibrated by the
Geodesic distance
Geodesic: a shortest path between
two nodes (12 and 6)
Two paths: 12-4-1-2-5-6, 12-10-6
12-10-6 is a geodesic
Geodesic distance: #hops in geodesic
between two nodes
e.g., d(12, 6) = 2, d(3, 11)=5
Diameter: the maximal geodesic
distance for any 2 nodes in a network
Diameter = 5
#hops of the longest shortest path
18
Reachability: k-clique, k-club
Any node in a group should be
reachable in k hops
k-clique: a maximal subgraph in which
the largest geodesic distance between
any nodes <= k
A k-clique can have diameter larger
than k within the subgraph
e.g., 2-clique {12, 4, 10, 1, 6}
Within the subgraph d(1, 6) = 3
k-club: a substructure of diameter <= k
e.g., {1,2,5,6,8,9}, {12, 4, 10, 1} are 2-clubs
19
Nodal Degrees: k-core, k-plex
Each node should have a certain number of connections
to nodes within the group
k-core: a substracture that each node connects to at
least k members within the group
k-plex: for a group with ns nodes, each node should
be adjacent no fewer than ns-k in the group
The definitions are complementary
A k-core is a (ns-k)-plex
20
Within-Outside Ties: LS sets
LS sets: Any of its proper subsets has more ties to other
nodes in the group than outside the group
Too strict, not reasonable for network analysis
A relaxed definition is Lambda sets
Require the computation of edge-connectivity between any
pair of nodes via minimum-cut, maximum-flow algorithm
21
Recap of Node-Centric Communities
Each node has to satisfy certain properties
Complete mutuality
Reachability
Nodal degrees
Within-Outside Ties
Limitations:
Too strict, but can be used as the core of a community
Not scalable, commonly used in network analysis with small-size
network
Sometimes not consistent with property of large-scale networks
e.g., nodal degrees for scale-free networks
22
Group-Centric Community Detection
NodeCentric
HierarchyCentric
Community
Detection
NetworkCentric
GroupCentric
Group-Centric Community Detection
Consider the connections within a group as whole,
Some nodes may have low connectivity
A subgraph with Vs nodes and Es edges is a γ-dense
quasi-clique if
Recursive pruning:
Sample a subgraph, find a maximal γ-dense quasi-clique
the resultant size = k
Remove the nodes that
whose degree < kγ
all their neighbors with degree < kγ
24
Network-Centric Community Detection
NodeCentric
HierarchyCentric
Community
Detection
NetworkCentric
GroupCentric
Network-Centric Community Detection
To form a group, we need to consider the connections of
the nodes globally.
Goal: partition the network into disjoint sets
Groups based on
Node Similarity
Latent Space Model
Block Model Approximation
Cut Minimization
Modularity Maximization
26
Node Similarity
Node similarity is defined by how similar their interaction
patterns are
Two nodes are structurally equivalent if they connect to
the same set of actors
e.g., nodes 8 and 9 are structurally equivalent
Groups are defined over equivalent nodes
Too strict
Rarely occur in a large-scale
Relaxed equivalence class is difficult to compute
In practice, use vector similarity
e.g., cosine similarity, Jaccard similarity
27
Vector Similarity
1
2
3
4
1
5
6
7
8
9
10
11 12 13
a vector
5
1
structurally
equivalent
8
1
1
1
9
1
1
1
Cosine Similarity:
sim (5,8)
1
1
2 3
6
Jaccard Similarity:
J (5,8) |{1, 2|{,66},|13}| 1 / 4
28
Clustering based on Node Similarity
For practical use with huge networks:
Consider the connections as features
Use Cosine or Jaccard similarity to compute vertex similarity
Apply classical k-means clustering Algorithm
K-means Clustering Algorithm
Each cluster is associated with a centroid (center point)
Each node is assigned to the cluster with the closest centroid
29
Illustration of k-means clustering
Iteration 1
Iteration 2
Iteration 3
2.5
2.5
2.5
2
2
2
1.5
1.5
1.5
y
3
y
3
y
3
1
1
1
0.5
0.5
0.5
0
0
0
-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
-2
-1.5
-1
-0.5
x
0
0.5
1
1.5
2
-2
Iteration 4
Iteration 5
2.5
2
2
2
1.5
1.5
1.5
1
1
1
0.5
0.5
0.5
0
0
0
-0.5
0
x
0.5
1
1.5
2
0
0.5
1
1.5
2
1
1.5
2
y
2.5
y
2.5
y
3
-1
-0.5
Iteration 6
3
-1.5
-1
x
3
-2
-1.5
x
-2
-1.5
-1
-0.5
0
x
0.5
1
1.5
2
-2
-1.5
-1
-0.5
0
0.5
x
30
Groups on Latent-Space Models
Latent-space models: Transform the nodes in a network into a
lower-dimensional space such that the distance or similarity between
nodes are kept in the Euclidean space
Multidimensional Scaling (MDS)
Given a network, construct a proximity matrix to denote the distance between
nodes (e.g. geodesic distance)
Let D denotes the square distance between nodes
S R nk denotes the coordinates in the lower-dimensional space
1
1
1
SS T ( I eeT ) D( I eeT ) ( D)
2
n
n
Objective: minimize the difference
Let
min || ( D) SS T || F
(the top-k eigenvalues of
), V the top-k eigenvectors
Solution:
Apply k-means to S to obtain clusters
33
MDS-example
1, 2, 3, 4,
10, 12
5, 6, 7, 8,
9, 11, 13
k-means
Geodesic Distance Matrix
1
2
3
4
5
6
7
8
9
10
1
0
1
1
1
2
2
3
1
1
2
2
1
0
2
2
1
2
3
2
2
3
11
12
13
4
2
2
4
3
3
3
1
2
0
2
3
3
4
2
2
3
5
3
3
4
1
2
2
0
3
2
3
2
2
1
5
2
1
3
3
0
1
2
2
2
2
6
2
2
3
2
1
0
1
1
1
1
7
3
3
4
3
2
1
0
2
2
2
8
1
2
2
2
2
1
2
0
2
2
9 10 11 12 13
1 2 4 2 2
2 3 4 3 3
2 3 5 3 3
2 1 4 1 3
2 2 3 3 3
1 1 2 2 2
2 2 1 3 3
2 2 3 3 1
0 2 3 3 1
2 0 3 1 3
4
1
3
3
3
3
2
2
2
1
3
3
3
3
1
3
3
1
3
1
3
0
4
4
4
0
4
4
4
0
MDS
S
-1.22
-0.88
-2.12
-1.01
0.43
0.78
1.81
-0.09
-0.09
0.30
2.85
-0.47
-0.29
-0.12
-0.39
-0.29
1.07
-0.28
0.04
0.02
-0.77
-0.77
1.18
0.00
2.13
-1.81
34
Block-Model Approximation
After
Reordering
Network Interaction Matrix
Block Structure
Objective: Minimize the difference between an interaction
matrix and a block structure
S is a
community
indicator matrix
Challenge: S is discrete, difficult to solve
Relaxation: Allow S to be continuous satisfying
Solution: the top eigenvectors of A
Post-Processing: Apply k-means to S to find the partition
35
Cut-Minimization
Between-group interactions should be infrequent
Cut: number of edges between two sets of nodes
Objective: minimize the cut
Limitations: often find communities of
only one node
Need to consider the group size
Two commonly-used variants:
Cut=2
Number of nodes
in a community
Cut =1
Number of
within-group
Interactions
36
Modularity Maximization
Modularity measures the group interactions compared
with the expected random connections in the group
In a network with m edges, for two nodes with degree di
and dj , expected random connections between them are
The interaction utility in a group:
To partition the group into
multiple groups, we maximize
Expected Number of
edges between 6 and 9
is
5*3/(2*17) = 15/34
38
Properties of Modularity
Properties of modularity:
Between (-1, 1)
Modularity = 0 If all nodes are clustered into one group
Can automatically determine optimal number of clusters
Resolution limit of modularity
Modularity maximization might return a community consisting
multiple small modules
40
Recap of Network-Centric Community
Network-Centric Community Detection
Groups based on
Node Similarity
Latent Space Models
Cut Minimization
Block-Model Approximation
Modularity maximization
Goal: Partition network nodes into several disjoint sets
Limitation: Require the user to specify the number of
communities beforehand
42
Hierarchy-Centric Community Detection
NodeCentric
HierarchyCentric
Community
Detection
NetworkCentric
GroupCentric
Hierarchy-Centric Community Detection
Goal: Build a hierarchical structure of communities
based on network topology
Facilitate the analysis at different resolutions
Representative Approaches:
Divisive Hierarchical Clustering
Agglomerative Hierarchical Clustering
44
Divisive Hierarchical Clustering
Divisive Hierarchical Clustering
Partition the nodes into several sets
Each set is further partitioned into smaller sets
Network-centric methods can be applied for partition
One particular example is based on edge-betweenness
Edge-Betweenness: Number of shortest paths between any pair of nodes that
pass through the edge
Between-group edges tend to have larger edge-betweenness
45
Divisive clustering on Edge-Betweenness
Progressively remove edges with the highest
3
betweenness
3
3
5
Remove e(2,4), e(3, 5)
5
Remove e(4,6), e(5,6)
Remove e(1,2), e(2,3), e(3,1)
4
4
root
V1,v2,v3
v1
v2
V4, v5, v6
v3
v4
v5
v6
46
Agglomerative Hierarchical Clustering
Initialize each node as a community
Choose two communities satisfying certain criteria and
merge them into larger ones
Maximum Modularity Increase
Maximum Node Similarity
root
V4, v5, v6
V1, v2, v3
v3
V1,v2
v1
v2
V1,v2
v4
(Based on Jaccard Similarity)
v5
v6
47
Recap of Hierarchical Clustering
Most hierarchical clustering algorithm output a binary
tree
Each node has two children nodes
Might be highly imbalanced
Agglomerative clustering can be very sensitive to the
nodes processing order and merging criteria adopted.
Divisive clustering is more stable, but generally more
computationally expensive
48
Summary of Community Detection
The Optimal Method?
It varies depending on applications, networks,
computational resources etc.
NodeCentric
Hierarchy
-Centric
Other lines of research
Communities in directed networks
Overlapping communities
Community evolution
Group profiling and interpretation
Community
Detection
GroupCentric
NetworkCentric
49
© Copyright 2026 Paperzz