Overcoming Resolution Limits in MDL Community Detection
L. Karl Branting
The MITRE Corporation
2nd SNA-KDD Workshop 24 Aug 2008
Outline
Utility functions in community detection
Resolution limits
MDL-based community detection
– Previous: RB and AP
– New: SGE
Experimental Evaluation
Lessons
Utility functions in community detection
Two components of community detection algorithms
– Utility function – quality criterion to be optimized
– Search strategy – procedure for finding optimal partition
Examples
– Girvan & Newman (2003)
Utility function: modularity
Search strategy: greedy divisive hierarchical clustering (iteratively remove highest-betweenness edge)
– Newman (2003)
Utility function: modularity
Search strategy: greedy agglomerative hierarchical clustering (iteratively choose highest-modularity merge)
– Tasgin & Bingol (2006)
Utility function: modularity
Search strategy: genetic algorithm
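The two components can be kept separate in code. The sketch below is a minimal illustration, assuming an unweighted, undirected edge list without self-loops, of the Girvan & Newman pattern: the search strategy repeatedly removes the edge of highest betweenness (recomputed after every removal with Brandes-style accumulation), and the utility function scores each new split. Modularity is used as the utility here, but any function with the same `(edges, groups)` signature could be swapped in; all function names are illustrative, not taken from the paper.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Edge betweenness by Brandes-style accumulation (unweighted, undirected).

    Each edge is credited from both endpoints' searches, so scores are doubled;
    only their ranking matters here.
    """
    score = defaultdict(float)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma, dist = dict.fromkeys(adj, 0), dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS from s, counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # farthest vertices first
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return score


def components(adj):
    """Connected components of the current (partly dismantled) graph."""
    seen, parts = set(), []
    for start in adj:
        if start not in seen:
            part, stack = set(), [start]
            while stack:
                v = stack.pop()
                if v not in part:
                    part.add(v)
                    stack.extend(adj[v] - part)
            seen |= part
            parts.append(part)
    return parts


def modularity(edges, groups):
    """Utility: Q = sum_i [internal_i / l - (degree_i / 2l)^2] on the original graph."""
    l = len(edges)
    label = {v: i for i, g in enumerate(groups) for v in g}
    internal, degree = [0] * len(groups), [0] * len(groups)
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(w / l - (d / (2 * l)) ** 2 for w, d in zip(internal, degree))


def divisive_clustering(edges, utility=modularity):
    """Search strategy: remove highest-betweenness edges; keep the best-scoring split."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    best = components(adj)
    best_score = utility(edges, best)
    n_parts = len(best)
    while any(adj.values()):
        eb = edge_betweenness(adj)
        u, v = tuple(max(eb, key=eb.get))
        adj[u].discard(v)
        adj[v].discard(u)
        parts = components(adj)
        if len(parts) > n_parts:  # score only when a new group splits off
            n_parts = len(parts)
            score = utility(edges, parts)
            if score > best_score:
                best, best_score = parts, score
    return best, best_score
```

Passing a different `utility` changes only the quality criterion and leaves the search strategy untouched, which is the separation this slide draws.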
Utility functions in community detection
Other search strategies used with modularity
– Rattigan, Maier, Jensen (2007)
Utility function: modularity
Search strategy: greedy divisive hierarchical clustering using a Network Structure Index to approximate edge betweenness
– Donetti & Muñoz (2004)
Utility function: modularity
Search strategy: greedy agglomerative hierarchical clustering with spectral division
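The greedy agglomerative strategy (as in Newman 2003) can be sketched as follows, assuming an unweighted, undirected edge list without self-loops and with comparable node ids. It starts from singleton groups and repeatedly applies the merge of two connected groups that most increases modularity; merging groups g and h changes Q by l_gh/l − d_g·d_h/(2l²), where l_gh is the number of edges between them and d is a group's total degree. The function name is hypothetical, and the sketch omits both the spectral step of Donetti & Muñoz and the efficient data structures of published implementations.

```python
from collections import defaultdict


def greedy_agglomerative(edges):
    """Greedy modularity merging; returns (best partition seen, its modularity)."""
    l = len(edges)
    degree = defaultdict(int)   # group id -> total degree of its vertices
    between = defaultdict(int)  # frozenset({g, h}) -> number of edges between g and h
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        between[frozenset((u, v))] += 1
    members = {v: {v} for v in degree}  # start from singleton groups
    q = -sum((d / (2 * l)) ** 2 for d in degree.values())  # Q of the singleton partition
    best, best_q = [set(g) for g in members.values()], q

    def gain(pair):
        # change in modularity if the two groups in `pair` were merged
        g, h = tuple(pair)
        return between[pair] / l - degree[g] * degree[h] / (2 * l * l)

    while between:  # only groups that share at least one edge are merge candidates
        pair = max(between, key=gain)
        q += gain(pair)
        keep, gone = sorted(pair)
        del between[pair]
        members[keep] |= members.pop(gone)
        degree[keep] += degree.pop(gone)
        for key in [k for k in between if gone in k]:  # re-point gone's edges to keep
            (other,) = key - {gone}
            count = between.pop(key)
            between[frozenset((keep, other))] += count
        if q > best_q:
            best, best_q = [set(g) for g in members.values()], q
    return best, best_q
```

The full merge sequence is a dendrogram; keeping only the highest-Q level is what makes modularity the utility function and the merge order the search strategy.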
Utility functions in community detection
Statistical Approaches
– Zhang, Qiu, Giles, Foley, & Yen (2007)
Utility function: log-likelihood (LDA parameters)
Search strategy: fixed-point iteration
Compression-Based Approaches
– Rosvall & Bergstrom (2007)
Utility function: Minimum Description Length
Search strategy: simulated annealing
– Chakrabarti (2004)
Utility function: Minimum Description Length
Search strategy: exhaustive search for k, hill-climbing given k
Utility function implicit in search strategy
– Raghavan, Albert, & Kumara (2007) – marker passing
– Cliques, cores, etc.
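The marker-passing approach of Raghavan, Albert, & Kumara (2007) is label propagation: every vertex starts with a unique label and repeatedly adopts the label most common among its neighbours, and groups are the vertices that end up sharing a label. No utility function is evaluated; the stopping rule (every vertex already holds a most-frequent neighbouring label) is the only criterion. This is a minimal sketch assuming an unweighted, undirected edge list; `seed` and `max_rounds` are illustrative parameters, not values from the paper.

```python
import random
from collections import Counter, defaultdict


def label_propagation(edges, seed=0, max_rounds=100):
    rng = random.Random(seed)
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    nodes = list(adj)
    label = {v: v for v in nodes}  # every vertex starts with a unique label

    def best_labels(v):
        # labels carried by the largest number of v's neighbours
        counts = Counter(label[w] for w in adj[v])
        top = max(counts.values())
        return [lab for lab, c in counts.items() if c == top]

    for _ in range(max_rounds):
        rng.shuffle(nodes)  # asynchronous updates in random order
        for v in nodes:
            label[v] = rng.choice(best_labels(v))  # ties broken at random
        # stop once every vertex already carries a most-frequent neighbour label
        if all(label[v] in best_labels(v) for v in nodes):
            break
    groups = defaultdict(set)
    for v, lab in label.items():
        groups[lab].add(v)
    return list(groups.values())
```

Because update order and tie-breaking are random, different seeds can yield different partitions of the same graph.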
Modularity
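Q = \sum_{1 \le i \le m} \left[ \frac{w(D_{ii})}{l} - \left( \frac{l_i}{2l} \right)^2 \right]
– w(D_ii) = number of edges internal to group i
– l_i = number of edge endpoints in group i (total degree of its vertices)
– l = total number of edges
– m = number of groups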
Intuitive – expresses the intuition that the ratio of internal to external edges is greater for groups than for non-groups
Popular
Imperfect
– Fortunato & Barthelemy (2007): resolution limit; groups may be conflated if their number of internal edges is less than √(2l)
– Rosvall & Bergstrom (2007): biased towards same-sized groups
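The sketch below computes modularity in the standard form Q = Σ_i [w(D_ii)/l − (d_i/2l)²], taking d_i as the total degree of the vertices in group i, and evaluates it on the ring graph R_{15,4} (15 cliques of 4 nodes). It assumes exactly one edge between consecutive cliques; `ring_graph` is a hypothetical helper, and exact fractions avoid rounding in the comparison.

```python
from fractions import Fraction


def modularity(edges, groups):
    """Q = sum_i [w(D_ii)/l - (d_i / 2l)^2], d_i = total degree of group i."""
    l = len(edges)
    label = {v: i for i, g in enumerate(groups) for v in g}
    internal, degree = [0] * len(groups), [0] * len(groups)
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(Fraction(w, l) - Fraction(d, 2 * l) ** 2 for w, d in zip(internal, degree))


def ring_graph(m, c):
    """Hypothetical R_{m,c}: m cliques of c nodes, consecutive cliques joined by one edge."""
    edges = [(g * c + a, g * c + b) for g in range(m) for a in range(c) for b in range(a + 1, c)]
    edges += [(g * c + c - 1, ((g + 1) % m) * c) for g in range(m)]
    cliques = [set(range(g * c, (g + 1) * c)) for g in range(m)]
    return edges, cliques


edges, cliques = ring_graph(15, 4)
# Merge consecutive cliques pairwise; the 15th clique stays on its own.
paired = [cliques[i] | cliques[i + 1] for i in range(0, 14, 2)] + [cliques[14]]
print(modularity(edges, cliques))  # 83/105
print(modularity(edges, paired))   # 1252/1575
```

Merging the cliques pairwise raises Q from 83/105 ≈ 0.790 to 1252/1575 ≈ 0.795, so the true partition is not the modularity maximum.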
Resolution Limit
Ring graph R_{15,4}
– 15 communities
– 4 nodes per community
Community structure that maximizes modularity conflates groups
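A short derivation, in the spirit of the ring-of-cliques example of Fortunato & Barthelemy (2007), shows why the conflation appears. It assumes the standard modularity form (d = total degree of a group) and exactly one edge between consecutive cliques. Let k = c(c − 1)/2 be the number of edges inside each clique, and compare the partition into the m cliques with the partition into m/2 adjacent pairs (m even):

```latex
l = m(k+1), \qquad \frac{d_{\text{clique}}}{2l} = \frac{2(k+1)}{2m(k+1)} = \frac{1}{m}

Q_{\text{cliques}} = m\left[\frac{k}{l} - \frac{1}{m^{2}}\right] = \frac{k}{k+1} - \frac{1}{m}

Q_{\text{pairs}} = \frac{m}{2}\left[\frac{2k+1}{l} - \frac{4}{m^{2}}\right] = \frac{2k+1}{2(k+1)} - \frac{2}{m}

Q_{\text{pairs}} - Q_{\text{cliques}} = \frac{1}{2(k+1)} - \frac{1}{m} > 0 \iff m > 2(k+1) = c(c-1) + 2
```

This compares only two candidate partitions, so it gives a condition under which the true partition cannot be the modularity maximum; it does not predict what a particular search strategy returns. For c = 4 the threshold is m > 14, and the odd case R_{15,4} (seven pairs plus one leftover clique) also favours merging numerically.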
Approaches to modularity’s resolution limit
Apply recursively to large communities (Ruan & Zhang 2007)
Apply locally (Clauset 2005)
Choose a different utility function
Description Length
Utility of community structure is the sum of bits needed to represent
– Community structure +
– Graph given community structure
Search strategy attempts to minimize description length
There is no unique bit count
– Uncomputability of Kolmogorov complexity
Previous approaches
– Rosvall & Bergstrom (2007): RB
Handles group size skew better than modularity
– Chakrabarti (2004): AP
– Comparison
Similar breakdown of bits
Different calculation
Components of Description
Components (details in paper)
1. Bits to represent number of nodes in graph (ignored because not specific to community structure)
2. Bits to represent number of groups
3. Bits to represent mapping between nodes and groups
4. Bits needed for number of group-to-group edges
5. Bits needed for adjacencies between nodes
Purpose
– 2, 3, 4: represent group structure
– 1, 5: represent graph as a whole
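One plausible way to instantiate terms 3 to 5, loosely modeled on the block-model code of Rosvall & Bergstrom (2007), is shown below as an assumption for illustration, not as the exact RB, AP, or SGE formula (those are given in the paper); term 2 is omitted. Here n is the number of nodes, m the number of groups, l the number of edges, n_i the size of group i, and l_ij the number of edges between groups i and j. Term 4 spends log₂(l + 1) bits, enough to name any count from 0 to l, on an edge count for every unordered pair of groups, including each group with itself.

```latex
L = \underbrace{n \log_2 m}_{\text{term 3}}
  + \underbrace{\frac{m(m+1)}{2} \log_2 (l+1)}_{\text{term 4}}
  + \underbrace{\sum_{i=1}^{m} \log_2 \binom{n_i(n_i-1)/2}{l_{ii}}
  + \sum_{1 \le i < j \le m} \log_2 \binom{n_i n_j}{l_{ij}}}_{\text{term 5}}
```

Under this assumption term 4 alone contributes m(m + 1)/2 · log₂(l + 1) bits, which grows with the square of the number of groups regardless of how many group pairs actually share edges.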
Surprising Experimental Result
RB, AP, and modularity compared as utility functions
– Applied to ring graphs R_{m,c} for 4 ≤ m ≤ 16 and 3 ≤ c ≤ 9
– Search strategy: greedy divisive hierarchical clustering (iteratively remove highest-betweenness edge)
Unsurprising result. Modularity led to conflated groups for:
– m > 8 and c = 3
– m > 10 and c = 4
– m > 11 and c = 5
– m > 13 and c = 6, 7
Surprising result.
– Both RB and AP conflated at least one pair of groups in every R_{m,c}!
Hypothesis
Both RB and AP require at least one bit per pair of groups in term 4
Perhaps this overestimate causes group conflation
– Term 4 grows as the square of the number of groups
– If the graph is sparse, conflating groups may reduce term 4 by more than it increases term 5
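The hypothesis can be checked on R_{15,4} with a toy two-part code. This sketch assumes an RB-style encoding loosely modeled on Rosvall & Bergstrom (2007): term 3 = n·log₂ m bits for the node-to-group mapping, term 4 = log₂(l + 1) bits for an edge count per unordered group pair, and term 5 = log₂ of a binomial coefficient per group pair for which node pairs are adjacent; terms 1 and 2 are omitted. It is not the exact RB or AP formula, and `ring_graph` and `description_terms` are hypothetical helpers.

```python
from math import comb, log2


def ring_graph(m, c):
    """Hypothetical R_{m,c}: m cliques of c nodes joined in a ring by single edges."""
    edges = [(g * c + a, g * c + b) for g in range(m) for a in range(c) for b in range(a + 1, c)]
    edges += [(g * c + c - 1, ((g + 1) % m) * c) for g in range(m)]
    cliques = [set(range(g * c, (g + 1) * c)) for g in range(m)]
    return edges, cliques


def description_terms(edges, groups):
    """Return (term 3, term 4, term 5) in bits under the assumed RB-style code."""
    n, m, l = sum(len(g) for g in groups), len(groups), len(edges)
    label = {v: i for i, g in enumerate(groups) for v in g}
    between = {}  # (i, j) with i <= j -> number of edges between groups i and j
    for u, v in edges:
        key = tuple(sorted((label[u], label[v])))
        between[key] = between.get(key, 0) + 1
    term3 = n * log2(m)                     # which group each node belongs to
    term4 = m * (m + 1) / 2 * log2(l + 1)   # one edge count per unordered group pair
    term5 = 0.0                             # which node pairs are adjacent, given the counts
    for i in range(m):
        for j in range(i, m):
            size_i, size_j = len(groups[i]), len(groups[j])
            possible = size_i * (size_i - 1) // 2 if i == j else size_i * size_j
            term5 += log2(comb(possible, between.get((i, j), 0)))
    return term3, term4, term5


edges, cliques = ring_graph(15, 4)
paired = [cliques[i] | cliques[i + 1] for i in range(0, 14, 2)] + [cliques[14]]
true_terms = description_terms(edges, cliques)
merged_terms = description_terms(edges, paired)
print([round(t, 1) for t in true_terms], round(sum(true_terms), 1))
print([round(t, 1) for t in merged_terms], round(sum(merged_terms), 1))
```

Under these assumptions, merging the cliques pairwise shortens the total from about 1102 to about 644 bits, mostly because term 4 drops from about 807 to about 242 bits; without term 4, the true partition would be cheaper (about 294 vs. 402 bits).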
SGE (Sparse Graph Encoding)
Components
1. Bits to represent number of nodes in graph
Ignored, as in RB and AP
2. Bits to represent number of groups
Follows RB
3. Bits to represent mapping between nodes and groups
Similar to AP
4. Bits needed for number of group-to-group edges
Split into 2 terms
– Which pairs of groups are connected (much less than one bit per pair if pairs are sparsely or densely connected)
– Number of edges between connected groups
Grows as number of connected pairs, not total number of pairs
5. Bits needed for adjacencies between nodes
Follows RB
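A sketch of why splitting term 4 can pay off, under stated assumptions rather than the paper's exact SGE formula: the pairwise code spends log₂(l + 1) bits on an edge count for every unordered group pair, whereas the split code first states how many and which group pairs are connected (log₂ of the number of possible choices) and then spends log₂ l bits per connected pair, since each such pair has between 1 and l edges. All function names are hypothetical.

```python
from math import comb, log2


def pairwise_count_bits(m, l):
    """Assumed RB/AP-style term 4: an edge count for every unordered group pair."""
    pairs = m * (m + 1) // 2
    return pairs * log2(l + 1)


def which_pairs_bits(m, connected):
    """Bits to say how many group pairs are connected, then which ones."""
    pairs = m * (m + 1) // 2
    return log2(pairs + 1) + log2(comb(pairs, connected))


def split_count_bits(m, l, connected):
    """Hypothetical SGE-style term 4: which pairs are connected, then their edge counts."""
    return which_pairs_bits(m, connected) + connected * log2(l)


# R_{15,4}: 15 groups, 105 edges; 15 self pairs plus 15 ring neighbours are connected.
print(pairwise_count_bits(15, 105), split_count_bits(15, 105, connected=30))
```

For R_{15,4} the split version needs well under half the bits of the pairwise version. When every pair is connected, the which-pairs part costs only log₂ 121 ≈ 6.9 bits, so the split adds little in dense graphs either.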
Performance of SGE on Ring Graphs
Correct community structure found for every R_{m,c} for 4 ≤ m ≤ 16 and 3 ≤ c ≤ 9 except
– R_{4,3}
– R_{13,3}
Results confirm the hypothesis that the resolution limit in RB and AP is a result of over-counting term 4: the bits needed for group-to-group edges
Significance
– Ring graphs are rare in the real world
– How does SGE compare on more realistic graphs?
Uniform random graph
Similar to graphs in Rosvall & Bergstrom (2007)
Test set
– 32 vertices
– 4 groups
– average degree 6
– size ratio {1.0, 1.25, 1.5, 1.75, 2.0}
– Proportion internal edges {0.6, 0.75, 0.9}
Example:
– 32 vertices
– 4 groups
– average degree 6
– size ratio 1.25
– Proportion internal edges 0.67
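A generator for graphs of this kind might look like the sketch below. The slide does not say how size ratio is defined or how edges are placed, so both are assumptions: group i gets a size proportional to ratio**i (rounded to sum to the vertex count), the number of edges is fixed by the average degree, and internal and external edges are drawn uniformly without replacement from the within-group and between-group vertex pairs. Function names and defaults are hypothetical.

```python
import random
from itertools import combinations


def group_sizes(n, k, ratio):
    """Hypothetical reading of 'size ratio': group i has size proportional to ratio**i."""
    weights = [ratio ** i for i in range(k)]
    raw = [n * w / sum(weights) for w in weights]
    sizes = [int(x) for x in raw]
    # largest-remainder rounding so the sizes sum to exactly n
    for i in sorted(range(k), key=lambda i: raw[i] - sizes[i], reverse=True)[: n - sum(sizes)]:
        sizes[i] += 1
    return sizes


def planted_graph(n=32, k=4, avg_degree=6, ratio=1.0, p_internal=0.75, seed=0):
    rng = random.Random(seed)
    groups, start = [], 0
    for s in group_sizes(n, k, ratio):
        groups.append(list(range(start, start + s)))
        start += s
    label = {v: g for g, members in enumerate(groups) for v in members}
    total = n * avg_degree // 2  # number of edges implied by the average degree
    n_internal = round(p_internal * total)
    internal = [e for g in groups for e in combinations(g, 2)]
    external = [(u, v) for u, v in combinations(range(n), 2) if label[u] != label[v]]
    if n_internal > len(internal) or total - n_internal > len(external):
        raise ValueError("not enough vertex pairs for the requested edge counts")
    edges = rng.sample(internal, n_internal) + rng.sample(external, total - n_internal)
    return edges, [set(g) for g in groups]
```

With the test-set defaults this yields 96 edges; for example, `planted_graph(ratio=1.25, p_internal=0.75)` places 72 of them inside groups of sizes 5, 7, 9, and 11.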
Embedded Barabasi-Albert Graphs
Test set
– 4 communities separately generated by preferential attachment
– In each community
4 initial vertices
2-4 edges added per time step
20 time steps
Example
– 4 communities
– 4 initial vertices
– 3 edges added per time step
– 20 time steps
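A sketch of an embedded Barabasi-Albert generator with the example's parameters. Assumptions not stated on the slide: each community starts as a clique on its initial vertices; each time step adds one new vertex that attaches to `edges_per_step` distinct existing vertices chosen with probability proportional to degree; and the communities are joined by `n_between` uniformly random inter-community edges, a hypothetical parameter, since the slide does not say how the communities were embedded.

```python
import random
from itertools import combinations


def preferential_attachment(n_initial, edges_per_step, steps, rng, offset=0):
    """One community, grown from an initial clique (an assumption) by preferential attachment."""
    nodes = list(range(offset, offset + n_initial))
    edges = list(combinations(nodes, 2))
    endpoints = [v for e in edges for v in e]  # a vertex appears once per incident edge
    for t in range(steps):
        new = offset + n_initial + t
        targets = set()
        while len(targets) < edges_per_step:  # degree-proportional, without repeats
            targets.add(rng.choice(endpoints))
        for v in sorted(targets):
            edges.append((v, new))
            endpoints += [v, new]
    return edges


def embedded_ba(communities=4, n_initial=4, edges_per_step=3, steps=20, n_between=10, seed=0):
    rng = random.Random(seed)
    size = n_initial + steps
    edges = []
    for c in range(communities):
        edges += preferential_attachment(n_initial, edges_per_step, steps, rng, offset=c * size)
    # Hypothetical embedding: uniformly random edges between different communities.
    between = [(u, v) for u, v in combinations(range(communities * size), 2)
               if u // size != v // size]
    edges += rng.sample(between, n_between)
    groups = [set(range(c * size, (c + 1) * size)) for c in range(communities)]
    return edges, groups
```

`edges_per_step` must not exceed `n_initial`, since the first new vertex can only attach to the initial vertices.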
Evaluation Criteria
Rand index (Rand 1971)
Adjusted Rand index (Hubert & Arabie 1985)
F-measure – based on same-cluster pairs
– Recall = |proposedPairs ∩ actualPairs| / |actualPairs|
– Precision = |proposedPairs ∩ actualPairs| / |proposedPairs|
– F-measure = 2 · recall · precision / (recall + precision)
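The three criteria can be computed from two flat labelings of the same items, as sketched below; `actual[i]` and `proposed[i]` are assumed to hold item i's cluster in the true and proposed partitions. The F-measure follows the pair-based recall and precision above, and the adjusted Rand index uses the Hubert & Arabie form (index − expected) / (max − expected) over contingency-table pair counts. Function names are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import comb


def same_cluster_pairs(labels):
    """Unordered item pairs placed in the same cluster."""
    return {(i, j) for i, j in combinations(range(len(labels)), 2) if labels[i] == labels[j]}


def pair_f_measure(actual, proposed):
    actual_pairs, proposed_pairs = same_cluster_pairs(actual), same_cluster_pairs(proposed)
    both = len(actual_pairs & proposed_pairs)
    recall = both / len(actual_pairs)
    precision = both / len(proposed_pairs)
    return 2 * recall * precision / (recall + precision) if both else 0.0


def rand_index(actual, proposed):
    """Fraction of item pairs on which the two partitions agree (same vs. different)."""
    pairs = list(combinations(range(len(actual)), 2))
    agree = sum((actual[i] == actual[j]) == (proposed[i] == proposed[j]) for i, j in pairs)
    return agree / len(pairs)


def adjusted_rand_index(actual, proposed):
    n = len(actual)
    index = sum(comb(c, 2) for c in Counter(zip(actual, proposed)).values())
    a = sum(comb(c, 2) for c in Counter(actual).values())
    b = sum(comb(c, 2) for c in Counter(proposed).values())
    expected = a * b / comb(n, 2)
    maximum = (a + b) / 2
    return (index - expected) / (maximum - expected)
```

For actual = [0, 0, 1, 1] and proposed = [0, 0, 0, 1], recall is 1/2, precision is 1/3, the F-measure is 0.4, the Rand index is 0.5, and the adjusted Rand index is 0. The sketch does not guard against degenerate inputs (for example, partitions with no same-cluster pairs), which would divide by zero.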
Results: Uniform random graph
Results: Embedded Barabasi-Albert
Summary of Evaluation
Random graphs
– Community structure is weak
Group sizes are balanced – modularity is best
Group sizes are imbalanced – RB is best (as per Rosvall & Bergstrom 2007)
– Community structure is strong
Group sizes are balanced – not much difference
Group sizes are imbalanced – modularity is particularly bad (as per Rosvall & Bergstrom 2007); SGE slightly better than RB and AP
EBA graphs
– Sparse – AP and SGE weaker than modularity and RB
– Dense – essentially identical accuracy
Conclusion
Narrow
– Conflation of groups by MDL in sparse graphs (e.g., ring graphs) can be avoided by adjusting group-to-group edge counts.
– This change doesn’t hurt performance in more common types of graphs.
– Compression-based clustering works well, but requires tinkering
– Modularity detects weak structure well when the graph is not too big and groups are not too imbalanced
Broad
– Still unclear what utility function is best overall
– Needed: theory relating graph typology to utility functions