Community structure in social and biological networks

Michelle Girvan and M. E. J. Newman
December 7, 2001
TOPICS TO BE COVERED
Network
Detecting community structure
Traditional methods
New approach
Tests of the methods

NETWORK

Many systems take the form of networks: sets of
nodes (vertices) joined together in pairs by links
(edges).
Examples: social networks, the Internet, biological networks
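As a minimal sketch of this definition, a network can be stored as a list of edges and turned into a map from each vertex to its neighbours. The five-vertex graph is the small example used later in these slides (edges ab, bc, bd, ce, de); the helper name `build_adjacency` is illustrative only.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Return {vertex: set of neighbours} for an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

# Example graph from the later slides: a pendant vertex a attached to a 4-cycle.
edges = [("a", "b"), ("b", "c"), ("b", "d"), ("c", "e"), ("d", "e")]
adjacency = build_adjacency(edges)

# Degree of a vertex = number of edges attached to it.
degrees = {v: len(nbrs) for v, nbrs in adjacency.items()}
```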

DETECTING COMMUNITY STRUCTURE
---CLUSTERING

Cluster analysis seeks to group elements into
subsets based on the similarity between pairs of
elements.
DETECTING COMMUNITY STRUCTURE
---STATISTICAL PROPERTY

Subsets of vertices within which vertex-vertex
connections are dense, but between which
connections are less dense.
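One rough way to make "dense within, less dense between" concrete is to compare the fraction of possible vertex pairs that are actually connected, inside a group and across two groups. The sketch below assumes that reading of density; the toy graph and the function name `edge_density` are hypothetical and not taken from the talk.

```python
from itertools import combinations

def edge_density(edges, group_a, group_b=None):
    """Fraction of candidate vertex pairs that are joined by an edge.

    With one group, pairs are drawn from inside it; with two groups,
    pairs have one endpoint in each.
    """
    edge_set = {frozenset(e) for e in edges}
    if group_b is None:
        pairs = [frozenset(p) for p in combinations(sorted(group_a), 2)]
    else:
        pairs = [frozenset((u, v)) for u in group_a for v in group_b]
    if not pairs:
        return 0.0
    return sum(p in edge_set for p in pairs) / len(pairs)

# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
left, right = {1, 2, 3}, {4, 5, 6}

within_left = edge_density(edges, left)      # all 3 possible pairs present
within_right = edge_density(edges, right)    # all 3 possible pairs present
between = edge_density(edges, left, right)   # 1 of 9 possible pairs present
```

A partition looks like community structure in this sense when the within-group densities are clearly larger than the between-group density.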
DETECTING COMMUNITY STRUCTURE
---PRACTICAL APPLICATIONS

Social networks: communities represent real social groupings
Citation networks: communities represent related papers on a single topic
Web: communities represent pages on related topics
DETECTING COMMUNITY STRUCTURE
---TRADITIONAL METHODS

Calculate a weight W_ij for every pair i, j of
vertices in the network
Join vertices together in order of decreasing weight, strongest pair first
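The two steps above can be sketched as follows, assuming the weights W_ij are already known. Pairs are processed from strongest to weakest and a union-find structure tracks which vertices have been joined so far, which amounts to single-linkage clustering. The weights and the function name `hierarchical_merges` are hypothetical.

```python
def hierarchical_merges(vertices, weights):
    """Join vertices in order of decreasing weight.

    weights maps a pair (i, j) to W_ij. Returns the joins that actually
    merged two separate groups, in the order they happened.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the representative of v's group.
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    merges = []
    # Strongest pairs first; ties are broken by label for reproducibility.
    for (i, j), w in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i
            merges.append((i, j, w))
    return merges

# Hypothetical weights for four vertices.
weights = {("a", "b"): 3, ("c", "d"): 2, ("b", "c"): 1, ("a", "d"): 1}
merges = hierarchical_merges("abcd", weights)
```

Reading `merges` from first to last gives the tree from the bottom up: tightly linked pairs join early and the last join corresponds to the top-level split.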

DETECTING COMMUNITY STRUCTURE
---TRADITIONAL METHODS (CONT.)

One possible way to calculate weight: the number
of node-independent paths between vertices

Node-independent paths: two paths that connect the
same pair of vertices are node-independent if they
share no vertices other than their initial and final
vertices
This weight equals the minimum number of vertices
that must be removed from the graph in order to
disconnect i and j (Menger's theorem)
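One standard way to count node-independent paths is a unit-capacity maximum flow in which every vertex is split into an "in" half and an "out" half, so that each interior vertex can carry only one path. The sketch below takes that approach; it is an illustration, not necessarily how the weights were computed for the talk, and the function name is hypothetical. The test graph is the five-vertex example used later in these slides.

```python
from collections import defaultdict, deque

def node_independent_paths(edges, s, t):
    """Count node-independent paths between s and t by max flow."""
    cap = defaultdict(int)     # residual capacity of each directed arc
    nbrs = defaultdict(set)    # adjacency of the residual graph

    def add_arc(u, v, c):
        cap[(u, v)] += c
        nbrs[u].add(v)
        nbrs[v].add(u)         # reverse arc, initially with capacity 0

    vertices = {x for e in edges for x in e}
    for v in vertices:
        # Interior vertices get capacity 1; the endpoints are unrestricted.
        add_arc((v, "in"), (v, "out"), len(vertices) if v in (s, t) else 1)
    for u, v in edges:
        add_arc((u, "out"), (v, "in"), 1)
        add_arc((v, "out"), (u, "in"), 1)

    source, sink = (s, "out"), (t, "in")
    flow = 0
    while True:
        # Breadth-first search for an augmenting path.
        prev = {source: None}
        queue = deque([source])
        while queue and sink not in prev:
            x = queue.popleft()
            for y in nbrs[x]:
                if y not in prev and cap[(x, y)] > 0:
                    prev[y] = x
                    queue.append(y)
        if sink not in prev:
            return flow
        # Push one unit of flow along the path that was found.
        y = sink
        while prev[y] is not None:
            x = prev[y]
            cap[(x, y)] -= 1
            cap[(y, x)] += 1
            y = x
        flow += 1

edges = [("a", "b"), ("b", "c"), ("b", "d"), ("c", "e"), ("d", "e")]
paths_b_e = node_independent_paths(edges, "b", "e")  # via c and via d
paths_a_e = node_independent_paths(edges, "a", "e")  # every path passes through b
```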

DETECTING COMMUNITY STRUCTURE
---TRADITIONAL METHODS (CONT.)

Another possible way to calculate weight: count
the total number of paths that run between the two vertices

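This number may be infinite, so paths of length l are weighted by a factor a^l:
$$W = \sum_{l=0}^{\infty} (aA)^l = [I - aA]^{-1}$$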
l: length of each path
a: a small constant, chosen so that the weighted count of
paths converges

A: adjacency matrix of the network

   a  b  c  d  e
a  1  1  0  0  0
b  1  1  1  1  0
c  0  1  1  0  1
d  0  1  0  1  1
e  0  0  1  1  1

[Figure: example network with vertices a, b, c, d, e]
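The weighted path count can be checked numerically. The sketch below assumes the weighting is the series W = sum over l of (aA)^l, whose closed form is the inverse of (I - aA), and uses the matrix exactly as printed on the slide, including the 1s on its diagonal. The value a = 0.1 is an arbitrary choice that keeps a times the largest row sum (4) below 1, which is enough for convergence. Pure-Python matrix arithmetic is used to keep the example dependency-free.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def weighted_path_counts(A, a, terms=200):
    """Partial sum of sum_{l>=0} (a A)^l."""
    n = len(A)
    identity = [[float(i == j) for j in range(n)] for i in range(n)]
    aA = [[a * A[i][j] for j in range(n)] for i in range(n)]
    W = [row[:] for row in identity]       # l = 0 term
    power = [row[:] for row in identity]
    for _ in range(terms):
        power = mat_mul(power, aA)         # next power of aA
        W = [[W[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return W

# Matrix from the slide, rows and columns ordered a, b, c, d, e.
A = [[1, 1, 0, 0, 0],
     [1, 1, 1, 1, 0],
     [0, 1, 1, 0, 1],
     [0, 1, 0, 1, 1],
     [0, 0, 1, 1, 1]]
a = 0.1  # assumed value, small enough for the series to converge
W = weighted_path_counts(A, a)

# Compare with the closed form: W (I - aA) should be the identity.
n = len(A)
I_minus_aA = [[(i == j) - a * A[i][j] for j in range(n)] for i in range(n)]
residual = mat_mul(W, I_minus_aA)
max_error = max(abs(residual[i][j] - (i == j))
                for i in range(n) for j in range(n))
```

With these inputs the entry for the adjacent pair (a, b) comes out larger than the entry for the distant pair (a, e), which is the behaviour a similarity weight should have.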
DETECTING COMMUNITY STRUCTURE
---TRADITIONAL METHODS (CONT.)

Shortcoming

These methods tend to separate single peripheral vertices
from the communities to which they should rightly belong
DETECTING COMMUNITY STRUCTURE
---NEW APPROACH

Instead of trying to construct a measure which
tells us which edges are most central to
communities, we focus instead on those edges
which are least central, the edges which are most
“between” communities.
Construct communities by progressively
removing edges from the original graph

DETECTING COMMUNITY STRUCTURE
---NEW APPROACH (CONT.)

Edge betweenness
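The betweenness of an edge is the number of shortest paths
between pairs of vertices that run along it. If a pair has
several shortest paths, each is given equal weight so that
the pair's total weight is one.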
The edges connecting communities will have high
edge betweenness

 Separate communities by removing these edges
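Edge betweenness can be computed directly from its definition. The sketch below runs a breadth-first search from every vertex to get distances and shortest-path counts, then, for each pair of vertices, credits every edge with the fraction of that pair's shortest paths passing through it. It assumes the equal-weight convention of Girvan and Newman's paper for pairs with several shortest paths; the function names are illustrative, and this direct method favours clarity over speed.

```python
from collections import deque
from itertools import combinations

def shortest_path_counts(adj, source):
    """BFS from source: distance to, and number of shortest paths to, each vertex."""
    dist, count = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                count[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                count[v] += count[u]
    return dist, count

def edge_betweenness(edges):
    """Map each undirected edge to its shortest-path betweenness."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    info = {v: shortest_path_counts(adj, v) for v in adj}
    score = {frozenset(e): 0.0 for e in edges}
    for s, t in combinations(sorted(adj), 2):
        dist_s, count_s = info[s]
        dist_t, count_t = info[t]
        if t not in dist_s:
            continue  # s and t lie in different components
        for u, v in edges:
            for x, y in ((u, v), (v, u)):  # the edge can be crossed either way
                on_shortest_path = (x in dist_s and y in dist_t and
                                    dist_s[x] + 1 + dist_t[y] == dist_s[t])
                if on_shortest_path:
                    # Fraction of s-t shortest paths that use this edge.
                    score[frozenset((u, v))] += count_s[x] * count_t[y] / count_s[t]
    return score

# Five-vertex example graph from these slides.
edges = [("a", "b"), ("b", "c"), ("b", "d"), ("c", "e"), ("d", "e")]
scores = edge_betweenness(edges)
bridge = max(scores, key=scores.get)  # edge with the highest betweenness
```

On this graph the highest-scoring edge is ab, the only link between vertex a and the rest of the network.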

DETECTING COMMUNITY STRUCTURE
---NEW APPROACH (CONT.)
1. Calculate the betweenness for all edges in the
   network.
2. Remove the edge with the highest betweenness.
3. Recalculate betweennesses for all edges
   affected by the removal.
4. Repeat from step 2 until no edges remain.
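The four steps above can be sketched as a short loop. This version makes two simplifying assumptions that are not in the slides: it recalculates betweenness for every remaining edge after each removal rather than only the affected ones, and it breaks ties between equally scored edges alphabetically. Betweenness is accumulated with one breadth-first search per vertex. All function names are illustrative.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness from one breadth-first search per vertex."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, order = {s: 0}, {s: 1.0}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    sigma[v] = 0.0
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]   # shortest paths reaching v via u
        delta = {v: 0.0 for v in order}
        for v in reversed(order):          # farthest vertices first
            for u in adj[v]:
                if dist[u] == dist[v] - 1:
                    share = sigma[u] / sigma[v] * (1.0 + delta[v])
                    score[frozenset((u, v))] += share
                    delta[u] += share
    # Every unordered pair was counted once from each endpoint.
    return {e: b / 2 for e, b in score.items()}

def components(adj):
    """Connected components as sorted lists, ordered by smallest member."""
    seen, parts = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(sorted(part))
    return parts

def girvan_newman(edges):
    """Return the list of partitions seen as edges are removed."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    history = [components(adj)]
    while any(adj.values()):                       # step 4: until no edges remain
        scores = edge_betweenness(adj)             # steps 1 and 3
        u, v = min((tuple(sorted(e)) for e in scores),
                   key=lambda e: (-scores[frozenset(e)], e))
        adj[u].discard(v)                          # step 2
        adj[v].discard(u)
        parts = components(adj)
        if len(parts) > len(history[-1]):
            history.append(parts)                  # record each split
    return history

# Five-vertex example graph from these slides.
edges = [("a", "b"), ("b", "c"), ("b", "d"), ("c", "e"), ("d", "e")]
history = girvan_newman(edges)
```

Each entry of `history` is one level of the hierarchical tree, from the whole network down to single vertices.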

DETECTING COMMUNITY STRUCTURE
---NEW APPROACH (CONT.)

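1. Calculate the betweenness for all edges in the
network.
[Figure: example network with vertices a, b, c, d, e and edges ab, bc, bd, ce, de]
Edge  Betweenness
ab    4
bc    3.5
bd    3.5
ce    2.5
de    2.5
(Values assume that a pair with several shortest paths splits one unit of weight equally among them.)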
DETECTING COMMUNITY STRUCTURE
---NEW APPROACH (CONT.)

2. Remove the edge with the highest betweenness (here, ab).
[Figure: the example network with edge ab removed]
DETECTING COMMUNITY STRUCTURE
---NEW APPROACH (CONT.)

3. Recalculate betweennesses for all edges
affected by the removal.
[Figure: the example network after edge ab is removed]
Edge  Betweenness
bc    2
bd    2
ce    2
de    2
DETECTING COMMUNITY STRUCTURE
---NEW APPROACH (CONT.)

4. Repeat from step 2 until no edges remain.
TESTS OF THE METHODS

The friendship network from Zachary's karate club study:
nodes associated with the club administrator's faction are
drawn as circles, those associated with the instructor's
faction are drawn as squares.
TESTS OF THE METHODS (CONT.)

Hierarchical tree calculated from edge-independent
path counts, which fails to extract the known community
structure of the network.
TESTS OF THE METHODS (CONT.)

Hierarchical tree showing the complete community
structure for the network calculated by using the algorithm
presented in this article. The initial split of the network
into two groups is in agreement with the actual factions
observed by Zachary, with the exception that node 3 is
misclassified.