2011 Ninth IEEE/IFIP International Conference on Embedded and Ubiquitous Computing
Finding Community Structure in Complex Networks Using Parallel Approach
Zahra Masdarolomoor
Department of Computer Engineering, Alzahra University
Tehran, Iran
[email protected]
Reza Azmi
Department of Computer Engineering, Alzahra University
Tehran, Iran
[email protected]
Sadegh Aliakbary
Department of Computer Engineering, Sharif University
Tehran, Iran
[email protected]
Abstract- Network analysis is important in many scientific areas, and finding the community structure of a network is one of its significant challenges. A community is a group of vertices with dense intra-connections and sparse inter-connections. In this paper, we propose a novel method for community detection in networks that improves on similar methods in both running time and precision. The proposed method can detect communities in a wide variety of networks with different properties. It is an agglomerative parallel algorithm that finds multiple communities and exchanges nodes between the detected communities simultaneously. It uses local modularity to construct the communities and simulated annealing to maximize the modularity. Finally, a genetic algorithm optimizes the parameters of the proposed method. The algorithm is evaluated with the modularity metric and shows good precision.
I.
INTRODUCTION
In recent years, the analysis of network structure has attracted scientists in different areas such as computer science and physics. Many systems can be represented as networks: collaboration networks, the Internet, the World-Wide-Web, biological networks and social networks are just some examples [1-4].
A network has two important components: vertices and edges. Vertices are the nodes of the graph and represent entities such as people or organizations in social networks, or computers and routers in the Internet. The nodes are connected by links, or edges, which represent connections between people or data.
One topic of special interest in social network analysis is finding community structure. A community is a group of vertices that are tightly connected to each other and loosely connected to other nodes. Community detection is the process of partitioning a network into such groups or clusters. It has many applications, including understanding the network structure, detecting communities of special interest (such as terrorists), graph visualization and improving search engines.
The problem of finding network communities has been studied extensively in recent years. Spectral partitioning[1], [2], divisive and agglomerative approaches[3], [4], evolutionary algorithms[5], [6] and simulated annealing[7] are some examples. Existing methods are explained further in the next section.
978-0-7695-4552-3/11 $26.00 © 2011 IEEE
DOI 10.1109/EUC.2011.37
II.
RELATED WORKS
Many methods have been proposed in recent years to detect communities in networks. Spectral methods are based on the analysis of the eigenvectors of matrices derived from the network. The quantity measured corresponds to the eigenvalues of matrices associated with the adjacency matrix. These methods are discussed in a survey by Newman[8].
Divisive approaches try to find the edges that lie between communities and remove them. After these edges have been deleted, the communities remain. The pioneering idea in this family is the Girvan-Newman (GN) algorithm[9]. GN is a divisive method that uses edge betweenness centrality to identify the boundaries of communities. This metric detects the edges between communities by counting the number of shortest paths between pairs of nodes that pass through a particular edge. The approach is successful on many networks, such as email networks and human and animal social networks. However, the cost of the algorithm is unsatisfactory: $O(m^2 n)$ on a network with $m$ edges and $n$ nodes, or $O(n^3)$ on a sparse graph (one in which $m \sim n$). It therefore fails on networks with more than a few thousand nodes.
Agglomerative methods, on the other hand, start with all nodes disconnected and then apply a similarity measure to join them progressively into communities.
Divisive algorithms usually offer good precision (as measured by modularity), but their time complexity is usually unsatisfactory and they fail on large networks. In contrast, agglomerative approaches can achieve good results in reasonable time. We therefore look for a new agglomerative community-detection method with better precision and time complexity.
Later, Girvan and Newman proposed a new method based on a quantity called modularity[4]. The modularity measure (Q) is used to evaluate community-detection methods.
Keywords- community detection; parallel; genetic algorithm; local modularity; modularity; agglomerative; simulated annealing
Nooshin Riahi
Department of Computer Engineering, Alzahra University
Tehran, Iran
[email protected]
Simulated annealing is used to reach a better community structure, and a genetic algorithm is applied to optimize the parameters of the proposed method.
Modularity is a real number ($-1 < Q < 1$); a higher modularity indicates a better community detection quality.
Among current methods, extremal optimization (EO)[10] is successful in practice. EO uses a heuristic search to optimize the value of the modularity Q. It defines a new quantity based on modularity, called local modularity, which represents the contribution of an individual vertex i to the modularity Q. EO is a divisive method with time complexity $O(n^2 \log n)$. It divides the whole network into random communities and exchanges nodes between communities using local modularity. It has good precision, but it could be faster.
Here we propose a new agglomerative method for parallel community detection using local modularity. The proposed method can detect a number of communities simultaneously, and it uses simulated annealing to achieve a better modularity.
III.
APPLIED METRICS
A. Modularity
We evaluate our approach with the modularity Q, so we first explain it:
$$Q = \sum_i \left( e_{ii} - a_i^2 \right) \qquad (1)$$
where i is a detected community and $e_{ii}$ is the fraction of edges that fall within community i. To explain $a_i$, first look at (2).
(2)
In (2), i and j are community indexes. The summation of the two parts of (2) gives $a_i$.
(3)
$a_i$ is the fraction of all ends of edges that are attached to vertices in community i.
Modularity satisfies $Q \in (-1, 1)$, and values close to 1 indicate good community detection. Q = 0 indicates a random graph or the whole graph placed in one community. If Q is close to -1, each vertex forms its own community or no particular community structure is detected. A value of Q above 0.3 indicates good partitioning.
B. Local Modularity
A new quantity, called local modularity, is derived from modularity and is defined on vertices instead of communities[10]. Local modularity expresses the contribution of an individual vertex i to the modularity Q.
The local modularity of each vertex i is given by
$$q_i = \kappa_{c_i} - k_i \, a_{c_i} \qquad (4)$$
If $c_i$ is the community of vertex i, $\kappa_{c_i}$ is the number of edges that vertex i has with vertices in the same community. Also, $k_i = \sum_j A_{ij}$ is the degree of vertex i, $A_{ij}$ is the adjacency matrix of the network, and
$$a_{c_i} = \frac{1}{2M} \sum_{j \in c_i} k_j \qquad (5)$$
In (5), j is a node and $c_i$ is the community of node i; $k_j$ is the degree of node j and M is the total number of edges in the network. So $a_{c_i}$ sums the degrees of the nodes inside community $c_i$ and divides the sum by 2M. We can say that $a_{c_i}$ approximately shows the portion of the entire network taken by a community.
Local modularity is a useful function for detecting communities progressively in the network. Its most important feature is that, after all communities are detected, the summation of $q_i$ over all nodes in the network yields the modularity:
$$Q = \frac{1}{2M} \sum_i q_i \qquad (6)$$
So we look for a way to collect nodes with higher values of local modularity, which leads to a higher modularity.
Local modularity has two input parameters: the node i and the community of node i ($c_i$). The output is a real number that expresses how strongly node i depends on community $c_i$. The $q_i$ of nodes on the boundaries of communities are small values (less than 1), and it is helpful to test these nodes further to find their proper communities. So local modularity is a good similarity measure for our agglomerative hierarchical approach.
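As a hedged illustration of the two metrics, the minimal Python sketch below computes the modularity of a partition from the community-level quantities $e_{ii}$ and $a_i$, and then again as a normalized sum of per-node local modularities. It assumes the local modularity form of [10], $q_i = \kappa_i - k_i a_{c_i}$. The toy graph, the function names and the data layout are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict

def degrees(edges):
    """Degree k_i of every node in an undirected edge list."""
    k = defaultdict(int)
    for u, v in edges:
        k[u] += 1
        k[v] += 1
    return dict(k)

def modularity(edges, community):
    """Q = sum over communities of (e_cc - a_c^2), unweighted undirected graph."""
    m = len(edges)
    k = degrees(edges)
    e = defaultdict(float)  # fraction of edges inside each community
    a = defaultdict(float)  # fraction of edge ends attached to each community
    for u, v in edges:
        if community[u] == community[v]:
            e[community[u]] += 1.0 / m
    for node, deg in k.items():
        a[community[node]] += deg / (2.0 * m)
    return sum(e[c] - a[c] ** 2 for c in a)

def local_modularity(edges, community, i):
    """Assumed form from [10]: q_i = kappa_i - k_i * a_{c_i}."""
    m = len(edges)
    k = degrees(edges)
    c = community[i]
    # kappa_i: edges that node i shares with members of its own community
    kappa = sum(1 for u, v in edges
                if i in (u, v) and community[u] == community[v])
    a_c = sum(deg for node, deg in k.items() if community[node] == c) / (2.0 * m)
    return kappa - k[i] * a_c

# Toy graph (illustrative assumption): two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

q_direct = modularity(edges, community)
q_from_local = sum(local_modularity(edges, community, i)
                   for i in community) / (2 * len(edges))
```

On this toy graph both routes give the same value, which is the property that lets a method accumulate modularity node by node.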
A novel approach is presented in this paper to detect communities. This approach can detect multiple communities simultaneously. Its idea comes from agglomerative approaches, which try to collect similar nodes in a community: they start with all vertices disconnected and then join them based on a similarity criterion, so a measure is needed. Here local modularity plays the role of the similarity measure. The detailed explanation comes in the next section.
Finding a grouping of the nodes of a network that maximizes modularity is believed to be NP-hard, so recent methods approximate it with a heuristic search. Different heuristic methods are available: simulated annealing, genetic algorithms, greedy approaches and so forth. Here a simulated annealing method is combined with our agglomerative method.
High-degree nodes are often in the middle of a community or between communities. If a high-degree node is in the middle of a community, it is selected by the function of Fig. 1. But if the high-degree node is on the boundary between communities, the next criteria handle the problem. The first condition tries to select a high-degree node.
IV.
THE PROPOSED APPROACH
The speed of the algorithm is an important point in large networks such as social networks, so a parallel algorithm is a suitable solution. Here we propose a novel parallel method that detects communities, and then we optimize the parameters of the proposed method.
We explain our method in four principal stages. In the first stage, it finds some primitive nodes and treats each of them as a community, so every one of these nodes belongs to its own community. The second and third stages are carried out together: in the second stage communities are extended, and in the third one some nodes are exchanged between the detected communities.
The principal stages of the proposed method are:
a) The system creates some primitive communities.
b) The system extends the primitive communities, and it may add new communities.
c) The system exchanges some nodes between existing communities according to a simulated annealing approach.
d) The system optimizes the parameters of the method according to a genetic algorithm approach.
As stated, stages b and c execute simultaneously. We first explain stage a in the next subsection.
Figure 1. The pseudo code of function create primary communities().
Condition ii of the function in Fig. 1 checks that the single-node communities are not directly connected to each other. If two primitive nodes have a direct link, they may be in the same community; this condition makes it more likely that they are not.
Condition iii supports condition ii by choosing nodes that are far from each other and therefore not in the same community. The parameter threshold1 in Fig. 1 is a local modularity value. For nodes that are not directly connected and are far enough from each other, the local modularity is negative when they are placed in the same community. We assign threshold1 a value between -1 and 0: when the local modularity of node i in community c is near -1, node i is not in community c. This parameter is optimized in a later stage. It is possible, although unlikely, that the chosen nodes meet all three conditions but still belong to the same community. This does not matter, because in the next stages we check doubtful nodes to find their proper communities.
A. Creating primitive communities
The process of creating new communities is presented in Fig. 1. Before the function of Fig. 1 runs, the system must create the first community. The first community is a single-node community that holds the node with the maximum degree among all nodes of the network. It helps the function of Fig. 1 to find the other single-node (primitive) communities.
The output of the function in Fig. 1 is a set of communities. Each community in the set communities has exactly one node. This output is the starting point of the next stage, during which the communities are extended. We explain the two later stages in the next subsections.
The task of this function is to find some nodes that form primitive communities. If these primitive communities are not proper ones, the system adds new communities or deletes improper communities during the second and third stages.
B. Extending communities
The parallel part of the proposed method starts here. Each community in the set communities begins to collect similar nodes independently; that is, each community runs one thread to collect nodes. To find the nodes similar to a community, a measure is required, and local modularity plays this role. The approach joins new nodes to each community based on local modularity: it finds the node that has the maximum local modularity when it is added to the community. Multiple threads run to extract the communities of the network. Each thread finds the members of one community and finishes when all nodes of that community are detected.
The function create_primitive_communities() applies three conditions to find primitive single-node communities:
i. The selected node must have a high degree.
ii. The selected node must not have a direct link to other single-node communities.
iii. The local modularity of the selected node must be less than threshold1 when the node is considered as a member of each community in the set communities.
To select an important node, we use the degree measure. A node with a higher degree has more connections with other nodes, so it is a social and important node.
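The three conditions can be sketched as follows. This is only one possible reading, since the pseudo code of Fig. 1 is not reproduced in the text: nodes are visited in descending degree order (condition i), nodes adjacent to an existing seed are skipped (condition ii), and a node becomes a seed only if its local modularity with respect to every existing seed is below threshold1 (condition iii). The normalized local modularity $\kappa_i / k_i - a_c$, chosen so that values fall in the threshold ranges given in the text, as well as the adjacency-dictionary layout and the toy graph, are assumptions.

```python
def local_modularity(adj, members, i):
    """Normalized local modularity of node i if it joined `members`.

    Assumed form: kappa_i / k_i - a_c, where kappa_i counts the neighbours
    of i inside the community and a_c is the community's share of all edge
    ends. Assumes every node has at least one edge.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())
    group = set(members) | {i}
    kappa = sum(1 for j in adj[i] if j in group)
    a_c = sum(len(adj[j]) for j in group) / two_m
    return kappa / len(adj[i]) - a_c

def create_primitive_communities(adj, threshold1):
    """Return single-node communities chosen by the three conditions."""
    # Condition i: prefer high-degree nodes (ties broken by node id).
    by_degree = sorted(adj, key=lambda n: (-len(adj[n]), n))
    seeds = [by_degree[0]]  # first community: the maximum-degree node
    for node in by_degree[1:]:
        # Condition ii: no direct link to an existing single-node community.
        if any(node in adj[s] for s in seeds):
            continue
        # Condition iii: local modularity below threshold1 for every seed.
        if all(local_modularity(adj, {s}, node) < threshold1 for s in seeds):
            seeds.append(node)
    return [{s} for s in seeds]

# Toy graph (illustrative assumption): two triangles joined by edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
seeds = create_primitive_communities(adj, threshold1=-0.3)
```

With a stricter threshold such as -0.5, no second seed passes condition iii on this toy graph, which is the situation in which later stages would have to open new communities.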
The process of extending communities is shown in Fig. 2. At each iteration of the inner loop, all nodes adjacent to community c are added to this community one by one and their local modularity is calculated. The node with the maximum local modularity is the candidate to join community c. A further parameter is then checked, because a criterion is required to stop extending communities. This parameter is called threshold2, and it is a local modularity value. We assign threshold2 a positive value between 0 and 1. If the local modularity of the candidate node is greater than threshold2, the node joins community c. threshold2 is another parameter to be optimized in a later stage. The advantage of this part of the method is that it does not search all nodes of the network, but only the nodes adjacent to community c.
The other advantage is that the number of communities does not have to be fixed in advance. The method adds new communities or removes communities dynamically during execution. When no community can be extended, the process of creating a new community starts: it selects the node with the maximum degree among the nodes that are not assigned to any community, adds this new single-node community to the set communities and starts over to add new nodes to communities. The entire process finishes after all nodes have found their proper community.
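A minimal sequential sketch of the extension stage is given below. It is not the pseudo code of Fig. 2: the threads are replaced by communities taking turns, and the normalized local modularity $\kappa_i / k_i - a_c$, the function names and the toy graph are assumptions. It shows the three ingredients described above: only adjacent unassigned nodes are tested, the best candidate joins only if its local modularity exceeds threshold2, and a new single-node community is opened at the highest-degree unassigned node when no community can grow.

```python
def local_modularity(adj, members, i):
    """Assumed normalized form: kappa_i / k_i - a_c (no isolated nodes)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    group = set(members) | {i}
    kappa = sum(1 for j in adj[i] if j in group)
    a_c = sum(len(adj[j]) for j in group) / two_m
    return kappa / len(adj[i]) - a_c

def extend_communities(adj, seeds, threshold2):
    """Grow communities greedily; returns (communities, LM_Table)."""
    communities = [set(s) for s in seeds]
    assigned = set().union(*communities)
    lm_table = {}  # local modularity recorded when a node joins
    while len(assigned) < len(adj):
        grew = False
        for c in communities:
            # Only unassigned nodes adjacent to community c are examined.
            frontier = {j for n in c for j in adj[n]} - assigned
            if not frontier:
                continue
            best = max(sorted(frontier),
                       key=lambda j: local_modularity(adj, c, j))
            score = local_modularity(adj, c, best)
            if score > threshold2:  # stopping criterion for this community
                c.add(best)
                assigned.add(best)
                lm_table[best] = score
                grew = True
        if not grew:
            # No community can be extended: open a new single-node
            # community at the unassigned node with maximum degree.
            free = sorted(n for n in adj if n not in assigned)
            new_seed = max(free, key=lambda n: len(adj[n]))
            communities.append({new_seed})
            assigned.add(new_seed)
    return communities, lm_table

# Toy graph (illustrative assumption): two triangles joined by edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
communities, lm_table = extend_communities(adj, [{2}, {4}], threshold2=0.1)
```

Every pass either adds a node to a community or opens a new community, so the loop ends once all nodes are assigned.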
The function of Fig. 2 uses a variable named LM_Table. This variable is a container that holds the local modularities of covered nodes. When a node joins a community, its local modularity is saved in LM_Table. LM_Table is then used in the next stage of the proposed method.
As stated for the third stage of the proposed method, we exchange nodes between communities to achieve a better modularity. This exchange is done using simulated annealing (SA). The SA process helps the method to exchange nodes between communities while new communities are being built. Some nodes may not be placed in their proper communities, so exchanging nodes between communities helps them to find their own communities. The next part explains the third stage of the proposed method in more detail.
Figure 2. Parallel community detection pseudo code.
C. Exchanging nodes according to simulated annealing
The process of extending the communities continues for a while. Then the system starts to exchange nodes between communities, and simulated annealing is executed here.
Simulated annealing (SA) is a popular heuristic search. It usually uses an exponential function as a probability function to optimize an objective. The principal feature of simulated annealing is that it provides a means to escape local optima. In our method we use SA in two parts. As Fig. 2 shows, the SA technique is applied to exchange some nodes between communities in the function exchange_nodes(). The exponential function of SA in our method has two input parameters, delta and T0. Delta carries the change of the modularity into the SA function: the difference between pre_sum_local_modularities and sum_local_modularities is assigned to delta. These two variables aggregate the local modularities of the nodes. T0 is the SA parameter called the initial temperature; its value is set to 1000.
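The text does not give the exact exponential function, so the sketch below uses the common Metropolis-style rule as an assumption: an improvement is always accepted, and a worsening is accepted with probability exp(delta / T). The sign convention (delta is the new sum of local modularities minus the previous one), the geometric cooling schedule and the function names are also assumptions; only T0 = 1000 comes from the text.

```python
import math
import random

T0 = 1000.0  # initial temperature stated in the text

def accept_move(delta, temperature, rng=random):
    """Metropolis-style acceptance (assumed form).

    delta > 0 means the sum of local modularities increased.
    """
    if delta >= 0:
        return True
    return rng.random() < math.exp(delta / temperature)

def cool(temperature, rate=0.95):
    """Geometric cooling; the schedule and the rate are assumptions."""
    return temperature * rate

# Probability of accepting a worsening of 0.5 at the initial temperature.
p_at_t0 = math.exp(-0.5 / T0)
```

At T0 = 1000 a change of the size of a local modularity is accepted almost always, so under this assumed rule the search behaves nearly randomly until the temperature has dropped considerably.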
Figure 3. Exchange_nodes() function pseudo code.
The function that exchanges nodes between communities is shown in Fig. 3. Simulated annealing is used in this function too, together with LM_Table. The method finds the node i with the minimum local modularity according to LM_Table, or chooses a random node, and checks whether it fits any other community. If the local modularity of node i in another community is greater than in its current community, the node is moved to the other community.
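The deterministic part of this exchange step might look like the sketch below. It is not the pseudo code of Fig. 3: the random choice driven by simulated annealing is left out, and the normalized local modularity $\kappa_i / k_i - a_c$, the function name exchange_min_node and the toy data are assumptions.

```python
def local_modularity(adj, members, i):
    """Assumed normalized form: kappa_i / k_i - a_c (no isolated nodes)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    group = set(members) | {i}
    kappa = sum(1 for j in adj[i] if j in group)
    a_c = sum(len(adj[j]) for j in group) / two_m
    return kappa / len(adj[i]) - a_c

def exchange_min_node(adj, communities, lm_table):
    """Move the node with minimum local modularity if another community fits better.

    Returns (node, index of the community the node ends up in).
    """
    node = min(sorted(lm_table), key=lm_table.get)
    current = next(idx for idx, c in enumerate(communities) if node in c)
    best_idx = current
    best_score = local_modularity(adj, communities[current], node)
    for idx, c in enumerate(communities):
        if idx == current:
            continue
        score = local_modularity(adj, c, node)
        if score > best_score:
            best_idx, best_score = idx, score
    if best_idx != current:
        communities[current].discard(node)
        communities[best_idx].add(node)
    lm_table[node] = best_score
    return node, best_idx

# Toy graph (illustrative assumption) with node 3 placed in the wrong community.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
communities = [{0, 1, 2, 3}, {4, 5}]
lm_table = {n: local_modularity(adj, c, n) for c in communities for n in c}
moved = exchange_min_node(adj, communities, lm_table)
```

Node 3 has the lowest entry in the table because only one of its three edges stays inside its community, so it is the node that gets tested and moved.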
Mutation changes the values of some genes at random. The fitness function is modularity. The results of the entire method are reported in the following sections.
V.
TEST ON SAMPLE GRAPH
To test the approach, we build a simple graph containing 11 nodes and 13 edges. This graph has three communities. At the stage of creating primitive communities, the algorithm finds three nodes, each of which belongs to a distinct community. The approach then finds the other nodes of the communities, and all communities of this graph are detected correctly. The modularity of the result is Q = 0.429, which is the maximum modularity of this graph.
According to the SA technique, our proposed method sometimes exchanges a random node instead of the node with the minimum local modularity. This helps the method not to get stuck in a local maximum. When a node is exchanged between communities, the local modularities of some other nodes change. The function change_local_modularity( i , c ) shown in Fig. 3 performs this task. As Fig. 3 shows, when a node moves to another community, the local modularities of the members of the new community and of the previous community change. Fig. 4 shows this function in detail. The next part explains the details further.
VI.
TEST ON REAL NETWORKS
We run the approach on five different datasets. The first dataset is the well-known Zachary karate club network[11], [12]. Here we use an unweighted version of this network, which has 34 nodes and 78 edges. Fig. 5 presents the graph of this network.
D. Local modularity changes
When node i moves from community c to community d, the local modularities of the nodes in both communities c and d change. Fig. 3 shows the function that changes the local modularities.
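A hedged sketch of this update is shown below. The helper refresh_local_modularities is hypothetical and does not reproduce the signature of change_local_modularity( i , c ) from the figures; it simply recomputes the stored values for every member of the two communities touched by a move, using the assumed normalized local modularity $\kappa_i / k_i - a_c$.

```python
def local_modularity(adj, members, i):
    """Assumed normalized form: kappa_i / k_i - a_c (no isolated nodes)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    group = set(members) | {i}
    kappa = sum(1 for j in adj[i] if j in group)
    a_c = sum(len(adj[j]) for j in group) / two_m
    return kappa / len(adj[i]) - a_c

def refresh_local_modularities(adj, communities, lm_table, touched):
    """Recompute LM_Table entries for all members of the touched communities."""
    for idx in touched:
        for n in communities[idx]:
            lm_table[n] = local_modularity(adj, communities[idx], n)

# Toy graph (illustrative assumption): two triangles joined by edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
communities = [{0, 1, 2, 3}, {4, 5}]
lm_table = {n: local_modularity(adj, c, n) for c in communities for n in c}

# Move node 3 from community 0 to community 1, then refresh both communities.
communities[0].remove(3)
communities[1].add(3)
refresh_local_modularities(adj, communities, lm_table, touched=(0, 1))
```

Only the two affected communities are revisited, because the local modularity of a node depends only on its own community.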
E. Optimizing the parameters according to Genetic Algorithm
Two parameters require optimization: threshold1 and threshold2. A genetic algorithm is applied to optimize them.
A genetic algorithm (GA) is a heuristic search that mimics the process of natural evolution. It was formally introduced in the United States in the 1970s by John Holland at the University of Michigan, and it has been widely studied and applied in many fields of engineering. When there is a large space of solutions to search, a GA helps to find a good solution quickly.
Figure 5. Zachary karate club network.
Different methods find different communities for this network. Here we find 4 communities with modularity Q = 0.4197, which is the maximum modularity obtained for this network. The primitive nodes obtained by the function are 1, 17, 25 and 34. The parallel community detection function then finds the other nodes of the communities, as Fig. 5 shows. The dendrogram obtained for the karate club is shown in Fig. 6. Four different communities are clearly visible, and each community is detected by one thread.
Figure 4. Change local modularity of nodes.
Figure 6. Dendrogram obtained for Zachary karate club.
In this paper, we use a GA to optimize the parameters of our method. First the population is created. An individual of the population consists of two genes, 1 and 2: gene 1 is threshold1, which is assigned a random value between -1 and 0, and gene 2 is threshold2, which is assigned a random value between 0 and 1. One-point crossover is applied to recombine the individuals.
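The GA over the two thresholds can be sketched as below. The gene ranges, one-point crossover, random mutation and the use of a fitness function follow the text; the population size, mutation rate, number of generations, selection scheme (keeping the better half) and all function names are assumptions. In the paper the fitness is the modularity reached by the full method; here a stand-in fitness function is used so that the sketch is self-contained.

```python
import random

def random_individual(rng):
    """(threshold1, threshold2): gene 1 in [-1, 0], gene 2 in [0, 1]."""
    return (rng.uniform(-1.0, 0.0), rng.uniform(0.0, 1.0))

def one_point_crossover(a, b):
    """With two genes, the only cut point lies between them."""
    return (a[0], b[1]), (b[0], a[1])

def mutate(ind, rng, rate=0.1):
    """Replace each gene by a fresh random value with probability `rate`."""
    t1, t2 = ind
    if rng.random() < rate:
        t1 = rng.uniform(-1.0, 0.0)
    if rng.random() < rate:
        t2 = rng.uniform(0.0, 1.0)
    return (t1, t2)

def optimize(fitness, pop_size=20, generations=30, seed=0):
    """Return the best (threshold1, threshold2) found."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            for child in one_point_crossover(a, b):
                children.append(mutate(child, rng))
        population = parents + children[: pop_size - len(parents)]
    return max(population, key=fitness)

def stand_in_fitness(ind):
    """Hypothetical stand-in for 'modularity obtained with these thresholds'."""
    return -((ind[0] + 0.3) ** 2 + (ind[1] - 0.2) ** 2)

best = optimize(stand_in_fitness)
```

To use it with a real detector, stand_in_fitness would be replaced by a function that runs the community detection with the given thresholds and returns the resulting modularity.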
The only paper that has used local modularity is [10], which used extremal optimization (EO). We compare our approach with that method on the karate club dataset. The modularity obtained by EO is Q = 0.4188, while our approach achieved Q = 0.4197. The order of our approach is $O(n^2)$, while EO reaches its result with order $O(n^2 \log n)$.
We test our approach on four other datasets: the jazz musician network[13], the C. elegans metabolic network[14], a university email network[15] and a network of the users of Pretty Good Privacy (PGP)[16]. These datasets have different numbers of nodes, so the approach is tested on networks of different scales. The results of running the approach on the different datasets are presented in Table I, where the networks are listed in order of growing size. We compare our method with four other methods. Parallel Community Detection Using Local Modularity (PCDULM) stands for the proposed method. The fast algorithm of Newman (N)[17] and the CNM algorithm[3] are two kinds of agglomerative approaches. Extremal optimization proposed by Duch and Arenas (DA)[10] and the pioneering algorithm of Girvan and Newman (GN)[9] are two different divisive approaches to finding community structure.
TABLE I.
THE RESULT OF RUNNING THE APPROACH IN DIFFERENT DATASETS.
The proposed method gives good results on the different datasets in comparison with the other methods. Its advantage is that it detects multiple communities simultaneously, while the other methods are not parallel. Also, the order of the proposed method is not higher than that of the other community detection methods.
VII.
CONCLUSIONS
In this paper we presented a new parallel agglomerative method for detecting communities in different types of networks. We used local modularity as a similarity measure to join similar nodes in one community. No knowledge about the number of communities or the structure of the network is required before running the proposed method. The method detects multiple communities simultaneously, which has a good effect on its speed. It can add new nodes to a community and move some nodes from that community to others at the same time. Simulated annealing is used in the process of moving nodes to different communities. The proposed method is named Parallel Community Detection Using Local Modularity (PCDULM). The method is evaluated by the modularity measure. It was tested on some famous real-world networks and offered good results compared with current state of the art methods. The time cost of this method is $O(n^2)$.
It is also possible to extend or improve the proposed method. We hope to generalize the approach to handle both weighted and directed graphs. Finally, because new networks have huge sizes, new methods try to improve the speed of community detection, so we plan to develop algorithms with even better performance.
REFERENCES
[1] Z. Shi, Y. Liu, and J. Liang, “PSO-Based Community Detection in Complex Networks,” in Knowledge Acquisition and Modeling, 2009. KAM’09. Second International Symposium on, 2009, vol. 3, p. 114–119.
[2] M. E. J. Newman, “Detecting community structure in networks,” The European Physical Journal B-Condensed Matter and Complex Systems, vol. 38, no. 2, p. 321–330, 2004.
[3] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E, vol. 70, no. 6, p. 66111, 2004.
[4] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E, vol. 69, no. 2, p. 26113, 2004.
[5] C. Shi, Y. Wang, B. Wu, and C. Zhong, “A New Genetic Algorithm for Community Detection,” Complex Sciences, p. 1298–1309, 2009.
[6] C. Pizzuti, “Ga-net: A genetic algorithm for community detection in social networks,” Parallel Problem Solving from Nature–PPSN X, p. 1081–1090, 2008.
[7] R. Guimera and L. A. N. Amaral, “Functional cartography of complex metabolic networks,” Nature, vol. 433, no. 7028, p. 895–900, 2005.
[8] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices,” Physical Review E, vol. 74, no. 3, p. 36104, 2006.
[9] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, p. 7821, 2002.
[10] J. Duch and A. Arenas, “Community detection in complex networks using extremal optimization,” Physical Review E, vol. 72, no. 2, p. 27104, 2005.
[11] M. E. J. Newman and M. Girvan, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences, vol. 99, no. 12, p. 7821–7826, 2002.
[12] W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of Anthropological Research, vol. 33, no. 4, p. 452–473, 1977.
[13] P. Gleiser and L. Danon, “Community structure in jazz,” Arxiv preprint cond-mat/0307434, 2003.
[14] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A. L. Barabási, “The large-scale organization of metabolic networks,” Nature, vol. 407, no. 6804, p. 651–654, 2000.
[15] R. Guimera, L. Danon, A. Diaz-Guilera, F. Giralt, and A. Arenas, “Self-similar community structure in a network of human interactions,” Physical Review E, vol. 68, no. 6, p. 65103, 2003.
[16] X. Guardiola, R. Guimera, A. Arenas, A. Diaz-Guilera, D. Streib, and L. A. N. Amaral, “Macro-and micro-structure of trust networks,” Arxiv preprint cond-mat/0206240, 2002.
[17] M. E. J. Newman, “Fast algorithm for detecting community structure in very large networks,” Phys Rev E, vol. 69, 2004.