2011 Ninth IEEE/IFIP International Conference on Embedded and Ubiquitous Computing

978-0-7695-4552-3/11 $26.00 © 2011 IEEE. DOI 10.1109/EUC.2011.37

Finding Community Structure in Complex Networks Using Parallel Approach

Zahra Masdarolomoor, Department of Computer Engineering, Alzahra University, Tehran, Iran
Reza Azmi, Department of Computer Engineering, Alzahra University, Tehran, Iran
Sadegh Aliakbary, Department of Computer Engineering, Sharif University, Tehran, Iran
Nooshin Riahi, Department of Computer Engineering, Alzahra University, Tehran, Iran

Abstract- Network analysis is important in many scientific areas, and finding the structure of communities is a significant challenge in network analysis. A group of vertices with dense intra-connections and sparse inter-connections is called a community. In this paper, we propose a novel method for community detection in networks that improves on similar methods in both time and precision. The proposed method is able to detect communities in a wide variety of networks with different properties. The method is an agglomerative parallel algorithm: it can find multiple communities and exchange nodes between detected communities simultaneously. It uses local modularity to construct the communities and simulated annealing to maximize the modularity. Finally, a genetic algorithm is used to optimize the parameters of the proposed method. The algorithm is evaluated by the modularity metric and shows noticeably good precision.

Keywords- community detection; parallel; genetic algorithm; local modularity; modularity; agglomerative; simulated annealing

I. INTRODUCTION

The analysis of network structure has attracted scientists in different areas, such as computer science and physics, in recent years. Many systems can be represented as networks. Collaboration networks, the Internet, the World-Wide-Web, biological networks and social networks are just some examples [1-4].

A network has two important components: vertices and edges. Vertices are the nodes of the graph, representing entities such as people or organizations in social networks, or computers and routers in the Internet. These nodes are connected by links, or edges, representing connections between people or data. One of the special interests in social network analysis is finding community structure. A community is a group of vertices that are tightly connected to each other and loosely connected to other nodes. Community detection is the process of partitioning a network into such groups or clusters. It has many applications, including understanding the network structure, detecting communities of special interest (such as terrorists), graph visualization and improving search engines.

The problem of finding network communities has been studied increasingly in recent years. Spectral partitioning [1], [2], divisive and agglomerative approaches [3], [4], evolutionary algorithms [5], [6] and simulated annealing [7] are some examples. In our method, simulated annealing is used to reach a better community structure, and a genetic algorithm is applied to optimize the parameters of the proposed method. Existing methods are explained further in the next section.

II. RELATED WORKS

Many methods have been proposed in recent years to detect communities in networks. Spectral methods are based on the analysis of the eigenvectors of matrices derived from the network. The quantity measured corresponds to the eigenvalues of matrices associated with the adjacency matrix. These methods are discussed in a survey by Newman [8].

Divisive approaches try to find the edges between communities and remove them. After the edges between communities have been deleted, the communities remain. The pioneering idea in community detection using this approach is the Girvan-Newman (GN) algorithm [9]. GN is a divisive method that uses edge betweenness centrality as a metric to identify the boundaries of communities. This metric detects the edges between communities by counting the number of shortest paths between pairs of nodes that pass through a particular edge or node.

This approach is successful for many networks, such as email networks and human and animal social networks. But the cost of the algorithm is unsatisfactory: $O(m^2 n)$ on a network with m edges and n nodes, or $O(n^3)$ on a sparse graph (one in which m ~ n). It therefore fails on networks with more than a few thousand nodes.

Agglomerative methods, on the other hand, start with all nodes disconnected and then apply a similarity measure to join them progressively and obtain communities. Divisive algorithms usually offer good precision (according to the modularity measure), but their time complexity is usually unsatisfactory and they fail on large networks. In contrast, agglomerative approaches can achieve good results in reasonable time. We therefore seek a new agglomerative method for community detection with better precision and time complexity.

After GN, Girvan and Newman proposed a new method based on a quantity called modularity [4]. The modularity measure (Q) is used to evaluate community-detection methods. Modularity is a real number (-1 < Q < +1), and higher modularity indicates better community detection quality.

Among the current methods, extremal optimization [10] is successful in practice. Extremal optimization (EO) uses a heuristic search to optimize the value of the modularity Q. EO defines a new equation, based on modularity, to partition the network. This new equation, called local modularity, represents the contribution of an individual vertex i to the modularity Q. EO is a divisive method with a time complexity of $O(n^2 \log n)$.
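The statement that local modularity is the contribution of a single vertex to Q can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes the standard Newman-Girvan form Q = sum over communities of (e_cc - a_c^2) and the Duch-Arenas form q_i = kappa_i - k_i * a_c for local modularity, where kappa_i counts the links of node i inside its own community. The function names and the toy graph (two triangles joined by one edge) are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Q = sum_c (e_cc - a_c^2) for an undirected, unweighted edge list."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends inside community c
    degree_sum = defaultdict(int)  # total degree of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


def local_modularity(edges, community):
    """q_i = kappa_i - k_i * a_c (assumed form, following Duch and Arenas)."""
    m = len(edges)
    degree = defaultdict(int)  # k_i
    kappa = defaultdict(int)   # links of node i inside its own community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            kappa[u] += 1
            kappa[v] += 1
    a = defaultdict(float)     # fraction of edge ends attached to community c
    for node, k in degree.items():
        a[community[node]] += k / (2 * m)
    return {node: kappa[node] - degree[node] * a[community[node]]
            for node in degree}


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

Q = modularity(edges, community)
q = local_modularity(edges, community)
Q_from_local = sum(q.values()) / (2 * len(edges))
```

Under these assumptions both routes give the same value, Q = 5/14, which illustrates why summing the per-node quantities recovers the global measure.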
EO divides the whole network into random communities and exchanges nodes between communities using local modularity. It has good precision, but it could be faster. Here we propose a new agglomerative method for parallel community detection using local modularity. The proposed method can detect a number of communities simultaneously. It uses simulated annealing to achieve a better modularity.

III. APPLIED METRICS

A. Modularity

We evaluate our approach by the modularity Q, so we first explain it:

$$ Q = \sum_i \left( e_{ii} - a_i^2 \right) \tag{1} $$

where i is a detected community and $e_{ii}$ is the fraction of edges that fall within community i. To explain $a_i$, first consider (2). [The bodies of equations (2) and (3) are not legible in this copy.] In (2), i and j are community indexes, and the sum of the two parts of (2) gives $a_i$. $a_i$ is the fraction of all ends of edges that are attached to vertices in community i.

The modularity satisfies $Q \in (-1, 1)$, and values close to 1 indicate good community detection. Q = 0 indicates a random graph or the whole graph placed in one community. If Q is close to -1, each vertex is in its own community or no particular community structure is detected. Q greater than 0.3 indicates a good partitioning.

Finding a grouping of the nodes that maximizes modularity is believed to be NP-hard, so recent methods approximate it with a heuristic search. Different heuristics are available: simulated annealing, genetic algorithms, greedy approaches and so forth. Here a simulated annealing method is combined with our agglomerative method to reach a better community structure, and a genetic algorithm is applied to optimize the parameters of the proposed method.

B. Local Modularity

A new equation, called local modularity, is derived from modularity and is defined on vertices instead of communities [10]. Local modularity expresses the contribution of an individual vertex i to the modularity Q. The local modularity of each vertex i is given, as in [10], by

$$ q_i = \kappa_{c_i}(i) - k_i \, a_{c_i} \tag{4} $$

If $c_i$ is the community of vertex i, $\kappa_{c_i}(i)$ is the number of edges that vertex i, belonging to community $c_i$, has with vertices in the same community. Also, $k_i = \sum_j A_{ij}$ is the degree of vertex i, $A_{ij}$ is the adjacency matrix of the network, and

$$ a_{c_i} = \frac{1}{2M} \sum_{j \in c_i} k_j \tag{5} $$

In (5), j is a node and $c_i$ is the community of node i; $k_j$ is the degree of node j and M is the total number of edges in the network. Equation (5) therefore sums the degrees of the nodes inside community $c_i$ and divides the sum by 2M, so $a_{c_i}$ roughly shows the share of a community in the entire network.

Local modularity is a good function for detecting communities progressively in the network. Its most important feature is that, after all communities are detected, the sum of $q_i$ over all nodes in the network yields the modularity:

$$ Q = \frac{1}{2M} \sum_i q_i \tag{6} $$

We therefore look for a way to collect nodes with higher values of local modularity, which leads to a higher modularity.

Local modularity has two inputs: the node i and the community $c_i$ of node i. The output is a real number that expresses how strongly node i depends on community $c_i$. The $q_i$ of nodes on the boundaries of communities are small values (less than 1), and it is helpful to test these nodes further to find their proper communities. Local modularity is therefore a good similarity measure for our agglomerative hierarchical approach.

A novel approach is presented in this paper to detect communities. This approach can detect multiple communities simultaneously. The idea comes from agglomerative approaches. An agglomerative approach tries to collect similar nodes into a community. It starts with all vertices disconnected and then joins them based on a similarity criterion, so a measure is needed. Here local modularity plays the role of the similarity measure. The detailed explanation comes in the next section.

IV. THE PROPOSED APPROACH

The speed of the algorithm is an important point for large networks such as social networks, so a parallel algorithm is a suitable solution. Here we propose a novel parallel method that detects communities. We then optimize the parameters of the proposed method. We explain our method in four principal stages. In the first stage, it finds some primitive nodes and treats each of them as a community; that is, each of these nodes belongs to its own community. The second and third stages are carried out together: in the second stage communities are extended, and in the third, some nodes are exchanged between detected communities. The principal stages of the proposed method are:

a) The system creates some primitive communities.
b) The system extends the primitive communities and may add new communities.
c) The system exchanges some nodes between existing communities according to the simulated annealing approach.
d) The system optimizes the parameters of the method according to the genetic algorithm approach.

As stated, stages b and c execute simultaneously. We first explain stage a in the next subsection.

A. Creating some primitive communities

The process of creating new communities is presented in Fig. 1. Before running the function of Fig. 1, the system must create the first community. The first community is a single-node community containing the node with the maximum degree among all nodes of the network. It helps the function of Fig. 1 to find the other single-node, or primitive, communities.

Figure 1. The pseudo code of function create primary communities().

The task of this function is to find some nodes to form primitive communities. If these primitive communities are not proper ones, the system adds new communities or deletes improper communities during the second and third stages.

The function create_primitive_communities() applies three conditions for finding primitive single-node communities:

i. The selected node must have a high degree.
ii. The selected node must not have a direct link to other single-node communities.
iii. The local modularity of the selected node must be less than threshold1 when it is considered as a member of each community in the set communities.

To select an important node, we use the degree measure. A node with a higher degree has more connections with other nodes, so it is a social and important node. These nodes are often in the middle of a community or between communities. If a high-degree node is in the middle of a community, it is selected by this function. But if the high-degree node is on the boundary between communities, the next criteria handle the problem. The first condition tries to select a high-degree node.

Condition ii checks that the single-node communities are not directly connected to each other. If two primitive nodes have a direct link, they may be in the same community. This condition helps to ensure that they are not.

Condition iii supports condition ii by choosing nodes that are far from each other and are not in the same community. The parameter threshold1 in Fig. 1 is a kind of local modularity. The local modularity of nodes that are not directly connected and are far enough apart is negative when they are placed in the same community. We assign threshold1 a value between -1 and 0. When the local modularity of node i in community c is near -1, node i does not belong to community c. In later stages, we optimize this parameter.

In a few cases the chosen nodes may meet all three conditions and still be in the same community. This does not matter: in the next stages we check doubtful nodes to find their proper communities.

The output of the function in Fig. 1 is a set of communities. Each community in the set communities has one node; in other words, all communities in the set are single-node. This output is the starting point of the next stage, during which the communities are extended. We explain the two later stages in the next subsections.

B. Extending communities

The parallel part of the proposed method starts here. Each community in the set communities begins to collect similar nodes individually; that is, each community runs one thread to collect nodes. To find the nodes similar to each community, a measure is required, and local modularity plays this role. The approach joins new nodes to each community based on local modularity: it finds the node that has the maximum local modularity when added to the community. Multiple threads run to extract the communities of the network. Each thread finds the members of one community and finishes when all nodes of that community are detected.

The process of extending communities is shown in Fig. 2. At each iteration of the inner loop, all nodes adjacent to community c are added to this community one by one and their local modularity is calculated. The node with the maximum local modularity then becomes the candidate to join community c. A further parameter is then checked, because a criterion is required to stop extending communities. This parameter is called threshold2 and is a kind of local modularity. We assign threshold2 a positive value between 0 and 1. The local modularity of the candidate node is checked to be greater than threshold2. If the node passes this condition, it joins community c. threshold2 is another parameter to be optimized in later stages.

An advantage of this part of the method is that it does not search all nodes of the network, but only the nodes adjacent to community c. Another advantage is that the number of communities need not be fixed in advance: the method adds new communities or removes communities dynamically during execution. When no community can be extended, the process of creating a new community starts. It selects the node with the maximum degree among the nodes that are not assigned to any community.
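The extension step described above can be sketched for a single community as follows. This is a single-threaded illustration of what one thread would do, not the pseudo code of Fig. 2. It assumes the local modularity form q_i = kappa_i - k_i * a_c from [10] and assumes that only unassigned neighbours are candidates. The names extend_once and local_modularity_if_joined, and the toy graph, are hypothetical.

```python
def local_modularity_if_joined(adj, members, node, m):
    """Local modularity of `node` if it joined `members` (assumed form of [10])."""
    group = members | {node}
    kappa = sum(1 for nb in adj[node] if nb in group)  # links inside the group
    k = len(adj[node])                                 # degree of the node
    a = sum(len(adj[v]) for v in group) / (2 * m)      # share of edge ends
    return kappa - k * a


def extend_once(adj, members, assigned, threshold2):
    """Add the best adjacent unassigned node to `members`; return it or None."""
    m = sum(len(nbs) for nbs in adj.values()) // 2
    frontier = {nb for v in members for nb in adj[v]} - assigned
    if not frontier:
        return None

    def score(n):
        return local_modularity_if_joined(adj, members, n, m)

    # sorted() makes tie-breaking deterministic (an illustration choice).
    best = max(sorted(frontier), key=score)
    if score(best) <= threshold2:
        return None  # stop criterion: candidate does not exceed threshold2
    members.add(best)
    assigned.add(best)
    return best


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
members = {0}        # community grown from primitive node 0
assigned = {0, 5}    # nodes 0 and 5 are the primitive single-node communities
while extend_once(adj, members, assigned, threshold2=0.3) is not None:
    pass
```

On this toy graph the community seeded at node 0 grows to {0, 1, 2} and then stops, because node 3 would have a negative local modularity in it. In a parallel version each community would run this loop in its own thread, and the shared assigned set would need synchronization.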
The new single-node community is then added to the set communities, and the method resumes adding nodes to communities. The entire process finishes after all nodes have found their proper community.

The function of Fig. 2 uses a variable named LM_Table. This variable is a container for the local modularities of covered nodes. When a node joins a community, its local modularity is saved to LM_Table. We then use LM_Table in the next stage of the proposed method. As stated for the third stage of the proposed method, we exchange nodes between communities to achieve a better modularity. This exchange process is done using simulated annealing (SA).

Figure 2. Parallel community detection pseudo code.

The next part explains the third stage of the proposed method in more detail. The SA process helps the method to exchange nodes between communities while new communities are being formed. Some nodes may not be placed in their proper communities, so exchanging nodes between communities helps them to find their own communities.

C. Exchanging nodes according to simulated annealing

The process of extending the communities continues for a while. The system then starts to exchange nodes between communities, and simulated annealing is executed here. Simulated annealing (SA) is a popular heuristic search. It usually uses an exponential function as a probability function to optimize a method. The principal feature of simulated annealing is that it provides a means to escape local optima. In our method we use SA in two parts. As Fig. 2 shows, the SA technique is applied to exchange some nodes between communities in the function exchange_nodes(). The exponential function of SA in our method has two input parameters, delta and T0. Delta carries the change in modularity into the SA function: the difference between pre_sum_local_modularities and sum_local_modularities is assigned to the variable delta and determines the value of the SA function.
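The paper does not spell out the exponential function itself. A minimal sketch of the standard Metropolis acceptance rule, which is one common choice and is assumed here, is given below. delta is taken as pre_sum_local_modularities minus sum_local_modularities, so a positive delta means the move lowered the summed local modularity. The geometric cooling schedule and its factor alpha are assumptions for illustration; only T0 = 1000 comes from the paper.

```python
import math
import random


def accept_move(pre_sum_local_modularities, sum_local_modularities,
                temperature, rng=random):
    """Metropolis rule: always accept improvements, sometimes accept losses."""
    delta = pre_sum_local_modularities - sum_local_modularities
    if delta <= 0:
        return True  # the summed local modularity did not decrease
    # A worsening move is accepted with probability exp(-delta / T).
    return rng.random() < math.exp(-delta / temperature)


def temperature_at(step, t0=1000.0, alpha=0.95):
    """Geometric cooling (assumed schedule; alpha is a hypothetical parameter)."""
    return t0 * alpha ** step


# At a high temperature almost every worsening move is accepted,
# which is what lets the search leave a local optimum.
rng = random.Random(0)
accepted_hot = sum(accept_move(2.0, 1.0, temperature_at(0), rng)
                   for _ in range(1000))
```

As the temperature falls, exp(-delta / T) approaches zero for any positive delta, so the procedure gradually turns into a plain greedy exchange.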
The variables pre_sum_local_modularities and sum_local_modularities aggregate the local modularities of the nodes that enter delta. T0 is the SA parameter called the initial temperature; its value is set to 1000.

Figure 3. Exchange_nodes() function pseudo code.

The function that exchanges nodes between communities is shown in Fig. 3. Simulated annealing is used in this function too, and so is LM_Table. The method finds the node i with the minimum local modularity according to LM_Table, or chooses a random node, and checks whether it belongs to any other community. If the local modularity of node i in another community is greater than in its previous community, it is moved to the other community.

According to the SA technique, our proposed method sometimes exchanges a random node instead of the minimum-local-modularity node. This helps the method not to get stuck in a local maximum.

When a node is exchanged between communities, the local modularity of some other nodes changes. The function change_local_modularity(i, c) shown in Fig. 3 performs this task. As Fig. 3 shows, when a node moves to another community, the local modularities of the members of both the new community and the previous community change. Fig. 4 shows the process of this function in detail. The next part explains the details.

D. Local modularity changes

When node i in community c moves to community d, the local modularities of the nodes in both communities c and d change. Fig. 3 shows the function that changes the local modularities.

Figure 4. Change local modularity of nodes.

E. Optimizing the parameters according to Genetic Algorithm

We have two parameters that require optimization: threshold1 and threshold2. The genetic algorithm technique is applied to optimize them. A genetic algorithm (GA) is a heuristic search that mimics the process of natural evolution. It was formally introduced in the United States in the 1970s by John Holland at the University of Michigan, and it has been well studied and applied in many fields of engineering. When there is a large space of solutions to search, GA helps to find a good solution quickly.

In this paper, we use GA to optimize the parameters of our method. First the population is created. An individual of the population consists of two genes, gene 1 and gene 2. Gene 1 is threshold1, which is assigned a random value between -1 and 0, and gene 2 is threshold2, which has a random value between 0 and 1. One-point crossover is used to change the individuals, and mutation changes the values of some genes randomly. The fitness function is modularity. The results of the entire method are presented below.

V. TEST ON SAMPLE GRAPH

To test the approach we build a simple graph containing 11 nodes and 13 edges. This graph has three communities. In the stage of creating primitive communities, the algorithm finds three nodes, each of which belongs to a distinct community. The approach then finds the other nodes of the communities. All communities in this graph are detected completely. The modularity obtained is Q = 0.429, which is the maximum modularity of this graph.

VI. TEST ON REAL NETWORKS

We run the approach on 5 different datasets. The first dataset is the well-known Zachary karate club [11], [12]. Here we use an unweighted version of this network, which has 34 nodes and 78 edges. Fig. 5 presents the graph of this network.

Figure 5. Zachary karate club network.

Different methods find different communities for this network. Here we find 4 communities and modularity Q = 0.4197. This is the maximum modularity obtained for this network. The primitive nodes obtained by the function are 1, 17, 25 and 34. The parallel community detection function then finds the other nodes of the communities, as Fig. 5 shows. The dendrogram obtained for the karate club is shown in Fig. 6. Four different communities are evident. Each community is detected by one thread.

Figure 6. Dendrogram obtained for Zachary karate club.

The only paper that has used local modularity is [10], which used extremal optimization (EO). We compare our approach with their method on the karate club dataset. The modularity calculated by EO is Q = 0.4188, while our approach achieved Q = 0.4197. The order of this approach is $O(n^2)$, while EO reaches its result in $O(n^2 \log n)$.

We test our approach on four other datasets: the Jazz musician network [13], the C. elegans metabolic network [14], a university email network [15] and a network of the users of Pretty Good Privacy (PGP) [16]. These datasets have different numbers of nodes, so we test our approach on networks of different scales. The results of running the approach on the different datasets are presented in Table I, in which the size of the networks grows. We compare our method with four other methods. Parallel Community Detection Using Local Modularity (PCDULM) stands for the proposed method. The fast algorithm of Newman (N) [17] and the CNM algorithm [3] are different kinds of agglomerative approaches. Extremal optimization, proposed by Duch and Arenas (DA) [10], and the pioneering algorithm of Girvan and Newman (GN) [9] are two different divisive approaches for finding community structure.

TABLE I. THE RESULT OF RUNNING THE APPROACH IN DIFFERENT DATASETS.

The proposed method has good results on the different datasets in comparison with the other methods. The advantage of the method is that it detects multiple communities simultaneously, while the other methods are not parallel. Also, the order of the proposed method is not higher than that of the other community detection methods.

VII. CONCLUSIONS

In this paper we present a new parallel agglomerative method for detecting communities in different types of networks. We use local modularity as a similarity measure to join similar nodes into one community. No knowledge of the number of communities or of the structure of the network is required before running the proposed method. Our method detects multiple communities simultaneously, which benefits its speed. The method can add new nodes to a community and move some nodes from that community to others simultaneously. Simulated annealing is used in the process of moving nodes between communities. The proposed method is named Parallel Community Detection Using Local Modularity (PCDULM). The method is evaluated by the modularity measure. It is tested on some well-known real-world networks and offers good results compared with current state-of-the-art methods. The time cost of this method is $O(n^2)$.

It is also possible to extend or improve the proposed method. We hope to generalize the approach to handle both weighted and directed graphs. Finally, new methods try to improve the speed of community detection, because new networks are huge. We therefore plan to develop algorithms with even better performance for detecting communities.

REFERENCES

[1] Z. Shi, Y. Liu, and J. Liang, “PSO-Based Community Detection in Complex Networks,” in Knowledge Acquisition and Modeling, 2009. KAM’09. Second International Symposium on, 2009, vol. 3, p. 114–119.
[2] M. E. J. Newman, “Detecting community structure in networks,” The European Physical Journal B-Condensed Matter and Complex Systems, vol. 38, no. 2, p. 321–330, 2004.
[3] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E, vol. 70, no. 6, p. 66111, 2004.
[4] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E, vol. 69, no. 2, p. 26113, 2004.
[5] C. Shi, Y. Wang, B. Wu, and C. Zhong, “A New Genetic Algorithm for Community Detection,” Complex Sciences, p. 1298–1309, 2009.
[6] C. Pizzuti, “Ga-net: A genetic algorithm for community detection in social networks,” Parallel Problem Solving from Nature–PPSN X, p. 1081–1090, 2008.
[7] R. Guimera and L. A. N. Amaral, “Functional cartography of complex metabolic networks,” Nature, vol. 433, no. 7028, p. 895–900, 2005.
[8] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices,” Physical Review E, vol. 74, no. 3, p. 36104, 2006.
[9] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, p. 7821, 2002.
[10] J. Duch and A. Arenas, “Community detection in complex networks using extremal optimization,” Physical Review E, vol. 72, no. 2, p. 27104, 2005.
[11] M. E. J. Newman and M. Girvan, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences, vol. 99, no. 12, p. 7821–7826, 2002.
[12] W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of Anthropological Research, vol. 33, no. 4, p. 452–473, 1977.
[13] P. Gleiser and L. Danon, “Community structure in jazz,” Arxiv preprint cond-mat/0307434, 2003.
[14] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A. L. Barabási, “The large-scale organization of metabolic networks,” Nature, vol. 407, no. 6804, p. 651–654, 2000.
[15] R. Guimera, L. Danon, A. Diaz-Guilera, F. Giralt, and A. Arenas, “Self-similar community structure in a network of human interactions,” Physical Review E, vol. 68, no. 6, p. 65103, 2003.
[16] X. Guardiola, R. Guimera, A. Arenas, A. Diaz-Guilera, D. Streib, and L. A. N. Amaral, “Macro-and micro-structure of trust networks,” Arxiv preprint cond-mat/0206240, 2002.
[17] M. E. J. Newman, “Fast algorithm for detecting community structure in very large networks,” Phys Rev E, vol. 69, 2004.