Finding dense 2-clubs in an undirected graph

Chen-Ying Lin1, Yu-Fen Zeng2
Department of Computer Science and Information Engineering, Shu-Te University
1 [email protected], [email protected]

algorithm [6], which uses the concept of minimum vertex cover to find 2-clubs.

In addition to the above algorithms, Hartung et al. [24] observed an unexpected behavior when finding k-clubs: a large k-club is usually just a star graph formed by the maximum-degree vertex and its adjacent vertices. This led them to question the meaning of this structure. Even when they adjusted k from two up to six, the situation remained the same. Hartung's research gave us the idea of "finding dense k-clubs". But it still did not convince us that a dense k-club is meaningful. After all, their experimental results only show that a large k-club usually looks like a star graph.

The last inspiration comes from the movie Suffragette [22], which depicts women fighting for the right to vote. We examine the relationships between the suffragettes from the perspective of social network analysis. It is easy to see that the leader of the suffragettes, Emmeline Pankhurst (played by Meryl Streep), will have many edges, because she is the spiritual leader of the women's suffrage movement. She is known by everyone who agrees with the movement, even those who do not participate in it. Today, for example, she would have many followers on Facebook or Twitter. This conforms to Hartung's experiment, so she might be the center of a 2-club. But if the purpose is to find the people who are actually planning or taking action, as the British police did when searching for the rebels, a highly interconnected vertex set, that is, a dense structure, is needed. A dense structure is needed because the people who actually take action are acquainted with each other; in the movie, for instance, Maud, Edith, Violet, and Hugh. We then discovered that this structure is similar to the small-world [15].
The small-world notion originated from the small-world experiment conducted by the social psychologist Stanley Milgram [25]. Milgram tried to estimate the scale of a social network. He sent packets to randomly selected individuals in Nebraska and Kansas. Each packet included a description of the purpose of the study and the name of a target person in Boston. In most cases, the subjects did not know the target, so they passed the packet to an acquaintance who they thought might know the target. The packet was then passed from person to person until it reached the target or the chain failed (since some people refused to pass the packet on). From the results of this experiment, Milgram found that the average length of the shortest path between people was between five and six, which was the precedent of small-world theory. Watts and Strogatz [9] proposed a random graph

Abstract

In social network analysis (SNA), identifying communities or organizations in a network is a popular issue. From the perspective of graph theory, this means finding a dense structure in a graph. There are many kinds of dense structures, for instance cliques, k-cliques, and k-clubs. A 2-club can be considered a friends-of-friends group, and this structure plays an important role in SNA. Apart from work on finding larger k-clubs or finding k-clubs faster, experiments have shown that a large 2-club is often just a maximum-degree vertex with its adjacent vertices, that is, a star graph. Therefore, we enhance the structure by introducing the concept of "small-world" to 2-clubs, which yields a dense 2-club. In this paper, we use the average clustering coefficient to evaluate whether the 2-club produced by an algorithm is "small-world enough". We also propose a two-phase heuristic algorithm, TRIM, to find a dense 2-club based on the heuristic algorithm DROP. The experimental results show that our heuristic algorithm can improve the structure of the 2-club.
Comparing the experimental results, the average clustering coefficient of TRIM is better than that of DROP.

Keywords: Social Network, K-club, Small-world, Heuristic algorithm, Average clustering coefficient.

Introduction

With the rise of social networking sites (e.g., Facebook and Twitter), SNA has become a popular field of study. One of its main problems is to find a meaningful vertex group, such as a k-clique, k-core, k-plex, or k-club [20]. Some researchers use linear programming to solve these problems [11, 13, 16, 23]. Others develop heuristic algorithms. CONSTELLATION and DROP are well-known algorithms presented by Bourjolly et al. [14]. CONSTELLATION is used to find k-clubs (k > 2). It first finds a vertex with maximum degree; the closed neighborhood of this vertex becomes the first star. It then finds the vertex of the first star that has the most neighbors outside the star. CONSTELLATION thus finds k-clubs by successively adding vertices. In contrast, DROP successively removes vertices from the entire graph to obtain the target k-club. The specific steps are described later. Another heuristic algorithm, IDROP (Iterative DROP), was introduced by Chang et al. [17] as a modification of DROP. IDROP iterates through all vertices of G, finds the k-neighborhood of each vertex (i.e., the vertices within k steps of it), and runs DROP on that k-neighborhood. The largest vertex set obtained after DROP is the result. Yang et al. derived a VCOVER heuristic

2016 Conference on Information Technology and Applications in Outlying Islands (第十五屆離島資訊技術與應用研討會, the 15th edition of the conference)

Step 1. Data structure initialization
Compute shortest chain lengths between all vertex pairs.
Step 2. Termination check
For each vertex i of V, compute q_i: the number of vertices of V whose shortest chain to i has length at least k + 1. If q_i = 0 for every vertex i, stop: V is a k-club.
Step 3. Vertex removal
Let W be the set of vertices for which q_i is maximized. Determine a vertex i* ∈ W with least degree in V. Remove i* and its incident edges from the graph.
Step 4. Data structure update
Update shortest chain lengths. (Two vertices belonging to different connected components are said to be linked by an infinite length chain.) Go to Step 2.

model with small-world properties, including a high clustering coefficient and short average path lengths. The clustering coefficient measures the cliquishness of a neighborhood: it is the ratio of the links that actually exist among the neighbors of a vertex to the maximum possible number of such links.

Definitions

Given an undirected graph, a clique is a vertex set in which every two vertices are adjacent. In social network analysis, this structure is used to denote a friend group in which all the members are each other's friends. However, in most cases, relationships between people in real life are not as tight as a clique. The k-clique [19] was therefore introduced to represent a looser friend group. A k-clique is a vertex subset in which every two vertices are linked by a chain of length at most k. For example, a 5-cycle is a 2-clique, since every vertex can reach any other vertex in no more than two edges. From the perspective of the social network, a group is a 2-clique if any two people in the group either are already friends or have a mutual friend.

Although the k-clique seems to match the structure of a friends-of-friends group, there is still a problem: the shortest path between two vertices of a k-clique may pass through a vertex that is not in the k-clique. (For example, in a 5-cycle, 4 consecutive vertices form a 2-clique, but the two end vertices are within distance two only via the fifth vertex.) Some studies therefore defined a self-contained structure. A k-club is a vertex set inducing a subgraph whose diameter is at most k. Thus, in a 5-cycle, a vertex subset formed by 4 consecutive vertices is not a 2-club. Karp [21] showed that the clique problem is NP-hard. Bourjolly et al. [14] showed that finding a maximum k-club is NP-hard.

We use the average clustering coefficient to measure the tightness of a vertex group.
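The 5-cycle example that separates a 2-clique from a 2-club can be checked mechanically. The following is a minimal sketch in plain Python, not the authors' implementation; the adjacency-dictionary representation and the helper names bfs_distances, is_k_clique, and is_k_club are assumptions made for this illustration. The two tests differ only in whether shortest chains are allowed to leave the vertex subset.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source, allowed=None):
    """Hop distances from source; if `allowed` is given, walk only inside it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist and (allowed is None or v in allowed):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_k_clique(adj, subset, k):
    """Every pair of the subset is within k hops in the WHOLE graph."""
    subset = set(subset)
    return all(
        bfs_distances(adj, u).get(v, float("inf")) <= k
        for u, v in combinations(subset, 2)
    )


def is_k_club(adj, subset, k):
    """The subgraph INDUCED by the subset has diameter at most k."""
    subset = set(subset)
    return all(
        bfs_distances(adj, u, allowed=subset).get(v, float("inf")) <= k
        for u, v in combinations(subset, 2)
    )


# 5-cycle 0-1-2-3-4-0 (hypothetical vertex labels)
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
four_consecutive = [0, 1, 2, 3]

print(is_k_clique(cycle5, four_consecutive, 2))  # True: 0 and 3 meet via vertex 4
print(is_k_club(cycle5, four_consecutive, 2))    # False: inside the subset, 0 to 3 takes 3 hops
print(is_k_club(cycle5, range(5), 2))            # True: the whole 5-cycle has diameter 2
```

Running a breadth-first search per pair is wasteful but keeps the two definitions visibly parallel; for larger graphs one search per vertex would be enough.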
For a vertex x with n neighbors in graph G, the local clustering coefficient of x is the ratio of the number of edges between these neighbors to n(n-1)/2. The average clustering coefficient is the mean of the local clustering coefficients of all vertices. For example, in the wheel graph W6 (a 5-cycle with an additional central vertex that is adjacent to all vertices of the 5-cycle), the local clustering coefficient of the central vertex is 5/10. The denominator 10 is the number of possible links among the neighbors of the central vertex, and the numerator 5 is the number of links that actually exist among them. Every vertex on the 5-cycle has three neighbors (two on the cycle and the center), of which two pairs are adjacent, so its local clustering coefficient is 2/3. The average clustering coefficient of W6 is therefore (5/10 + 5 × 2/3)/6 ≈ 0.6389.

Algorithm TRIM

Since a star graph is not a dense 2-club, we improve DROP by first removing pendant vertices (i.e., vertices with degree one) from the graph until the resulting graph has no such vertex. We then apply DROP to the modified graph to find 2-clubs.

Phase 1.
1. G = (V, E)
2. L = {u ∈ V : degree_G(u) = 1}
3. Remove L from G
4. Update G
5. Repeat steps 2 to 4 until L is empty
Phase 2.
1. Call DROP.

Investor of KMT-invested enterprises

It is known that the scale of the KMT's party assets is large. Their structure and the relationships behind them attract the attention of society, and some research on this problem has been done [1, 2, 3, 8]. Besides running the algorithms DROP and TRIM on random graphs, we also try to examine this problem using graph theory. To generate the "Investor of KMT-invested enterprises" graph (IKE graph for short), we collect KMT-invested enterprises from the book [4]. We then use the open data "Company Registration in Taiwan" provided by the Department of Commerce to gather the chairmen, directors, and supervisors of the KMT-invested enterprises. Each investor is a vertex of the IKE graph.
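Returning to the clustering coefficient and to Phase 1 of TRIM described above, both can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the authors' code: graphs are adjacency dictionaries, the function names are hypothetical, a vertex with fewer than two neighbors is given coefficient 0 (a convention the paper does not state), and each round removes all current pendant vertices at once, as in the Phase 1 listing.

```python
from itertools import combinations


def local_clustering(adj, x):
    """Edges among the neighbors of x, divided by n(n-1)/2."""
    nbrs = adj[x]
    n = len(nbrs)
    if n < 2:
        return 0.0  # assumed convention for vertices with fewer than 2 neighbors
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (n * (n - 1) / 2)


def average_clustering(adj):
    """Mean of the local clustering coefficients over all vertices."""
    return sum(local_clustering(adj, x) for x in adj) / len(adj)


def trim_pendants(adj):
    """TRIM Phase 1 sketch: remove degree-one vertices until none is left."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    while True:
        pendants = [u for u, vs in adj.items() if len(vs) == 1]
        if not pendants:
            return adj
        for u in pendants:
            for v in adj.pop(u):
                adj[v].discard(u)


# Wheel graph W6: center "c" joined to every vertex of the 5-cycle 0..4
wheel = {"c": {0, 1, 2, 3, 4}}
for i in range(5):
    wheel[i] = {"c", (i - 1) % 5, (i + 1) % 5}

print(local_clustering(wheel, "c"))         # 0.5
print(round(average_clustering(wheel), 4))  # 0.6389

# A star with one extra edge between two leaves (hypothetical example)
star = {"h": {"a", "b", "c", "t1", "t2"}, "a": {"h"}, "b": {"h"},
        "c": {"h"}, "t1": {"h", "t2"}, "t2": {"h", "t1"}}
trimmed = trim_pendants(star)
print(sorted(trimmed))              # ['h', 't1', 't2']
print(average_clustering(trimmed))  # 1.0
```

On the second example the pendant leaves pull the average clustering coefficient down to 0.35; after trimming, only the triangle remains, which is the effect TRIM relies on before DROP is called.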
Every chairman is adjacent to every director and supervisor in the same enterprise, so a single enterprise generates a star graph. The IKE graph is presented in Figure 1. It has 556 vertices and 595 edges. We run both algorithms on the IKE graph; the results are described in the following section.

Algorithm DROP

Because TRIM is a heuristic algorithm modified from DROP, we describe DROP first. The following is extracted from Bourjolly et al. [14]. Consider G = (V, E).

Figure 1. Investor of KMT-invested enterprises

Implementation and Experiments

The algorithms were implemented in Python with the graph library NetworkX [5]. We test our algorithm on graphs produced by the following four well-known random graph generators offered by NetworkX:

fast_gnp_random_graph [26]
dense_gnm_random_graph [7]
erdos_renyi_graph [18]
newman_watts_strogatz_graph [15]

Because the random graphs are generated in different ways, we cannot make every random graph have exactly the same number of vertices (or edges) as the IKE graph, but we make them as close as possible. The parameters of each graph are: fast_gnp_random_graph(556, 0.004), dense_gnm_random_graph(556, 595), erdos_renyi_graph(556, 0.0038), newman_watts_strogatz_graph(556, 2, 0.08). For the meaning of the parameters, please refer to the NetworkX documentation. The resulting average clustering coefficients for each graph under DROP and TRIM are shown in Table 1, and the running times are shown in Table 2. In addition, we run both algorithms on the IKE graph; the results are shown in Figure 2 and Figure 3. The 2-club found by DROP has 56 vertices and 65 edges. The 2-club found by TRIM has 16 vertices and 25 edges.

Figure 2. Applying TRIM in IKE graph
Figure 3. Applying DROP in IKE graph

interactive, even though they seem not to be related to each other.
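As a rough illustration of the DROP procedure of Bourjolly et al. [14] that the experiments build on, the four steps can be written as a short loop. This is a sketch under stated assumptions and not the implementation used in the paper: it uses plain adjacency dictionaries instead of NetworkX, the names hop_distances and drop are hypothetical, shortest chain lengths are recomputed from scratch in every round instead of being updated incrementally, and ties among least-degree candidates are broken by dictionary order, which the published steps leave open.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def drop(adj, k=2):
    """DROP-style heuristic: delete vertices until the remainder is a k-club."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    while adj:
        # Steps 1, 2 and 4: q[i] counts vertices at distance >= k + 1 from i.
        # Unreachable vertices count as infinitely far away.
        q = {}
        for i in adj:
            within_k = sum(1 for d in hop_distances(adj, i).values() if d <= k)
            q[i] = len(adj) - within_k
        worst = max(q.values())
        if worst == 0:
            break  # every pair is within k hops: the vertex set is a k-club
        # Step 3: among vertices with maximal q, remove one of least degree.
        candidates = [i for i in adj if q[i] == worst]
        victim = min(candidates, key=lambda i: len(adj[i]))
        for v in adj.pop(victim):
            adj[v].discard(victim)
    return set(adj)


# Path 0-1-2-3: the end vertices are 3 hops apart, so one of them is dropped.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(sorted(drop(path, k=2)))  # [1, 2, 3]
```

Recomputing all distances in each round costs one breadth-first search per vertex per removal; that is acceptable for a sketch, but a graph of the size used in the experiments would benefit from the incremental update described in Step 4.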
The resulting average clustering coefficient of TRIM is larger than that of DROP by 0.5062 (0.6039 versus 0.0977). Compared to the random graphs, the resulting 2-club in the IKE graph is denser. Since pendant vertices are rare in the newman_watts_strogatz_graph, both algorithms achieve almost the same performance on it. As for the running time, since phase one repeatedly removes leaf vertices, the number of leaves determines the running time.

Conclusions

In this paper, we introduce the meaning of dense 2-clubs: a dense 2-club may be a group whose relationships are stronger than those of a sparse 2-club (for example, a star graph). We also derive a heuristic algorithm, TRIM, to find dense 2-clubs. The algorithm first removes pendant vertices repeatedly until none remains, and then applies the known DROP algorithm to find a 2-club. TRIM is implemented in Python with the graph library NetworkX. Furthermore, we apply small-world theory to 2-clubs. In the future, we hope to investigate more real-life topics and complete the implementation. We will also try to generalize the discussion of dense k-clubs to k > 2.

Table 1. Average clustering coefficient of the resulting 2-club

Graph                    DROP      TRIM
fast_gnp_random          0         0.009*
dense_gnm_random         0.0254*   0.0055
erdos_renyi              0.0149    0.0754*
newman_watts_strogatz    0         0
IKE                      0.0977    0.6039*

Table 2. Running time (seconds)

Graph                    DROP       TRIM
fast_gnp_random          59.0835    20.4529*
dense_gnm_random         87.2739    22.7619*
erdos_renyi              59.1993    20.869*
newman_watts_strogatz    194.7365   194.4183*
IKE                      64.4435    0.2105*

Every experiment was run ten times and the results were averaged. The experiments were run on a MacBook Pro Retina with an Intel Core i5 2.6 GHz CPU. In each row, the higher average clustering coefficient and the shorter running time are marked with an asterisk. The experimental results show that for random graphs, the average clustering coefficient of TRIM ranges from 0 to 0.0754, and that of DROP ranges from 0 to 0.0254.
The worst case (the lowest average clustering coefficient) of TRIM is the same as that of DROP: 0 on newman_watts_strogatz. The best cases of DROP and TRIM are dense_gnm_random and erdos_renyi, respectively, with 0.0254 and 0.0754. Comparing these best cases, the average clustering coefficient of TRIM is larger than that of DROP by 0.05. Nevertheless, DROP is clearly still better than TRIM on the IKE graph when it comes to finding large 2-clubs. However, comparing the results of DROP and TRIM, the average clustering coefficient of the former is 0.0977 and that of the latter is 0.6039, which means that the 2-club found by TRIM is denser than the one found by DROP. The IKE graph is, to some extent, a collection of many intersecting star graphs. The 2-club found by TRIM retains more edges per vertex than the one found by DROP (25 edges on 16 vertices versus 65 edges on 56 vertices). Therefore, it is easier to find out who are practically

References

[1] 李宗榮, "在國家權力與家族主義之間:企業控制與台灣大型企業間網絡再探" [in Chinese; roughly "Between state power and familism: corporate control and intercorporate networks among large Taiwanese enterprises revisited"], 台灣社會學, vol. 13, pp. 173–242, 2007. [Online].
[2] 李福鐘, "威權體制下的國民黨黨營企業" [in Chinese; roughly "KMT party-run enterprises under the authoritarian regime"], 國史館學術集刊, vol. 18, pp. 189–220, 2008. [Online].
[3] 曾詠悌, "以黨養黨─中國國民黨黨營事業初期發展之研究(1945~1952)" [in Chinese; roughly "Supporting the party with the party: a study of the early development of the Kuomintang's party-run enterprises (1945~1952)"], 2004.
[4] 羅承宗, 黨產解密 [in Chinese; roughly "Party Assets Declassified"]. 臺北市 (Taipei): 新台灣國策智庫有限公司, 2011.
[5] A. Hagberg, D. A. Schult, and P. J. Swart, "Exploring network structure, dynamics, and function using NetworkX," in Proceedings of the 7th Python in Science Conference (SciPy 2008), 2008. [Online]. Available: http://conference.scipy.org/proceedings/SciPy2008/paper_2/full_text.pdf.
[6] C.-P. Yang, H.-C. Chen, S.-D. Hsiea, and B. Y. Wu, "Heuristic Algorithms for finding 2-clubs in an undirected graph," NCS, Taiwan, 2009.
[7] D. E. Knuth, The Art of Computer Programming, Volume 2 (3rd ed.): Seminumerical Algorithms. Addison-Wesley Longman Publishing Co., 1997. [Online]. Available: http://dl.acm.org/citation.cfm?id=270146. Accessed: Feb. 26, 2016.
[8] D. Fell, "Political and media liberalization and political corruption in Taiwan," The China Quarterly, vol. 184, pp. 875–893, Dec. 2005. [Online]. Available: http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=358808&fileId=S0305741005000548#fn1. Accessed: Mar. 6, 2016.
[9] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, Jun. 1998. [Online]. Available: http://www.nature.com/nature/journal/v393/n6684/abs/393440a0.html. Accessed: Mar. 6, 2016.
[10] E. N. Gilbert, "Random graphs," The Annals of Mathematical Statistics, vol. 30, no. 4, pp. 1141–1144, Dec. 1959.
[11] F. D. Carvalho and T. M. Almeida, "Upper bounds and heuristics for the 2-club problem," European Journal of Operational Research, vol. 210, no. 3, pp. 489–494, May 2011. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0377221710008015. Accessed: Feb. 24, 2016.
[12] F. Larrión, M. A. Pizaña, and R. Villarroel-Flores, "On self-clique graphs with triangular cliques," Discrete Mathematics, vol. 339, no. 2, pp. 457–459, 2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0012365X15003039. Accessed: Feb. 26, 2016.
[13] J. Pattillo, N. Youssef, and S. Butenko, "On clique relaxation models in network analysis," European Journal of Operational Research, vol. 226, no. 1, pp. 9–18, Apr. 2013. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0377221712007679. Accessed: Feb. 24, 2016.
[14] J.-M. Bourjolly, G. Laporte, and G. Pesant, "Heuristics for finding k-clubs in an undirected graph," Computers & Operations Research, vol. 27, no. 6, pp. 559–569, 2000. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0305054899000477. Accessed: Feb. 22, 2016.
[15] M. E. J. Newman and D. J. Watts, "Renormalization group analysis of the small-world network model," Physics Letters A, vol. 263, no. 4-6, pp. 341–346, Dec. 1999.
[16] M. F. Pajouh and B. Balasundaram, "On inclusionwise maximal and maximum cardinality k-clubs in graphs," Discrete Optimization, vol. 9, no. 2, pp. 84–97, 2012. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1572528612000163. Accessed: Feb. 24, 2016.
[17] M.-S. Chang, L.-J. Hung, C.-R. Lin, and P.-C. Su, "Finding large k-clubs in undirected graphs," Computing, vol. 95, no. 9, pp. 739–758, Dec. 2012.
[18] P. Erdős and A. Rényi, "On random graphs," Publicationes Mathematicae Debrecen, vol. 6, pp. 290–297, 1959.
[19] R. D. Luce, "Connectivity and generalized cliques in sociometric group structure," Psychometrika, vol. 15, no. 2, pp. 169–190, Jun. 1950.
[20] R. J. Mokken, "Cliques, clubs and clans," Quality & Quantity, vol. 13, no. 2, pp. 161–173, Apr. 1979.
[21] R. M. Karp, "Reducibility among combinatorial problems," in Complexity of Computer Computations, 1972, pp. 85–103.
[22] S. Gavron, "Suffragette," 2015. [Online]. Available: http://www.imdb.com/title/tt3077214/. Accessed: Mar. 6, 2016.
[23] S. Hartung, C. Komusiewicz, A. Nichterlein, and O. Suchý, "On structural parameterizations for the 2-club problem," Discrete Applied Mathematics, vol. 185, pp. 79–92, Apr. 2015. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0166218X14005265. Accessed: Feb. 24, 2016.
[24] S. Hartung, C. Komusiewicz, and A. Nichterlein, "Parameterized algorithmics and computational experiments for finding 2-clubs," Parameterized and Exact Computation, vol. 7535, pp. 231–241, 2012.
[25] S. Milgram, "The Small World Problem," Psychology Today, pp. 61–67, May 1967.
[26] V. Batagelj and U. Brandes, "Efficient generation of large random networks," Physical Review E, vol. 71, no. 3, Mar. 2005.