Finding dense 2-clubs in an undirected graph

Chen-Ying Lin, Yu-Fen Zeng
Department of Computer Science and Information Engineering, Shu-Te University
algorithm [6], which uses the concept of minimum
vertex cover to find 2-clubs.
In addition to the above algorithms, Hartung et
al. [24] observed an unexpected behavior when finding
k-clubs: a large k-club is usually just a star graph
formed by the maximum-degree vertex and its adjacent
vertices. This led them to question the meaning of
this structure. Even when they adjusted k from two up
to six, the situation remained. Hartung's research
gives us the idea of "finding dense k-clubs". However,
it does not yet convince us that a dense k-club is
meaningful; after all, their experimental results show
that a large k-club usually looks like a star graph.
The last inspiration comes from the movie Suffragette
[22], which depicts women fighting for the right to
vote. We examine the relationships among the
suffragettes from the perspective of social network
analysis. It is easy to see that the leader of the
suffragettes, Emmeline Pankhurst (played by Meryl
Streep), has many edges, because she is the spiritual
leader of the women's suffrage movement. She is known
to everyone who supports the movement, even those who
do not take part in it. Today, for example, she would
have many followers on Facebook or Twitter. This is
consistent with Hartung's experiment, so she might be
the center of a 2-club. But if the purpose is to find
the people who are actually planning or taking action,
just as the British police tried to find the rebels,
then a highly interconnected vertex set, that is, a
dense structure, is needed. A dense structure is
needed because the people who actually take action are
acquainted with one another; in the movie, for
instance, Maud, Edith, Violet, and Hugh. We then
observe that this structure is similar to the
small-world network [15].
The small-world notion originates from the
small-world experiment conducted by the social
psychologist Stanley Milgram [25]. Milgram tried to
estimate the scale of a social network. He sent
packets to randomly selected individuals in Nebraska
and Kansas. Each packet included a description of the
purpose of the study and the name of a target person
in Boston. In most cases the subjects did not know the
target, so they passed the packet to an acquaintance
who they thought might know the target. That person
could then send the packet on to the next person,
until the packet reached the target or the chain
failed (since some people refused to pass the packet
on). From the results of this experiment, Milgram
found that the average length of the shortest path
between people was between five and six, which was the
precursor of small-world theory.
Watts and Strogatz [9] proposed a random graph
Abstract
In social network analysis (SNA), identifying
communities or organizations in a network is a popular
issue. From the perspective of graph theory, this
means finding a dense structure in a graph. There are
many kinds of dense structures, for instance the
clique, the k-clique, and the k-club. A 2-club can be
regarded as a friends-of-friends group, and this
structure plays an important role in SNA. Apart from
work on finding large k-clubs or finding k-clubs
faster, experiments have shown that a large 2-club is
often just a maximum-degree vertex with its adjacent
vertices, which is a star graph. Therefore, we enhance
this structure by introducing the concept of
"small-world" to 2-clubs, which yields the dense
2-club. In this paper, we use the average clustering
coefficient to evaluate whether the 2-club produced by
an algorithm is "small-world enough". We also propose
a two-phase heuristic algorithm, TRIM, based on the
heuristic algorithm DROP, to find a dense 2-club. The
experimental results show that our heuristic algorithm
can improve the structure of the 2-club: on most of
the test graphs, the average clustering coefficient of
the 2-club found by TRIM is higher than that of the
2-club found by DROP.
Keywords: Social network, k-club, Small-world,
Heuristic algorithm, Average clustering coefficient.
Introduction
With the rise of social networking sites (e.g.,
Facebook and Twitter), SNA has become a popular field
of study. One of its main problems is to find a
meaningful vertex group, such as a k-clique, k-core,
k-plex, or k-club [20]. Some researchers use linear
programming to solve these problems [11, 13, 16, 23].
Others develop heuristic algorithms.
CONSTELLATION and DROP are well-known
algorithms presented by Bourjolly et al. [14].
CONSTELLATION is used to find k-clubs (k > 2). It
first finds a vertex with maximum degree; the closed
neighborhood of this vertex becomes the first star. It
then finds the vertex of the first star that has the
most neighbors outside the star. CONSTELLATION thus
finds k-clubs by successively adding vertices. In
contrast, DROP successively removes vertices from the
entire graph until a k-club remains. The specific
steps are described later. Another heuristic
algorithm, IDROP (Iterative DROP), was introduced by
Chang et al. [17] as a modification of DROP. IDROP
iterates through all vertices of G, finds the
k-neighborhood (i.e., the neighborhood within k steps)
of each vertex, and then applies DROP to that
k-neighborhood. The largest vertex set obtained after
DROP is the result. Yang et al. derived a VCOVER
heuristic
2016 Conference on Information Technology and Applications in Outlying Islands
第十五屆離島資訊技術與應用研討會
Step 1. Data structure initialization
Compute shortest chain lengths between all vertex
pairs.
Step 2. Termination check
For each vertex i of V, compute q_i: the number of
vertices of V whose shortest chain to i has length at
least k + 1. If q_i = 0 for every vertex i, stop: V is a
k-club.
Step 3. Vertex removal
Let W be the set of vertices for which q_i is
maximized. Determine a vertex i* ∈ W with least
degree in V. Remove i* and its incident edges
from the graph.
Step 4. Data structure update
Update shortest chain lengths. (Two vertices
belonging to different connected components are
said to be linked by an infinite length chain.) Go to
Step 2.
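
The four steps above can be sketched in plain Python as follows. This is only an illustrative sketch, not the implementation used in the experiments: the function names bfs_distances and drop are hypothetical, the graph is assumed to be a dictionary mapping each vertex to the set of its neighbors, shortest chain lengths are recomputed from scratch in every round instead of being updated incrementally, and ties in Step 3 are broken by dictionary order.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def drop(adj, k=2):
    """Return the vertex set that remains after the DROP steps."""
    adj = {u: set(nbrs) for u, nbrs in adj.items()}  # work on a copy
    while adj:
        # Steps 1 and 4: (re)compute shortest chain lengths.
        q = {}
        for i in adj:
            dist = bfs_distances(adj, i)
            # Vertices in another component count as infinitely far away.
            q[i] = sum(1 for j in adj if dist.get(j, float("inf")) > k)
        # Step 2: stop when no vertex is farther than k from any other.
        if all(count == 0 for count in q.values()):
            break
        # Step 3: among the vertices with maximum q_i, drop one of least degree.
        q_max = max(q.values())
        candidates = [i for i in adj if q[i] == q_max]
        victim = min(candidates, key=lambda i: len(adj[i]))
        for nbr in adj.pop(victim):
            adj[nbr].discard(victim)
    return set(adj)


# A path on five vertices is not a 2-club; DROP trims it to three vertices.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(drop(path, k=2))
```

Recomputing all distances in every round keeps the sketch short; an implementation intended for larger graphs would update the distances incrementally, as Step 4 suggests.
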
model with small-world properties, including a high
clustering coefficient and a short average path
length. The clustering coefficient measures the
cliquishness of a neighborhood: it is the ratio of the
number of links that actually exist among the
neighbors of a vertex to the maximum possible number
of such links.
Definitions
Given an undirected graph, a clique is a vertex set
in which every two vertices are adjacent. In social
network analysis, this structure is used to denote a
friend group in which all members are friends with
each other. However, in most cases, real-life
relationships are not as tight as a clique. The
k-clique [19] was therefore introduced to represent a
friend group. A k-clique is a vertex subset in which
every two vertices are linked by a chain of length at
most k. For example, a 5-cycle is a 2-clique, since
every vertex can reach any other vertex through no
more than two edges. From the perspective of social
networks, a group is a 2-clique if any two people in
the group are either already friends or have a mutual
friend. Although the k-clique seems to match the
structure of a friends-of-friends group, there is
still a problem: the shortest path between two
vertices of a k-clique may pass through a vertex that
is not in the k-clique. (For example, in a 5-cycle,
four consecutive vertices form a 2-clique, but the two
end vertices are joined within two edges only through
the fifth vertex.) Some studies have therefore defined
a self-contained structure. A k-club is a vertex set
inducing a subgraph whose diameter is at most k.
Therefore, in a 5-cycle, a vertex subset formed by
four consecutive vertices is not a 2-club.
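
The difference between a 2-clique and a 2-club on the 5-cycle can be checked mechanically. The sketch below is illustrative only; the helper names distances, is_k_clique and is_k_club are hypothetical, and the graph is assumed to be a dictionary mapping each vertex to the set of its neighbors. The only difference between the two tests is which vertices a path is allowed to pass through.

```python
from collections import deque
from itertools import combinations


def distances(adj, source, allowed):
    """BFS hop distances from source, walking only through allowed vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in allowed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_k_clique(adj, subset, k):
    """k-clique: distances are measured in the whole graph."""
    everything = set(adj)
    return all(
        distances(adj, u, everything).get(v, float("inf")) <= k
        for u, v in combinations(subset, 2)
    )


def is_k_club(adj, subset, k):
    """k-club: distances are measured inside the induced subgraph."""
    allowed = set(subset)
    return all(
        distances(adj, u, allowed).get(v, float("inf")) <= k
        for u, v in combinations(subset, 2)
    )


# 5-cycle on vertices 0..4 and four consecutive vertices of it.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
four = [0, 1, 2, 3]
print(is_k_clique(cycle5, four, 2))  # vertices 0 and 3 are joined via vertex 4
print(is_k_club(cycle5, four, 2))    # without vertex 4 they are three edges apart
```
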
Karp [21] showed that the clique problem is NP-hard.
Bourjolly et al. [14] noted that finding a maximum
k-club is NP-hard. We use the average clustering
coefficient to measure the tightness of a vertex group.
For a vertex x with n neighbors in a graph G, the local
clustering coefficient of x is the ratio of the number
of edges between these neighbors to n(n-1)/2. The
average clustering coefficient is the mean of the local
clustering coefficients of all vertices. For example,
in the wheel graph W6 (a 5-cycle with an additional
central vertex that is adjacent to all vertices of the
5-cycle), the local clustering coefficient of the
central vertex is 5/10. The denominator 10 is the
number of possible links among the neighbors of the
central vertex, and the numerator 5 is the number of
actual links among them. Every vertex in the 5-cycle
has local clustering coefficient 2/3. The average
clustering coefficient of W6 is therefore
(5/10 + 5 × 2/3)/6 ≈ 0.6389.
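
The W6 example can be reproduced with a few lines of Python. This is a minimal sketch of the definition given above, not the code used in the experiments: the function names are hypothetical, the graph is a dictionary mapping each vertex to the set of its neighbors, and a vertex with fewer than two neighbors is assumed to have local clustering coefficient 0 (a convention the text does not state).

```python
from itertools import combinations


def local_clustering(adj, x):
    """Edges among the neighbors of x divided by n(n-1)/2."""
    nbrs = adj[x]
    n = len(nbrs)
    if n < 2:
        return 0.0  # assumed convention for vertices of degree 0 or 1
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (n * (n - 1) / 2)


def average_clustering(adj):
    """Mean of the local clustering coefficients of all vertices."""
    return sum(local_clustering(adj, x) for x in adj) / len(adj)


# Wheel graph W6: rim vertices 0..4 form a 5-cycle, "c" is the central vertex.
wheel = {i: {(i - 1) % 5, (i + 1) % 5, "c"} for i in range(5)}
wheel["c"] = set(range(5))

print(local_clustering(wheel, "c"))          # central vertex: 5/10
print(local_clustering(wheel, 0))            # rim vertex: 2/3
print(round(average_clustering(wheel), 4))   # 0.6389
```
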
Algorithm TRIM
Since the star graph is not a dense 2-club, we
improve DROP by first removing pendant vertices (i.e.,
vertices with degree one) from the graph one by one
until the resulting graph has no such vertex. We then
apply DROP to the modified graph to find 2-clubs.
Phase 1.
1. G = (V, E)
2. L = {u ∈ V : degree_G(u) = 1}
3. Remove the vertices of L and their incident edges from G
4. Update G
5. Repeat steps 2 to 4 until L is empty
Phase 2.
1. Call DROP on the resulting graph G.
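
Phase 1 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name remove_pendants is hypothetical, the graph is a dictionary mapping each vertex to the set of its neighbors, and all current pendant vertices are removed in one batch per round, following the pseudocode above. Phase 2 would then pass the pruned graph to a DROP routine, which is not repeated here.

```python
def remove_pendants(adj):
    """Repeatedly delete degree-one vertices until none is left."""
    adj = {u: set(nbrs) for u, nbrs in adj.items()}  # work on a copy
    while True:
        # Step 2: collect the current pendant vertices.
        pendants = [u for u, nbrs in adj.items() if len(nbrs) == 1]
        if not pendants:
            return adj  # Step 5: nothing left to remove
        # Steps 3 and 4: remove them together with their incident edges.
        for u in pendants:
            for nbr in adj.pop(u):
                adj[nbr].discard(u)


# A triangle 0-1-2 with a tail 2-3-4: the tail is peeled off in two rounds.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
print(sorted(remove_pendants(graph)))

# A star loses all of its leaves; only the isolated center remains.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(remove_pendants(star))
```

Note that removing a pendant vertex can create a new one (vertex 3 above), which is why the removal has to be repeated.
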
Investor of KMT-invested enterprises
It is well known that the scale of the KMT party
assets is large. Their structure and the relationships
behind them attract public attention, and several
studies of this problem have been carried out [1, 2,
3, 8]. Besides running the algorithms DROP and TRIM on
random graphs, we also try to examine this problem
using graph theory. To generate the "Investor of
KMT-invested enterprises graph" (IKE graph for
short), we collect the KMT-invested enterprises from
the book [4]. We then use the open data "Company
Registration in Taiwan" provided by the Department
of Commerce to gather the chairmen, directors, and
supervisors of the KMT-invested enterprises. Each
investor is a vertex of the IKE graph. Every chairman
is adjacent to all directors and supervisors of the
same enterprise, so a single enterprise generates a
star graph. The IKE graph is presented in Figure 1. It
has 556 vertices and 595 edges. We run the two
algorithms on the IKE graph; the results are described
in the section Implementation and Experiments.
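
The construction rule for the IKE graph can be illustrated with a small sketch. Everything in it is hypothetical: the function name build_investor_graph, the input layout (enterprise name mapped to a pair of lists) and the placeholder enterprises and persons are invented for illustration and do not come from the registration data.

```python
def build_investor_graph(boards):
    """boards maps an enterprise to (chairmen, directors and supervisors)."""
    adj = {}
    for chairmen, others in boards.values():
        # Every listed person becomes a vertex.
        for person in list(chairmen) + list(others):
            adj.setdefault(person, set())
        # Each chairman is joined to every director and supervisor
        # of the same enterprise, which yields one star per enterprise.
        for chair in chairmen:
            for other in others:
                if other != chair:
                    adj[chair].add(other)
                    adj[other].add(chair)
    return adj


# Placeholder data: P3 sits on both boards and therefore links the two stars.
boards = {
    "Enterprise A": (["P1"], ["P2", "P3"]),
    "Enterprise B": (["P4"], ["P3", "P5"]),
}
ike_like = build_investor_graph(boards)
edge_count = sum(len(nbrs) for nbrs in ike_like.values()) // 2
print(len(ike_like), edge_count)
```

Persons who serve in several enterprises are what connect the individual stars into a larger graph.
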
Algorithm DROP
Because TRIM is a heuristic algorithm modified
from DROP, we describe DROP first. Its steps, listed
as Step 1 to Step 4, are taken from Bourjolly et al.
[14]. Consider G = (V, E).
Figure 1. Investor of KMT-invested enterprises
Implementation and Experiments
The algorithms were implemented in Python with
the graph library NetworkX [5]. We test them on the
following four well-known random graph generators
provided by NetworkX:
- fast_gnp_random_graph [26]
- dense_gnm_random_graph [7]
- erdos_renyi_graph [18]
- newman_watts_strogatz_graph [15]
Because the random graphs are generated in different
ways, we cannot make every random graph have exactly
the same number of vertices and edges as the IKE
graph, but we make them as close as possible. The
parameters of each graph are:
- fast_gnp_random_graph(556, 0.004)
- dense_gnm_random_graph(556, 595)
- erdos_renyi_graph(556, 0.0038)
- newman_watts_strogatz_graph(556, 2, 0.08)
For the meaning of the parameters, please refer to the
NetworkX documentation. The average clustering
coefficients of the 2-clubs found by DROP and TRIM on
each graph are shown in Table 1, and the running times
are shown in Table 2. In addition, we apply both
algorithms to the IKE graph; the results are shown in
Figure 2 (TRIM) and Figure 3 (DROP). The 2-club found
by DROP has 56 vertices and 65 edges. The 2-club found
by TRIM has 16 vertices and 25 edges.
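
As a rough check that these parameters give graphs of about the same size as the IKE graph (556 vertices, 595 edges), the expected edge counts can be worked out by hand. The short calculation below assumes the usual behavior of the generators: each of the n(n-1)/2 vertex pairs is an edge with probability p in the two G(n, p) models, the G(n, m) model has exactly m edges, and the Newman-Watts-Strogatz model with k = 2 starts from a ring with n edges and adds roughly p shortcuts per ring edge. The helper name expected_gnp_edges is hypothetical.

```python
def expected_gnp_edges(n, p):
    """Expected number of edges when each vertex pair is chosen with probability p."""
    return p * n * (n - 1) / 2


n = 556
expected = {
    "fast_gnp_random_graph(556, 0.004)": expected_gnp_edges(n, 0.004),
    "erdos_renyi_graph(556, 0.0038)": expected_gnp_edges(n, 0.0038),
    "dense_gnm_random_graph(556, 595)": 595.0,
    # ring with k = 2 has n edges; about p * n shortcut edges are added
    "newman_watts_strogatz_graph(556, 2, 0.08)": n + 0.08 * n,
}

for name, edges in expected.items():
    print(f"{name}: about {edges:.1f} edges (IKE graph: 595)")
```

These are expectations only; an individual random graph will usually have slightly more or fewer edges.
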
Figure 2. Applying TRIM to the IKE graph
Figure 3. Applying DROP to the IKE graph
interactive, even though they seem unrelated to each
other. On the IKE graph, the average clustering
coefficient of TRIM is larger than that of DROP by
0.5062. Compared to the random graphs, the 2-club
found in the IKE graph is denser. Since there are few
pendant vertices in the newman_watts_strogatz_graph,
both algorithms perform almost identically on it. As
for the running time, since phase one repeatedly
removes leaf vertices before DROP runs, the number of
leaves largely determines the running time of TRIM.
Table 1. Average clustering coefficient of the resulting 2-club
Graph                    DROP      TRIM
fast_gnp_random          0         0.009
dense_gnm_random         0.0254    0.0055
erdos_renyi              0.0149    0.0754
newman_watts_strogatz    0         0
IKE                      0.0977    0.6039
Conclusions
In this paper, we introduce the notion of dense
2-clubs: a dense 2-club is a group whose relationships
are stronger than those of a sparse 2-club (for
example, a star graph). We also derive a heuristic
algorithm, TRIM, to find dense 2-clubs. This algorithm
first removes pendant vertices repeatedly until no
pendant vertex remains, and then applies the known
DROP algorithm to find a 2-club. TRIM is implemented
in Python with the graph library NetworkX.
Furthermore, we apply small-world theory to 2-clubs.
In the future, we hope to investigate more real-life
topics and complete the implementation. We will also
try to generalize the discussion of dense k-clubs to
k > 2.
Table 2. Running time (seconds)
Graph                    DROP        TRIM
fast_gnp_random          59.0835     20.4529
dense_gnm_random         87.2739     22.7619
erdos_renyi              59.1993     20.869
newman_watts_strogatz    194.7365    194.4183
IKE                      64.4435     0.2105
Every experiment was run ten times and the results
were averaged. The experiments were run on a MacBook
Pro Retina with an Intel Core i5 2.6 GHz CPU. The
higher average clustering coefficient and the shorter
running time are bolded. The experimental results show
that, for random graphs, the average clustering
coefficient of TRIM ranges from 0 to 0.0754, and that
of DROP ranges from 0 to 0.0254. The worst case (the
lowest average clustering coefficient) of TRIM is the
same as that of DROP, namely 0 on
newman_watts_strogatz. The best cases of DROP and TRIM
are dense_gnm_random and erdos_renyi respectively,
with 0.0254 and 0.0754. In the best case, the average
clustering coefficient of TRIM is thus larger than
that of DROP by 0.05. Nevertheless, DROP clearly still
finds a larger 2-club than TRIM on the IKE graph.
Comparing the results of DROP and TRIM on the IKE
graph, however, the average clustering coefficient of
the former is 0.0977 and that of the latter is 0.6039.
This means that the 2-club found by TRIM is denser
than the 2-club found by DROP. The IKE graph is, to
some extent, a union of many overlapping star graphs.
The 2-club found by TRIM retains more edges relative
to its number of vertices (25 edges on 16 vertices,
against 65 edges on 56 vertices for DROP).
Therefore, it is easier to find out who is practically
Reference
[1] 李宗榮, "在國家權力與家族主義之間:企業控制與台灣大型企業間網絡再探"
    [Between state power and familism: corporate control and
    intercorporate networks in Taiwan revisited], 台灣社會學
    [Taiwanese Sociology], vol. 13, pp. 173–242, 2007. [Online].
[2] 李福鐘, "威權體制下的國民黨黨營企業" [KMT party-run enterprises
    under the authoritarian regime], 國史館學術集刊, vol. 18,
    pp. 189–220, 2008. [Online].
[3] 曾詠悌, "以黨養黨─中國國民黨黨營事業初期發展之研究(1945~1952)"
    [Supporting the party with the party: a study of the early
    development of the KMT party-run enterprises (1945~1952)], 2004.
[4] 羅承宗, 黨產解密 [Party assets declassified]. 臺北市 [Taipei]:
    新台灣國策智庫有限公司, 2011.
[5] A. Hagberg, D. A. Schult, and P. J. Swart, "Exploring network
    structure, dynamics, and function using NetworkX," in Proceedings
    of the 7th Python in Science Conference (SciPy 2008), 2008.
    [Online]. Available:
    http://conference.scipy.org/proceedings/SciPy2008/paper_2/full_text.pdf.
[6] C.-P. Yang, H.-C. Chen, S.-D. Hsiea, and B. Y. Wu, "Heuristic
    Algorithms for finding 2-clubs in an undirected graph," NCS,
    Taiwan, 2009.
[7] D. E. Knuth, The art of computer programming, volume 2 (3rd ed.):
    Seminumerical algorithms. Addison-Wesley Longman Publishing Co.,
    1997. [Online]. Available:
    http://dl.acm.org/citation.cfm?id=270146. Accessed: Feb. 26, 2016.
[8] D. Fell, "Political and media liberalization and political
    corruption in Taiwan," The China Quarterly, vol. 184,
    pp. 875–893, Dec. 2005. [Online]. Available:
    http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=358808&fileId=S0305741005000548#fn1.
    Accessed: Mar. 6, 2016.
[9] D. J. Watts and S. H. Strogatz, "Collective dynamics of
    'small-world' networks," Nature, vol. 393, no. 6684,
    pp. 440–442, Jun. 1998. [Online]. Available:
    http://www.nature.com/nature/journal/v393/n6684/abs/393440a0.html.
    Accessed: Mar. 6, 2016.
[10] E. N. Gilbert, "Random graphs," The Annals of Mathematical
    Statistics, vol. 30, no. 4, pp. 1141–1144, Dec. 1959.
[11] F. D. Carvalho and T. M. Almeida, "Upper bounds and heuristics
    for the 2-club problem," European Journal of Operational
    Research, vol. 210, no. 3, pp. 489–494, May 2011. [Online].
    Available:
    http://www.sciencedirect.com/science/article/pii/S0377221710008015.
    Accessed: Feb. 24, 2016.
[12] F. Larrión, M. A. Pizaña, and R. Villarroel-Flores, "On
    self-clique graphs with triangular cliques," Discrete
    Mathematics, vol. 339, no. 2, pp. 457–459, 2016. [Online].
    Available:
    http://www.sciencedirect.com/science/article/pii/S0012365X15003039.
    Accessed: Feb. 26, 2016.
[13] J. Pattillo, N. Youssef, and S. Butenko, "On clique relaxation
    models in network analysis," European Journal of Operational
    Research, vol. 226, no. 1, pp. 9–18, Apr. 2013. [Online].
    Available:
    http://www.sciencedirect.com/science/article/pii/S0377221712007679.
    Accessed: Feb. 24, 2016.
[14] J.-M. Bourjolly, G. Laporte, and G. Pesant, "Heuristics for
    finding k-clubs in an undirected graph," Computers & Operations
    Research, vol. 27, no. 6, pp. 559–569, 2000. [Online].
    Available:
    http://www.sciencedirect.com/science/article/pii/S0305054899000477.
    Accessed: Feb. 22, 2016.
[15] M. E. J. Newman and D. J. Watts, "Renormalization group analysis
    of the small-world network model," Physics Letters A, vol. 263,
    no. 4-6, pp. 341–346, Dec. 1999.
[16] M. F. Pajouh and B. Balasundaram, "On inclusionwise maximal and
    maximum cardinality k-clubs in graphs," Discrete Optimization,
    vol. 9, no. 2, pp. 84–97, 2012. [Online]. Available:
    http://www.sciencedirect.com/science/article/pii/S1572528612000163.
    Accessed: Feb. 24, 2016.
[17] M.-S. Chang, L.-J. Hung, C.-R. Lin, and P.-C. Su, "Finding large
    k-clubs in undirected graphs," Computing, vol. 95, no. 9,
    pp. 739–758, Dec. 2012.
[18] P. Erdős and A. Rényi, "On random graphs," Publicationes
    Mathematicae Debrecen, vol. 6, pp. 290–297, 1959.
[19] R. D. Luce, "Connectivity and generalized cliques in sociometric
    group structure," Psychometrika, vol. 15, no. 2, pp. 169–190,
    Jun. 1950.
[20] R. J. Mokken, "Cliques, clubs and clans," Quality & Quantity,
    vol. 13, no. 2, pp. 161–173, Apr. 1979.
[21] R. M. Karp, "Reducibility among combinatorial problems," in
    Complexity of Computer Computations, 1972, pp. 85–103.
[22] S. Gavron, "Suffragette," 2015. [Online]. Available:
    http://www.imdb.com/title/tt3077214/. Accessed: Mar. 6, 2016.
[23] S. Hartung, C. Komusiewicz, A. Nichterlein, and O. Suchý, "On
    structural parameterizations for the 2-club problem," Discrete
    Applied Mathematics, vol. 185, pp. 79–92, Apr. 2015. [Online].
    Available:
    http://www.sciencedirect.com/science/article/pii/S0166218X14005265.
    Accessed: Feb. 24, 2016.
[24] S. Hartung, C. Komusiewicz, and A. Nichterlein, "Parameterized
    Algorithmics and computational experiments for finding 2-Clubs,"
    Parameterized and Exact Computation, vol. 7535, pp. 231–241,
    2012.
[25] S. Milgram, "The Small World Problem," Psychology Today,
    pp. 61–67, May 1967.
[26] V. Batagelj and U. Brandes, "Efficient generation of large
    random networks," Physical Review E, vol. 71, no. 3, Mar. 2005.