The k-club problem: new results for k=3

Maria Teresa Almeida and Filipa Duarte de Carvalho
Instituto Superior de Economia e Gestão, Universidade Técnica de Lisboa
CIO-Centro de Investigação Operacional-FC/UL
CIO Working Paper 3/2008, January 2008

Abstract: Given an undirected graph G, the k-club problem consists of finding a maximum cardinality subset of its nodes that induces a subgraph of diameter at most k. Such subsets, called k-clubs, are clique relaxations that represent dense substructures of G. They provide interesting information on cohesive subgroups in social networks, not revealed by cliques. They are also used by biologists to study protein interactions. The k-club problem is NP-hard for any fixed k. In this paper we present a new integer linear formulation for the k = 3 case and derive new classes of valid inequalities to strengthen its LP relaxation.

Keywords: k-club; integer programming; valid inequalities; clique; social networks

1 Introduction

Social and behavioral scientists use network representations to study linkages between groups or individuals in societies and organizations. Biologists use them to study interactions between proteins. In such studies it is important to identify dense structures, i.e., subsets of nodes with a high density of interconnections. The highest density structure is the well-known clique, [3], but it is considered overly restrictive in those contexts. Clique relaxations such as k-cliques and k-clubs represent cohesive subgroups and provide interesting information not revealed by cliques. A discussion of k-clubs and k-cliques in the context of social networks can be found in [1] and [7]. For a discussion in the context of biological networks the reader is referred to [2].

Given an undirected graph G, a k-club is a subset of nodes of G that induces a subgraph of diameter at most k. For k = 1, k-clubs and k-cliques both coincide with cliques.
For k > 1 a k-club is a k-clique but the converse is not true. A k-clique is a set S of nodes such that every pair is linked by a chain of length at most k in G, but not necessarily in the subgraph induced by S. For S to be a k-club, nodes outside S may not be used to link pairs of nodes in S. A classical example that illustrates the difference between k-clubs and k-cliques is given in [1].

The k-club problem consists of finding a maximum cardinality k-club of a graph. The k-club problem was proven to be NP-hard by Bourjolly et al. [4]. Bourjolly et al. [5] stated some properties of k-clubs and developed three heuristic procedures for the identification of large cardinality k-clubs. In [4] the k-club problem was formulated as an integer linear program and a simplified version of it for the case k = 2 was presented. An exact enumerative algorithm was also developed to solve the k-club problem. Balasundaram et al. [2] studied the 2-club polytope and established some polyhedral results, [8]. More recently, Carvalho and Almeida [6] presented new families of valid inequalities for the 2-club polytope and derived conditions for them to define facets.

This paper is organized as follows. In section 2 we introduce definitions and notation and in section 3 we review the integer linear programming formulation for the k-club problem presented in [4]. In section 4 we show that the 3-club problem is polynomially solvable in a special class of graphs, and present a new integer linear programming formulation followed by new classes of valid inequalities.

2 Definitions and Notation

Let $G = (V, E)$ be an undirected graph with node set $V$ and edge set $E$. For each node $i \in V$, the set of nodes linked to $i$ by an edge in $E$ will be represented by $N_i$ and called the set of its neighbours. The degree of node $i$, $\deg_G(i)$, is the cardinality of $N_i$. The distance between two nodes $i$ and $j$, $\mathrm{dist}_G(i,j)$, is the number of edges in a shortest chain linking $i$ to $j$ in $G$. The diameter of $G = (V, E)$, $\mathrm{diam}(G)$, is the maximum distance between two nodes of $G$.
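As a minimal sketch of the definitions above (not part of the original paper), the following Python code computes distances by breadth-first search and tests whether a node set is a k-clique or a k-club. The graph is assumed to be stored as a dictionary mapping each node to the set of its neighbours. The function names and the 5-cycle used as an example are illustrative assumptions; the example is not the one given in [1].

```python
from collections import deque
from itertools import combinations

INF = float("inf")

def bfs_dist(adj, source):
    """Return {node: distance from source} for every node reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Largest distance between two nodes (infinite if the graph is disconnected)."""
    worst = 0
    for u in adj:
        dist = bfs_dist(adj, u)
        if len(dist) < len(adj):
            return INF
        worst = max(worst, max(dist.values()))
    return worst

def induced(adj, S):
    """Adjacency structure of the subgraph G[S]."""
    S = set(S)
    return {u: adj[u] & S for u in S}

def is_k_clique(adj, S, k):
    """Every pair of S is at distance at most k in G itself."""
    return all(bfs_dist(adj, i).get(j, INF) <= k for i, j in combinations(S, 2))

def is_k_club(adj, S, k):
    """Every pair of S is at distance at most k inside G[S]."""
    return diameter(induced(adj, S)) <= k

# Hypothetical example: the cycle 1-2-3-4-5-1.
cycle = {1: {2, 5}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 1}}
S = {1, 2, 3, 4}
```

In this example S is a 2-clique, because nodes 1 and 4 are linked through node 5, but it is not a 2-club, because node 5 lies outside S and the induced subgraph is a chain with three edges.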
A subset of nodes, $I \subseteq V$, such that $\mathrm{dist}_G(i,j) > 3$ for all distinct $i, j \in I$, is called a 3-independent set.

The spacing between any two edges $e_1 = (i_1, j_1)$ and $e_2 = (i_2, j_2)$ is defined as

\[
\mathrm{spac}_G(e_1, e_2) = \max_{i,j \in \{i_1, i_2, j_1, j_2\}} \{\mathrm{dist}_G(i,j)\}
\]

If $D \subseteq E$ is a set of edges such that $\mathrm{spac}_G(e,f) > 3$ for all distinct $e, f \in D$, then $D$ will be called a 3-spac set.

The spacing between a node $i$ and an edge $e = (u,v)$ is defined as

\[
\mathrm{spac}_G(i, e) = \max \{\mathrm{dist}_G(i,u), \mathrm{dist}_G(i,v)\}
\]

If $S$ is a subset of nodes of $G = (V, E)$, the subgraph induced by $S$ will be denoted $G[S]$ and the set of edges in $E$ with both ends in $S$ will be represented by $E(S,S)$.

If $\mathrm{diam}(G) \le k$ then the optimal solution of the k-club problem is the whole set $V$ and the problem is trivial. Throughout the remainder of the paper we will assume that $\mathrm{diam}(G) > k$.

3 Chain Formulation for the k-Club Problem, [4]

A pair of nodes, $i$ and $j$, may belong to a k-club in $G = (V, E)$ if and only if there is a chain of length at most $k$ linking $i$ and $j$ such that every node in the chain belongs to the k-club. The k-club problem was formulated as an integer linear problem by Bourjolly et al. [4] as follows.

For any two nodes $i, j \in V$, let $C_{ij}^k$ be the set of all chains of length at most $k$ linking $i$ and $j$ and denote by $V_t$ the vertex set of a chain $t$. For every $i \in V$, let $x_i$ be a binary variable equal to 1 if and only if $i$ belongs to the solution and let $y_t$ be a binary variable associated with chain $t \in C$, $C = \bigcup_{i,j \in V} C_{ij}^k$.

CHAIN FORMULATION

\[
\begin{array}{lll}
\text{MAX} & Z = \sum_{i \in V} x_i & \\
\text{s.t.} & x_i + x_j \le 1 & \text{if } (i,j) \notin E \text{ and } C_{ij}^k = \emptyset \\
 & x_i + x_j \le 1 + \sum_{t \in C_{ij}^k} y_t & \text{if } (i,j) \notin E \text{ and } C_{ij}^k \ne \emptyset \\
 & y_t \le x_r & \text{for all } t \in C \text{ and } r \in V_t \\
 & x_i \in \{0,1\} & i \in V \\
 & y_t \in \{0,1\} & t \in C
\end{array}
\]

Note that, along with a variable associated with every node of graph $G$, there is one variable associated with every chain, with at most $k$ edges, linking each pair of nonadjacent nodes. For k = 2, each chain $t \in C$ has a single internal node that can be used to represent the chain.
In this case the $y_t$ variables are not needed, as shown in [4]. For k = 3, we present in section 4.2 an alternative formulation with variables associated with the edges, rather than with the chains of length at most k, which reduces the total number of variables needed in the model.

4 The 3-Club Problem

This section is devoted to the 3-club problem. In section 4.1 we show that the problem is polynomially solvable in a special class of graphs. In section 4.2 we present a new integer linear programming model alternative to the chain formulation. In sections 4.3 and 4.4 we present new valid inequalities.

4.1 A polynomial case

If $G$ is a tree, any feasible solution for the 3-club problem must be the node set of a subtree with diameter at most 3. In a subtree with diameter 3 there is at least one chain with three edges and the total number of nodes in the subtree is given by the sum of the degrees, within the subtree, of that chain's internal nodes. In this case, to determine an optimal solution for the 3-club problem one only needs to identify an edge $e^* = (i^*, j^*) \in E$ such that

\[
e^* = \arg\max \{\deg_G(i) + \deg_G(j) : e = (i,j) \in E\}
\]

The node set $N_{i^*} \cup N_{j^*}$, which contains $i^*$ and $j^*$, is then an optimal 3-club.

4.2 Neighbourhood formulation

The chain formulation for the 3-club problem may have a very large number of variables, due to the number of chains in $C = \bigcup_{i,j \in V} C_{ij}^3$ it has to deal with. To reduce the number of variables, one may interpret a 3-club as a subset $S$ of nodes such that for any pair $i, j \in S$, at least one of the following conditions holds:

(i) $i$ and $j$ are linked by an edge $(i,j) \in E$;
(ii) there is a node $r$ in $S$ linked to nodes $i$ and $j$, i.e., such that $(i,r) \in E$ and $(r,j) \in E$;
(iii) there are in $S$ two nodes, $p$ and $q$, such that $p$ is a neighbour of $i$, $q$ is a neighbour of $j$ and $(p,q) \in E$.

Let us associate a binary variable $x_i$ with each node $i \in V$ and a variable $z_{ij}$ with each edge $(i,j) \in E$.
A 3-club in $G = (V, E)$ is a subset of nodes represented by a point in the subset of $\mathbb{Z}^{|V|+|E|}$ defined by

\[
\begin{array}{lll}
x_i + x_j \le 1 & i, j \in V : \mathrm{dist}_G(i,j) > 3 & (1) \\
x_i + x_j \le 1 + \sum_{r \in (N_i \cap N_j)} x_r + \sum_{p \in N_i,\, q \in N_j,\, (p,q) \in E} z_{pq} & i, j \in V : (i,j) \notin E,\ \mathrm{dist}_G(i,j) \le 3 & (2') \\
z_{ij} \le x_i & (i,j) \in E & (3) \\
z_{ij} \le x_j & (i,j) \in E & (4) \\
x_i \in \{0,1\} & i \in V & (5) \\
z_{ij} \in \{0,1\} & (i,j) \in E & (6)
\end{array}
\]

Constraints (1) state that, if the distance between two nodes $i$ and $j$ is greater than 3, then at most one of the nodes $i$ or $j$ may belong to a 3-club. Constraints (2') impose that two nonadjacent nodes $i$ and $j$ may not be both in a 3-club unless either a common neighbour is in the 3-club or a pair of neighbours $p$ and $q$ of $i$ and $j$, respectively, linked by an edge, are in the 3-club. Constraints (3) and (4) guarantee that the end nodes of an edge $(i,j)$ are both in a 3-club whenever the corresponding edge variable $z_{ij}$ is equal to 1.

Whenever $x_p = x_q = 1$ and $(p,q) \in E$ the edge variable $z_{pq}$ may be either 0 or 1, unless there are two other nodes, a node $i \in N_p \setminus N_q$ and a node $j \in N_q \setminus N_p$, such that $x_i = x_j = 1$ and $i - p - q - j$ is the only way of linking $i$ and $j$ with no more than 3 edges. As a consequence, a 3-club in $G$ may be represented by more than one point. To obtain a one-to-one representation, constraints

\[
z_{ij} \ge x_i + x_j - 1 \qquad (i,j) \in E \qquad (7)
\]

will be included in the formulation.

On the other hand, by conditions (3)-(4), if $z_{pq} = 1$ then $x_p = x_q = 1$. Therefore, in conditions (2'), if either $p$ or $q$ is in $(N_i \cap N_j)$, variable $z_{pq}$ plays no role and may be dropped. Dropping it reduces the density of the coefficient matrix and strengthens the formulation from an LP point of view.

Conditions (2') will be replaced with

\[
x_i + x_j \le 1 + \sum_{r \in (N_i \cap N_j)} x_r + \sum_{(p,q) \in E_{ij}} z_{pq} \qquad i, j \in V : (i,j) \notin E,\ \mathrm{dist}_G(i,j) \le 3 \qquad (2)
\]

where $E_{ij} = \{(p,q) \in E : p \in (N_i \setminus N_j),\ q \in (N_j \setminus N_i)\}$.
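The data needed to write one neighbourhood constraint (2) can be sketched as follows, assuming the graph is stored as a dictionary mapping each node to the set of its neighbours. The helper names `neighbourhood_terms` and `satisfies_constraint_2` are hypothetical and the 4-node chain is only an illustration. The second helper evaluates (2) at the 0-1 point induced by a node set S, taking each edge variable as the product of its end node variables, which is what constraints (3), (4) and (7) enforce at integer points.

```python
def neighbourhood_terms(adj, i, j):
    """Return (N_i & N_j, E_ij) for a nonadjacent pair i, j.

    E_ij holds the edges (p, q) with p in N_i \\ N_j and q in N_j \\ N_i.
    """
    Ni, Nj = adj[i], adj[j]
    common = Ni & Nj
    E_ij = {(p, q) for p in Ni - Nj for q in Nj - Ni if q in adj[p]}
    return common, E_ij

def satisfies_constraint_2(adj, S, i, j):
    """Check constraint (2) for the pair i, j at the 0-1 point induced by S."""
    common, E_ij = neighbourhood_terms(adj, i, j)
    x = lambda v: 1 if v in S else 0
    # At integer points, (3), (4) and (7) force z_pq = x_p * x_q.
    z = lambda p, q: x(p) * x(q)
    lhs = x(i) + x(j)
    rhs = 1 + sum(x(r) for r in common) + sum(z(p, q) for p, q in E_ij)
    return lhs <= rhs

# Hypothetical example: the chain 1-2-3-4.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
```

For the pair (1, 4) of this chain there is no common neighbour and the only edge in the corresponding edge set is (2, 3), so nodes 1 and 4 can be selected together only if nodes 2 and 3 are selected as well.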
Defining

\[
x_i = \begin{cases} 1, & \text{if node } i \text{ is in the 3-club} \\ 0, & \text{otherwise} \end{cases}
\qquad
z_{ij} = \begin{cases} 1, & \text{if edge } (i,j) \text{ links nodes in the 3-club} \\ 0, & \text{otherwise} \end{cases}
\]

a maximum cardinality 3-club is an optimal solution of the following integer problem

NEIGHBOURHOOD FORMULATION

\[
(N) \qquad \text{MAX } Z = \sum_{i \in V} x_i \qquad \text{s.t. } (1)-(7)
\]

In the remainder of the paper, constraints (1) will be called node packing constraints and constraints (2) will be called neighbourhood constraints.

Setting all $x_i$ variables to 0.5 and all $z_{ij}$ variables to 0 yields a feasible solution for the linear programming relaxation of $(N)$. The linear optimum is therefore greater than or equal to half the number of nodes in $G$. This means that the integrality gaps tend to be quite large for sparse graphs and that tighter formulations are needed to solve the 3-club problem through LP methods. In section 4.3 we deduce lifted forms of constraints (1) and (2) and in section 4.4 we derive another family of valid inequalities, the platform inequalities.

4.3 Lifted node packing and neighbourhood constraints

Given two nodes $i, j \in V$, if $\mathrm{dist}_G(i,j) > 3$ then at most one of them may be in a 3-club. This condition is imposed by the node packing constraints (1). Node packing constraints may be generalized to include more than two nodes.

LEMMA 1 If $I \subseteq V$ is a 3-independent set, then the multi-node packing inequality

\[
\sum_{i \in I} x_i \le 1 \qquad (8)
\]

is valid for the 3-club problem.

Proof: Immediate from the definition of a 3-independent set.

A rationale similar to the one used to derive inequalities (8) may be used to deduce valid inequalities over the variables associated with the edge set of $G$. Given two edges, $(p,q)$ and $(r,s)$, if the distance between two of the nodes in $\{p, q, r, s\}$ is greater than 3, then at most three of these nodes may be in the 3-club. This leads to the inequality $z_{pq} + z_{rs} \le 1$. Inequalities of this kind may be generalized to include edge sets with more than two elements.
LEMMA 2 If $D$ is a 3-spac set, then the edge packing inequality

\[
\sum_{e \in D} z_e \le 1 \qquad (9)
\]

is valid for the 3-club problem.

Proof: Immediate from the definition of a 3-spac set.

Balasundaram et al. [2] proved that, if $I$ is a maximal 2-independent set, the packing constraint $\sum_{i \in I} x_i \le 1$ defines a facet of the 2-club polytope. By contrast, a packing inequality (8) defined by a maximal 3-independent node set $I$ may be dominated, as illustrated by the following example. Consider a graph $G$ with 8 nodes and edge set $E = \{(i, i+1),\ i = 1, \ldots, 7\}$. Node set $I = \{1, 8\}$ is a maximal 3-independent set in $G$ and the corresponding node packing inequality is dominated by the valid inequality $x_1 + x_8 + z_{45} \le 1$.

Conditions for a 3-independent set $I$ and a 3-spac set $D$ to define a valid generalized-packing inequality are presented next.

LEMMA 3 Let $I$ be a 3-independent set in $G$ and let $D$ be a 3-spac set. If $\mathrm{spac}_G(i,e) > 3$ for all $i \in I$ and all $e \in D$, then the generalized-packing inequality

\[
\sum_{i \in I} x_i + \sum_{e \in D} z_e \le 1 \qquad (10)
\]

is valid for the 3-club problem.

Proof: By lemma 1, $\sum_{i \in I} x_i \le 1$. By lemma 2, $\sum_{e \in D} z_e \le 1$. If $i \in I$ and $e = (u,v) \in D$ then either $\mathrm{dist}_G(i,u) > 3$ or $\mathrm{dist}_G(i,v) > 3$. If $x_i = 1$ then $x_u x_v = 0$ and $z_{uv} = 0$. If $z_e = 1$ then $x_u = x_v = 1$ and $x_i = 0$.

Lemmas 1, 2 and 3 may be used to generate lifted versions of neighbourhood constraints, as follows.

LEMMA 4 Let $a, b \in V$ be a pair of nodes associated with a neighbourhood constraint (2) and let $I \subseteq V$ be a 3-independent set such that $\min \{\mathrm{dist}_G(i,a), \mathrm{dist}_G(i,b)\} > 3$ for all $i \in I$. Then

\[
x_a + x_b + \sum_{i \in I} x_i \le 1 + \sum_{r \in (N_a \cap N_b)} x_r + \sum_{(p,q) \in E_{ab}} z_{pq} \qquad (11)
\]

is valid for the 3-club problem.

Proof: A 3-club may contain at most one node of $I$. If it has a node of $I$ then it can include neither $a$ nor $b$.

A Chvátal-Gomory deduction of inequality (11) is obtained combining the neighbourhood constraint (2) for the pair $\{a, b\}$ and packing inequalities (8) for the 3-independent sets $(I \cup \{a\})$ and $(I \cup \{b\})$, with coefficients 0.5 and rounding.
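The Chvátal-Gomory deduction of (11) can be written out step by step. The derivation below is a worked sketch under the assumptions of lemma 4; it only spells out the combination with coefficients 0.5 mentioned in the text.

```latex
\begin{align*}
\tfrac12 \Big( x_a + x_b - \sum_{r \in (N_a \cap N_b)} x_r - \sum_{(p,q) \in E_{ab}} z_{pq} \Big) &\le \tfrac12
  && \text{constraint (2) for } \{a,b\} \\
\tfrac12 \Big( x_a + \sum_{i \in I} x_i \Big) &\le \tfrac12
  && \text{inequality (8) for } I \cup \{a\} \\
\tfrac12 \Big( x_b + \sum_{i \in I} x_i \Big) &\le \tfrac12
  && \text{inequality (8) for } I \cup \{b\} \\[4pt]
x_a + x_b + \sum_{i \in I} x_i - \tfrac12 \sum_{r \in (N_a \cap N_b)} x_r - \tfrac12 \sum_{(p,q) \in E_{ab}} z_{pq} &\le \tfrac32
  && \text{sum of the three rows}
\end{align*}
```

Since all variables are nonnegative, each coefficient $-\tfrac12$ may be rounded down to $-1$ without losing validity. The left-hand side then takes integer values at every 0-1 point, so the right-hand side may be rounded down from $\tfrac32$ to 1. Moving the negative terms to the right-hand side gives inequality (11).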
LEMMA 5 Let $a, b \in V$ be a pair of nodes associated with a neighbourhood constraint (2) and let $D \subseteq E$ be a 3-spac set such that $\min \{\mathrm{spac}_G(e,a), \mathrm{spac}_G(e,b)\} > 3$ for all $e \in D$. Then

\[
x_a + x_b + \sum_{e \in D} z_e \le 1 + \sum_{r \in (N_a \cap N_b)} x_r + \sum_{(p,q) \in E_{ab}} z_{pq} \qquad (12)
\]

is valid for the 3-club problem.

Proof: By lemma 2, $\sum_{e \in D} z_e \le 1$. For any $e = (u,v) \in D$, if $z_e = 1$ then nodes $u$ and $v$ must be in the 3-club. But a 3-club that includes nodes $u$ and $v$ can include neither $a$ nor $b$.

A Chvátal-Gomory deduction of inequality (12) is obtained combining the neighbourhood constraint (2) for the pair $\{a, b\}$ and the generalized packing inequalities (10) for $D$ and the 3-independent sets $I = \{a\}$ and $I = \{b\}$, with coefficients 0.5 and rounding.

A more general version of lifted neighbourhood constraints may be obtained with variables representing nodes in a 3-independent set $I$ and variables representing edges in a 3-spac set $D$.

PROPOSITION 1 Let $a, b \in V$ be a pair of nodes associated with a neighbourhood constraint (2), let $I$ be a 3-independent set in $G$ and let $D$ be a 3-spac set. If $\min \{\mathrm{dist}_G(i,a), \mathrm{dist}_G(i,b)\} > 3$ for all $i \in I$, $\min \{\mathrm{spac}_G(e,a), \mathrm{spac}_G(e,b)\} > 3$ for all $e \in D$ and $\mathrm{spac}_G(i,e) > 3$ for all $i \in I$ and all $e \in D$, then the generalized neighbourhood constraint for the pair $a, b \in V$

\[
x_a + x_b + \sum_{i \in I} x_i + \sum_{e \in D} z_e \le 1 + \sum_{r \in (N_a \cap N_b)} x_r + \sum_{(p,q) \in E_{ab}} z_{pq} \qquad (13)
\]

is valid for the 3-club problem.

Proof: By lemma 3, $\sum_{i \in I} x_i + \sum_{e \in D} z_e \le 1$.
If $\sum_{i \in I} x_i = 1$ then $\sum_{e \in D} z_e = 0$ and (13) reduces to (11).
If $\sum_{e \in D} z_e = 1$ then $\sum_{i \in I} x_i = 0$ and (13) reduces to (12).

A Chvátal-Gomory deduction of inequality (13) is obtained combining inequalities (11), (12) and the generalized packing inequality (10) with coefficients 0.5 and rounding.

Consider an inequality (13). If there is an edge $f = (u,v) \in D$ such that $\mathrm{dist}_G(i,u) > 3$ for all nodes $i \in (I \cup \{a, b\})$ then inequality

\[
x_a + x_b + \sum_{i \in (I \cup \{u\})} x_i + \sum_{e \in (D \setminus \{f\})} z_e \le 1 + \sum_{r \in (N_a \cap N_b)} x_r + \sum_{(p,q) \in E_{ab}} z_{pq} \qquad (14)
\]

is also valid for the 3-club problem. As $x_u \ge z_f$, inequality (14) dominates (13). This dominance indicates that maximal 3-independent sets should be used in the generation of generalized neighbourhood constraints. In lemma 6 we characterize, for maximal 3-independent sets, the edges associated with $z_e$ variables that can be included in a generalized neighbourhood constraint.

LEMMA 6 Consider a pair of nodes, $a$ and $b$, a maximal 3-independent set $I$ and a 3-spac set $D$ in the conditions stated in proposition 1. For each edge $e = (u,v) \in D$ there are nodes $\alpha, \omega \in (I \cup \{a, b\})$ such that

(i) $\mathrm{dist}_G(u,\alpha) = \mathrm{dist}_G(v,\omega) = 3$
(ii) $\mathrm{dist}_G(u,\omega) = \mathrm{dist}_G(v,\alpha) = 4$

Proof: In the conditions of proposition 1, for all $j \in (I \cup \{a, b\})$, $\mathrm{spac}_G(e,j) > 3$, i.e., $\max \{\mathrm{dist}_G(j,u), \mathrm{dist}_G(j,v)\} \ge 4$.
As $I$ is maximal and $u, v \notin (I \cup \{a, b\})$,

\[
\min \{\mathrm{dist}_G(j,u) : j \in (I \cup \{a, b\})\} \le 3 \quad \text{and} \quad \min \{\mathrm{dist}_G(j,v) : j \in (I \cup \{a, b\})\} \le 3.
\]

As $u$ and $v$ are linked by an edge of $D$,

\[
\min \{\mathrm{dist}_G(j,u) : j \in (I \cup \{a, b\})\} = 3 \quad \text{and} \quad \min \{\mathrm{dist}_G(j,v) : j \in (I \cup \{a, b\})\} = 3.
\]

Let

\[
\alpha = \arg\min \{\mathrm{dist}_G(j,u) : j \in (I \cup \{a, b\})\}, \qquad \omega = \arg\min \{\mathrm{dist}_G(j,v) : j \in (I \cup \{a, b\})\}.
\]

Then $\mathrm{dist}_G(u,\omega) = \mathrm{dist}_G(v,\alpha) = 4$.

The dominance of (14) over (13) and lemma 6 suggest a two-stage procedure to obtain a generalized neighbourhood inequality. Given a pair of nodes, $a$ and $b$, associated with a neighbourhood constraint (2), in the first stage one identifies a maximum cardinality 3-independent set in the conditions of lemma 4, say $\bar{I}$; in the second stage, one selects a maximum cardinality 3-spac set in the conditions of lemma 6, for that particular set $\bar{I}$.

If $a$ and $b$ are two nodes associated with a packing constraint (1), a similar deduction shows that, if $I$ and $D$ are a maximal 3-independent set and a 3-spac set in the conditions of proposition 1, then the inequality

\[
x_a + x_b + \sum_{i \in I} x_i + \sum_{e \in D} z_e \le 1 \qquad (13')
\]

is also valid for the 3-club problem. Note that (13') is the same as (10) for the 3-independent set $(I \cup \{a, b\})$.
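The two-stage procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: the text asks for maximum cardinality sets, whereas the sketch substitutes a simple greedy rule that returns maximal (not necessarily maximum) sets. The name `greedy_lift`, the scanning order (sorted node labels) and the 10-node chain used as an example are hypothetical.

```python
from collections import deque

INF = float("inf")

def bfs_dist(adj, source):
    """Return {node: distance from source} for every node reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def greedy_lift(adj, a, b):
    """Greedy stand-in for the two-stage procedure; returns the lists (I, D)."""
    dist = {u: bfs_dist(adj, u) for u in adj}
    d = lambda u, v: dist[u].get(v, INF)
    if b in adj[a] or d(a, b) > 3:
        raise ValueError("a, b must be nonadjacent and at distance at most 3")
    # Spacing between a node and an edge, and between two edges.
    spac_ne = lambda i, e: max(d(i, e[0]), d(i, e[1]))
    spac_ee = lambda e, f: max(d(u, v) for u in e + f for v in e + f)

    # Stage 1: greedy 3-independent set whose nodes are far from both a and b.
    I = []
    for i in sorted(adj):
        if min(d(i, a), d(i, b)) > 3 and all(d(i, j) > 3 for j in I):
            I.append(i)

    # Stage 2: greedy 3-spac set whose edges are far from a, b and every node of I.
    anchors = I + [a, b]
    D = []
    for e in sorted((u, v) for u in adj for v in adj[u] if u < v):
        if all(spac_ne(j, e) > 3 for j in anchors) and all(spac_ee(e, f) > 3 for f in D):
            D.append(e)
    return I, D

# Hypothetical example: the chain with nodes 1, ..., 10 and the pair a = 10, b = 8.
path = {i: {j for j in (i - 1, i + 1) if 1 <= j <= 10} for i in range(1, 11)}
I, D = greedy_lift(path, 10, 8)
```

For this chain the sketch selects node 1 in the first stage and edge (4, 5) in the second stage. Since node 9 is the only common neighbour of nodes 8 and 10 and the edge set of the pair is empty, the resulting inequality of type (13) reads $x_8 + x_{10} + x_1 + z_{45} \le 1 + x_9$.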
4.4 Platform inequalities

Each node packing constraint and each neighbourhood constraint, in formulation $(N)$, imposes conditions for a pair of nodes. From them it is possible to deduce conditions for some triplets of nodes as shown in the following proposition.

PROPOSITION 2 Let $R = \{a, b, c\}$ be a set of three nodes in $G$ such that $E(R,R) = \emptyset$. Then the platform inequality

\[
x_a + x_b + x_c \le 1 + \sum_{t \in V} \alpha_t x_t + \sum_{(p,q) \in E} \beta_{pq} z_{pq} \qquad (15)
\]

with

\[
\alpha_t = \begin{cases}
2 & \text{if } t \in (N_a \cap N_b \cap N_c) \\
1 & \text{if } t \in ((N_a \cap N_b) \cup (N_b \cap N_c) \cup (N_a \cap N_c)) \setminus (N_a \cap N_b \cap N_c) \\
0 & \text{otherwise}
\end{cases} \qquad (16)
\]

and

\[
\beta_{pq} = \begin{cases}
1 & \text{if } (p,q) \in (E_{ab} \cup E_{bc} \cup E_{ac}) \\
0 & \text{otherwise}
\end{cases} \qquad (17)
\]

is valid for the 3-club problem.

Proof: Nodes $a$, $b$ and $c$ may be in a 3-club, $S$, if there is in $S$ another node adjacent to all of them or if for each pair, $\{a, b\}$, $\{b, c\}$ and $\{a, c\}$, there is either a common neighbour or a pair of neighbours linked by an edge.

A Chvátal-Gomory deduction of a platform inequality is obtained combining the constraints in $(N)$ for the pairs $\{a, b\}$, $\{b, c\}$ and $\{a, c\}$ with coefficients 0.5 and rounding.

Note that, if every pair is associated with a node packing constraint, (15) is a multi-node packing inequality that dominates each of the three original node packing constraints (1). If only one of the pairs is associated with a neighbourhood constraint (2) then the platform inequality dominates that constraint (2).

Platform inequalities may be lifted adapting the procedure described in the previous section for the neighbourhood constraints. Lemmas 7 and 8 state results similar to those in lemmas 4 and 5, considering a triplet $\{a, b, c\}$ associated with a platform inequality instead of a pair $\{a, b\}$ associated with a neighbourhood constraint.

LEMMA 7 Let $R = \{a, b, c\}$ be a set of three nodes in $G$ such that $E(R,R) = \emptyset$ and let $I \subseteq V$ be a 3-independent set such that $\min \{\mathrm{dist}_G(i,v) : v \in \{a, b, c\}\} > 3$ for all $i \in I$.
Then the inequality

\[
x_a + x_b + x_c + \sum_{i \in I} x_i \le 1 + \sum_{t \in V} \alpha_t x_t + \sum_{(p,q) \in E} \beta_{pq} z_{pq} \qquad (18)
\]

with $\alpha_t$ and $\beta_{pq}$ defined as in (16) and (17) is valid for the 3-club problem.

Proof: Similar to lemma 4.

LEMMA 8 Let $R = \{a, b, c\}$ be a set of three nodes in $G$ such that $E(R,R) = \emptyset$ and let $D \subseteq E$ be a 3-spac set such that $\min \{\mathrm{spac}_G(e,v) : v \in \{a, b, c\}\} > 3$ for all $e \in D$. Then the inequality

\[
x_a + x_b + x_c + \sum_{e \in D} z_e \le 1 + \sum_{t \in V} \alpha_t x_t + \sum_{(p,q) \in E} \beta_{pq} z_{pq} \qquad (19)
\]

with $\alpha_t$ and $\beta_{pq}$ defined as in (16) and (17) is valid for the 3-club problem.

Proof: Similar to lemma 5.

Again, as for the neighbourhood constraints, a more general version of lifted platform inequalities may be obtained with variables representing nodes in a 3-independent set $I$ and variables representing edges in a 3-spac set $D$.

PROPOSITION 3 Let $R = \{a, b, c\}$ be a set of three nodes in $G$ such that $E(R,R) = \emptyset$, let $I$ be a 3-independent set in $G$ and let $D$ be a 3-spac set. If $\min \{\mathrm{dist}_G(i,v) : v \in \{a, b, c\}\} > 3$ for all $i \in I$, $\min \{\mathrm{spac}_G(e,v) : v \in \{a, b, c\}\} > 3$ for all $e \in D$ and $\mathrm{spac}_G(i,e) > 3$ for all $i \in I$ and $e \in D$, then the lifted version of the platform inequality for nodes $a, b, c \in V$

\[
x_a + x_b + x_c + \sum_{i \in I} x_i + \sum_{e \in D} z_e \le 1 + \sum_{t \in V} \alpha_t x_t + \sum_{(p,q) \in E} \beta_{pq} z_{pq} \qquad (20)
\]

with $\alpha_t$ and $\beta_{pq}$ defined as in (16) and (17) is valid for the 3-club problem.

Proof: By lemma 3, $\sum_{i \in I} x_i + \sum_{e \in D} z_e \le 1$.
If $\sum_{i \in I} x_i = 1$ then $\sum_{e \in D} z_e = 0$ and (20) reduces to (18).
If $\sum_{e \in D} z_e = 1$ then $\sum_{i \in I} x_i = 0$ and (20) reduces to (19).

A Chvátal-Gomory deduction of inequalities (20) is obtained combining (18), (19) and the generalized-packing inequality (10) for $I$ and $D$ with coefficients 0.5 and rounding.

The two-stage lifting procedure described in the previous section may be adapted to generate generalized-packing inequalities substituting $\{a, b\}$ by $\{a, b, c\}$.

References

[1] R. D. Alba, A graph-theoretic definition of a sociometric clique, Journal of Mathematical Sociology 3 (1973) 113-126.
[2] B. Balasundaram, S. Butenko, S. Trukhanov, Novel approaches for analyzing biological networks, Journal of Combinatorial Optimization 10 (2005) 23-39.
[3] I. M. Bomze, M. Budinich, P. M. Pardalos, M. Pelillo, The maximum clique problem, in D.-Z. Du and P. M. Pardalos (Eds.), Handbook of Combinatorial Optimization, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1999, 1-74.
[4] J.-M. Bourjolly, G. Laporte, G. Pesant, An exact algorithm for the maximum k-club problem in an undirected graph, European Journal of Operational Research 138 (2002) 21-28.
[5] J.-M. Bourjolly, G. Laporte, G. Pesant, Heuristics for finding k-clubs in an undirected graph, Computers & Operations Research 27 (2000) 559-569.
[6] F. D. Carvalho, M. T. Almeida, Strong valid inequalities for the 2-club problem, Centro de Investigação Operacional, Working Paper 2/2008 (available at http://cio.fc.ul.pt).
[7] R. J. Mokken, Cliques, clubs and clans, Quality and Quantity 13 (1979) 161-173.
[8] G. L. Nemhauser, L. A. Wolsey, Integer and Combinatorial Optimization, John Wiley, New York, 1988.