Social Networks 31 (2009) 190–203

Group betweenness and co-betweenness: Inter-related notions of coalition centrality☆

Eric D. Kolaczyk a,∗, David B. Chua b, Marc Barthélemy c,d

a Dept. of Mathematics and Statistics, Boston University, 111 Cummington Street, Boston, MA 02215, USA
b State Street Bank, Boston, MA, USA
c Commissariat à l’Energie Atomique, Centre d’Etudes de Bruyères Le Châtel, Bruyères Le Châtel, France
d Centre d’Analyse et Mathématique Sociales, École des Hautes Études en Sciences Sociales, Paris, France

Keywords: Centrality

Abstract

Vertex betweenness centrality is a metric that seeks to quantify a sense of the importance of a vertex in a network in terms of its ‘control’ on the flow of information along geodesic paths throughout the network. Two natural ways to extend vertex betweenness centrality to sets of vertices are (i) in terms of geodesic paths that pass through at least one of the vertices in the set, and (ii) in terms of geodesic paths that pass through all vertices in the set. The former was introduced by Everett and Borgatti [Everett, M., Borgatti, S., 1999. The centrality of groups and classes. Journal of Mathematical Sociology 23 (3), 181–201], and called group betweenness centrality. The latter, which we call co-betweenness centrality here, has not been considered formally in the literature until now, to the best of our knowledge. In this paper, we show that these two notions of centrality are in fact intimately related and, furthermore, that this relationship may be exploited to obtain deeper insight into both. In particular, we provide an expansion for group betweenness in terms of increasingly higher orders of co-betweenness, in a manner analogous to the Taylor series expansion of a mathematical function in calculus.
We then demonstrate the utility of this expansion by using it to construct analytic lower and upper bounds for group betweenness that involve only simple combinations of (i) the betweenness of individual vertices in the group, and (ii) the co-betweenness of pairs of these vertices. Accordingly, we argue that the latter quantity, i.e., pairwise co-betweenness, is itself a fundamental quantity of some independent interest, and we present a computationally efficient algorithm for its calculation, which extends the algorithm of Brandes [Brandes, U., 2001. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology 25, 163] in a natural manner. Applications are provided throughout, using a handful of different communication networks, which serve to illustrate the way in which our mathematical contributions allow for insight to be gained into the interaction of network structure, coalitions, and information flow in social networks.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

In social network analysis, the problem of determining the importance of actors in a network has been studied for a long time (see, for example, Wasserman and Faust, 1994). It is in this context that the concept of the centrality of a vertex in a network emerged. There are numerous measures that have been proposed to quantify centrality, which differ both in the nature of the underlying notion of vertex importance that they seek to capture and in the manner in which that notion is encoded through some functional of the network. See Borgatti and Everett (2006), for example, for a recent review and categorization of centrality measures.

☆ Part of this work supported by NSF grant CCR-0325701 and ONR awards N0001403-1-0043 and N00014-06-1-0096.
∗ Corresponding author. E-mail address: [email protected] (E.D. Kolaczyk).
doi:10.1016/j.socnet.2009.02.003

Paths, as the routes by which flows (e.g., of information or commodities) travel over a network, are fundamental to the functioning of many networks. Therefore, not surprisingly, a number of centrality measures quantify importance with respect to the sharing of paths in the network. One popular measure is betweenness centrality. First introduced in its modern form by Freeman (1977), betweenness centrality is essentially a measure of how many geodesic (i.e., ‘shortest’) paths pass through a given vertex. In other words, in a social network for example, the betweenness centrality measures the extent to which an actor “lies between” other actors in the network, with respect to the network path structure. As such, it is a measure of the control that actor has over the flow of information in the network.

The standard betweenness centrality is defined with respect to individual vertices. As a result, while this quantity can be used to produce an ordering of the vertices in terms of their individual importance, it is not clear a priori just how much insight it provides into the manner in which the vertices together exert influence upon the network. Understanding behavior of this latter kind can be important in presenting an appropriately more nuanced view of the roles of the different vertices, beyond their individual importance, such as through their roles as members of potential ‘groups’ or ‘coalitions’.

Fig. 1. Graph representation of the physical topology of the Abilene network. Nodes represent regional network aggregation points (so-called ‘Points-of-Presence’ or PoP’s), and are labeled according to their metropolitan area, while the edges represent systems of optical transportation technologies and routing devices.
The interaction of network structure, information flow, and the selection (if imposed) or formation (if autonomous) of influential subsets of vertices is an area of substantial current research interest in the overall network-oriented literature. Recent work of this nature can be found on topics as diverse as the spread of epidemics in population networks and rumors and information in social networks (e.g., Barrat et al., 2008, Chapters 9 & 10), the effect of affiliation networks of interest groups on political processes (e.g., Dominguez, 2008), and the study of coalition formation in multi-agent systems in economics and computer science (e.g., Merida-Campos and Willmott, 2007). Numerous additional references may be found in those just cited. The general concepts and tools introduced in this paper are broadly relevant to work in this area, in that they pertain to the important issue of how to quantify and interpret betweenness centrality for collections of more than one vertex.

There are two ways in which one might naturally extend vertex betweenness centrality to sets of vertices. The first is to define the betweenness of a set in terms of geodesic paths that pass through at least one of the vertices in the set, and the second, in terms of geodesic paths that pass through all vertices in the set. The former notion was introduced by Everett and Borgatti (1999), and called group betweenness centrality. The latter, which we call co-betweenness centrality in this paper, has not been considered formally in the literature until now, to the best of our knowledge. The first would arguably seem to be of more immediate interest for applications (e.g., see Everett and Borgatti, 1999 for relevant discussion). The primary contribution of this paper, however, is to show that these two notions are in fact intimately related, and that furthermore, this relationship provides interesting insight into the nature of each.
In particular, we develop a precise mathematical characterization of this relationship and then use it to show how the betweenness of a group of an arbitrary number of vertices can be bounded, both above and below, by quantities involving only (i) the betweenness of the individual vertices, and (ii) the co-betweenness of pairs of these vertices. These bounds are frequently found to be quite tight in the network datasets we examine and, in general, their width can reveal information concerning higher-order aspects of network path structure. We therefore argue that pairwise co-betweenness, which is critical to the construction of these bounds, is itself a quantity of some fundamental interest, and we present an algorithm for its efficient calculation across all pairs of vertices in a network.

The organization of this paper is as follows. In Section 2, we briefly review necessary notation and terminology, and then illustrate the basic relationship between group betweenness and co-betweenness centralities in the case of groups of m = 2 vertices. The general case of groups of m ≥ 2 vertices is addressed in Section 3, wherein we provide an expression relating the two notions of centrality, we develop our bounds, and we discuss some of the implications of these bounds. The computation of pairwise co-betweenness values is discussed in Section 4, where we sketch our proposed algorithm. The concepts introduced throughout Sections 2 and 3 are motivated and illustrated in the context of an Internet communication network. In Section 5, we provide further illustration using two social networks. Some additional discussion is provided in Section 6. Finally, a formal description of our algorithm for computation of pairwise co-betweenness, as well as pseudo-code, may be found in Appendix A.

2. Preliminary material and results

2.1. Background

Let G = (V, E) denote an undirected graph with $n_v$ vertices in V and $n_e$ edges in E.
For convenience, and without loss of generality, we will assume G to be connected. Recall that a path on G, from a vertex $v_0$ to another vertex $v_\ell$, is an alternating sequence $\{v_0, e_1, v_1, \ldots, v_{\ell-1}, e_\ell, v_\ell\}$ of vertices and edges, where the endpoints of $e_i$ are $\{v_{i-1}, v_i\}$, such that no edges or vertices are repeated. The length of this path is said to be $\ell$. A geodesic path (also often called a ‘shortest’ path) between two vertices u, v ∈ V is a path whose length is a minimum, among all paths between u and v. The length of a geodesic path between two vertices is called their geodesic distance. In the case that the graph G is weighted, i.e., there is a collection of edge weights $\{w_e\}_{e \in E}$, where $w_e \ge 0$, geodesic paths may be instead defined as paths for which the total sum of edge weights is a minimum. In this paper, we will restrict our exposition primarily to the case of unweighted graphs, but extensions to weighted graphs are straightforward. For additional background of this type, see, for example, the textbook Clark and Holton (1991).

Table 1. Pairs of vertices in Abilene with the top 20 betweenness values B(u, v).

Let $\sigma_{st}$ denote the total number of geodesic paths that connect vertices s and t (with $\sigma_{ss} \equiv 1$). Similarly, let $\sigma_{st}(v)$ denote the number of geodesic paths between s and t that also pass through vertex v, in the sense that v is an interior vertex on the path. Betweenness centrality of a vertex v is defined as a weighted sum of the number of geodesic paths through v,

$$ B(v) = \sum_{s,t \in V \setminus \{v\}} \frac{\sigma_{st}(v)}{\sigma_{st}} . \tag{1} $$

Note that this definition excludes the geodesic paths that start or end at v. However, in a connected graph we will have $\sigma_{st}(v) = \sigma_{st}$ whenever s = v or t = v, so the exclusion amounts to removing a constant term that would otherwise be present in the betweenness centrality of every vertex.
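As a concrete reading of definition (1), the following minimal Python sketch computes B(v) by brute-force enumeration of geodesic paths on a small, connected, unweighted graph. It is an illustration only and is not the Brandes (2001) algorithm. The helper names `geodesics` and `betweenness`, the adjacency-dictionary representation, and the choice to sum over unordered pairs {s, t} (in keeping with a normalizing constant of $(n_v - 1)(n_v - 2)/2$) are assumptions made for this example.

```python
from collections import deque
from itertools import combinations

def geodesics(adj, s, t):
    """All shortest s-t paths (as vertex lists) in an unweighted graph, via BFS."""
    dist, preds, queue = {s: 0}, {s: []}, deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:                 # first time w is reached
                dist[w] = dist[u] + 1
                preds[w] = [u]
                queue.append(w)
            elif dist[w] == dist[u] + 1:      # another shortest route into w
                preds[w].append(u)

    def build(v):                             # unwind predecessor lists back to s
        if v == s:
            return [[s]]
        return [path + [v] for u in preds[v] for path in build(u)]

    return build(t) if t in dist else []

def betweenness(adj, v):
    """B(v) as in Eq. (1), summing over unordered pairs s, t in V \\ {v}.

    Assumes a connected graph, so every pair has at least one geodesic.
    """
    others = [x for x in adj if x != v]
    total = 0.0
    for s, t in combinations(others, 2):
        paths = geodesics(adj, s, t)
        through_v = sum(1 for p in paths if v in p)   # sigma_st(v)
        total += through_v / len(paths)               # divided by sigma_st
    return total

# Hypothetical toy graph: the 4-cycle 0-1-2-3-0.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(betweenness(cycle, 0))   # 0.5: one of the two geodesics between 1 and 3
```

Enumerating every geodesic is exponential in the worst case, so this sketch is only suitable for checking the definition on small examples.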
Sometimes the betweenness B(v) is normalized, in the form $\tilde{B}(v) = 2B(v)/[(n_v - 1)(n_v - 2)]$, so as to restrict its range to between 0 and 1.

As an illustration, which we will use throughout this and the next section, consider the network in Fig. 1. This is the Abilene network, an Internet network that is part of the Internet2 project,¹ a research project devoted to the development of the ‘next generation’ Internet. It serves as a so-called ‘backbone’ network for universities and research labs across the United States, in a manner analogous to the federal highway system of roads. The information traversing this network takes the form of so-called ‘packets’, and the packets flow between origins and destinations on this network along paths strictly determined according to a set of underlying routing protocols. Due to the nature of these protocols, it is common to assume, as a first approximation, that (i) information flows in this network with respect to a set of geodesic paths and (ii) there is exactly one geodesic path for each vertex pair.² The vertices in Fig. 1 correspond to metropolitan regions, and have been laid out roughly with respect to their true geographical locations. Note that the latter of the two assumptions above implies that the betweenness B(v) of any given vertex v ∈ V will be exactly equal to the number of geodesic paths through v. We will find this fact convenient for the purposes of illustration, although it is not necessary for (nor utilized in) our general development.

Intuitively, and according to earlier work on centrality in spatial networks (Barrat et al., 2005), one might suspect that vertices near the central portion of the network, such as Kansas City or Indianapolis, have larger betweenness, being likely forced to support most of the flows of communication between east and west. Examination of the underlying routing information and the paths induced by this information shows this to be the case.
¹ http://www.internet2.edu.
² Technically, the Abilene network is more accurately described by a directed graph. But, given the fact that routing is typically symmetric in this network, we follow the Internet2 convention of displaying Abilene using an undirected graph. In addition, although the uniqueness of geodesic paths in this network necessarily implies that it is actually a weighted graph, we will not emphasize this fact here.

Until recently, standard algorithms for computing betweenness centralities B(v) for all vertices in a network had $O(n_v^3)$ running times, which was a stumbling block to their application in large-scale network analyses. Faster algorithms now exist, such as those introduced in Brandes (2001), which have running time of $O(n_v n_e)$ on unweighted networks and $O(n_v n_e + n_v^2 \log n_v)$ on weighted networks, with an $O(n_v + n_e)$ space requirement. These improvements derive from exploiting a clever recursive relation for the partial sums $\sum_{t \in V} \sigma_{st}(v)/\sigma_{st}$. We make use of similar techniques in the development of our own algorithms here.

2.2. Betweenness and co-betweenness for pairs of vertices

We motivate our study in this paper of higher-order notions of betweenness by first examining in some detail the case of m = 2 vertices. For two vertices u, v ∈ V, their individual betweenness values quantify the extent to which each is passed through by geodesic paths in G. Similarly, the group betweenness of the pair quantifies the extent to which either is passed through. In general, however,
this latter quantity will not necessarily be equal simply to the sum of the former two quantities, as this sum will over-count geodesic paths that pass through both vertices. Some correction is therefore necessary in relating the betweennesses of two vertices to their combined group betweenness, as we describe next.

Table 1
Rank  Vertex pair u, v            C(u,u)  C(v,v)  C(u,v)  B(u,v)  B̃(u,v)
1     Indianapolis/Houston        38      4       0       42      0.583
2     Kansas City/Atlanta         36      6       0       42      0.583
3     Indianapolis/Los Angeles    36      4       0       40      0.556
4     Kansas City/Houston         36      4       0       40      0.556
5     Kansas City/Los Angeles     36      4       0       40      0.556
6     Kansas City/Washington      36      4       0       40      0.556
7     Indianapolis/Sunnyvale      32      12      4       40      0.556
8     Kansas City/New York        34      10      6       38      0.528
9     Kansas City/Sunnyvale       32      12      6       38      0.528
10    Kansas City/Chicago         32      18      14      36      0.500
11    Indianapolis/Washington     32      4       0       36      0.500
12    Indianapolis/Atlanta        30      6       0       36      0.500
13    Indianapolis/Kansas City    32      32      30      34      0.472
14    Indianapolis/Denver         32      10      8       34      0.472
15    Indianapolis/Seattle        32      0       0       32      0.444
16    Kansas City/Seattle         32      0       0       32      0.444
17    Indianapolis/New York       30      10      8       32      0.444
18    Kansas City/Denver          30      10      10      30      0.417
19    Chicago/Atlanta             22      6       0       28      0.389
20    Chicago/Sunnyvale           18      12      2       28      0.389

Note, however, that rather than individual vertex betweennesses B(u) and B(v), we will instead use slightly modified versions of these quantities, for reasons that will become immediately apparent. Formally, we express the group betweenness of u and v as

$$ B(u, v) = C_{\{u,v\}}(u, u) + C_{\{u,v\}}(v, v) - C_{\{u,v\}}(u, v), \tag{2} $$

where

$$ C_{\{u,v\}}(i_1, i_2) = \sum_{s,t \in V \setminus \{u,v\}} \frac{\sigma_{st}(i_1, i_2)}{\sigma_{st}} , \tag{3} $$

for $i_1, i_2 \in \{u, v\}$, and $\sigma_{st}(i_1, i_2)$ is the number of geodesic paths between vertices s and t that pass through both $i_1$ and $i_2$. Defined in analogy to (1), we call the quantity in (3) the co-betweenness of $i_1$ and $i_2$, with respect to {u, v}.
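To make the roles of the three terms in (2) concrete, here is a minimal Python sketch that evaluates (3) by brute force and combines the results as in (2). The function names (`geodesics`, `co_betweenness_pair`, `group_betweenness_pair`), the adjacency-dictionary input, the toy path graph, and the use of unordered pairs {s, t} are all assumptions made for illustration; this is not the algorithm of Section 4.

```python
from collections import deque
from itertools import combinations

def geodesics(adj, s, t):
    """All shortest s-t paths (as vertex lists) in an unweighted graph, via BFS."""
    dist, preds, queue = {s: 0}, {s: []}, deque([s])
    while queue:
        x = queue.popleft()
        for w in adj[x]:
            if w not in dist:
                dist[w] = dist[x] + 1
                preds[w] = [x]
                queue.append(w)
            elif dist[w] == dist[x] + 1:
                preds[w].append(x)

    def build(y):
        if y == s:
            return [[s]]
        return [path + [y] for x in preds[y] for path in build(x)]

    return build(t) if t in dist else []

def co_betweenness_pair(adj, u, v, i1, i2):
    """C_{u,v}(i1, i2) as in Eq. (3); endpoints s, t range over V \\ {u, v}.

    Assumes a connected graph and counts each unordered pair {s, t} once.
    """
    others = [x for x in adj if x not in (u, v)]
    total = 0.0
    for s, t in combinations(others, 2):
        paths = geodesics(adj, s, t)
        both = sum(1 for p in paths if i1 in p and i2 in p)  # sigma_st(i1, i2)
        total += both / len(paths)
    return total

def group_betweenness_pair(adj, u, v):
    """B(u, v) via the inclusion-exclusion form of Eq. (2)."""
    c_uu = co_betweenness_pair(adj, u, v, u, u)
    c_vv = co_betweenness_pair(adj, u, v, v, v)
    c_uv = co_betweenness_pair(adj, u, v, u, v)
    return c_uu + c_vv - c_uv

# Hypothetical toy graph: the path 0-1-2-3-4, with the pair (u, v) = (1, 3).
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(co_betweenness_pair(path5, 1, 3, 1, 3))  # 1.0: only the 0-4 geodesic uses both
print(group_betweenness_pair(path5, 1, 3))     # 3.0 = 2 + 2 - 1
```

In this toy case each of the three endpoint pairs {0, 2}, {0, 4}, {2, 4} has a single geodesic that meets at least one of the vertices 1 and 3, so the direct count is also 3, matching the corrected sum.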
The subscript {u, v} indicates that only paths between vertices s, t ∈ V \ {u, v} are included in the sum, while the argument $(i_1, i_2)$ indicates that we are counting paths passing through both $i_1$ and $i_2$. Although somewhat redundant here, the purpose of this convention will become clear below, where we generalize to m ≥ 2. When the context allows, we will sometimes abbreviate the quantity in (3) as $C(i_1, i_2)$.

Eq. (2) is just a re-expression of the group betweenness centrality defined in Everett and Borgatti (1999), for a group of size m = 2. Following these authors, we also define the normalized form of this measure as $\tilde{B}(u, v) = 2B(u, v)/[(n_v - 2)(n_v - 3)]$. Each of the three components in (2) can be seen as having an important role to play in defining the group betweenness of the pair u, v. However, note that the quantities $C_{\{u,v\}}(u, u)$ and $C_{\{u,v\}}(v, v)$ are not actually equal to the betweennesses B(u) and B(v), since the latter include paths with one end at v or u, respectively, while the former do not. Nevertheless, without loss of generality, we will refer to both the former and the latter quantities as vertex betweenness centralities.

To illustrate the role of these various components, consider Table 1, in which we list the pairs of vertices u, v in the Abilene network with the top twenty betweenness values B(u, v). We see that the group betweenness is largest for pairs of a particular nature, involving one of the vertices central to the northern east-west route across the United States and one of the vertices along the southern east-west route. In particular, the six largest values of B(u, v) all involve combinations of either Indianapolis or Kansas City with one of Washington, Atlanta, Houston, or Los Angeles.

Fig. 2. Histogram of second-order group betweenness values B(u, v) for all pairs of vertices in the Abilene network.
Further examination of Table 1 reveals that in fact all but the last two entries involve either Indianapolis or Kansas City (or, in one instance, both). These two vertices have very large vertex betweenness, and clearly this is the main factor in the high group betweenness for these first eighteen entries. In fact, pairing either of these two vertices with Seattle, which is essentially peripheral to the network, with no betweenness itself with respect to the underlying routing protocols, is still sufficient to achieve a high second-order group betweenness. When we first see both Indianapolis and Kansas City absent, in the nineteenth and twentieth entries of the table, the nearby Chicago vertex has assumed a similar role.

It is also informative to examine the full set of group betweenness values B(u, v), which we show in Fig. 2, in the form of a histogram for all 55 distinct pairs of vertices in Abilene. It is evident that there are three small clusters of values, in the high, medium, and low ranges, against an otherwise fairly uniform background. The high values correspond to the first ten or so values in Table 1 and, as we have just observed, are driven by the inclusion of either Indianapolis or Kansas City. The medium values exclude these two vertices, and instead tend to include Denver, Houston, and New York, either together or with some of the vertices on the southern east-west route. The low values involve only these vertices on the southern east-west route, with Seattle as well, in some cases.

Now consider the co-betweennesses C(u, v), which are shown in Table 1 as well. We have also displayed the co-betweenness values visually in Fig. 3 using a graph, where each vertex v is again placed roughly with respect to its actual geographic location, but is now drawn in proportion to its betweenness B(v). Edges between pairs of vertices u, v now represent non-zero co-betweenness C(u, v) for the pair, and are drawn with a thickness in proportion to their value.
A number of interesting features are evident from this graph. First, we see that, as was noted earlier, the more centrally located vertices tend to have the largest betweenness values. Second, it is these vertices that typically are involved with the larger co-betweenness values. Since the paths going through both a vertex u and a vertex v are a subset of the paths going through either one or the other, this tendency for large co-betweenness to associate with large betweenness is not a surprise. Third, the co-betweenness values tend to be smaller between vertices separated by a larger geographical distance, which again seems intuitive. Somewhat more surprising perhaps, however, is the manner in which the network becomes disconnected. The Seattle vertex is now isolated, as the underlying routing protocols send no paths through that vertex, only to and from. Additionally, the vertices Houston, Atlanta, and Washington now form a separate component in this graph, indicating that information is routed on paths passing through both the first two and the last two, but not through all three, and also not through any of these and some other vertex. This observation suggests that these three vertices, as a group, are somewhat more marginal in the network, with respect to the flow of information.

3. Betweenness for sets of vertices

In this section we present our main results on group betweenness and co-betweenness, for sets of vertices of arbitrary size m ≥ 2. We first develop an expansion for group betweenness, generalizing the expression in (2), in terms of co-betweenness values of increasing orders. Based on this expansion, we then construct a set of lower and upper bounds for group betweenness involving only vertex betweenness and pairwise co-betweenness.

3.1. An expansion for group betweenness

Our expression for group betweenness in (2), for the case of m = 2 vertices, explicitly incorporates the counting principle of inclusion–exclusion.
Seen from this perspective, $C_{\{u,v\}}(u, v)$ is the key new quantity, where its importance is in correcting for double-counting in the vertex betweennesses $C_{\{u,v\}}(u, u)$ and $C_{\{u,v\}}(v, v)$ of paths that pass through both u and v.

Fig. 3. Graph representation of the betweenness and co-betweenness values for the Abilene network. Vertices are in proportion to their betweenness. The width of each link is drawn in proportion to the co-betweenness of the two vertices incident to it.

The same principle of inclusion–exclusion can be used to produce an analogous expression for the betweenness centrality of subsets A ⊂ V of an arbitrary number m = |A| of vertices. To see this, following Everett and Borgatti (1999), we first define the group betweenness of the set of vertices A as

$$ B(A) = \sum_{s,t \notin A:\, s \neq t} \frac{\sigma^{*}_{st}(A)}{\sigma_{st}} , \tag{4} $$

where $\sigma^{*}_{st}(A)$ is the number of geodesic paths between s and t that pass through at least one of the vertices in A. The normalized version of B(A) is $\tilde{B}(A) = 2B(A)/[(n_v - m)(n_v - m - 1)]$. Next, we note that we can express $\sigma^{*}_{st}(A)$ in the form

$$ \sigma^{*}_{st}(A) = \sum_{j=1}^{m} (-1)^{j-1} \sum_{\mathbf{i}_j \subseteq A} \sigma_{st}(\mathbf{i}_j), \tag{5} $$

where $\mathbf{i}_j = \{i_1, \ldots, i_j\}$ denotes a subset of j vertices in A and $\sigma_{st}(\mathbf{i}_j)$ is the number of geodesic paths between s and t that pass through all of the vertices in $\mathbf{i}_j$. This expression is simply the result of applying the inclusion–exclusion principle in a standard fashion.³

Finally, we extend our notion of co-betweenness to more than two vertices. Specifically, for a subset $\mathbf{i}_j \subseteq A$, we define the j-th order co-betweenness of $\mathbf{i}_j$, with respect to A, as

$$ C_A(\mathbf{i}_j) = \sum_{s,t \notin A:\, s \neq t} \frac{\sigma_{st}(\mathbf{i}_j)}{\sigma_{st}} . \tag{6} $$

This value captures the number of geodesic paths in the network that pass through all of the j vertices in $\mathbf{i}_j \subseteq A$. Note that, with respect to this notation, the expression in (6) reduces to that in (3) when A = {u, v}, j = 2, and $\mathbf{i}_2 = \{u, v\}$.
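Definitions (4) and (6), together with the identity (5), can be checked numerically on small graphs. The Python sketch below is one way to do so under stated assumptions: it enumerates geodesics by brute force, counts each unordered pair {s, t} once, and assumes a connected, unweighted graph given as an adjacency dictionary. The function names and the toy graph are hypothetical and are not taken from the paper.

```python
from collections import deque
from itertools import combinations

def geodesics(adj, s, t):
    """All shortest s-t paths (as vertex lists) in an unweighted graph, via BFS."""
    dist, preds, queue = {s: 0}, {s: []}, deque([s])
    while queue:
        x = queue.popleft()
        for w in adj[x]:
            if w not in dist:
                dist[w] = dist[x] + 1
                preds[w] = [x]
                queue.append(w)
            elif dist[w] == dist[x] + 1:
                preds[w].append(x)

    def build(y):
        if y == s:
            return [[s]]
        return [path + [y] for x in preds[y] for path in build(x)]

    return build(t) if t in dist else []

def co_betweenness(adj, A, subset):
    """C_A(subset) as in Eq. (6): geodesics between s, t outside A through all of subset."""
    outside = [x for x in adj if x not in A]
    total = 0.0
    for s, t in combinations(outside, 2):
        paths = geodesics(adj, s, t)
        total += sum(1 for p in paths if all(i in p for i in subset)) / len(paths)
    return total

def group_betweenness(adj, A):
    """B(A) as in Eq. (4): geodesics through at least one vertex of A."""
    outside = [x for x in adj if x not in A]
    total = 0.0
    for s, t in combinations(outside, 2):
        paths = geodesics(adj, s, t)
        total += sum(1 for p in paths if any(i in p for i in A)) / len(paths)
    return total

def alternating_sum(adj, A):
    """Sum Eq. (5) over all pairs s, t: signed co-betweenness of orders 1..m."""
    A = list(A)
    total = 0.0
    for j in range(1, len(A) + 1):
        sign = (-1) ** (j - 1)
        total += sign * sum(co_betweenness(adj, A, sub) for sub in combinations(A, j))
    return total

# Hypothetical toy graph: the path 0-1-2-3-4-5-6 with A = {1, 3, 5}.
path7 = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
print(group_betweenness(path7, [1, 3, 5]))  # 6.0
print(alternating_sum(path7, [1, 3, 5]))    # 6.0 = 10 - 5 + 1
```

For the toy path graph the first-order terms sum to 10, the three pairwise terms to 5, and the single third-order term is 1, so the alternating sum agrees with the direct count of 6.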
³ The inclusion–exclusion principle states that the total number of elements in the union of a finite number of sets of finite cardinality may be enumerated as a summation of terms with alternating signs. The first term is obtained by adding (i.e., inclusion) the cardinalities of the sets. The second term is a correction to the first term for ‘double counting’ of elements shared by pairs of sets, by appropriate subtraction (i.e., exclusion). The third term is a further correction for excessive subtraction in the second step, adding back any elements that were subtracted out more times than necessary due to their being shared by triples of sets. And so on and so forth.

Now, given the definitions above, the expression for group betweenness in (4) may be written alternatively as

$$ B(A) = \sum_{j=1}^{m} (-1)^{j-1} \sum_{\mathbf{i}_j \subseteq A} C_A(\mathbf{i}_j). \tag{7} $$

That is, the group betweenness B(A) can be re-expressed in an inclusion–exclusion manner, with respect to terms of increasingly higher orders of co-betweenness among the elements of A. Note that this formulation of B(A) reduces to that in (2), when A = {u, v}.

Formula (7) provides us with a type of expansion for group betweenness, similar in spirit to a Taylor series expansion for a mathematical function in calculus. This perspective in turn suggests the potential for and usefulness of studying group betweenness using principles of, for example, truncation and approximation. For example, for an arbitrary group A, we might ask what the relative contributions are of co-betweenness values at increasingly higher orders. In the case m = 2, for the Abilene network, we saw that the vertex betweenness (i.e., first-order co-betweenness under our notational convention) plays a primary role, and the pairwise co-betweenness, a secondary and corrective role. In general, for m ≥ 2, since there are $\binom{m}{j}$ terms in (7) involving subsets of j vertices in A, clearly for many j there are a substantial number of terms.
But if the magnitude of these individual terms is small enough, their overall contribution may still end up being small. Note, for example, as an extreme case, that $C_A(\mathbf{i}_j) = 0$ for all $j > \ell_{\max}$, where $\ell_{\max}$ is the length of the longest geodesic path in G (i.e., the so-called diameter of G). Since $\ell_{\max} = O(\log n_v)$ in many networks (e.g., networks possessing the ‘small-world’ property), this fact suggests that the number of relevant orders in the expansion (7) may grow quite slowly with the number of vertices.

In fact, we show next that it is possible to produce useful bounds for the group betweenness B(A) involving only vertex betweenness and pairwise co-betweenness, i.e., involving only terms of first and second order in (7). Furthermore, in Section 4, we present an algorithm for the efficient numerical computation of these quantities. Specifically, our algorithm allows for the computation of second-order co-betweenness C(u, v) for all (u, v) ∈ V × V in at worst $O(n_v^3)$ time, and on sparse graphs we have witnessed typical run times much closer to $O(n_v^2)$ in practice.⁴ Taken together, these two contributions allow for a novel characterization of B(A), in terms of its lower- and higher-order components, in a computationally efficient manner.

3.2. Lower and upper bounds for group betweenness

The bounds we present here are derived using Bonferroni-like inequalities (e.g., Galambos and Simonelli (1996)), such as underlie so-called ‘Bonferroni corrections’ used in statistics, which allow for the calibration of a collection of statistical tests of, say, m hypotheses $H_1, \ldots, H_m$, so that the probability of falsely rejecting any of these hypotheses is controlled at a certain level. The most familiar version of these types of corrections is based upon a bound of the form

$$ \Pr\left(\cup_{i=1}^{m} E_{H_i}\right) \le \sum_{i=1}^{m} \Pr(E_{H_i}), \tag{8} $$

where $E_{H_i}$ denotes the event that $H_i$ is falsely rejected.
This bound derives from a truncation of an exact expression for the probability on the left-hand side, and this expression is identical in form to that in (7), but with the components $C_A(\mathbf{i}_j)$ replaced by probabilities $\Pr(E_{H_{i_1}} \cap \cdots \cap E_{H_{i_j}})$. Although used less commonly, various other bounds (both lower and upper) have been obtained by more subtle truncations incorporating higher-order terms, with truncations up to second-order being most common.

⁴ This algorithm was used to produce all of the numerical output for the examples presented in the previous section, in the context of the Abilene network.

The same ideas may be used to bound B(A). Specifically, note that the ratio $\sigma^{*}_{st}(A)/\sigma_{st}$ may be interpreted as the probability that a randomly selected geodesic path between s and t passes through at least one vertex in A. Since this is the probability of a union of events, it can be bounded above as in (8), where the probabilities on the right-hand side are of the form $\sigma_{st}(i)/\sigma_{st}$, for i ∈ A. That is,

$$ \frac{\sigma^{*}_{st}(A)}{\sigma_{st}} \le \sum_{i \in A} \frac{\sigma_{st}(i)}{\sigma_{st}} . \tag{9} $$

Furthermore, a standard extension of (8) similarly yields that $\sigma^{*}_{st}(A)/\sigma_{st}$ can be bounded below by the right-hand side in (9), less a summation of all probabilities of the form $\sigma_{st}(i_1, i_2)/\sigma_{st}$, for $i_1, i_2 \in A$. Applying this idea for each pair (s, t), summing over all pairs, and collecting terms appropriately, yields the bounds

$$ \sum_{i \in A} C_A(i) - \sum_{(i_1, i_2) \in A:\, i_1 < i_2} C_A(i_1, i_2) \;\le\; B(A) \;\le\; \sum_{i \in A} C_A(i). \tag{10} $$

Now, generally speaking, the lower bound in (10) can be expected to be somewhat reasonable in practice. But the upper bound will likely be rather rough; this is certainly the case for the analogous inequality (8) used in statistics. However, techniques for deriving improved Bonferroni inequalities can be brought to bear here, using precisely the same logic as above.
For example, direct application of Corollary 1 of Worsley (1982) yields that

$$ \frac{\sigma^{*}_{st}(A)}{\sigma_{st}} \le \sum_{i \in A} \frac{\sigma_{st}(i)}{\sigma_{st}} - \sum_{j=1}^{m-1} \frac{\sigma_{st}(i_j, i_{j+1})}{\sigma_{st}} , \tag{11} $$

where $i_1, \ldots, i_m$ is a given ordering of the vertices in A. Again, applying this bound to each pair (s, t), summing over all pairs, and collecting terms appropriately, yields the alternative upper bound

$$ B(A) \le \sum_{i \in A} C_A(i) - \sum_{j=1}^{m-1} C_A(i_j, i_{j+1}). \tag{12} $$

The bound in (12) differs from the upper bound in (10) by a correction factor composed of a certain subset of pairwise co-betweenness values. Coupled with the lower bound in (10), which involves a correction composed of all possible pairwise co-betweennesses, we have a pair of lower and upper bounds for the group betweenness B(A) that can be trivially computed in $O(m^2)$ and O(m) operations, respectively, given the values $C_A(i)$ and $C_A(i_1, i_2)$. And these latter values can all be computed using a minor modification of the algorithm for computing co-betweennesses C(u, v) that we mentioned earlier, and describe below in Section 4.

A simple measure of the accuracy of our bounds is provided by their difference, which is the width of the interval they form. To express this width succinctly, we define in association with the set A a graph $H_A = (V_A, E_A)$, where $V_A = A$ and $E_A$ contains an edge between $i_1, i_2 \in V_A$ if and only if $C_A(i_1, i_2) \neq 0$. Note that $H_A$ is just a sub-graph of the type of overall co-betweenness graph we introduced in Section 2.2, in Fig. 3, within the context of our discussion of the Abilene network. In particular, it is the sub-graph induced by the m vertices in A. Let $E_A^{bnd} \subseteq E_A$ be those edges for which terms $C_A(i_j, i_{j+1})$ were used in the upper bound (12). Then the width of the interval formed by this bound and the lower bound in (10) is given by

$$ W = \sum_{(i_1, i_2) \in E_A \setminus E_A^{bnd}} C_A(i_1, i_2). \tag{13} $$

In other words, the width is determined by the co-betweenness values for those pairs of vertices not used in constructing the upper bound.
Hence, an interesting question is how best to select the vertex pairs $\{i_1, i_2\} \in E_A^{bnd}$. Intuitively, $E_A^{bnd}$ should involve pairs with large co-betweenness. However, recall from the construction of our bounds that $E_A^{bnd}$ also must involve only pairs adjacent to each other under some ordering $i_1, \ldots, i_m$ of the $m$ vertices in $A$, and therefore not all possible combinations of pairs $\{i_1, i_2\}$ are available to us. For small $m$, we can in principle create a list of all such orderings, evaluate the width (13) corresponding to each, and select that which yields a minimum width. For example, if $m = 3$ and $A = \{1, 2, 3\}$, there are only three unique orderings to consider. In general, however, there will be $m!/2$ unique orderings (i.e., since the left-right direction of the list is unimportant upon summation), growing quickly in $m$. In fact, formally, the problem of minimizing the width (13) is equivalent to that of finding the longest path in $H_A$ or, if $H_A$ is not connected, the union of longest paths on the component sub-graphs, where the length of each edge $\{i_1, i_2\} \in E_A$ is $C_A(i_1, i_2)$. And this problem is known to be NP-hard, since the Hamiltonian path problem is a special case. Algorithms for producing approximate solutions of this problem exist, although they vary in their accuracy, particularly in connection with the density of $H_A$, with sparse graphs being more difficult. See Karger et al. (1997). We have not explored the issue of finding a good approximation algorithm for when $m$ is large.

Table 2
Upper and lower bounds on the normalized betweenness $\tilde{B}(u, v, w)$ for the best triple $(u, v, w)$ obtained by joining one additional vertex to each of the first, tenth, and twentieth best pairs $(u, v)$ in Table 1.

Vertex triple (u, v, w)             Lower bound   Upper bound
Indianapolis/Houston/Sunnyvale      0.714         0.714
Kansas City/Chicago/Atlanta         0.643         0.643
Kansas City/Chicago/Houston         0.643         0.643
Kansas City/Chicago/Los Angeles     0.643         0.643
Chicago/Sunnyvale/Atlanta           0.607         0.607
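For small groups, the exhaustive search over orderings discussed in this section can be sketched as follows. This is a minimal Python illustration under stated assumptions: the helper names and the pairwise values are hypothetical, and the search is the brute-force enumeration of the $m!/2$ orderings, not an approximation algorithm for the longest-path problem.

```python
from itertools import permutations


def unique_orderings(members):
    """Yield each ordering once, treating a list and its reversal as equal."""
    for order in permutations(sorted(members)):
        if len(order) < 2 or order[0] < order[-1]:
            yield order


def best_ordering(members, pair):
    """Find the ordering maximizing the sum of C_A(i_j, i_{j+1}).

    Maximizing this sum gives the tightest upper bound (12) and hence the
    smallest width (13). `pair` maps frozenset({i1, i2}) to C_A(i1, i2).
    """
    best, best_sum = None, float("-inf")
    for order in unique_orderings(members):
        total = sum(pair.get(frozenset((a, b)), 0.0)
                    for a, b in zip(order, order[1:]))
        if total > best_sum:
            best, best_sum = order, total
    return best, best_sum


# Hypothetical pairwise co-betweenness values for A = {a, b, c}.
pair = {frozenset(("a", "b")): 3.0,
        frozenset(("b", "c")): 2.0,
        frozenset(("a", "c")): 1.0}

order, correction = best_ordering({"a", "b", "c"}, pair)
print(order, correction)
```

Because the number of candidate orderings grows as $m!/2$, a search of this kind is only practical for the small group sizes considered in the illustrations.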
As a final note, we point out that it is possible to have one or both of our lower and upper bounds actually achieve the value $B(A)$. For example, if $C_A(\mathbf{i}_j) = 0$ for $j \ge 3$, then it follows trivially, by (7) and (10), that $B(A)$ and the lower bound will be equal. In our numerical work we have also encountered cases where the upper bound equals $B(A)$; in fact, this situation occurred quite frequently. Furthermore, in many cases we found that the upper and lower bounds were equal, indicating that not only were the co-betweenness terms involving three or more vertices all zero, but also that apparently the co-betweenness terms for all pairs $\{i_1, i_2\} \in E_A$ not used in the upper bound were zero, since the sum in (13) was zero.

3.3. Illustration: Abilene

In order to illustrate the nature and utility of the results developed in this section, we examine groups $A$ of size $m = 3$ in the Abilene network of Fig. 1. Consider the pairs of vertices with the first, tenth, and twentieth highest betweenness ranking in Table 1. Suppose that for each pair $u, v$ we wish to add one additional vertex $w$, chosen so as to maximize the overall ‘control’ of the three vertices over traffic in the network, i.e., to maximize $B(u, v, w)$. The results are shown in Table 2, presented in terms of the normalized betweenness $\tilde{B}(u, v, w)$. Here, since there are only three vertices involved, the second term in the upper bound in (12) was chosen in the form $C_A(i_1, i_2) + C_A(i_2, i_3)$, where $i_1$, $i_2$, and $i_3$ are that permutation of $A = \{u, v, w\}$ which maximizes this sum.

Two points are interesting to note. First, in all three cases, there is a reasonable sense of geographical dispersion of the vertices across the continental United States. Second, in all three cases the lower and upper bounds are equal and, therefore, these values are just $\tilde{B}(A)$. Of course, it certainly is not the case that all triples of vertices $(u, v, w)$ necessarily will have equal lower and upper bounds.
Rather, it is a question of the location of the vertices relative to each other and within the network as a whole. When the bounds are equal, this suggests a relatively non-redundant role for each of the vertices in the group. When they are not only equal but also high, this suggests good coverage or ‘control’ as well. For example, consider the Chicago/Sunnyvale pair, for which we show in Table 3 the values of $\tilde{B}(A)$ for the addition of each possible third vertex. Most vertices besides Atlanta also yield equal lower and upper bounds, but with values lower than the 0.607 associated with Atlanta. The exceptions are Indianapolis and Kansas City, whose values are not only lower than the maximum, but also form non-trivial intervals, i.e., (0.429, 0.464) and (0.536, 0.571), respectively. These latter two vertices, lying as they do between Chicago and Sunnyvale on the northern east-west route, clearly admit redundancies in the paths controlled. Nevertheless, it is interesting to note that these two intervals are still quite tight. In particular, they are tight enough to conclude, for example, that the betweenness resulting from the addition of Indianapolis is strictly less than that resulting from the addition of Kansas City. In fact, our bounds permit a complete ordering of all possible third vertices, as shown in Table 3.

Table 3
Upper and lower bounds on the normalized betweenness $\tilde{B}(u, v, w)$ for triples $(u, v, w)$ involving Chicago, Sunnyvale, and one other vertex in the Abilene network, for all possible choices of additional vertex.

Third vertex w    Lower bound   Upper bound
Atlanta           0.607         0.607
Kansas City       0.571         0.607
Houston           0.536         0.536
Denver            0.500         0.500
Indianapolis      0.464         0.500
Washington        0.429         0.429
Seattle           0.357         0.357
Los Angeles       0.321         0.321
New York          0.321         0.321

4. Computation of pairwise co-betweenness

We discuss here the calculation of the pairwise co-betweenness values $C(u, v)$ in (3), and the closely related values $C_A(u, v)$, for all pairs $(u, v)$. At first glance, it would appear that an algorithm of $O(n_v^4)$ running time is necessary, given that the number of vertex pairs grows as the square of the number of vertices. Such an implementation would render the notion of pairwise co-betweenness infeasible to implement in any but graphs of relatively modest size. However, exploiting ideas similar to those underlying the algorithms of Brandes (2001) for calculating the vertex betweennesses $B(v)$, a decidedly more efficient implementation may be obtained. We describe the main ideas briefly here in this section. Details may be found in Appendix A.

Our algorithm for computing co-betweenness involves a three-stage procedure for each source vertex $s \in V$. In the first stage, we perform a breadth-first traversal of the graph $G$, to quickly compute intermediary quantities such as $\sigma_{sv}$, the number of geodesic paths from the source $s$ to each other vertex $v$ in the network; in the process we form a directed acyclic graph that contains all geodesic paths leading from vertex $s$. In the second stage, we iterate through each vertex in order of decreasing distance from $s$ and compute a score $\delta_s(v)$ for each vertex. This score essentially captures the dependency of $s$ on $v$, in the sense of its contribution to co-betweennesses involving $v$. These contributions are then aggregated in a depth-first traversal of the directed acyclic graph, which is carried out in the third and final stage.

In order to compute the number of geodesic paths $\sigma_{sv}$ in the first stage, we note that the number of geodesic paths from $s$ to a vertex $v$ is the sum of all geodesic paths to each parent of $v$ in the directed acyclic graph rooted at $s$, the set of vertices which we denote $p_s(v)$, namely,

\[ \sigma_{sv} = \sum_{t \in p_s(v)} \sigma_{st}. \tag{14} \]

In the case of an undirected graph, this can be computed in the course of a breadth-first search with a running time of $O(n_e)$.

In the second stage, we compute $\delta_s(v)$ using the recursive relation established in Theorem 6 of Brandes (2001),

\[ \delta_s(v) = \sum_{w \in c_s(v)} \frac{\sigma_{sv}}{\sigma_{sw}} \bigl(1 + \delta_s(w)\bigr), \tag{15} \]

where $c_s(v)$ denotes the set of child vertices of $v$ in the directed acyclic graph rooted at $s$.

Finally, in the third stage, we compute the co-betweennesses by interpreting the relation

\[ C(u, v) = \sum_{s \in V \setminus \{u, v\}} \frac{\delta_s(v)}{\sigma_{sv}} \, \sigma_{sv}(u) \tag{16} \]

as assigning a contribution of $\delta_s(v)/\sigma_{sv}$ to $C(u, v)$ for each of the $\sigma_{sv}(u)$ geodesic paths to $v$ that pass through $u$. We accumulate these contributions at each step of the depth-first traversal when we visit a vertex $v$ by adding $\delta_s(v)/\sigma_{sv}$ to $C(u, v)$ for every ancestor $u$ of the current vertex $v$.

Our proposed algorithms exploit recursions analogous to those of Brandes (2001) to produce run-times that are in the worst case $O(n_v^3)$, but in empirical studies were found to vary like $O(n_v n_e + n_v^{2+p} \log n_v)$ in general, or $O(n_v^{2+p} \log n_v)$ in the case of sparse graphs. Here $p$ is related to the total number of geodesic paths in the network and seems to lie comfortably between 0.1 and 0.5 in our experience. In the case of unique geodesic paths, it may be shown rigorously that the running time reduces to $O(n_v n_e + n_v^2 \log n_v)$, and $O(n_v^2 \log n_v)$ if the network is sparse as well as ‘small-world’ (i.e., with diameter of size $O(\log n_v)$). See Appendix A for details.

On a final note, we point out that to compute co-betweenness values $C_A(i_1, i_2)$, for a given set $A$, it is sufficient to make two simple changes to our algorithm, to adjust for the fact that the elements of $A$ are not allowed to serve as end-points of paths in our calculations. First, in the third stage, the summation in (16) is restricted to be only over $s \in V \setminus A$. Second, the contribution to the recursive sum in (15) is modified to be $\delta_s(w)$, rather than $1 + \delta_s(w)$, if $w \in A$.
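The three-stage procedure of this section can be sketched in Python as follows, for the plain co-betweenness $C(u, v)$ on an unweighted, undirected graph. This is an illustrative reading of relations (14) through (16), not the authors' Matlab implementation: the function names, the adjacency-list representation, and the example graph are assumptions. It is also assumed here that each unordered pair of end-points is counted once, so the accumulated totals (which see every path from both ends) are halved at the end.

```python
from collections import deque


def bf_count_paths(adj, s):
    """Stage 1: breadth-first traversal from s.

    Returns the path counts sigma, the children of each vertex in the
    shortest-path DAG rooted at s, and the vertices in visiting order.
    """
    dist = {s: 0}
    sigma = {v: 0 for v in adj}
    sigma[s] = 1
    children = {v: [] for v in adj}
    order = []
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]  # relation (14): sum over parents
                children[v].append(w)
    return sigma, children, order


def score_vertices(sigma, children, order):
    """Stage 2: dependency scores via (15), farthest vertices first."""
    delta = {v: 0.0 for v in order}
    for v in reversed(order):
        for w in children[v]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    return delta


def df_visit(v, ancestors, sigma, children, delta, cobet):
    """Stage 3: walk every geodesic path from s, crediting ancestors of v."""
    share = delta[v] / sigma[v]
    if share > 0.0:
        for u in ancestors:
            key = frozenset((u, v))
            cobet[key] = cobet.get(key, 0.0) + share
    ancestors.append(v)
    for w in children[v]:
        df_visit(w, ancestors, sigma, children, delta, cobet)
    ancestors.pop()


def co_betweenness(adj):
    """Pairwise co-betweenness for all vertex pairs with a non-zero value."""
    cobet = {}
    for s in adj:
        sigma, children, order = bf_count_paths(adj, s)
        delta = score_vertices(sigma, children, order)
        for w in children[s]:  # the source itself is never an ancestor
            df_visit(w, [], sigma, children, delta, cobet)
    # Assumption: halve so that each unordered end-point pair counts once.
    return {key: value / 2.0 for key, value in cobet.items()}


# Small example graph (hypothetical): a 4-cycle 0-1-3-2 with a tail 3-4-5.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
C = co_betweenness(adj)
print(sorted((sorted(k), v) for k, v in C.items()))
```

The recursive depth-first traversal is kept for readability; for graphs with long geodesic paths an explicit stack would avoid Python's recursion limit.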
Apart from these two changes, the algorithm remains unchanged and the at most $\binom{m}{2}$ relevant values will be included among the $n_v^2$ values output; the rest may be discarded. Therefore, producing the bounds described in Section 3.2, once the relevant co-betweenness values are calculated, requires only $O(m)$ operations for the upper bound, and $O(m^2)$ for the lower bound. A more refined algorithm might try to economize on just which co-betweenness values are computed, given the set $A$, although we have not explored this possibility. We comment more on this issue in Section 6.

5. Additional illustrations

We provide in this section additional illustrations of the results developed in the previous sections, using two other networks. In both cases, the data were obtained in studies investigating the flow of information among actors in a social network.

5.1. Michael’s strike network

The goal of our first illustration is to provide additional insight into the behavior and potential usage of our bounds. For this purpose we use the strike dataset of Michael (1997), which is also analyzed in detail in Chapter 7 of de Nooy et al. (2005). New management took over at a forest products manufacturing facility, and this management team proposed certain changes to the compensation package of the workers. The changes were not accepted by the workers, and a strike ensued, which was then followed by a halt in negotiations. At the request of management, who felt that the information about their proposed changes was not being communicated adequately, an outside consultant analyzed the communication structure among 24 relevant actors.

The social network in Fig. 4 represents the communication structure among these actors, with an edge between two actors indicating that they communicated at some minimally sufficient level of frequency about the strike. Three subgroups are present in the network: younger, Spanish-speaking employees (black vertices), younger, English-speaking employees (gray vertices), and older, English-speaking employees (white vertices). In addition, the two union negotiators, Sam and Wendle, are indicated by asterisks next to their names. It is these last two that were responsible for explaining the details of the proposed changes to the employees. When the structure of this network was revealed, two additional actors, Bob and Norm, were approached and had the changes explained to them, which they then discussed with their colleagues, and within 2 days the employees requested that their union representatives re-open negotiations. The strike was resolved soon thereafter.

Fig. 4. Original strike-group communication network of Michael (1997). Three subgroups are represented in this network: younger, Spanish-speaking employees (black vertices), younger, English-speaking employees (gray vertices), and older, English-speaking employees (white vertices). The two union negotiators, Sam and Wendle, are indicated by an asterisk next to their names. Edges indicate that the two incident actors communicated at some minimally sufficient level of frequency about the strike.

The formation of an appropriate coalition of actors was fundamental to resolving the strike. That Bob and Norm were approached is not entirely surprising, from the perspective of network structure. Both serve as cut-vertices in Fig. 4, in that the removal of either would disconnect the graph. In addition, both have high vertex betweenness centralities, as shown in Fig. 5. Similar to Fig. 3, vertices in Fig. 5 (now arranged in a circular layout) are drawn in proportion to their betweenness, and edges, to the co-betweenness of their incident vertices, as calculated with respect to the original graph in Fig. 4. Bob and Norm clearly have the largest betweenness values, followed by Alejandro (who we remark also is a cut-vertex in Fig. 4, but as part of a much smaller sub-network).
As for the two union representatives, their vertex betweenness values suggest that Sam also plays a non-trivial role in facilitating communication, but that Wendle is not well-situated in this regard. In fact, Wendle is not even connected to the main component of the co-betweenness graph in Fig. 5, since his vertex betweenness in the original graph, and hence his co-betweenness with any other vertex, is zero (as is also true for six other actors).

Fig. 5. Co-betweenness for the strike-group communication network. Actors located apart from the network, in the corners, are isolated under this representation, as they have zero betweenness and hence no co-betweenness with any other actors. (Note: Isolated vertices are drawn to have unit diameter, and not in proportion to their (zero) betweenness.)

The coalition formed by Bob, Norm, Sam, and Wendle has a normalized group betweenness of $\tilde{B}(A) = 0.7702$. If instead of Wendle, we include Alejandro, which might seem more reasonable, given the discussion above, this value increases only slightly to 0.7807. However, consider the lower and upper bounds for these numbers, where the lower bound is given by the left-hand side of (10), and the upper bound, by the optimal choice of the right-hand side of (12), obtained through exhaustive search. For the coalition that includes Wendle, these bounds are 0.7123 and 0.7702, respectively, while for the coalition that includes Alejandro, they are 0.5018 and 0.7807, respectively. Both of these bounds are somewhat loose, but the latter has almost five times the width of the former (i.e., 0.0579 vs 0.2789).

The cause of this difference lies, of course, in the lower bound, since the upper bound in both cases is exactly equal to the actual value of the normalized group betweenness. Recall that the lower bound too will equal this value only if the co-betweennesses $C_A(\mathbf{i}_j)$ are equal to zero for all subsets of size $j = 3$ or greater. But examination of the original network, in Fig. 4, shows that this is not the case for either coalition. In particular, the triple of actors {Bob, Norm, Sam}, which is common to both coalitions, has a number of geodesic paths that pass through it, from Xavier to all of the Spanish-speaking employees and many of the younger, English-speaking employees. Hence, this triple has a non-trivial third-order co-betweenness. Moreover, when Wendle is replaced by Alejandro, geodesic paths from Wendle to these same actors (except Alejandro now) also contribute to the third-order co-betweenness of {Bob, Norm, Sam}. Furthermore, there will now be a non-zero co-betweenness as well for the quadruple {Bob, Norm, Sam, Alejandro}, based on geodesic paths from Wendle and Xavier to the other Spanish-speaking employees. It is the influence of these additional higher-order co-betweenness terms on the pairwise co-betweenness terms in equation (13) that increases the width of our bounds. Or, put another way, the relative redundancy of these actors as coalition members is reflected in the relative widths of our bounds.

So a comparison of just the group betweenness values for our two coalitions suggests that they are roughly equivalent. On the other hand, a comparison of not only these values, but also their bounds, is sufficient to highlight differences in higher-order co-betweenness of coalition members, without actually computing these higher-order values.

As a side note, we mention that the wide bounds in this example are primarily a function of the choice of vertices in our coalition, rather than a characteristic of the network as a whole. For most other choices of coalitions of size $m = 3$ or 4 that we examined, the bounds were of a similarly narrow width to that observed in our illustrations on the Abilene network, and frequently had a width of zero.

5.2. Zachary’s karate club network

The goal of our second illustration is to provide some additional intuition into why pairwise co-betweenness values, when appropriately combined, can successfully summarize group betweenness values. For this purpose, we use the karate club dataset of Zachary (1977). Over the course of a couple of years in the 1970s, Zachary collected information from the members of a university karate club, including the number of situations (both inside and outside of the club) in which interactions occurred between members. During the course of this study, there was a dispute between the club’s administrator and the principal karate instructor. As a result, the club eventually split into two smaller clubs of approximately equal size, one centered around the administrator and the other centered around the instructor.

Fig. 6 displays the network of social interactions between club members. The gray vertices represent members of one of the two smaller clubs and the white vertices represent members who went to the other club. The edges are drawn with a width proportional to the number of situations in which the two members interacted. The graph clearly shows that the original club was already polarized into two groups centered about actors 1 and 34, who were the key players in the dispute that split the club in two.

Fig. 6. Karate club network of Zachary (1977). The gray vertices represent members of one of the two smaller clubs and the white vertices represent members who went to the other club. The edges are drawn with a width proportional to the number of situations in which the two members interacted.

In Fig. 7 is shown a visualization of the vertex betweenness and pairwise co-betweenness values, similar to those in Figs. 3 and 5, where the layout is done using an energy minimization algorithm. After actor 1, actor 34 has the largest vertex betweenness. Now suppose that actor 34 wishes to form a coalition but, due to the dispute, refuses to do so with actor 1.
If 34 wishes to join with an actor in the same sub-network (i.e., white vertices), either of actors 32 or 33 would seem to be logical choices, based on their similarly large vertex betweennesses. However, actor 34 has a substantially larger co-betweenness with actor 32 (i.e., 35.96) than with actor 33 (i.e., 0.4190), which suggests that {34, 33} will be a stronger coalition than {34, 32}. This is confirmed by calculating the normalized group betweenness, which is 0.4710 and 0.4125, respectively. On the other hand, if 34 wishes to join with an actor in the other sub-network (i.e., gray vertices), then actor 3 seems the logical choice, and the coalition {34, 3} yields a normalized group betweenness of 0.4732.

Fig. 7. Co-betweenness for the karate club network. Actors in the upper-left and lower-right corners, separated from the connected component, are isolated due to zero betweenness. The two actors in the lower right-hand corner (i.e., a5 and a11) have non-zero betweenness, but are bridges, in the sense that they only serve to connect to other vertices, and hence have zero co-betweenness. (Note: The vertices for actors with zero betweenness are drawn to have unit diameter, for purposes of visibility.)

So the coalitions {34, 33} and {34, 3} are about equally strong. If actor 34 desired a larger coalition, it would seem natural to consider combining these two smaller coalitions to obtain {34, 33, 3}, and doing so would yield a normalized group betweenness of 0.5818. But it turns out that the alternative coalition {34, 32, 3} has an even higher normalized group betweenness of 0.5984, despite the fact that actor 32 was less preferable to actor 33 when paired with actor 34 alone. That such might be the case is suggested by the pattern of pairwise co-betweenness values for these actors in Fig. 7.
Actor 3 shares a non-trivial fraction of geodesic paths with actor 34, as indicated by the rather thick line connecting them in this figure. And actor 33, in turn, shares just slightly fewer geodesic paths with actor 3. Since actors 33 and 34 themselves share few geodesic paths, their two links to actor 3 indicate the sharing of two largely distinct subsets of geodesic paths, thereby diminishing the contribution of actor 3 ‘two-fold’ in some (rough) sense. On the other hand, while actors 3 and 32 both share geodesic paths with actor 34, they do not share any with each other. So while the contribution of actor 32, when joined with actor 34 alone, is less than that of actor 33, its contribution when joined with the coalition {34,3} ultimately surpasses that of actor 33. While admittedly the relative strength of these coalitions is something that could be inferred alternatively to some extent by examination of the original network, in Fig. 6, and comparing the location of the actors relative to each other within that network, the above illustration is intended to demonstrate that it is possible to reason quite effectively from the pairwise co-betweenness graph alone. The bounds for group betweenness proposed in this paper may be thought of as similarly utilizing this same information, but in a more formal manner. 6. Discussion Expansions are a common tool for representing the structure of complex mathematical objects. And they can be especially useful when lower-order truncations are found to be accurate. For example, the Taylor series expansion, and its corresponding first- or second-order (i.e., linear or quadratic) approximations, is arguably one of the most standard calculus tools used. 
Similarly, while probability distributions are known to have various representations in terms of their complete set of moments, it is models based on first- and second-order moments (i.e., means and covariances) that underlie the vast majority of statistical modeling done in practice. Here in this paper, we have shown that similar principles of expansion and truncation can be brought to bear on the study of the betweenness centrality of groups of vertices in a graph, and we have demonstrated the relevance of our results to the control of information flow by coalitions of actors in social networks.

Our work makes clear the intrinsic nature of group betweenness centrality. In particular, group betweenness is not simply a trivial sum of the betweennesses of its individual actors, but rather is a quantity that incorporates the co-betweennesses of all subgroups of two or more actors. Nevertheless, we have also demonstrated that it is possible to characterize the betweenness centrality of a group of actors with sometimes remarkable accuracy using the co-betweenness of no more than pairs of actors. More generally, the accuracy with which we can characterize group betweenness centrality in this manner provides direct insight into the composition of the group and the relative redundancy of the actors, with respect to the ‘control’ the group exerts over the flow of information over the network. Specifically, greater accuracy implies more complementary roles, while lesser accuracy implies more redundant roles. Such insight into the relative redundancy of actors in a group can in turn be important, for example, in evaluating the robustness of potential coalitions.

The idea that vertex betweenness and pairwise co-betweenness together can provide significant insight into network information flow, and its control, is further reinforced by the following interesting connection with the statistical modeling of network flows.
Recall the Abilene network described in Section 2, and suppose that $x_{s,t}$ is a measure of the information (e.g., Internet packets) flowing between vertices $s$ and $t$ in the network. Similarly, let $y_v$ be the total information flowing through vertex $v$. Next, define $x$ to be the $n_p \times 1$ vector of values $x_{s,t}$, where $n_p$ is the total number of pairs of vertices exchanging information, and $y$, to be the $n_v \times 1$ vector of values $y_v$. And suppose, without loss of generality, that geodesic paths in the network are unique (as is effectively the case in the Abilene network, for example). Then a common expression modeling the relation between these two quantities is simply $y = Rx$, where $R$ is an $n_v \times n_p$ matrix (i.e., the so-called ‘routing matrix’) of 0’s and 1’s, indicating through which vertices each given routed path goes. If we now consider $x$ as a random variable, with uncorrelated elements and sharing a common variance, then its covariance matrix is simply proportional to the $n_p \times n_p$ identity matrix. The elements of $y$, however, will be correlated, and their covariance matrix is proportional to $RR^T$, by virtue of the linear relation between $y$ and $x$. Importantly, note that the diagonal elements of $RR^T$ are the vertex betweennesses $C(u, u)$ and, furthermore, the off-diagonal elements are the co-betweennesses $C(u, v)$. In other words, it is the first- and second-order co-betweenness values that are captured in the covariance matrix of the quantity of information flowing through the vertices, under this simple model for network information flow.

This example also serves to reinforce another point of our work, although one that admittedly we have made only indirectly throughout: that pairwise co-betweenness is a quantity that potentially is itself of fundamental interest, much like vertex betweenness. It remains to explore in greater depth the implications of this assertion.
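The connection between the routing matrix and co-betweenness can be checked on a toy example. The Python sketch below builds $R$ for a five-vertex path graph, where geodesic paths are trivially unique, and forms $RR^T$. The graph, the function names, and the convention that only interior vertices of a path are marked in $R$ (so that end-points do not count, matching betweenness with end-points excluded) are assumptions made for this illustration.

```python
from collections import deque
from itertools import combinations


def geodesic(adj, s, t):
    """Return the shortest path from s to t (assumed unique) via BFS."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    path = [t]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def routing_matrix(adj):
    """Rows are vertices, columns are unordered pairs (s, t).

    Assumption: an entry is 1 only when the vertex is an interior vertex
    of the routed path, so end-points are excluded.
    """
    vertices = sorted(adj)
    row = {v: k for k, v in enumerate(vertices)}
    pairs = list(combinations(vertices, 2))
    R = [[0] * len(pairs) for _ in vertices]
    for col, (s, t) in enumerate(pairs):
        for v in geodesic(adj, s, t)[1:-1]:
            R[row[v]][col] = 1
    return R


def gram(R):
    """Compute R R^T with plain lists."""
    return [[sum(a * b for a, b in zip(ru, rv)) for rv in R] for ru in R]


# Path graph 0 - 1 - 2 - 3 - 4 (hypothetical example).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
G = gram(routing_matrix(adj))
for line in G:
    print(line)
```

The diagonal entries count the paths through each vertex, and the off-diagonal entries count the paths through both vertices of a pair, which is the co-betweenness in the unique-geodesic case.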
For example, following the tendencies in the statistical physics literature on complex networks (Albert and Barabási, 2002; Pastor-Satorras and Vespignani, 2004), it can be of interest to explore the statistical properties of co-betweenness in large-scale networks. Some work in this direction may be found in Chua (2006), where co-betweenness and functions thereof were examined in the context of standard network models. The most striking properties discovered were certain basic scaling relations with distance between vertices. In a related direction, see also Chua et al. (2006), where an analytical result is given relating edge betweenness to the eigenvalues of an edge pairwise co-betweenness ‘covariance’ matrix, defined in analogy to the matrix described above.

On a side note, it should be mentioned that extensions of co-betweenness to contexts other than that of an undirected graph are certainly possible. For example, we have also developed the analogous quantities and algorithms for pairwise vertex co-betweenness on weighted graphs (which were used in the computations for the examples involving the Abilene network) and for pairwise edge co-betweenness on unweighted and weighted graphs. Details may be found in Chua (2006). The extension to directed graphs should also be straightforward, although we have not implemented it.

In terms of future work, it would also be of interest to explore the implications of our expansion for group betweenness and the accuracy of our bounds in networks of various topologies and various sizes. Such an exploration would be particularly relevant in the context of coalition formation, and would facilitate a study of the relationship between coalition robustness and redundancy of group members, as referred to above. The work presented here is intended to serve as a foundation in this regard, as it clearly enables such further explorations.
In particular, we note that it is in larger networks that groups of non-trivial size m can be examined and, similarly, in which higher orders of co-betweenness will potentially become 200 E.D. Kolaczyk et al. / Social Networks 31 (2009) 190–203 more relevant (i.e., recall our discussion of the truncation induced by the diameter of a graph, at the end of Section 3.1). An interesting question left unaddressed by our work is whether or not the structure inherent in the inclusion–exclusion formula (7) can be exploited to develop efficient algorithms for exact computation of group betweenness. Certainly the fact that our upper bound was so often observed in our numerical work to achieve the actual group betweenness value suggests that this may be so. The recent work of Puzis et al. (2007) is possibly relevant in this regard. These authors provide a fast algorithm for successive computation of group betweenness centrality, consisting of two stages. The first stage, which they call a pre-processing stage, essentially computes relevant second-order co-betweenness quantities for all pairs of vertices, although the details of this stage are not given explicitly and the notion of co-betweenness itself receives no special attention. In the second stage, a post-processing step is applied to these quantities, which computes the betweenness of a group A of size m recursively from the betweennesses of successive subgroups of size j ≤ m, starting with j = 2. This latter stage takes O(m3 ) time, which is more expensive than the O(m) time required for our upper bound, and the O(m2 ) time required for our lower bound. However, in general, of course, the computations in both their method and ours are dominated by the initial pre-processing stage for sufficiently small m. Puzis et al. (2007) offer some computational tricks for avoiding the calculation of unnecessary co-betweenness values, which could be incorporated into our method as well. Appendix A A.1. 
Derivation of key expressions

This appendix contains details specific to the proposed algorithm for computing co-betweenness, including a derivation of key expressions, a rough analysis of algorithmic complexity, and pseudo-code. Actual software implementing our algorithm, written in the Matlab software environment, is available at (http://math.bu.edu/people/kolaczyk/software.html).

Central to our algorithm are the expressions in Eqs. (15) and (16), the derivations for which we present here. Before doing so, however, we need to introduce some definitions and relations. Let $d(s, t)$ be the geodesic distance between two vertices $s$ and $t$. Note that a simple combinatorial argument shows that

$$\sigma_{st}(v) = \begin{cases} \sigma_{sv}\,\sigma_{vt} & \text{if } d(s,t) = d(s,v) + d(v,t), \\ 0 & \text{otherwise,} \end{cases} \tag{17}$$

and

$$\sigma_{st}(u,v) = \begin{cases} \sigma_{su}\,\sigma_{uv}\,\sigma_{vt} & \text{if } d(s,t) = d(s,u) + d(u,v) + d(v,t), \\ \sigma_{sv}\,\sigma_{vu}\,\sigma_{ut} & \text{if } d(s,t) = d(s,v) + d(v,u) + d(u,t), \\ 0 & \text{otherwise.} \end{cases} \tag{18}$$

For the sake of notational simplicity, we will assume, without loss of generality, that

$$d(s,u) \le d(s,v) \tag{19}$$

for the remainder of this discussion.

The remaining quantities we need to introduce are notions of the path-dependency of vertices. In the spirit of Brandes (2001), we define the "dependency" of vertices $s$ and $t$ on the vertex pair $(u, v)$ as

$$\delta_{st}(u,v) = \frac{\sigma_{st}(u,v)}{\sigma_{st}}, \tag{20}$$

and we define the dependency of $s$ alone on the pair of vertices $(u, v)$ as

$$\delta_{s}(u,v) = \sum_{t \in V\setminus\{u,v\}} \delta_{st}(u,v) = \sum_{t \in V\setminus\{u,v\}} \frac{\sigma_{st}(u,v)}{\sigma_{st}}. \tag{21}$$

Similarly, we define the pair-wise dependency of $s$ and $t$ on a single vertex $v$ as

$$\delta_{st}(v) = \frac{\sigma_{st}(v)}{\sigma_{st}}, \tag{22}$$

and the dependency of $s$ alone on $v$ as

$$\delta_{s}(v) = \sum_{t \in V\setminus\{v\}} \delta_{st}(v) = \sum_{t \in V\setminus\{v\}} \frac{\sigma_{st}(v)}{\sigma_{st}}. \tag{23}$$

Note that unlike Brandes (2001), we exclude $t = v$ from the sum in Eq. (23).

Two relations that follow immediately from these definitions, combined with Eqs. (17) and (18), are

$$\sigma_{st}(u,v) = \sigma_{su}\,\sigma_{uv}\,\sigma_{vt} = \sigma_{sv}(u)\,\sigma_{vt} = \frac{\sigma_{sv}(u)}{\sigma_{sv}}\,\sigma_{sv}\,\sigma_{vt} = \delta_{sv}(u)\,\sigma_{st}(v), \tag{24}$$

and

$$\delta_{st}(u,v) = \frac{\sigma_{st}(u,v)}{\sigma_{st}} = \frac{\delta_{sv}(u)\,\sigma_{st}(v)}{\sigma_{st}} = \delta_{sv}(u)\,\delta_{st}(v). \tag{25}$$

These two relations allow us to show that

\begin{align}
\delta_{s}(u,v) &= \sum_{t \in V\setminus\{u,v\}} \delta_{st}(u,v) \tag{26} \\
&= \sum_{t \in V\setminus\{u,v\}} \delta_{sv}(u)\,\delta_{st}(v) && \text{by (25)} \tag{27} \\
&= \delta_{sv}(u)\,\delta_{s}(v) && \text{since } \delta_{su}(v) = 0 \text{ by (19)} \tag{28} \\
&= \delta_{s}(v)\,\frac{\sigma_{sv}(u)}{\sigma_{sv}} && \text{by (24).} \tag{29}
\end{align}

We use this result to re-express the co-betweenness defined in (3) as

\begin{align}
C(u,v) &= \sum_{s,t \in V\setminus\{u,v\}} \delta_{st}(u,v) \tag{30} \\
&= \sum_{s \in V\setminus\{u,v\}} \; \sum_{t \in V\setminus\{u,v\}} \delta_{st}(u,v) \tag{31} \\
&= \sum_{s \in V\setminus\{u,v\}} \delta_{s}(u,v) \tag{32} \\
&= \sum_{s \in V\setminus\{u,v\}} \delta_{s}(v)\,\frac{\sigma_{sv}(u)}{\sigma_{sv}}. \tag{33}
\end{align}

Lastly, to establish the recursive relation in (15), note that for a child vertex $w \in c_s(v)$ every path to $v$ gives rise to exactly one path to $w$ by following the edge $(v, w)$. This means that

$$\sigma_{sw}(v) = \sigma_{sv} \quad \text{for } w \in c_s(v), \tag{34}$$

and that

$$\delta_{sw}(v) = \frac{\sigma_{sw}(v)}{\sigma_{sw}} = \frac{\sigma_{sv}}{\sigma_{sw}} \quad \text{for } w \in c_s(v). \tag{35}$$

Also note that for $t = w$ we have

$$\delta_{st}(w) = 1. \tag{36}$$

This allows us to decompose $\delta_s(v)$ in essentially the same manner as Brandes (2001), namely,

\begin{align}
\delta_{s}(v) &= \sum_{t \in V\setminus\{v\}} \delta_{st}(v) \tag{37} \\
&= \sum_{t \in V\setminus\{v\}} \; \sum_{w \in c_s(v)} \delta_{st}(v,w) \tag{38} \\
&= \sum_{w \in c_s(v)} \; \sum_{t \in V\setminus\{v\}} \delta_{st}(v,w) \tag{39} \\
&= \sum_{w \in c_s(v)} \; \sum_{t \in V\setminus\{v\}} \delta_{sw}(v)\,\delta_{st}(w) && \text{by (25)} \tag{40} \\
&= \sum_{w \in c_s(v)} \frac{\sigma_{sv}}{\sigma_{sw}} \Bigl( 1 + \sum_{t \in V\setminus\{v,w\}} \delta_{st}(w) \Bigr) && \text{by (35) and (36)} \tag{41} \\
&= \sum_{w \in c_s(v)} \frac{\sigma_{sv}}{\sigma_{sw}} \bigl( 1 + \delta_{s}(w) \bigr), \tag{42}
\end{align}

where the last equality is due to the fact that since $w$ is a child of $v$ we have $\sigma_{sv}(w) = 0$ and thus $\delta_{sv}(w) = 0$.

A.2. Algorithmic complexity

Standard breadth-first search results put the running time for the first stage of our algorithm at $O(n_e)$, and since we touch each edge at most twice when we compute the dependency scores $\delta_s(v)$, the running time for the second stage is also $O(n_e)$. Since we repeat each stage for each vertex in the network, the first two stages have a running time of $O(n_v n_e)$.

The running time for the depth-first traversal, which occurs during the third stage, depends on the number and length of all geodesic paths in the network. Overall, we visit every geodesic path once and compute a co-betweenness contribution for each edge of every geodesic path. For 'small-world' networks, i.e., networks with an $O(\log n_v)$ diameter, we must compute $O(\bar{\sigma} \cdot \log n_v)$ contributions, where

$$\bar{\sigma} = \sum_{u,v \in V} \sigma_{uv} \tag{43}$$

is the total number of geodesic paths in the network. So the overall running time for the algorithm is $O(n_v n_e + \bar{\sigma} \log n_v)$.

Empirical evidence suggests that the upper bound for the average $(1/|V|) \sum_{u \in V} \sigma_{uv}$ ranges from $n_v^{0.19}$ to $n_v^{0.32}$ for common random graph models, and at worst has been seen to reach $n_v^{0.62}$ in the case of a network of airports. (In the latter case, there were extreme fluctuations in $(1/|V|) \sum_{u \in V} \sigma_{uv}$, so the total number of geodesic paths, $\bar{\sigma}$, might be much smaller than $n_v(n_v - 1)$ times this upper bound.) This suggests a running time of $O(n_v n_e + n_v^{2+p} \log n_v)$, where $p$ denotes the exponent in the empirical bounds above, though it is an open question to show this rigorously. In the case of sparse networks, where $n_e \sim n_v$, this reduces to a running time of $O(n_v^{2+p} \log n_v)$.

A.3. Pseudo-code

Here we provide pseudo-code for the computation of the vertex co-betweenness in the case of an undirected graph with no edge weights. The main function listed in Algorithm 1 loops over each vertex $s \in V$ and performs the three stages of the co-betweenness algorithm described in Section 4. Each of the three functions called in the main loop carries out one of these three stages. Pseudo-code for BF-count-paths, which carries out the breadth-first traversal used in the first stage, is presented in Algorithm 2. The computation of dependency scores $\delta_s(v)$ is handled in the second stage by score-vertices, which is described in Algorithm 3, and the third stage, in which the contributions to the co-betweenness are accumulated, is detailed in the pseudo-code for DF-visit given in Algorithm 4.

Algorithm 1. The main function for computing the vertex co-betweenness for all vertices.
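As an illustration of how Eqs. (33) and (42) combine, the following Python sketch computes pairwise co-betweenness for a small unweighted, undirected graph. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `bfs_count_paths`, `score_vertices`, and `co_betweenness` are hypothetical renderings of the stages named above, the graph is assumed to be given as an adjacency dictionary, and the sum in Eq. (30) is assumed to run over ordered pairs $(s, t)$. The third stage here does not use the depth-first traversal of Algorithm 4; it evaluates Eq. (33) directly from all-pairs breadth-first results, which is simpler but does not attain the running time discussed in A.2.

```python
from collections import deque
from itertools import combinations


def bfs_count_paths(adj, s):
    """Stage 1: distances d(s, .), geodesic counts sigma_{s.}, and BFS order from s."""
    dist = {s: 0}
    sigma = {s: 1}
    order = []
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:          # first time w is reached
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v is a parent of w on some geodesic
                sigma[w] += sigma[v]
    return dist, sigma, order


def score_vertices(adj, dist, sigma, order):
    """Stage 2: dependency scores delta_s(v) via the recursion in Eq. (42)."""
    delta = {v: 0.0 for v in order}
    for w in reversed(order):           # farthest vertices first
        for v in adj[w]:
            if dist[v] == dist[w] - 1:  # w is a child of v
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    return delta


def co_betweenness(adj):
    """Pairwise co-betweenness C(u, v) from Eq. (33), keyed by frozenset({u, v}).

    Assumption: direct evaluation with all-pairs BFS data replaces the
    depth-first accumulation of the paper's third stage.
    """
    nodes = list(adj)
    info = {s: bfs_count_paths(adj, s) for s in nodes}
    C = {frozenset(pair): 0.0 for pair in combinations(nodes, 2)}
    for s in nodes:
        dist_s, sigma_s, order = info[s]
        delta_s = score_vertices(adj, dist_s, sigma_s, order)
        # BFS order is sorted by distance, so d(s, u) <= d(s, v), as in Eq. (19).
        for u, v in combinations(order, 2):
            if u == s:
                continue
            dist_u, sigma_u, _ = info[u]
            # u lies on a geodesic from s to v exactly when distances add up (Eq. (17)).
            if dist_s[u] + dist_u[v] == dist_s[v]:
                sigma_sv_u = sigma_s[u] * sigma_u[v]
                C[frozenset((u, v))] += delta_s[v] * sigma_sv_u / sigma_s[v]
    return C


# Usage: on the path 0 - 1 - 2 - 3, only the ordered pairs (0, 3) and (3, 0)
# have geodesics through both 1 and 2.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(co_betweenness(path)[frozenset((1, 2))])  # 2.0
```

Keying the result by `frozenset` reflects the symmetry $C(u, v) = C(v, u)$ in the undirected case; a dense matrix would serve equally well for larger graphs.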
Algorithm 2. Breadth-first traversal of the graph starting at $s$. Used in the first stage to compute intermediary quantities needed for the computation of the co-betweenness. Here $\sigma_s(v)$ is the $\sigma_{sv}$ that appeared earlier.

Algorithm 3. Computation of the vertex scores $\delta_s(v)$ defined in (23). Used in the second stage of the computation.

Algorithm 4. The depth-first traversal of the third stage, used to accumulate the vertex co-betweenness contributions.

References

Albert, R., Barabási, A.-L., 2002. Statistical mechanics of complex networks. Reviews of Modern Physics 74, 47–97.

Barrat, A., Barthélemy, M., Vespignani, A., 2005. The effects of spatial constraints on the evolution of weighted complex networks. Journal of Statistical Mechanics, 05003.

Barrat, A., Barthélemy, M., Vespignani, A., 2008. Dynamical Processes on Complex Networks. Cambridge University Press, Cambridge.

Borgatti, S., Everett, M., 2006. A graph-theoretic perspective on centrality. Social Networks 28, 466–484.

Brandes, U., 2001. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology 25, 163.

Chua, D.B., 2006. Statistical analysis for whole networks. PhD thesis, Department of Mathematics and Statistics, Boston University.

Chua, D.B., Kolaczyk, E.D., Crovella, M., 2006. Network kriging. IEEE Journal of Selected Areas in Communications 24, 2263–2272.

Clark, J., Holton, D.A., 1991. A First Look at Graph Theory. World Scientific.

de Nooy, W., Mrvar, A., Batagelj, V., 2005. Exploratory Social Network Analysis with Pajek. Cambridge University Press, Cambridge, UK.

Dominguez, C.B.K., 2008. Party coalitions and interest group networks. In: Paper prepared for delivery at the Annual Meeting of the American Political Science Association, Boston, MA, August 28–September 1, 2008.

Everett, M., Borgatti, S., 1999. The centrality of groups and classes. Journal of Mathematical Sociology 23 (3), 181–201.

Freeman, L.C., 1977. A set of measures of centrality based on betweenness. Sociometry 40, 35–41.

Galambos, J., Simonelli, I., 1996. Bonferroni-type Inequalities with Applications. Springer, New York.

Karger, D., Motwani, R., Ramkumar, G., 1997. On approximating the longest path in a graph. Algorithmica 18, 82–98.

Merida-Campos, C., Willmott, S., 2007. Exploring social networks in request for proposal dynamic coalition formation problems. Lecture Notes in Computer Science 4696, 143–152.

Michael, J., 1997. Labor dispute reconciliation in a forest products manufacturing facility. Forest Products Journal 47, 41–45.

Pastor-Satorras, R., Vespignani, A., 2004. Evolution and Structure of the Internet: A Statistical Physics Approach. Cambridge University Press, Cambridge.

Puzis, R., Elovici, Y., Dolev, S., 2007. Fast algorithm for successive computation of group betweenness centrality. Physical Review E 76, 056709.

Wasserman, S., Faust, K., 1994. Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge.

Worsley, K.J., 1982. An improved Bonferroni inequality and applications. Biometrika 69 (2), 297–302.

Zachary, W., 1977. An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33, 452–473.